Pollution Source Identification and Parameter Sensitivity Analysis in Urban Drainage Networks Using a Coupled SWMM–Bayesian Framework

Wang, Ronghuan; Chen, Xuekai; Liu, Xiaobo; Lan, Guoxin; Dong, Fei; Yang, Jiangnan

doi:10.3390/pr14040699

by Ronghuan Wang ^1,2, Xuekai Chen ^2, Xiaobo Liu ^2,*, Guoxin Lan ^1, Fei Dong ^2 and Jiangnan Yang ^2,3

^1 College of Environment and Chemical Engineering, Chongqing Three Gorges University, Chongqing 404100, China
^2 China Institute of Water Resources and Hydropower Research, Beijing 100038, China
^3 School of Water Conservancy, North China University of Water Resources and Electric Power, Zhengzhou 450045, China
^* Author to whom correspondence should be addressed.

Processes 2026, 14(4), 699; https://doi.org/10.3390/pr14040699

Submission received: 19 January 2026 / Revised: 9 February 2026 / Accepted: 13 February 2026 / Published: 19 February 2026

(This article belongs to the Special Issue Advances in Hydrodynamics, Pollution and Bioavailable Transfers)

Abstract

Addressing the challenge of tracing hidden and transient cross-connections in urban drainage networks, this study develops a SWMM–Bayesian coupled model based on the PySWMM interface, using the Daming Lake area in Jinan as a case study. By employing a Markov Chain Monte Carlo (MCMC) algorithm to drive the interaction between dynamic simulation and statistical inference, the model achieves multidimensional joint posterior estimation of pollution source location (J_x), discharge intensity (M), and discharge timing (T). The results indicate: (1) Model accuracy: The coupled model demonstrates strong source tracing capability, with mean absolute errors below 0.6% in single-parameter inversion. Under multi-parameter joint inversion, the true values of all parameters consistently fall within the 95% confidence intervals. (2) Parameter sensitivity: The influence of MCMC step size on the uncertainty of pollution tracing results is systematically clarified. Discrete source location estimates (J_x) exhibit high robustness to step size variation due to spatial heterogeneity in hydraulic responses, whereas continuous physical parameters (M and T) show strong dependence on the selected step size scale. (3) Practical application: The impact of spatial monitoring network configuration on pollution tracing performance is examined. By deploying a complementary monitoring system integrating trunk and branch pipelines, the inversion accuracy for mass (M) and time (T) parameters is significantly improved by 84.2% and 88.5%, respectively. Overall, the proposed pollution source tracing method for urban drainage networks effectively overcomes the multi-solution challenge in complex network inversion, providing critical technical support for refined urban water environment management.

Keywords:

drainage pipe network; Bayesian algorithm; SWMM; pollution source tracing; MCMC step size; urban drainage monitoring

1. Introduction

With the rapid advancement in urbanization in China, the complexity of urban drainage pipe networks has increased substantially. As a core component of urban drainage systems [1], the proper functioning of these networks directly affects the quality and stability of urban river and lake ecosystems [2]. However, owing to the concealed configuration of drainage pipeline networks and the transient characteristics of sewage discharge activities, illegal discharges, leakages, and misconnections continue to occur despite repeated regulatory prohibitions [3]. Intermittent and episodic discharges of high-concentration sewage, if released into urban rivers and lakes without effective source tracing and interception, can result in direct sewage inputs, river–lake backflow, external water infiltration, and overflow-driven pollution. These processes ultimately lead to severe black and odorous conditions and substantial ecological degradation [4,5]. Therefore, the rapid and accurate identification of key attributes—such as pollution source locations, discharge intensities, and discharge timing—within complex urban underground drainage networks has emerged as a critical technical challenge in the refined management of urban water environments.

With respect to pollution source tracing in urban drainage networks, extensive research has been conducted worldwide. Existing technical approaches can generally be classified into three categories: physical detection, water quality fingerprint analysis, and mathematical model–based inversion. (1) Physical detection and inspection techniques primarily rely on manual surveys or equipment-assisted direct inspection of drainage pipelines. Traditional tracer methods [6,7], which involve the release of dyes or chemical tracers to track flow pathways, are intuitive in principle but are typically time-consuming and labor-intensive, rendering them impractical for application in large-scale drainage networks. With continued technological advancement, instrument-based localization approaches, such as closed-circuit television (CCTV) inspection [8,9], have become increasingly prevalent. For example, Ma et al. [10] reviewed the application of pipeline robotic technologies and highlighted their effectiveness in identifying physical defects, including pipe ruptures and corrosion. However, these methods primarily target static structural anomalies and remain limited in their ability to detect dynamic and transient illegal discharge activities [11]. In addition, Liao et al. [12] integrated geographic information system (GIS) technology to construct a directed graph–based network model, enabling spatial and topological tracing of discharge outlets, while other studies have focused on urban drainage sensor networks [13], thereby enhancing the level of information integration in physical inspection processes. (2) Characteristic factor and chemical mass balance approaches: This class of methods seeks to identify and trace pollution sources by exploiting distinctive water quality “chemical fingerprints”, such as fluorescence excitation-emission matrices (EEMs), stable isotopes, and specific chemical markers [14,15]. 
The core principle involves establishing quantitative linkages between pollution sources and receiving water bodies through the identification of characteristic differences among multiple water sources [16]. For example, Xu et al. [17] successfully quantified groundwater infiltration rates in a localized area of Shanghai by monitoring total nitrogen and water hardness indicators. However, the effectiveness of this approach depends on the idealized assumption that the transport behavior of characteristic factors remains invariant. Moreover, the neglect of sampling uncertainty and data stochasticity limits its ability to ensure accurate source attribution under complex discharge conditions. (3) Mathematical model-based inverse problem solving: With advances in hydrodynamic and water quality modeling tools, such as the Stormwater Management Model (SWMM), MIKE URBAN, and InfoWorks ICM [18,19,20], simulation-optimization-based model inversion has emerged as a major research focus in pollution source tracing. Early studies predominantly relied on deterministic optimization algorithms, such as genetic algorithms [21], neural networks [22], and random forests [23], to identify optimal parameter sets by minimizing residuals between simulated and observed values. For example, Salem et al. [24] developed a neural network-based surrogate model coupled with SWMM and a genetic algorithm, enabling efficient inversion of multiple pollution source characteristics in drainage networks. Similarly, Zhao et al. [25] integrated Monte Carlo simulations with mass balance approaches to quantitatively assess the contribution ratios of infiltration sources. However, pollution source identification in pipe networks is inherently a classical inverse problem [26]. In pipeline network systems, multicollinearity among model parameters combined with sparse monitoring data often hampers the clear discrimination of individual parameter effects. 
Consequently, multiple distinct pollutant source parameter sets may produce nearly indistinguishable downstream monitoring responses. Traditional deterministic approaches have limited capability to rigorously quantify such uncertainties and are therefore prone to convergence toward local optima.

To overcome these limitations, Bayesian inference approaches grounded in probabilistic statistics have emerged as a state-of-the-art solution. Rather than seeking a single optimal estimate, Bayesian methods [27,28] infer the posterior probability distributions of pollution source parameters, thereby enabling a rigorous quantification of uncertainty. In recent studies, Yang et al. [29] developed a coupled SWMM–Bayesian framework, in which a likelihood function was introduced to optimize the initial states of the Markov Chain Monte Carlo (MCMC) sampling process. This strategy substantially improved the global search capability and convergence performance of the model, particularly under multi-pollutant inversion scenarios. However, while these studies focused on optimizing initial sampling states, they generally treated the MCMC random-walk step size as a fixed or empirical parameter, overlooking its critical impact on convergence efficiency. Consequently, systematic investigations remain scarce regarding the influence of the Markov chain random-walk step size [30] on the accuracy of multi-parameter joint identification of pollution sources in drainage networks within a Bayesian–MCMC inversion framework.

In this context, this study develops an efficient probabilistic source-tracing framework by integrating Bayesian stochastic inversion theory with a finely resolved SWMM-based drainage network model. Within a Python environment, the application programming interface (API) between Python and the SWMM simulation engine [31] is employed to enable dynamic coupling between water quality simulation and statistical inference. To address the limitations of existing studies, this work systematically investigates two critical aspects. First, it clarifies the sensitivity of Bayesian inversion performance to the MCMC random-walk step size, addressing the lack of guidance on parameter tuning in existing literature. Second, it explores the synergistic optimization mechanism of monitoring station layouts, proposing a complementary strategy that integrates trunk and branch pipelines to significantly enhance tracing accuracy. Based on these methodological developments, a representative urban catchment is selected as the study area to evaluate the proposed framework under complex real-world drainage conditions. Its drainage system is characterized by the coexistence of separate and combined sewer systems, with severe misconnections and cross-connections in older urban districts. The combination of high-intensity rainfall events and a structurally complex drainage network leads to pronounced water-level fluctuations and frequent overflow events during the wet season. Current research and management in this area primarily rely on conventional physical surveys, which are insufficient for identifying intermittent and concealed pollution sources. Consequently, the area constitutes a sensitive zone for urban water quality degradation and provides a representative setting for evaluating the effectiveness of precision pollution source tracing methodologies. 
These analyses provide technical support for addressing complex challenges in identifying pollution sources within urban drainage systems.

2. Materials and Methods

2.1. Research Area

The Daming Lake watershed in Jinan City was selected as a representative study area for investigating urbanized runoff and pollutant export under intense rainfall conditions. The watershed is located in the transitional zone between the residual ranges of the Taishan Mountains and the Yellow River alluvial plain, and is characterized by a pronounced temperate monsoon climate. Precipitation exhibits strong intra-annual variability, with 75.7% of the annual rainfall occurring during the summer flood season (mean annual precipitation: 671.1 mm), making the area highly susceptible to rainfall-driven non-point source pollution.

A representative high-density built-up area of approximately 74 hm² was selected as the demonstration unit (Figure 1). The area includes 34 hydraulic nodes and 33 pipeline segments. The monitoring site is located at the downstream node J11, where rapid flow convergence occurs during heavy rainfall. A multi-source integrated database was established for model calibration (Table 1).

During the model development phase, regional historical rainfall and hydrological records were integrated with high-resolution Digital Elevation Model (DEM) data. Pipeline network geographic information and land-use datasets, supplemented by detailed GIS survey results, were jointly employed to comprehensively verify the drainage network topology and surface parameterization. In the parameter calibration and validation phase, continuous flow and water quality data were collected over a monitoring period spanning from November 2023 to August 2024 through monitoring stations deployed at key outfalls and critical network nodes, thereby providing a robust observational basis for evaluating model performance and accuracy.

2.2. SWMM

The Stormwater Management Model (SWMM), developed by the U.S. Environmental Protection Agency [32], is a dynamic rainfall–runoff simulation tool widely applied to quantify runoff generation and pollutant transport in urban environments. In this study, the Runoff, Flow Routing, and Water Quality modules were explicitly activated. The Dynamic Wave routing method was employed to solve the complete Saint-Venant equations, enabling the accurate computation of hydraulic variables such as flow velocity and water depth within pipes and channels, while simultaneously resolving spatiotemporal variations in water quality. This capability enables the dynamic simulation of pollutant mass and concentration evolution across network nodes and conduit segments [33].

2.3. Bayesian Algorithm Construction

Bayesian inversion algorithms combine prior information with observational data to probabilistically update model parameters. The inversion outcomes are represented as posterior probability density functions, which characterize the full range of parameter values consistent with the observations and quantify their associated likelihoods [34].

2.3.1. Prior Distribution

In Bayesian inference, the prior distribution reflects assumptions derived from historical data, prior experience, and expert judgment before incorporating observational evidence [35]. In this study, uniform distributions are assigned as priors for the three key parameters—pollution source location, discharge mass, and discharge timing—based on their predefined physically reasonable ranges. Accordingly, the prior probability density functions for each parameter are defined as follows:

P(J_x) = \frac{1}{N}, \quad x \in \{0, 1, 2, \dots, N - 1\} \qquad (1)

P(M = m) = \frac{1}{m_{\max} - m_{\min}}, \quad m \in [m_{\min}, m_{\max}] \qquad (2)

P(T = t) = \frac{1}{t_{\max} - t_{\min} + 1}, \quad t \in \{t_{\min}, t_{\min} + 1, \dots, t_{\max}\} \qquad (3)

In the formulation, J_x denotes a candidate pollution source node, and N represents the total number of nodes in the drainage network. Equation (1) implies a non-informative uniform prior, assuming every node has an equal probability (1/N) of being the source. Similarly, M corresponds to the pollutant discharge mass, while T denotes the discharge time. The symbols m_{\max} and t_{\max} define the upper bounds of the corresponding parameters, whereas m_{\min} and t_{\min} specify their lower bounds. To ensure the comprehensiveness of the inversion, the selection basis for these parameters is defined as follows: J_x covers all network nodes to prevent omitting potential source locations. The feasible range for M is set to [200, 1000], which was determined by extending the observed historical maximum and minimum pollution levels in the region to ensure the true discharge mass is fully enclosed. Meanwhile, the search space for T is defined as a 12 h time window from 11:00 to 23:00, over which a uniform prior is assigned and discretized according to the simulation time steps to ensure strict temporal consistency. Equations (2) and (3) state that the mass and time are uniformly distributed within their feasible ranges.

Furthermore, assuming that the source location, discharge mass, and timing are statistically independent, the joint prior probability distribution P(J_x, M = m, T = t) of the three parameters is calculated as the product of their marginal probabilities:

P(X = (J_x, m, t)) = P(J_x) \times P(M = m) \times P(T = t) \qquad (4)
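As an illustration of Equations (1) to (4), the joint uniform prior can be evaluated on the log scale. The sketch below is not taken from the study's code: the function name `log_prior` is hypothetical, the node count (34) and mass range [200, 1000] are taken from the text, and the encoding of the 11:00 to 23:00 window as minute indices 0 to 720 is an assumption based on a 1 min time step.

```python
import math


def log_prior(node_index, mass, time_step,
              n_nodes=34, m_min=200.0, m_max=1000.0, t_min=0, t_max=720):
    """Log of the joint uniform prior of Eq. (4).

    Returns -inf when any parameter lies outside its feasible range.
    The time encoding (minutes after 11:00) is an illustrative assumption.
    """
    if not 0 <= node_index < n_nodes:
        return -math.inf
    if not m_min <= mass <= m_max:
        return -math.inf
    if not t_min <= time_step <= t_max:
        return -math.inf
    log_p_node = -math.log(n_nodes)              # Eq. (1): 1 / N
    log_p_mass = -math.log(m_max - m_min)        # Eq. (2): 1 / (m_max - m_min)
    log_p_time = -math.log(t_max - t_min + 1)    # Eq. (3): 1 / (t_max - t_min + 1)
    return log_p_node + log_p_mass + log_p_time  # Eq. (4): product of marginals


if __name__ == "__main__":
    print(log_prior(16, 500.0, 660))   # inside the feasible region
    print(log_prior(16, 1500.0, 660))  # mass out of range
```

Because the prior is flat inside the feasible region, it only acts as a hard bound during sampling: candidates outside the ranges receive zero prior probability and are always rejected.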

2.3.2. Likelihood Function

The formulation of the likelihood function plays a critical role in determining the performance of the inversion process [36]. In this study, the model–observation residuals are assumed to follow a Gaussian distribution with zero mean and variance σ^{2}, i.e., ε \sim N(0, σ^{2}). Accordingly, the likelihood function is defined as follows:

P(Y \mid X = (J_x, m, t)) = \prod_{i = 1}^{N_{sensors}} \prod_{j = 1}^{T} \frac{1}{\sqrt{2 π σ^{2}}} \exp\left(- \frac{(y_{ij}^{sim} - y_{ij}^{obs})^{2}}{2 σ^{2}}\right) \qquad (5)

In the equation, N_{sensors} denotes the total number of monitoring points. y_{ij}^{sim} represents the pollutant concentration at the i-th monitoring point and the j-th time step simulated by the SWMM under a given parameter set (J_x, m, t), whereas y_{ij}^{obs} denotes the corresponding observed concentration. The parameter σ represents the standard deviation of the observational errors, where its square (σ^{2}) is estimated as the variance of residuals from the model calibration phase, thereby implicitly capturing both measurement noise and structural uncertainty.
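In practice, the product in Equation (5) is usually evaluated as a sum of logarithms to avoid numerical underflow. The following sketch shows one way to do this with NumPy; the function name `gaussian_log_likelihood` and the array layout (monitoring points by time steps) are illustrative assumptions rather than details given in the study.

```python
import numpy as np


def gaussian_log_likelihood(y_sim, y_obs, sigma):
    """Log of Eq. (5) for simulated and observed concentration arrays.

    y_sim, y_obs: arrays of identical shape, e.g. (n_sensors, n_time_steps).
    sigma: standard deviation of the residuals (assumed known and > 0).
    """
    y_sim = np.asarray(y_sim, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)
    if y_sim.shape != y_obs.shape:
        raise ValueError("y_sim and y_obs must have the same shape")
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    n = y_obs.size
    sse = float(np.sum((y_sim - y_obs) ** 2))  # sum of squared errors
    # Sum over all sensors and time steps of the log Gaussian density
    return -0.5 * n * np.log(2.0 * np.pi * sigma ** 2) - sse / (2.0 * sigma ** 2)


if __name__ == "__main__":
    obs = np.array([[10.0, 40.0, 20.0], [5.0, 30.0, 15.0]])
    print(gaussian_log_likelihood(obs, obs, sigma=2.0))        # perfect fit
    print(gaussian_log_likelihood(obs + 3.0, obs, sigma=2.0))  # biased fit
```

Since σ is fixed, only the sum of squared errors changes between candidate parameter sets, which is why the acceptance decision depends on the squared-error difference alone.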

2.3.3. Posterior Distribution

According to Bayes’ theorem, the posterior probability distribution of a parameter is proportional to the product of its prior distribution and the likelihood function, and can be written as:

P(X \mid Y) \propto P(Y \mid X = (J_x, m, t)) \times P(J_x) \times P(M = m) \times P(T = t) \qquad (6)

In the equation, P(X \mid Y) denotes the posterior probability density function of the parameter of interest. This framework facilitates accurate spatial source identification, a method validated by recent inversion studies [37,38]. Additionally, the monitoring station layout aligns with information entropy theory to ensure optimal signal capture [39].

2.4. Research Framework

Integrated Source-Tracing Framework: This study develops an integrated intelligent decision-making framework that spans from algorithm parameter tuning to practical operational guidance (Figure 2). To ensure the reproducibility and efficiency of the framework, the model was constructed in a Python 3.12 environment. The dynamic coupling between the hydraulic simulation and the statistical inference was achieved using the PySWMM library (wrapping the SWMM 5.2 engine), while the MCMC sampling algorithm was implemented using the NumPy and SciPy packages. The overall methodological workflow is outlined as follows:

1. Reference Model Construction

Using the Daming Lake area as the study domain, a SWMM model was developed through the integration of multi-source spatial datasets. In this study, Chemical Oxygen Demand (COD) was selected as the representative water quality indicator because it is a widely used metric for characterizing urban organic pollution and exhibits relatively conservative behavior over short in-sewer transport distances [40,41]. Accordingly, model calibration and validation were conducted using observed hydraulic and water quality data, ensuring that the model reliably reproduces the runoff generation and wastewater discharge characteristics of the study area.

2. SWMM–Bayesian Coupled Inversion Model

Prior distribution and observation data preparation: Prior distributions are specified for the pollution source node location J_x, discharge mass M, and discharge timing T. Concurrently, pollutant concentration time series measured at downstream monitoring nodes are extracted to construct the observation vector Y_{obs}.

MCMC Sampling and Model Integration: The MCMC algorithm is executed on the Python platform to generate candidate parameter sets (J_x^{0}, M^{0}, T^{0}). The script automatically updates the corresponding parameters in the SWMM input file (.inp) and drives the model engine to perform dynamic simulations.

Likelihood Function Evaluation and Sample Acceptance: The simulated concentration series Y_{sim} is extracted from the SWMM output file. The consistency between Y_{sim} and the observation vector Y_{obs} is assessed using a Gaussian likelihood function based on the sum of squared errors. A random number u \sim U(0, 1) is then generated according to the Metropolis-Hastings criterion. If u \leq α, where α is the acceptance probability of the candidate, the candidate parameter set is accepted as the new state of the Markov chain; otherwise, the chain remains at its current state.

Posterior Distribution Inference: After running the Markov chain for sufficient iterations and discarding burn-in samples, the posterior probability distributions of the parameters are constructed by plotting histograms based on the retained samples. This approach enables probabilistic inference of the pollution source characteristics (J_x, M, T).
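The sampling, acceptance, and burn-in steps of this workflow can be sketched in a few lines. The example below is a minimal illustration and not the study's implementation: `toy_forward` is a hypothetical analytic pulse that stands in for a PySWMM-driven SWMM run, the source node is held fixed so that only M and T are inferred, and the bounds, starting state, and step sizes are assumed values chosen for the demonstration.

```python
import numpy as np


def toy_forward(mass, release_step, n_steps=120, lag=15, spread=4.0):
    """Hypothetical stand-in for a SWMM run: a Gaussian concentration pulse
    that reaches the monitoring point `lag` steps after the release."""
    t = np.arange(n_steps)
    pulse = np.exp(-0.5 * ((t - (release_step + lag)) / spread) ** 2)
    return mass * pulse / (spread * np.sqrt(2.0 * np.pi))


def log_posterior(mass, release_step, y_obs, sigma,
                  m_bounds=(200.0, 1000.0), t_bounds=(0, 100)):
    """Unnormalised log posterior: uniform priors plus Gaussian likelihood."""
    if not m_bounds[0] <= mass <= m_bounds[1]:
        return -np.inf
    if not t_bounds[0] <= release_step <= t_bounds[1]:
        return -np.inf
    resid = toy_forward(mass, release_step, n_steps=y_obs.size) - y_obs
    return -np.sum(resid ** 2) / (2.0 * sigma ** 2)


def metropolis(y_obs, sigma, n_iter=4000, step_m=20.0, step_t=2, seed=0):
    """Random-walk Metropolis sampler with symmetric proposals for (M, T)."""
    rng = np.random.default_rng(seed)
    state = (600.0, 50)  # arbitrary starting guess
    logp = log_posterior(state[0], state[1], y_obs, sigma)
    chain = np.empty((n_iter, 2))
    for k in range(n_iter):
        cand = (state[0] + rng.normal(0.0, step_m),
                state[1] + int(rng.integers(-step_t, step_t + 1)))
        logp_cand = log_posterior(cand[0], cand[1], y_obs, sigma)
        # Accept when u <= alpha = min(1, posterior ratio), written in log form
        if np.log(rng.uniform()) <= logp_cand - logp:
            state, logp = cand, logp_cand
        chain[k] = state
    return chain


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    truth = toy_forward(500.0, 40)
    y_obs = truth + rng.normal(0.0, 1.0, truth.size)
    posterior = metropolis(y_obs, sigma=1.0)[1000:]  # discard burn-in
    print("median M:", np.median(posterior[:, 0]))
    print("median T:", np.median(posterior[:, 1]))
```

In the coupled framework, the call to `toy_forward` would be replaced by a full SWMM simulation for each candidate, which is why the number of iterations and the acceptance rate directly determine the computational cost.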

3. Pollution Scenario Setup

To evaluate the accuracy and applicability of the SWMM–Bayesian algorithm for pollution source inversion, this study established multiple representative operating scenarios for comparative analysis. All scenarios were simulated under an instantaneous discharge mode to assess the effects of different combinations of unknown parameters on inversion performance. This assumption is primarily employed for methodological testing, aiming to rigorously evaluate the algorithm’s capability to identify source characteristics under the most distinct signal conditions. The detailed configuration for each scenario is provided in Table 2. The temporal window considered encompasses the full pollutant transport process. Scenario S1 was designated as the “real event,” in which a 500 mg/L pollutant (COD) was instantaneously released at node J_{26} at 22:00. Using the aforementioned monitoring data, the unknown source node J_x was subsequently identified through inversion.

4. Key Parameter Sensitivity Analysis

The MCMC step size [42], as a critical factor in Bayesian inversion, directly governs the Markov chain’s exploration efficiency and convergence stability within the parameter space. To investigate the sensitivity of different model parameters to step size, three comparative experiments were designed, using the D2 operating condition as a reference. To eliminate coupling effects during multi-parameter inversion, a single-variable control approach was adopted: when assessing the influence of one parameter (e.g., M), the other two parameters (J_x and T) were held constant at their known values. The specific step size settings for each experimental group are summarized in Table 3. The MCMC step sizes were grouped into three levels (Small, Medium, and Large) based on preliminary trial runs to enable comparison between convergence efficiency and inversion accuracy. A fixed step size strategy was adopted to explicitly assess step-size sensitivity without the confounding effects of adaptive schemes.
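To make the role of the step size concrete, the sketch below generates one fixed-step random-walk candidate for each parameter type. It is an illustration under stated assumptions, not the study's proposal scheme: the function `propose` is hypothetical, the mass is perturbed in log space (suggested by the σ_{\ln M} notation used for the step sizes), the time and node indices are perturbed by rounded Gaussian steps, and treating node indices as an ordered list is a simplification. When the prior on M is uniform in M rather than in ln M, a log-space proposal is asymmetric in M, so the sketch also returns the corresponding Hastings correction ln(M'/M).

```python
import numpy as np


def propose(state, sigma_ln_m, sigma_t, sigma_j, rng):
    """Draw one fixed-step random-walk candidate for (node, mass, time).

    Returns the candidate and the log Hastings correction that has to be
    added to the log posterior ratio because the mass proposal is made in
    log space. All step-size arguments are standard deviations.
    """
    node, mass, t = state
    new_node = int(node + np.rint(rng.normal(0.0, sigma_j)))    # symmetric
    new_mass = float(mass * np.exp(rng.normal(0.0, sigma_ln_m)))
    new_t = int(t + np.rint(rng.normal(0.0, sigma_t)))          # symmetric
    log_hastings = float(np.log(new_mass) - np.log(mass))
    return (new_node, new_mass, new_t), log_hastings


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    current = (17, 500.0, 600)
    for label, s_m, s_t in [("small", 0.002, 1.0), ("large", 5.0, 120.0)]:
        candidate, correction = propose(current, s_m, s_t, 1.0, rng)
        print(label, candidate, round(correction, 4))
```

Candidates that fall outside the prior ranges simply receive zero posterior probability and are rejected, so no explicit clipping is needed in the proposal itself.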

5. Monitoring Point Layout and Operational Scenario Analysis

This study focuses on node J_{17}. Based on the multi-parameter joint inversion results under operating condition D2, alternative monitoring point layouts were designed to assess their impact on Bayesian inversion outcomes. As summarized in Table 4, scenarios A, B, and C deploy monitoring points at key hydraulic branch nodes, near-source nodes, and combinations of both, respectively. Branch nodes are particularly valuable for monitoring, as they integrate pollution information from multiple tributaries [43]. The effectiveness of the monitoring deployment schemes was evaluated and optimized through comparative analysis of inversion performance.

6. Accuracy Assessment and Uncertainty Quantification Metrics

This study employed the median error and mean error [44] as the primary metrics for evaluating inversion accuracy. In addition, a 95% confidence interval [45] was used to quantify the uncertainty in pollution source tracing estimates. Based on the distribution of simulation outcomes, the 2.5th and 97.5th percentiles were extracted to define the lower and upper bounds of the parameter confidence intervals:

CI = [\mathrm{percentile}(N, 2.5\%), \ \mathrm{percentile}(N, 97.5\%)] \qquad (7)

To further evaluate the relative uncertainty of parameter estimates, the relative width of the confidence interval was calculated, providing a measure of its size relative to the estimated parameter values:

\mathrm{Relative\ Width} = \frac{\mathrm{Upper\ CI\ Limit} - \mathrm{Lower\ CI\ Limit}}{\mathrm{Estimate}} \times 100\% \qquad (8)

These metrics provide a more comprehensive assessment of the uncertainty in model predictions, offering robust statistical support for identifying the root causes of pollution.
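Equations (7) and (8) map directly onto percentile operations on the retained posterior samples. The sketch below assumes NumPy and a hypothetical helper `interval_summary`; using the posterior median as the "Estimate" in Equation (8) is an assumption, since the text does not state which point estimate is used.

```python
import numpy as np


def interval_summary(samples, estimate=None):
    """Return (lower, upper, relative width in %) for posterior samples.

    lower and upper are the 2.5th and 97.5th percentiles, as in Eq. (7).
    The relative width follows Eq. (8); the median is used as the point
    estimate unless another value is supplied (an assumption).
    """
    samples = np.asarray(samples, dtype=float)
    lower, upper = np.percentile(samples, [2.5, 97.5])
    if estimate is None:
        estimate = float(np.median(samples))
    relative_width = (upper - lower) / estimate * 100.0
    return float(lower), float(upper), float(relative_width)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mass_samples = rng.normal(500.0, 25.0, size=5000)  # synthetic posterior
    print(interval_summary(mass_samples))
```

A relative width is only meaningful for parameters with a natural zero, such as the discharge mass; for the release time, the absolute interval width in minutes is the more interpretable quantity.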

3. Results and Discussion

3.1. SWMM Calibration and Validation

3.1.1. Calibration Verification Results

This study calibrated and validated the hydrological and water quality parameters of the SWMM using flow and COD concentration data collected from fixed monitoring stations within a representative drainage basin. The results indicate that, during the two calibration rainfall events (5 November and 14 December 2023), the model accurately reproduced runoff dynamics and the temporal variation in COD. For the single validation rainfall event (1 July 2024), the simulated trends in flow and COD closely matched the observed measurements. According to established model evaluation guidelines [46], which suggest that an NSE greater than 0.5 indicates satisfactory performance, the results presented in Table 5 and Figure 3 demonstrate that the model achieves acceptable accuracy. Consequently, the model demonstrates sufficient reliability to support subsequent scenario simulations and result analyses.
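For reference, the Nash-Sutcliffe efficiency (NSE) used as the acceptance criterion compares the squared model error with the variance of the observations. The helper below is a generic sketch of the standard formula; the function name and the sample values are illustrative and do not reproduce the study's data.

```python
import numpy as np


def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - SSE / sum((obs - mean(obs))^2).

    NSE = 1 is a perfect fit; NSE = 0 means the model performs no better
    than the mean of the observations.
    """
    obs = np.asarray(observed, dtype=float)
    sim = np.asarray(simulated, dtype=float)
    if obs.shape != sim.shape:
        raise ValueError("observed and simulated must have the same shape")
    denom = np.sum((obs - obs.mean()) ** 2)
    if denom == 0.0:
        raise ValueError("NSE is undefined for constant observations")
    return float(1.0 - np.sum((obs - sim) ** 2) / denom)


if __name__ == "__main__":
    obs = [1.0, 2.0, 3.0, 4.0]   # illustrative values only
    sim = [1.1, 1.9, 3.2, 3.8]
    print(nash_sutcliffe(obs, sim))
```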

3.1.2. Model Parameter Settings

In this study, the initial ranges for key parameters were determined by referring to the SWMM User Manual [47] and relevant literature on urban stormwater simulation [48,49]. These parameters were subsequently calibrated and refined using observed monitoring data from typical rainfall events. The final configuration of the model’s key parameters is presented in Table 6.

3.2. Analysis of Bayesian Inversion Results

3.2.1. Accuracy Analysis of Single-Parameter Inversion

1. Spatial Position Recognition Features

Node identification performance exhibits pronounced topological heterogeneity (Figure 4(S1/S2)). For example, node J_{26} shows a sharply peaked posterior probability distribution, indicating that its emission signal remains highly distinguishable during transport. In contrast, spatially adjacent nodes (e.g., J_{15}, J_{18}) produce highly similar downstream responses. This spatial autocorrelation [50] introduces ambiguity in parameter identification, resulting in a posterior distribution for J_{17} with notable tailing and dispersion. To quantify this separability, the Top-2 Probability Ratio was calculated as 1.68 (0.37/0.22). This relatively low ratio indicates that while the true source (J_{17}) was correctly identified, the probabilistic margin relative to the adjacent node (J_{16}) remains moderate, quantitatively confirming the observed tailing effect due to hydraulic proximity.
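The Top-2 Probability Ratio can be obtained directly from the retained node samples by dividing the frequency of the most visited node by that of the runner-up. In the sketch below, the function name is hypothetical and the sample counts are synthetic: only the shares for J17 (0.37) and J16 (0.22) follow the text, while the remaining counts are placeholders.

```python
from collections import Counter


def top2_probability_ratio(node_samples):
    """Return (best node, runner-up node, ratio of their sample frequencies)."""
    counts = Counter(node_samples).most_common(2)
    if len(counts) < 2:
        raise ValueError("at least two distinct nodes are required")
    (best, n_best), (second, n_second) = counts
    return best, second, n_best / n_second


if __name__ == "__main__":
    # Synthetic chain: 37% J17 and 22% J16 as in the text, rest are placeholders
    samples = ["J17"] * 37 + ["J16"] * 22 + ["J18"] * 21 + ["J15"] * 20
    best, second, ratio = top2_probability_ratio(samples)
    print(best, second, round(ratio, 2))
```

A ratio close to 1 signals that two candidate nodes are nearly indistinguishable at the monitoring point, whereas a large ratio indicates a clearly separated source location.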

2. Accuracy and Uncertainty Analysis of Emission Mass Inversion

Table 7 and Figure 4(S3/S4) illustrate the influence of emission intensity on the inversion performance of the mass parameter M. At concentrations of 300 and 800 mg/L, the median inversion error consistently remained below 0.6%. As emission intensity increased, the relative mean error decreased from 0.51% to 0.13%, exhibiting a nonlinear accelerated convergence trend. Moreover, the 95% confidence intervals (CI) encompassed the true parameter values under all operating conditions. The relative widths of these intervals narrowed from 27.6% to 10.5% with increasing concentration, effectively constraining the parameter search space and substantially reducing inversion uncertainty.

3. Precise Time Parameter Inversion

As shown in Figure 4(S5), both the median and mean errors in the inversion of emission time T remain around 0.5%, with a 95% confidence interval width of only 6 min. This high temporal resolution is supported by the adoption of a 1 min simulation time step and a synchronized observation sampling interval. The sharply concentrated posterior distribution indicates that the model can precisely constrain the timing of pollutant release within a very narrow window, effectively mitigating random disturbances. By capturing the propagation lag characteristics of the pollution signal, the model enables accurate back-inversion of the emission time.

3.2.2. Multi-Parameter Joint Inversion and Uncertainty Analysis

Using operating condition D1 as an example (Figure 5a), the strong coupling between location and mass parameters generates non-uniqueness in the solution space, causing the relative width of the 95% confidence interval to increase from 17.4% to 25.97% (Table 8). Spatial uncertainty in source localization is propagated into the temporal dimension via the hydraulic transport pathway, resulting in the confidence interval for the temporal parameter T expanding to 17 min, reflecting substantial interference between spatiotemporal parameters. The experiments show that the true parameter values across all operating conditions fall within the 95% confidence intervals, confirming the reliability of the model estimates. Comparing conditions D1 and D2 (Figure 5b), the additional information provided by strong pollution signals effectively mitigates some parameter coupling effects, reducing the temporal confidence interval for D2 to 7 min. These results indicate that, under strong hydraulic constraints, the model maintains excellent temporal resolution for multi-parameter joint inversion.

3.3. Sensitivity Analysis of Key Parameters

3.3.1. Step Sensitivity Analysis for Discrete Parameters

Under different step-size settings (Figure 6), the MCMC algorithm exhibited stable convergence in every case, with the peak of the posterior probability distribution consistently concentrated at the target emission node (J_{17}). This indicates that variations in step size have minimal impact on the spatial localization of the pollution source.

3.3.2. Step Sensitivity Analysis for Continuous Parameters

1. Local stagnation caused by excessively small step size

When the step size is set too small (σ_{\ln M} = 0.002, σ_{T} = 1), the mixing performance of the Markov chain deteriorates markedly. Candidate states become confined to an extremely narrow neighborhood, substantially reducing the chain’s ability to explore the parameter space effectively. This restriction on movement across low-probability regions hinders the identification of the global optimum. This phenomenon is consistent with the optimal scaling theory of MCMC algorithms, which posits that an appropriate step size is essential to balance the acceptance rate and the mixing efficiency of the chain [44].

Analysis of the posterior probability distribution reveals that excessively small step sizes induce strong autocorrelation in the Markov chain (Figure 7). The chain’s inability to traverse low-probability regions causes it to become confined near local extrema around the initial value, preventing escape. As a result, the posterior probability density function erroneously converges to an extremely narrow peak centered on the initial value, obscuring the true uncertainty range and substantially undermining the credibility of the inversion results.

2. Sampling stagnation caused by excessively large step size

When the step size is set excessively large (e.g., mass parameter σ_{\ln M} = 5.0, temporal parameter σ_{T} = 120), the inversion process exhibits pronounced inefficiencies in sampling. According to the Metropolis-Hastings criterion, an overly large step size generates candidate states that diverge sharply from the current state. This drastically lowers the acceptance probability, causing the Markov chain to remain at existing states for extended periods.
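For reference, the standard Metropolis-Hastings acceptance probability can be written as follows. The notation is assumed here for illustration: θ = (J_x, M, T) is the current state, θ′ the candidate, d the monitoring data, and q the proposal density.

```latex
\alpha(\theta \to \theta') = \min\left(1,\; \frac{p(\theta' \mid d)\, q(\theta \mid \theta')}{p(\theta \mid d)\, q(\theta' \mid \theta)}\right)
```

For a symmetric random-walk proposal, q(θ | θ′) = q(θ′ | θ) and the ratio reduces to p(θ′ | d) / p(θ | d). A candidate drawn far from the current state usually lands where the posterior density is much smaller, so the ratio, and with it the acceptance probability, is close to zero.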

Analysis of the posterior probability distribution (Figure 8) shows that, with an excessively large step size, the posterior still covers the interval containing the true parameter value, but its quality deteriorates markedly. High rejection rates substantially reduce the effective sample size, producing sparse, jagged histograms. These results indicate that continuous parameters are highly sensitive to step-size settings, and inappropriate configurations can severely impair convergence performance.

3. Stable convergence under an appropriate step size

The iterative trajectory diagram (Figure 9) shows that, for excessively small step sizes, the mass and temporal parameters fluctuate minimally around their initial values over 2500 iterations. The chain’s inability to traverse low-probability regions prevents convergence toward the target values, severely limiting global exploration. Conversely, when step sizes are excessively large (e.g., σ_{\ln M} = 5, σ_{T} = 120), the trajectory exhibits a characteristic “staircase-like” stagnation, with the chain remaining in a single state for dozens or even hundreds of iterations before making a substantial leap.

When an appropriate baseline step size is employed, the Markov chain rapidly escapes the influence of the initial values and converges to the true parameter range within a very short burn-in period [51]. It then exhibits high-frequency, stable fluctuations around the target value. These results indicate that a properly configured step size [52] is critical for ensuring that the MCMC algorithm achieves an optimal balance between global exploration and local sampling precision.
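The three regimes can be reproduced qualitatively with a one-dimensional random-walk Metropolis sampler. The sketch below uses the σ_T values from Table 3 (1, 30, and 120) on a made-up Gaussian target for the discharge time, centred at 18:00 (1080 min) with a spread of 5 min. The target shape, the starting value, and all function names are assumptions chosen for illustration; no SWMM simulation is involved.

```python
import math
import random

TARGET_T = 1080.0   # 18:00 expressed as minutes after midnight
TARGET_SD = 5.0     # hypothetical posterior spread in minutes (illustration only)

def log_post(t: float) -> float:
    """Log of the toy Gaussian target (unnormalised)."""
    return -0.5 * ((t - TARGET_T) / TARGET_SD) ** 2

def run_chain(sigma_t: float, n_iter: int = 2500, start: float = 1000.0, seed: int = 42):
    """Random-walk Metropolis; returns (acceptance rate, trace)."""
    rng = random.Random(seed)
    t, accepted, trace = start, 0, []
    for _ in range(n_iter):
        cand = t + rng.gauss(0.0, sigma_t)
        log_ratio = log_post(cand) - log_post(t)
        # Accept uphill moves outright; accept downhill moves with prob exp(log_ratio).
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            t = cand
            accepted += 1
        trace.append(t)
    return accepted / n_iter, trace

rates = {sigma: run_chain(sigma)[0] for sigma in (1, 30, 120)}
for sigma, rate in rates.items():
    print(f"sigma_T = {sigma:>3} -> acceptance rate {rate:.2f}")
```

On this toy target the smallest step accepts nearly every move but advances slowly, while the largest step rejects most candidates and produces the staircase pattern described above. The intermediate step falls between the two extremes.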

3.4. Impact of Monitoring Point Layout

1. Improved accuracy in node tracing

Monitoring data serve as a critical foundation for tracing misconnections in drainage networks, with the spatial layout directly influencing both the information content and spatiotemporal resolution of the data [53]. As shown in Figure 10, increasing the density of the monitoring network enhances the accuracy of pollution source localization (J_x). By deploying additional branch nodes (Scenario A) or near-source nodes (Scenario B), spatial constraints are effectively introduced, reducing the number of candidate nodes that generate similar downstream responses and thereby substantially improving spatial localization accuracy. From a statistical perspective, under Scenario A, the 95% confidence interval encompassed five candidate nodes (J₁₄–J₁₈), with a peak posterior probability of 0.47. In contrast, under the comprehensive monitoring layout (Scenario C), the upstream interference node (J₁₄) was effectively excluded, reducing the candidate set to four nodes (J₁₅–J₁₈). More importantly, the peak posterior probability density increased to 0.52, indicating a markedly stronger concentration of belief in the true source location.
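One way to read the candidate-node counts above is as the smallest set of nodes whose posterior probabilities sum to at least 0.95. The sketch below implements that reading on hypothetical probabilities: only the peak value 0.47 and the J₁₄–J₁₈ range come from the text, while the remaining values and the function name are invented for illustration. The paper may define its interval differently.

```python
def credible_set(posterior: dict, level: float = 0.95) -> list:
    """Smallest set of nodes (highest probability first) with total mass >= level."""
    total = 0.0
    chosen = []
    ranked = sorted(posterior.items(), key=lambda kv: kv[1], reverse=True)
    for node, prob in ranked:
        chosen.append(node)
        total += prob
        if total >= level:
            break
    return sorted(chosen)

# Hypothetical Scenario A posterior over candidate nodes (sums to 1.0)
scenario_a = {
    "J14": 0.06, "J15": 0.12, "J16": 0.20,
    "J17": 0.47, "J18": 0.11, "J19": 0.04,
}

nodes_a = credible_set(scenario_a)
print(nodes_a, "peak =", max(scenario_a.values()))
```

With these illustrative numbers the set contains five nodes, J14 through J18. Raising the peak probability and removing mass from J14, as reported for Scenario C, shrinks the set.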

2. Effect of Spatial Distance on Discharge Concentration Parameter (M) Inversion

For the inversion of the discharge concentration parameter (M), the effect of adding monitoring points exhibits pronounced scenario-specific differences. Previous studies have noted that sensor location affects parameter identifiability [54]. In Scenario A, a station deployed at the confluence node J₁₅ effectively captured tributary mixing signals and transport attenuation information, reducing the median and mean inversion errors to 0.673% and 0.682%, respectively—an accuracy improvement of 57.3% compared with the baseline scenario (Figure 11). In contrast, adding a station near the source at J₁₆ (Scenario B) increased the median error to 2.63%. This is primarily attributed to insufficient mixing of pollution plumes close to the source, where strong pulse characteristics in the observed sequence are highly sensitive to local hydraulic disturbances [55]. Additionally, the lack of spatiotemporal evolution information contributed to estimation biases. These results indicate that reliable identification of discharge concentration parameters depends more on adequate hydraulic transport distance than on simple spatial proximity.

3. “Proximity Effect” on Time Parameter (T) Inversion

For all monitoring layouts, the model converges to the target time value (Figure 12). However, as detailed in Table 9, including the near-source monitoring point J₁₆ markedly improves performance: the median error for time inversion decreases to 0.10%, an accuracy improvement of 87.2%. Because the near-source point lies close to the discharge outlet, the pollutant arrival time is minimally influenced by uncertainties inherent in hydrodynamic transport processes, such as flow velocity fluctuations and variations in dispersion coefficients [55]. In this study, such hydraulic uncertainty was effectively controlled by calibrating the pipe roughness and geometric parameters against observed flow records (as detailed in Section 3.1). Furthermore, residual variability was statistically quantified via the error term (σ) in the Bayesian likelihood function, allowing the model to accommodate minor deviations in travel time without compromising inversion accuracy. Consequently, J₁₆ facilitates the precise identification of the pollution event’s onset, thereby significantly narrowing the search space for temporal parameters.
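The role of the error term σ can be sketched with a Gaussian log-likelihood, assuming independent, identically distributed residuals between observed and simulated concentrations. This form is an assumption for illustration and the likelihood defined in Section 2.3.2 of the paper may differ; the concentration series below are invented.

```python
import math

def gaussian_log_likelihood(observed, simulated, sigma: float) -> float:
    """Log-likelihood of observations given simulated values and error scale sigma."""
    if len(observed) != len(simulated):
        raise ValueError("series must have equal length")
    n = len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    return -0.5 * n * math.log(2 * math.pi * sigma ** 2) - sse / (2 * sigma ** 2)

# Invented concentration series at one monitoring point
observed = [20, 20, 120, 300, 180, 60, 20]
aligned = [20, 22, 118, 295, 185, 58, 21]   # simulated pulse arrives on time
shifted = [20, 20, 22, 118, 295, 185, 58]   # same pulse, one time step late

for sigma in (5.0, 20.0):
    gap = (gaussian_log_likelihood(observed, aligned, sigma)
           - gaussian_log_likelihood(observed, shifted, sigma))
    print(f"sigma = {sigma:>4}: log-likelihood penalty for the late pulse = {gap:.1f}")
```

A larger σ shrinks the penalty for a late pulse, which is how the error term tolerates small travel-time deviations. A sharp near-source pulse keeps that penalty large, so timing remains well constrained.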

4. Synergistic Optimization Effects of Integrated Layout

Scenario C integrates the advantages of the preceding strategies to construct a highly complementary monitoring network. In terms of accuracy, the median error for the mass parameter (M) decreased to 0.25% (an improvement of 84.2%), while the median error for the time parameter (T) fell to 0.09% (an improvement of 88.5%), representing the optimal performance among all evaluated scenarios. Mechanistically, this layout effectively leverages the strong temporal constraints provided by the near-source point (J₁₆) and the capacity of the distal branch point (J₁₅) to characterize mass transport processes. This finding aligns with the theory of optimal sensor placement, which suggests that distributed monitoring networks can maximize information gain by capturing complementary signal features from different hydraulic zones [13,39].
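The reported improvement percentages for M can be checked with simple arithmetic, computing improvement as (baseline error − new error) / baseline error. The baseline is not restated in this section; the sketch assumes it is the D2 median error for M from Table 8 (1.580%), which matches the 800 and 18:00 target values in Table 9. That choice is an assumption, as is the helper name.

```python
def improvement(baseline_err: float, new_err: float) -> float:
    """Relative error reduction in percent; negative values mean the error grew."""
    return 100.0 * (baseline_err - new_err) / baseline_err

BASELINE_M = 1.580  # assumed baseline: Table 8, condition D2, median error for M (%)

# Median errors for M from Table 9 (%)
median_err_m = {"A": 0.673, "B": 2.63, "C": 0.25}

gains = {layout: improvement(BASELINE_M, err) for layout, err in median_err_m.items()}
for layout, gain in gains.items():
    print(f"Scenario {layout}: {gain:+.1f}% change in accuracy for M")
```

Under this assumption Scenario C comes out at about 84.2%, in line with the text, and Scenario A at about 57.4%, close to the reported 57.3% (the small gap presumably reflects rounding of the tabulated errors). Scenario B yields a negative value, consistent with the degraded M estimate near the source.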

A strategic layout of monitoring points facilitates the acquisition of high-quality time-series data, thereby significantly enhancing the efficiency of pollution source tracing. Experimental results indicate that near-source monitoring points are optimal for constraining the temporal parameter, T, owing to their rapid response to discharge events. However, due to the limited transport distance, these locations often exhibit insufficient mixing, which can introduce significant uncertainty into the inference of the mass parameter, M. Conversely, distant or confluence nodes benefit from extended hydraulic travel distances that ensure adequate mixing, rendering them more conducive to the stable and accurate inversion of M. However, it should be noted that although increasing monitoring density substantially enhances inversion accuracy, it inevitably entails higher installation and operational costs, particularly when sensors are densely deployed at terminal nodes and trunk pipe confluences. In practical engineering, a progressive layout strategy is recommended to balance budget constraints with performance; this approach prioritizes capturing essential characteristic values of pollution signals at key junctions while minimizing the impact of information interference during transport. In addition, the harsh conditions within drainage networks—such as high humidity, corrosive gases, and clogging risks—can adversely affect sensor durability and measurement reliability, further necessitating a strategic, robust layout to ensure long-term data stability.

4. Conclusions

This study addresses the challenges inherent in tracing pollution sources within urban drainage networks by establishing a coupled framework that integrates the SWMM with Bayesian inference. Through an MCMC sampling algorithm, the proposed approach achieves the joint inversion of pollution source location, discharge mass, and event timing. Validation in the Daming Lake district demonstrates that:

1. The SWMM–Bayesian model features high-precision source identification and strong robustness.

In single-parameter inversion, the mean error remained below 0.6%. In multi-parameter joint inversion, all true parameter values fell within the 95% credible intervals, even though parameter interactions widened the posterior distributions, particularly for discharge concentration (M) and timing (T).

2. The efficiency and convergence quality of MCMC sampling are determined by the differential step size strategy.

Discrete node parameters exhibit strong robustness to variations in step size, whereas continuous parameters (M and T) are highly sensitive. Inappropriate step-size selection may lead to artificially narrow posterior distributions due to local chain stagnation when steps are excessively small, or to sparse and noisy posterior patterns resulting from high rejection rates when steps are overly large. This behavior is fully consistent with general MCMC theory, highlighting the necessity of careful step-size tuning to achieve reliable uncertainty quantification.

3. Monitoring layouts exhibit significant spatial sensitivity differences and synergistic complementary effects.

Near-source monitoring points can identify the emission time (T) with errors as low as 0.10%; however, due to insufficient mixing, the estimation error of discharge concentration (M) increases to 2.63%. These results indicate a generalizable monitoring strategy: near-source sensors provide critical temporal constraints owing to their rapid response, whereas downstream confluence sensors ensure adequate mixing conditions for accurate concentration estimation. By adopting a collaborative strategy that integrates near-source and confluence monitoring points, the median inversion errors for M and T are reduced to 0.25% and 0.09%, respectively, corresponding to accuracy improvements of 84.2% and 88.5%. Collectively, these findings confirm the complementary roles of near-source and confluence locations in the optimized design of monitoring networks.

Future research should prioritize optimizing monitoring network designs through a ‘complementary layout’ of upstream branch and downstream trunk nodes, specifically to enhance source tracking within illicit cross-connection zones. While this study confirms the model’s structural optimality and mass balance under dry-weather flow, extending this framework to rainfall-driven scenarios is essential to assess how hydraulic turbulence affects uncertainty. This expansion should also account for realistic physical constraints absent in idealized models, such as adverse pipe slopes and siltation, thereby ensuring the methodology remains robust across diverse seasonal variations and irregular network topologies.

Author Contributions

R.W.: Conceptualization, Methodology, Validation, Formal Analysis, Investigation, Data Curation, Writing—Original Draft Preparation, and Visualization; X.C.: Methodology, Software, Investigation, Data Curation, Visualization, and Writing—Review & Editing; X.L.: Conceptualization, Resources, Writing—Review & Editing, Supervision, Project Administration, and Funding Acquisition; G.L.: Methodology, Validation, Formal Analysis, Resources, Writing—Review & Editing, and Supervision; F.D.: Software, Validation, Formal Analysis, Investigation, Data Curation, and Writing—Review & Editing; J.Y.: Conceptualization, Validation, Resources, Writing—Review & Editing, Supervision, and Project Administration. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key Research and Development Program of China (2024YFC3214400, 2024YFC3212600) and the National Natural Science Foundation of China (52209106, U2443225).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

SWMM	Storm Water Management Model
MCMC	Markov Chain Monte Carlo

References

1. Dong, X.; Guo, H.; Zeng, S. Enhancing Future Resilience in Urban Drainage System: Green versus Grey Infrastructure. Water Res. 2017, 124, 280–289.
2. Chen, Z.; Huang, G. Numerical Simulation Study on the Effect of Underground Drainage Pipe Network in Typical Urban Flood. J. Hydrol. 2024, 638, 131481.
3. Ellis, J.B.; Butler, D. Surface Water Sewer Misconnections in England and Wales: Pollution Sources and Impacts. Sci. Total Environ. 2015, 526, 98–109.
4. Xue, B.; Lichtfouse, E.; Zhou, X. Methods to Monitor the Defects of the Drainage Pipe Network: A Review. Environ. Chem. Lett. 2025, 23, 1877–1894.
5. Wang, J.; Liu, G.; Wang, J.; Xu, X.; Shao, Y.; Zhang, Q.; Liu, Y.; Qi, L.; Wang, H. Current Status, Existent Problems, and Coping Strategy of Urban Drainage Pipeline Network in China. Environ. Sci. Pollut. Res. 2021, 28, 43035–43049.
6. Carrey, R.; Ballesté, E.; Blanch, A.R.; Lucena, F.; Pons, P.; López, J.M.; Rull, M.; Solà, J.; Micola, N.; Fraile, J.; et al. Combining Multi-Isotopic and Molecular Source Tracking Methods to Identify Nitrate Pollution Sources in Surface and Groundwater. Water Res. 2021, 188, 116537.
7. Hachad, M.; Lanoue, M.; Duy, S.V.; Villemur, R.; Sauvé, S.; Prévost, M.; Dorner, S. Locating Illicit Discharges in Storm Sewers in Urban Areas Using Multi-Parameter Source Tracking: Field Validation of a Toolbox Composite Index to Prioritize High Risk Areas. Sci. Total Environ. 2022, 811, 152060.
8. Tizmaghz, Z.; van Zyl, J.E.; Henning, T.F.P.; Donald, N.; Pancholy, P. Defect-Level Condition Assessment of Sewer Pipes. J. Water Resour. Plan. Manag. 2025, 151, 04024074. Available online: https://ascelibrary.org/doi/abs/10.1061/JWRMD5.WRENG-6225 (accessed on 6 January 2026).
9. Cuingnet, R.; Bernard, M.; Sampaio, P.R.; Sakhri, I.; Chelouche, K.; Jossent, J.; Doumi, I.; Gaudichet, E.; Chenu, D.; Maitrot, A.; et al. Reliable Recommendations for CCTV Sewer Inspections Through Multi-Label Image Classification. Adv. Eng. Inform. 2025, 65, 103317.
10. Ma, Q.; Tian, G.; Zeng, Y.; Li, R.; Song, H.; Wang, Z.; Gao, B.; Zeng, K. Pipeline In-Line Inspection Method, Instrumentation and Data Management. Sensors 2021, 21, 3862.
11. Shik, A.V.; Akhmetov, R.M.; Sugakov, G.K.; Filatova, D.G.; Doroshenko, I.A.; Podrugina, T.A.; Beklemishev, M.K. Facile Detection of Illicit Wastewater Discharge into a Water Source Using a Kinetic-Based Optical Fingerprinting Strategy. Anal. Methods 2025, 17, 8061–8072.
12. Liao, Z.; Zhi, G.; Zhou, Y.; Xu, Z.; Rink, K. To Analyze the Urban Water Pollution Discharge System Using the Tracking and Tracing Approach. Environ. Earth Sci. 2016, 75, 1080.
13. Wang, S.; Zhang, X.; Wang, J.; Tao, T.; Xin, K.; Yan, H.; Li, S. Optimal Sensor Placement for the Routine Monitoring of Urban Drainage Systems: A Re-Clustering Method. J. Environ. Manag. 2023, 335, 117579.
14. Rodríguez-Vidal, F.J.; García-Valverde, M.; Ortega-Azabache, B.; González-Martínez, Á.; Bellido-Fernández, A. Characterization of Urban and Industrial Wastewaters Using Excitation-Emission Matrix (EEM) Fluorescence: Searching for Specific Fingerprints. J. Environ. Manag. 2020, 263, 110396.
15. Mladenov, N.; Sanfilippo, S.; Panduro, L.; Pascua, C.; Arteaga, A.; Pietruschka, B. Tracking Performance and Disturbance in Decentralized Wastewater Treatment Systems with Fluorescence Spectroscopy. Environ. Sci. Water Res. Technol. 2024, 10, 1506–1516.
16. de Bastos, F.; Reichert, J.M.; Minella, J.P.G.; Rodrigues, M.F. Strategies for Identifying Pollution Sources in a Headwater Catchment Based on Multi-Scale Water Quality Monitoring. Environ. Monit. Assess. 2021, 193, 169.
17. Xu, Z.; Wang, L.; Yin, H.; Li, H. Quantification of Groundwater Infiltration into Urban Drainage Networks Based on Marker Species Approach. J. Tongji Univ. (Nat. Sci.) 2016, 44, 593–599.
18. Bragg, M.A.; Poudel, A.; Vasconcelos, J.G. Comparing SWMM and HEC-RAS Hydrological Modeling Performance in Semi-Urbanized Watershed. Water 2025, 17, 1331.
19. Bisht, D.S.; Chatterjee, C.; Kalakoti, S.; Upadhyay, P.; Sahoo, M.; Panda, A. Modeling Urban Floods and Drainage Using SWMM and MIKE URBAN: A Case Study. Nat. Hazards 2016, 84, 749–776.
20. An Automatic Calibration Framework Based on the InfoWorks ICM Model: The Effect of Multiple Objectives During Multiple Water Pollutant Modeling-Web of Science Core Collection. Available online: https://webofscience.clarivate.cn/wos/woscc/full-record/WOS:000620114900002 (accessed on 13 January 2026).
21. Katoch, S.; Chauhan, S.S.; Kumar, V. A Review on Genetic Algorithm: Past, Present, and Future. Multimed. Tools Appl. 2021, 80, 8091–8126.
22. Pan, H.; Li, Y.; Zhang, J.; Cao, C.; Cheng, Y.; Zhou, Y.; Wang, Y.; Bai, S.; Liu, J.; Jin, Q.; et al. Identifying Urban River Pollution Sources from Wet-Weather Discharges Using an Integrated Deep Learning and Data Assimilation Approach. J. Hydrol. 2025, 661, 133797.
23. Grbčić, L.; Lučin, I.; Kranjčević, L.; Družeta, S. Water Supply Network Pollution Source Identification by Random Forest Algorithm. J. Hydroinform. 2020, 22, 1521–1535.
24. Salem, A.K.; Abokifa, A.A. Machine Learning–Based Source Identification in Sewer Networks. J. Water Resour. Plan. Manag. 2023, 149, 04023034.
25. Zhao, Z.; Yin, H.; Xu, Z.; Peng, J.; Yu, Z. Pin-Pointing Groundwater Infiltration into Urban Sewers Using Chemical Tracer in Conjunction with Physically Based Optimization Model. Water Res. 2020, 175, 115689.
26. Jerez, D.J.; Jensen, H.A.; Beer, M.; Broggi, M. Contaminant Source Identification in Water Distribution Networks: A Bayesian Framework. Mech. Syst. Signal Process. 2021, 159, 107834.
27. Wang, X.; Jin, Y.; Schmitt, S.; Olhofer, M. Recent Advances in Bayesian Optimization. ACM Comput. Surv. 2023, 55, 1–36.
28. Wang, Q.-A.; Chen, J.; Ni, Y.; Xiao, Y.; Liu, N.; Liu, S.; Feng, W. Application of Bayesian Networks in Reliability Assessment: A Systematic Literature Review. Structures 2025, 71, 108098.
29. Yang, L.; Huang, B.; Liu, J.; Qian, S.; Feng, J. Pollution Source Tracing in Sewer Networks Using a SWMM-Bayesian Coupling Approach. J. Hohai Univ. (Nat. Sci.) 2024, 52, 20–29.
30. Liu, T.; Surjanovic, N.; Biron-Lattes, M.; Bouchard-Côté, A.; Campbell, T. AutoStep: Locally Adaptive Involutive MCMC 2025. arXiv 2024, arXiv:2410.18929.
31. Pichler, M. swmm_api: A Python Package for Automation, Customization, and Visualization in SWMM-Based Urban Drainage Modeling. Water 2025, 17, 1373.
32. Zhang, X.; Qiao, W.; Huang, J.; Li, H.; Wang, X. Impact and Analysis of Urban Water System Connectivity Project on Regional Water Environment Based on Storm Water Management Model (SWMM). J. Clean. Prod. 2023, 423, 138840.
33. Zoppou, C. Review of Urban Storm Water Models. Environ. Model. Softw. 2001, 16, 195–231.
34. Zhao, X.; Curtis, A. Bayesian Inversion, Uncertainty Analysis and Interrogation Using Boosting Variational Inference. J. Geophys. Res. Solid Earth 2024, 129, e2023JB027789.
35. Mattingly, H.H.; Transtrum, M.K.; Abbott, M.C.; Machta, B.B. Maximizing the Information Learned from Finite Data Selects a Simple Model. Proc. Natl. Acad. Sci. USA 2018, 115, 1760–1765.
36. Hutton, C.J.; Kapelan, Z. A Probabilistic Methodology for Quantifying, Diagnosing and Reducing Model Structural and Predictive Errors in Short Term Water Demand Forecasting. Environ. Modell. Softw. 2015, 66, 87–97.
37. Shao, Z.; Xu, L.; Chai, H.; Yost, S.A.; Zheng, Z.; Wu, Z.; He, Q. A Bayesian-SWMM Coupled Stochastic Model Developed to Reconstruct the Complete Profile of an Unknown Discharging Incidence in Sewer Networks. J. Environ. Manag. 2021, 297, 113211.
38. Guozhen, W.; Zhang, C.; Li, Y.; Haixing, L.; Zhou, H. Source Identification of Sudden Contamination Based on the Parameter Uncertainty Analysis. J. Hydroinform. 2016, 18, 919–927.
39. He, M.; Zhang, Y.; Ma, Z.; Zhao, Q. Intelligent Optimal Layout of Drainage Pipe Network Monitoring Points Based on Information Entropy Theory. Front. Environ. Sci. 2024, 12, 1401942.
40. Azri, N.A.A.; Kasmuri, N.; Zaini, N.; Ahmad, R. Treatment of Illegal Discharge Using Catalyzed Hydrogen Peroxide (CHP): A Case Study. IOP Conf. Ser. Earth Environ. Sci. 2025, 1467, 012001.
41. Nielsen, P.H.; Raunkjær, K.; Norsker, N.H.; Jensen, N.A.; Hvitved-Jacobsen, T. Transformation of Wastewater in Sewer Systems—A Review. Water Sci. Technol. 1992, 25, 17–31.
42. Izzatullah, M.; van Leeuwen, T.; Peter, D. Bayesian Seismic Inversion: A Fast Sampling Langevin Dynamics Markov Chain Monte Carlo Method. Geophys. J. Int. 2021, 227, 1523–1553.
43. Zhang, H.; Xu, Z.; Wang, W.; Peng, S.; Li, C.; Fang, S.; Guo, D.; Yin, H. Quantifying the Performance of Urban Sewer Network Using Inverse-Problem Models: An Approach for Synchronous Determination of in-Sewer Groundwater Infiltration and Pollutant Degradation. J. Hydrodyn. 2025, 37, 1–13.
44. Gelman, A.; Gilks, W.R.; Roberts, G.O. Weak Convergence and Optimal Scaling of Random Walk Metropolis Algorithms. Ann. Appl. Probab. 1997, 7, 110–120.
45. Sawai, K.; Uchiyama, Y.; Kosaka, M. V-Tiger Auto-Tuning Optimizing Overshoot, Settling Time, and Chattering over 95% Confidence Intervals. Control Eng. Pract. 2026, 168, 106693.
46. Moriasi, D.N.; Arnold, J.G.; Liew, M.W.V.; Bingner, R.L.; Harmel, R.D.; Veith, T.L. Model Evaluation Guidelines for Systematic Quantification of Accuracy in Watershed Simulations. Trans. ASABE 2007, 50, 885–900.
47. Gironás, J.; Roesner, L.A.; Davis, J. Storm Water Management Model Applications Manual; U.S. Environmental Protection Agency: Washington, DC, USA, 2009.
48. Engman, E.T. Roughness Coefficients for Routing Surface Runoff. J. Irrig. Drain Eng. 1986, 112, 39–53.
49. Tsihrintzis, V.A.; Hamid, R. Runoff Quality Prediction from Small Urban Catchments Using SWMM. Hydrol. Process. 1998, 12, 311–329.
50. Zhang, Z.; Li, Z.; Song, Y. On Ignoring the Heterogeneity in Spatial Autocorrelation: Consequences and Solutions. Int. J. Geogr. Inf. Sci. 2024, 38, 2545–2571.
51. Johndrow, J.E.; Smith, A.; Pillai, N.; Dunson, D.B. MCMC for Imbalanced Categorical Data. J. Am. Stat. Assoc. 2018, 114, 1394–1403. Available online: https://www.tandfonline.com/doi/10.1080/01621459.2018.1505626 (accessed on 9 January 2026).
52. Wang, L.; Xi, C. An improved MCMC algorithm for inversion of source parameters using GPS data under the Bayesian framework. Chin. J. Geophys.-Chin. Ed. 2024, 67, 3367–3385.
53. Rathi, S.; Gupta, R. Sensor Placement Methods for Contamination Detection in Water Distribution Networks: A Review. Procedia Eng. 2014, 89, 181–188.
54. Banik, B.K.; Alfonso, L.; Di Cristo, C.; Leopardi, A.; Mynett, A. Evaluation of Different Formulations to Optimally Locate Sensors in Sewer Systems. J. Water Resour. Plan. Manag. 2017, 143, 04017026.
55. Reuschen, S.; Nowak, W.; Guthke, A. The Four Ways to Consider Measurement Noise in Bayesian Model Selection—And Which One to Choose. Water Resour. Res. 2021, 57, e2021WR030391.

Figure 1. Study area map.

Figure 2. Research framework diagram.

Figure 3. Flow/COD calibration and validation chart.

Figure 4. Posterior distribution characteristics of single parameters (S1–S5). The red dotted lines represent the target values.

Figure 5. Posterior probability distribution map after multi-parameter joint inversion (D1, D2). The red dotted lines represent the target values.

Figure 6. Posterior probability distribution of nodes under different step sizes. The red dotted lines represent the target values.

Figure 7. Posterior probability distribution under conditions of excessively small step size. The red dotted lines represent the target values.

Figure 8. Posterior probability distribution under conditions of excessively large step size. The red dotted lines represent the target values.

Figure 9. Iteration trajectories of the mass and time parameters.

Figure 10. Posterior probability distribution plots for nodes (J_x) under different monitoring layouts. The red dotted lines represent the target values.

Figure 11. Posterior probability distribution of mass (M) under different monitoring layouts. The red dotted lines represent the target values.

Figure 12. Posterior probability distribution plots of time (T) under different monitoring layouts. The red dotted lines represent the target values.

Table 1. Foundational data sources for model construction.

Type	Data Content	Data Accuracy	Data Source	Data Format
Hydrological data	rainfall	1 h	https://data.cma.cn/ (accessed on 15 January 2026)	Excel
Ground elevation	DEM Ground Elevation	5 m	https://www.gscloud.cn/ (accessed on 15 January 2026)	SHP
Substrate data	Land Use Classification, Current Land Use Status, Planning Map	/	https://www.gscloud.cn/ (accessed on 15 January 2026)	SHP
Drainage pipe network	Manhole, Pipeline, and Mixed Joint Properties	/	Site survey	SHP/CAD
Monitoring of Water Quantity and Quality at River Outfalls	Flow rate, COD concentration	5 min/ session	Field monitoring, fixed monitoring stations (SDT-500 Multi-Parameter Water Quality Monitor, Beijing Shidian Technology Co., Ltd., Beijing, China)	Excel/txt

Table 2. Operating condition settings for different combinations of unknown parameters.

Operating Condition Number	Number of Unknown Parameters	Unknown Parameter	Emission Node	Emission Concentration	Emission Time	Design Purpose
S1	1	J_x	J₂₆	500	22:00	Single-location inversion: Considering the spatial layout of the proximal tributary (J₁₇) and distal mainstem (J₂₆)
S2	1	J_x	J₁₇	500	22:00
S3	1	M	J₂₆	300	22:00	Single-Concentration Inversion: Assessing the Accuracy of Concentration Difference Attribution
S4	1	M	J₂₆	800	22:00
S5	1	T	J₂₆	500	22:00	Single-Time Inversion: Capturing time sensitivity at a specific moment (22:00)
D1	3	J_x M T	J₂₆	500	22:00	Multi-parameter joint inversion: Inversion performance under randomly combined scenarios
D2	3	J_x M T	J₁₇	800	18:00

Table 3. MCMC step size sensitivity analysis operating condition settings table.

Operating Condition Name	Emission Position Step Size (σ_Jx)	Emission Mass Step Size (σ_lnM)	Emission Time Step Size (σ_T)	Step Size Description
Case-1	1	0.002	1	Too Small
Case-2	2	0.05	30	Baseline
Case-3	10	5	120	Too Large

Table 4. Monitoring point layout scenario design.

Monitoring Point Layout	Monitoring Point Location	Operating Conditions Description
A	J₁₁, J₁₅	Branch Node Layout
B	J₁₁, J₁₆	Upstream Layout of Branch Nodes
C	J₁₁, J₁₅, J₁₆	Comprehensive Layout

Table 5. Calibration verification table for water quantity and water quality.

Period	Number of Rainfall Events	Flow		COD
Period	Number of Rainfall Events	Average Relative Error (%)	Nash Coefficient	Average Relative Error (%)	Nash Coefficient
Regularity	6.23	0.97	8.94	0.72	6.23
Regularity	11.28	0.63	5.20	0.79	11.28
Validation Period	17.11	0.85	9.38	0.59	17.11

Table 6. Surface runoff and hydrodynamic key parameters of pipeline networks.

Parameter Name		Value Range	Model Parameter Values
Surface Water Hydrodynamic Parameters	N-imperv	0.001~0.2	0.011
	N-perv	0.001~0.80	0.386
	Zero-imperv (%)	2~50	20
Infiltration and Water Storage Parameters	Des-imperv/mm	0.05~2.54	1.106
	Des-perv/mm	2~7.62	2.028
	Max Infil Rate/(mm/h)	/	83.027
	Min Infil Rate/(mm/h)	/	24.713
	Decay constant	2~7	5.045
	Drying Time (h)	1~100	8.810
Pipeline Network Parameters	Roughness	0.010~0.020	0.015
Water Quality Parameters	Decay Coeff.	/	0.01

Table 7. Comparison of inversion results for different M operating conditions (S3–S4) with actual values.

Operating Condition Number	M
Operating Condition Number	Actual Value	Median Error (%)	Mean Error (%)	95% CI	Relative Width (%)
S3	300	0.527	0.51	[256.3, 339.1]	27.6
S4	800	0.028	0.13	[756.6, 840.2]	10.5
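The "Relative Width (%)" column appears to be the width of the 95% interval divided by the actual value. That definition is inferred from the tabulated numbers rather than stated here, and the helper name below is hypothetical. A short check under that assumption:

```python
def relative_width(lower: float, upper: float, actual: float) -> float:
    """Interval width as a percentage of the actual (target) value."""
    return 100.0 * (upper - lower) / actual

# Rows of Table 7: (actual value, lower bound, upper bound)
table7 = {
    "S3": (300.0, 256.3, 339.1),
    "S4": (800.0, 756.6, 840.2),
}

widths = {name: relative_width(lo, hi, actual) for name, (actual, lo, hi) in table7.items()}
for name, width in widths.items():
    print(f"{name}: relative width = {width:.2f}%")
```

Both rows agree with the tabulated 27.6% and 10.5% to within rounding, which supports this reading of the column. It also shows that the interval is relatively tighter for the stronger emission (S4).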

Table 8. Comparison of true and predicted values for mass M in multi-parameter inversion.

Operating Condition Number	M
Operating Condition Number	Actual Value	Median Error (%)	Mean Error (%)	95% CI	Relative Width (%)
D1	500	1.086	1.154	[432.19, 561.22]	25.97
D2	800	1.580	1.573	[692.59, 883.98]	23.92

Table 9. Comparison of results across different monitoring point layout scenarios.

Monitoring Point Layout	M			T
Monitoring Point Layout	Actual Value	Median Error (%)	Mean Error (%)	Actual Value	Median Error (%)	Mean Error (%)
A	800	0.673	0.682	18:00	0.697	0.625
B	800	2.63	2.72	18:00	0.1	0.35
C	800	0.25	0.2	18:00	0.09	0.21

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Wang, R.; Chen, X.; Liu, X.; Lan, G.; Dong, F.; Yang, J. Pollution Source Identification and Parameter Sensitivity Analysis in Urban Drainage Networks Using a Coupled SWMM–Bayesian Framework. Processes 2026, 14, 699. https://doi.org/10.3390/pr14040699
