2.2. EPANET-DD Model
The EPANET-DD model operates under quasi-steady flow conditions. The hydraulic problem is solved under steady flow conditions with the EPANET-MATLAB-Toolkit (Eliades et al., 2016) [14], and the advection-diffusion-dispersion equation is solved under dynamic flow conditions in the two-dimensional case with the classical random walk method [13], implementing the diffusion and dispersion equations proposed by Romero-Gomez and Choi (2011) [11].
The Toolkit takes an object-oriented approach: a MATLAB class called epanet provides a standardised way to manage the network structure, to access all functions (as well as procedures that combine multiple functions), and to simulate and analyse the network through the corresponding object. Internally, local functions make direct calls to EPANET. The hydraulic simulation was performed with the function getComputedHydraulicTimeSeries, which solves the flow continuity and headloss equations used by EPANET.
Using the functions shown in Table 1, it was possible to obtain the velocity, flow rate and headloss values in the pipes and the pressure, hydraulic head and actual demand values at the nodes.
The flow continuity (Equation (1)) and headloss (Equation (2)) equations are solved using the “Gradient Method” developed by Todini and Pilati (1987) [2]. Flow continuity is enforced at every node through Equation (1):
$$\sum_{j} Q_{ij} - D_i = 0 \qquad (1)$$
in which $D_i$ is the flow demand at node i and, by convention, $Q_{ij}$ is the positive flow from a pipe ij into the node.
The flow–headloss relationship in a pipe between nodes i and j is calculated as follows:
$$H_i - H_j = h_{ij} = r Q_{ij}^{n} + m Q_{ij}^{2} \qquad (2)$$
in which H is the nodal head, h is the headloss, r is the resistance coefficient, Q is the flow rate, n is the flow exponent, and m is the minor loss coefficient. The value of the resistance coefficient depends on the friction headloss formula used. Here it is defined, as a function of the flow rate, by the Hagen–Poiseuille formula, the Colebrook–White formula or a cubic interpolation from the Moody diagram.
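As an illustration of the two hydraulic relationships, the following minimal Python sketch evaluates the headloss in a single pipe and the continuity residual at a single node. It assumes the form of the headloss relationship given in the EPANET documentation (friction term r·Q^n plus minor-loss term m·Q²). The function names and the numerical values are hypothetical and are not part of the EPANET-MATLAB-Toolkit.

```python
def pipe_headloss(q, r, n=2.0, m=0.0):
    """Headloss in a pipe, assumed form h = r*|q|**n + m*q**2.

    The result carries the sign of q, so a reversed flow gives a
    negative headloss (head rises in the nominal direction).
    """
    sign = 1.0 if q >= 0 else -1.0
    return sign * (r * abs(q) ** n + m * q ** 2)


def continuity_residual(flows_into_node, demand):
    """Residual of the nodal continuity equation.

    flows_into_node: pipe flows, positive when directed into the node.
    demand: flow withdrawn at the node.
    A residual of zero means continuity is satisfied.
    """
    return sum(flows_into_node) - demand


# Hypothetical values, for illustration only.
h = pipe_headloss(q=2.0, r=3.0, n=2.0, m=1.0)
residual = continuity_residual([1.0, -0.4], demand=0.6)
```

In a full solver such as the Gradient Method, these two relationships are assembled for every pipe and node and solved simultaneously; the sketch only evaluates them for given values.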
The quality analysis was developed using the classical random walk method. As demonstrated in the literature ([13,14,15]), the use of this combined method is possible because of the similarities between the Fokker–Planck–Kolmogorov equation and the advection-dispersion equation. The two equations are formally identical, except for a conceptual difference between their parameters: those in the Fokker–Planck–Kolmogorov equation are independent of time, as a result of the stationarity hypothesis. To overcome this problem and to address discontinuities that could cause local mass conservation errors [16], Delay et al. (2005) [13] provided a new equivalence that makes the analogy valid again. This methodology can be easily applied to any flow model because the mass of the solute is discretised and transported by the particles in the random walk. Consequently, the mass conservation principle is automatically satisfied, because the particles cannot suddenly disappear.
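The mass conservation argument can be illustrated with a minimal Python sketch of a generic one-dimensional random walk, in which each particle is advected by the flow and receives a Gaussian diffusive jump. This is the textbook form of the method, not the specific equations used in EPANET-DD; the velocity, diffusion coefficient, time step and particle mass are hypothetical values chosen only for illustration.

```python
import math
import random


def random_walk_step(positions, velocity, diffusion, dt, rng):
    """Advance every particle by advection plus a Gaussian diffusive jump.

    The jump has zero mean and standard deviation sqrt(2 * diffusion * dt),
    the usual random walk counterpart of a diffusion coefficient.
    """
    sigma = math.sqrt(2.0 * diffusion * dt)
    return [x + velocity * dt + rng.gauss(0.0, sigma) for x in positions]


rng = random.Random(0)          # fixed seed for repeatability
particle_mass = 1.0e-3          # hypothetical mass carried by each particle
positions = [0.0] * 1000        # all particles released at x = 0

for _ in range(50):
    positions = random_walk_step(
        positions, velocity=0.1, diffusion=1.0e-4, dt=0.5, rng=rng
    )

# The number of particles never changes, so the total mass is conserved.
total_mass = particle_mass * len(positions)
```

Because the solute mass is attached to the particles and no particle is created or removed during a step, the total mass after any number of steps equals the mass released.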
This model allows us to determine the position of the solute particles that move inside the network in the x and y directions as a function of the different flow regimes that occur inside the network, as shown in Equations (3) and (4):
where ux is the component of the flow velocity along the x axis, uy is the component of the flow velocity along the y axis, dt is the duration of the contamination event, d is the pipe diameter, and Ef and Eb are the forward and backward diffusion coefficients, respectively, as defined by Romero-Gomez and Choi (2011). In Equation (3), the diffusion coefficient takes the forward or backward value depending on whether the flow direction is positive or negative. The equation was developed for laminar flow conditions, in which the velocities in the network are relatively low, so that the particles are free to move along the y axis. This behaviour is reflected in the term in round brackets that multiplies the x component of the velocity, ux. As the velocity along the x direction increases and the flow rate changes, the particles tend to move along the preferred flow direction, and the term in brackets disappears from the equation.
To confine the particles inside the pipe section, the previous equations are solved considering the following boundary conditions (Equations (5) and (6)).
where the particle position along y is limited above and below by the physical presence of the pipe wall. The limits −ymax and ymax are equal in magnitude to the pipe radius and take a negative and a positive value, respectively, because the x axis is placed at the centroid of the pipe cross-section. With these two boundary conditions, the particles are not only prevented from escaping from the pipe but are also reflected, which prevents them from settling along the wall. These conditions are known as reflecting boundary conditions.
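A reflecting wall can be sketched in a few lines of Python. The function below mirrors any position that falls outside the interval [−ymax, ymax] back into the pipe, including the case in which a large jump would cross the section more than once. It is a generic illustration of a reflecting boundary, not a transcription of Equations (5) and (6); the function name and the values in the usage line are hypothetical.

```python
def reflect(y, y_max):
    """Mirror a particle position back into [-y_max, y_max].

    The trajectory is folded as if the particle bounced elastically
    off the walls at y = -y_max and y = +y_max.
    """
    if y_max <= 0:
        raise ValueError("y_max must be positive")
    period = 4.0 * y_max
    # Shift so the lower wall sits at 0, then wrap into one period.
    shifted = (y + y_max) % period
    # The second half of the period is the mirrored return path.
    if shifted > 2.0 * y_max:
        shifted = period - shifted
    return shifted - y_max


# A particle that overshoots the upper wall of a pipe of radius 1.0
# by 0.2 is returned to a position 0.2 below the wall.
y_new = reflect(1.2, y_max=1.0)
```

Positions already inside the pipe are returned unchanged, so the function can be applied to every particle after each random walk step.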
At this point, the contaminant concentration is determined through Equation (7), in which the concentration value at the previous time is increased by an amount that corresponds to the concentration per unit of particles passing through the control volume, which is defined by the pipe length, the section number of the pipe and the cross-sectional area of the pipe.
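The bookkeeping behind this update can be sketched as follows. The sketch rests on two assumptions that are not confirmed by the text: that the control volume is one pipe section, with volume equal to the pipe length divided by the number of sections and multiplied by the cross-sectional area, and that every particle carries the same mass. All names and values are hypothetical.

```python
def control_volume(pipe_length, n_sections, area):
    """Volume of one pipe section (assumed: length / sections * area)."""
    return (pipe_length / n_sections) * area


def update_concentration(c_prev, n_particles, particle_mass, volume):
    """Add the contribution of the particles found in the control volume.

    Each particle is assumed to contribute particle_mass / volume to the
    concentration; c_prev is the concentration at the previous time.
    """
    return c_prev + n_particles * particle_mass / volume


# Hypothetical values: 10 m pipe split into 5 sections, area 0.01 m^2.
volume = control_volume(pipe_length=10.0, n_sections=5, area=0.01)
c_new = update_concentration(
    c_prev=1.0, n_particles=4, particle_mass=0.005, volume=volume
)
```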
The three models (EPANET, AZRED and EPANET-DD) were adequately calibrated from both the hydraulic and the water quality points of view.
The roughness coefficient was calibrated according to the flow rate measured upstream of the network (1.44 m³/h) and the diameter of each pipeline, by iteratively computing the uniform flow rate until it coincided with the measured flow rate upstream of the network. Numerous experimental tests were conducted on the network, varying the pressure set at the pumping system (3.5–4.5 bar) and the flow rates drawn from the network nodes (between 5 and 15 L/min for nodes 5, 8 and 11).
Table 2, Table 3 and Table 4 show the calibrated roughness values of the pipes and the standard deviation (σ) values determined for the pressures at nodes 6, 7, 9 and 10, the flow rates entering the network and the flow rates drawn at nodes 5, 8 and 11. A standard deviation of zero means that there is no variability in the data.
The backward and forward dispersion coefficients (Eb and Ef, respectively) were calibrated through a trial-and-error procedure using statistical parameters such as the Nash–Sutcliffe efficiency (NSE) [17], the Kling–Gupta efficiency (KGE) [18] and the coefficient of determination (R²) [19].
The Nash–Sutcliffe efficiency (NSE) coefficient [17] is a hydrology metric that measures how well a model simulation predicts an outcome variable. It is defined as one minus the ratio of the error variance of the modelled time series to the variance of the observed time series, as shown in Equation (8):
$$NSE = 1 - \frac{\sum_{i=1}^{N} \left( X_i^{obs} - X_i^{sim} \right)^2}{\sum_{i=1}^{N} \left( X_i^{obs} - \bar{X}^{obs} \right)^2} \qquad (8)$$
where $X_i^{obs}$ and $X_i^{sim}$ correspond to the measured and simulated values of the variable, respectively, and $\bar{X}^{obs}$ is the average of the measured values. If NSE = 1, there is a perfect correspondence between the model and the observed data; if NSE = 0, the model has the same predictive capacity as the average of the time series in terms of the sum of the squared errors. If NSE < 0, the observed mean is a better predictor than the model.
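The definition and the two reference cases (NSE = 1 and NSE = 0) can be checked with a short Python sketch. The function follows the verbal definition given above; the function name and the sample series are hypothetical.

```python
def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: one minus the ratio of the sum of
    squared errors to the sum of squared deviations of the observations
    from their mean."""
    if len(observed) != len(simulated):
        raise ValueError("series must have the same length")
    mean_obs = sum(observed) / len(observed)
    error_ss = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    total_ss = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - error_ss / total_ss


obs = [1.0, 2.0, 3.0, 4.0]          # hypothetical measured values
perfect = nse(obs, obs)             # model reproduces the data exactly
mean_only = nse(obs, [2.5] * 4)     # model always predicts the observed mean
```

The first call corresponds to a perfect match and the second to a model that is no better than the mean of the observations.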
The Kling–Gupta efficiency (KGE) coefficient [18] is a metric that measures the goodness of fit (Equation (9)). It consists of three main components: the correlation coefficient between the observations and simulations, $r$; the ratio between the standard deviation of the simulated values and the standard deviation of the observed values, $\alpha$; and the ratio between the average of the simulated values and the average of the experimental values, $\beta$.
$$KGE = 1 - \sqrt{(r - 1)^2 + (\alpha - 1)^2 + (\beta - 1)^2} \qquad (9)$$
Similar to the NSE coefficient, KGE = 1 indicates perfect agreement between the simulations and observations. By analogy with NSE, negative values (below the threshold KGE = 0) are taken to indicate poor model performance.
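The three components can be computed separately and then combined, as in the minimal Python sketch below. It assumes the commonly used formulation of KGE, in which the Euclidean distance of the correlation, the variability ratio and the mean ratio from their ideal value of one is subtracted from one. The function name and the sample series are hypothetical.

```python
import math
import statistics


def kge(observed, simulated):
    """Kling-Gupta efficiency from correlation, variability ratio and
    mean ratio (assumed standard formulation)."""
    n = len(observed)
    mu_o = statistics.fmean(observed)
    mu_s = statistics.fmean(simulated)
    sd_o = statistics.pstdev(observed)
    sd_s = statistics.pstdev(simulated)
    cov = sum((o - mu_o) * (s - mu_s)
              for o, s in zip(observed, simulated)) / n
    r = cov / (sd_o * sd_s)      # linear correlation coefficient
    alpha = sd_s / sd_o          # ratio of standard deviations
    beta = mu_s / mu_o           # ratio of means
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


obs = [1.0, 2.0, 3.0, 4.0]                 # hypothetical measured values
perfect = kge(obs, obs)
doubled = kge(obs, [2.0 * o for o in obs])  # r = 1, alpha = 2, beta = 2
```

In the second case the simulation is perfectly correlated with the observations, yet both the variability and the mean are overestimated by a factor of two, which drives KGE below zero.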
The coefficient of determination (R²) measures the goodness of fit of a statistical model. It is defined as the squared value of the linear correlation coefficient, and it ranges between 0 and 1. A value of zero indicates that there is no correlation between the two data series, whereas higher values indicate a better fit of the model. However, a large R² value does not always imply a good model fit, because a perfect positive or negative linear relationship can be obtained even when the simulated and observed values differ systematically [19].
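This limitation is easy to reproduce numerically. The Python sketch below computes R² as the square of the linear correlation coefficient and applies it to two simulated series that are clearly wrong, one biased and one inverted, yet perfectly linearly related to the observations. The function name and the sample series are hypothetical.

```python
def r_squared(observed, simulated):
    """Squared linear correlation coefficient between two series."""
    n = len(observed)
    mu_o = sum(observed) / n
    mu_s = sum(simulated) / n
    cov = sum((o - mu_o) * (s - mu_s) for o, s in zip(observed, simulated))
    var_o = sum((o - mu_o) ** 2 for o in observed)
    var_s = sum((s - mu_s) ** 2 for s in simulated)
    return cov ** 2 / (var_o * var_s)


obs = [1.0, 2.0, 3.0, 4.0]           # hypothetical measured values
biased = [3.0, 5.0, 7.0, 9.0]        # 2 * obs + 1: systematically too high
inverted = [4.0, 3.0, 2.0, 1.0]      # perfectly negative relationship

r2_biased = r_squared(obs, biased)
r2_inverted = r_squared(obs, inverted)
```

Both series give the maximum R² even though neither reproduces the observations, which is why R² is used here alongside NSE and KGE rather than on its own.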