# A Data-Driven Surrogate Modelling Approach for Acceleration of Short-Term Simulations of a Dynamic Urban Drainage Simulator


## Abstract


## 1. Introduction

… InfoWorks® ICM 8.5. Using the GPE approach, the rainfall intensity in each time step is treated as an independent discrete parameter. To cover a wide range of rainfall intensities for each time step (corresponding to different return periods), we introduce a synthetic rainfall event generation method. This method is based on a statistical analysis of a nine-year observed rainfall time series recorded at 10-min time steps within the case study area. We also consider initial conditions and actuator settings as additional parameters in training the GPEs. Note that our approach is suitable when the number of governing modelling parameters is limited, for example in RT-MPC applications. Application of the emulator in practice is not the focus of this article.

## 2. Materials and Methods

#### 2.1. Candidate Simulator and Case Study

… InfoWorks® ICM 8.5, which requires a detailed description of the structure and geometry of the urban drainage network, as well as numerous parameters and inputs for wastewater hydraulic and quality modelling. Note that the focus of this study is on developing emulators for wastewater quantity (volume) modelling. A small area of the Haute Sûre catchment in the north of the Grand Duchy of Luxembourg is considered as the case study for this research. This area is of special interest because of the nearby Lake Haute Sûre, which is the main source of drinking water for the country and whose conservation is of primary importance. Figure 1a illustrates the user interface for the case study area in InfoWorks® ICM 8.5. Figure 1b focuses on the combined sewer overflow (CSO) structure that is the subject of emulation here.

#### 2.2. Gaussian Process Emulator (GPE) Method

$$Y \sim \mathcal{N}\left(\mu_{\beta}, \Sigma(\xi_{y})\right)$$

where $\mu_{\beta}$ is a mean function, taken to be linear in time, $\Sigma$ is the $[np \times np]$ covariance matrix ($n$ is the time dimension and $p$ is the model output dimension), and $\xi_{y}$ is the vector of covariance matrix parameters. The emulator parameters ($\Psi$), which are used for prediction, were determined by maximizing the log-likelihood function (Equation (2)), introduced in reference [29], for model output $Y$ over a reasonable parameter range. A detailed description of the underlying mathematical framework of the GPE method can be found in reference [28].
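As a reminder of what is being maximized, the log-likelihood of a multivariate normal model with mean $\mu_{\beta}$ and covariance $\Sigma(\xi_{y})$ has the standard textbook form below. This is a generic sketch, not necessarily the article's Equation (2): the vector $y$ (the stacked training outputs, of length $np$) is notation assumed here, and the parameterisation used in references [28,29] may differ in detail.

$$\ell(\beta, \xi_{y}) = -\tfrac{1}{2}\,(y - \mu_{\beta})^{\top}\,\Sigma(\xi_{y})^{-1}\,(y - \mu_{\beta}) \;-\; \tfrac{1}{2}\,\log\left|\Sigma(\xi_{y})\right| \;-\; \tfrac{np}{2}\,\log(2\pi)$$

The first term rewards a close fit of the mean function to the training outputs, while the log-determinant term penalises covariance structures that are more flexible than the data require.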

#### 2.3. Synthetic Rainfall Generator for GPE Training

… where $\mu_R = E(L_{RM1})$; $\mu_\delta = E(L_\delta)$; $A$ is the coefficient matrix of the autoregressive (AR) model; and $\varepsilon_R$ and $\varepsilon_\delta$ are vectors of zero-mean, normally distributed white noise processes.

To calibrate the parameters $\mu_R$, $\mu_\delta$, $A_{11}$, $A_{12}$, $A_{21}$, $A_{22}$, $\sigma_R^2$, $\sigma_\delta^2$, and $\rho_{R\delta}$, where $\sigma_R^2 = \mathrm{var}(\varepsilon_R)$, $\sigma_\delta^2 = \mathrm{var}(\varepsilon_\delta)$, and $\rho_{R\delta}$ is the correlation between $\varepsilon_R$ and $\varepsilon_\delta$, we derive two time series, $L_{RM1}$ and $L_\delta$, from two observed time series, RM1 and RM2. Upon model calibration we simulate from $L_\delta(t)$. This simulation should be conditional on $L_{Po}$, where Po is an observed time series at a location near RM1. Details about this conditional simulation are provided in reference [35].
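To make the role of each calibrated quantity concrete, the following sketch simulates a bivariate first-order autoregressive process for the pair (LRM1, Lδ) using the values listed in Table 1. It rests on assumptions that the text does not confirm: the model is taken to be first order, the deviations from the means are propagated by the matrix A, and the noise terms are drawn as correlated normals. The function name `simulate_var1` and its arguments are hypothetical, and the conditional simulation on LPo described in reference [35] is not covered.

```python
import math
import random

# Values taken from Table 1 of the article (case-study specific).
MU = (2.8550, 0.1019)                     # (mu_R, mu_delta)
A = ((0.9565, 0.0398), (0.0243, 0.8830))  # AR coefficient matrix
VAR_R, VAR_D, RHO = 0.0724, 0.0795, -0.0388


def simulate_var1(n_steps, x0=MU, noise_scale=1.0, seed=None):
    """Simulate n_steps of an assumed bivariate VAR(1) process around MU.

    Returns a list of (L_RM1, L_delta) tuples. noise_scale=0.0 switches
    the white noise off, which is useful for checking the deterministic part.
    """
    rng = random.Random(seed)
    s_r = math.sqrt(VAR_R) * noise_scale
    s_d = math.sqrt(VAR_D) * noise_scale
    x = tuple(x0)
    path = []
    for _ in range(n_steps):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # Correlated white noise with corr(e_r, e_d) = RHO.
        e_r = s_r * z1
        e_d = s_d * (RHO * z1 + math.sqrt(1.0 - RHO ** 2) * z2)
        # Deviations from the means are propagated by A.
        d_r, d_d = x[0] - MU[0], x[1] - MU[1]
        x = (MU[0] + A[0][0] * d_r + A[0][1] * d_d + e_r,
             MU[1] + A[1][0] * d_r + A[1][1] * d_d + e_d)
        path.append(x)
    return path
```

With the noise switched off, a series started away from the mean relaxes back towards (μR, μδ), which is a quick way to check that the coefficient matrix describes a stable process.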

#### 2.4. Training and Validation Datasets

… (1) the initial volume in the storage tank (m³), named P1; (2) the switch-on level for the fixed-flow pump that controls the outflow of the tank (m AD), named P2; and (3) the expected upcoming rainfall event in the catchment. Here, as an example, we consider the rainfall intensities (mm/h) during the next nine time steps, which correspond to 90 min (named P3 to P11). Hence, we have 11 parameters in this case study (P1 to P11). It is important to keep in mind that the parameters must be treated as discrete in this method, which is why we represent the rainfall time series as a set of discrete parameters. The output of interest is the time series of the total wastewater volume for the next day (144 time steps at 10-min resolution).

… InfoWorks® ICM 8.5 in order to build two datasets (training and validation) of 2500 input-output pairs to train and validate the emulator. Note that the rainfall events contain three more time steps than the rainfall parameters of the emulator: rainfall events have 12 time steps, whereas we use only nine time steps as emulator parameters (P3 to P11). This was done in order to (1) obtain more samples for P1, the initial volume in the storage tank, and (2) discard possible initial numerical instabilities in the simulations. The same three time steps are omitted from the output time series as well; hence, one output time series has 141 time steps. We interface with InfoWorks® ICM 8.5 via Ruby scripting in order to control model setup and model input and to automate the simulations, avoiding manual ensemble simulations and data extraction. Later, we developed R code that invokes the Ruby code, so that the whole data generation and emulator development procedure runs in one programming environment.
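The bookkeeping described above (12 rainfall steps reduced to 9 parameters, 144 output steps reduced to 141) can be sketched as follows. This is an illustration in Python rather than the authors' R and Ruby tooling. The function `build_pair` and its argument names are hypothetical, and it assumes the caller supplies P1 as the tank volume at the end of the three discarded steps.

```python
WARMUP_STEPS = 3        # leading time steps dropped from each simulation
RAIN_EVENT_STEPS = 12   # length of one synthetic rainfall event
OUTPUT_STEPS = 144      # one day of output at 10-min resolution


def build_pair(p1, p2, rain_event, volume_series):
    """Return (parameters P1..P11, trimmed output) for one simulator run.

    p1: tank volume (m3) after the discarded steps (assumed supplied by caller)
    p2: pump switch-on level (m AD)
    rain_event: 12 rainfall intensities (mm/h)
    volume_series: 144 simulated total wastewater volumes
    """
    if len(rain_event) != RAIN_EVENT_STEPS:
        raise ValueError("expected a 12-step rainfall event")
    if len(volume_series) != OUTPUT_STEPS:
        raise ValueError("expected a 144-step output series")
    # P1, P2, then the nine remaining rainfall intensities as P3..P11.
    params = [p1, p2] + list(rain_event[WARMUP_STEPS:])
    # The same three steps are removed from the output, leaving 141 values.
    output = list(volume_series[WARMUP_STEPS:])
    return params, output
```

Each call yields one input-output pair: an 11-element parameter vector and a 141-step output series.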

## 3. Results and Discussion

#### 3.1. Validation with Ensemble Validation Data

… InfoWorks® ICM 8.5 for training and validation purposes. The first ensemble dataset was used to train the emulator and the second to validate the results produced by the emulator. The Nash-Sutcliffe Efficiency (NSE) and Volumetric Efficiency (VE) were calculated to quantify the emulation error (Equations (7) and (8)). NSE indicates the relative magnitude of the residual variance between simulation and observation [39], whereas VE evaluates the fraction of water delivered at the proper time (volumetric mismatch) and can serve as a complementary indicator that addresses known problems with NSE [40]. An NSE or VE equal to 1 means a perfect match between the emulator and simulator time series. The R package “hydroGOF” was used to calculate these error indicators [41].
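For readers who want to see the two indicators spelled out, the sketch below implements the commonly used definitions of NSE and VE in plain Python. It illustrates the formulas and is not the hydroGOF code used in the study. Here `obs` plays the role of the simulator series and `sim` the emulator series, and the function names are chosen for this example only.

```python
def nse(sim, obs):
    """Nash-Sutcliffe Efficiency: 1 - sum((sim-obs)^2) / sum((obs-mean(obs))^2)."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((s - o) ** 2 for s, o in zip(sim, obs))
    spread = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / spread  # undefined if obs is constant


def ve(sim, obs):
    """Volumetric Efficiency: 1 - sum(|sim-obs|) / sum(obs)."""
    mismatch = sum(abs(s - o) for s, o in zip(sim, obs))
    return 1.0 - mismatch / sum(obs)  # undefined if obs sums to zero
```

Both functions return 1 when the two series coincide. For example, with `obs = [1, 2, 3]` and `sim = [1, 2, 4]`, the NSE is 0.5 and the VE is 5/6.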

… InfoWorks® ICM 8.5 is considered for this comparison (excluding wastewater quality modelling). This runtime acceleration is obtained mainly by bypassing the complexity and the numerical scheme of the detailed simulator and fitting a model (the emulator) solely to the input and output data from the simulator runs.

#### 3.2. Selecting the Optimum Emulator

#### 3.3. Leave-One-Out Cross-Validation Analysis

#### 3.4. Validation for CSO Events Prediction

#### 3.5. Validation for Applicability in Real Consecutive Scenarios

## 4. Conclusions

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

## References

1. Asher, M.J.; Croke, B.F.W.; Jakeman, A.J.; Peeters, L.J.M. A review of surrogate models and their application to groundwater modeling. Water Resour. Res. **2015**, 51, 5957–5973.
2. Galelli, S.; Castelletti, A.; Goedbloed, A. High-Performance Integrated Control of water quality and quantity in urban water reservoirs. Water Resour. Res. **2015**, 4840–4847.
3. Machac, D.; Reichert, P.; Albert, C. Emulation of dynamic simulators with application to hydrology. J. Comput. Phys. **2016**, 313, 352–366.
4. Dipierro, F.; Khu, S.; Savic, D.; Berardi, L. Efficient multi-objective optimal design of water distribution networks on a budget of simulations using hybrid algorithms. Environ. Model. Softw. **2009**, 24, 202–213.
5. Stone, N. Gaussian Process Emulators for Uncertainty Analysis in Groundwater Flow. Ph.D. Thesis, The University of Nottingham, Nottingham, UK, 2011.
6. Fraga, I.; Cea, L.; Puertas, J.; Suarez, J.; Jimenez, V.; Jacome, A. Global Sensitivity and GLUE-Based Uncertainty Analysis of a 2D-1D Dual Urban Drainage Model. J. Hydrol. Eng. **2016**, 21, 1–11.
7. O’Hagan, A. Bayesian analysis of computer code outputs: A tutorial. Reliab. Eng. Syst. Saf. **2006**, 91, 1290–1300.
8. Blanning, R.V. The construction and implementation of metamodels. Simulation **1975**, 24, 177–184.
9. Willcox, K.E.; Peraire, J. Balanced model reduction via the proper orthogonal decomposition. AIAA J. **2002**, 40, 2323–2330.
10. Bieker, H.P.; et al. Real-time production optimization of oil and gas production systems: A technology survey. SPE Prod. Oper. **2007**, 22, 382–391.
11. Robinson, T.; Eldred, M.; Willcox, K.; Haimes, R. Surrogate-based optimization using multifidelity models with variable parameterization and corrected space mapping. AIAA J. **2008**, 46, 2814–2822.
12. Regis, R.G.; Shoemaker, C.A. Constrained global optimization of expensive black box functions using radial basis functions. J. Glob. Opt. **2005**, 31, 153–171.
13. Razavi, S.; Tolson, B.A.; Burn, D.H. Review of surrogate modeling in water resources. Water Resour. Res. **2012**, 48.
14. Mahmoodian, M. Concept and Methodologies to Guide Surrogate Modelling for Real-Time Control (RTC) of Urban Drainage Systems under Uncertainty; EU ITN Project Report; QUICS: Belvaux, Luxembourg, 2018; Available online: https://www.sheffield.ac.uk/quics/dissemination/reports (accessed on 23 April 2018).
15. Mahmoodian, M.; Carbajal, J.P.; Bellos, V.; Leopold, U.; Schutz, G.; Clemens, F. A Hybrid Surrogate Modelling Strategy for Simplification of Detailed Urban Drainage Simulators. Water Resour. Manag. **2018**, 27.
16. Zhang, Q.; Stanley, J.S. Real-time water treatment process control with artificial neural networks. J. Environ. Eng. **2000**, 125, 124–137.
17. Soltani, F.; Kerachian, R.; Shirangi, E. Developing operating rules for reservoirs considering the water quality issues: Application of ANFIS-based surrogate models. Expert Syst. Appl. **2010**, 37, 6639–6645.
18. Wu, Z.Y.; El-Maghraby, M.; Pathak, S. Applications of deep learning for smart water networks. In 13th Computer Control for Water Industry Conference, CCWI 2015 Applications; Elsevier B.V.: New York, NY, USA, 2015; Volume 119, pp. 479–485.
19. Han, H.-G.; Qiao, J.-F.; Chen, Q.-L. Model predictive control of dissolved oxygen concentration based on a self-organizing RBF neural network. Control Eng. Pract. **2012**, 20, 465–476.
20. Moreno-Rodenas, A.M.; Bellos, V.; Langeveld, J.G.; Clemens, F.H.L.R. A dynamic emulator for physically based flow simulators under varying rainfall and parametric conditions. Water Res. **2018**, 142, 512–527.
21. Carbajal, J.P.; Leitão, J.P.; Albert, C. Appraisal of data-driven and mechanistic emulators of nonlinear hydrodynamic urban drainage simulators. arXiv **2016**, arXiv:1609.08395v1.
22. MUCM Community. Managing Uncertainty in Complex Models, MUCM Toolkit. Available online: http://www.mucm.ac.uk/ (accessed on 8 February 2017).
23. Castelletti, A.; Galelli, S.; Restelli, M.; Soncini-Sessa, R. Data-driven dynamic emulation modelling for the optimal management of environmental systems. Environ. Model. Softw. **2011**, 34, 30–43.
24. Castelletti, A.; Galelli, S.; Ratto, M.; Soncini-Sessa, R.; Young, P.C. A general framework for Dynamic Emulation Modelling in environmental problems. Environ. Model. Softw. **2012**, 34, 5–18.
25. Machac, D.; Reichert, P.; Rieckermann, J.; Albert, C. Fast mechanism-based emulator of a slow urban hydrodynamic drainage simulator. Environ. Model. Softw. **2016**, 78, 54–67.
26. Gladish, D.W.; Pagendam, D.E.; Peeters, L.J.M.; Kuhnert, P.M.; Vaze, J. Emulation Engines: Choice and Quantification of Uncertainty for Complex Hydrological Models. J. Agric. Biol. Environ. Stat. **2017**, 23, 39–62.
27. Reichert, P.; White, G.; Bayarri, M.J.; Pitman, E.B. Mechanism-based emulation of dynamic simulation models: Concept and application in hydrology. Comput. Stat. Data Anal. **2011**, 55, 1638–1655.
28. Olson, R.; Chang, W. Mathematical Framework for a Separable Gaussian Process Emulator; PennState College of Earth and Mineral Sciences: University Park, PA, USA, 2013.
29. Rasmussen, C.E.; Williams, C.K.I. Gaussian Processes for Machine Learning; MIT Press: Cambridge, MA, USA, 2006.
30. Olson, R.; Sriver, R.; Goes, M.; Urban, N.M.; Matthews, H.D.; Haran, M.; Keller, K. A climate sensitivity estimate using Bayesian fusion of instrumental observations and an Earth System model. J. Geophys. Res. Atmos. **2012**, 117, 1–11.
31. Olson, R.; Sriver, R.; Chang, W.; Haran, M.; Urban, N.M.; Keller, K. What is the effect of unresolved internal climate variability on climate sensitivity estimates? J. Geophys. Res. Atmos. **2013**, 118, 4348–4358.
32. R Foundation for Statistical Computing. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2014.
33. Olson, R.; Chang, W.; Keller, K.; Haran, M.R. Package ‘stilt’. CRAN. Available online: https://cran.r-project.org/ (accessed on 15 August 2018).
34. Heuvelink, G.B.M.; Brown, J.D.; Brown, E.E. A probabilistic framework for representing and simulating uncertain environmental variables. Int. J. Geogr. Inf. Sci. **2007**, 21, 497.
35. Torres-Matallana, J.A.; Leopold, U.; Heuvelink, G.B.M. Multivariate autoregressive modelling and conditional simulation of precipitation time series for urban water models. Eur. Water **2017**, 57, 299–306.
36. Luetkepohl, H. New Introduction to Multiple Time Series Analysis; Springer: Berlin/Heidelberg, Germany, 2005.
37. Barbosa, S.M. mAr: Multivariate AutoRegressive Analysis, R package version 1.1-2; R Foundation for Statistical Computing: Vienna, Austria, 2012.
38. Hosking, J.R.M. L-Moments: Analysis and Estimation of Distributions Using Linear Combinations of Order Statistics. J. R. Stat. Soc. Ser. **1990**, 52, 105–124.
39. Nash, J.E.; Sutcliffe, J.V. River flow forecasting through conceptual models part I: A discussion of principles. J. Hydrol. **1970**, 10, 282–290.
40. Criss, R.E.; Winston, W.E. Do Nash values have value? Discussion and alternate proposals. Hydrol. Processes **2008**, 22, 2723–2725.
41. Zambrano-Bigiarini, M. R Package ‘hydroGOF’. Available online: https://cran.r-project.org/ (accessed on 8 August 2017).
42. Wickham, H. Ggplot2: Elegant Graphics for Data Analysis. Available online: https://cran.r-project.org/ (accessed on 25 October 2018).
43. Roudier, P. R Package ‘clhs’: Conditioned Latin Hypercube Sampling. Available online: https://cran.r-project.org/ (accessed on 10 October 2018).
44. Minasny, B.; McBratney, A.B. A conditioned Latin hypercube method for sampling in the presence of ancillary information. Comput. Geosci. **2006**, 32, 1378–1388.
45. De Toffol, S. Sewer System Performance Assessment: An Indicators Based Methodology; Universität Innsbruck: Innsbruck, Austria, 2006.

**Figure 1.** (**a**) Case study area in the InfoWorks® ICM 8.5 interface; (**b**) schematic view of combined sewer overflow (CSO) location 1.

**Figure 2.** Workflow for the multivariate autoregressive (VAR) modelling and conditional simulation (Csim), and event selection for the rainfall generator.

**Figure 3.** Theoretical fitting of the Generalized Extreme Value (GEV) probability density function for the definition of event magnitude according to return period. Precipitation depth annual maxima time series for the Esch-sur-Sûre rain gauge from 2007 to 2015.

**Figure 4.** Comparison of emulator (solid red line) vs. simulator (blue line) results for 10 sample scenarios from the validation dataset. Some randomly selected high-quality predictions (**left column**) as well as poor predictions (**right column**) are shown. P1: initial tank volume (m³); P2: pump switch-on level (m AD). The 95% confidence interval for the emulator prediction (CI95 = 1.96 × std, where std is the standard deviation) is shown as dashed red lines.

**Figure 5.** Distribution of the emulation error indicators Nash-Sutcliffe Efficiency (NSE) (**left**) and Volumetric Efficiency (VE) (**right**) for the validation dataset. The red square shows the mean value. Each grey dot indicates a validation run scenario.

**Figure 6.** Effect of reducing the training dataset size on emulator training time (how long it takes to train the emulator) and acceleration factor (how many times faster the emulator is than the simulator).

**Figure 7.** Effect of reducing the training dataset size on the distribution of the emulation error indicators NSE (**left**) and VE (**right**) for the validation dataset (e.g., NSE075 indicates the NSE distribution when 75% of the training data is used to train the emulator). The red square shows the mean value.

**Figure 8.** Leave-one-out cross-validation analysis results for time steps 7, 70, and 140 (**a**–**c**, respectively).

**Figure 9.** Comparison between distributions of the emulation error indicators NSE (**left**) and VE (**right**) for CSO predictions vs. total volume predictions. The red square shows the mean value.

**Figure 10.** Comparison of emulator vs. simulator results for a real, unseen long-term rainfall time series.

**Table 1.** Calibrated parameters of the synthetic rainfall generator: vector $\mu$, matrix $A$, and variance-covariance matrix $C$ for the definition of $\sigma_R^2$, $\sigma_\delta^2$, and $\rho_{R\delta}$ (adapted from [35]). The presented values are case-study specific.

| Parameter | Value | Component | Value |
| --- | --- | --- | --- |
| $\mu_R$ | 2.8550 | $C_{11}$ | 0.0064 |
| $\mu_\delta$ | 0.1019 | $C_{33}$ | 0.0039 |
| $A_{11}$ | 0.9565 | $C_{13}$ | −0.0014 |
| $A_{12}$ | 0.0398 | $C_{12}$ | 0.0062 |
| $A_{21}$ | 0.0243 | $C_{34}$ | 0.0034 |
| $A_{22}$ | 0.8830 | $C_{23}$ | −0.0014 |
| $\sigma_R^2$ | 0.0724 | $C_{14}$ | −0.0010 |
| $\sigma_\delta^2$ | 0.0795 | | |
| $\rho_{R\delta}$ | −0.0388 | | |

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Mahmoodian, M.; Torres-Matallana, J.A.; Leopold, U.; Schutz, G.; Clemens, F.H.L.R.
A Data-Driven Surrogate Modelling Approach for Acceleration of Short-Term Simulations of a Dynamic Urban Drainage Simulator. *Water* **2018**, *10*, 1849.
https://doi.org/10.3390/w10121849
