# Sediment Level Prediction of a Combined Sewer System Using Spatial Features

## Abstract

## 1. Introduction

… R² score. A gap in the field is the lack of comparisons between different machine learning algorithms.

## 2. Materials and Methods

#### 2.1. Data Sets

#### 2.2. Data Modeling

#### 2.3. Predictive Methods

#### 2.3.1. Algorithm Selection Strategy

- Train the batch of models from different algorithms using standard hyperparameters.
- Compare their performance.
- Change the hyperparameters and try different feature selections from the data model.
- Compare their performance and discard the algorithms with poor scores.
- Optimize the rest of the models.
- Evaluate and compare the optimized models.
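
The screening part of this strategy (steps 1 to 4) can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes every configuration of each algorithm has already been trained and scored on a validation set (higher is better, e.g. R²), and it treats an algorithm as "poor" when its best score falls more than a chosen tolerance below the overall best. The function names, the 0.05 tolerance, and the scores are hypothetical.

```python
from typing import Dict, List


def best_score_per_algorithm(runs: Dict[str, List[float]]) -> Dict[str, float]:
    """Best validation score (higher is better, e.g. R^2) that each algorithm
    reached across its configurations: default hyperparameters first, then
    alternative hyperparameters or feature selections."""
    return {name: max(scores) for name, scores in runs.items() if scores}


def screen_algorithms(runs: Dict[str, List[float]], tolerance: float = 0.05) -> List[str]:
    """Keep the algorithms whose best score lies within `tolerance` of the
    overall best; the rest are discarded before hyperparameter optimization."""
    best = best_score_per_algorithm(runs)
    if not best:
        raise ValueError("no scored runs to compare")
    top = max(best.values())
    return sorted(name for name, score in best.items() if top - score <= tolerance)


# Hypothetical validation R^2 scores, NOT results from the article.
runs = {
    "linear_regression": [0.55, 0.58],
    "lasso": [0.62, 0.66],
    "knn": [0.50, 0.52],
    "gradient_boosting": [0.64, 0.67],
    "ann": [0.60, 0.70],
}
print(screen_algorithms(runs))  # ['ann', 'gradient_boosting', 'lasso']
```

Only the surviving algorithms would then go through the more expensive optimization in steps 5 and 6; a relative tolerance keeps the rule independent of how hard the prediction task is overall.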

#### 2.3.2. Sediment Level Regression Models Based on Spatial Features

- Linear Regression
- Ridge Regression
- Lasso Regression
- Elastic Net Regression
- K-nearest neighbors (KNN)
- Gradient Boosting
- Artificial Neural Networks (ANN).
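
The Ridge, Lasso, and Elastic Net models in this list differ from ordinary Linear Regression only in the penalty added to the squared-error loss. For reference, the objectives below follow scikit-learn's parameterization; this is an assumption, since the article does not state which library or parameterization was used. Here $w$ is the coefficient vector, $X$ the feature matrix, $n$ the number of samples, $\alpha \ge 0$ the regularization strength, and $\rho \in [0, 1]$ the L1 ratio:

$$
\begin{aligned}
\text{Ridge:}\quad & \min_{w}\; \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2 \\
\text{Lasso:}\quad & \min_{w}\; \frac{1}{2n}\lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1 \\
\text{Elastic Net:}\quad & \min_{w}\; \frac{1}{2n}\lVert y - Xw \rVert_2^2 + \alpha\rho \lVert w \rVert_1 + \frac{\alpha(1-\rho)}{2}\lVert w \rVert_2^2
\end{aligned}
$$

With $\alpha = 0$ all three reduce to ordinary least squares, and Elastic Net with $\rho = 1$ is the Lasso. The L1 term can shrink coefficients exactly to zero, so Lasso and Elastic Net also act as a form of feature selection, which is relevant given the many spatial features in the data model.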

#### 2.3.3. Short-Term Regression Models

#### 2.3.4. Binary Classification Models

- Logistic Regression
- AdaBoost
- Random Forest
- Gradient Boosting
- Extra Trees
- Support Vector Machine
- Artificial Neural Network.

#### 2.4. Predictive Models Evaluation

#### 2.4.1. Regression Evaluation Metrics

- Coefficient of determination (R²): The proportion of the variance in the real values ($y$) that is explained by the predictions ($\widehat{y}$). It measures how well the model replicates the observations. Its maximum is 1, the best case, and it usually lies between 0 and 1, although it becomes negative when the model predicts worse than the mean of the observations. This metric was selected over other normalized error measures because it explains the variability of the errors. Other options, such as the MAPE, were not selected because they are undefined when the real value is zero [24].$${R}^{2}\left(y,\widehat{y}\right)=1-\frac{\sum_{i=1}^{n}\left(y_i-\widehat{y}_i\right)^{2}}{\sum_{i=1}^{n}\left(y_i-\overline{y}\right)^{2}},\qquad \text{where } \overline{y}=\frac{1}{n}\sum_{i=1}^{n}y_i$$
- Mean Absolute Error (MAE): Given a set of paired observations (each prediction paired with its real value), the MAE is the arithmetic mean of the absolute errors $e_t = y_t - \widehat{y}_t$. The objective is to minimize this value.$$\mathrm{MAE}=\frac{1}{n}\sum_{t=1}^{n}\left|e_t\right|$$
- Mean Squared Error (MSE): Follows the same strategy as the MAE but averages the squared errors ($e_t^{2}$) instead of the absolute errors. Squaring penalizes large errors more heavily; the objective is again to minimize the value.$$\mathrm{MSE}=\frac{1}{n}\sum_{t=1}^{n}e_t^{2}$$
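
The three regression formulas can be checked with a short dependency-free sketch. The observed and predicted sediment levels below are hypothetical, not study data; note that the observations include a zero, which would make the MAPE undefined but poses no problem for these metrics.

```python
from typing import Sequence


def _length(y: Sequence[float], y_hat: Sequence[float]) -> int:
    """Validate that observations and predictions pair up one to one."""
    if len(y) != len(y_hat) or not y:
        raise ValueError("y and y_hat must be non-empty and of equal length")
    return len(y)


def mae(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Mean Absolute Error: average of |e_t| with e_t = y_t - y_hat_t."""
    n = _length(y, y_hat)
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / n


def mse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Mean Squared Error: average of e_t^2, so large errors weigh more."""
    n = _length(y, y_hat)
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / n


def r2(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    n = _length(y, y_hat)
    y_bar = sum(y) / n
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1 - ss_res / ss_tot


# Hypothetical sediment levels in cm (observed vs predicted).
observed = [0, 2, 4, 10]
predicted = [1, 2, 5, 8]
print(mae(observed, predicted), mse(observed, predicted), round(r2(observed, predicted), 3))
# 1.0 1.5 0.893
```

These hand-written versions should agree with scikit-learn's `mean_absolute_error`, `mean_squared_error`, and `r2_score` on the same inputs.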

#### 2.4.2. Classification Evaluation Metrics

- Accuracy: The proportion of correct predictions among all observations.$$\mathrm{Accuracy}=\frac{TP+TN}{TP+TN+FP+FN}$$
- Recall: The fraction of positive observations that are correctly predicted.$$\mathrm{Recall}=\frac{TP}{TP+FN}$$
- Precision: The fraction of positive predictions that are correct.$$\mathrm{Precision}=\frac{TP}{TP+FP}$$
- Receiver operating characteristic (ROC) curve [25]: A graphical plot to visually evaluate the predictive ability of a binary classifier across different discrimination thresholds. The y-axis shows the true positive rate (TPR, sensitivity, or recall) and the x-axis the false positive rate (FPR, equal to 1 − specificity). A perfect curve passes through the top left corner, corresponding to a TPR of one and an FPR of zero. The team used the ROC curve and the area under the curve (AUC) to select the best-performing model.
- Area Under the Curve (AUC) [26]: Ranges from 0 to 1 and evaluates the overall ability of the model to distinguish between positives and negatives; a value of 0.5 corresponds to random guessing. The higher the AUC, the better the model predicts which sections need cleaning.
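
How the ROC curve and the AUC arise from a classifier's scores can be sketched as follows: the decision threshold is lowered through every distinct score, one (FPR, TPR) point is recorded per threshold, and the area under the resulting polyline is computed with the trapezoidal rule. This is an illustrative implementation with hypothetical labels and scores (label 1 meaning the section needs cleaning), not the study's code.

```python
from typing import List, Sequence, Tuple


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs obtained by lowering the decision threshold through
    every distinct score; label 1 = positive (cleaning needed), else negative."""
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have equal length")
    n_pos = sum(1 for label in labels if label == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Observations with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under a ROC curve computed with the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
curve = roc_points(labels, scores)
print(curve)        # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(auc(curve))   # 0.75
```

A perfectly separating classifier reaches the point (0, 1) and yields an AUC of 1, while assigning the same score to every observation gives the diagonal and an AUC of 0.5. For this input, scikit-learn's `roc_auc_score` should give the same AUC.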

## 3. Results

#### 3.1. Sediment Level Regression

… R² is too low, and their MAE and MSE are higher than those of the other models. The Lasso, Elastic Net, and Gradient Boosting models score similarly: their R² is the same (0.72), while Gradient Boosting has the best MAE of the three (1.66) and Lasso the best MSE (12.58). Finally, the Artificial Neural Network obtains the best scores, with an R² of 0.76, an MAE of 1.56, and an MSE of 10.31.

#### 3.2. Sediment Level Short-Term Regression

… R² is the ANN model, the best MAE is obtained by the KNN model, and the ANN obtains the best MSE. Although the KNN has a smaller MAE (2.04 versus 2.07), the ANN has a higher R² (0.61 versus 0.53), meaning it explains more of the variance in the observations, and a lower MSE (13.89 versus 16.83), meaning it makes fewer large errors and therefore performs better on observations with higher values.
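
The trade-off between MAE and MSE in the KNN and ANN comparison can be illustrated with two hypothetical error vectors over four observations (not taken from the study's data): model A is exact on three observations but misses one by 6 cm, while model B misses every observation by 2 cm.

$$
\begin{aligned}
e_A &= (0, 0, 0, 6): & \mathrm{MAE}_A &= \tfrac{0+0+0+6}{4} = 1.5, & \mathrm{MSE}_A &= \tfrac{0+0+0+36}{4} = 9 \\
e_B &= (2, 2, 2, 2): & \mathrm{MAE}_B &= \tfrac{2+2+2+2}{4} = 2, & \mathrm{MSE}_B &= \tfrac{4+4+4+4}{4} = 4
\end{aligned}
$$

Model A has the smaller MAE but the larger MSE, because squaring magnifies its single large error. A lower MSE therefore signals fewer large misses, which matters most for sections with high sediment levels.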

#### 3.3. Cleaning Need Classification

## 4. Discussion

… R² score produced by the ANN model is 0.61, indicating a large deviation between the predictions and the real observations. This second methodology used time interpolation to create a feature representing the sediment level 10 days before the target variable. Although this linear transformation makes the trained model less trustworthy, it should not harm the model score; on the contrary, it should benefit the predictions, because the interpolated feature grows linearly toward the target value. A further study of short-term sediment level prediction would require data gathered at short intervals. Furthermore, a 10-day horizon is the bare minimum for starting to assist the water utility: most cleaning services follow a planned schedule, and ten days of notice leaves little margin for optimizing maintenance. A future goal should be to extend the prediction horizon, giving the utility more room to act.
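
The 10-day interpolated feature described above can be sketched as follows, assuming (as an illustration, not the authors' code) that each section has a date-sorted list of sediment measurements and that the level at an intermediate date is obtained by linear interpolation between the two surrounding measurements. The function names and the sample data are hypothetical.

```python
from bisect import bisect_right
from datetime import date, timedelta
from typing import List, Tuple


def interpolated_level(history: List[Tuple[date, float]], when: date) -> float:
    """Sediment level at `when`, linearly interpolated between the two
    surrounding measurements; `history` is sorted by date."""
    dates = [d for d, _ in history]
    if not dates or when < dates[0] or when > dates[-1]:
        raise ValueError("date outside the measured range; not extrapolating")
    i = bisect_right(dates, when)
    if dates[i - 1] == when:
        return history[i - 1][1]  # exact measurement available
    (d0, v0), (d1, v1) = history[i - 1], history[i]
    fraction = (when - d0).days / (d1 - d0).days
    return v0 + fraction * (v1 - v0)


def lag_feature(history: List[Tuple[date, float]], target_date: date, lag_days: int = 10) -> float:
    """Hypothetical helper: interpolated level `lag_days` before the target date."""
    return interpolated_level(history, target_date - timedelta(days=lag_days))


# Hypothetical measurements for one section: (date, sediment level in cm).
history = [(date(2021, 4, 1), 1.0), (date(2021, 5, 1), 4.0)]
print(lag_feature(history, date(2021, 5, 1)))  # interpolated level on 2021-04-21
```

When the previous measurement lies more than ten days before the target, as in this example, the interpolated value sits on the straight line ending at the target measurement itself, so the feature partly encodes the answer. This is one possible reading of why the resulting model is described as not fully trustworthy even though its score improves.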

… R² score of 0.76, which is good enough [27] to consider it a usable model for this prediction methodology, an MAE of 1.56, and an MSE of 10.31. The scores and the errors analyzed in Figure 2 indicate a good result that could be improved further in the future by adding more data to the training step and optimizing the feature and hyperparameter selection process.

## 5. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## References

1. Ashley, R.; Bertrand-Krajewski, J.-L.; Hvitved-Jacobsen, T. Sewer solids—20 years of investigation. *Water Sci. Technol.* **2005**, *52*, 73–84.
2. Ashley, R.M.; Fraser, A.; Burrows, R.; Blanksby, J. The management of sediment in combined sewers. *Urban Water* **2000**, *2*, 263–275.
3. Montserrat, A.; Bosch, L.; Kiser, M.; Poch, M.; Corominas, L. Using data from monitoring combined sewer overflows to assess, improve, and maintain combined sewer systems. *Sci. Total. Environ.* **2015**, *505*, 1053–1061.
4. Del Mundo, D.M.N.; Sutheerawattananonda, M. Influence of fat and oil type on the yield, physio-chemical properties, and microstructure of fat, oil, and grease (FOG) deposits. *Water Res.* **2017**, *124*, 308–319.
5. Corominas, L.; Garrido-Baserba, M.; Villez, K.; Olsson, G.; Cortés, U.; Poch, M. Transforming data into knowledge for improved wastewater treatment operation: A critical review of techniques. *Environ. Model. Softw.* **2018**, *106*, 89–103.
6. Eggimann, S.; Mutzner, L.; Wani, O.; Schneider, M.Y.; Spuhler, D.; De Vitry, M.M.; Beutler, P.; Maurer, M. The Potential of Knowing More: A Review of Data-Driven Urban Water Management. *Environ. Sci. Technol.* **2017**, *51*, 2538–2553.
7. Garrido-Baserba, M.; Corominas, L.; Cortés, U.; Rosso, D.; Poch, M. The Fourth-Revolution in the Water Sector Encounters the Digital Revolution. *Environ. Sci. Technol.* **2020**, *54*, 4698–4705.
8. Blumensaat, F.; Leitão, J.P.; Ort, C.; Rieckermann, J.; Scheidegger, A.; Vanrolleghem, P.A.; Villez, K. How Urban Storm- and Wastewater Management Prepares for Emerging Opportunities and Threats: Digital Transformation, Ubiquitous Sensing, New Data Sources, and Beyond—A Horizon Scan. *Environ. Sci. Technol.* **2019**, *53*, 8488–8498.
9. Therrien, J.-D.; Nicolaï, N.; Vanrolleghem, P.A. A critical review of the data pipeline: How wastewater system operation flows from data to intelligence. *Water Sci. Technol.* **2020**, *82*, 2613–2634.
10. Arthur, S.; Crow, H.; Pedezert, L. Understanding blockage formation in combined sewer networks. *Proc. Inst. Civ. Eng. Water Manag.* **2008**, *161*, 215–221.
11. Laakso, T.; Kokkonen, T.; Mellin, I.; Vahala, R. Sewer Condition Prediction and Analysis of Explanatory Factors. *Water* **2018**, *10*, 1239.
12. Mohammadi, M.M.; Najafi, M.; Tabesh, A.; Riley, J.; Gruber, J. Condition Prediction of Sanitary Sewer Pipes. *Pipelines 2019* **2019**, 117–126.
13. Savic, D.A.; Giustolisi, O.; Laucelli, D.B. Asset deterioration analysis using multi-utility data and multi-objective data mining. *J. Hydroinform.* **2009**, *11*, 211–224.
14. Cameron, B.; McGowan, M.; Mitchell, C.; Winder, J.; Kerr, R.; Zhang, M. Predicting Sewer chokeS through Machine Learning. *Water E J.* **2017**, *2*, 1–13.
15. Bailey, J.; Harris, E.; Keedwell, E.; Djordjevic, S.; Kapelan, Z. Developing Decision Tree Models to Create a Predictive Blockage Likelihood Model for Real-World Wastewater Networks. *Procedia Eng.* **2016**, *154*, 1209–1216.
16. Chughtai, F.; Zayed, T. Infrastructure Condition Prediction Models for Sustainable Sewer Pipelines. *J. Perform. Constr. Facil.* **2008**, *22*, 333–341.
17. Salman, B.; Salem, O. Modeling Failure of Wastewater Collection Lines Using Various Section-Level Regression Models. *J. Infrastruct. Syst.* **2012**, *18*, 146–154.
18. Ugarelli, R.; Kristensen, S.M.; Røstum, J.; Saegrov, S.; Di Federico, V. Statistical analysis and definition of blockages-prediction formulae for the wastewater network of Oslo by evolutionary computing. *Water Sci. Technol.* **2009**, *59*, 1457–1470.
19. Syachrani, S.; Jeong, H.S.D.; Chung, C.S. Decision Tree–Based Deterioration Model for Buried Wastewater Pipelines. *J. Perform. Constr. Facil.* **2013**, *27*, 633–645.
20. Harvey, R.R.; McBean, E.A. Comparing the utility of decision trees and support vector machines when planning inspections of linear sewer infrastructure. *J. Hydroinform.* **2014**, *16*, 1265–1279.
21. Mashford, J.; Marlow, D.; Tran, H.D.; May, R. Prediction of Sewer Condition Grade Using Support Vector Machines. *J. Comput. Civ. Eng.* **2011**, *25*, 283–290.
22. Harvey, R.R.; McBean, E.A. Predicting the structural condition of individual sanitary sewer pipes with random forests. *Can. J. Civ. Eng.* **2014**, *41*, 294–303.
23. Wolpert, D.H.; Macready, W.G. No free lunch theorems for optimization. *IEEE Trans. Evol. Comput.* **1997**, *1*, 67–82.
24. Goodwin, P.; Lawton, R. On the asymmetry of the symmetric MAPE. *Int. J. Forecast.* **1999**, *15*, 405–408.
25. Fawcett, T. An introduction to ROC analysis. *Pattern Recognit. Lett.* **2006**, *27*, 861–874.
26. Hanley, J.A.; McNeil, B.J. The meaning and use of the area under a receiver operating characteristic (ROC) curve. *Radiology* **1982**, *143*, 29–36.
27. Hair, J.F.; Ringle, C.M.; Sarstedt, M. Partial Least Squares Structural Equation Modeling: Rigorous Applications, Better Results and Higher Acceptance. *Long Range Plan.* **2013**, *46*, 1–12.
28. Rodríguez-Barranco, M.; Rivas-García, L.; Quiles, J.L.; Redondo-Sánchez, D.; Aranda-Ramírez, P.; Llopis-González, J.; Pérez, M.J.S.; Sánchez-González, C. The spread of SARS-CoV-2 in Spain: Hygiene habits, sociodemographic profile, mobility patterns and comorbidities. *Environ. Res.* **2021**, *192*, 110223.
29. Aloi, A.; Alonso, B.; Benavente, J.; Cordera, R.; Echániz, E.; González, F.; Ladisa, C.; Lezama-Romanelli, R.; López-Parra, Á.; Mazzei, V.; et al. Effects of the COVID-19 Lockdown on Urban Mobility: Empirical Evidence from the City of Santander (Spain). *Sustainability* **2020**, *12*, 3870.
30. Baldasano, J.M. COVID-19 lockdown effects on air quality by NO₂ in the cities of Barcelona and Madrid (Spain). *Sci. Total. Environ.* **2020**, *741*, 140353.

**Figure 1.** Bar plot of the different elements identified in a sedimented section (x-axis) and their number of appearances (y-axis).

**Figure 2.** Predictions of the ANN model on the test set. The line on the scatter plot is the line of best fit: a perfect score corresponds to a line running from the bottom left to the top right, and the worst score to a completely flat line.

**Figure 4.** Receiver operating characteristic (ROC) curve and area under the curve (AUC) on a test set: (**a**) Gradient Boosting; (**b**) Extra Trees.

| Feature Group | Feature | Mean | Minimum | Maximum | Type |
|---|---|---|---|---|---|
| Pipe properties | Cross-sectional area | 88 dm² | 1.7 dm² | 806 dm² | Numerical |
| | Height | 1 m | 0.15 m | 2.5 m | Numerical |
| | Width | 0.91 m | 0.15 m | 4.5 m | Numerical |
| | Channel bed width | 0.76 m | 0 m ¹ | 3.65 m | Numerical |
| | Channel bed depth | 0.05 m | 0 m ¹ | 1.25 m | Numerical |
| | Length | 17 m | 0.8 m | 72.6 m | Numerical |
| | Material | - | - | - | Categorical |
| Wastewater properties | Mean velocity | 0.055 m/s | 0.005 m/s | 0.32 m/s | Numerical |
| | Mean flow | 0.003 m³/s | 0.001 m³/s | 0.033 m³/s | Numerical |
| Maintenance | Sediment level | 4.11 cm | 0 cm | 60 cm | Numerical |
| | Maintenance date | - | - | - | Date |
| | Cleaning applied | - | - | - | Boolean |

¹ If there is no channel bed, its width and depth are 0.

| Feature | Description | Type |
|---|---|---|
| Height | Height of the section | Float |
| Width | Width of the section | Float |
| Channel bed width | Width of the channel bed | Float |
| Channel bed depth | Depth of the channel bed | Float |
| Mean velocity | Mean velocity of the wastewater during the dry season | Float |
| Mean flow | Mean flow of the wastewater during the dry season | Float |
| Material | Material of the section walls | Categorical |
| Sediment level lags 0 to 3 | Sediment level at 4 different timestamps | Integer |
| Days between maintenances | Days between consecutive sediment level measurements | Integer |
| Cleaning applied | Whether cleaning was applied during the maintenance session | Boolean |
| Pipe properties of nearest sections ¹ | Size features (perimeter, channel bed width, width, height) of the 4 nearest sections | Float |
| Mean velocity of nearest sections ¹ | Mean velocity of the wastewater during the dry season in each of the nearest sections | Float |
| Sediment level of nearest sections ¹ | Sediment level at 4 different timestamps for each of the sections | Float |
| Days between maintenances of nearest sections ¹ | Days between sediment level measurements for each of the sections | Integer |
| Cleaning applied in nearest sections ¹ | Whether cleaning was applied during the maintenance session for each of the sections | Boolean |

¹ One feature for each nearby section.

| Actual Class \ Predicted Class | Negative | Positive |
|---|---|---|
| Negative | True Negative (TN) | False Positive (FP) |
| Positive | False Negative (FN) | True Positive (TP) |
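
Filling this confusion matrix and deriving accuracy, recall, and precision from it can be sketched as follows. The labels are hypothetical (1 meaning the section needs cleaning), and returning 0.0 for an undefined ratio is a convention chosen here, not something stated in the article.

```python
from typing import Dict, Sequence


def confusion_counts(actual: Sequence[int], predicted: Sequence[int]) -> Dict[str, int]:
    """Tally TN, FP, FN, TP; label 1 = positive (cleaning needed)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have equal length")
    counts = {"TN": 0, "FP": 0, "FN": 0, "TP": 0}
    for a, p in zip(actual, predicted):
        correct = "T" if (a == 1) == (p == 1) else "F"
        side = "P" if p == 1 else "N"
        counts[correct + side] += 1
    return counts


def classification_metrics(c: Dict[str, int]) -> Dict[str, float]:
    """Accuracy, recall and precision from confusion-matrix counts.
    Undefined ratios (zero denominator) are reported as 0.0 here."""
    total = sum(c.values())
    actual_pos = c["TP"] + c["FN"]
    predicted_pos = c["TP"] + c["FP"]
    return {
        "accuracy": (c["TP"] + c["TN"]) / total if total else 0.0,
        "recall": c["TP"] / actual_pos if actual_pos else 0.0,
        "precision": c["TP"] / predicted_pos if predicted_pos else 0.0,
    }


# Hypothetical labels for ten sections: a cautious classifier that flags
# only one section, correctly, and misses three that needed cleaning.
actual = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
predicted = [0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
counts = confusion_counts(actual, predicted)
print(counts)                          # {'TN': 6, 'FP': 0, 'FN': 3, 'TP': 1}
print(classification_metrics(counts))  # {'accuracy': 0.7, 'recall': 0.25, 'precision': 1.0}
```

This pattern, perfect precision with low recall, resembles the Extra Trees row of the classification results (precision 1, recall 0.43): every flagged section truly needs cleaning, but many sections that need it are missed.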

| Algorithm | R² Score | MAE | MSE |
|---|---|---|---|
| Linear Regression | 0.68 | 1.83 | 14.23 |
| Ridge | 0.68 | 1.83 | 14.23 |
| Lasso | 0.72 | 1.71 | 12.58 |
| Elastic Net | 0.72 | 1.71 | 12.7 |
| KNN | 0.69 | 1.72 | 13.87 |
| Gradient Boosting | 0.72 | 1.66 | 12.63 |
| ANN | 0.76 | 1.56 | 10.31 |

| ANN | Hidden Layers | Activation Function | Solver | R² Score | MAE | MSE |
|---|---|---|---|---|---|---|
| Case 1 | 30-30-30 | ReLU | Adam | 0.76 | 1.56 | 10.31 |
| Case 2 | 30-50-30 | ReLU | Adam | 0.75 | 1.6 | 11.12 |
| Case 3 | 50-30 | ReLU | Adam | 0.74 | 1.6 | 11.37 |
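
The hidden-layer notation in this table (e.g., 30-50-30) lists the number of neurons in each hidden layer. As a hedged sketch, assuming the networks were built with scikit-learn's `MLPRegressor` (the table's ReLU and Adam settings correspond to that class's `activation="relu"` and `solver="adam"` options, but the article does not name the library), each case translates into constructor arguments as below. The helpers `parse_hidden_layers` and `mlp_kwargs` are hypothetical.

```python
from typing import Dict, Tuple, Union


def parse_hidden_layers(spec: str) -> Tuple[int, ...]:
    """'30-50-30' -> (30, 50, 30): neurons per hidden layer, first to last."""
    sizes = tuple(int(part) for part in spec.split("-"))
    if any(size <= 0 for size in sizes):
        raise ValueError(f"invalid hidden-layer spec: {spec!r}")
    return sizes


def mlp_kwargs(spec: str, activation: str = "ReLU", solver: str = "Adam") -> Dict[str, Union[str, Tuple[int, ...]]]:
    """Map one table row to MLPRegressor-style keyword arguments
    (scikit-learn expects lower-case option names)."""
    return {
        "hidden_layer_sizes": parse_hidden_layers(spec),
        "activation": activation.lower(),
        "solver": solver.lower(),
    }


CASES = {"Case 1": "30-30-30", "Case 2": "30-50-30", "Case 3": "50-30"}
for name, spec in CASES.items():
    print(name, mlp_kwargs(spec))
```

If scikit-learn is the library in use, `MLPRegressor(**mlp_kwargs("30-30-30"))` would construct the Case 1 network; settings not reported in the table, such as the learning rate, the number of iterations, or input scaling, would still have to be chosen separately.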

| Algorithm | R² Score | MAE | MSE |
|---|---|---|---|
| Linear Regression | 0.48 | 2.27 | 18.85 |
| Ridge | 0.48 | 2.27 | 18.82 |
| Lasso | 0.55 | 2.25 | 16.29 |
| Elastic Net | 0.53 | 2.25 | 16.93 |
| KNN | 0.53 | 2.04 | 16.83 |
| Gradient Boosting | 0.46 | 2.34 | 19.47 |
| ANN | 0.61 | 2.07 | 13.89 |

| ANN | Hidden Layers | Activation Function | Solver | R² Score | MAE | MSE |
|---|---|---|---|---|---|---|
| Case 1 | 30-30-30 | ReLU | Adam | 0.60 | 2.10 | 14.39 |
| Case 2 | 30-50-30 | ReLU | Adam | 0.61 | 2.07 | 13.89 |
| Case 3 | 50-30 | ReLU | Adam | 0.59 | 2.18 | 14.90 |

| Algorithm | Accuracy | Recall | Precision |
|---|---|---|---|
| Logistic Regression | 0.78 | 0.1 | 0.39 |
| AdaBoost | 0.84 | 0.5 | 0.66 |
| Random Forest | 0.87 | 0.46 | 0.85 |
| Gradient Boosting | 0.88 | 0.53 | 0.83 |
| Extra Trees | 0.88 | 0.43 | 1 |
| Linear Support Vector Machine | 0.75 | 0.23 | 0.35 |
| ANN | 0.80 | 0.51 | 0.53 |

| Model | Number of Estimators | Max Depth | Min Samples Leaf | Min Samples Split |
|---|---|---|---|---|
| Extra Trees | 100 | - | 18 | 13 |
| Gradient Boosting | 100 | 3 | 1 | 2 |
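
One detail of the Extra Trees configuration is worth making explicit. Assuming scikit-learn semantics for these hyperparameters (the column names match scikit-learn's `min_samples_leaf` and `min_samples_split`, but the article does not name the library), a node is split only if it holds at least `min_samples_split` samples and both children keep at least `min_samples_leaf` samples, so a node needs at least twice the minimum leaf size to be split. The helper below is hypothetical and simply applies that rule to the values in the table above.

```python
def effective_min_samples_split(min_samples_split: int, min_samples_leaf: int) -> int:
    """Smallest node size that can actually be split when each of the two
    children must keep at least `min_samples_leaf` samples."""
    return max(min_samples_split, 2 * min_samples_leaf)


# Values copied from the hyperparameter table above.
SETTINGS = {
    "Extra Trees": {"min_samples_split": 13, "min_samples_leaf": 18},
    "Gradient Boosting": {"min_samples_split": 2, "min_samples_leaf": 1},
}
for model, params in SETTINGS.items():
    print(model, effective_min_samples_split(**params))
# Extra Trees 36
# Gradient Boosting 2
```

Under this assumption, the minimum split size of 13 never binds for Extra Trees, because the leaf constraint already requires 36 samples per split. The Gradient Boosting row coincides with scikit-learn's default values for `GradientBoostingClassifier` (100 estimators, maximum depth 3, minimum leaf size 1, minimum split size 2).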

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Ribalta, M.; Mateu, C.; Bejar, R.; Rubión, E.; Echeverria, L.; Varela Alegre, F.J.; Corominas, L.
Sediment Level Prediction of a Combined Sewer System Using Spatial Features. *Sustainability* **2021**, *13*, 4013.
https://doi.org/10.3390/su13074013

**AMA Style**

Ribalta M, Mateu C, Bejar R, Rubión E, Echeverria L, Varela Alegre FJ, Corominas L.
Sediment Level Prediction of a Combined Sewer System Using Spatial Features. *Sustainability*. 2021; 13(7):4013.
https://doi.org/10.3390/su13074013

**Chicago/Turabian Style**

Ribalta, Marc, Carles Mateu, Ramon Bejar, Edgar Rubión, Lluís Echeverria, Francisco Javier Varela Alegre, and Lluís Corominas.
2021. "Sediment Level Prediction of a Combined Sewer System Using Spatial Features" *Sustainability* 13, no. 7: 4013.
https://doi.org/10.3390/su13074013