# Predicting Influent and Effluent Quality Parameters for a UASB-Based Wastewater Treatment Plant in Asia Covering Data Variations during COVID-19: A Machine Learning Approach


## Abstract


## 1. Introduction

#### 1.1. Utilization of Water in Various Sectors

#### 1.2. Categorization of Wastewater

- Human excreta (faeces and urine), frequently combined with used toilet paper or wipes. When this waste is collected by flushing toilets, it is referred to as “blackwater” [4].
- Washing water (from clothing, dishes, floors, and other items), commonly referred to as greywater or sullage.
- Surplus liquids from domestic sources (drinks, cooking leftovers, insecticides, lubricating oil, paint, cleaning agents, etc.).
- Urban rainwater runoff from roads, parking lots, roofs, walkways and pavements (contains lubricants, animal droppings, garbage, gasoline or diesel, rubber remnants from tyres, soap scum, metals from vehicle exhausts, etc.).
- Highway runoff, including lubricants, anti-icing chemicals and rubber remnants, notably from tyres, and storm sewers (trash included) [5].
- Liquids made by humans (pesticides dumped illegally, used oils, etc.).
- Agricultural discharge (pesticides and other chemicals that mix with the water).
- Carbon discharge from the coal and oil industry and their byproducts.
- Industrial plant discharge (loam, sand, alkali and chemical byproducts) and other industrial waste [6].

#### 1.3. Bharwara Wastewater Treatment Plant

## 2. Related Works

## 3. Methodology: wPred

#### 3.1. Data Collection

#### 3.2. Data Preprocessing

#### 3.3. Model Designing

#### 3.3.1. Model for Influent Parameter Prediction

#### ARIMA

#### SARIMA

#### MAPE

#### sMAPE
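The two error measures named above can be computed as in the following minimal sketch. It assumes the usual textbook definitions; sMAPE has several variants in the literature, and the one below (mean of the absolute values in the denominator) is an assumption, not necessarily the variant used in this study. The flow values are hypothetical.

```python
import numpy as np


def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(100.0 * np.mean(np.abs((actual - forecast) / actual)))


def smape(actual, forecast):
    """Symmetric MAPE, in percent, using (|actual| + |forecast|) / 2 as denominator."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    denom = (np.abs(actual) + np.abs(forecast)) / 2.0
    return float(100.0 * np.mean(np.abs(actual - forecast) / denom))


# Hypothetical daily total flow (MLD) and a forecast for the same days.
actual_flow = [330.0, 347.0, 320.0, 355.0]
forecast_flow = [325.0, 350.0, 330.0, 340.0]

print(f"MAPE:  {mape(actual_flow, forecast_flow):.2f}%")
print(f"sMAPE: {smape(actual_flow, forecast_flow):.2f}%")
```

Both measures are scale-free, which makes them convenient for comparing forecasts of quantities with different magnitudes.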

#### 3.3.2. Model for Effluent Parameter Predictions

The **k-nearest neighbour** [13] regression technique forecasts the effluent from the training samples that lie closest, by distance, to a new observation, using the influents as the predicting factors. Here, the ideal number of nearest neighbours was found to be 14, as listed in Table 5.

The **gradient boosting regression** [14] technique uses an ensemble of decision trees built in sequence, each new tree correcting the errors left by the trees before it, to forecast the effluent using the influents as the predicting variables. A maximum depth of 3 and 100 estimators were used, as listed in Table 5.

The **random forest regression** [15] method employs an ensemble of decision trees that are trained independently of one another and whose predictions are combined, again using the influents as the predicting variables. The implementation uses 100 estimators with a decision tree regressor as the base estimator, as listed in Table 5.

**Artificial neural network** (ANN) models were chosen for their adequacy, efficiency, and promising applications in engineering. They can be used to improve process performance prediction [5,8,9]. Typically, an ANN makes use of process-relevant historical data. An ANN is an information processing system inspired by biological nervous systems. A neural network’s goal is to generate output values from input values using complex internal computations [16]. Neural networks are trained to carry out complicated tasks such as pattern recognition, identification, classification, speech, vision, and automation [36]. Figure 6 describes the layers and the parameters used in the construction of the ANN-based prediction model for the effluent quality parameters.

- **Inputs to the model:** BOD, pH, COD, TSS and MLD at the inlet.
- **Model outputs:** BOD, pH, COD, TSS, DO and MPN at the outlet, each predicted one at a time from all of the input parameters listed above.
- The dataset is split in a 70:30 ratio for training and testing.
- Mean squared error is used as the loss (estimator) function.
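The configuration listed above, together with the ANN settings in Table 5 (1000 epochs, Xavier initialization, ReLU and sigmoid activations), can be sketched in plain NumPy. The hidden-layer width, learning rate, scaling scheme, and synthetic data below are assumptions for illustration only. This is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-in data (hypothetical): inlet BOD, pH, COD, TSS, flow in MLD.
X = rng.uniform([150, 6, 200, 300, 250], [250, 8, 500, 600, 400], size=(400, 5))
y = 0.12 * X[:, 0] + 0.03 * X[:, 2] + rng.normal(0, 1, 400)  # pretend outlet BOD

# 70:30 train/test split.
idx = rng.permutation(len(X))
cut = int(0.7 * len(X))
train, test = idx[:cut], idx[cut:]
n_train = len(train)

# Standardize inputs and min-max scale the target to [0, 1] so a sigmoid output
# unit can represent it. Statistics come from the training rows only.
mu, sigma = X[train].mean(axis=0), X[train].std(axis=0)
Xs = (X - mu) / sigma
y_min, y_max = y[train].min(), y[train].max()
ys = ((y - y_min) / (y_max - y_min)).reshape(-1, 1)


def xavier(fan_in, fan_out):
    """Xavier (Glorot) uniform initialization."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))


hidden = 8  # assumed hidden-layer width
W1, b1 = xavier(5, hidden), np.zeros(hidden)
W2, b2 = xavier(hidden, 1), np.zeros(1)


def forward(x):
    h = np.maximum(0.0, x @ W1 + b1)            # ReLU hidden layer
    out = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))  # sigmoid output layer
    return h, out


lr = 0.5  # assumed learning rate for full-batch gradient descent
losses = []
for epoch in range(1000):
    h, out = forward(Xs[train])
    err = out - ys[train]
    losses.append(float(np.mean(err ** 2)))     # mean squared error
    # Backpropagate the MSE loss through the sigmoid and ReLU layers.
    d_out = 2.0 * err * out * (1.0 - out) / n_train
    d_h = (d_out @ W2.T) * (h > 0)
    W2 -= lr * (h.T @ d_out)
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * (Xs[train].T @ d_h)
    b1 -= lr * d_h.sum(axis=0)

_, pred = forward(Xs[test])
test_mse = float(np.mean((pred - ys[test]) ** 2))
print(f"train MSE: {losses[0]:.4f} -> {losses[-1]:.4f}, test MSE: {test_mse:.4f}")
```

To predict each effluent parameter one at a time, the same network would be retrained with a different target column while the five inputs stay fixed.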

## 4. Results and Evaluation

#### 4.1. Implementation Details

#### 4.2. Results of Influent Parameter Prediction

#### 4.3. Results of Effluent Parameter Prediction

**3e-3+5**. The proposed model performed well when neural networks with different hidden-layer configurations were used. The correlation coefficient on the testing set rose as high as 0.99. After comparing the efficiency of the abovementioned models, we concluded that our proposed **ANN** model, which correctly predicts more than 50% of the values for each effluent parameter, is the best for our use case.
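For reference, the correlation coefficient (R) and mean squared error (MSE) used to evaluate the ANN can be computed as in this small sketch. The observed and predicted values are hypothetical, and the helper names are illustrative.

```python
import numpy as np


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between observed and predicted values."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.corrcoef(observed, predicted)[0, 1])


def mse(observed, predicted):
    """Mean squared error between observed and predicted values."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean((observed - predicted) ** 2))


# Hypothetical scaled effluent values and model predictions.
observed = [0.20, 0.35, 0.50, 0.65, 0.80]
predicted = [0.25, 0.30, 0.55, 0.60, 0.85]

print(f"R   = {pearson_r(observed, predicted):.3f}")
print(f"MSE = {mse(observed, predicted):.4f}")
```

A high R shows that predictions track the observed trend, while MSE measures how far they are from the observed values, so the two are best read together.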

## 5. Conclusions and Future Works

## Author Contributions

## Funding

## Conflicts of Interest

## References

- Majumder, S.; Poornesh, M.B.; Reethupoonar, R.M. A review on working, treatment, and performance evaluation of sewage treatment plant. Int. Eng. Res. Appl. **2019**, 9, 1–49.
- Krewski, D.; Yokel, R.A.; Nieboer, E.; Borchelt, D.; Cohen, J.; Harry, J.; Kacew, S.; Lindsay, J.; Mahfouz, A.M.; Rondeau, V. Human health risk assessment for aluminium, aluminium oxide, and aluminium hydroxide. J. Toxicol. Environ. Health Part B **2007**, 10, 1–269.
- Asiwal, R.S.; Sar, S.K.; Singh, S.; Sahu, M. Wastewater treatment by effluent treatment plants. SSRG Int. J. Civil Eng. **2016**, 3, 12.
- Newhart, K.B.; Holloway, R.W.; Hering, A.S.; Cath, T.Y. Data-driven performance analyses of wastewater treatment plants: A review. Water Res. **2019**, 157, 498–513.
- Maier, H.R.; Dandy, G.C. Neural networks for the prediction and forecasting of water resources variables: A review of modelling issues and applications. Environ. Model. Softw. **2000**, 15, 101–124.
- Gernaey, K.V.; Van Loosdrecht, M.C.; Henze, M.; Lind, M.; Jørgensen, S.B. Activated sludge wastewater treatment plant modelling and simulation: State of the art. Environ. Model. Softw. **2004**, 19, 763–783.
- Vesilind, P. Wastewater Treatment Plant Design; IWA Publishing: London, UK, 2003; Volume 2.
- ASCE Task Committee on Application of Artificial Neural Networks in Hydrology. Artificial neural networks in hydrology. II: Hydrologic applications. J. Hydrol. Eng. **2000**, 5, 124–137.
- Neelakantan, T.R.; Brion, G.M.; Lingireddy, S. Neural network modelling of Cryptosporidium and Giardia concentrations in the Delaware River, USA. Water Sci. Technol. **2001**, 43, 125–132.
- Yadav, P.; Chaudhary, A.; Keshari, A.; Chaudhary, N.K.; Sharma, P.; Kumar, S.; Yadav, B.S. Data Visualization of Influent and Effluent Parameters of UASB-based Wastewater Treatment Plant in Uttar Pradesh. Int. J. Adv. Comput. Sci. Appl. **2022**, 13, 1–10.
- Gilbert, K. An ARIMA supply chain model. Manag. Sci. **2005**, 51, 305–310.
- Nobre, F.F.; Monteiro, A.B.S.; Telles, P.R.; Williamson, G.D. Dynamic linear model and SARIMA: A comparison of their forecasting performance in epidemiology. Stat. Med. **2001**, 20, 3051–3069.
- Guo, G.; Wang, H.; Bell, D.; Bi, Y.; Greer, K. November. KNN model-based approach in classification. In OTM Confederated International Conferences “On the Move to Meaningful Internet Systems”; Springer: Berlin/Heidelberg, Germany, 2003; pp. 986–996.
- Natekin, A.; Knoll, A. Gradient boosting machines, a tutorial. Front. Neurorobot. **2013**, 7, 21.
- Biau, G.; Scornet, E. A random forest guided tour. Test **2016**, 25, 197–227.
- Matala, A. Sample Size Requirement for Monte Carlo Simulations Using Latin Hypercube Sampling. Ph.D. Thesis, Helsinki University of Technology, Department of Engineering Physics and Mathematics, Systems Analysis Laboratory, Helsinki, Finland, 2008; p. 25.
- Tumer, A.E.; Edebali, S. An artificial neural network model for wastewater treatment plant of Konya. Int. J. Intell. Syst. Appl. Eng. **2015**, 3, 131–135.
- Guo, H.; Jeong, K.; Lim, J.; Jo, J.; Kim, Y.M.; Park, J.P.; Kim, J.H.; Cho, K.H. Prediction of effluent concentration in a wastewater treatment plant using machine learning models. J. Environ. Sci. **2015**, 32, 90–101.
- McCuen, R.H.; Knight, Z.; Cutter, A.G. Evaluation of the Nash–Sutcliffe efficiency index. J. Hydrol. Eng. **2006**, 11, 597–602.
- Chen, R.B.; Hsieh, D.N.; Hung, Y.; Wang, W. Optimizing Latin hypercube designs by particle swarm. Stat. Comput. **2013**, 23, 663–676.
- Qin, X.; Gao, F.; Chen, G. Wastewater quality monitoring system using sensor fusion and machine learning techniques. Water Res. **2012**, 46, 1133–1144.
- Wang, R.; Pan, Z.; Chen, Y.; Tan, Z.; Zhang, J. Influent Quality and Quantity Prediction in Wastewater Treatment Plant: Model Construction and Evaluation. Pol. J. Environ. Stud. **2021**, 30, 4267–4276.
- Manu, D.S.; Thalla, A.K. Artificial intelligence models for predicting the performance of biological wastewater treatment plant in the removal of Kjeldahl Nitrogen from wastewater. Appl. Water Sci. **2017**, 7, 3783–3791.
- Weisberg, S. Applied Linear Regression; John Wiley & Sons: Hoboken, NJ, USA, 2005; Volume 528.
- McDonald, G.C. Ridge regression. Wiley Interdiscip. Rev. Comput. Stat. **2009**, 1, 93–100.
- Zou, H.; Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. A **2005**, 67, 301–320.
- Gautam, S.K.; Sharma, D.; Tripathi, J.K.; Ahirwar, S.; Singh, S.K. A study of the effectiveness of sewage treatment plants in Delhi region. Appl. Water Sci. **2013**, 3, 57–65.
- Schuldt, C.; Laptev, I.; Caputo, B. Recognizing human actions: A local SVM approach. In Proceedings of the 17th International Conference on Pattern Recognition, ICPR, Washington, DC, USA, 23–26 August 2004; IEEE: New York, NY, USA, 2004; Volume 3, pp. 32–36.
- Fragiadakis, N.G.; Tsoukalas, V.D.; Papazoglou, V.J. An adaptive neuro-fuzzy inference system (anfis) model for assessing occupational risk in the shipbuilding industry. Saf. Sci. **2014**, 63, 226–235.
- Alnaa, S.E.; Ahiakpor, F. ARIMA (autoregressive integrated moving average) approach to predicting inflation in Ghana. J. Econ. Int. Financ. **2011**, 3, 328–336.
- Wise, J. The autocorrelation function and the spectral density function. Biometrika **1955**, 42, 151–159.
- Ramsey, F.L. Characterization of the partial autocorrelation function. In The Annals of Statistics; Institute of Mathematical Statistics: Beachwood, OH, USA, 1974; pp. 1296–1301.
- Cheung, Y.W.; Lai, K.S. Lag order and critical values of the augmented Dickey–Fuller test. J. Bus. Econ. Stat. **1995**, 13, 277–280.
- Piccolo, D. A distance measure for classifying ARIMA models. J. Time Ser. Anal. **1990**, 11, 153–164.
- Valipour, M. Long-term runoff study using SARIMA and ARIMA models in the United States. Meteorol. Appl. **2015**, 22, 592–598.
- Yegnanarayana, B. Artificial Neural Networks; PHI Learning Pvt. Ltd.: New Delhi, India, 2004.

**Figure 9.** Rolling mean and standard deviation: (**a**) total flow; (**b**) first difference; (**c**) seasonal first difference.
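The quantities named in the Figure 9 caption can be computed with pandas along the following lines. This is only a sketch: the flow series is synthetic, and the rolling window (12 days) and seasonal period (7 days) are assumed values, not those used for the figure. The seasonal first difference is taken here as the seasonal difference of the first-differenced series, which is one common reading of the term.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Hypothetical daily total flow (MLD) with a weekly pattern plus noise.
days = pd.date_range("2020-01-01", periods=120, freq="D")
flow = pd.Series(
    330 + 15 * np.sin(2 * np.pi * np.arange(120) / 7) + rng.normal(0, 5, 120),
    index=days,
    name="Total_MLD",
)

window = 12  # assumed rolling window, in days
season = 7   # assumed seasonal period, in days


def rolling_stats(series, window):
    """Rolling mean and standard deviation, used to eyeball stationarity."""
    return pd.DataFrame(
        {
            "rolling_mean": series.rolling(window).mean(),
            "rolling_std": series.rolling(window).std(),
        }
    )


# (a) raw series, (b) first difference, (c) seasonal first difference.
first_diff = flow.diff().dropna()
seasonal_first_diff = flow.diff().diff(season).dropna()

stats = rolling_stats(flow, window)
stats_first = rolling_stats(first_diff, window)
stats_seasonal = rolling_stats(seasonal_first_diff, window)

print(stats.dropna().head())
```

If the rolling mean and standard deviation stay roughly constant over time after differencing, the differenced series is a reasonable candidate for ARIMA or SARIMA modelling.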

**Table 1.** Locations and measuring parameters [10].

Location | Parameters |
---|---|
Inlet Chamber | pH, BOD, Temperature, TSS, Flow, COD, Phosphorus, Oil and DO |
Outlet of UASB Reactor | BOD, pH, Suspended Solids, COD |
Polishing Pond | Dissolved Oxygen, pH |
Outlet of Chlorine Contact Tank | BOD, pH, Suspended Solids, COD, Residual Chlorine, Fecal Coliform, Dissolved Oxygen |
Primary Sludge | pH, Volatile Solids, Total Solids |

**Table 2.** Influent and effluent quality parameter range [10].

Data Parameters | Units | Range (Influent) | Range (Effluent) |
---|---|---|---|
pH | No. | 6–8 | 7–9 |
DO | mg/L | 0 | $>4$ |
TSS | mg/L | 300–600 | $<50$ |
COD | mg/L | 200–500 | $<100$ |
BOD | mg/L | 150–250 | $<30$ |
MPN | No./100 mL | $10^{6}$–$10^{9}$ | $10^{6}$–$10^{9}$ |
Flow Rate | Million litres per day (MLD) | 250–400 | |

Statistic | Day | IN_PH | IN_DO | IN_TSS | IN_COD | IN_BOD | Total_MLD |
---|---|---|---|---|---|---|---|
mean | 568 | 7 | 2 | 160 | 259 | 186 | 330 |
std | 328 | 2 | 3 | 113 | 56 | 65 | 39 |
50% | 566 | 7 | 0 | 214 | 251 | 160 | 347 |

Statistic | Day | OUT_PH | OUT_DO | OUT_TSS | OUT_COD | OUT_BOD |
---|---|---|---|---|---|---|
mean | 568 | 7 | 6 | 30 | 64 | 40 |
std | 328 | 3 | 2 | 18 | 17 | 21 |
50% | 566 | 7 | 5 | 40 | 68 | 27 |

ML Algorithm | Feature Description |
---|---|
kNN | 14 neighbours; leaf size: 30; algorithm to compute neighbours: KDTree |
Gradient Boosting Regression | Max depth: 3; 100 estimators; loss function: squared error |
Random Forest Regression | 100 estimators; base estimator: decision tree regressor; split criterion: squared error |
Artificial Neural Network | 1000 epochs; Xavier weight initialization; ReLU activation function; sigmoid activation function |

Component | Details |
---|---|
Language | Python (version 3.11.0) |
Tool | Google Colaboratory |
Libraries | Pandas, NumPy, Scikit Learn, Matplotlib, Seaborn and SciPy |

Metrics | ARIMA | SARIMA | Seasonal Ordered SARIMA |
---|---|---|---|
MAPE | 2.72 | 2.72 | 2.67 |
sMAPE | 2.64 | 2.64 | 2.59 |

Parameter | KNN | Gradient Boosting | Random Forest | ANN |
---|---|---|---|---|
OUT_PH | 70.45 | 71.23 | 71 | 74.55 |
OUT_BOD | 8.40 | 9.29 | 12.83 | 56.12 |
OUT_COD | 4.86 | 3.09 | 9.29 | 60.88 |
OUT_DO | 15.48 | 14.60 | 11.50 | 51.11 |
OUT_TSS | 6.63 | 8.40 | 9.39 | 65.41 |
OUT_MPN | 4.42 | 3.53 | 4.86 | 52.65 |

Metric | OUT_PH | OUT_BOD | OUT_COD | OUT_DO | OUT_TSS | OUT_MPN |
---|---|---|---|---|---|---|
R | 0.89 | 0.74 | 0.827 | 0.99 | 0.92 | 0.89 |
MSE | 0.06 | 0.014 | 0.02 | 0.023 | 0.038 | 0.069 |


© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Yadav, P.; Chandra, M.; Fatima, N.; Sarwar, S.; Chaudhary, A.; Saurabh, K.; Yadav, B.S.
Predicting Influent and Effluent Quality Parameters for a UASB-Based Wastewater Treatment Plant in Asia Covering Data Variations during COVID-19: A Machine Learning Approach. *Water* **2023**, *15*, 710.
https://doi.org/10.3390/w15040710
