Machine Learning Approach for Rapid Estimation of Five-Day Biochemical Oxygen Demand in Wastewater

Improperly managed wastewater effluent poses environmental and public health risks. BOD evaluation is complicated by wastewater treatment. Using key parameters to estimate BOD in wastewater can improve wastewater management and environmental monitoring. This study proposes a BOD determination method based on the Artificial Neural Networks (ANN) model to combine Chemical Oxygen Demand (COD), Suspended Solids (SS), Total Nitrogen (T-N), Ammonia Nitrogen (NH4-N), and Total Phosphorous (T-P) concentrations in wastewater. Twelve different transfer functions are investigated, including the common Hyperbolic Tangent Sigmoid (HTS), Log-sigmoid (LS), and Linear (Li) functions. This research evaluated 576,000 ANN models while considering the variable random number generator due to the ten alternative ANN configuration parameters. This study proposes a new approach to assessing water resources and wastewater facility performance. It also demonstrates ANN's environmental and educational applications. Based on their RMSE index over the testing datasets and their configuration parameters, twenty ANN architectures are ranked. A BOD prediction equation written in Excel makes testing and applying in real-world applications easier. The developed and proposed ANN-LM 5-8-1 model depicting almost ideal performance metrics proved to be a reliable and helpful tool for scientists, researchers, engineers, and practitioners in water system


Introduction
The U.K. Royal Commission on River Pollution suggested the biological measurement Biochemical Oxygen Demand (BOD) in 1908 to demonstrate the organic pollution of rivers [1]. BOD is defined as the quantity of oxygen taken up by the respiratory activity of microorganisms growing on organic substances present in the sample (e.g., sludge or water) while incubated at a specific temperature (typically 20 °C) for a fixed period (usually 5 days, BOD5) [1] (Table 1). It is a measurement of the organic contaminants in water that can be broken down by biological processes [1]. The main downside of this measurement is the time (5 days) required to complete it [2].
Automated control solutions for wastewater treatment facilities and environmental monitoring applications need a reliable and accurate measurement of BOD in influent and effluent samples. Standard dilution is the classic technique for determining BOD [1]. This approach has been used to identify contaminants in most water bodies and assess BOD levels with reasonable precision. BOD monitoring necessitates the use of specialized equipment and procedures, which considerably increases the difficulty and expense of ...
Method | Measuring time | Range | References
Modified reference method | 5 days | 0-6 | McDonagh et al. [19]; McEvoy et al. [20]; Xiong et al. [21]; Xu et al. [22]
Photometric method | 5 days | 0-6 | Jouanneau et al. [1]
Manometric method | 5 days | 0-700 | Jouanneau et al. [1]
BOD prediction
Biosensor based on bioluminescent bacteria | 72 min | 0-200 | Sakaguchi et al. [23,24]
Microbial fuel cells | 315 min | 0-200 | Jouanneau et al. [1]; Kim et al. [25]
Biosensor with entrapped bacteria | 10 min | 0-500 | Karube [26]; Liu et al. [27]
pH, TS, T-Alk, T-Hard, Cl, PO4^3−, K, Na, NH4
The manuscript is organized into several sections. Section 2 presents the significance of this research. Section 3 presents the Materials and Methods used for the development of the mathematical forecasting model for BOD5. Section 4 presents the results on the development of a closed-form equation for the estimation of BOD5 in wastewater and the mapping of BOD5, revealing its strongly nonlinear nature. In Section 5 the limitations of the proposed model are presented, followed by concluding remarks in Section 6.

Research Significance
The efficient operation and management of wastewater treatment plants (WWTPs) are gaining more consideration as environmental concerns receive increasing attention. If it is improperly managed, the discharge of a WWTP's effluent into a receiving water body may cause or spread a variety of human health problems, posing significant environmental and public health risks. Better management of a WWTP may be attained by developing a robust mathematical method for estimating the BOD content in wastewater from a minimal set of key parameters. Nevertheless, evaluating BOD content in wastewater is challenging owing to the intricacy of the treatment processes. The complex biological, chemical, and physical systems involved in the wastewater treatment process display nonlinear tendencies that are difficult to explain using linear mathematical models.
This study relies on the advantages of the ANN model and explores its performance in determining BOD values in wastewater. It is a novel attempt at proposing a BOD determination method based on the ANN model that combines the COD, SS, T-N, NH4-N and T-P concentrations in wastewater. The study may provide a new idea for monitoring water resources and the performance of wastewater treatment plants.

Artificial Neural Networks
Artificial neural networks owe their name to the biological neural networks that they closely mimic in structure and in the basic principles that govern them. They are mathematical simulants that, after suitable training on capable and reliable databases containing the existing knowledge for a particular problem, aim to discover and expose the fundamental laws that govern the studied problem. They were first introduced by McCulloch and Pitts [33]. However, they were applied extensively only from the 1990s onward, initially mainly in medicine, for predicting a patient's disease from a series of physicochemical parameters such as age and hematological indices. Although those first applications were in medicine, from the 1990s the method of artificial neural networks was quickly applied across the sciences, with a significant place given to mechanical problems where the classical deterministic mathematical methods available until then were incapable of giving answers to multidimensional and complex problems with intensely nonlinear behavior [34-40].
Water 2023, 15, 103
A classical feedforward ANN contains layers of nodes, or neurons, which have weighted connections with the nodes of the previous and following layers. Starting from the input layer, in which each node receives one input, information or signals are propagated through the subsequent layers until they reach the output layer. The output layer provides the predicted variable(s), which are compared with the labels of the corresponding samples in the database. Figure 1 provides a typical example of the structure of an ANN composed of one hidden layer. The error between the predicted outputs and the labels is used to update the weights in the network. This weight adjustment, known as backpropagation, is conducted each time a set of b samples (i.e., the batch size) is "consumed". The weight-updating process is repeated until all samples of the training set are ingested, at which point an epoch is complete. The maximum number of epochs is set as the stop condition for the training process. In the end, a trained ANN model contains a set of optimized weights that provides the least error on the training set.
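The layer-by-layer propagation described above can be illustrated with a minimal sketch of a forward pass through a network with one hidden layer. The weights, the input vector, the target value and the tanh hidden activation are all illustrative assumptions, not values or choices taken from this study; the 5-8 layout merely mirrors the five inputs and eight hidden neurons discussed later.

```python
import math
import random


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Propagate one input vector through a single-hidden-layer network."""
    # Hidden layer: weighted sum of the inputs plus a bias, passed through tanh.
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    # Output layer: weighted sum of the hidden activations plus a bias (linear).
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


# Hypothetical random weights for a 5-input, 8-hidden-neuron, 1-output layout.
rng = random.Random(0)
n_in, n_hidden = 5, 8
w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
b_hidden = [rng.uniform(-1, 1) for _ in range(n_hidden)]
w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]
b_out = rng.uniform(-1, 1)

x = [0.1, -0.3, 0.5, 0.0, 0.2]  # illustrative normalized input sample
target = 0.4                    # illustrative label for that sample
y_hat = forward(x, w_hidden, b_hidden, w_out, b_out)
squared_error = (y_hat - target) ** 2  # the quantity a training step would reduce
```

During training, an error such as `squared_error`, accumulated over a batch, is what drives the weight update; the update rule itself is omitted here.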
In the work presented herein, the main goal is to develop a reliable and robust ANN model and to derive its closed-form equation for predicting the 5-day biochemical oxygen demand (BOD5). Specifically, for the estimation of BOD5 in wastewater from COD, SS, TN, NH4-N and TP, a plethora of different ANN architectures are trained and developed. To this end, a detailed investigation of the crucial parameters affecting the performance of ANN models, such as the number of neurons per hidden layer, activation functions, data normalization techniques and cost functions, has been conducted and is presented in the following sections.

Experimental Database
The primary target during the training and development of a mathematical simulant that predicts the value of one parameter from several others is the degree to which the simulant is reliable and stable/robust. To this end, the majority of researchers give particular attention and diligence to the computational techniques and methods employed for its development, while they do not exercise the same attention and diligence concerning the database used for the training of the ANN. The authors of the present study consider that a mathematical simulant's reliability depends primarily on the reliability and effectiveness of the database used. By reliable and effective we do not mean a database with a large amount of data. A database is considered reliable and effective when its data can be considered 'true' and cover statistically the whole range of values that each parameter involved in the studied problem can take.
The above is even more important when the database comprises experimental rather than analytical results. In that case, its reliability is affected by a multitude of factors, as follows: (a) the strict adherence to the international standards for the preparation of the specimens/samples and the conduct of the experimental tests and laboratory measurements; (b) the observance of the number of specimens with the same characteristics that must be checked; (c) the reliability of the experimental layout that was used; (d) the experience and specialization of the personnel that conducted the tests; and (e) the environmental conditions in which the specimens were maintained, as well as the environmental conditions of the surroundings where the tests were conducted.
Observing the above rules is particularly imperative when the experimental database comprises data produced by diverse laboratories and research groups.
According to the above principles, an experimental database was created for the training and development of a multitude of artificial neural networks and the selection of the best among them for the estimation of BOD5 in wastewater. It comprises 387 datasets corresponding to 387 laboratory measurements conducted at the entrance of the sewage treatment plant located in the Komotini region, Northern Greece. The samples were collected on a monthly basis from 2014 to 2021. Standard analytical methods were used to determine all parameters; these methods are described in detail for water and wastewater experiments in [41]. For each wastewater sample, six water quality variables were measured in the laboratory, namely the COD, SS, TN, NH4-N, TP and BOD5 concentrations. The measured values of the first five variables were used as input parameters, while the value of the sixth variable (BOD5) was used as the output parameter during the training and development of the ANN models. The database is presented in Table S1 of the Supplementary Materials. Table 3 presents, for each parameter, the minimum, average, and maximum values as well as the standard deviation (STD) and the coefficient of variation (CV). Table 4 and Figure 2 present the Pearson correlation factors between the six parameters. These values are useful since they indicate whether one parameter depends strongly on another. Additionally, the values in the last row of the table give a first indication of whether each of the five input parameters is related to the output variable. In rank order, the parameters with the most significant correlation with the biochemical oxygen demand are COD, TN and TP, with Pearson correlation factors of 0.78, 0.74 and 0.60, respectively.
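The descriptive statistics and Pearson correlation factors mentioned above can be computed along the following lines. This is a minimal sketch: the five COD and BOD5 values are invented for illustration and are not taken from Table S1, and the use of the sample (n − 1) standard deviation is an assumption, since the text does not state which definition was used for Table 3.

```python
import math
import statistics


def describe(values):
    """Return min, mean, max, standard deviation and coefficient of variation."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation (assumed definition)
    return {"min": min(values), "mean": mean, "max": max(values),
            "std": std, "cv": std / mean}


def pearson(x, y):
    """Pearson correlation factor between two equally long sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical concentrations in mg/L, for illustration only.
cod = [410.0, 520.0, 365.0, 610.0, 480.0]
bod5 = [190.0, 250.0, 170.0, 300.0, 230.0]

cod_summary = describe(cod)
r_cod_bod5 = pearson(cod, bod5)
```

Applying `pearson` to every pair of the six variables would give a correlation matrix of the kind reported in Table 4.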
The following subsection presents in depth the sensitivity analysis results for the output parameter BOD5 in relation to each of the five input parameters, using the Cosine Amplitude Method (CAM) [42] and the experimental database. Researchers have widely adopted the CAM method to determine the effect of each input on the output [43-46].
Figure 3 presents the histograms for each of the six variables, together with graphs showing the correlation between each of the input parameters and BOD5.
These graphs are particularly useful since they depict, for each parameter, the range of values it takes and their distribution. For the ranges where sufficient data are available, the reliability of the proposed model is expected to be exceptionally high. These ranges correspond to the parameter values for which the number of samples is greater than 5% of the dataset and are marked by the blue horizontal line in each graph. For the areas under the blue line, the database needs to be enriched with more data.

Sensitivity Analysis of the BOD5 on the Input Parameters Based on the Experimental Database
During the training and development of computational models that predict the value of one parameter (the output parameter) as a function of several other parameters (the input parameters), it is useful to explore the sensitivity of the output parameter to the input parameters. This makes it possible to exclude parameters that do not affect the estimated parameter and, at the same time, directs attention to the parameters that significantly affect the output value. Excluding parameters from the studied problem decreases the computing time and helps to discover the problem's nature and governing laws.
A first estimate of the dependence between the parameters, and especially between the input parameters and the output parameter, is given by the value of the Pearson correlation factor. Additionally, because of the subject's great importance, a multitude of sensitivity analysis methods have been proposed, aiming at the best possible estimation of the sensitivity and dependence of the output parameter on the input parameters. Among them is the cosine amplitude method (CAM), which was proposed by Jong and Lee [42] and has been adopted by a multitude of researchers [35,47-51].
The cosine amplitude method (CAM) was used to construct a data array, X, as follows:
$$X = \{x_1, x_2, x_3, \ldots, x_n\}$$
where each variable $x_i$ in the array X is a vector of length m expressed by:
$$x_i = \{x_{i1}, x_{i2}, x_{i3}, \ldots, x_{im}\}$$
The relationship between $R_{ij}$ (the strength of the relation) and the datasets $x_i$ and $x_j$ is defined by:
$$R_{ij} = \frac{\sum_{k=1}^{m} x_{ik}\, x_{jk}}{\sqrt{\sum_{k=1}^{m} x_{ik}^{2} \, \sum_{k=1}^{m} x_{jk}^{2}}}$$
The result of the sensitivity analysis, based on the datasets of the experimental database used in the present work, is presented in Figure 4. It shows that, of the five input parameters, COD and TN had the strongest influence on BOD5 (Figure 4). This finding fully agrees with the Pearson correlation factors presented in the previous subsection. Furthermore, it is worth noting that all the input parameters can be characterized as crucial since they are all strongly related to BOD5, achieving values greater than 0.98.
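As a minimal sketch of the relation strength used in the cosine amplitude method, the function below evaluates the ratio of the sum of products to the square root of the product of the sums of squares for two equally long data vectors. The function name `cam_strength` and the sample vectors are illustrative assumptions.

```python
import math


def cam_strength(x_i, x_j):
    """Cosine-amplitude strength of relation between two data vectors."""
    numerator = sum(a * b for a, b in zip(x_i, x_j))
    denominator = math.sqrt(sum(a * a for a in x_i) * sum(b * b for b in x_j))
    return numerator / denominator


# Hypothetical measurements in mg/L, for illustration only.
cod = [410.0, 520.0, 365.0, 610.0, 480.0]
bod5 = [190.0, 250.0, 170.0, 300.0, 230.0]
r_cod_bod5 = cam_strength(cod, bod5)
```

Because the vectors are not centered, strictly positive data such as concentrations yield strengths in the interval (0, 1], and values close to 1 are common; the measure therefore complements rather than replaces the Pearson correlation factor.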

where n denotes the total number of datasets, and $x_i$ and $y_i$ represent the predicted and target values, respectively. Recent research has highlighted the limitations of the root mean square error (RMSE), the mean absolute percentage error (MAPE), and the Correlation Coefficient (R²) in assessing the predictive accuracy of models [30,53]. Comparing the performance of mathematical models using the Pearson correlation factor is considered precarious, given that, besides the comparison of the values of R or R², the comparison of the inclination angle of the line is also required. Such a case arises when a mathematical simulant always predicts a constant value regardless of the values of the input parameters. In this case, the value of R = 1.00 while the inclination angle is equal to zero.
To this end, the a20-index has recently been proposed [43,56-62] for assessing the reliability of neural networks:
$$a20\text{-index} = \frac{m20}{M}$$
where M denotes the number of dataset samples and m20 denotes the number of samples with a ratio of the true value to the estimated (predicted) value between 0.80 and 1.20. In an ideal forecasting model, the value of the a20-index is equal to 1.00. The proposed a20-index is a simple statistical index with the advantage of having a physical engineering meaning, as it reveals the number of experiments whose predicted values deviate by no more than 20% from the 'true' values. At this point it is worth stressing the great significance of the database used for the training and development of soft computing-based forecasting models. A comparison using different performance indices must refer to an adequate number of data and, to be reliable, must be based on the same database.
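The a20-index definition above translates directly into a few lines of code. The sketch below also computes the RMSE in its usual form for comparison; the four true and predicted values are invented for illustration and are not results of this study.

```python
import math


def a20_index(true_values, predicted_values):
    """Fraction of samples whose true/predicted ratio lies in [0.80, 1.20]."""
    m20 = sum(
        1 for t, p in zip(true_values, predicted_values)
        if 0.80 <= t / p <= 1.20  # predicted values are assumed to be non-zero
    )
    return m20 / len(true_values)


def rmse(true_values, predicted_values):
    """Root mean square error between true and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(true_values, predicted_values)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical BOD5 values in mg/L, for illustration only.
true_bod5 = [100.0, 200.0, 300.0, 400.0]
pred_bod5 = [100.0, 260.0, 290.0, 330.0]

a20 = a20_index(true_bod5, pred_bod5)  # ratios: 1.00, 0.77, 1.03, 1.21 -> 2 of 4 inside
error = rmse(true_bod5, pred_bod5)
```

In this toy case two of the four ratios fall inside the ±20% band, so the index is 0.5 even though one of the excluded predictions misses the band only narrowly, which shows how the index reads as a count of acceptable predictions rather than an average error.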

Development of ANN Models
Several hyperparameters and the neural network structure/architecture must be determined ahead of time when training and developing ANN models. This provides the benefit of developing an ANN model that is highly optimized for the problem under investigation. However, unless a certain level of expertise is available, selecting appropriate values for these parameters and appropriate neuron layouts can be intimidating. In other words, the "human in the loop" concept is thought to be critical for tuning an efficient ANN model. Furthermore, special care should be taken with overfitting. The selection of the optimum model among the plethora of trained and developed soft computing models should be based not only on statistical indices but also on the derived curves, which should be smooth and reveal the nature of the problem under investigation.
In this study, however, the optimal ANN structure is not selected based on expertise or intuition. Instead, it is derived from an optimization procedure that trains and tests ANNs using a plethora of alternative hyperparameter combinations and ranks them according to predefined performance indices while, as mentioned above, checking whether overfitting of the data takes place. Except for the fixed number of hidden layers, the optimization procedure combines the following parameters: (a) data normalization; (b) the number of neurons in the hidden layer; (c) cost function; and (d) activation function. Table 5 shows the alternative options for these parameters, as well as some predetermined configuration options. Twelve different activation functions are investigated, including the common Hyperbolic Tangent Sigmoid (HTS), Log-sigmoid (LS), and Linear (Li) functions. If all other parameters are held constant, this results in 144 (12 × 12) alternative combinations to be trained and tested. Regarding the cost functions, the MSE and SSE functions are investigated, whereas four data normalization techniques are applied to the input and output parameters, including no normalization at all. Considering the varying random number generator, all of these alternative ANN configuration parameters resulted in 576,000 different ANN models under evaluation (i.e., 50 × 10 × 42 × 4 × 10) (10 alternatives). It should be pointed out that the Levenberg-Marquardt (LM) algorithm was applied during the training of the ANN models. The training of the 576,000 alternative ANN architectures did not use the entire dataset of 387 samples.
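The way the alternative options multiply can be sketched with a Cartesian product. Only the counts stated in the text are used (twelve activation functions for each of two layers, two cost functions, four normalization options); the labels `act_1` to `act_12` and `norm_1` to `norm_4` are placeholders, and the neuron counts and random seeds are deliberately left out because their exact ranges are given in Table 5, which is not reproduced here.

```python
from itertools import product

# Placeholder labels; the actual functions and techniques are listed in Table 5.
activations = [f"act_{k}" for k in range(1, 13)]     # twelve activation functions
cost_functions = ["MSE", "SSE"]                      # two cost functions
normalizations = [f"norm_{k}" for k in range(1, 5)]  # four options, one being "none"

# One activation for the hidden layer and one for the output layer.
activation_pairs = list(product(activations, repeat=2))

# Every combination of activation pair, cost function and normalization option.
configs = list(product(activation_pairs, cost_functions, normalizations))

n_pairs = len(activation_pairs)  # 12 x 12 = 144
n_configs = len(configs)         # 144 x 2 x 4 = 1152
```

Each entry of `configs` would still have to be combined with every hidden-layer size and every random initialization to enumerate the full search space.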
The dataset was divided into three sub-datasets to evaluate the generality of the developed ANN model: the first included 66.7% of the entire database (258 specimens) and was used for training the ANN architectures, the second included 16.8% of the entire database (65 specimens) and was used for testing the ANN architectures, and the third included the remaining 16.5% of the entire database (64 specimens). The three datasets are referred to as "training datasets", "testing datasets", and "validation datasets", in that order. To eliminate potential bias, the samples were randomly divided into the three datasets using a programmatic procedure. Table 6 tabulates the top twenty ANN architectures, ranked by the RMSE index they achieved on the testing datasets, together with their configuration parameters. The first, which is preferred as the best and is referred to as ANN LM 5-8-1 (the numbers refer to the five (5) input parameters, the eight (8) neurons in the hidden layer and the one (1) output parameter, which is BOD5), achieves the best overall performance metrics, both in terms of RMSE (16.8563) and R (0.9443). The best ANN LM 5-8-1 model uses the MinMax function for data normalization, which scales input and output values to the range [−1.00, 1.00]. As activation functions it applies the Log-Sigmoid function (LS) for the input layer and the Symmetric saturating linear function (SSL) for the output layer, with the MSE function as its cost function. Figure 5 shows the neuron layout and overall architecture of the best ANN LM 5-8-1 model. Table 7 lists the detailed performance indices achieved by the best ANN LM 5-8-1 model for both the training and testing datasets. Its performance on the training datasets is, as expected, strong, particularly in the a20-index, where it matches over 98% of the samples within a 20 percent margin. The same index is 100.00% on the testing datasets, which is an excellent value.
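The random three-way split and the MinMax scaling to [−1, 1] described above can be sketched as follows. The subset sizes (258, 65 and 64 of 387) come from the text; the seed, the function names and the shuffling approach are assumptions, since the authors' programmatic procedure is not specified. The BOD5 limits 128 and 348 are the database minimum and maximum quoted in the closed-form equation section.

```python
import random


def split_indices(n_samples, n_first, n_second, seed=0):
    """Randomly partition sample indices into three disjoint subsets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # the seed value is an arbitrary assumption
    first = indices[:n_first]
    second = indices[n_first:n_first + n_second]
    third = indices[n_first + n_second:]
    return first, second, third


def minmax_scale(value, v_min, v_max, a=-1.0, b=1.0):
    """Map a value from [v_min, v_max] linearly onto [a, b]."""
    return a + (value - v_min) * (b - a) / (v_max - v_min)


subset_1, subset_2, subset_3 = split_indices(387, 258, 65)

# Scaling BOD5 with the database limits quoted in the text (128 and 348).
scaled_low = minmax_scale(128.0, 128.0, 348.0)
scaled_mid = minmax_scale(238.0, 128.0, 348.0)
scaled_high = minmax_scale(348.0, 128.0, 348.0)
```

The same `minmax_scale` call would be applied to each input variable using that variable's own minimum and maximum from Table 3.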
It should be noted that the better indices achieved for the testing datasets compared to the training datasets clearly indicate that no overfitting is taking place. To the authors' best knowledge, the achieved performance is better than any other performance reported on this topic. Furthermore, the values for the training and testing datasets are presented in Figure 6 as scatter plots of the 'true' values vs. the values predicted by the best ANN LM 5-8-1 model. Besides the diagonal line (ideal prediction), two dotted lines are drawn in these diagrams, indicating a ±20% error. Figure 6 also depicts, more usefully, the ratio of 'true' values to predicted values of BOD5 in wastewater for both the training and testing datasets.

Closed-Form Equation for the Estimation of BOD 5 in Wastewater
As presented in the previous section, the ANN LM 5-8-1 model is the best among the many different trained and developed architectures. That is the best is documented that this model has optimal values for the performance indices used for ranking other ANN models. It is worth noting that this is a standard procedure for the multitude of scientific publications.
At this point, the authors consider it necessary to state clearly that such a procedure is not safe and reliable because it is impossible to test the reliability-validity of these values.
Furthermore, the values for the training and testing datasets are presented in Figure 6 as scatter plots of the 'true' BOD5 values against those predicted by the best ANN LM 5-8-1 model. In addition to the diagonal line (ideal prediction), two dotted lines are drawn in these diagrams, indicating a ±20% error. Figure 6 also depicts the ratio of 'true' to predicted values of BOD5 in wastewater for both the training and testing datasets.

Closed-Form Equation for the Estimation of BOD5 in Wastewater
As presented in the previous section, the ANN LM 5-8-1 model is the best among the many different trained and developed architectures. This is documented by the fact that the model has the optimal values of the performance indices used for ranking the ANN models. It is worth noting that this is a standard procedure in a multitude of scientific publications.
At this point, the authors consider it necessary to state clearly that such a procedure is not safe and reliable, because it is impossible to test the reliability and validity of these values. Additionally, the results of such research work are not immediately applicable by the scientists of this discipline, and even less so by engineers in practice.
Intending to treat the above weaknesses, the authors consider it necessary to present the architecture corresponding to the best ANN model and the final values of weights and biases, so that the mathematical simulant can be reproduced. Giving the mathematical equation that describes the best mathematical simulant would be even more beneficial. In light of the above, the prediction of both normalized and absolute values of BOD5 from COD, SS, TN, NH4-N and TP is expressed by Equation (9) for the optimum developed ANN LM 5-8-1 model, where a = −1.00 and b = 1.00 are the lower and upper limits of the minmax normalization technique applied to the data, and BOD5max = 348 and BOD5min = 128 are the maximum and minimum values of BOD5 present in the database used for training and developing the ANN models. The satlins and logsig functions are the symmetric saturating linear transfer function (SSL) and the Log-sigmoid transfer function (LS), respectively; their equations and graphs are presented in Table A1. The final values of the weights and biases are presented in Table 8. In this form of matrix multiplication, the prediction equation, Equation (9), can be easily programmed in an Excel spreadsheet, and it can therefore be more easily evaluated and used in practice. It is worth noting that such an implementation can be used by various interested parties (e.g., researchers, students, engineers) without placing heavy requirements on effort and time.
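For readers who prefer a script to a spreadsheet, the matrix-multiplication form described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the log-sigmoid function is assumed to act in the hidden layer and the symmetric saturating linear function in the output layer, and the input ranges (X_MIN, X_MAX) and all weights and biases (IW, B1, LW, B2) are hypothetical placeholders that would have to be replaced with the values of Table 3 and Table 8. Only a = −1, b = 1, BOD5min = 128 and BOD5max = 348 are taken from the text.

```python
import math
import random

# Normalization limits and BOD5 range quoted in the text.
A, B = -1.0, 1.0
BOD5_MIN, BOD5_MAX = 128.0, 348.0


def logsig(x):
    """Log-sigmoid transfer function (LS): 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def satlins(x):
    """Symmetric saturating linear transfer function (SSL): clip to [-1, 1]."""
    return max(-1.0, min(1.0, x))


def normalize(x, x_min, x_max, a=A, b=B):
    """Min-max normalization of x from [x_min, x_max] to [a, b]."""
    return a + (b - a) * (x - x_min) / (x_max - x_min)


def denormalize(y, y_min, y_max, a=A, b=B):
    """Inverse of normalize: map y from [a, b] back to [y_min, y_max]."""
    return y_min + (y - a) * (y_max - y_min) / (b - a)


def predict_bod5(x, x_min, x_max, iw, b1, lw, b2):
    """Forward pass of a 5-8-1 network.

    ASSUMPTION: logsig in the hidden layer, satlins in the output layer.
    """
    x_n = [normalize(v, lo, hi) for v, lo, hi in zip(x, x_min, x_max)]
    # Hidden layer: one weighted sum plus bias per neuron (8 neurons).
    hidden = [logsig(sum(w * v for w, v in zip(row, x_n)) + bias)
              for row, bias in zip(iw, b1)]
    # Output layer: single neuron producing the normalized BOD5.
    y_n = satlins(sum(w * h for w, h in zip(lw, hidden)) + b2)
    return denormalize(y_n, BOD5_MIN, BOD5_MAX)


# HYPOTHETICAL placeholders, NOT the values of the paper.
# Input order: COD, SS, TN, NH4-N, TP.
X_MIN = [200.0, 100.0, 20.0, 10.0, 2.0]
X_MAX = [600.0, 400.0, 80.0, 70.0, 10.0]
rng = random.Random(0)
IW = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(8)]  # 8 x 5
B1 = [rng.gauss(0.0, 1.0) for _ in range(8)]
LW = [rng.gauss(0.0, 1.0) for _ in range(8)]                      # 1 x 8
B2 = rng.gauss(0.0, 1.0)

sample = [400.0, 250.0, 50.0, 60.0, 8.0]  # illustrative input only
print(predict_bod5(sample, X_MIN, X_MAX, IW, B1, LW, B2))
```

Because the assumed output function saturates at ±1, the denormalized prediction of this sketch always stays between BOD5min and BOD5max.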

Mapping of BOD5
With the proposed optimal ANN-LM 5-8-1 model, a thorough analytical investigation was conducted of the parameters that affect the value of BOD5. Based on the results of this investigation, a set of contour maps of BOD5 in relation to the input parameters was derived (Figures 7-12). These figures indicate that the proposed ANN-LM 5-8-1 model does not suffer from the well-known and widely encountered phenomenon of overfitting: all the derived charts and curves are exceptionally smooth and display no sudden variations, so they reflect the laws that govern the variation of BOD5 with COD, SS, TN, NH4-N and TP.
Figure 7 shows that, for the lowest COD and TN values, the BOD also has its lowest value across the entire map area when NH4-N and TP are 60 and 8 mg L−1, respectively, and SS varies between 200 and 300 mg L−1. In Figures 8 and 9, the variations of COD and TN present smooth curvature across the entire map area. A more detailed look at Figure 8 (left corner) shows that, at the lowest contents of COD and TN, the BOD presents its highest value in only one part of the map. BOD is thus found to be sensitive to COD and TN.
Figures 10-12 show that, for the lowest contents of TP and TN, the BOD has a moderate value when SS ranges from 250 to 300 mg L−1 and NH4-N and COD are 60 and 400 mg L−1, respectively. A more detailed look at Figure 12 shows that, at the lowest concentrations of TP and TN, the BOD has its highest value over a significant part of the map.
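The mapping procedure described in this section (two inputs swept over a grid while the remaining inputs are held fixed) can be sketched as follows. This is a minimal illustration, not the authors' implementation: toy_model is a hypothetical smooth stand-in for the trained ANN-LM 5-8-1 model, and the COD and TN ranges are assumed for demonstration. The fixed values for NH4-N and TP (60 and 8 mg L−1) and an SS value inside the 200 to 300 mg L−1 interval follow the description of Figure 7. The helper max_jump is a crude indicator of "sudden variations" between neighbouring grid cells.

```python
def linspace(start, stop, n):
    """Return n evenly spaced values from start to stop (inclusive)."""
    step = (stop - start) / (n - 1)
    return [start + i * step for i in range(n)]


def bod5_grid(model, cod_range, tn_range, fixed, n=50):
    """Evaluate model on a COD x TN grid with the other inputs fixed.

    Returns the two axes and z, where z[i][j] is the prediction for
    tn_axis[i] and cod_axis[j].
    """
    cod_axis = linspace(cod_range[0], cod_range[1], n)
    tn_axis = linspace(tn_range[0], tn_range[1], n)
    z = [[model(cod=c, tn=t, **fixed) for c in cod_axis] for t in tn_axis]
    return cod_axis, tn_axis, z


def max_jump(z):
    """Largest absolute difference between neighbouring grid cells."""
    rows, cols = len(z), len(z[0])
    across = max(abs(z[i][j + 1] - z[i][j])
                 for i in range(rows) for j in range(cols - 1))
    down = max(abs(z[i + 1][j] - z[i][j])
               for i in range(rows - 1) for j in range(cols))
    return max(across, down)


def toy_model(cod, ss, tn, nh4n, tp):
    """HYPOTHETICAL smooth stand-in for the trained ANN (not the real model)."""
    return 0.5 * cod + 0.1 * ss + 0.8 * tn + 0.2 * nh4n + 1.5 * tp


# Fixed inputs follow the description of Figure 7; the ranges are assumed.
FIXED = {"ss": 250.0, "nh4n": 60.0, "tp": 8.0}
cod_axis, tn_axis, z = bod5_grid(toy_model, (200.0, 600.0), (20.0, 80.0), FIXED)
print("largest jump between neighbouring cells:", max_jump(z))
```

The returned axes and z can be passed directly to a plotting routine such as matplotlib's `contourf` to draw the map.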

Limitations and Future Works
The proposed optimal ANN-LM 5-8-1 model, like any other mathematical simulant, is valid only for input parameter values between the minimum and maximum values of the database used for training the ANN models (Table 3). Additionally, the reliability of the proposed model is exceptionally high in the parameter ranges where, according to the histograms presented in the previous section (Figure 3), sufficient data exist. For the regions where the data are insufficient, the database must be updated with further data that cover these areas satisfactorily. Accordingly, the authors aim to extend the database with measurements from different sewage processing plants, with the goal of formulating an even more reliable model for estimating BOD5 in wastewater.
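In practice, the validity condition stated above can be enforced with a simple guard that is run before the model is evaluated. The sketch below is illustrative only: the bounds in TRAINING_RANGE are hypothetical placeholders and would have to be replaced with the minimum and maximum values reported in Table 3.

```python
# HYPOTHETICAL bounds (mg/L); replace with the min/max values of Table 3.
TRAINING_RANGE = {
    "COD": (200.0, 600.0),
    "SS": (100.0, 400.0),
    "TN": (20.0, 80.0),
    "NH4-N": (10.0, 70.0),
    "TP": (2.0, 10.0),
}


def out_of_range(sample, bounds=TRAINING_RANGE):
    """Return the names of inputs that fall outside the training range."""
    problems = []
    for name, (low, high) in bounds.items():
        if not low <= sample[name] <= high:
            problems.append(name)
    return problems


sample = {"COD": 650.0, "SS": 250.0, "TN": 50.0, "NH4-N": 60.0, "TP": 8.0}
flagged = out_of_range(sample)
if flagged:
    print("Prediction would be an extrapolation for:", ", ".join(flagged))
```

A check of this kind only covers the minimum and maximum limits; it does not detect sparsely populated regions inside the range, which still require inspection of the data histograms.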

Conclusions
The proposed ANN LM 5-8-1 approach can save the cost and time of actual laboratory measurements. In other words, it addresses the practical need for a machine learning approach that estimates BOD and delivers accurate results. The variations of COD and TN exhibit smooth curvature, and BOD is found to be sensitive to COD and TN. The proposed optimal ANN model is valid for input parameter values between the minimum and maximum values of the database used for ANN model training. Furthermore, the proposed model's reliability is exceptionally high for parameter value ranges where there is sufficient data. The developed and proposed ANN model proved to be a robust and valuable tool for scientists, researchers, engineers and practitioners in monitoring water systems and in the design phase of wastewater treatment plants. Moreover, it is an illustrative example of ANN methodology for environmental and educational applications.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/w15010103/s1, Table S1: Experimental database used for the training, testing and development of ANN models.

Funding: The authors received no financial support for the research, authorship, and/or publication of this article.
Informed Consent Statement: Informed consent was obtained from all individual participants included in the study.

Data Availability Statement: The raw/processed data required to reproduce these findings will be made available on request.

Conflicts of Interest: The authors declare that there is no conflict of interest regarding the publication of this paper.