The Use of Artificial Neural Networks to Predict the Physicochemical Characteristics of Water Quality in Three District Municipalities, Eastern Cape Province, South Africa

Reliable prediction of water quality changes is a prerequisite for early water pollution control and is vital in environmental monitoring, ecosystem sustainability, and human health. This study uses the artificial neural network (ANN) technique to develop the best model fits for predicting water quality parameters, employing the multilayer perceptron (MLP) neural network and the radial basis function (RBF) neural network with data collected from three district municipalities. Two input combination models, MLP-4-5-4 and MLP-4-9-4, were trained, verified, and tested for their predictive performance, and their physicochemical prediction accuracy was assessed by comparing each model's observed data with its predicted data. The MLP-4-5-4 model showed a better understanding of the data sets and better water quality predictive ability, giving an MSE of 39.06589 and a correlation coefficient (R²) between the observed and predicted water quality of 0.989383, compared to the MLP-4-9-4 model (R² = 0.993532, MSE = 39.03087). These results apply to natural water resources management in South Africa and similar catchment systems. The MLP-4-5-4 system can be scaled up for future water quality prediction of Wastewater Treatment Plants (WWTPs), groundwater, and surface water while raising awareness among the public and industry on future water quality.


Introduction
Water quality plays an essential role in any aquatic system, such as reflecting the degree of water pollution [1] and influencing the growth of aquatic organisms [2]. Predicting future water quality changes is a prerequisite for early water pollution control [3] and plays a crucial role in environmental monitoring, ecosystem management, and human health [1]. As a result, water quality prediction has tremendous practical significance [4][5][6] as an essential means of preventing water pollution in any catchment [7]. As influenced by natural and human-induced occurrences [8], the water quality of any catchment serves as scientific evidence for economic development, commercial planning, and water resources protection from future contamination of that catchment [8]. Therefore, water quality monitoring and prediction are of utmost importance to public health and are mandatory and crucial for better managing accessible water resources and building up various remediation strategies [9].
Traditionally, water quality evaluation and monitoring tools, such as the Water Quality Index (WQI), have been widely used by researchers worldwide to evaluate surface water quality because of their capability to summarize several water quality parameters into one numeric value, along with a defined scale of water quality [10][11][12][13][14][15][16]. Despite its widespread use, the index system remains limited, as much of the data used cannot be correlated with an index [17], and it is therefore insufficient to predict future water quality changes. The water pollution process is so complex that it is affected not only by natural factors but also by anthropogenic factors such as social and economic development, resulting in a water environment system with strong nonlinear and non-deterministic characteristics [18]. Therefore, the traditional linear prediction model cannot fully reflect how the system changes and cannot accurately predict its water quality. Owing to an increase in data scale and the growing need to investigate ways and means of linking together land use, pollutant loading and disposal, water quality, and ecosystem impacts, mathematical techniques and models that can efficiently model and predict water quality have been developed [19]. These modeling techniques make it possible to understand cause-and-effect relationships systematically and to assess water quality changes [20]. This ability is crucial to forecasting the variation trend of water quality at a particular time in the future [21].
In recent decades, non-mechanistic models have become a hotspot for research on water quality prediction modeling. As such, various water quality prediction techniques such as Autoregression (AR) [22,23], Moving Average (MA) [4,24], Exponential Smoothing (ES) [25], Hybrid Methods (HM) [26][27][28], multiple linear regression (MLR) [18], and the Autoregressive Integrated Moving Average (ARIMA) [29] have been used to predict and forecast the dependent variable in a time series [4,22–29]. These methods establish a water quality prediction model with a specific algorithm from the perspective of the variation in water quality data, without considering the relationship between water pollution and the underlying changing mechanism. Among non-mechanistic techniques, the multilayer perceptron (MLP) model has mostly outperformed others in precision and accuracy [27] and is the most widely used architecture [30]. The MLP is a feed-forward ANN model that maps sets of input data onto appropriate outputs. It uses a supervised learning technique based on the backpropagation algorithm [31]. This is probably the reason behind the increasing popularity of ANNs in the fields of water quality prediction [6–8,21,30,32–40] and environmental analysis [27,41–47]: researchers can utilize ANNs to model nonlinear and complex phenomena even if they do not fully understand the underlying changing mechanisms [48,49]. This also explains the increased use of ANNs in water quality classification and prediction [50].
In South Africa, as in any developing country, economic development has led to the gradual degradation of the nation's water resources system [51], but the extent and rate of water quality decline [52][53][54] have not been consistently and systematically measured. A few studies have utilized ANNs to model and predict stream flow [55], mine water quality [56], and water demand [57] in South Africa, but none have focused on water quality prediction in a water environment system with more than one source of pollutants. In recognition of the need to expand the modeling and prediction of water quality in South Africa, and motivated by successful applications in modeling non-linear system behaviors in a wide range of areas, ANNs are used to predict water quality parameters in this study. To date, no studies on modeling and predicting the physicochemical properties of water quality for natural water resources have been conducted in South Africa. Therefore, this study uses standard water quality measuring techniques to analyze eight physicochemical parameters and further employs two input combination models (MLP-4-5-4 and MLP-4-9-4) with a multilayer perceptron feed-forward ANN to test their predictive performance and to reach input combinations capable of accurately forecasting the physicochemical parameters of selected rivers and WWTPs of three district municipalities, Eastern Cape, South Africa. This paper's findings can be used as a baseline study for future water quality prediction in South Africa. The built networks can be scaled up and may be used to predict future water quality for any other area with the parameters studied in this work, locally and internationally.
The objectives of this study are to (1) obtain the best model fits to predict water quality parameters by employing multilayer perceptron (MLP) neural network, and the radial basis function (RBF) neural network, using data collected from three district municipalities; (2) evaluate the performance of each modeling approach using observed data versus predicted data from each model; and (3) compare the performance of these two modeling approaches in terms of prediction accuracy.
The rest of this paper is organized as follows: Section 2 provides the theoretical foundations of ANNs, i.e., a brief look at the application of ANNs in water quality research and prediction. Section 3 describes the study's materials and methods, i.e., the study area, study data, and the principles of the ANN network. Section 4 presents the tests and results, with corresponding analyses and discussions, and the experimental conclusions drawn from the two MLP models employed in this study. Finally, the conclusions and future work are discussed in Section 5.

Principles of ANN
An artificial neural network (ANN) is a computing system inspired by studies of the brain and nervous system [37], as in the human brain [18]. An ANN models complex mathematical systems and is based on a system of interconnected "neurons" [36,48,58] that form the basis of neural network operation. The network has computational models that are defined by four parameters: (i) processing elements known as neurons, (ii) a topology comprising weighted connections between neurons, (iii) a learning algorithm for training the network, and (iv) a recall algorithm for testing or classifying purposes [59]. The neurons are interconnected according to a particular architecture (topology or structure) to achieve pattern recognition in data [59]. Among the many types of feed-forward ANNs, the most widely used architecture is the multilayer perceptron (MLP) with only three layers, shown in Figure 1. The interconnecting links have a numeric weight that is updated during the learning process and allows long-term storage in the network [60]. The structure of the neural network has three layers: input neurons that receive data from an external source, hidden neurons with input and output signals that remain in the network, and output neurons that send data to an external source. The layers consist of summing units, activation functions, bias b, weight matrix W, and output vector. Each component of the input X is connected to each neuron through the weight matrix W. Each neuron has an activation function f, bias b, and an output Y (Figure 1) [60]. In recent years, artificial intelligence (AI) has made enormous advances in various uses, including solving complex and non-linear challenges [16,27,36,56,57,61]. Additionally, AI is regarded as a complementary method to conventional procedures, or as a complete system, that can be used to execute modeling, forecasting, and optimization at high speed [62].
Advanced AI technologies include the artificial neural network, the genetic algorithm, and expert system chemometric techniques. The utilization of ANNs in water engineering and environmental sciences has been pointed out in many studies [33,63,64] owing to their ability to reveal hidden relationships in historical records, which makes it easier to predict and forecast water quality.

Sample Collection
Water samples were collected from the Tyhume River (Raymond Mhlaba Municipality), Bloukrans River (Makhanda Municipality), Buffalo River (Buffalo City Metropolitan), and WWTPs found on the banks of these rivers in the Eastern Cape Province of South Africa. Figures 2-4 show the maps and sampling sites for the respective catchments. Samples were collected from four key sampling sites (Upper, Middle, Lower, and WWTPs) in each of the three municipalities.
For each river, samples were collected from four sites: the upper stream, middle stream, lower stream, and wastewater treatment plants (influent and effluent). Samples were brought to the laboratory in a cooler box containing ice packs to keep them at low temperature. The analysis of samples was performed within 48 h from the time of sampling. Table 1 presents the geographic database for the selected rivers' sampling points, district municipalities, and the complete site description.

Artificial Neural Network (ANN)
Two types of feed-forward ANN, namely, the multilayer perceptron (MLP) and the radial basis function (RBF) network, were evaluated using Statistica version 13.2 software (Round Rock, Texas, USA). The artificial neural network was trained by employing the MLP with a hidden layer of 3 to 10 neurons, the Broyden-Fletcher-Goldfarb-Shanno training algorithm, and a network approximation error of 1 × 10^−14. In this work, feed-forward backpropagation (BP) is adopted to determine the gradient needed in the computation of the network weights, and the trained network is then used to construct classifiers for water quality prediction in the study areas. Each neuron in the network computes a weighted sum of its input signals to generate an internal activity level $a_i$,
$$a_i = \sum_{j} w_{ij} x_{ij} - w_{i0} \qquad (1)$$
where $x_{ij}$ is the jth input to the ith neuron, $w_{ij}$ is the weight associated with the jth input, and $w_{i0}$ is the threshold associated with neuron i. The internal activity is passed through a nonlinear activation function $\beta_i$ to generate the output of the neuron $y_i$,
$$y_i = \beta_i(a_i) \qquad (2)$$
The activation function adjusts the internal activity of each neuron. The standard sigmoid function is of the form
$$\beta(a) = \frac{1}{1 + e^{-a}} \qquad (3)$$
The outputs $y_i$ of the activation functions become the inputs for the neurons in downstream layers. The eventual output of the model is the result of the activation functions at the output layer. The error of the hidden layers is minimized by propagating back the error obtained at the output layer. The connection weights $w_{ij}$ are optimized according to the generalized Delta Rule during the training process to reach the neural network's desired input and output relationship.
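The forward computation described above can be illustrated with a minimal sketch. It assumes that the threshold is subtracted from the weighted sum and that the sigmoid is applied in both the hidden and output layers; the function names, the random initial weights, and the input values are hypothetical and are not taken from the Statistica models used in this study.

```python
import math
import random


def sigmoid(a: float) -> float:
    """Standard logistic sigmoid, 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + math.exp(-a))


def neuron_output(inputs, weights, threshold):
    """Weighted sum of the inputs minus the threshold, passed through the sigmoid."""
    activity = sum(w * x for w, x in zip(weights, inputs)) - threshold
    return sigmoid(activity)


def layer_output(inputs, weight_rows, thresholds):
    """Outputs of one fully connected layer (one weight row per neuron)."""
    return [neuron_output(inputs, w, t) for w, t in zip(weight_rows, thresholds)]


def make_layer(n_in, n_out, rng):
    """Hypothetical initialiser: small random weights and zero thresholds."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
hidden_w, hidden_t = make_layer(4, 5, rng)  # 4 inputs -> 5 hidden neurons
output_w, output_t = make_layer(5, 4, rng)  # 5 hidden neurons -> 4 outputs

# Illustrative inputs already scaled to [0, 1]; these are not measured data.
x = [0.2, 0.5, 0.1, 0.9]
hidden = layer_output(x, hidden_w, hidden_t)
y = layer_output(hidden, output_w, output_t)
print(len(hidden), len(y))  # prints "5 4"
```

The layer sizes mirror the 4-5-4 topology only in shape; an untrained network of this kind produces arbitrary outputs until its weights are optimized.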
The error function minimized by the backpropagation algorithm is the average of the squared errors over all the outputs, and it is defined as follows:
$$E = \frac{1}{N} \sum_{k=1}^{N} (t_k - y_k)^2 \qquad (4)$$
where $t_k$ is the desired target value of the kth output, $y_k$ is the corresponding network output, and $N$ is the number of outputs. A simplified learning procedure for ANNs is summarized as follows: (1) supply the neural network with training data, including input variables and desired target outputs; (2) assess how closely the network outputs match the target outputs; (3) optimize the weights of the connections between the neurons so that the network yields better approximations of the target outputs; (4) keep adjusting the weights until the desired accuracy is attained.
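The four-step learning procedure can be sketched as follows for a single sigmoid neuron. This is an illustration only: it uses plain batch gradient descent with the delta rule rather than the Broyden-Fletcher-Goldfarb-Shanno algorithm employed in this study, and the toy data (the logical OR function), the learning rate, the tolerance, and all function names are assumptions made for the example.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def mean_squared_error(targets, outputs):
    """Average of the squared differences between targets and outputs."""
    return sum((t - y) ** 2 for t, y in zip(targets, outputs)) / len(targets)


def predict(weights, bias, x):
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train(samples, targets, rate=0.5, tolerance=0.01, max_epochs=20000):
    """Batch delta-rule training of one sigmoid neuron (illustrative only)."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    history = []
    for _ in range(max_epochs):
        # Step (1): present the training inputs to the network.
        outputs = [predict(weights, bias, x) for x in samples]
        # Step (2): measure how closely the outputs match the targets.
        error = mean_squared_error(targets, outputs)
        history.append(error)
        # Step (4): stop once the desired accuracy is reached.
        if error <= tolerance:
            break
        # Step (3): adjust the weights along the negative error gradient.
        for x, t, y in zip(samples, targets, outputs):
            delta = (t - y) * y * (1.0 - y)
            for j, xj in enumerate(x):
                weights[j] += rate * delta * xj
            bias += rate * delta
    return weights, bias, history


# Toy data: the logical OR function, not water quality measurements.
samples = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
targets = [0.0, 1.0, 1.0, 1.0]
weights, bias, history = train(samples, targets)
predictions = [round(predict(weights, bias, x)) for x in samples]
print(predictions)
```

The `history` list records the error at each epoch, so the decline of the error function during training can be inspected directly.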

Optimal selection of ANN model
The optimal architecture of the network was set and kept constant according to the empirical formula
$$M = \sqrt{i + o} + c \qquad (5)$$
where M represents the number of hidden layer nodes, i is the number of input sets, o is the number of output sets, and c is a constant ranging from 0 to 10.
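As a worked illustration, the sketch below enumerates candidate hidden-layer sizes for a network with four inputs and four outputs. It assumes that the empirical formula has the commonly used form M = sqrt(i + o) + c and that M is rounded to the nearest integer; both the form of the formula and the rounding are assumptions, and the function names are hypothetical.

```python
import math


def hidden_nodes(n_inputs, n_outputs, c):
    """Assumed empirical rule: M = sqrt(i + o) + c."""
    return math.sqrt(n_inputs + n_outputs) + c


def candidate_hidden_sizes(n_inputs, n_outputs, c_min=0, c_max=10):
    """Distinct rounded hidden-layer sizes for integer c in [c_min, c_max]."""
    sizes = {round(hidden_nodes(n_inputs, n_outputs, c)) for c in range(c_min, c_max + 1)}
    return sorted(sizes)


sizes = candidate_hidden_sizes(4, 4)
print(sizes)  # prints [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
```

Under these assumptions, the hidden-layer sizes of 5 and 9 used by the two networks both fall within the candidate range.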

Selection of Input and Output Variables
Specific parameters were chosen from the ten initial parameters by factorial analysis, which demonstrated that the water quality was primarily affected by specific physicochemical properties. The physicochemical parameters are chloride, sulfate, temperature, phosphate, pH, electrical conductivity, turbidity, and dissolved oxygen. A significant stage of developing the ANN model is deciding the model input variables, which have a considerable influence on model performance. The input layer (independent variables) was set with four neurons: temperature, chloride, sulfate, and phosphate, whereas the output layer (dependent variables) has four neurons: pH, electrical conductivity, turbidity, and dissolved oxygen.

Data Preprocessing and Evaluation of the ANN Model's Performance
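Before the network is presented with the input data, a normalization procedure is required, since mixing variables with large and small magnitudes confuses the learning algorithm about each variable's importance and can result in the rejection of variables with smaller magnitudes. Normalization scales the minimum value to 0 and the maximum value to 1. The coefficient of correlation (R²), mean square error (MSE), and root mean square error (RMSE) were employed to evaluate the model's performance. The general formulas for R², MSE, and RMSE are given in Equations (6)-(8) as follows:
$$R^2 = 1 - \frac{\sum_{k=1}^{n} (O_k - P_k)^2}{\sum_{k=1}^{n} (O_k - \bar{O})^2} \qquad (6)$$
$$\mathrm{MSE} = \frac{1}{n} \sum_{k=1}^{n} (O_k - P_k)^2 \qquad (7)$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{k=1}^{n} (O_k - P_k)^2} \qquad (8)$$
where $O_k$ is the kth observed value, $P_k$ is the corresponding predicted value, $\bar{O}$ is the mean of the observed values, and $n$ is the number of observations.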
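The preprocessing and evaluation steps can be sketched as below. The sketch assumes min-max scaling for the normalization and the standard definitions of R², MSE, and RMSE; the observed and predicted values are invented for illustration and are not data from this study.

```python
import math


def min_max_normalize(values):
    """Scale values so that the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series carries no range information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def mse(observed, predicted):
    """Mean square error between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean square error."""
    return math.sqrt(mse(observed, predicted))


def r_squared(observed, predicted):
    """Assumed definition: 1 - residual sum of squares / total sum of squares."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Illustrative values only (for example, pH readings); not measured data.
observed = [7.1, 7.4, 8.0, 6.9]
predicted = [7.0, 7.5, 7.8, 7.0]
normalized = min_max_normalize(observed)
print(normalized)
print(mse(observed, predicted), rmse(observed, predicted), r_squared(observed, predicted))
```

A higher R² together with a lower MSE or RMSE indicates closer agreement between the observed and predicted series.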

Training and Testing Network
Experimental data were categorized into training, validation, and testing sets. The training set was employed to generate the ANN model; the validation and testing sets were used to confirm the model's generalization ability. The measured data were divided into training (70%), validation (10%), and testing sets.
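A data division of this kind can be sketched as follows. The sketch assumes a random shuffle with a fixed seed and assigns whatever remains after the 70% training and 10% validation shares to the testing set; the handling of the remainder, the seed, and the function name are assumptions, and the records are placeholder integers rather than water samples.

```python
import random


def split_dataset(records, train_pct=70, val_pct=10, seed=0):
    """Shuffle the records and split them into training, validation, and testing sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * train_pct // 100
    n_val = len(shuffled) * val_pct // 100
    train_set = shuffled[:n_train]
    val_set = shuffled[n_train:n_train + n_val]
    # Assumption: every record not used for training or validation is used for testing.
    test_set = shuffled[n_train + n_val:]
    return train_set, val_set, test_set


records = list(range(50))  # placeholder records
train_set, val_set, test_set = split_dataset(records)
print(len(train_set), len(val_set), len(test_set))  # prints "35 5 10"
```

Fixing the seed keeps the split reproducible, so the two network configurations can be compared on identical subsets.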

Tests and Results
The statistical variables of annual water quality parameters for the Tyhume, Buffalo, and Bloukrans Rivers and their municipal wastewater treatment plants are given in Table 2. The data were divided into training (70%), validation (10%), and testing sets and fed into the ANN model. Table 3 gives a summary of the two input combination networks, MLP-4-5-4 and MLP-4-9-4, built with a multilayer perceptron feed-forward ANN. MLP-4-5-4 produced a correlation coefficient (R²) value of 0.989383 with a mean square error (MSE) value of 39.03087, and MLP-4-9-4 produced an R² value of 0.993532 with an MSE value of 39.06589.
According to the test data percentage differences for the MLP-4-5-4 and MLP-4-9-4 networks (Table 4), both networks adequately understood the relationship between the data sets. The percentage differences for the first test data set were comparable, whereas those for the second set showed a difference in predictive ability. The lowest percentage difference (3.48%) for the second test data set was given by MLP-4-5-4; therefore, this network best understood the relationship between the independent variables and the pH of the water.
The turbidity test data set percentage differences (Table 6) for MLP-4-5-4 and MLP-4-9-4 were above the 10% limit, suggesting that neither network adequately understood the relationship between the investigated independent variables and turbidity. The results indicate a nonsignificant effect of the studied variables on the test variable.
The percentage differences for the dissolved oxygen (DO) test data of the MLP-4-5-4 and MLP-4-9-4 networks (Table 7) were within the acceptable limit of 10%, indicating that both networks understood the relationship between the data sets. For the first data set, MLP-4-5-4 exhibited a markedly lower percentage difference (a 5.42% difference) than MLP-4-9-4, while for the second test data set MLP-4-9-4 exhibited the smallest percentage difference, differing by 1.60% from MLP-4-5-4. These results suggest that the best predictive ability was demonstrated by MLP-4-5-4.
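The comparison against the 10% limit can be sketched as follows. The formula for the percentage difference is not stated in the text, so the sketch assumes the absolute difference between observed and predicted values relative to the observed value; the function names and the example values are hypothetical.

```python
def percentage_difference(observed, predicted):
    """Assumed definition: |observed - predicted| / |observed| * 100."""
    return abs(observed - predicted) / abs(observed) * 100.0


def within_limit(observed, predicted, limit=10.0):
    """True when the percentage difference does not exceed the acceptance limit."""
    return percentage_difference(observed, predicted) <= limit


# Illustrative values only; not taken from Tables 4-7.
print(percentage_difference(7.20, 6.95))
print(within_limit(7.20, 6.95))   # True
print(within_limit(12.0, 15.0))   # False
```

Under this assumed definition, a prediction passes when it lies within 10% of the observed value, which is how the turbidity predictions would be separated from those for the other parameters.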

Discussions
This study uses standard water quality measuring techniques to analyze eight physicochemical parameters in water samples collected from Wastewater Treatment Plants (WWTPs) and three major rivers. Furthermore, the study employs two input combination models (MLP-4-5-4 and MLP-4-9-4) with a multilayer perceptron feed-forward ANN to test their predictive performance required to reach input combinations capable of forecasting water quality accurately. The results obtained from the test data percentage differences of the two networks show that both networks adequately understood the relationship between the training and testing data sets, with the MLP 4-5-4 model showing better generalization competencies in understanding the relationship between the independent variables and the investigated physicochemical parameters of the water samples (Tables 4, 5 and 7). However, the results showed a nonsignificant effect between the studied variables and the turbidity test (Table 6) as the experimental and predicted percentage difference values for both networks are above the 10% limit.
The observed percentage difference (above the 10% limit) between the experimental and predicted turbidity test results can be interpreted in several ways. It may be that there truly is no significant effect of the studied variables on water turbidity, which would explain why the built systems did not adequately understand the relationship between the investigated independent variables and turbidity. Alternatively, it could be that there is a significant effect, but the predictive ability of the MLP-4-5-4 and MLP-4-9-4 models in the present study was not sensitive enough to the test data owing to a variety of potential factors. First, this result may reflect the effect of other parameters on water turbidity; that is, the studied variables may not have a significant effect on the test variable, and more variables would have to be considered in future studies. Second, the ANN model depends significantly on data quantity [58]. As a result, it may not be advisable to use comparatively small data sets for input variables, as some valuable information may be lost in short-term data, resulting in unsatisfactory predictions [77]. Third, the input combination could be a factor in the observed results. Data division is a crucial stage in process modeling, and reaching a precise forecast with an artificial neural network depends on selecting an excellent input combination model [78]. Nevertheless, both networks' predictive performance for the other variables showed significant results, implying that the experimental and predicted data are strongly correlated.
The MLP-4-5-4 and MLP-4-9-4 networks both showed commendable predictive performance and input combinations capable of forecasting water quality, supporting this study's objectives. The MLP-4-5-4 produced a higher correlation coefficient (R²) and lower MSE than the MLP-4-9-4 network (Table 3). The higher the R² and the lower the RMSE, the better the model fits the dataset [79]. These results suggest that the MLP method was able to learn the system well. This study's outcomes are consistent with other studies conducted in South Africa [55,56,80]. A study by Isiyaka et al. (2019) [78] used a multilayer perceptron feed-forward artificial neural network to predict the level of water pollution. The authors reported the best input combination, with the highest R² value (0.999) and the lowest RMSE (0.159), and, based on these findings, concluded that ANNs could also predict the water quality index with a high level of accuracy using less complex input variables that can be adopted for water quality prediction and modeling in subsequent analysis [78]. These results agree with the findings of the present study (Table 3). Several other studies are consistent with the present study and conclude, based on similar findings, that the ANN model can readily classify and predict water quality with justifiable output [19,20,50,55,81–88].
The present study is essential because the MLP-4-5-4 system can be scaled up and used for the future water quality prediction of Wastewater Treatment Plants (WWTPs), groundwater, and surface water at the municipal, regional, and national scales. Municipalities and other water quality bodies can benefit from this research's outcomes. More significantly, the model can help manage natural resources and raise awareness among the public and industry. In addition, the MLP-4-5-4 system can help reduce the uncertainty faced by water quality decision-makers by providing a novel and refined model that predicts and classifies the quality of WWTP and river water variables with acceptable precision. Finally, the results can be used to manage water quality in the study area and other regions.

Conclusions and Future Studies
The ANN model was developed to test its predictive performance on the quality of river water and WWTPs and shows great potential as a predictive tool. Most notably, the MLP method was able to learn the system reasonably well. The MLP-4-5-4 network showed the best predictive ability for water quality. The application of this model to the river basins in the study area has shown that available data in a given catchment can be used to predict water quality, while recognizing that data-intensive models such as ANNs may not be successful in developing countries where data are inadequate, a notable limitation of the present study. Future research should direct attention to applying the same techniques to other catchments and provinces and should consider relatively long data series to reasonably compare the performance of the models in water resources.
Furthermore, we intend to focus on water quality prediction in extreme weather conditions and on building a uniform model for multiple catchments at a single time step. This is crucial to testing the effect of spatial and temporal variations on water quality modeling and prediction, since water quality varies at spatial and temporal scales. This line of research is crucial to understanding the means of linking together land use, pollutant loading and disposal, water quality, and ecosystem impacts to efficiently model and predict water quality. Therefore, the ANN model is a valuable and valid instrument that optimizes the observational network by determining important monitoring sites and predicting the quality of river water variables with acceptable precision. However, while the results derived from ANN in this study are not necessarily statistically significantly better than the results derived from a combination of descriptive statistics, the water environment system is a very complex system with strong nonlinear and non-deterministic characteristics. As such, these results offer more accurate and comprehensive water prediction data. To improve prediction accuracy and accommodate the uncertainty associated with the water environment system, modern approaches suitable for time-sequential prediction, such as the ensemble approach, transfer learning technology, and evidence theory, can be used.