Prediction of Nitrate and Phosphorus Concentrations Using Machine Learning Algorithms in Watersheds with Different Landuse

: Rapid industrialization and population growth have elevated the concerns over water quality. Excessive nitrates and phosphates in the water system have an adverse effect on the aquatic ecosystem. In recent years, machine learning (ML) algorithms have been extensively employed to estimate water quality over traditional methods. In this study, the performance of nine different ML algorithms is evaluated to predict nitrate and phosphorus concentration for ﬁve different watersheds with different land-use practices. The land-use distribution affects the model performance for all methods. In urban watersheds, the regular and predictable nature of nitrate concentration from wastewater treatment plants results in more accurate estimates. For the nitrate prediction, ANN outperforms other ML models for the urban and agricultural watersheds, while RT-BO performs well for the forested Grand watershed. For the total phosphorus prediction, ensemble-BO and M-SVM outperform other ML models for the agricultural and forested watershed, while the ANN performs better than other ML models for the urban Cuyahoga watershed. In predicting phosphorus concentration, the model predictability is better for agricultural and forested watersheds. Regarding consistency, Bayesian optimized RT, ensemble, and GPR consistently yielded good performance for all watersheds. The methodology and results outlined in this study will assist policymakers in accurately predicting nitrate and phosphorus concentration which will be instrumental in drafting a proper plan to deal with the problem of water pollution.


Introduction
With the current pace of industrialization and population growth, the concerns over water quality are very sensitive at present [1]. The demand for water has increased because of the growing population and development activities. On the other hand, the level of pollution in water sources has also increased significantly [2].
Nitrate and phosphate compounds are commonly used as industrial chemical reagents and in chemical fertilizers. Higher amounts of nitrate and phosphate can have an adverse effect not only on the water resources but also on the surrounding ecosystem. Proper management of these chemicals is important in large water bodies to mitigate the harmful impact on the aquatic ecosystem. Machine learning (ML) models can be effectively employed to model the nitrate and phosphorus distribution and predict its concentration in the water system. The ability to predict nitrate and phosphorus concentration accurately can provide engineers and policymakers with a proper plan to deal with the problem. These nutrients need to be measured over multiple locations due to the distributed nature

•
The analysis of concentration-discharge (C-Q) relationship for different type of watersheds; • To assess the applicability of BO to optimize hyperparameters of RT, ensemble, and GPR to predict nitrate and phosphorus concentration; • The performance evaluation of nine different ML algorithms for the nitrate and phosphorus prediction.
The paper is arranged in the following order: Section 2 outlines the methodology employed in the study with a brief description of the study area, ML algorithms and Water 2021, 13, 3096 3 of 20 parameter settings for the training and test set, Section 3 incorporates relevant results with appropriate discussions, and Section 4 concludes the study.

Study Area and Data Structure
In this study, the water quality data were collected from five watersheds: Cuyahoga, Grand, Maumee, Raisin, and Sandusky, draining into Lake Erie ( Figure S1-Supplementary Materials). These five basins have similar climate, soil, ecoregion, and cropping systems and are also located very close to each other. National Center for Water Quality Research (NCWQR) maintains long-term daily time-series datasets for these watersheds [26]. The water quality depends on the physical and anthropogenic features of the watershed; hence, watersheds with different land-use distributions are selected in this study. As per NCWQR, Cuyahoga is an urban watershed (39.54% urban and 33.55% forest), Grand is a forested watershed (50.10% forest and 40% agricultural), while Maumee (73.33% agricultural), Raisin (49.56% agricultural), and Sandusky (77.59% agricultural) are predominantly agricultural watersheds (Table S1-Supplementary Materials). The water quality does not change drastically for an urban watershed, so the data structure is simple and consistent [3]. On the contrary, agricultural watershed experiences an influx of nutrients deposited in the soil during the high flow season, so the water quality response is partially seasonal. Likewise, the water quality response from the forest watershed can be entirely seasonal.
The long-term daily streamflow data were obtained from the United States Geological Survey (USGS) website (https://waterdata.usgs.gov/nwis/rt, accessed on 14 October 2021) and nitrate, total suspended solids, and total phosphorus concentration were obtained from the NCWQR at Heidelberg University. The total data points used in this study are 11,996 for Cuyahoga, 5049 for Grand, 12,849 for Maumee, 9241 for Raisin, and 11,468 for Sandusky. The streamflow and water quality samples were collected daily at the outlet of each watershed, supplemented by more frequent samplings (up to three per day) during high-flow periods. Hence, daily average streamflow and concentration values were used in this case. The preliminary data cleaning was performed to remove the observation with negative streamflow and concentrations.
Descriptive statistics of parameters for all watersheds are provided in Table S2 (Supplementary Material). The agricultural Maumee watershed (16, 395 km 2 ) is the largest watershed considered in this study, followed by Sandusky (3239 km 2 ) and Raisin (2698 km 2 ). Hence, the Maumee watershed had the highest mean streamflow of 180.27 m 3 /s, which is expected because of its large size. The urban Cuyahoga watershed resulted in the highest mean total suspended solids concentration of 102.627 mg/L, which can be attributed to construction activities in the urban areas. Likewise, the highest mean nitrate (4.192 mg/L) and total phosphorus (0.236 mg/L) concentrations were observed in the agricultural Maumee watershed. The standard deviation of streamflow for Maumee is very high, indicating that the streamflow varies significantly throughout the year for this watershed. The same is the case for total suspended solids in the Cuyahoga and total phosphorus and nitrate concentration in the Sandusky.

Machine Learning Algorithms
In this study, MATLAB R2020a [27]  where X andŶ represent the input and output variable, respectively. The least-square method is employed to fit the model coefficients (a and b) using the actual and the predicted data.

k Nearest Neighbors (kNN)
kNN models are based on the proximity of data points where a new object is classified based on features and training patterns [28]. Here, the output or the forecasted value is the average or the specified number of neighboring values. The specified number is described as the value k. The algorithm for kNN is: Calculate the Euclidean distance of the query example to the labeled examples where, (x, x ) is the sample point; • Sort the labeled examples from smallest to largest; • Find an optimal number k of nearest neighbors based on RMSE using cross validation; • Compute an inverse weighted average with kNN.

Regression Tree (RT)
A RT is an approach to nonlinear regression, which is built through recursive partitioning. Recursive partitioning is an iterative process of splitting the data into more manageable partitions and again splitting each partition into smaller sub-divisions. RT represents recursive partitioning in the form of a tree with each terminal node representing a cell of the partition. RT is simple to model and visualize, but the unstable nature of a single tree model resulted in the development of ensembles.

Ensemble
Ensemble methods combine a number of weak RT models to form a more accurate RT model. Ensembles create multiple diverse regression models by taking different samples of the original data set and then combining their output. Two types of ensembles: Bagging and Least-Squares Boosting (LSBoost) are used in this study.
Random sampling with replacement is employed in Bagging to generate numerous training sets. RT algorithm is applied to each data set, and then the average of the models is taken to compute predictions for the unseen data. A more accurate model is generated in Boosting by successively training models to concentrate on records with poor prediction in previous models. All predictors are combined by a weighted majority vote after completion. A new learner is fitted to the difference between the observed response and the aggregated prediction of all learners grown previously by the ensemble in LSBoost.

Random Forest (RF)
Another type of ensemble employed in this study is a RF. RF constructs a number of decision trees which is used to classify a new instance by the majority vote. A subset of randomly selected attributes from the original set is used by each decision tree node. Likewise, a different bootstrap sample data is used by each tree, such as that of Bagging. The number of trees generated in RF might range from hundreds to thousands. In this study, the number of trees is selected to be ten to construct a forest.

Artificial Neural Network (ANN)
ANN uses connected units between input and output layers to resolve a complex problem. Among several ANN topologies, MLP is employed in the current study. The connected unit is generally a simple linear equation, and the output layer uses an activation unit, which makes it very different from the polynomial regression. The activation unit acts as a logical switch that is activated only for threshold values. Some common types of activation functions are Sigmoid and ReLu. This study uses ReLu activation. Typical MLP with a hidden layer can be mathematically modeled as: where, x is the input, w is the weight, and b is the bias in hidden layer.
2.2.7. Support Vector Machine (SVM) SVM based regression model is useful for modeling complex relations, which are not easily described by lower-order polynomial equations. SVM is a powerful supervised learning technique with excellent generalization ability because of which it is extensively utilized for solving problems regarding pattern recognition, classification, regression, and prediction [29]. The predicted value is obtained using the equation: where α i and α * i are the support vectors and K(X i , X 0 ) is the kernel function. SVM function can be used with various kernel functions (KF) to implement its regression learner. The application of the gaussian kernel function (GKF) is popular with SVM classification and regression which is defined as: In this study, fine Gaussian SVM (F-SVM) and medium Gaussian SVM (M-SVM) is used. Medium and fine gaussian is defined based on the slenderness of the gaussian function being used.

Gaussian Process Regression (GPR)
GPR is a non-parametric model which works on the principles of Bayesian probability. GPR can be applied through various methods with variations in kernel type, kernel function basis function, etc. In this study, Nonisotropic Exponential, Nonisotropic Matern 3/2, Nonisotropic Matern 5/2, Nonisotropic Rational Quadratic, Nonisotropic Squared Exponential, Isotropic Exponential, Isotropic Matern 3/2, Isotropic Matern 5/2, and Isotropic Squared Exponential kernels are applied. Constant, Zero, and Linear basis functions are employed in the implementation.

Experimental Configuration
The entire dataset was divided into two sets for each watershed; an initial 70% of the data was utilized for training ML models, and the rest 30% was employed for the model assessment. A five-fold cross validation method was employed during the model development to prevent model overfitting in the training set. Daily measured streamflow and month (representing the seasonal characteristic) were taken as independent variables to predict nitrate concentration. As most of the total phosphorus is particulate phosphorus attached to suspended solid particles, total suspended solid was also taken as an independent variable alongside streamflow and month to predict total phosphorus concentration. Detailed parameters for each ML algorithm are illustrated in Table S3 (Supplementary Materials).

Bayesian Optimization (BO)
BO is a hyperparameter search method applied in ML problems by minimizing a particular objective function [30]. Based on the past evaluation results of the objective function, an alternate function is established with BO to minimize its value. In comparison to random grid search, BO refers to past evaluation results when selecting parameters in each iteration which greatly saves search time and improves optimization efficiency. Mean squared error (MSE) is taken as an objective function for this study. While minimum leaf size is the only hyperparameter to be optimized in the RT model, ensemble method, minimum leaf size, number of learners, learning rate, and number of predictors to sample are to be optimized in the ensemble method. Likewise, sigma, basis function, kernel function, kernel scale, and standardization are the hyperparameter to be optimized in the GPR model. The hyperparameters and search spaces of RT, ensemble, and GPR are listed in Table S4 (Supplementary Materials).

Evaluation Metrics
Following statistical indicators namely, coefficient of determination (R 2 ), root mean square error (RMSE), and mean absolute error (MAE) is utilized to evaluate the performance of various models. The model performance is evaluated using the equations given below [31]: where, X i and Y i represent the measured and predicted values; while X and Y represent the average measured and average estimated values. m is the number of data points. R 2 indicates the variance of the dependent variable that is explained by independent variables. RMSE value indicates the short-term performance of the model. A lower RMSE value corresponds to better performance. Similarly, MAE indicates the magnitude of error one can expect from the forecast on an average without considering their direction. The combination of RMSE and MAE can be used to analyze the variation of errors in the forecast. The RMSE is always greater than or equal to MAE, and the greater difference between them indicates the greater variance in the individual errors of the forecast.

Impact of Watershed Characteristics on Prediction
Concentration-discharge (C-Q) relationships for each watershed are given in Figure S2 (Supplementary Materials). In regard to the nitrate concentration, the slope of the C-Q regression line (b) for the Cuyahoga was negative, while that of the other four watersheds was positive. This indicates that the nitrate concentration essentially dilutes with the increasing streamflow in Cuyahoga. On the contrary, the nitrate concentration increases along with the streamflow in the other four watersheds. In an urban watershed, such as the Cuyahoga, the point sources such as wastewater treatment plants are the primary source of nitrate, which can be quickly diluted by storm events [3,32]. In agricultural areas, however, increased flow can flush the nitrate deposited in the soil, resulting in the elevated nitrate concentration. In this case, a more severe event can result in more nitrate loss in water, which is similar to the situation in a forested watershed.
In regard to the total phosphorus concentration, the slope of the C-Q regression line (b) for all five watersheds was positive. Suspended solids are generated from nonpoint sources from agricultural lands and construction areas. Likewise, increased streamflow can flush the sediment and associated soil particles and increase suspended solids concentration in the water column. As most of the total phosphorus is particulate phosphorus attached to suspended solid particles, the total phosphorus concentration increases during storm events [2]. Hence, the total phosphorus concentration increases along with the streamflow irrespective of the type of watershed. The larger slope 'b' indicates that the concentration has more reaction with the streamflow [33].

Model Development
In this study, Bayesian optimized RT, Ensemble, and GPR models were run for 30 iterations during the model development, and the convergence of objective function in the iterative process is illustrated in Figure 1. For the Cuyahoga watershed, GPR converged to the minimum objective of 0.3355 after 13th iteration. Hence, BO is more effective in improving GPR compared to RT and Ensemble for the Cuyahoga watershed. Likewise, BO is more effective in improving GPR compared to RT and Ensemble for Maumee (4.9569), Raisin (2.0403), and Sandusky (5.4210). On the contrary, BO is more effective in improving ensemble compared to RT and GPR for the Grand watershed with the objective of 0.0788. The result of Bayesian optimized hyperparameters in RT, ensemble, and GPR models for nitrate prediction are shown in Table 1. On the contrary, for the forested Grand watershed, all the ML models showed mediocre performance with ANN ( 2 = 0.31 and RMSE = 0.2800 mg/L) being the best performing model regarding the 2 and RMSE. For the agriculture dominated watersheds (Maumee, Raisin and Sandusky), F-SVM, M-SVM, ANN, RT-BO, ensemble-BO, and GPR-BO again showed acceptable performance, which was lower than that of the Cuyahoga but better than the Grand. For the agricultural watersheds of Maumee, Raisin and Sandusky, ANN was the best performing model followed by GPR-BO regarding the 2 and RMSE. Among the agricultural watersheds, the model predictability was comparatively higher for the Raisin. It was also observed that all ML models significantly outperforms traditional LR model in predicting nitrate concentration for the training dataset.
A comparison of statistical indicators showed that 2 , RMSE, and MAE, at times, followed a different trend. For the urban Cuyahoga watershed, all statistical indicators followed the same trend as the GPR-BO has the maximum 2 and the minimum RMSE and MAE. For the agricultural and forested watersheds, ANN has the maximum 2 and the minimum RMSE whereas F-SVM has the minimum MAE. Also, reasonable difference between the RMSE and MAE indicates that the variance in the individual errors of the   concentration. Among these models, GPR-BO performs better than other ML models with R 2 of 0.76, RMSE of 0.5792 mg/L, and MAE of 0.4168 mg/L for the training set. Likewise, ensemble-BO ranks second with R 2 of 0.75 and RMSE of 0.5846 mg/L. On the contrary, for the forested Grand watershed, all the ML models showed mediocre performance with ANN (R 2 = 0.31 and RMSE = 0.2800 mg/L) being the best performing model regarding the R 2 and RMSE. For the agriculture dominated watersheds (Maumee, Raisin and Sandusky), F-SVM, M-SVM, ANN, RT-BO, ensemble-BO, and GPR-BO again showed acceptable performance, which was lower than that of the Cuyahoga but better than the Grand. For the agricultural watersheds of Maumee, Raisin and Sandusky, ANN was the best performing model followed by GPR-BO regarding the R 2 and RMSE. Among the agricultural watersheds, the model predictability was comparatively higher for the Raisin. It was also observed that all ML models significantly outperforms traditional LR model in predicting nitrate concentration for the training dataset.
A comparison of statistical indicators showed that R 2 , RMSE, and MAE, at times, followed a different trend. For the urban Cuyahoga watershed, all statistical indicators followed the same trend as the GPR-BO has the maximum R 2 and the minimum RMSE and MAE. For the agricultural and forested watersheds, ANN has the maximum R 2 and the minimum RMSE whereas F-SVM has the minimum MAE. Also, reasonable difference between the RMSE and MAE indicates that the variance in the individual errors of the forecast is acceptable.

Model Testing
Model testing was carried out on 30% of the unseen test dataset after the model development. Table 3 illustrates the fitting statistics of different ML models for nitrate concentration prediction on the test dataset. Figures 2-6 show the observed versus predicted daily nitrate concentration using different ML models for the Cuyahoga, Grand, Maumee, Raisin, and Sandusky, respectively. fairly accurate modeling of nitrate concentration with the streamflow and month of the year as independent variables [34].
On the contrary, for the forested Grand watershed, all the ML models showed mediocre performance with RT-BO being the best performing model R 2 of 0.214, RMSE of 0.3236 mg/L, and MAE of 0.2263 mg/L. For the agriculture-dominated watersheds of Maumee, Raisin and Sandusky, F-SVM, M-SVM, ANN, RT-BO, ensemble-BO, and GPR-BO again showed acceptable performance, which is lower than that of the Cuyahoga but better than the Grand. For the agricultural watersheds of Maumee and Raisin, ANN is the best performing model regarding the R 2 and M-SVM is the best performing model regarding RMSE and MAE. Likewise, for the agricultural Sandusky watershed, F-SVM was the best performing model with R 2 of 0.544, RMSE of 1.9639 mg/L, and MAE of 1.4242 mg/L.

Model Development
During the model development for the total phosphorus prediction, Bayesian opti- The value for test statistics is similar for the training as well as the test set. Hence, the developed ML models can predict nitrate concentration without severely underfitting or overfitting the training dataset employing streamflow and month of the year as independent variables.

Model Development
During the model development for the total phosphorus prediction, Bayesian optimized RT, ensemble, and GPR models were also run for 30 iterations. The convergence of objective function in the iterative process is illustrated in Figure 7. For the Cuyahoga watershed, ensemble converged to the minimum objective of 0.01485 after 29th iteration. Hence, BO is more effective in improving ensemble compared to RT and GPR for the Cuyahoga watershed. Likewise, BO is more effective in improving ensemble compared to RT and GPR for Raisin (0.00559) and Sandusky (0.00365). On the contrary, BO is more effective in improving GPR compared to RT and Ensemble for Grand (0.00165) and Maumee (0.00362). The result of Bayesian optimized hyperparameters in RT, ensemble, and GPR models for phosphorus prediction is illustrated in Table 4.     Table 5 illustrates the fitting statistics of different ML models for phosphorus prediction on the training dataset. A comparison of statistical indicators shows that R 2 , RMSE, and MAE, may follow a different trend. For the urban Cuyahoga watershed, M-SVM, RF, ANN, RT-BO, ensemble-BO, and GPR-BO showed acceptable performance in explaining the variance of phosphorus concentration. Among these models, ensemble-BO performs better than other ML models with R 2 of 0.58 and RMSE of 0.1219 mg/L for the training set. Likewise, GPR-BO ranks second with R 2 of 0.56 and RMSE of 0.1237 mg/L. Regarding MAE, M-SVM is the best performing model with MAE of 0.0740 mg/L. Likewise, for the forested Grand watershed, LR, kNN, RF, ANN, RT-BO, ensemble-BO, and GPR-BO performed well in explaining the variance of phosphorus concentration with total suspended solids, streamflow, and month of the year as independent variables. Among these models, GPR-BO performed better than other ML models with The model predictability of ML models was significantly high in predicting phosphorus concentration for the forested and agricultural watersheds. It was also observed that the model predictability of traditional LR model was comparable to that of the ML models in predicting the phosphorus concentration for the training dataset.

Model Testing
Model testing is carried out on the 30% of the unseen test dataset after the model development. Table 6 illustrates the fitting statistics of different ML models for phosphorus prediction on the test dataset. Figures 8-12 show the observed versus predicted daily phosphorus concentration using different ML models for the Cuyahoga, Grand, Maumee, Raisin, and Sandusky, respectively.
For the urban Cuyahoga watershed, LR, M-SVM, RF, ANN, RT-BO, Ensemble-BO, and GPR-BO performed reasonably well in predicting the daily phosphorus concentration for the test data. Among these models, ANN outperformed other models regarding R 2 (0.829) and M-SVM outperforms other models with RMSE of 0.0766 mg/L and MAE of 0.0511 mg/L for the test set. The developed ML models performed exceptionally well in test data in comparison to the training data. Hence, it could be concluded that the ML models for the Cuyahoga was underfitting the training dataset.        Suspended solids are derived from nonpoint sources from agricultural lands and construction sites in the urban areas. Most of the total phosphorus is particulate phosphorus attached to suspended solid particles. In agricultural and forested watershed, increased streamflow can increase soil erosion which can elevate the particulate phosphorus concentration in water column. Hence, in predicting phosphorus concentration, the model predictability is better for agricultural and forested watersheds. On the contrary, in urban watershed, phosphorus inputs mainly originate from point sources such as wastewater treatment plants. Hence, in predicting phosphorus concentration, the model predictability is a bit deteriorated.

Discussion
Various ML models, namely, MLP, radial basis function (RBF), general regression neural network (GRNN), kNN, RF, MLR, evolutionary polynomial regression (EPR), naïve Bayes model (NBM) and many more have been employed to predict groundwater as well as surface water nutrient concentration. In one such study, Al-Mahallawi et al. [6] found that MLP (R 2 = 0.955 and error = 8.4322) with six input nodes and 4 hidden nodes outperformed RBF, GRNN, and other linear networks to predict groundwater nitrate concentration in the Gaza Strip Aquifer. In another study to model groundwater nitrate contamination at the African continent scale, Ouedraogo et al. [8] concluded that the predictive power of RF (R 2 = 0.97) was more than the MLR (R 2 = 0.64).
Furthermore, Markus et al. [35] compared the performance of ANN, EPR, and NBM to predict weekly fluctuations of nitrate concentration in a small agricultural watershed in Illinois. They found that the ANN (RMSE = 0.935) with two hidden nodes was the most accurate. In a more recent study, Li et al. [3] analyzed the performance of MLP, kNN, RF, and reduced error pruning tree (REPTree) to predict nitrate concentration and estimate nutrient loading in different types of watersheds. They concluded that the REPTree was the best performing model with R 2 ranging from 0.61 to 0.85. The classification tree methods (REPTree and RF) performed better than the cluster methods (MLP and kNN) for agricultural and forested watersheds. Shen et al. [36] presented a novel geo-dataset to estimate and map the nitrate and phosphorus concentrations in streams and rivers with models built using a RF. The developed model had R 2 of 0.66 on average.
The model performance is not only determined by the model complexity but also by the land-use practices in the watershed. In comparison to the published literature, the developed ML models are more accurate in predicting nitrate concentration for the urban and agricultural watershed. The ANN is the best performing model with R 2 ranging from 0.479 to 0.745. Likewise, in comparison to the published literature, the developed ML models are more accurate in predicting phosphorus concentration for all type of watersheds. The ensemble-BO is the best performing model with R 2 ranging from 0.709 to 0.878. As a limited number of independent variables are employed in the study, these methods can be applied to predict nutrient concentrations with limited data, which increases the applicability of the developed models. Hence, the ML model must be selected considering the land-use practice alongside algorithmic methods to accurately predict nutrient concentration.

Conclusions
In this study, the performance of nine different ML algorithms (LR, F-SVM, M-SVM, kNN, RF, ANN, RT-BO, ensemble-BO, and GPR-BO) was evaluated to predict nitrate and total phosphorus concentration for different types of watersheds. Initially, the C-Q relationship was analyzed for each watershed to understand the impact of watershed type on the prediction of nutrient concentration. While the nitrate concentration diluted with the increasing streamflow in the urban Cuyahoga watershed, it increased with the streamflow in agricultural and forested watersheds. Similarly, the total phosphorus concentration increased with the streamflow irrespective of the type of watershed.
For nitrate concentration prediction, the land-use distribution affected the model performance for all methods. In urban watersheds, the regular and predictable nature of nitrate concentration results in more accurate modeling with the streamflow and month of the year as independent variables. Likewise, ML models were more accurate in predicting nitrate concentration for the agricultural watershed (Maumee, Raisin, and Sandusky) in comparison to the forested Grand watershed. The ANN outperformed other ML models regarding the R 2 for the urban and agricultural watersheds. On the contrary, for the forested Grand watershed, RT-BO outperformed other ML models. Likewise, the Bayesian optimized RT, ensemble, and GPR consistently yielded good performance for all type of watersheds.
In agricultural and forested watersheds, increased streamflow could increase soil erosion which could elevate the particulate phosphorus concentration in the water column. Hence, in predicting phosphorus concentration, the model predictability was better for agricultural and forested watersheds with the streamflow, total suspended solids, and month of the year as independent variables. On the contrary, in an urban watershed, phosphorus inputs mainly originate from point sources such as wastewater treatment plants. Hence, in predicting phosphorus concentration, the model predictability was a bit deteriorated. For the urban Cuyahoga watershed, the developed ML models were underfitting the training dataset and the ANN appeared to outperform other ML models regarding the R 2 for the test data. On the contrary, ensemble-BO and M-SVM outperformed other ML models in predicting total phosphorus concentration for the agricultural and forested watershed.
In comparison to the published literature, the developed ML models were more accurate in predicting nitrate concentration for the urban and agricultural watershed. The ANN was the best performing model with R 2 ranging from 0.479 to 0.745. Likewise, in comparison to the published literature, the developed ML models were more accurate in predicting phosphorus concentration for all types of watersheds. The ensemble-BO was the best performing model with R 2 ranging from 0.709 to 0.878. As a limited number of independent variables were employed in the study, these methods could be applied to predict nutrient concentrations with limited data, which increased the applicability of the developed models. Regarding the shortcoming of the developed ML models, the model predictability for nitrate concentration in the forested Grand watershed was greatly diminished. Hence, further study is required in this regard to identify additional independent variables to improve the model predictability of ML algorithms.

Supplementary Materials:
The following are available online at https://www.mdpi.com/article/10 .3390/w13213096/s1, Figure S1: Locations of the Cuyahoga, Grand, Maumee, Raisin, and Sandusky watersheds and gaging stations (Source: Earthstar Geographics), Figure S2: C-Q relation for each watershed. Slope of the C-Q relationship 'b' is given alongside the figure. Both axes are in logarithmic scale, Table S1: Characteristics of studied watersheds upstream of the USGS gaging station (streamflow, total suspended solids, nitrate concentration, and total phosphorus concentration data provided by NCWQR at Heidelberg University, Ohio), Table S2: Descriptive statistics of parameters for all watersheds, Table S3: Parameter settings for each ML algorithm to predict Nitrate and Phosphorus concentration, Table S4