Classification and Prediction of Fecal Coliform in Stream Waters Using Decision Trees (DTs) for Upper Green River Watershed, Kentucky, USA

Abstract: The classification of stream waters using parameters such as fecal coliforms into the classes of body contact and recreation, fishing and boating, domestic utilization, and dangerous is a significant practical problem of water quality prediction worldwide. Various statistical and causal approaches are used routinely to solve the problem from a causal modeling perspective. In the current paper, however, a transparent process in the form of Decision Trees is used to shed more light on the structure of input variables such as climate and land use in predicting stream water quality. Decision Tree algorithms such as the classification and regression tree (CART), iterative dichotomiser 3 (ID3), random forest (RF), and ensemble methods such as bagging and boosting are applied to predict and classify the unknown stream water quality behavior from the input variables. Variants of bagging and boosting have also been examined for more effective modeling results. Although the Random Forest, Gradient Boosting, and Extremely Randomized Trees models have been found to yield consistent classification results, DTs with Adaptive Boosting and Bagging gave the best testing accuracies of all the attempted modeling approaches for the classification of fecal coliforms in the Upper Green River watershed, Kentucky, USA. Separately, a discussion of a Decision Support System (DSS) that uses a Decision Tree Classifier (DTC) is provided.


Introduction
Water is a life-sustaining element that determines the establishment and survival of civilizations. The availability of innocuous and valuable quality water for household and industrial activities is essential for economic prosperity. Organizations worldwide, such as the World Health Organization (WHO), have specified quality standards for each natural source of this vital element. In the United States of America, the United States Environmental Protection Agency (USEPA) establishes standards for water quality and undertakes quality control measures. Due to the sporadic nature of rainfall globally, several countries depend on their rivers as a primary source of water. Massive dams are constructed, and the water held is used for variegated purposes, including irrigation, electricity generation, domestic activities, etc. Over the past decade, industrial and human fecal waste deposition into the rivers, owing to burgeoning urbanization, has substantially exacerbated water contamination levels. Human interference has caused imbalances in nature, which, if not regulated, are deleterious and threaten humankind's very existence. Therefore, it is essential to monitor the damages that we have caused.
Water quality assessment is an integral part of environmental engineering. It includes the evaluation of the chemical, biological, and physical characteristics of water. Earlier studies have used Random Forest (RF), Multivariate Adaptive Regression Splines (MARS), and Extreme Gradient Boosting (XGB) regression models in conjunction with hyperspectral data to estimate water quality parameters for inland waters. Jerves Cobo [11] used Decision Tree models for the assessment of microbial pollution in rivers by studying the presence of macroinvertebrates as indicators. This was needed to set up pathogen pollution standards and to review aquatic ecosystem health. Geetha Jenifel and Jemila Rose [12] used recursive partitioning with decision trees and regression trees to predict water quality parameters and found better results than other models, such as linear and support vector machine (SVM) models. They found the decision tree models to be more accurate, practical, reasonable, and acceptable. Ho [13] used the ID3 decision tree model to predict the Water Quality Index (WQI) class for one of Malaysia's most polluted rivers. Sepahvand [14] compared the performances of the M5P model tree and its bagging, Random Forest (RF), and the group method of data handling (GMDH) in the estimation of sodium absorption ratio (SAR). Out of all the models they tried, the bagging M5P model was found to be the most accurate in estimating SAR. This was based on indices such as the correlation coefficient (CC), root mean square error (RMSE), and mean absolute error (MAE). The uncertainty analysis also revealed the accuracy of the bagging M5P model compared to the other models. Lu and Ma [15] used various hybrid Decision Tree methods and ensemble techniques such as Random Forest (RF), CEEMDAN, XGBoost, LSSVM, RBFNN, LSTM, etc., and their combinations for the prediction of water quality parameters of the Tualatin River of Oregon, USA.
Shin [16] predicted chlorophyll-a concentrations in the Nakdong River, Korea using various machine learning (ML) models and found the best results using recurrent neural networks (RNNs). The RNNs performed best when time-lagged memory terms were built into the model for predictions. Mosavi [17] compared the performance of two ensemble decision tree models, boosted regression trees (BRT) and random forest (RF), to predict the hardness of groundwater. More recently, Naloufi [18] used six machine learning (ML) models, including Decision Trees, to predict E. coli concentrations in the Marne River in France. They found the Random Forest model to be the most accurate compared to the other models. Using the results, they were able to come up with the best ML model for sampling optimization. Other recent discussions and applications of Decision Trees in the context of water quality modeling in rivers include studies [19][20][21][22][23][24][25][26][27].
In light of the above literature, the objectives of the current study are: (i) to study the potential and applicability of various Decision Tree (DT) algorithms in the prediction of fecal coliform from causal parameters such as climate (precipitation and temperature) and land use parameters, (ii) to apply CART, ID3, RF, and ensemble methods such as bagging and boosting specifically, and (iii) to suggest a Decision Support System (DSS) based on a Decision Tree Classifier (DTC) for water quality management. The paper is organized as follows. Section 2 describes the study area and the data considered for the present work. Section 3 details the methodology, including the GIS land use analysis, a briefing on Decision Trees for classification, measures of attribute selection and their relation with Decision Trees, and the ensemble methods of bagging and boosting. The results are provided and discussed in Section 4. In the same section, a detailed discussion of the Decision Tree Classifier based Decision Support System (DTCDSS) is provided. In the last section, i.e., Section 5, a concise summary of findings is provided.

Study Area
The watershed under study is situated in the lower southwestern part of Kentucky in the USA. The watershed is called the Upper Green River watershed as the main stem of the Green River and its tributaries flow through it. The watershed consists of Russell, Adair, Taylor, Metcalfe, Green, Barren, Hart, and Edmonson counties. Although the watershed contains various land uses such as deciduous, evergreen, and mixed forest, transitional, low and high intensity residential, commercial, industrial/transportation, open water, pasture or hay, emergent herbaceous wetlands, row crops, and woody wetlands, they can be primarily grouped into three types of land use: urban, forest, and agricultural. The Green River flows through Edmonson, Hart, Green, and Taylor counties and joins the Ohio River downstream of the watershed. Extensive underground karst formations and springs exist between the tributaries of the main stem Green River. The watershed primarily consists of flat-lying limestones, sandstones, and shales forming the karst topography that passes all the surface water through caves and smaller underground passages below the ground surface [28]. The watershed is rated as the fourth most crucial watershed in the United States by the Nature Conservancy and the Natural Heritage Program. It is also the most critical watershed in Kentucky for protecting fish and mussel species.
In this watershed, many rural households are not connected to wastewater treatment plants, and the untreated wastewater is directly discharged to streams and creeks, onto overland plains and soils, or into underground voids. This form of release is known as "straight pipe" discharge. Due to such discharges and failed septic systems, increased fecal coliform bacteria concentrations are found in various portions of the watershed and have essentially impacted the water quality of the Upper Green River basin [28]. The concentration of bacteria is so high that the streams are unsafe for fishing, swimming, and body contact. The storage and carefully timed application of animal manure as a fertilizer has been shown to reduce bacteria entering groundwater and reduce the need for expensive chemical fertilizers.

Data
The minimum and maximum elevation levels of the watershed are at 123.14 m and 497.74 m. The minimum and maximum temperatures in a year are around 10.3 °C and 28.9 °C. The average annual precipitation is found to vary between 1041 mm and 1346 mm. The stream water quality sampling stations are located between the latitudes 36.94° and 37.43° and the longitudes −86.04° and −85.16°. The water quality data is obtained by measuring samples collected monthly from nearly 42 locations along the Green River from May 2002 to October 2002. The stream network with sampling stations is shown in Figure 1, and the land use map of the watershed is shown in Figure 2. The climate parameters, precipitation and temperature, are obtained from the Kentucky Climate Center. The two-day cumulative precipitation at each sampling location is computed by an inverse distance weighted average procedure. The two-day cumulative precipitation and temperature at each sampling station are used along with the land use as inputs into the Decision Tree Classifier model. The six-month period includes a few rainy months and a few non-rainy months.
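The inverse distance weighted averaging used to transfer gauge precipitation to a sampling location can be sketched as follows. The gauge coordinates and two-day precipitation values below are hypothetical, and a distance exponent of 2 is assumed, since the paper does not state the exponent used:

```python
import numpy as np

def idw(sample_xy, gauge_xy, gauge_values, power=2.0):
    """Inverse-distance-weighted average of gauge values at one location."""
    d = np.linalg.norm(gauge_xy - sample_xy, axis=1)
    if np.any(d == 0):               # sample coincides with a gauge
        return float(gauge_values[np.argmin(d)])
    w = 1.0 / d ** power             # closer gauges get larger weights
    return float(np.sum(w * gauge_values) / np.sum(w))

# Hypothetical gauges (lon, lat) and two-day cumulative precipitation (mm)
gauges = np.array([[-86.0, 37.0], [-85.5, 37.2], [-85.2, 37.4]])
precip = np.array([30.0, 42.0, 25.0])
print(idw(np.array([-85.6, 37.1]), gauges, precip))
```

The interpolated value always lies between the smallest and largest gauge values, which is a desirable property for transferring point rainfall to ungauged sampling stations.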

GIS Landuse Analysis
The land use factors developed using GIS analysis [29] essentially indicate the percentage contribution of each land use to the total catchment area or watershed area. The three dominant land use factors are ULUF (urban land use factor), FLUF (forest land use factor), and ALUF (agricultural land use factor). They are given by

ULUF = arcsin(√(Au/At)), FLUF = arcsin(√(Af/At)), ALUF = arcsin(√(Aa/At)),

where Au, Af, and Aa are the urban, forest, and agricultural areas contributing to a sampling station, and At is the total contributing area. Moreover, the above definitions have several advantages as compared to land uses expressed as simple fractions. The arcsine transformation helps in making the land use distribution near-normal or Gaussian. This transformation gives the values in radians proportional to the angle subtended at the center on a pie chart of land uses [25]. The transformation is known to stabilize the variance and scale the proportional data [30][31][32][33].
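Assuming the standard arcsine-square-root transform for proportions described above, the land use factors could be computed as in this sketch; the contributing areas are hypothetical:

```python
import numpy as np

def land_use_factor(area, total_area):
    """Arcsine-square-root transform of a land use proportion (radians)."""
    return float(np.arcsin(np.sqrt(area / total_area)))

# Hypothetical contributing areas (km^2) for one sampling station
total = 250.0
uluf = land_use_factor(20.0, total)    # urban
fluf = land_use_factor(130.0, total)   # forest
aluf = land_use_factor(100.0, total)   # agricultural
print(uluf, fluf, aluf)  # radians, each between 0 and pi/2
```

A proportion of 0 maps to 0 and a proportion of 1 maps to π/2, so the factors of the dominant land uses are the largest.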

Decision Trees
The decision tree algorithm falls in the category of supervised learning methods [34]. The goal of a decision tree algorithm is to use the appropriate property of the input data (Information Gain, Gain Ratio, or Gini Index), depending on the type of decision tree selected, with learning rules to cause splits in the input parameters and give output values as close to the target as possible. Various decision tree models have been devised, which differ in their data splitting methods. Each subsequent model is an improvement over the previous one. A decision tree classifies the input data based on these rules in a top-down fashion. The top node that is assigned all the data is called the root node. A leaf node contains data points that belong to one class or have one specific value, and it demarcates the end of a decision tree branch; further division of the data according to the rules mentioned earlier is not possible there. Suppose a node contains heterogeneous data belonging to two or more classes or having two different output values. In that case, that node can be further split to classify the data into their respective categories. Such nodes that can be further split are called decision nodes. It is the decision node that takes decisions and propagates the tree to give relevant outputs.
Tree-structured regression methods form a hierarchical structure, with the input data being split at the nodes to create smaller subsets until maximum homogeneity is obtained. For example, Y can first be split into {Y | y1 < 35} and {Y | y1 ≥ 35} to engender two nodes with greater homogeneity than the first node. These two nodes can further be split into {Y | y4 ≥ 27} and {Y | y4 < 27} as shown in Figure 3.
Consequently, similar splitting at each node will result in a tree structure with multiple branches. Selecting a particular attribute, and the value of that attribute that determines the splitting of data, is the basis for the nuances among different decision tree models. Classification and Regression Trees (CART), CHAID, ID3, C4.5, and Random Forest are a few to name. For training and testing purposes, the six-month data set covering forty-two locations spread over the stream network is randomly divided into 70% for training and 30% for testing using the scikit-learn library of the Python programming language. The same training set and testing set are used to examine the classification accuracy of all the decision tree models.
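A minimal sketch of the 70/30 split and a CART-style classifier with scikit-learn follows. Since the watershed data are not reproduced here, a synthetic stand-in of the same shape (252 rows, roughly 42 stations × 6 months, with five features) is generated, and the class labels are hypothetical:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# Synthetic stand-in for the watershed data: columns P, T, ULUF, FLUF, ALUF
X = rng.random((252, 5))
# Synthetic 4-class target loosely driven by "precipitation" (column 0)
y = np.digitize(X[:, 0] + 0.1 * rng.standard_normal(252), [0.25, 0.5, 0.75])

# 70/30 random split, as used in the study
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42)

cart = DecisionTreeClassifier(criterion="gini", random_state=42)  # CART-style
cart.fit(X_train, y_train)
print(f"test accuracy: {cart.score(X_test, y_test):.3f}")
```

Using a fixed `random_state` keeps the split reproducible, so all models in a comparison see the same training and testing sets.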


Entropy
Entropy indicates the degree of randomness in a given input set. A branch with zero entropy is chosen to be a leaf node. If the entropy is not equal to zero, the branch is further split. The entropy, E(S), measured in "bits", is given by

E(S) = − Σ_{i=1}^{K} p_i log2(p_i)

where p_i is the percentage of class i in the node, or the probability, and the index i runs from 1 to K, the number of classes or attributes. The process of splitting an attribute is continued until the entropy of the resulting subsets is less than that of the previous input or training set, eventually leading to leaf nodes of zero entropy. Minimization of entropy is desired as it reduces the number of rules of the Decision Trees. The lowering of entropy essentially leads to Decision Trees with fewer branches. The entropy is defined as an information-theoretic measure of the 'uncertainty' present in the dataset due to multiple classes [35].
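A small illustration of the entropy computation, using hypothetical class labels:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

print(entropy(["safe"] * 8))                   # a pure node has zero entropy
print(entropy(["safe"] * 4 + ["danger"] * 4))  # an even 2-class split: 1 bit
```

A node that contains only one class is already a leaf, while a 50/50 mixture of two classes carries the maximum one bit of uncertainty for a binary split.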

Information Gain
The information gain of an input attribute gives its relation with the target output. A higher information gain suggests that the parameter can separate the training data following the target output to a greater extent. Information gain is inversely related to the entropy: the higher the randomness in a set of inputs, the lower the information gain. The ID3 model [36] uses Information Gain as its attribute selection measure.
The information gain of splitting a set S on an attribute A is given by

Gain(S, A) = E(S) − Σ_{j=1}^{K} (|S_j|/|S|) E(S_j)    (5)

where the index j runs from 1 to the K possible subsets S_j produced by the split. Maximizing the information gain essentially leads to minimization of entropy for that particular attribute. In the above Equation (5), the first term on the right-hand side is fixed; it is the entropy at the beginning. The attribute selected for splitting first is the one for which the second term on the right-hand side is minimal, resulting in the maximization of information gain for that particular attribute.
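Attribute selection by information gain can be illustrated with a toy split; the labels and subsets below are hypothetical:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, subsets):
    """E(parent) minus the size-weighted entropy of the child subsets."""
    n = len(parent)
    weighted = sum(len(s) / n * entropy(s) for s in subsets)
    return entropy(parent) - weighted

parent = ["low"] * 5 + ["high"] * 5
# A split that separates the two classes perfectly recovers the full 1 bit
perfect = information_gain(parent, [["low"] * 5, ["high"] * 5])
print(perfect)  # -> 1.0
```

A split that leaves the class mixture unchanged yields zero gain, so the splitting attribute is chosen to minimize the weighted child entropy.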

Gini Index
The Gini index is obtained by subtracting the sum of squares of the individual class probabilities from one. A lower Gini index value indicates higher homogeneity. The CART algorithm uses the Gini Index to create splits in data [37]. The Gini index at a node is given by

Gini = 1 − Σ_{i=1}^{K} p_i^2

where p_i is the percentage of class i in the node, and the index i runs from 1 to K, the number of classes. It measures the "impurity" of a dataset. It takes values from a minimum of zero to a maximum of (1 − 1/K). In the attribute selection process of Decision Tree modeling, the attribute selected is the one for which there is the largest reduction in the value of the Gini index. It turns out that the reduction of the Gini index is essentially accompanied by a lowering of entropy.
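A short sketch of the Gini impurity, showing the zero minimum for a pure node and the (1 − 1/K) maximum for K equally likely classes; the labels are hypothetical:

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

print(gini(["a"] * 10))            # pure node -> 0.0
print(gini(["a", "b", "c", "d"]))  # 4 equally likely classes -> 1 - 1/4 = 0.75
```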

Gain Ratio
C4.5, an improvised version of ID3, uses the gain ratio to create splits in the input data. The Gain Ratio removes the bias that exists in calculating the information gain of an input parameter: information gain prefers a parameter with a large number of input values. To neutralize this, the Gain Ratio divides the Information Gain by the Split Information, which accounts for the number and size of the branches that would result from the split.
The Gain Ratio is given by

Gain Ratio(S, A) = Gain(S, A) / Split Information(S, A), with Split Information(S, A) = − Σ_{j=1}^{K} w_j log2(w_j)

where the index j runs from 1 to the K possible nonempty branches, and w_j is the fraction of samples falling in branch j. The lower the Split Information, the higher the value of the Gain Ratio. The Information Gain is essentially modified by the diversity and distribution of attribute values into the quantity known as the Gain Ratio.
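A sketch of the gain ratio under the definitions above, with hypothetical labels; w_j is taken as the fraction of samples in branch j:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(parent, subsets):
    """Information gain normalized by the split information of the branches."""
    n = len(parent)
    gain = entropy(parent) - sum(len(s) / n * entropy(s) for s in subsets)
    split_info = -sum(len(s) / n * math.log2(len(s) / n) for s in subsets)
    return gain / split_info if split_info else 0.0

parent = ["low"] * 4 + ["high"] * 4
# Two even branches: split information is 1 bit, so gain ratio equals the gain
print(gain_ratio(parent, [["low"] * 4, ["high"] * 4]))
```

A many-way split into tiny branches inflates the split information, which shrinks the gain ratio and penalizes attributes with many distinct values.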

Bagging and Boosting
Bagging, or Bootstrap aggregation, proposed by [38], is a technique used with regression and classification methods to decrease the variance and improve prediction accuracy. It is a simple technique in which several bootstrap samples are drawn from the input data, and a prediction is made with the same prediction method for each sample. The results are merged by averaging (regression) or simple voting (classification), approximating the result of applying the same prediction method to the original input data but with reduced variance. All the bootstrap samples have the same size as the original data. The sampling is done with replacement, because of which a few instances/samples are repeated and a few are omitted. The stability of the base classifier of each bootstrap sample essentially determines the performance of bagging. Since all the samples are equally likely to get aggregated, bagging does not suffer from issues of overfitting and works well with noisy data. Thus, the focus on a specific sample of training data is removed.
Boosting, similar to bagging, is a sample-based approach to improve the accuracy of classification and regression models; however, unlike bagging, which uses a direct averaging of individual sample results, boosting uses a weighted average method to reduce the overall prediction variance. All the samples are initialized with equal weights, and the weights are then updated with each boosting classification round. The weights of samples that are harder to classify are increased, and the weights are decreased for the samples that are correctly classified. This ensures that the boosting algorithm focuses on the samples that are harder to classify as the iterations increase. All the base classifiers of each boosting round are aggregated to obtain the final ensemble boosting classification. The fundamentals of bagging and boosting can be found in [39].
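The two ensemble schemes can be sketched with scikit-learn on synthetic stand-in data (the watershed samples are not reproduced here); the base learner depth and ensemble sizes are illustrative choices, not the study's settings:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic 4-class data as a stand-in for the watershed samples
X, y = make_classification(n_samples=252, n_features=5, n_informative=3,
                           n_classes=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

# Bagging: majority vote over trees fit on bootstrap resamples
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                        random_state=0).fit(X_tr, y_tr)
# Boosting: reweight hard-to-classify samples each round
boost = AdaBoostClassifier(DecisionTreeClassifier(max_depth=3),
                           n_estimators=100, random_state=0).fit(X_tr, y_tr)
print(f"bagging:  {bag.score(X_te, y_te):.3f}")
print(f"boosting: {boost.score(X_te, y_te):.3f}")
```

Bagging trains its trees independently on resamples, while boosting trains them sequentially on reweighted data; the two therefore trade variance reduction against bias reduction in different ways.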

Overview
Results obtained using decision trees are discussed in this section. A preliminary statistical analysis is first performed to study the statistics of the experimental data. The results from the initial statistical analysis are given in Table 1. The correlation structure of all the input water sample parameters and the fecal coliform is given by the correlation heatmap shown in Figure 4.
The agriculture land-use factor (ALUF or a) is highly and negatively correlated with the forest land use factor (FLUF or f). FLUF also has a strong negative correlation with the urban land use factor (ULUF or u). An interesting inference from the correlation heatmap is the heavy positive correlation between precipitation and fecal coliform (please see Figure 4). A few scatter plots that reveal the relation between the input variables and the fecal coliform individually essentially display the results of the correlation map, i.e., Figure 4, and are not reproduced here. This makes precipitation the most significant variable in determining our output using the decision tree method, and this is evident from the CART and ID3 diagrams given a little later. The precipitation values are obtained at each sampling location by interpolating the two-day cumulative precipitations at all the gauges in the watershed. The rainfall causes the surface water to flow over the watershed's overland planes consisting of land uses (predominantly urban, forest, and agricultural) and eventually join the tributaries and main stem of the Green River. The various parts of the overland planes of the watershed contribute surface water flows, which enter the tributaries first and then the main stem of the Green River before reaching the watershed outlet. Based on watershed characteristics and time of concentration studies, the two-day cumulative and interpolated precipitation values are more suitable drivers of fecal coliform concentrations than other precipitation measures at all the sampling sites [29]. The positive correlation of microbial indicators such as fecal coliform bacteria with precipitation/rainfall/wet weather conditions is in agreement with other studies, such as those of [40][41][42][43][44] for rivers and bays and [45,46] for lakes.
The DTs are best suited for large datasets; however, an attempt is made in the current study because the multi-dimensional feature space (five independent variables) is available for monthly instances of data at forty-two locations collected over the six-month period. The DTs are expected to work better for shorter data sets and fewer features as the intrinsic data complexities are reduced. In the present scenario, data limitations on the time frame are offset by the multi-dimensional feature space for varied spatial locations. The reduction of input dimension has been looked into for the same dataset using principal component analysis (PCA), canonical correlation analysis (CCA), and artificial neural networks (ANNs) elsewhere [47]. The authors have found comprehensive predictions using all the spatial parameters, such as land uses, and the temporal climate parameters. The histogram of fecal coliform is given in Figure 5. The variability of fecal coliform is large, as is evident from the high standard deviation and the histogram.


Results from Decision Tree Models
The precipitation, temperature, land use data, and experimentally measured fecal coliform values are used to formulate decision tree models. Several different decision tree models are developed for fecal coliform analysis. All input parameters (precipitation or P, temperature or T, ULUF or u, FLUF or f, ALUF or a) are used to create decision tree models with different data split methods. Precipitation was the single most crucial hydrological input parameter to determine fecal coliform. The correlation values obtained earlier also indicate similar results. 70% of the dataset is used as the training set, and the remaining 30% is used as the testing set. For CART, an accuracy of 63.05% was obtained in the training phase, and 60.29% was obtained in the testing phase. An accuracy of 62.22% was obtained on the entire data set. In CART decision tree modeling, the data is split based on the attribute with the lowest Gini index at the root node in a top-down process, with the subsequent splitting of data on attributes of increasing Gini indices till we reach leaf nodes. In this way, the impurity or the uncertainty in the data is minimized with recursive partitioning of the data [37]. Similar results were obtained for the ID3 model. Accuracies of 61.78%, 61.76%, and 61.77% were obtained in the training phase, testing phase, and on the entire data set, respectively. In the ID3 model, the feature with the maximum information gain or smallest entropy is used to split the data at the root node first and then at the subsequent nodes till we reach leaf nodes. The least entropy corresponds to the features with the least uncertainty or randomness in the data [36]. Both DTs, CART and ID3, belong to the family of Top-Down Induction of Decision Trees (TDIDT). CART performs slightly better in training than ID3, and ID3 performs slightly better in testing than CART. However, the overall performance of CART is slightly better than that of ID3.
CART and ID3 models were improved by augmenting the simple models with bagging and boosting methods. The highest test set accuracy was obtained for the CART model with adaptive boosting: accuracies of 81.53%, 72.06%, and 78.67% were obtained in the training phase, testing phase, and on the entire dataset, respectively. The bagging and adaptive boosting of CART and ID3 perform much better than the simple (without bagging and adaptive boosting) CART and ID3 models. Though bagging of ID3 results in the largest training accuracy among the simple and ensemble models of CART and ID3, the adaptive boosting of CART gives the largest testing accuracy among the same models. However, the overall accuracy of the bagging ID3 model is the highest among the simple and ensemble models of CART and ID3. Apart from the CART and ID3 models, Random Forest was also implemented on the experimental dataset to predict the fecal coliform density or concentration. The Random Forest model gives an accuracy of 98.7% on the training set, 64.7% on the testing set, and 88.4% on the overall dataset. The Random Forest model is built by creating an ensemble of a large number of decision trees for classification and then predicting the mode or average/mean of all the individual decision tree classification results. The more uncorrelated the individual decision trees are, the better the final prediction or outcome [48]. The individual trees are grown on sub-samples drawn randomly from the original data with replacement. The trees are grown to the largest extent possible for classification, without pruning. For the Random Forest model to be effective, the features selected in the sub-sample trees need to be informative in classification rather than pure, random-guessing features. The Random Forest model outperforms decision trees such as CART and ID3, but its testing accuracy is slightly lower than that of gradient boosted trees, extremely randomized trees, and DTs with bagging and adaptive boosting.
A few other models, such as extremely randomized trees, were also implemented, and the accuracy results of all models are summarized in Table 2 below, where ID3-Bagg is ID3 with Bagging and ID3-AB is ID3 with adaptive boosting. The extremely randomized trees (also known as "Extra Trees") give the fourth-best testing accuracy of 66.17% and the second-best overall accuracy of 88.89% of all the models. While the Random Forest model uses subsamples with replacement, the extremely randomized trees use the whole input sample. Also, while Random Forest opts for the optimum split to select cut points, the extremely randomized trees go for random cut points. The extremely randomized trees are faster and reduce both bias and variance due to the usage of the original input sample and random split points [49]. The Gradient Boosting model (GBM) gives the third-best testing accuracy of 69.12% and the best overall accuracy of 89.33% of all the models. In the Gradient Boosting model, a loss function such as the mean square error is minimized with the help of the gradient descent principle and an ensemble of weak learners, which eventually make correct predictions and become a strong learner tree [50]. The GBM, ERT, and RF perform better than the simple and ensemble models of CART and ID3 in training and overall accuracies; the ensemble models of CART and ID3 are slightly better in testing accuracies. This could also be due to possible overtraining of the Decision Tree models in the case of GBM, ERT, and RF. Further, optimal pruning of the trees may result in higher training, testing, and overall accuracies of GBM, ERT, and RF than the simple and ensemble models of CART and ID3. The accuracy of a decision tree model is given by the number of correct predictions made divided by the total number of predictions. Here, the prediction is the class to which the water sample belongs. These results are presented in Figure 6.
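The three ensembles compared above can be fit side by side with scikit-learn on synthetic stand-in data; the hyperparameters shown are illustrative defaults, not the study's settings:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (ExtraTreesClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import train_test_split

# Synthetic 4-class stand-in for the watershed samples
X, y = make_classification(n_samples=252, n_features=5, n_informative=3,
                           n_classes=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

models = {
    "RF":  RandomForestClassifier(n_estimators=200, random_state=0),
    "ERT": ExtraTreesClassifier(n_estimators=200, random_state=0),
    "GBM": GradientBoostingClassifier(random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(f"{name}: train {model.score(X_tr, y_tr):.3f}, "
          f"test {model.score(X_te, y_te):.3f}")
```

Reporting both training and testing scores, as in Table 2, makes the overtraining mentioned above visible as a large gap between the two.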
We have used various decision tree algorithms for classification rather than regression on our current data set. For classification, the target variable, i.e., fecal coliform (FC), was divided into four classes following the United States Environmental Protection Agency (USEPA) recommendations (given in Table 3).
The results classified into the four classes or categories, namely body contact and recreation (BCR), fishing and boating, domestic utilization, and dangerous, are presented for all decision tree algorithms in Figure 7, which also shows the number of samples in each class. From Figure 6, we can see that high values of precision, recall, and F1-score are obtained for Random Forest, CART with adaptive boosting, ID3 with adaptive boosting, and Extremely Randomized Trees for the supports shown for all four classes. The definitions of accuracy, precision, recall, and F1-score are given by:

Accuracy = (TP + TN)/(TP + TN + FP + FN)
Precision = TP/(TP + FP)
Recall = TP/(TP + FN)
F1-score = 2 × (Precision × Recall)/(Precision + Recall)

where TP = True Positive, TN = True Negative, FP = False Positive, FN = False Negative, and support is the number of occurrences of each class in the ground-truth (correct) target values.
This means that the above four algorithms can correctly separate the positive samples from the negative samples for each respective class, can recall all of their positive samples, and that both of these abilities are weighted equally in the F1-score. The next-best values of precision, recall, and F1-score are obtained for the CART with bagging and ID3 with bagging algorithms. Simple CART and ID3 yielded lower precision, recall, and F1-score than the other algorithms discussed above for each support class. Although not presented here, accuracies were also highest for the Extremely Randomized Trees and Random Forest algorithms, slightly lower for CART and ID3 with bagging and boosting, and lowest for the simple CART and ID3 algorithms.
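The metric definitions above can be written out directly for one class of a multi-class problem, using one-vs-rest counts (the counts below are illustrative values, not the paper's results):

```python
def metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1-score from confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)          # of predicted positives, how many correct
    recall = tp / (tp + fn)             # of actual positives, how many found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, precision, recall, f1

# Hypothetical counts for one class out of 225 samples.
acc, prec, rec, f1 = metrics(tp=40, tn=160, fp=10, fn=15)
print(acc, prec, rec, f1)
```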

CART with Bagging and Adaptive Boosting
The CART algorithm uses Gini impurity to split data and form a binary classification tree. Implementation of the CART algorithm on our data set results in a tree with four levels (please see Figure 8).
Level 1 has the root node. Levels 2 and 3 contain the decision nodes, and level 4 has the leaf nodes with the data split into classes. The tree's root node, containing all 225 data points of our data set, has been split based on the precipitation input variable (p), since p is the variable of highest significance. The decision nodes at level 2 are split using the forest land use factor (f) and the temperature (t) input variables, since these nodes represent the points of highest Gini impurity. At level 3, we get our first leaf node of class "Dangerous," with eight samples falling into this category (p ≤ 0.308, f ≥ 0.763). The final level contains the leaf nodes that classify the data into the specified classes. The class with the highest number of samples is fishing (156 samples), followed by BCR (51 samples); the classes "Domestic" and "Dangerous" have 5 and 13 samples, respectively, giving a total of 225 samples in our data set. The CART algorithm is a moderately accurate method for classifying our data set, giving an accuracy of 63.05% on the training set and 62.22% on the entire data set. However, improving the model using bagging and boosting methods yields even higher accuracies. From Table 2, CART with adaptive boosting gives the best testing accuracy of all the decision trees. The adaptive boosting method combines several weak classifiers into a strong classifier through iterative decision tree modeling: at each iteration, misclassified samples receive higher weights so that subsequent weak classifiers concentrate on them, and the weighted weak classifiers are combined into a strong ensemble classifier at the end [51]. From Table 2, we can also see that CART with bagging yields the second-best testing accuracy. Bagging simply means "bootstrap aggregating": implementing CART with bagging creates many random sub-samples with replacement, trains a CART model on each sample, and then aggregates the predictions over all the samples [38].
We can see that the ensemble predictions of bagging and boosting of the CART model are better than the simple CART model results.


ID3 with Bagging and Adaptive Boosting
Like the CART algorithm, which uses Gini impurity to form splits in the data set, the ID3 decision tree utilizes information gain and entropy. Implementation of the ID3 algorithm on our data set also yields a tree with four levels (please see Figure 9). Again, since "p" is the most significant variable, the root node is split using the "Precipitation" input variable. The decision nodes in levels 2 and 3 are split so as to maximize the information gain, i.e., to reduce the entropy (uncertainty) at each split. Level 4 of the ID3 decision tree has the dataset classified into the four pre-defined classes.
Similar to the CART algorithm results, the class with the highest number of samples is "Fishing," followed by "BCR," "Dangerous," and "Domestic." The accuracies obtained for the training data set and the overall dataset are 61.78% and 61.77%, respectively, slightly lower than those obtained by the CART algorithm. From Table 2, we can see that ID3 with adaptive boosting gives comparable results to CART with adaptive boosting. AdaBoost is an iterative procedure with no replacement. It generates a strong ensemble classifier by putting higher weights on misclassified samples and lower weights on correctly classified ones, reducing both bias and variance in the model; for this reason, it is often called the "best out-of-the-box classifier." The second-best testing accuracy among all attempted decision tree models is obtained using this method with ID3. The results for ID3 with bagging can also be seen in Table 2; bagging creates many independent bootstrap samples, associates a weak learner with each, and finally aggregates them to produce an averaged ensemble prediction with lower variance. This method also yields the second-best testing accuracy of all the decision tree models. We can see that the bagging and boosting ensemble methods improve the testing accuracy compared to simple ID3.
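The entropy and information-gain criterion that drives ID3's splits, as described above, can be made concrete with a small worked example (toy labels, not the watershed data):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label set, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, left, right):
    """Parent entropy minus the size-weighted entropy of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

# A candidate split that separates the two classes perfectly: the children
# are pure, so the gain equals the full parent entropy.
parent = ["Fishing"] * 6 + ["BCR"] * 2
left, right = ["Fishing"] * 6, ["BCR"] * 2
print(round(information_gain(parent, left, right), 4))
```

ID3 evaluates every candidate split this way and takes the one with the largest gain, which is exactly the "reduce entropy at each split" behavior described above.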

Decision Tree Classifier Based Decision Support System (DTCDSS)
A Decision Support System (DSS) can be built from the above results of decision tree models, as shown in Figure 10.

The input parameters and the sample water quality observations come from the input databases and are fed to the decision tree classifier (DTC). The decision tree models discussed above classify the water quality data based on the input parameters and the respective decision tree algorithms. The results are stored in the output database as "water quality predictions/formulae/bounds," organized according to the classes of the US EPA water usage classification. From the output classifications, one can identify under- or over-reporting issues and check against maximum contamination levels (MCLs); in this process, the US EPA data warehouse system helps compare the output water quality parameters with the stipulated MCLs. The inner working of any of the above-discussed decision tree classifiers is shown in the flow chart of Figure 11.
Firstly, the DTC receives the input data from the input databases, such as climate, land use, and water quality data. The entropy or information gain, gain ratio, or Gini index is computed depending on the particular decision tree model chosen in the DTC. For example, if the decision tree in the DTC is ID3, then the entropy and information gain are computed; if it is C4.5, then the gain ratio is computed; and if it is CART, then the Gini index is computed. The input parameters and the output parameter of a data sample are presented at the root node first. The tree is then split based on the decision of an "if-else" statement that minimizes the heterogeneity of the data, or equivalently increases its homogeneity. The tree branches keep growing as new data samples are added at the root node, and gradually the leaf nodes are formed. At every level of the tree, the entropy or information gain, gain ratio, or Gini index is computed so that the data can be split and traversed down to the leaf nodes. Thus, the DTC classifies the data, or makes a decision, into four output classes at the leaf nodes, namely, body contact and recreation, fishing and boating, domestic utilization, and dangerous. The model performance can be computed using metrics such as accuracy, precision, recall, and F1-score. The particular DT model in the classifier can be any one of CART, ID3, or C4.5, with their bagging and boosting variants, Random Forest, or Extremely Randomized Trees. The output performance of the DSS can be specific to the DT model chosen and could also be data sensitive; the best DT model for the DSS can be fixed only by experimenting with the above-stated decision tree models on data from several watersheds with stream networks of varied conditions. Suppose we replace the DTC with a Decision Tree Regressor (DTR).
In that case, we will be able to predict the output parameter, such as fecal coliform, from the input parameters, such as climate and land use. The output parameter predictions can also be further generalized or recast using the regression formulae obtained from the DTR and the output parameter bounds. The current Decision Support System is an improved version of the DSS discussed in [29]: the Artificial Neural Network (ANN) model is replaced by the DTC to suit the current problem of classifying stream waters into four classes. The performance of the DSS using a DTR, and the comparison of DTR and ANN models, are beyond the scope of the current work and are pursued elsewhere as separate research studies. Comparing the output parameter predictions with the respective MCLs within the DSS framework establishes the suitability of the stream waters broadly as "safe" or "unsafe," thus supporting a helpful decision. Figure 11 shows the flow chart of a decision tree classifier model.
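The DTC stage of the DSS flow described above can be sketched end to end: inputs go through a trained decision tree, predictions are mapped to the four usage classes, and each sample is flagged safe or unsafe. The data, tree settings, and the safe/unsafe rule here are all assumptions for illustration; only the class names follow the paper.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# The four US EPA usage classes used throughout the paper.
CLASSES = ["BCR", "Fishing", "Domestic", "Dangerous"]

# Synthetic stand-in for the climate/land-use/water-quality input databases.
X, y = make_classification(n_samples=225, n_features=5, n_informative=4,
                           n_redundant=0, n_classes=4, random_state=4)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=4)

dtc = DecisionTreeClassifier(max_depth=4, random_state=4).fit(X_tr, y_tr)
for pred in dtc.predict(X_te)[:5]:
    label = CLASSES[pred]
    # Assumed decision rule: only the "Dangerous" class is flagged unsafe;
    # a real DSS would compare predictions against the stipulated MCLs.
    print(label, "unsafe" if label == "Dangerous" else "safe")
```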

Conclusions
The classification abilities of Decision Trees such as CART, ID3, and RF, and of ensemble methods such as bagging and boosting, are utilized for the classification and prediction of Fecal Coliform (FC) into four classes in this study. The variable with maximum information gain and gain ratio in the case of the ID3 model, and the variable with minimum Gini impurity in the case of the CART model, is selected at the root node, and the same criterion is applied further down the tree until the leaf nodes, so that the DT algorithms achieve the best classification in terms of maximum accuracy. The algorithms perform comparably well with each other, Random Forest being the most consistent overall in the classification of Fecal Coliform for the Upper Green River watershed; it performs better than CART and ID3 in all phases, i.e., training, testing, and overall. Gradient Boosting and Extremely Randomized Trees are the other DT algorithms that show accuracies comparable to Random Forest in the training and testing phases. The CART decision tree with Adaptive Boosting yielded the best testing accuracy, while CART with Bagging, and ID3 with Bagging and Adaptive Boosting, yielded comparable second-best testing accuracies out of all the decision tree modeling attempts. There is no guarantee that exactly the same feature or attribute will be chosen at each node of the resulting tree across the various DT algorithms. There is also no guarantee that the classification accuracy of the proposed classifier will remain high, as it needs to be tested on a variety of water quality parameters from different watersheds under changing climates. Also, being greedy at each step/node may not ensure overall minimization of entropy or global optimization of the classification process. In the present work, the authors have focused only on the classification capabilities of the Decision Trees for this particular watershed/dataset, in the training and testing phases only.
The size of the data set was one limitation, because of which the authors could not perform cross-validation. The depth of successful trees is essentially governed by maximizing the information gain, or minimizing the entropy (randomness), at every level. It is generally found that shorter trees have better classification capabilities than more extended trees [52] (Mitchell, 1997), owing to less overtraining and therefore more successful generalization. From the above discussion of results, the following salient conclusions can be made: (i) The Decision Trees of Gradient Boosting (GB), Extremely Randomized Trees (ERT), and RF perform better than the simple (without bagging and boosting) ID3 and CART models in training, testing, and overall. (ii) The bagging and adaptive boosting Decision Trees of CART and ID3 significantly improve performance over the simple CART and ID3 models. (iii) The bagging and adaptive boosting Decision Trees of CART and ID3 perform slightly better than GB, ERT, and RF in testing; however, the training and overall accuracies of GB, ERT, and RF are better than those of all the CART and ID3 models (including bagging and adaptive boosting). (iv) The Decision Tree models of GB, ERT, and RF are more consistent than the other models in training, testing, and overall accuracies. (v) Overtraining the trees increases training accuracy at the expense of testing accuracy.
A judicious choice needs to be made in pruning the trees, so that an optimal balance of training, testing, and overall accuracies is obtained.
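The pruning trade-off in conclusion (v) and the remark above can be sketched with cost-complexity pruning: the tree is grown fully and then cut back with increasing ccp_alpha, trading training accuracy for generalization. The data here is an assumed synthetic stand-in for the watershed dataset.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=225, n_features=5, n_informative=4,
                           n_redundant=0, n_classes=4, random_state=5)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=5)

# Effective alphas at which subtrees of the fully grown tree get pruned away.
path = DecisionTreeClassifier(random_state=5).cost_complexity_pruning_path(X_tr, y_tr)

# Larger alpha -> smaller tree -> lower training accuracy, and (up to a
# point) better generalization on the test set.
for alpha in path.ccp_alphas[::5]:
    tree = DecisionTreeClassifier(ccp_alpha=alpha, random_state=5).fit(X_tr, y_tr)
    print(round(float(alpha), 4), round(tree.score(X_tr, y_tr), 3),
          round(tree.score(X_te, y_te), 3))
```

Scanning the printed rows for the alpha with the best test score is one judicious way to "cut down" the tree as suggested above.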