3.1. The Stream Water Quality and Temperature Simulated by SWAT
SWAT simulated the stream water quality concentrations (T-N, NH4, NO3, T-P, and PO4) and water temperature (WT) at the 86 AEH sites. These were then used as input variables for the Random Forest algorithm.
Figure 3 shows the annual changes (2008–2015) in precipitation, total runoff, water quality concentrations, and water temperature for the whole watershed, broken down by season. The eight-year average concentrations of T-N, NH4, and NO3 were 1.78, 0.12, and 0.25 mg/L, respectively. Nitrogen-related concentrations were higher in spring than in fall because paddy rice irrigation in South Korea begins in mid-May, together with fertilizer application. The T-N and NO3 concentrations peaked at 2.52 and 0.84 mg/L in spring 2014 (Figure 3b) because stream discharge decreased during the severe drought (Figure 3a). The average concentrations of T-P and PO4 were 0.015 and 0.0092 mg/L, respectively. The difference in T-P concentration between spring and fall was larger in 2010 and 2011 than in the other six years (Figure 3c). The high T-P in fall 2010 and spring 2011 was caused by higher rainfall than in other years. The resulting watershed runoff carried particulate-dominant phosphorus, attached to suspended solids, into the stream. The average water temperature was 17.43 °C in spring and 19.47 °C in fall, with no significant difference (Figure 3d).
As seen in Figure 4 and Figure 1d, NH4 concentrations were high in the rice paddy areas along the stream. NO3 concentrations were high in the urbanized areas of the downstream watershed and in the highland agricultural areas of the upstream watershed. Reflecting the NH4 and NO3 patterns, T-N concentrations generally increased toward the downstream end of the watershed. T-P and PO4 concentrations were high in two kinds of area: the urbanized areas, owing to sewage discharges, and the agricultural areas along the downstream part of the watershed. Water temperature was higher in the southern areas of the watershed (Figure 4).
3.2. Performance of Random Forest Classification Algorithm
The data were split into training and test sets at a ratio of 7:3 for the Random Forest application. For training, the AEH index grades (FAI, TDI, and BMI) served as the target classes. The input variables were the water quality concentrations (T-N, T-P, NH4, NO3, and PO4) and water temperature during the AEH observation periods. For testing, the AEH index grades predicted from the water quality concentrations and water temperature were compared with the observed grades to evaluate the performance of the Random Forest classifier.
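The 7:3 split can be sketched in plain Python as below. This is a minimal illustration, not the authors' code. The function name `split_train_test`, the random seed, and the placeholder records are assumptions, and the text does not say whether the split was stratified by grade. In scikit-learn, the same split is usually obtained with `train_test_split(X, y, test_size=0.3)`, which also accepts a `stratify` argument.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_test(
    records: Sequence[T], train_ratio: float = 0.7, seed: int = 42
) -> Tuple[List[T], List[T]]:
    """Shuffle the records and split them into training and test subsets.

    train_ratio=0.7 gives the 7:3 ratio; the seed (an assumed value) only
    makes the shuffle reproducible.
    """
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must lie strictly between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_ratio)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical records: (input features, observed AEH grade). Placeholder values only.
samples = [({"NH4": 0.01 * i, "WT": 15.0 + i % 5}, "ABCDE"[i % 5]) for i in range(100)]
train, test = split_train_test(samples)
print(len(train), len(test))  # 70 30
```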
Feature importance is calculated as the decrease in node impurity produced by splits on each variable. A variable with higher feature importance contributes more to the reduction of node impurity and is therefore more important for the classification.
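The impurity decrease can be illustrated for a single split. The sketch below assumes Gini impurity, which is the default `criterion="gini"` of scikit-learn's `RandomForestClassifier`; that class reports impurity-based importance in its `feature_importances_` attribute. The text does not state which impurity measure was used, and the function names and example values are hypothetical. In a forest, each node's decrease is weighted by the fraction of training samples reaching that node, summed over all nodes that split on the variable, averaged over the trees, and normalized.

```python
from collections import Counter
from typing import Sequence


def gini(labels: Sequence[str]) -> float:
    """Gini impurity, 1 - sum_k p_k**2, of a collection of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(
    values: Sequence[float], labels: Sequence[str], threshold: float
) -> float:
    """Gini decrease at one node when it splits on `value <= threshold`.

    The children's impurities are weighted by their share of the node's samples.
    """
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    n = len(labels)
    children = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - children


# Hypothetical node: NH4 (mg/L) values and AEH grades of four sites.
nh4 = [0.05, 0.08, 0.30, 0.45]
grades = ["A", "A", "D", "D"]
print(impurity_decrease(nh4, grades, threshold=0.1))  # 0.5 (a perfect split)
print(impurity_decrease(nh4, grades, threshold=0.5))  # 0.0 (no separation)
```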
Figure 5 shows the feature importance of the input variables, where the number prefixed to each variable indicates the month. Over the observation periods, NH4 in spring and fall showed relatively higher importance than the other variables. This indicates that NH4 had a greater influence on the FAI, TDI, and BMI classification. T-P in April and PO4 in April and October were important for BMI. Water temperature showed only a weak relationship with the AEH indices, as discussed by Woo [48].
Table 2 is the classification report of precision (P), recall (R), F1 score, and support (S) for the AEH index grades. Precision, also known as the positive predictive value, is the fraction of entries predicted to belong to a class that actually belong to it. Recall, also called sensitivity, is the fraction of entries actually belonging to a class that are predicted to belong to it. The F1 score is the harmonic mean of P and R, with a best value of 1 and a worst value of 0. Support is the number of actual instances of each class [49].
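Written in terms of a confusion matrix, the four quantities take the standard forms below. The notation is introduced here for illustration and is not the paper's own. C_ij is the number of test entries whose actual grade is i and whose predicted grade is j, and k indexes the grades.

$$P_k = \frac{C_{kk}}{\sum_{i} C_{ik}}, \qquad R_k = \frac{C_{kk}}{\sum_{j} C_{kj}}, \qquad F1_k = \frac{2\,P_k R_k}{P_k + R_k}, \qquad S_k = \sum_{j} C_{kj}$$

As a check against the values reported for grade E of spring FAI, P = 0.50 and R = 0.38 give F1 = 2(0.50)(0.38)/(0.50 + 0.38) = 0.38/0.88 ≈ 0.43, which matches the reported F1.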
In spring, the average F1 scores of FAI, TDI, and BMI were 0.42, 0.48, and 0.62, respectively; in fall, they were 0.45, 0.40, and 0.58. BMI showed the highest P, R, and F1, together with the largest S, in both spring and fall. For all three AEH indices, P, R, and F1 tended to decrease as S decreased. Grade E of spring FAI had an S of only 8. Even so, its P, R, and F1 (0.50, 0.38, and 0.43) were high compared with those of grades B, C, and D, which had S values of 50, 52, and 30. This is explained by the strong relationship between water quality concentrations and the AEH indices for grade E [48]. From Table 2, we infer that P, R, and F1 for grades B, C, and D can be improved with S greater than 30.
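Per-grade values such as those in Table 2 can be recomputed from a confusion matrix. The sketch below uses a hypothetical helper, `grade_report`, and the 3 × 3 matrix is made up for illustration. The text does not say whether the "average F1 score" is an unweighted (macro) or a support-weighted mean. The sketch uses the macro mean; scikit-learn's `classification_report` prints both, as "macro avg" and "weighted avg".

```python
from typing import Dict, Sequence


def grade_report(
    matrix: Sequence[Sequence[int]], grades: Sequence[str]
) -> Dict[str, Dict[str, float]]:
    """Precision, recall, F1 and support for each grade.

    matrix[i][j] counts entries whose actual grade is grades[i] and whose
    predicted grade is grades[j]. A zero denominator yields a score of 0.
    """
    k = len(grades)
    report: Dict[str, Dict[str, float]] = {}
    for c in range(k):
        hits = matrix[c][c]
        predicted = sum(matrix[row][c] for row in range(k))  # column sum
        support = sum(matrix[c])  # row sum = actual instances of the grade
        p = hits / predicted if predicted else 0.0
        r = hits / support if support else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        report[grades[c]] = {"precision": p, "recall": r, "f1": f1, "support": support}
    return report


def macro_f1(report: Dict[str, Dict[str, float]]) -> float:
    """Unweighted mean of the per-grade F1 scores."""
    return sum(row["f1"] for row in report.values()) / len(report)


# Hypothetical 3-grade confusion matrix (rows: actual, columns: predicted).
cm = [[8, 2, 0],
      [1, 6, 3],
      [0, 2, 8]]
rep = grade_report(cm, ["A", "B", "C"])
print(f"{macro_f1(rep):.2f}")  # 0.73
```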
The confusion matrix shows how many entries of each actual grade the Random Forest classifier assigned to each grade. It is used to analyze misclassified grades, to compute the overall and per-class accuracy, and to compare predicted values with the actual data set [50]. The diagonal of the matrix counts entries whose predicted grade matches the actual ecological status, and it is used to calculate the accuracy of the algorithm [51]. Entries in the upper right part of the matrix, above the diagonal, were predicted to be lower (worse) than the actual grade. Entries in the lower left part, below the diagonal, were predicted to be higher (better) than the actual grade.
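The interpretation of the two triangles can be expressed as two sums. This hypothetical helper assumes that rows hold actual grades and columns hold predicted grades, both ordered from A to E, which is the layout implied by the reading of the triangles above. The matrix is invented for illustration.

```python
from typing import Sequence, Tuple


def triangle_sums(matrix: Sequence[Sequence[int]]) -> Tuple[int, int]:
    """Return (upper_right, lower_left) sums of a square confusion matrix.

    Assumed layout: rows are actual grades, columns are predicted grades,
    both ordered A (best) to E (worst). Upper-right cells (column > row)
    count entries predicted worse than their actual grade; lower-left cells
    (row > column) count entries predicted better than their actual grade.
    """
    n = len(matrix)
    upper = sum(matrix[i][j] for i in range(n) for j in range(i + 1, n))
    lower = sum(matrix[i][j] for i in range(n) for j in range(i))
    return upper, lower


# Hypothetical 5 x 5 matrix for grades A to E (not the values of Table 3).
cm = [[3, 1, 0, 0, 0],
      [1, 5, 2, 0, 0],
      [0, 2, 6, 1, 1],
      [0, 0, 1, 4, 1],
      [0, 0, 2, 1, 3]]
upper, lower = triangle_sums(cm)
print(upper, lower)  # 6 7
```

A lower-left sum larger than the upper-right sum, as in this made-up example, means that the classifier more often predicted a better grade than the observed one.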
Table 3 shows the confusion matrices of the Random Forest classification results for the test set. For spring FAI, TDI, and BMI, the misclassified entries predicted lower than the actual grade (upper right of the diagonal) summed to 43, 58, and 40. Those predicted higher than the actual grade (lower left of the diagonal) summed to 71, 84, and 55, respectively. Most misclassified entries differed from the actual grade by one grade. The exceptions were fall TDI grade E predicted as C (22 entries) and spring BMI grade C predicted as E (11 entries). The upper-right sums (43, 58, and 40) were smaller than the lower-left sums (71, 84, and 59), so the Random Forest classifier tended to rate the stream environment as being in better ecological status than it actually was. For fall FAI, TDI, and BMI, the entries predicted lower than the actual grade summed to 37, 85, and 37, and those predicted higher summed to 70, 70, and 67, respectively. TDI had the poorest predictions in both seasons, with the largest misclassification sums of 84 in spring and 85 in fall. Besides water quality and temperature, TDI prediction may require other factors that affect trophic diatoms, such as the physical characteristics of the bed material and the channel slope.
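One way to check whether most errors were off by one grade is to group the off-diagonal counts by their distance from the diagonal. The helper name and the matrix are hypothetical. The sketch assumes the same layout, with rows as actual grades, columns as predicted grades, and grades ordered A to E.

```python
from collections import Counter
from typing import Dict, Sequence


def errors_by_grade_distance(matrix: Sequence[Sequence[int]]) -> Dict[int, int]:
    """Count misclassified entries by how many grades they are off by."""
    counts: Counter = Counter()
    for i, row in enumerate(matrix):
        for j, n in enumerate(row):
            if i != j and n:
                counts[abs(i - j)] += n
    return dict(sorted(counts.items()))


# Hypothetical 5 x 5 matrix for grades A to E (not the values of Table 3).
cm = [[3, 1, 0, 0, 0],
      [1, 5, 2, 0, 0],
      [0, 2, 6, 1, 1],
      [0, 0, 1, 4, 1],
      [0, 0, 2, 1, 3]]
print(errors_by_grade_distance(cm))  # {1: 10, 2: 3}
```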
3.3. Evaluation of the AEH Index for the Whole Watershed Streams
The AEH of the whole watershed was assessed by applying the Random Forest algorithms trained on the 86 sub-watersheds to all 237 sub-watersheds, using the SWAT results as inputs.
Table 4 and Table 5 show the number of sub-watersheds assigned to each grade of the three AEH indices, and Figure 6 and Figure 7 show the annual AEH grades. Grade A of FAI and BMI occurs in the upstream watershed. TDI grade A does not occur anywhere in the watershed, and the TDI grades from B to E varied considerably from year to year. This suggests that TDI is sensitive to water quality, especially to NH4 concentration, which had higher feature importance than the other variables.
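Counts like those in Tables 4 and 5 are a tally of predicted grades over the sub-watersheds. A minimal sketch follows, assuming a hypothetical mapping from sub-watershed ID to predicted grade. Grades that no sub-watershed receives, such as TDI grade A, are reported as zero rather than omitted.

```python
from collections import Counter
from typing import Dict, Mapping

GRADES = "ABCDE"  # A = best ecological status, E = worst


def grade_counts(predicted: Mapping[int, str]) -> Dict[str, int]:
    """Number of sub-watersheds assigned to each grade, zeros included."""
    tally = Counter(predicted.values())
    return {g: tally.get(g, 0) for g in GRADES}


# Hypothetical predictions for five sub-watersheds (IDs and grades are made up).
example_grades = {1: "C", 2: "E", 3: "D", 4: "E", 5: "B"}
print(grade_counts(example_grades))  # {'A': 0, 'B': 1, 'C': 1, 'D': 1, 'E': 2}
```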
As seen in Table 4 and Table 5, the spring and fall AEH of all three indices in 2011 shifted to lower grades compared with other years. The figures show the poorer grades spreading spatially. In spring, FAI shifted from grades A and B to C, TDI from grades C and D to E, and BMI from grade D to E. In fall, FAI shifted from grade A to B and C, and BMI from grade A to B. As shown in Table 1, rainfall and runoff in spring 2011 were greater than in other years. This may be the main cause of the grade degradation, through increased pollutant discharges from agricultural areas along the stream and from urbanized areas in the downstream watershed. For fall 2011, the heavy rainfall in July may have contributed to the degradation of FAI and BMI.
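The year-to-year degradation can be quantified by comparing two grade maps, one sub-watershed at a time. This is a hypothetical sketch: the IDs and grades are invented, and a "worse" grade is taken to be one that comes later in the sequence A to E.

```python
from collections import Counter
from typing import Dict, Mapping, Tuple

GRADES = "ABCDE"  # A = best ecological status, E = worst


def degradations(
    before: Mapping[int, str], after: Mapping[int, str]
) -> Dict[Tuple[str, str], int]:
    """Count sub-watersheds whose grade became worse, keyed by (old, new)."""
    moves: Counter = Counter()
    for sid, old in before.items():
        new = after.get(sid)
        if new is not None and GRADES.index(new) > GRADES.index(old):
            moves[(old, new)] += 1
    return dict(moves)


# Hypothetical grade maps for a typical year and for a wet year such as 2011.
typical = {1: "A", 2: "A", 3: "B", 4: "C", 5: "D"}
wet_year = {1: "B", 2: "C", 3: "C", 4: "C", 5: "E"}
print(degradations(typical, wet_year))
# {('A', 'B'): 1, ('A', 'C'): 1, ('B', 'C'): 1, ('D', 'E'): 1}
```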