A Machine Learning Approach for Estimating the Trophic State of Urban Waters Based on Remote Sensing and Environmental Factors

Abstract: To improve the accuracy of remotely sensed estimates of the trophic state index (TSI) of inland urban water bodies, key environmental factors (water temperature and wind field) were considered during the modelling process. Such environmental factors can be easily measured and display a strong correlation with TSI. A backpropagation neural network (BP-NN) was then applied to develop the TSI estimation model using remote sensing and environmental factors. The model was trained and validated using the TSI quantified by five water trophic indicators obtained for the period between 2018 and 2019, and the most appropriate combination of input variables was selected according to the performance of the BP-NN. Our results demonstrate that the optimal performance can be obtained by combining the water temperature and single-band reflection values of Sentinel-2 satellite imagery as input variables (R² = 0.922, RMSE = 3.256, MAPE = 2.494%, and classification accuracy rate = 86.364%). Finally, the spatial and temporal distribution of the aquatic trophic state over four months with different trophic levels was mapped in Gongqingcheng City using the TSI estimation model. In general, the predictive maps based on our proposed model show significant seasonal changes and spatial characteristics in the water trophic state, indicating the possibility of performing cost-effective, RS-based TSI estimation studies on complex urban water bodies elsewhere.


Introduction
As urbanization and industrialization accelerate on both local and global scales, nutrients such as nitrogen and phosphorus are discharged into urban lakes and rivers, leading to water eutrophication [1,2]. Algal blooms may occur in eutrophic water bodies as a result. Such bloom events can severely impact the health of the public and the local ecosystem [1,2]. Since eutrophication became a significant hydro-environmental problem worldwide in the 20th century [3,4], the trophic state index (TSI) has been proposed and widely applied as a means of quantifying and thereby managing the trophic state of aquatic systems [5]. TSI integrates multiple trophic indicators including chlorophyll-a (Chl-a), total phosphorus (TP), total nitrogen (TN), Secchi depth (SD), and the permanganate index (CODMn). Traditionally, these trophic indicators could only be measured and monitored through fieldwork and manual data acquisition, which is often expensive. More recently, however, satellite remote sensing (RS) technology has improved and become more accessible. RS is now extensively used to monitor and evaluate the trophic states of water bodies [6,7], offering broader coverage, faster acquisition, and lower cost than fieldwork [8-12].
Numerous studies have been conducted to explore various approaches to estimating and modelling the TSI through RS data. These studies can be divided into two categories: those using analytical methods and those using empirical methods. The former category relies on radiative transfer in the water column and requires considerable volumes of data on the spectral properties of optically active water constituents [6]. For example, Shi [13] ...

Study Area
Gongqingcheng City is a hilly, lakeside city located in northern Jiangxi Province (29°09′-29°19′ N and 115°44′-115°58′ E), adjacent to the northwestern shoreline of Poyang Lake, the largest freshwater lake in China. Gongqingcheng City's main water bodies include the Boyang River, Nan Lake, and a branch of Poyang Lake. In addition, there are 19 dikes (with a total length of 71.3 km) and 21 reservoirs (with a total storage capacity of 9.6385 million m³). Gongqingcheng City has a subtropical, humid, monsoon climate with strongly seasonal hydrological characteristics. Average annual temperature and rainfall are 16.7 °C and 1395.6 mm, respectively [38]. Southerly winds prevail in the summer whilst northerly winds prevail through the rest of the year.
Before large-scale urbanization began in 2010, the city's water bodies were ecologically robust. In recent years, however, because of the expansion of industry, agriculture and fisheries, large volumes of urban sewage have been discharged into lakes and rivers. As a result, these water bodies have been experiencing increasingly severe eutrophication [39].
This research focuses on the period from 2018 to 2019, during which an environmental and ecological restoration program was launched in Gongqingcheng City. This program focused on the city's water bodies and involved bank slope management, the dredging of rivers and lakes, and the restoration of water system connectivity.

Field Data
Monthly field measurements were conducted from November 2018 to October 2019. In order to representatively gauge the environmental characteristics and the trophic state of urban water bodies throughout the research area, a total of 18 sampling sites were identified in lakes, reservoirs, rivers and wetlands. The distribution of all sample sites is shown in Figure 1. Water samples were collected from 50 cm below the surface, and the concentrations of TP, CODMn, TN, and Chl-a were obtained by laboratory analysis within 48 h of the sample being taken. Simultaneously, SD and water temperature (WT) were measured during in situ surveys.

Figure 1. Location of the study area and the distribution of (a) the first eight sampling sites, located in reservoirs, rivers and independent lakes, and (b) the other ten sampling sites, distributed across the whole of Nanhu Lake.

Satellite Data
'Sentinel' is a series of satellites launched by the European Commission (EC) and the European Space Agency (ESA) under the Copernicus program to meet specific Earth observation requirements. Sentinel-2 (S-2) consists of two satellites (2A and 2B) with a phase difference of 180°, enabling a revisit period of only five days. The high spatial resolution of the Multi-Spectral Instrument (MSI) carried by the S-2 satellites (13 spectral bands in the range of 400-2400 nm with 10, 20 and 60 m spatial resolution) makes them suitable for identifying and monitoring urban water bodies.
In order to estimate the TSI in an urban context, nine MSI images covering Gongqingcheng City from November 2018 to October 2019 were downloaded from the ESA Data Hub, each acquired less than three days from a sampling date. The product level of the MSI images is L1C, a top-of-atmosphere reflectance product with ortho-correction and geometric refinement at the sub-pixel level. Atmospherically corrected bottom-of-atmosphere reflectance, along with a scene classification map (L2A products), can be obtained from the L1C data using the Sen2Cor processing software provided by ESA.

Meteorological Data
Open-source meteorological data corresponding to sampling dates were acquired from NOAA's National Climatic Data Center. Variables of interest included air temperature (T), wind direction (WD) and wind speed (WS). The meteorological monitoring station closest to Gongqingcheng City is No. 585060 (29°34′ N and 115°58′ E). Precipitation was not considered in our model because neither data collection nor RS image capture was possible on cloudy or rainy days.

Framework for TSI Estimation Model
As shown in Figure 2, the framework of the TSI estimation model was based on a machine learning algorithm using S-2 data and environmental data as input variables. We set up four combination patterns of environmental and RS factors, selecting the most appropriate one by means of an accuracy assessment and a mean impact value (MIV) assessment. The final model, developed thus, was used to map the trophic state of urban waters.

Quantification of Trophic State
In this study, the trophic state of urban water bodies was quantified using a comprehensive evaluation method based on the trophic state index (TSI). This approach was proposed by the National Environmental Monitoring Center (NEMC) [40] and has been widely used in China for studying urban water trophic levels [41,42]. The method weighs the contributions of five trophic indicators: Chl-a, TP, TN, SD and CODMn. The weight coefficients were obtained from the correlation between Chl-a and the other parameters, drawing on the statistical results of a survey of 26 major lakes in China. The component expression for TP is:
$$\mathrm{TSI(TP)} = 10\,(9.436 + 1.624 \ln \mathrm{TP})$$
where the unit of Chl-a is mg/m³, the units of TP, TN, and CODMn are mg/L, and SD is the Secchi depth in m. The larger the TSI, the higher the load of trophic indicators and hence the higher the degree of eutrophication. The specific classifications were as follows: oligotrophic (TSI < 30), mesotrophic (30 ≤ TSI < 50), light eutrophic (50 ≤ TSI < 60), middle eutrophic (60 ≤ TSI < 70) and hypereutrophic (TSI ≥ 70).
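The weighted index can be sketched in Python as follows. Only the TP expression and the class boundaries come from the text. The other four component expressions, and the correlations with Chl-a used to derive the weights, are the values commonly quoted for the Chinese comprehensive trophic level index; they are assumptions here and should be checked against [40] before use. All function and variable names are hypothetical.

```python
import math

# (a, b) in TSI(j) = 10 * (a + b * ln(x_j)).
# Only the TP pair appears in the text; the other pairs are ASSUMED values.
COEFFS = {
    "chla": (2.5, 1.086),     # Chl-a in mg/m^3 (assumed)
    "tp": (9.436, 1.624),     # TP in mg/L (from the text)
    "tn": (5.453, 1.694),     # TN in mg/L (assumed)
    "sd": (5.118, -1.94),     # SD in m (assumed)
    "codmn": (0.109, 2.661),  # CODMn in mg/L (assumed)
}

# ASSUMED correlation of each indicator with Chl-a, used to derive weights.
R_WITH_CHLA = {"chla": 1.0, "tp": 0.84, "tn": 0.82, "sd": -0.83, "codmn": 0.83}


def weights():
    """Weight of each indicator: its squared correlation, normalised to sum to 1."""
    total = sum(r * r for r in R_WITH_CHLA.values())
    return {name: r * r / total for name, r in R_WITH_CHLA.items()}


def component_tsi(name, value):
    """Single-indicator TSI for a positive measured value."""
    a, b = COEFFS[name]
    return 10.0 * (a + b * math.log(value))


def comprehensive_tsi(sample):
    """Weighted sum of the five component indices for one water sample."""
    w = weights()
    return sum(w[name] * component_tsi(name, sample[name]) for name in COEFFS)


def classify(tsi):
    """Trophic class, using the boundaries listed in the text."""
    if tsi < 30:
        return "oligotrophic"
    if tsi < 50:
        return "mesotrophic"
    if tsi < 60:
        return "light eutrophic"
    if tsi < 70:
        return "middle eutrophic"
    return "hypereutrophic"
```

A call such as `classify(comprehensive_tsi({"chla": 20.0, "tp": 0.1, "tn": 1.5, "sd": 0.6, "codmn": 5.0}))` returns the class for one sample; the input values in this call are illustrative, not measurements from the study.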

TSI Outlier Handling
Hydrogeological datasets often include substantial deviations, with a large number of points being designated 'outliers' as a result of errors in data measurement, transmission or transcription [43]. It is essential to ensure the reliability and accuracy of the original dataset prior to use. This can be achieved through data cleaning, especially in cases of data mining and machine learning where large sample sizes are standard [44].
Here, we use the inter-quartile range (IQR) rule to identify TSI outliers, because the rule does not depend on, or assume, a Gaussian data distribution. The IQR is defined as the difference between the third and first quartiles, and elements lying more than 1.5 IQR above the third quartile (Q3), or more than 1.5 IQR below the first quartile (Q1), are treated as outliers:
$$\mathrm{IQR} = Q_3 - Q_1$$
$$x > Q_3 + 1.5\,\mathrm{IQR} \quad \text{or} \quad x < Q_1 - 1.5\,\mathrm{IQR}$$
where Q1 and Q3 are the first and third quartiles of the sorted estimations, respectively.
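The IQR rule can be sketched in a few lines of Python. The text does not say which quartile convention was used, so the "inclusive" interpolation of `statistics.quantiles` is an assumption, and the function name is hypothetical.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Split values into (kept, flagged) using the k * IQR fences.

    The 'inclusive' quartile method is an assumption; other conventions
    can move the fences slightly for small samples.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if lower <= v <= upper]
    flagged = [v for v in values if v < lower or v > upper]
    return kept, flagged
```

For example, `iqr_outliers([1, 2, 3, 4, 5, 6, 7, 8, 100])` has Q1 = 3 and Q3 = 7, so the fences are -3 and 13 and only the value 100 is flagged.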

Preprocessing of RS Images
Image preprocessing was performed using the SNAP and ENVI software packages [14]. This involved subsetting, resampling, reprojection, and the removal of cloud pixels. The region of interest (ROI) was defined as the administrative area of Gongqingcheng City. Geographically appropriate S-2 products were then resampled to 20 m resolution using a 'bicubic' interpolation method for up-sampling and a 'median' method for down-sampling. To avoid cloud interference, cloud pixels in the resampled images were removed by setting the threshold of the cloud confidence index to 20.

Extracting Water Bodies
In the case of S-2 satellite images, the Normalized Difference Water Index (NDWI), calculated using the remote sensing reflectance (Rrs) from Band 3 (B3) and Band 8a (B8a), is an appropriate index for identifying open water bodies. The NDWI algorithm was developed by McFeeters [45] as a means of measuring water surface extent and can be defined as:
$$\mathrm{NDWI} = \frac{R_{Green} - R_{Nir}}{R_{Green} + R_{Nir}}$$
where R_Green and R_Nir are the Rrs of B3 and B8a (from the S-2 data), respectively. During the MSI data collection process, the measurable reflectance of water surfaces is affected by nearby structures, such as bridges and passing ships, and by factors such as water disturbance [46]. This is known as the adjacency effect, and it was an important consideration when analyzing the water grids corresponding to in situ sampling points [28]. In this research, the nine-grid method was used to correct pixel points identified by the NDWI algorithm (Figure 3). This involved searching the pixels located within a 3 × 3 window of each MSI image centered on the sampling point, extracting the water pixels amongst them, and calculating the mean values of those water pixels. If no water pixels were found within the search window, the in situ point was deleted.
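The NDWI test and the nine-grid averaging can be sketched as below. Bands are plain nested lists indexed as `band[row][col]`. The text does not state the NDWI cut-off, so the threshold of 0 used to call a pixel "water" is an assumption, and the function names are hypothetical.

```python
def ndwi(green, nir):
    """Normalized Difference Water Index for one pixel (B3 and B8a reflectance)."""
    denom = green + nir
    return 0.0 if denom == 0 else (green - nir) / denom


def nine_grid_mean(band, green, nir, row, col, threshold=0.0):
    """Mean of `band` over the water pixels in the 3 x 3 window around (row, col).

    A pixel counts as water when its NDWI exceeds `threshold` (assumed value).
    Returns None when the window holds no water pixel, in which case the
    in situ point would be discarded.
    """
    values = []
    for r in range(row - 1, row + 2):
        for c in range(col - 1, col + 2):
            inside = 0 <= r < len(band) and 0 <= c < len(band[0])
            if inside and ndwi(green[r][c], nir[r][c]) > threshold:
                values.append(band[r][c])
    return sum(values) / len(values) if values else None
```

Calling `nine_grid_mean` once per band and per sampling point yields the reflectance vector that is matched with the field TSI value.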

Selection of Environmental Factors
Four environmental factors were considered in this study: WT, T, WD and WS. First, the Pearson correlation coefficients between TSI, trophic indicators and these environmental factors were calculated to predict the influence these variables may have on the water body trophic state.
As shown in Figure 4, both WT and T were strongly associated with TSI, with similar correlation coefficients. These two environmental factors also displayed strong correlations with TP and CODMn, but no significant correlations with TN and Chl-a. WT and T were therefore considered suitable for TSI estimation. In addition, there were no significant correlations between (1) WD and TSI and (2) WD and the trophic indicators, except CODMn. However, the correlation coefficient between WD and Chl-a was the highest amongst all the environmental factors, so WD was retained as an alternative input variable. There were no significant correlations between WS and either TSI or any of the trophic indicators, which may be attributable to the low levels of WS during the field survey. Therefore, WS was excluded from the estimation model. The correlation coefficients between the independent variables were, however, high, especially in the case of WT and T (correlation coefficient = 0.94), indicating serious collinearity between the independent variables. Collinear variables should not be input simultaneously, so as to avoid distorting the estimates of the prediction model.
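The screening step described above amounts to computing pairwise Pearson coefficients and flagging pairs of candidate inputs that are too strongly correlated to be used together. A minimal sketch follows; the cut-off of 0.9 is an assumption (the text reports only that WT and T reached 0.94), and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def collinear_pairs(columns, limit=0.9):
    """Return (name_a, name_b, r) for every pair with |r| >= limit.

    `columns` maps a variable name to its list of observations;
    `limit` is an assumed cut-off, not a value from the study.
    """
    names = list(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= limit:
                pairs.append((a, b, r))
    return pairs
```

Any pair returned by `collinear_pairs` would be entered into the model one variable at a time rather than together.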

TSI Estimation Model Based on Backpropagation Neural Network
Considering the complexity of trophic mechanisms within the aqueous environment, it is difficult to explain the relationships that exist between the TSI and the numerous influencing factors used in this study. For this reason, we introduced the BP-NN into the TSI estimation model. BP-NN is a multi-layer feedforward neural network, the main characteristics of which are forward propagation of the signal and backpropagation of the error [47].
The network structure of the BP-NN in this study is composed of three layers: the input layer, a hidden layer with a hyperbolic tangent activation function, and a linear output layer. The environmental factors and RS factors were included in the input layer and the TSI in the output layer. The maximum number of training iterations was set to 5000, while other parameters were left at their default values. According to the universal approximation theorem [48], as long as the number of hidden layer nodes is appropriately defined within reasonable limits, a three-layer NN can be effectively applied to a wide range of problems [49]. The number of hidden layer nodes in this study was therefore optimized by means of a test analysis, run in order to obtain the optimal fitting results. We then set up four combination patterns of input variables (Table 1) representative of different water conditions, and defined the optimal combination based on the accuracy assessment of the BP-NN output.
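The three-layer structure described above (inputs, tanh hidden layer, single linear output) can be illustrated with a small pure-Python network trained by per-sample gradient descent. This is only a sketch: the study does not name its toolbox or training algorithm, so the weight initialisation, learning rate, squared-error loss and plain gradient descent used here are assumptions, as are all function names.

```python
import math
import random


def init_network(n_in, n_hidden, seed=0):
    """Random small weights for an n_in -> n_hidden (tanh) -> 1 (linear) network."""
    rng = random.Random(seed)
    return {
        "w1": [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def forward(net, x):
    """Forward pass: returns the hidden activations and the scalar output."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(net["w1"], net["b1"])]
    out = sum(w * h for w, h in zip(net["w2"], hidden)) + net["b2"]
    return hidden, out


def train(net, xs, ys, epochs=5000, lr=0.01):
    """Backpropagate the squared error, updating weights after every sample."""
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            hidden, out = forward(net, x)
            err = out - y  # derivative of 0.5 * (out - y)^2 w.r.t. out
            for j, h in enumerate(hidden):
                # Error signal at hidden node j; tanh'(z) = 1 - tanh(z)^2.
                grad_h = err * net["w2"][j] * (1.0 - h * h)
                net["w2"][j] -= lr * err * h
                for i, xi in enumerate(x):
                    net["w1"][j][i] -= lr * grad_h * xi
                net["b1"][j] -= lr * grad_h
            net["b2"] -= lr * err
    return net


def mse(net, xs, ys):
    """Mean squared error of the network over a data set."""
    return sum((forward(net, x)[1] - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
```

In practice the inputs (band reflectances and WT) and the TSI target would be scaled to a comparable range before training, and the hidden-layer size would be chosen by comparing validation error across candidate values.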

Mean Impact Value
The mean impact value (MIV) was used to determine the influence of each input neuron on the output neuron of the BP-NN model. The calculation proceeded as follows [50]. After network training had been concluded, each input variable in the training samples was, in turn, increased and decreased by 10% of its original value, forming two new sample sets. The trained network was then applied to both sample sets, yielding two simulation results. The difference between the two simulation results, averaged over the samples, represented the influence of that input variable on the TSI.
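The perturbation procedure can be written generically for any trained model exposed as a function. In the sketch below, `predict` stands for the trained network and is a hypothetical callable taking one input vector and returning one TSI estimate; the sign convention (result at +10% minus result at -10%) is an assumption.

```python
def mean_impact_values(predict, samples, delta=0.10):
    """MIV of each input variable for a trained model.

    predict: callable mapping one input vector (list of floats) to a float.
    samples: training input vectors.
    delta:   relative perturbation, 10% as in the text.
    """
    n_vars = len(samples[0])
    mivs = []
    for j in range(n_vars):
        diffs = []
        for x in samples:
            up, down = list(x), list(x)
            up[j] = x[j] * (1.0 + delta)    # variable j raised by 10%
            down[j] = x[j] * (1.0 - delta)  # variable j lowered by 10%
            diffs.append(predict(up) - predict(down))
        mivs.append(sum(diffs) / len(diffs))
    return mivs
```

The sign of each value shows the direction of the effect on the estimate, and the absolute value is used to rank the inputs by importance.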

Assessment of the Accuracy of the Model
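The accuracy of the developed model was measured through the coefficient of determination (R²), root mean square error (RMSE), mean absolute percentage error (MAPE) and classification accuracy rate (CAR). These four parameters were calculated as follows:
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - f_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - f_i)^2}$$
$$\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - f_i}{y_i} \right|$$
$$\mathrm{CAR} = \frac{100\%}{n} \sum_{i=1}^{n} I\big(C(y_i) = C(f_i)\big)$$
where y is the value of TSI measured using in situ data, \bar{y} is its mean, f is the estimated TSI value, n is the number of all samples, C(·) is the classification based on TSI and I(·) is the indicator function.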
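The four measures can be computed together as sketched below. The sketch assumes that R² is taken as one minus the ratio of residual to total sum of squares and that the class boundaries are those of the trophic classification given earlier (30, 50, 60, 70); the function names are hypothetical.

```python
import math


def trophic_class(tsi):
    """Class index 0..4 (oligotrophic .. hypereutrophic) from the TSI boundaries."""
    return sum(tsi >= bound for bound in (30, 50, 60, 70))


def accuracy_metrics(measured, estimated):
    """Return R^2, RMSE, MAPE (%) and CAR (%) for paired TSI values."""
    n = len(measured)
    mean_y = sum(measured) / n
    sse = sum((y - f) ** 2 for y, f in zip(measured, estimated))
    sst = sum((y - mean_y) ** 2 for y in measured)
    same_class = sum(trophic_class(y) == trophic_class(f)
                     for y, f in zip(measured, estimated))
    return {
        "r2": 1.0 - sse / sst,
        "rmse": math.sqrt(sse / n),
        "mape": 100.0 / n * sum(abs((y - f) / y) for y, f in zip(measured, estimated)),
        "car": 100.0 * same_class / n,
    }
```

For measured values [50, 60, 70] and estimates [52, 58, 71], the squared errors sum to 9 against a total sum of squares of 200, giving R² = 0.955 and RMSE = √3; the second estimate falls in a different class from its measurement, so CAR is two out of three.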

TSI Level and S-2 Spectral Characteristics
Because the study area was obscured by thick clouds throughout January, February and April 2019, all S-2 imagery acquired during these three months was omitted from this study. Subsequent to data preprocessing, TSI results for 110 samples were obtained for the region for the period from 2018 to 2019 (Figure 5). Owing to the high trophic load of the shallow urban lakes and their lack of water fluidity, eutrophication was severe, with the average TSI fluctuating between 58 and 80. Additionally, the TSI of the studied urban water bodies exhibited pronounced seasonal heterogeneity. Temperatures rose in spring and remained high throughout summer, providing favorable conditions for phytoplankton growth. During this annual warm period, the TSI increased monthly, peaking at over 80 (severe eutrophication) in August. In addition to seasonal differences, TSI also showed spatial heterogeneity among reservoirs, rivers and lakes. Specifically, the reservoir represented by sampling point 1# was in the mesotrophic state, with a TSI of around 44 during the study period. The river-type urban waters, represented by sampling point 7#, were mainly in the light eutrophic state, with an average TSI of around 59. Most of the urban lakes exhibited eutrophication to varying degrees.
In order to explore the differences in spectral characteristics between water trophic states, typical spectra for the S-2 MSI bands were selected for each of the four TSI grades, ranging from mesotrophic to hypereutrophic (Figure 6). The mesotrophic curve was obtained using measurements taken from the reservoir within the study area, where water quality was still high. The middle eutrophic curve and the first hypereutrophic curve corresponded to the same urban water body in the months of July and September, respectively. The last two hypereutrophic curves describe the effluents of a sewage treatment plant in the months of August and October, respectively. For eutrophic water, a significant peak was observed in B3 and B5 and a slight reflectance trough can be seen in B4, where B3 is the minimum absorption band of chlorophyll, B4 can reflect the chlorophyll fluorescence effect, and B5 is the special vegetation red edge band of the S-2 products. Those bands are also sensitive to algal blooms [35]. The reflectance of the deep reservoir shown by the blue curve is relatively low, resulting in an indistinct spectral feature.

Figure 6. Typical Rrs spectra for S-2 multi-spectral instrument (MSI) bands describing urban waters with different trophic states.

Comparison of the Performances of the TSI Estimation Model with Environmental Factors
During this study, it was observed that the error tended to stabilize when the number of hidden layer nodes for each test was set to 12 (Table A1). Distributions of both the TSI estimated from the BP-NN model and the measured TSI values are shown in Figure 7. Initially, the model ran with only RS factors (Figure 7a), but its accuracy improved significantly following the addition of environmental factors. The CAR of the model combined with the T factor (Figure 7b) improved by 22% when compared with the original model. Among the three combinations of environmental factors utilized, the TSI estimation model combined with WT (Figure 7d) had the highest accuracy and determination coefficient (R² = 0.977, RMSE = 3.256, MAPE = 2.494%). This combination also achieved the highest accuracy for trophic classification (CAR = 86.364%). However, the accuracy of the model combined with WT and WD (Figure 7c) decreased, underestimating a few scattered points in the light and middle eutrophic states and showing a deviation from the tropic line. This may have been the result of environmental variable redundancy, reducing the predictive capability of the model. From the above results we can deduce that WT is more suitable as an input variable for the estimation of TSI than T or WD. The significance of WT with regard to the water trophic state is discussed in Section 5.
We further analyzed the ability of the WT-Rrs combined model to generalize across the different trophic levels of water bodies. As shown in Figure 8, the model produced a high estimation accuracy when it identified areas of light and middle eutrophication (50 < TSI < 70), even when the model was run without environmental factor inputs. For mesotrophic and hypereutrophic water bodies (TSI < 50 or TSI > 70), the estimation model produced more significant estimation and classification errors. The WT-Rrs combined estimation model demonstrated a higher classification accuracy in the case of hypereutrophic waters (CAR = 81.82%), making it suitable for use in places where urban water bodies exhibit eutrophication problems.

Mean Impact Value Analysis
The importance of each input variable involved in the WT-Rrs combined estimation model was evaluated, and the absolute MIV distribution of variables is shown in Figure 9. The reflectance values in B4 and B5 ranked highest among all the variables, although they show opposite effects on TSI estimation (B4 is negative whilst B5 is positive). The prominence of these bands represents the chlorophyll fluorescence effect and the vegetation red edge effect. WT also had a high MIV on the estimated results, reflecting the importance of environmental factors in determining the trophic state. In addition, the MIV of B11 and B2 (−7.5350 and 5.6419, respectively) trailed closely behind that of WT. MIV scores in the red-edge bands (B6 and B7) were lowest, at approximately 1.0.

Temporal and Spatial Distribution of Trophic State
The BP-NN model integrating the WT and Rrs of S-2 was used to map the spatial distribution of TSI for water bodies throughout the study area over four months between 2018 and 2019. Figure 10 shows changes in the water area and trophic state in the study area over time. The predicted distribution of TSI in the four stages fits well with our existing understanding of this region.
The TSI values modelled for urban water bodies in Gongqingcheng City displayed a seasonal pattern of lower values in winter and higher values in summer. This conforms to apparent seasonal variations in nutrient levels (for example, TP, TN, and Chl-a). Additionally, lower trophic levels were observed in rivers and reservoirs than in lakes. The area of water classified as being eutrophic decreased between the winter of 2018 and October 2019, especially in the part of Nanhu Lake that adjoins Poyang Lake. This observation shows that the restoration project helped to alleviate eutrophication in the study area.
The water bodies examined in this study underwent the most pronounced eutrophication during the summer months (Figure 10c). With the exception of Sixia Lake (in the north of the study area), nearly 80% of the water bodies in this region exhibited a TSI > 70. Of those water bodies, the Boyang River (which flows into the South Lake) appeared to undergo the most significant changes.
Unfortunately, the predicted TSI value corresponding to the triangular water area, shown in dark red in Figure 10b, is too high. This is perhaps because the area in question is a wetland during the dry season, and the water is relatively shallow at that time (no water was detected in Figure 10d). The optical characteristics of this locale are therefore different from those of a more 'conventional' water body, which affects the modelling result. Clearly, further work is required in this area.

Discussion
WT exhibits a rapid and direct response to climatic forcing [51][52][53]. Previous studies have indicated that WT is the main driving factor behind water eutrophication. Studies have also shown that it plays a vital role in the recovery and growth of cyanobacteria, making WT a key variable in the formation, decline and large-scale outbreak of algal bloom events [1]. RS images can be used to effectively gauge the trophic state, enabling us to identify areas undergoing increased eutrophication. This is made possible because the increase in productivity associated with eutrophication is accompanied by a change in the optical properties of water [20].
In this study, we coupled 'influence variables' with 'state variables'; that is to say, the variable of WT was fed into the TSI estimation model so as to supply information that is difficult to extract from remotely sensed images. The MIV method was then used to assess the relative importance of different variables in terms of their effect on trophic state. The BP-NN is a non-linear black-box model that can simplify both the variables fed into the model and the process of model fitting to ensure the robustness of the algorithm and reduce the risk of overfitting.
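The MIV calculation can be sketched as follows. This is a minimal illustration and not the implementation used in this study. It assumes the common formulation in which each input variable is scaled up and down by 10% across all samples, the trained model is evaluated on both perturbed datasets, and the mean difference of the outputs is taken as the MIV of that variable. The function `mean_impact_values`, the parameter `delta`, and the linear stand-in model `toy_predict` are hypothetical; in practice `predict` would be the trained BP-NN.

```python
import numpy as np


def mean_impact_values(predict, X, delta=0.10):
    """Return one MIV per input column of X for the model `predict`."""
    X = np.asarray(X, dtype=float)
    miv = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        # Perturb only column j, leaving all other inputs unchanged.
        X_up = X.copy()
        X_down = X.copy()
        X_up[:, j] *= 1.0 + delta
        X_down[:, j] *= 1.0 - delta
        # The sign shows the direction of the effect; the magnitude its size.
        miv[j] = np.mean(predict(X_up) - predict(X_down))
    return miv


# Hypothetical stand-in for a trained network (assumption: linear model).
weights = np.array([-2.0, 3.0, 0.5])


def toy_predict(X):
    return X @ weights


# Made-up inputs for illustration: three samples, three variables.
X = np.array([[1.0, 2.0, 20.0],
              [2.0, 1.0, 25.0],
              [3.0, 3.0, 30.0]])
miv = mean_impact_values(toy_predict, X)
ranking = np.argsort(-np.abs(miv))  # most influential variable first
print(miv, ranking)
```

Ranking by absolute MIV while retaining the sign mirrors how the results are reported here, where B4 and B5 rank highest but act in opposite directions.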
Our results show that WT is not only strongly correlated with TSI, but also exerts a significant influence on the estimation of TSI. Therefore, in cases where urban water trophic state is being modelled using long time series data and on a large geographic scale, we suggest that WT be taken as an input variable to improve the accuracy and the temporal and spatial portability of the model. Because WT is easily measured, water management authorities elsewhere should be able to employ the method, contributing to the scientific management of urban water environments on a global scale.
It should be noted that the NN model needs to be supported by large-scale sampling data [26,54]. Due to the lack of available monitoring data for non-eutrophic water bodies, the accuracy of this model cannot be verified for water bodies with a TSI < 30. Furthermore, the model occasionally underestimates sample points with a TSI < 50, which is consistent with the result of Watanabe et al. [27]. It is likely that the similarity between the S-2 Rrs of mesotrophic and light eutrophic samples (Figure 6), together with the limitations of the S-2 spectral bands, interferes with the identification of the trophic state and thereby contributes to this issue [55]. We encourage other researchers to apply this model to urban water bodies elsewhere so as to further validate and improve our method.

Conclusions
A BP-NN-based TSI estimation model is proposed for the analysis of urban water bodies. The established model is superior to alternative methods in several ways: (i) unlike the traditional RS models, both environmental factors (influencing variables) and satellite images (current 'state' variables) are considered, and WT is demonstrated to be the most important environmental variable for TSI estimation; (ii) the machine learning approach based on the BP-NN algorithm is trained and validated using in situ datasets (n = 110) covering the typical period of one year in urban waters. The high accuracy of the WT-Rrs combined estimation model suggests that environmental factors can compensate for the limitations of remotely sensed estimation methods in accurately monitoring trophic state. Our integrated methodology enables spatial and temporal distribution maps of TSI to be produced and effectively evaluates the trophic state of urban water bodies.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Data Availability Statement: The data presented in this study are available on request from the corresponding author.

Acknowledgments: The authors wish to acknowledge all study participants from Nanjing Institute of Geography and Limnology, Chinese Academy of Sciences and Jiangxi Academy of Water Science and Engineering for field work support.

Conflicts of Interest: The authors declare no conflict of interest.