Article

Research on Remote Sensing Inversion of Total Phosphorus in East Juyan Lake Based on Machine Learning

Survey Center of Comprehensive Natural Resources, China Geological Survey, Beijing 100055, China
*
Author to whom correspondence should be addressed.
Hydrology 2025, 12(11), 299; https://doi.org/10.3390/hydrology12110299
Submission received: 10 October 2025 / Revised: 6 November 2025 / Accepted: 10 November 2025 / Published: 11 November 2025

Abstract

Timely and accurate monitoring of lake water quality is crucial for assessing regional ecological health and implementing targeted conservation activities. Compared with traditional in situ water quality measurement methods, satellite remote sensing is more cost-effective and convenient, and it enables long-term time-series monitoring. This study uses Sentinel-2 multispectral imagery, selects East Juyan Lake as the study area, and employs measured water quality data from 30 in situ sampling points as training and testing samples. Using the coefficient of determination (R2), root mean square error (RMSE), and mean relative error (MRE) as evaluation metrics, a Grid Search-based XGBoost machine-learning method is applied to invert the concentration of total phosphorus (TP), a key parameter for water quality assessment. The experiments demonstrate that: (1) The XGBoost model, after parameter tuning via Grid Search, achieved the highest inversion accuracy, with R2, RMSE, and MRE values of 0.856, 0.017, and 7.20%, respectively. The average TP concentration retrieved for the lake was 0.231 mg/L. Because the training parameters are selected by the search rather than set by hand, the method reduces human intervention. (2) The spatial distribution shows that TP is primarily enriched in the deeper central and eastern parts of the lake, while concentrations are relatively lower in the near-shore vegetation zones and the western shallow water areas. The findings provide a useful reference for remote sensing monitoring of lake water quality and can be used to predict and regulate salinity, eutrophication, and similar conditions in comparable lakes.

1. Introduction

Accurately monitoring the mass concentration of phosphorus in water bodies is a key step in assessing water quality [1]. Traditional methods for monitoring TP are often time-consuming, labor-intensive, costly, and limited by sparse sampling points. Additionally, they suffer from slow data analysis, delayed information transmission, and susceptibility to weather and seasonal variations, ultimately failing to provide comprehensive data [2]. Remote sensing technology for TP monitoring not only overcomes these limitations of traditional methods but also offers advantages such as high efficiency, low cost, and extensive coverage. It enables the rapid identification of subtle migration patterns that are often difficult to detect and allows for repeated monitoring of the same area within short timeframes [3].
With the advancement of water quality remote sensing technology, its application has gradually expanded from oceans to inland water bodies, progressing from simple water extent extraction to the remote sensing monitoring of water quality parameters. The remote sensing of water quality parameters has also evolved from qualitative assessment to quantitative retrieval, with methodologies advancing from initial analytical and empirical methods to semi-analytical, semi-empirical, and machine-learning-based inversion approaches [4,5,6].
Over the past few decades, scholars have conducted extensive research on remote sensing for water quality monitoring, achieving significant results in estimating optically active parameters such as Chlorophyll-a (Chl-a), Suspended Particulate Matter (SPM), Colored Dissolved Organic Matter (CDOM), turbidity, and water clarity [7,8]. However, directly estimating non-optically active parameters like TP and Total Nitrogen (TN) through spectral characteristics is challenging, as these parameters have minimal direct impact on the optical properties measured by satellite sensors. Typically, the estimation of non-optically active parameters relies on correlations between optically active parameters and the target non-optical parameters [9,10,11]. For instance, Chang et al. used Moderate Resolution Imaging Spectroradiometer (MODIS) imagery and a genetic programming model to estimate TP concentrations in Tampa Bay, USA. Their results indicated that MODIS bands 1, 3, and 4 had the greatest influence on determining TP concentration [12]. Li et al. established an empirical model using Landsat 8 Operational Land Imager (OLI) data to estimate TP and TN in the Xin’anjiang Reservoir (China). Their analysis showed that the factors (Band 1 + Band 3 + Band 4) / Band 2 and Band 4 / (Band 2 + Band 5), derived from Landsat 8 OLI, exhibited significant correlations with TP and TN concentrations, respectively [13].
In recent years, the rapid advancement of machine-learning technologies has opened up new possibilities for water quality monitoring across various aquatic ecosystems. This is particularly significant for retrieving non-optically active parameters, whose concentration-spectrum relationships often exhibit nonlinear characteristics and are influenced by complex environmental factors [14,15]. Traditional regression methods struggle to capture these intricate patterns, whereas machine-learning algorithms, with their flexible hypothesis space during training, can effectively model such nonlinear dynamics [16]. Among the numerous machine-learning algorithms, methods such as Random Forest (RF), eXtreme Gradient Boosting (XGBoost), and Neural Networks have become highly effective tools in the field of water quality inversion, owing to their strong generalization capabilities and adaptability to complex nonlinear relationships. For example, Liu et al. constructed a neural network inversion model using Sentinel-2 satellite imagery and in situ monitoring data to retrieve TN and ammonia nitrogen in the Danjiangkou Reservoir. They found that the established model had high fitting accuracy, was suitable for the remote sensing inversion of these parameters in the reservoir, and effectively reflected its water quality status and spatial distribution [17]. Hu et al. employed GF-1, Landsat-8, and Sentinel-2 data with the RF method to retrieve TP and TN concentrations in the lower reaches of the Yangtze River. All three datasets achieved strong fitting results with the measured data [18]. Zhan et al. utilized in situ data and Sentinel-2 imagery, employing the Pearson correlation coefficient for feature selection, and applied three machine-learning methods to retrieve TP and TN concentrations in the MinJiang River. 
Their study effectively achieved the inversion of non-optically active water quality parameters in an inland river, with the RF model demonstrating the highest retrieval accuracy [19]. Wu et al., focusing on the lower reaches of the Haihe River, analyzed the correlations between Landsat 8 OLI data (single bands and band combinations) and concentrations of total phosphorus, ammonia nitrogen, total nitrogen, and electrical conductivity. Using two inversion methods, their results indicated that the neural network model better captured the nonlinear relationship between water quality parameters and spectral reflectance, yielding higher accuracy compared to statistical regression models [20]. Guo et al. developed a DNN to reconstruct the spatiotemporal variations of TP in the Great Lakes region from 2002 to 2022. They found that the DNN algorithm based on MODIS Rrs data exhibited high reliability in TP estimation [21]. Fang et al. applied a two-line classification method to categorize in situ samples from 105 lakes across China and employed four machine-learning models with MODIS data for TP retrieval. The XGBoost model ultimately delivered the best performance [22]. Xiong et al. selected OLCI imagery and, based on the vertical distribution characteristics of TP concentrations in several eutrophic lakes, developed an XGBoost-based model for estimating the water column total phosphorus mass. The model successfully achieved a prediction accuracy of R2 ≥ 0.6, though estimation errors increased with greater water depth [23]. It is evident that the retrieval of water quality parameters like TP from remote sensing data is a nonlinear process. Machine-learning methods, with their capacity to handle complex nonlinear mappings, are therefore more effective at modeling the nonlinear function between water quality parameters and spectral reflectance. 
Currently, for machine-learning-based retrieval of TP and other water quality parameters, it is crucial to effectively and reasonably adjust various model parameters according to different remote sensing data sources in order to achieve optimal inversion performance.
East Juyan Lake, a terminal lake of the Heihe River located in the heart of the Ejin Oasis in the lower reaches, is a vital component of the Heihe River Basin’s ecosystem and a key indicator of the river’s health. As a typical terminal lake of an inland river situated in an arid zone with intense evaporation, its water volume and quality exhibit significant instability. Therefore, conducting timely and effective monitoring and protection of the lake’s water quality is a crucial task for safeguarding the regional ecological environment and managing water resources.
In summary, this study selected East Juyan Lake as the research area and, building upon previous TP inversion models, employed the XGBoost algorithm integrated with grid search to automatically determine the optimal parameter combination, thereby reducing the uncertainty associated with manual parameter tuning. Utilizing Sentinel-2 satellite imagery, the study retrieved the lake’s TP concentration and explored the most effective inversion model by comparing different methods. This research aims to provide a more practical and efficient monitoring strategy for non-optically active water quality parameters in lake management, offering a significant reference value.

2. Study Area and Materials

2.1. Study Area

This study selected East Juyan Lake as the study area. East Juyan Lake (also known as Sogo Nur) is located 44 km north of Ejin Banner, Alxa League, Inner Mongolia Autonomous Region, at the northwestern edge of the Badain Jaran Desert. It is a terminal lake of the Heihe River, China’s second-largest inland river. The lake is situated between 42°10′–42°20′ N and 101°12′–101°19′ E (Figure 1). After flowing into Ejin Banner, the Heihe River is referred to as the Ejin River, covering a course of approximately 270 km. The river splits into eastern and western branches at Langxinshan, with the eastern branch flowing into East Juyan Lake. The region has an annual average temperature of 9.7 °C, with average maximum and minimum temperatures of 17.6 °C and 1.5 °C, respectively.
Since the 1950s, rapid socioeconomic development in the Heihe River Basin has led to a sharp increase in industrial and agricultural water use along the river. Consequently, the flow reaching the lower reaches of the Heihe River decreased significantly, and the river at times ceased flowing entirely, resulting in the drying up of East Juyan Lake in 1992. In 2000, the national government initiated unified regulation of water flow in the main stream of the Heihe River. Following the implementation of this regulation and recent basin management measures, the water volume entering the lower reaches has gradually increased, generating substantial positive ecological impacts. East Juyan Lake has not dried up for 20 consecutive years, with its water area maintained at approximately 40 km² [24]. During the study period in August 2023, there was no precipitation in the region, and the lake’s sole inlet was dry, so no water replenishment occurred.

2.2. Materials

The data used in this study primarily include Sentinel-2 remote sensing imagery and in situ measured water quality data.

2.2.1. Imagery Data

The remote sensing imagery was obtained from the Multispectral Instrument (MSI) onboard the Sentinel-2 satellite. The imagery covers 13 spectral bands ranging from the visible and near-infrared (VNIR) to the short-wave infrared (SWIR), with varying spatial resolutions. It features a swath width of 290 km and a revisit cycle of 10 days. The available data products include Level-1C (L1C) and Level-2A (L2A). The L1C product provides top-of-atmosphere (TOA) reflectance data, which has undergone orthorectification and geometric correction but not atmospheric correction. The L2A product, in contrast, provides bottom-of-atmosphere (BOA) reflectance data after atmospheric correction. When only L1C data are available, users must generate the L2A data based on their requirements.
To ensure data consistency between satellite and ground-based measurements, the imagery used in this study was obtained from the Copernicus Data Space Ecosystem (CDSE) website [25], with an imaging date of 18 August 2023. Other detailed specifications are provided in Table 1. The Sentinel-2 L1C product was atmospherically corrected using ENVI 5.3 (Environment for Visualizing Images) to generate the L2A product, which provides bottom-of-atmosphere reflectance data for water quality inversion modeling.
To achieve better inversion results, water body extraction from the remote sensing imagery is necessary. This study employed the Modified Normalized Difference Water Index (MNDWI) to identify water bodies and applied the OTSU algorithm to automatically determine the threshold, enabling the extraction and statistical analysis of the water area and obtaining the complete spatial extent of the lake.
$$\mathrm{MNDWI} = \frac{Band_{Green} - Band_{SWIR}}{Band_{Green} + Band_{SWIR}}$$
where $Band_{Green}$ and $Band_{SWIR}$ represent the green and short-wave infrared (SWIR) bands of the Sentinel-2 imagery, respectively.
Using the extracted water body layer, the Sentinel-2 satellite imagery was clipped to obtain the image data for remote sensing inversion. This process effectively eliminates interference from land, vegetation, and other non-aquatic features.
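The water extraction step can be illustrated with a minimal sketch. It assumes the green and SWIR reflectances are already loaded as NumPy arrays (for Sentinel-2 these would typically be B3 and B11, although the text does not name the SWIR band used), and it implements Otsu's threshold directly on the index histogram. The function names and the synthetic scene are illustrative only and are not taken from the study.

```python
import numpy as np


def mndwi(green, swir):
    """Modified Normalized Difference Water Index from two reflectance arrays."""
    green = np.asarray(green, dtype=float)
    swir = np.asarray(swir, dtype=float)
    denom = green + swir
    out = np.zeros_like(denom)
    # Pixels with a zero denominator stay at 0 instead of dividing by zero.
    np.divide(green - swir, denom, out=out, where=denom != 0)
    return out


def otsu_threshold(values, bins=256):
    """Threshold that maximises the between-class variance of the histogram."""
    hist, edges = np.histogram(np.ravel(values), bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2.0
    weight_low = np.cumsum(hist).astype(float)   # pixels in bins 0..k
    weight_high = weight_low[-1] - weight_low    # pixels in bins k+1..end
    cum_sum = np.cumsum(hist * centers)
    mean_low = cum_sum / np.maximum(weight_low, 1)
    mean_high = (cum_sum[-1] - cum_sum) / np.maximum(weight_high, 1)
    between = weight_low * weight_high * (mean_low - mean_high) ** 2
    # Use the upper edge of the best bin so every pixel in it falls below the cut.
    return edges[np.argmax(between) + 1]


# Synthetic 20 x 20 scene (hypothetical values): land everywhere, water in the middle.
green = np.full((20, 20), 0.05)
swir = np.full((20, 20), 0.25)    # land: SWIR brighter than green
green[5:15, 5:15] = 0.08
swir[5:15, 5:15] = 0.01           # water: SWIR strongly absorbed

index = mndwi(green, swir)
threshold = otsu_threshold(index)
water_mask = index > threshold    # True where the pixel is classified as water
```

The resulting boolean mask can then be used to clip the reflectance bands so that only lake pixels enter the inversion.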

2.2.2. In Situ Water Quality Data

In situ water samples were collected from the study area on 17 August 2023. A total of 30 sampling points were established, uniformly distributed across the lake area (the spatial distribution of these points is shown in Figure 1). After collection, the samples were immediately sent to the laboratory for analysis using inductively coupled plasma optical emission spectrometry (ICP-OES). The measured parameters included TP, TN, permanganate index (CODMn), and Total Dissolved Solids (TDS). All sampling and testing procedures strictly followed national standard methods to ensure data accuracy and comparability. The measured values of the water quality parameters from the 30 samples are summarized in Table 2.

3. Methods

To evaluate the performance of different methods for retrieving water quality parameters, this study conducts comparative experiments using a band-ratio model, a Random Forest model, and an XGBoost model trained via Grid Search for water quality inversion.

3.1. Band Ratio Model

This approach constructs an inversion formula using the ratio of two Sentinel-2 satellite bands. Leveraging the absorption and scattering characteristics of water bodies for electromagnetic waves of different wavelengths, regression analysis is performed between measured water quality data and the reflectance of corresponding pixels in the remote sensing imagery to establish an empirical polynomial inversion model.
Using the pearsonr function from scipy.stats in Python 3.12, the Pearson correlation coefficient was employed to analyze the relationship between the spectral values of Sentinel-2 satellite bands and the measured water quality concentration values in the study area. After a comparative analysis of various band combinations (Table 3), the band ratio B3/B5 was identified as having the strongest correlation with the measured TP concentration. Therefore, a polynomial inversion model for TP was established based on the B3/B5 ratio of Sentinel-2 imagery.
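The band-ratio screening and polynomial fit described above could look roughly like the following sketch. The data are synthetic stand-ins for the 30 sampling points, the helper name `rank_band_ratios` is hypothetical, and the polynomial degree of 2 is an assumption, since the text does not state the degree used.

```python
import numpy as np
from scipy.stats import pearsonr


def rank_band_ratios(bands, tp):
    """Return {"Bx/By": r} for every ordered pair of distinct bands."""
    scores = {}
    for num, num_values in bands.items():
        for den, den_values in bands.items():
            if num != den:
                r, _p_value = pearsonr(num_values / den_values, tp)
                scores[f"{num}/{den}"] = float(r)
    return scores


# Synthetic stand-in for the 30 sampling points (not the study's data).
rng = np.random.default_rng(0)
bands = {name: rng.uniform(0.02, 0.10, 30) for name in ("B2", "B3", "B4", "B5")}
tp = 0.10 + 0.05 * bands["B3"] / bands["B5"] + rng.normal(0.0, 0.002, 30)

scores = rank_band_ratios(bands, tp)
best_ratio = max(scores, key=lambda name: abs(scores[name]))

# Fit the empirical polynomial TP = f(ratio); degree 2 is an assumption.
num, den = best_ratio.split("/")
x = bands[num] / bands[den]
coeffs = np.polyfit(x, tp, deg=2)
tp_fitted = np.polyval(coeffs, x)
fit_rmse = float(np.sqrt(np.mean((tp_fitted - tp) ** 2)))
```

Ranking by the absolute value of r keeps strongly negative correlations in contention, which matters because a ratio and its inverse carry the same information with opposite sign.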

3.2. Random Forest Model

Random Forest is an ensemble learning method. This approach constructs multiple models from data and aggregates their predictions through voting (for classification problems) or averaging (for regression problems), thereby endowing the overall model with high accuracy and strong generalization capability [26].
In this study, the Random Forest model was implemented using scikit-learn in Python 3.12. The key hyperparameters were determined through analysis of learning curves, with the specific settings provided in Table 4. Based on the RF model, the top 10 feature importances are shown in Figure 2a. Among them, the band ratio B2/B4 achieved the highest importance score of 0.0876, followed by B3/B5 and B2/B5. The single band B11 ranked fourth in importance.
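As an illustration of how such a feature-importance ranking can be obtained, the sketch below trains a scikit-learn `RandomForestRegressor` on single bands plus a few band ratios and sorts the importances. The data are synthetic, the feature set is only a guess at the kind of inputs used, and the hyperparameter values are placeholders rather than the settings in Table 4.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
n_samples = 30
band_names = ["B2", "B3", "B4", "B5", "B8", "B11", "B12"]
bands = {name: rng.uniform(0.02, 0.10, n_samples) for name in band_names}

# Candidate features: single bands plus a handful of band ratios (assumed set).
features = dict(bands)
for num, den in [("B2", "B4"), ("B3", "B5"), ("B2", "B5"), ("B3", "B4"), ("B4", "B5")]:
    features[f"{num}/{den}"] = bands[num] / bands[den]

feature_names = list(features)
X = np.column_stack([features[name] for name in feature_names])
# Synthetic TP target driven mainly by B3/B5, for demonstration only.
tp = 0.15 + 0.03 * features["B3/B5"] + rng.normal(0.0, 0.005, n_samples)

# Placeholder hyperparameters; the study's values are listed in its Table 4.
model = RandomForestRegressor(n_estimators=200, max_depth=4, random_state=0)
model.fit(X, tp)

ranking = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
top10 = ranking[:10]
```

The impurity-based importances sum to 1 across all features, so scores below 0.1 for the leading feature, as reported here, indicate that importance is spread over many inputs.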

3.3. Grid Search-Based XGBoost Model

XGBoost is a widely used boosting ensemble learning algorithm. Developed as an enhancement to the GBDT algorithm, it performs a second-order Taylor expansion of the objective function, thereby preserving more information from the original objective. To mitigate overfitting, a regularization term is incorporated, which helps the model achieve lower variance [27]. The predictive accuracy of the model depends strongly on its hyperparameters, which are settings fixed before training that control the algorithm’s learning process. Grid search is a widely used hyperparameter optimization method that employs an exhaustive approach to identify the optimal parameter combination from predefined sets [28]. The core concept involves defining a set of hyperparameters and their candidate values, generating all possible combinations, evaluating the performance of each combination, and ultimately selecting the one that performs best on the validation set. Grid search is extensively applied in hyperparameter tuning, such as determining the maximum depth of decision trees and the regularization parameters of support vector machines. By systematically traversing all parameter combinations, it is guaranteed to find the best-scoring combination within the given discrete search space, although better values may exist between or outside the grid points.
This study introduces the grid search method to automatically evaluate different parameter combinations and identify the optimal set, particularly beneficial when working with small datasets. The principle of the method is as follows: first, the parameter with the greatest current impact on the model is selected for tuning. A predefined range of values is explored sequentially until the optimal value is found. Then, the next most influential parameter is tuned in the same manner. This process continues iteratively until all parameters have been optimized, resulting in the best parameter combination. Among these parameters, max_depth helps prevent the model from overfitting while accelerating convergence, and eta controls the model’s learning rate, enhancing its adaptability. Compared to other machine-learning algorithms, XGBoost offers greater flexibility. It continuously seeks to optimize the objective function, automatically handles missing feature values, and thereby improves prediction accuracy.
The model was developed using Python 3.12, and key hyperparameters were optimized through grid search. The optimization procedure was as follows: First, the learning rate (eta) was initially set to 0.1, with other parameters at their default values, and the GridSearchCV function was used to determine the optimal number of boosting rounds. Second, GridSearchCV was employed to automatically optimize the maximum tree depth. Following this, the learning rate was decreased (or increased) while correspondingly increasing (or decreasing) the number of iterations until the combination that minimized the error was identified. Finally, the optimal parameter set identified by the grid search (Table 4) was applied to the model for prediction, resulting in the best inversion performance. The top 10 features ranked by importance in the model, along with their corresponding scores, are presented in Figure 2b. Among these, the band ratio B3/B5 achieved the highest importance score of 0.0881, followed by B2/B4 (0.0713), B3/B4 (0.0702), and B4/B5 (0.0612). The subsequent four positions were occupied by single bands: B11, B12, B8, and B4.
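The staged procedure described above can be sketched with scikit-learn's `GridSearchCV`. To keep the example free of extra dependencies, it uses scikit-learn's `GradientBoostingRegressor` as a stand-in for XGBoost; passing `xgboost.XGBRegressor` as `estimator_cls` should work in the same way, since it accepts the same parameter names (`learning_rate`, `n_estimators`, `max_depth`, `random_state`). The candidate values, the scoring metric, the 5-fold split, and the synthetic data are all assumptions and are not the settings of Table 4.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, KFold


def staged_grid_search(X, y, cv, estimator_cls=GradientBoostingRegressor):
    """Tune boosting rounds, then tree depth, then learning rate with rounds."""
    best = {"learning_rate": 0.1}  # stage 1 starts from eta = 0.1
    stages = [
        {"n_estimators": [50, 100, 200]},                                     # rounds
        {"max_depth": [2, 3, 4]},                                             # depth
        {"learning_rate": [0.05, 0.1, 0.2], "n_estimators": [50, 100, 200]},  # trade-off
    ]
    for grid in stages:
        # Keep earlier winners fixed, except parameters searched in this stage.
        fixed = {k: v for k, v in best.items() if k not in grid}
        search = GridSearchCV(
            estimator_cls(random_state=0, **fixed),
            grid,
            scoring="neg_root_mean_squared_error",
            cv=cv,
        )
        search.fit(X, y)
        best.update(search.best_params_)
    final_model = estimator_cls(random_state=0, **best).fit(X, y)
    return best, final_model


# Synthetic stand-in for 30 samples with 5 spectral features (not the study's data).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(30, 5))
y = 0.15 + 0.10 * X[:, 0] + rng.normal(0.0, 0.01, 30)

cv = KFold(n_splits=5, shuffle=True, random_state=0)
best_params, model = staged_grid_search(X, y, cv)
predictions = model.predict(X)
```

A staged search of this kind evaluates far fewer combinations than the full Cartesian grid (15 here instead of 81), at the cost of possibly missing interactions between parameters that are tuned in different stages.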

3.4. Comparative Analysis of Evaluation Results

To evaluate the performance of the three water quality inversion models, three metrics were employed: the coefficient of determination (R2), Root Mean Square Error (RMSE), and Mean Relative Error (MRE). A higher R2 value indicates a better fit of the model, while lower RMSE and MRE values signify smaller errors and superior regression performance.
$$R^2 = 1 - \frac{\sum_{i=1}^{N} (\hat{y}_i - y_i)^2}{\sum_{i=1}^{N} (\bar{y} - y_i)^2}$$
$$RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)^2}$$
$$MRE = \frac{1}{N} \times \sum_{i=1}^{N} \frac{|\hat{y}_i - y_i|}{y_i} \times 100\%$$
where $y_i$, $\bar{y}$, and $\hat{y}_i$ represent the measured, average, and estimated water quality parameter values, respectively, and $N$ is the number of sampling points.
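For reference, the three metrics can be computed as in the short sketch below. The function names and the three-point example are illustrative, and the MRE is taken here with an absolute value in the numerator, which is the usual convention.

```python
import numpy as np


def r_squared(measured, estimated):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    measured = np.asarray(measured, dtype=float)
    estimated = np.asarray(estimated, dtype=float)
    ss_res = np.sum((estimated - measured) ** 2)
    ss_tot = np.sum((measured.mean() - measured) ** 2)
    return float(1.0 - ss_res / ss_tot)


def rmse(measured, estimated):
    """Root mean square error, in the units of the measurements."""
    measured = np.asarray(measured, dtype=float)
    estimated = np.asarray(estimated, dtype=float)
    return float(np.sqrt(np.mean((estimated - measured) ** 2)))


def mre(measured, estimated):
    """Mean relative error in percent; measured values must be non-zero."""
    measured = np.asarray(measured, dtype=float)
    estimated = np.asarray(estimated, dtype=float)
    return float(np.mean(np.abs(estimated - measured) / measured) * 100.0)


# Hypothetical TP values in mg/L, chosen only to make the arithmetic easy to follow.
measured = [0.20, 0.25, 0.30]
estimated = [0.22, 0.25, 0.27]

r2_value = r_squared(measured, estimated)   # 1 - 0.0013 / 0.005 = 0.74
rmse_value = rmse(measured, estimated)      # sqrt(0.0013 / 3), about 0.0208
mre_value = mre(measured, estimated)        # (10% + 0% + 10%) / 3, about 6.67%
```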

3.5. Water Quality Classification

To more intuitively reflect the water quality status corresponding to the retrieved TP concentrations in East Juyan Lake during the study period, this study refers to the tiered TP limit criteria for lakes specified in the “Environmental Quality Standards for Surface Water” (GB 3838-2002) [29] (Table 5). These standards were applied to classify the water quality levels based on the TP inversion results.
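A minimal sketch of this classification step follows. Only the Class V range (0.1 to 0.2 mg/L) and the "inferior to Class V" cutoff are stated in the text; the Class I to IV limits below are the lake and reservoir TP limits as commonly quoted from GB 3838-2002 and should be checked against Table 5. The names `TP_LIMITS_LAKE`, `CLASS_LABELS`, and `classify_tp` are hypothetical.

```python
import numpy as np

# Upper TP limits (mg/L) for lakes and reservoirs, Classes I to V.
# Classes I-IV are assumed values; verify against Table 5 of the article.
TP_LIMITS_LAKE = [0.01, 0.025, 0.05, 0.1, 0.2]
CLASS_LABELS = np.array(["I", "II", "III", "IV", "V", "Inferior to V"])


def classify_tp(tp_mg_per_l):
    """Map TP concentrations to water quality classes (a limit value is inclusive)."""
    tp = np.asarray(tp_mg_per_l, dtype=float)
    # side="left" places a value equal to a limit in the class that limit closes.
    idx = np.searchsorted(TP_LIMITS_LAKE, tp, side="left")
    return CLASS_LABELS[idx]


# Minimum, mean, and maximum retrieved TP reported for the lake.
classes = classify_tp([0.17, 0.231, 0.36])
```

Because the function accepts arrays, the same call can be applied to a full raster of retrieved TP values to produce a classification map.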

4. Results

4.1. Accuracy Analysis of the 3 Inversion Methods

To compare the inversion accuracy of the Grid Search-Based XGBoost model against the other models, the R2, RMSE, and MRE between the predicted values and the measured values were calculated for each model. Corresponding cross-validation scatter plots were also generated. The results are presented in Table 6 and Figure 3.
As shown in Table 6 and Figure 3, among the three models evaluated in this study, the Grid Search-Based XGBoost model achieved the best performance for TP inversion. Compared with the commonly used band ratio model and the RF model, the Grid Search-Based XGBoost model demonstrated a higher R2 value, along with lower RMSE and MRE values, indicating stronger correlation and higher inversion accuracy.

4.2. Retrieved TP Concentrations of the Study Area

The trained Grid Search-Based XGBoost regression model was applied to Sentinel-2 satellite imagery to retrieve the TP concentrations in East Juyan Lake on 18 August 2023. The TP inversion and water quality classification results are presented in Figure 4a and 4b, respectively.
As indicated by the inversion results, the TP concentration in the lake is notably elevated, with an average value of 0.231 mg/L and minimum/maximum concentrations of 0.17 mg/L and 0.36 mg/L, respectively. According to the “Environmental Quality Standards for Surface Water” (GB 3838-2002), a TP concentration ranging from 0.1 to 0.2 mg/L corresponds to Class V water quality, while levels exceeding 0.2 mg/L are classified as inferior to Class V. Thus, during the study period, the lake’s overall water quality, based on TP levels, was categorized as inferior to Class V. This classification indicates that the water is unsuitable for agricultural or certain non-potable uses, necessitating enhanced wastewater treatment measures and increased vigilance regarding eutrophication risks. Furthermore, the spatial distribution of TP concentrations exhibited a pattern of higher levels in the central-eastern region and lower levels in the western region of the lake.
These observed water quality conditions are likely influenced by the lake’s primary water sources, as well as its internal ecological and topographic characteristics. As a terminal lake in the Heihe River Basin, East Juyan Lake is a typical endorheic lake. Salts and nutrients entering the lake via river inflow are retained and concentrated due to evaporation, without outflow, leading to the gradual accumulation of these substances. Currently, the lake’s water replenishment mainly depends on periodic ecological water releases from the upper reaches during summer and autumn. These inflows can carry nutrients such as nitrogen and phosphorus, posing a risk of mild to moderate eutrophication.
In 2023, the study year, the inflow from the upper Heihe River was lower than average for the first time in 21 years, and regional precipitation was severely insufficient. With an annual evaporation rate exceeding 3000 mm, this extreme imbalance between inflow and evaporation further increased the risk of water quality deterioration. In addition, higher summer water temperatures accelerate evaporation and stimulate biological activity, potentially accelerating eutrophication.
In terms of spatial distribution, TP concentrations are generally lower in the southern and western littoral zones with reed growth, likely due to the purifying effect of aquatic vegetation. Overall, TP concentrations are significantly higher in the central-eastern region than in the western part. Considering the lake’s bathymetry, the average water depth in the western part is about 1 m, while the central-eastern area is deeper. Under the relatively closed hydrological conditions, nutrients tend to accumulate more in these deeper areas, resulting in the observed spatial pattern.

5. Discussion

This study employed three distinct methods to retrieve the TP concentration in East Juyan Lake. Among them, the Grid Search-based XGBoost model achieved the highest inversion accuracy, with an R2 value of 0.856.
From the feature importance results of the two machine-learning models—RF and Grid Search-based XGBoost—the top ten important features were generally similar between the two models, though their ranking orders differed. Overall, ratios involving bands B2, B3, B4, and B5 were assigned high importance in both models and were ranked near the top. In the RF model, the band ratio B2/B4 achieved the highest importance score of 0.0876, while in the Grid Search-based XGBoost model, it was B3/B5, with a score of 0.0881. The RF model is relatively easier to use and exhibits lower overall prediction bias. However, it requires manual adjustment of parameters. The key innovation of the Grid Search-based XGBoost model constructed in this study lies in its automation: numerous training parameters in XGBoost do not require manual setting. Provided the search range is sufficiently large and the granularity is fine enough, grid search ensures that the globally optimal parameter combination within the defined space is identified. The process and results are intuitive, making it easier to understand the relationship between model performance and different hyperparameter combinations. This reduces human intervention and improves inversion accuracy. A notable drawback, however, is its computational demand, as it must exhaustively evaluate all parameter combinations, leading to longer training times until the optimal set is found. The traditional band ratio method establishes a deterministic fitted function through regression analysis between measured data and satellite band values/ratios. It has proven effective in retrieving optically active parameters such as water transparency, Chl-a, and SPM. However, its performance for TP in this study was relatively modest, yielding the lowest R2 value, and it did not fully utilize the multispectral information available in the Sentinel-2 data. 
In addition, it should be noted that these results are based on only 30 samples collected within a single day. Therefore, the model’s robustness has not been further tested with additional data points.
Based on the TP inversion results for East Juyan Lake during the study period obtained by the Grid Search-based XGBoost model, the average TP concentration was 0.231 mg/L. According to the “Environmental Quality Standards for Surface Water” (GB 3838-2002), this corresponds to inferior Class V water quality. The spatial distribution of TP in the lake aligns with the actual conditions observed during the study period. With the sole inflow channel dry and precipitation extremely low, East Juyan Lake functioned essentially as a closed water body. Weak water circulation and movement resulted in higher nutrient levels and elevated TP concentrations in the deeper central-eastern regions compared to the near-shore areas with aquatic vegetation and the shallower western part of the lake.

6. Conclusions

This study innovatively constructed a Grid Search-based XGBoost model to retrieve the TP concentration in East Juyan Lake by combining Sentinel-2 satellite imagery and in situ measured data. The retrieval results were compared with those obtained using the traditional band ratio method and the Random Forest method. The conclusions are as follows: (1) The traditional band ratio model is simple in algorithm and easy to implement, but its accuracy is low, and its performance in retrieving non-optically active parameters such as TP and TN is moderate. The Random Forest model exhibited relatively low overall prediction bias but requires manual parameter adjustment. Furthermore, its extrapolation capability is relatively limited when the training sample size is small. (2) The XGBoost model optimized with Grid Search achieved the highest retrieval accuracy and improved efficiency. Its performance metrics (R2 = 0.856, RMSE = 0.017, MRE = 7.20%) were superior to those of the other two algorithms, demonstrating excellent retrieval capability and practical utility. Nevertheless, the method requires further validation on additional datasets, including both chemical analysis and remote sensing data. (3) The spatial distribution showed higher concentrations in the central-eastern regions and relatively lower levels in the near-shore vegetation zones and western shallow-water areas. These results are consistent with the actual conditions.

Author Contributions

Conceptualization, Y.Z. and M.H.; methodology, Y.Z.; software, Y.Z.; validation, J.L. and X.L.; writing—original draft preparation, Y.Z.; writing—review and editing, M.H.; visualization, Y.Z.; supervision, W.Y.; project administration, W.Y.; funding acquisition, W.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the China Geological Survey Project, grant number DD20230501301.

Data Availability Statement

Due to the nature of this research, participants of this study did not agree for their data to be shared publicly, so supporting data is not available.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Li, T.; Qiu, S.; Mao, S.; Bao, R.; Deng, H. Evaluating water resource accessibility in southwest China. Water 2019, 11, 1708.
  2. Wang, S.; Li, J.; Zhang, B.; Lee, Z.; Spyrakos, E.; Feng, L.; Liu, C.; Zhao, H.; Wu, Y.; Zhu, L.; et al. Changes of water clarity in large lakes and reservoirs across China observed from long-term MODIS. Remote Sens. Environ. 2020, 247, 111949.
  3. Duan, H.; Luo, J.; Cao, Z.; Xue, K.; Xiao, Q.; Liu, D. Progress in remote sensing of aquatic environments at the watershed scale. Prog. Geogr. 2019, 38, 1182–1195.
  4. Wu, C.; Wu, J.; Qi, J.; Zhang, L.; Huang, H.; Lou, L.; Chen, Y. Empirical estimation of total phosphorus concentration in the mainstream of the Qiantang River in China using Landsat TM data. Int. J. Remote Sens. 2010, 31, 2309–2324.
  5. Zhao, L.; Lu, X.; Tan, H.; Ma, T. Water quality monitoring technology based on GF-1 satellite and XGBoost model. Remote Sens. Inf. 2021, 36, 96–103.
  6. Huang, H.; Li, M.; Chen, Y.; Chen, G.; Liu, H.; Xing, Q.; Cai, J. Water quality retrieval by hyperspectral for city rivers in Pearl River Estuary based on partial least squares regression. Water Resour. Prot. 2021, 37, 36–42.
  7. Bugnot, A.B.; Lyons, M.B.; Scanes, P.; Clark, G.F.; Fyfe, S.K.; Lewis, A.; Johnston, E.L. A novel framework for the use of remote sensing for monitoring catchments at continental scales. J. Environ. Manag. 2018, 217, 939–950.
  8. Wang, J.; Hu, M.; Gao, B.; Fan, H.; Wang, J. A spatiotemporal interpolation method for the assessment of pollutant concentrations in the Yangtze River Estuary and adjacent areas from 2004 to 2013. Environ. Pollut. (1987) 2019, 252, 501–510.
  9. Lv, Y.; Guo, H.; Jin, S.; Wang, L.; Bian, H.; Liu, H. Permanganate index variations and factors in Hongze Lake from Landsat-8 images based on machine learning. Photogramm. Eng. Remote Sens. 2022, 88, 791–802.
  10. Lu, S.J.; Deng, R.R.; Liang, Y.H.; Xiong, L.H.; Ai, X.J.; Qin, Y. Remote sensing retrieval of total phosphorus in the Pearl River channels based on the GF-1 remote sensing data. Remote Sens. 2020, 12, 1420.
  11. Goyens, C.; Lavigne, H.; Dille, A.; Vervaeren, H. Using hyperspectral remote sensing to monitor water quality in drinking water reservoirs. Remote Sens. 2022, 14, 5607.
  12. Chang, N.; Xuan, Z.; Yang, Y.J. Exploring spatiotemporal patterns of phosphorus concentrations in a coastal bay with MODIS images and machine learning models. Remote Sens. Environ. 2013, 134, 100–110.
  13. Li, J.; Hu, C.; Shen, Q.; Barnes, B.B.; Murch, B.; Feng, L.; Zhang, M.; Zhang, B. Recovering low quality MODIS-Terra data over highly turbid waters through noise reduction and regional vicarious calibration adjustment: A case study in Taihu Lake. Remote Sens. Environ. 2017, 197, 72–84.
  14. Begliomini, F.N.; Barbosa, C.C.F.; Martins, V.S.; Novo, E.M.L.M.; Paulino, R.S.; Maciel, D.A.; Lima, T.M.A.; O’Shea, R.E.; Pahlevan, N.; Lamparelli, M.C. Machine learning for cyanobacteria mapping on tropical urban reservoirs using PRISMA hyperspectral data. ISPRS J. Photogramm. Remote Sens. 2023, 204, 378–396.
  15. Liang, Y.; Ding, F.; Liu, L.; Yin, F.; Hao, M.; Kang, T.; Zhao, C.; Wang, Z.; Jiang, D. Monitoring water quality parameters in urban rivers using multi-source data and machine learning approach. J. Hydrol. 2025, 648, 132394.
  16. Zhang, X.; Huang, J.; Chen, J.; Zhao, Y. Remote sensing monitoring of total suspended solids concentration in Jiaozhou Bay based on multi-source data. Ecol. Indic. 2023, 154, 110513.
  17. Liu, X.; Zhao, T.; Cai, T.; Xiao, C.; Chen, X.; Zhang, W. Spatiotemporal monitoring of total nitrogen and ammonia nitrogen in Danjiangkou Reservoir. J. Agric. Resour. Environ. 2021, 38, 829–838.
  18. Hu, W.; Jin, S.; Zhang, Y. Water quality variations in the lower Yangtze River based on GA-RF model from GF-1, Landsat-8, and Sentinel-2 images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2025, 18, 4992–5004.
  19. Tan, Z.; Ren, J.; Li, S.; Li, W.; Zhang, R.; Sun, T. Inversion of nutrient concentrations using machine learning and influencing factors in Minjiang River. Water 2023, 15, 1398.
  20. Wu, H.; Guo, Q.; Zang, J.; Qiao, Y.; Zhu, L.; He, Y. Study on water quality parameter inversion based on Landsat 8 and measured data. Remote Sens. Technol. Appl. 2021, 36, 898–907.
  21. Guo, H.; Huang, J.J.; Zhu, X.; Tian, S.; Wang, B. Spatiotemporal variation reconstruction of total phosphorus in the Great Lakes since 2002 using remote sensing and deep neural network. Water Res. 2024, 255, 121493.
  22. Fang, C.; Song, C.; Wang, X.; Wang, Q.; Tao, H.; Wang, X.; Ma, Y.; Song, K. A novel total phosphorus concentration retrieval method based on two-line classification in lakes and reservoirs across China. Sci. Total Environ. 2024, 906, 167522.
  23. Xiong, J.; Lin, C.; Ma, R.; Wang, X.; Xue, K.; Cao, Z.; Hu, M.; Chen, L. Remote sensing observations of phosphorus in eutrophic lakes: From concentration to storage. IEEE Trans. Geosci. Remote Sens. 2025, 63, 4203812.
  24. Lu, J.; Li, L.; Jiang, E.; Gan, R.; Liu, C.; Deng, Y. Ecological water demand estimations for desert terminal lake survival under inland river water diversion regulation. Water 2023, 15, 66.
  25. Copernicus Data Space Ecosystem|Europe’s Eyes on Earth. Available online: https://dataspace.copernicus.eu/ (accessed on 21 September 2025).
  26. Guo, H.; Huang, J.J.; Chen, B.; Guo, X.; Singh, V.P. A machine learning-based strategy for estimating non-optically active water quality parameters using Sentinel-2 imagery. Int. J. Remote Sens. 2021, 42, 1841–1866.
  27. Johnson, S.; Perumalsamy, D. Application of XGBoost algorithm and grid search hyperparameter tuning to study health effects among individuals in the industrial area. Multimed. Tools Appl. 2025, 84, 34449–34492.
  28. Tran, N.T.; Tran, T.T.G.; Nguyen, T.A.; Lam, M.B. A new grid search algorithm based on XGBoost model for load forecasting. Bull. Electr. Eng. Inform. 2023, 12, 1857–1866.
  29. GB 3838-2002; Environmental Quality Standards for Surface Water. Ministry of Ecology and Environment of the PRC: Beijing, China, 2002.
Figure 1. (a) Geographical location map, (b) True-color satellite image (Sentinel-2, acquired on 18 August 2023), and (c) land use classification overview (from ESA WorldCover 10 m 2021 v200) of the study area.
Figure 2. Feature importance score (Top 10) of RF (a) and Grid Search-Based XGBoost model (b).
Figure 3. Corresponding cross-validation scatter plots of (a) Band ratio, (b) RF, (c) Grid Search-Based XGBoost.
Figure 4. (a) Retrieved TP concentrations and (b) water quality classification in East Juyan Lake (Derived from Sentinel-2 imagery acquired on 18 August 2023, and in situ measurements collected on 17 August 2023, using the Grid Search-Based XGBoost inversion model).
Table 1. Specifications of the Sentinel-2 imagery used in this study.
| No. | Property | Value |
|---|---|---|
| 1 | Spacecraft Name | Sentinel-2B |
| 2 | MGRS Tile | 47TPG |
| 3 | Product ID | S2B_MSIL1C_20230818T040549_N0509_R047_T47TPG |
| 4 | Level | L1C |
| 5 | Cloudy Pixel Percentage | 0 |
| 6 | Sensing Time | 18 August 2023 04:05:49.024 GMT |
Table 2. Measured water quality parameter values.
| Statistic | TP (mg/L) | TN (mg/L) | CODMn (mg/L) | TDS (mg/L) |
|---|---|---|---|---|
| Range | 0.185–0.355 | 1.74–3.45 | 11.48–18.52 | 13,410–14,630 |
| Mean | 0.228 | 2.24 | 14.55 | 13,734 |
| Standard Deviation | 0.033 | 0.34 | 1.60 | 252.43 |
Table 3. Correlation Coefficients Between Sentinel-2 Band Ratios and Measured TP.
| No. | Band Ratio | Correlation Coefficient |
|---|---|---|
| 1 | B2/B5 | 0.412 |
| 2 | B3/B5 | 0.578 |
| 3 | B4/B5 | −0.436 |
| 4 | B5/B6 | 0.378 |
| 5 | B3/B7 | 0.345 |
| 6 | B5/B8 | 0.433 |
| 7 | B6/B11 | 0.485 |
| 8 | B7/B11 | 0.476 |
| 9 | B8/B9 | −0.478 |
| 10 | B4/B11 | 0.467 |
| 11 | B3/B9 | 0.513 |
| 12 | B2/B3 | 0.505 |
| 13 | (B3 − B5)/(B3 + B5) | 0.436 |
| 14 | (B4 − B5)/(B4 + B5) | −0.453 |
| 15 | (B8 − B9)/(B8 + B9) | −0.531 |
| 16 | (B5 − B8)/(B5 + B8) | 0.485 |
| 17 | (B3 − B10)/(B3 + B10) | 0.418 |
| 18 | (B5 − B7)/(B5 + B7) | 0.385 |
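The entries in Table 3 pair a band combination with its correlation against measured TP. The following is a minimal sketch of how such a value could be computed, assuming per-sample band reflectances are available as plain lists and that the coefficient is the Pearson correlation. The function names and the sample values are illustrative assumptions and do not come from the study.

```python
import math


def band_ratio(a, b):
    """Simple ratio of two band reflectances, e.g. B3/B5."""
    return a / b


def normalized_difference(a, b):
    """Normalized difference of two bands, e.g. (B3 - B5)/(B3 + B5)."""
    return (a - b) / (a + b)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical reflectances and TP values (mg/L) for five sampling points.
b3 = [0.060, 0.064, 0.071, 0.075, 0.080]
b5 = [0.050, 0.051, 0.052, 0.053, 0.054]
tp = [0.19, 0.21, 0.24, 0.26, 0.29]

ratios = [band_ratio(g, r) for g, r in zip(b3, b5)]
indices = [normalized_difference(g, r) for g, r in zip(b3, b5)]

print("r(B3/B5, TP):", round(pearson_r(ratios, tp), 3))
print("r((B3-B5)/(B3+B5), TP):", round(pearson_r(indices, tp), 3))
```

Repeating the last step for every candidate combination and ranking by absolute correlation would give a screening table of the same shape as Table 3.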
Table 4. Parameter settings of the RF and Grid Search-Based XGBoost model.
| RF Parameter | Setting Value | Grid Search-Based XGBoost Parameter | Setting Value |
|---|---|---|---|
| N estimators | 30 | eta | 0.05 |
| Max depth | 4 | max_depth | 4 |
| Max features | log2 | reg_alpha | 0.1 |
| | | reg_lambda | 0.5 |
| | | gamma | 0.1 |
| | | subsample | 0.8 |
| | | colsample_bytree | 0.8 |
| | | n_estimators | 50 |
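Grid Search tries every combination of candidate hyperparameter values and keeps the one with the best validation score. The sketch below illustrates that loop in plain Python. The dictionary `TABLE4_XGB_PARAMS` repeats the XGBoost values from Table 4, but the candidate grid, the function names, and the scoring function are assumptions made for illustration. In particular, `toy_cv_error` is a contrived stand-in for a cross-validated error and does not fit any model.

```python
from itertools import product

# XGBoost settings reported in Table 4 (selected by Grid Search in the study).
TABLE4_XGB_PARAMS = {
    "eta": 0.05,
    "max_depth": 4,
    "reg_alpha": 0.1,
    "reg_lambda": 0.5,
    "gamma": 0.1,
    "subsample": 0.8,
    "colsample_bytree": 0.8,
    "n_estimators": 50,
}

# Hypothetical candidate grid; the study does not list the values it searched.
PARAM_GRID = {
    "eta": [0.01, 0.05, 0.1],
    "max_depth": [3, 4, 5],
    "n_estimators": [30, 50, 100],
}


def grid_search(param_grid, score_fn):
    """Exhaustively evaluate all combinations; lower score is better."""
    names = list(param_grid)
    best_params, best_score = None, float("inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_cv_error(params):
    """Contrived stand-in for a cross-validated RMSE (no model is trained).

    It is constructed so that its minimum sits at the Table 4 values,
    purely to make the search loop observable.
    """
    return (
        abs(params["eta"] - TABLE4_XGB_PARAMS["eta"])
        + abs(params["max_depth"] - TABLE4_XGB_PARAMS["max_depth"])
        + abs(params["n_estimators"] - TABLE4_XGB_PARAMS["n_estimators"]) / 100
    )


best, err = grid_search(PARAM_GRID, toy_cv_error)
print("best combination:", best, "score:", err)
```

In a real workflow the scoring function would train the regressor with the given parameters and return a cross-validated error on the sampling points. The cost grows with the product of the grid sizes, which is 27 evaluations for the grid above.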
Table 5. Water Quality Classes Corresponding to TP Concentrations.
| Class | TP Concentration (mg/L) |
|---|---|
| I | ≤0.01 |
| II | ≤0.025 |
| III | ≤0.05 |
| IV | ≤0.1 |
| V | ≤0.2 |
| Inferior V | >0.2 |
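The thresholds in Table 5 map a TP concentration to a water quality class. The following is a small sketch of that lookup, assuming each class covers values above the previous bound and up to and including its own. The function and constant names are illustrative and are not taken from the study.

```python
from bisect import bisect_left

# Upper bounds (mg/L, inclusive) for classes I to V, as listed in Table 5.
UPPER_BOUNDS = [0.01, 0.025, 0.05, 0.1, 0.2]
CLASS_NAMES = ["I", "II", "III", "IV", "V", "Inferior V"]


def tp_class(tp_mg_per_l):
    """Return the Table 5 water quality class for a TP concentration in mg/L."""
    if tp_mg_per_l < 0:
        raise ValueError("TP concentration cannot be negative")
    # bisect_left counts the bounds strictly below the value, so a value
    # equal to a bound stays in the class that bound closes.
    return CLASS_NAMES[bisect_left(UPPER_BOUNDS, tp_mg_per_l)]


# The measured TP range and mean from Table 2.
for value in (0.185, 0.228, 0.355):
    print(value, "->", tp_class(value))
```

Applied pixel by pixel to a retrieved TP map, a lookup like this would produce a classification layer of the kind shown in Figure 4b.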
Table 6. Model Performance Comparison.
| Model | R² | RMSE (mg/L) | MRE |
|---|---|---|---|
| Band ratio | 0.607 | 0.028 | 11.89% |
| RF | 0.734 | 0.023 | 9.78% |
| Grid Search-Based XGBoost | 0.856 | 0.017 | 7.20% |
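For reference, the three metrics in Table 6 can be computed from paired observed and predicted values as sketched below. This assumes the common definitions: R² as the coefficient of determination, RMSE as the root of the mean squared error, and MRE as the mean of the absolute errors relative to the observed values, expressed in percent. These definitions and the function names are assumptions here, and the sample numbers are made up for illustration.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean square error, in the units of the observations."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mre_percent(observed, predicted):
    """Mean relative error in percent; observed values must be non-zero."""
    n = len(observed)
    return 100.0 * sum(abs(o - p) / abs(o) for o, p in zip(observed, predicted)) / n


# Hypothetical observed and predicted TP values (mg/L).
obs = [0.19, 0.21, 0.24, 0.26, 0.29]
pred = [0.20, 0.20, 0.25, 0.25, 0.30]

print("R2:", round(r_squared(obs, pred), 3))
print("RMSE:", round(rmse(obs, pred), 4))
print("MRE (%):", round(mre_percent(obs, pred), 2))
```

With only 30 sampling points, metrics like these are sensitive to how the samples are split, so they are best read together with the cross-validation scatter plots in Figure 3.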
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Zhou, Y.; Yang, W.; Hu, M.; Li, J.; Liu, X. Research on Remote Sensing Inversion of Total Phosphorus in East Juyan Lake Based on Machine Learning. Hydrology 2025, 12, 299. https://doi.org/10.3390/hydrology12110299