Article

Retrieval of Chlorophyll-a Concentration in Nanyi Lake Using the AutoGluon Framework

1
School of Earth Sciences and Spatial Information Engineering, Hunan University of Science and Technology, Xiangtan 411201, China
2
Hunan Yueyang Ecological Environment Monitoring Center, Yueyang 414000, China
3
Hunan Piesat Hongtu UAV System Co., Ltd., Changsha 410131, China
*
Author to whom correspondence should be addressed.
Water 2025, 17(15), 2190; https://doi.org/10.3390/w17152190
Submission received: 15 May 2025 / Revised: 8 July 2025 / Accepted: 16 July 2025 / Published: 23 July 2025
(This article belongs to the Section Water Resources Management, Policy and Governance)

Abstract

The chlorophyll-a (Chl-a) concentration in lakes is a crucial parameter for monitoring water quality and assessing phytoplankton abundance. However, accurately retrieving Chl-a concentrations remains a significant challenge in remote sensing. To address the limitations of existing methods in terms of modeling efficiency and adaptability, this study focuses on Lake Nanyi in Anhui Province. By integrating Sentinel-2 satellite imagery with in situ water quality measurements and employing the AutoML framework AutoGluon, a Chl-a inversion model based on narrow-band spectral features is developed. Feature selection and model ensembling identify bands B6 (740 nm) and B7 (783 nm) as the optimal combination, which is then applied to multi-temporal imagery from October 2022 to generate the spatial distribution of mean Chl-a in Lake Nanyi. The results demonstrate that the AutoGluon framework significantly outperforms traditional methods in both model accuracy (R²: 0.94, RMSE: 1.67 μg/L) and development efficiency. The retrieval results reveal spatial heterogeneity in Chl-a concentration, which ranges from 3.66 to 21.39 μg/L: concentrations are higher in the southern part of the western lake and on the western side of the eastern lake, and relatively lower in the central lake area. This study presents an efficient and reliable approach for lake ecological monitoring and underscores the potential of AutoML in water color remote sensing applications.

1. Introduction

Lakes play a vital role in the global carbon cycle, serve as essential sources of freshwater for human consumption, and support biodiversity. These ecosystem functions underscore the importance of monitoring lake water quality [1]. Chlorophyll-a (Chl-a), a natural contributor to light attenuation, is widely recognized as a biological indicator for assessing phytoplankton abundance and eutrophication [2]. It serves not only as a proxy for phytoplankton biomass and nutrient enrichment but also as a direct indicator of the onset and progression of potentially harmful algal blooms [3]. Therefore, the Chl-a concentration is a critical metric for evaluating the ecological health of aquatic environments [4], playing a central role in assessing aquatic organism growth and overall water quality [5].
To monitor Chl-a concentrations, various remote sensing inversion methods have been developed, including empirical statistical models, semi-analytical models, and machine learning approaches [6]. Empirical models establish statistical relationships between Chl-a concentrations and spectral bands or band combinations, typically using regression analysis to construct inversion formulas [7,8,9]. Although easy to implement, such models often lack generalizability and require site-specific calibration. Semi-analytical models, grounded in radiative transfer theory, decompose the optical properties of water into absorption and scattering components to estimate Chl-a from spectral reflectance [10,11,12]. Despite their physical robustness, their complexity and sensitivity to water optical variability limit their broader applicability.
Recent advances in computer science and artificial intelligence have facilitated the application of machine learning to Chl-a retrieval. Machine learning algorithms are capable of extracting nonlinear features and modeling complex relationships between spectral data and Chl-a concentrations [13]. Models such as Support Vector Machines (SVMs), Random Forests (RFs), and deep learning architectures (e.g., neural networks) have demonstrated high inversion accuracy across varying water conditions [14,15,16]. However, these methods often require labor-intensive tasks, including model selection, feature engineering, and hyperparameter tuning, which demand substantial algorithmic expertise [17,18,19].
To address these challenges and improve modeling efficiency, this study introduces the AutoML framework AutoGluon. AutoGluon automates model selection, hyperparameter optimization, and feature engineering, and leverages ensemble learning to integrate the strengths of multiple algorithms, thereby enabling the rapid development of high-performance models [20]. In environmental science applications—such as water quality prediction and nutrient concentration estimation—AutoML has demonstrated superior performance compared to traditional machine learning methods [21,22,23,24].
This study focuses on Nanyi Lake as the research area, integrating Sentinel-2 imagery data and in situ water body measurements to construct chlorophyll-a concentration inversion models with different feature band combination schemes using the AutoGluon framework. The optimal feature band combination was selected and applied to the inversion of chlorophyll-a concentrations in Nanyi Lake. This research provides a scientific foundation for ecological monitoring in Lake Nanyi and highlights the potential of AutoML in aquatic color remote sensing applications.

2. Materials and Methods

2.1. Study Area

Lake Nanyi (31°2′ N–31°11′ N, 118°50′ E–119°2′ E) is a natural freshwater lake situated at the junction of Xuancheng District and Langxi County in Southern Anhui Province, China. It forms part of the ancient Danyang Lake system (Figure 1). The lake is divided into eastern and western sections, forming a “V”-shaped surface area of approximately 189 km². The region experiences a subtropical monsoon climate, with an average annual temperature of 15.9 °C and mean annual precipitation of 1168.2 mm. Lake Nanyi has an average water depth of 2 to 4 m, with primary pollution sources including agricultural runoff and aquaculture wastewater (mainly nitrogen and phosphorus) during the flood season, as well as domestic sewage and endogenous pollution from lake sediments during the non-flood season [25]. Lake Nanyi serves as a critical drinking water source for local residents and plays an essential role in supporting industrial and agricultural water demands. The lake receives its inflow primarily from surrounding rivers and direct precipitation. As a typical freshwater lake, Lake Nanyi’s physical and environmental characteristics make it a suitable case study for investigating chlorophyll-a (Chl-a) concentration dynamics.

2.2. Sentinel-2 Imagery and Preprocessing

Sentinel-2 is a key Earth observation mission under the Copernicus Program of the European Space Agency. The mission primarily aims to monitor the Earth’s surface, providing remote sensing services for applications such as forest monitoring, land cover change detection, and natural disaster management [26]. Although Sentinel-2 was not specifically designed for water body monitoring, it has been selected as the data source for this study due to its advantageous characteristics for aquatic environments, including its spatial, temporal, and spectral resolutions. The band parameters are provided in Table 1.
Sentinel-2 data were obtained from the Copernicus Data Space Ecosystem (CDSE). The Level-1C (L1C) top-of-atmosphere reflectance products, which have undergone geometric correction, were used for further processing. ACOLITE, developed by the Royal Belgian Institute of Natural Sciences (RBINS), is an atmospheric correction tool specifically designed for water bodies, suitable for both coastal and inland water studies [27]. In this study, ACOLITE version 20250402.0 was used. By default, ACOLITE applies Dark Spectrum Fitting (DSF) for atmospheric correction [28], performs bilinear interpolation to resample imagery to a 10-meter spatial resolution aligned with the original grid, and automatically masks non-water areas, cloud cover, and anomalous pixels with high top-of-atmosphere reflectance (via spectral thresholding), thereby extracting pure water spectral information. In addition, ACOLITE accounts for the adjacency effect, integrating the RAdCor module, a physically-based adjacency correction method for both land and water. This module utilizes the Atmospheric Point Spread Function (APSF), derived from real spectral data, to account for wavelength-dependent diffuse transmission and the spatial distribution of surface reflectance [29].

2.3. In Situ Water Quality Measurements and Preprocessing

The in situ water spectral sample data were obtained from the semi-final dataset of the 2023 “Gaofen Earth Observation Application Technology Innovation Competition”, organized by the China Platform of Earth Observation System. A total of 85 samples were provided, including attributes such as observation time, water spectral reflectance, chlorophyll-a concentration, and geographic coordinates. Chlorophyll-a concentration data were collected in accordance with the National Environmental Protection Standard of the People’s Republic of China HJ 897-2017 [30] and measured in the laboratory using the spectrophotometric method [31]. The water spectral reflectance was measured using an RS-5400 spectroradiometer (SPECTRAL EVOLUTION, Lawrence, MA, USA), with a wavelength interval of 1 nm and a wavelength range of 328–937 nm. Measurements were performed using the above-water method to avoid interference from sun glint and ship shadows [32]. Sampling was conducted from 2 June 2022 to 18 October 2022, with six sampling campaigns distributed across the entire Nanyi Lake water body, covering the main functional zones of the lake. The sampling locations are shown in Figure 1, and the spectral curves are presented in Figure 2.
As shown in Figure 2, due to the absorption of most incident solar radiation by the water body, reflectance in the visible-to-near-infrared (VNIR) range is generally low. Within the 400–670 nm range, water reflectance gradually forms a primary peak, with a maximum value below 0.04 near 580 nm. This is attributed to the strong absorption of blue and red wavelengths by chlorophyll-a, along with its higher reflectance in the green region. In the 670–740 nm red-edge region, the reflectance increases initially and then decreases, forming a secondary peak near 700 nm, with values below 0.035. In the 740–937 nm near-infrared (NIR) range, reflectance further declines due to the strong absorption of infrared radiation by water. The spectral curve shown in Figure 2 demonstrates the typical characteristics of inland freshwater lakes [33]. Chlorophyll-a concentration in the water samples ranged from a minimum of 1.53 μg/L to a maximum of 26.53 μg/L, with an average of 10.13 μg/L. The measured spectral curves exhibit a positive correlation with chlorophyll-a concentration, with reflectance increasing as chlorophyll-a concentration rises. Within the 456–880 nm wavelength range, the Pearson correlation coefficient (r) between the reflectance and chlorophyll-a concentration ranges from 0.397 to 0.648. Notably, in the 706–867 nm range, chlorophyll-a concentration exhibits a moderate correlation with reflectance (r > 0.6).
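The wavelength-by-wavelength correlation analysis described above can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name `pearson_r` and the sample values are hypothetical and are not taken from the study's dataset.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical reflectance at one wavelength and the matching Chl-a values (ug/L).
reflectance = [0.010, 0.014, 0.013, 0.020, 0.024]
chla = [2.0, 6.5, 5.0, 14.0, 21.0]
r = pearson_r(reflectance, chla)
```

Repeating the call for every 1 nm wavelength column would yield a correlation curve of the kind summarized in the text.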
In general, the measured water spectrum represents the total reflectance ($\rho$, dimensionless) of diffuse reflection on a horizontal plane. By applying the sensor’s spectral response function, the measured reflectance $\rho$ is further converted into the narrow-band reflectance $\rho_{rs}$, which corresponds to the spectral characteristics of the sensor’s multispectral bands [34]. In other words, the continuous measured spectrum $\rho$ is transformed into sensor-equivalent narrow-band reflectance $\rho_{rs}$. The conversion is carried out using Equation (1):
$$\rho_{rs}(\lambda_i) = \frac{\int_{\lambda_1}^{\lambda_2} \rho(\lambda) \cdot SRF(\lambda)\, d\lambda}{\int_{\lambda_1}^{\lambda_2} SRF(\lambda)\, d\lambda} \qquad (1)$$
In Equation (1), $\rho_{rs}(\lambda_i)$ denotes the narrow-band surface reflectance at the center wavelength $\lambda_i$ of the i-th band of the sensor, $\rho(\lambda)$ represents the measured reflectance at wavelength $\lambda$, $\lambda_1$ and $\lambda_2$ define the wavelength range of the i-th band of the sensor, and $SRF(\lambda)$ is the sensor’s spectral response function. Using Equation (1), the 85 measured water spectra are converted into sensor-equivalent narrow-band reflectance corresponding to the Sentinel-2 (S2) bands and subsequently used for model training and validation.
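Equation (1) can be approximated numerically on the 1 nm measurement grid. The sketch below uses trapezoidal integration, which is an assumption since the integration scheme is not stated in the text; the function name `band_equivalent_reflectance` and the boxcar response used in the example are hypothetical and do not reproduce the actual Sentinel-2 spectral response functions.

```python
def band_equivalent_reflectance(wavelengths, reflectance, srf):
    """SRF-weighted mean reflectance of one band (Equation (1)), trapezoidal rule."""
    if not (len(wavelengths) == len(reflectance) == len(srf)):
        raise ValueError("all inputs must have the same length")
    numerator = 0.0
    denominator = 0.0
    for k in range(len(wavelengths) - 1):
        d_lambda = wavelengths[k + 1] - wavelengths[k]
        # Integrand rho(lambda) * SRF(lambda) averaged over the interval.
        numerator += 0.5 * (reflectance[k] * srf[k]
                            + reflectance[k + 1] * srf[k + 1]) * d_lambda
        denominator += 0.5 * (srf[k] + srf[k + 1]) * d_lambda
    if denominator == 0:
        raise ValueError("the spectral response integrates to zero")
    return numerator / denominator

# Hypothetical example: flat spectrum of 0.02 under a boxcar response near 740 nm.
wl = list(range(730, 751))                       # 1 nm steps
rho = [0.02] * len(wl)
boxcar = [1.0 if 733 <= w <= 748 else 0.0 for w in wl]
b6_equivalent = band_equivalent_reflectance(wl, rho, boxcar)
```

Because the result is a weighted mean, a spectrally flat input returns the same value regardless of the response shape, which is a convenient sanity check before applying real response curves.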

2.4. Automatic Machine Learning Framework

AutoGluon is an open-source automated machine learning (AutoML) framework developed by Amazon Web Services (AWS). It is designed to automatically optimize machine learning workflows, including tasks such as data preprocessing, model architecture selection, and hyperparameter tuning. Model ensembling is a common strategy in machine learning, whereby combining predictions from multiple models can substantially improve predictive accuracy and reduce variance [35]. Traditional stacking methods construct a meta-model based on aggregated predictions from base learners to compensate for individual model weaknesses and leverage their complementary strengths [36]. Unlike conventional approaches, AutoGluon implements a novel multi-layer stacking ensemble strategy. It employs the same model types in both the base and stacking layers, while using a combination of original features and model predictions as input. Finally, AutoGluon applies ensemble selection to aggregate predictions from the stacked models, thereby enhancing overall accuracy and robustness [20].
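The ensemble selection step mentioned above can be illustrated with a greedy forward selection procedure, in which models are added to the ensemble (with replacement) whenever they lower the validation error of the averaged prediction. The sketch below is a simplified illustration of that general idea and is not AutoGluon's actual implementation; the function names, model names, and numbers are hypothetical.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def greedy_ensemble_selection(y_val, model_preds, n_rounds=10):
    """Return per-model weights found by greedy forward selection with replacement."""
    names = list(model_preds)
    counts = {name: 0 for name in names}
    running_sum = [0.0] * len(y_val)   # sum of predictions of models picked so far
    n_selected = 0
    for _ in range(n_rounds):
        best_name, best_score = None, float("inf")
        for name in names:
            # Validation error if this model were added to the current ensemble.
            candidate = [(s + p) / (n_selected + 1)
                         for s, p in zip(running_sum, model_preds[name])]
            score = rmse(y_val, candidate)
            if score < best_score:
                best_name, best_score = name, score
        counts[best_name] += 1
        running_sum = [s + p for s, p in zip(running_sum, model_preds[best_name])]
        n_selected += 1
    return {name: counts[name] / n_selected for name in names}

# Hypothetical validation targets and base-model predictions (ug/L).
y_val = [4.0, 10.0, 16.0]
model_preds = {
    "model_a": [5.0, 11.0, 17.0],   # biased high by 1
    "model_b": [3.0, 9.0, 15.0],    # biased low by 1
    "model_c": [8.0, 8.0, 8.0],     # uninformative constant
}
weights = greedy_ensemble_selection(y_val, model_preds)
```

In this toy case the two oppositely biased models receive equal weight and the uninformative model receives none, which shows how a weighted ensemble can outperform every individual base model.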
Upon loading the dataset, AutoGluon automatically infers the task type—such as classification, regression, or time series—based on the input structure. In this study, chlorophyll-a concentration inversion is formulated as a regression task. During preprocessing, AutoGluon automatically handles essential steps including data splitting into training and validation sets, missing value imputation, and one-hot encoding. Model training is initiated via the fit function, which by default explores a wide range of model types, such as NeuralNetTorch, KNeighborsDist, ExtraTreesMSE, CatBoost, XGBoost, KNeighborsUnif, NeuralNetFastAI, LightGBM, RandomForestMSE, LightGBMLarge, and LightGBMXT, along with a final ensemble model WeightedEnsemble. During training, AutoGluon performs k-fold cross-validation and model stacking while iterating through hyperparameter tuning to optimize validation performance. Once training is complete, the model is evaluated on the test dataset and used to make predictions on new data. The version of AutoGluon used in this study is 1.0.0.
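A minimal sketch of how such a regression task might be set up with AutoGluon's `TabularPredictor` is given below. The column names (`B5`, `B6`, `B7`, `chla`), the helper functions, the sample values, and the choice of `presets="best_quality"` are assumptions made for illustration; the text does not state which preset or column naming was used. The AutoGluon and pandas imports are placed inside the training function so that the data-preparation helper can be used without those packages installed.

```python
def select_scheme(records, bands, label="chla"):
    """Keep only the chosen band columns plus the label column."""
    return [{key: row[key] for key in (*bands, label)} for row in records]

def train_chla_model(train_records, label="chla"):
    """Fit an AutoGluon regressor; requires `pandas` and `autogluon.tabular`."""
    import pandas as pd
    from autogluon.tabular import TabularPredictor

    train_df = pd.DataFrame(train_records)
    predictor = TabularPredictor(
        label=label,
        problem_type="regression",
        eval_metric="root_mean_squared_error",
    )
    # The preset is an assumption; it enables bagging and stacking in AutoGluon.
    predictor.fit(train_df, presets="best_quality")
    return predictor

# Hypothetical band-equivalent reflectance records with measured Chl-a (ug/L).
records = [
    {"B5": 0.021, "B6": 0.012, "B7": 0.013, "chla": 8.4},
    {"B5": 0.030, "B6": 0.019, "B7": 0.020, "chla": 15.1},
]
scheme_viii = select_scheme(records, bands=("B6", "B7"))
```

With a realistic number of training records, `train_chla_model(scheme_viii)` would return a fitted predictor, after which `predictor.leaderboard(test_df)` and `predictor.predict(test_df)` can be used to compare the trained models and to generate predictions on held-out data.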

2.5. Technical Workflow

The technical workflow is illustrated in Figure 3. Initially, various feature combination training schemes are designed. Subsequently, model training is performed within the AutoGluon automated machine learning framework. Finally, based on the test results, the optimal feature combination is selected and applied to invert the chlorophyll-a concentration in Lake Nanyi.

3. Results and Discussion

3.1. Optimal Combination of Characteristic Spectral Bands

3.1.1. Feature Selection and Scoring

The Pearson correlation coefficient is a simple yet effective method for understanding the relationships between features and the target variable, measuring the degree of linear correlation between variables [37]. AutoGluon offers a built-in function, “feature_importance”, which evaluates the contribution of each feature to model performance by assessing the decrease in prediction accuracy when the values of a specific feature are randomly shuffled across rows, creating a perturbed version of the dataset.
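The shuffling procedure behind this kind of importance score can be sketched in a few lines. The code below is a simplified, standard-library illustration of permutation importance, not AutoGluon's implementation; the toy model, function names, and values are hypothetical, and RMSE is used as the error measure by assumption.

```python
import math
import random

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def permutation_importance(predict, rows, y_true, feature, n_repeats=5, seed=0):
    """Mean increase in RMSE after shuffling one feature column across rows."""
    rng = random.Random(seed)
    baseline = rmse(y_true, [predict(row) for row in rows])
    increases = []
    for _ in range(n_repeats):
        shuffled = [row[feature] for row in rows]
        rng.shuffle(shuffled)
        # Perturbed dataset: only the chosen feature is permuted.
        perturbed = [{**row, feature: value} for row, value in zip(rows, shuffled)]
        increases.append(rmse(y_true, [predict(row) for row in perturbed]) - baseline)
    return sum(increases) / n_repeats

def toy_predict(row):
    """Hypothetical model that depends on B6 only and ignores B2."""
    return 1000.0 * row["B6"]

rows = [{"B2": 0.030 + 0.001 * i, "B6": 0.010 + 0.002 * i} for i in range(6)]
y_true = [toy_predict(row) for row in rows]
importance_b6 = permutation_importance(toy_predict, rows, y_true, "B6")
importance_b2 = permutation_importance(toy_predict, rows, y_true, "B2")
```

A feature the model relies on shows a positive score, while a feature the model ignores scores zero, which mirrors the contrast between high- and low-importance bands.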
In this study, we compute the feature importance scores between the simulated hyperspectral narrowband reflectance and chlorophyll-a concentration (Figure 4). These scores reflect the contribution of each spectral band to the prediction of chlorophyll-a concentration in water bodies. As shown in Figure 4, the overall trend of the importance scores aligns well with the corresponding Pearson correlation coefficients. Band B6 exhibits the highest importance score (11.52), followed by Bands B5 (8.61) and B7 (6.60), ranking second and third in importance, respectively. In contrast, Band B2 has the lowest importance score (1.19), suggesting its limited contribution to the prediction of chlorophyll-a concentration.
According to the correlation scores, Bands B5 (0.591), B6 (0.646), B7 (0.612), B8 (0.615), and B8A (0.603) exhibit relatively strong correlations with chlorophyll-a concentration, while Band B3 (0.411) shows a comparatively weaker correlation. Band B6 demonstrates strong performance in both correlation and importance scores, suggesting that it is a key feature for predicting chlorophyll-a concentration in water bodies. Similarly, Bands B5 and B7 also show relatively high scores in both metrics, indicating their significance as predictive features. Interestingly, although Band B3 has a lower correlation score, it yields a high importance score. This suggests that it may have a nonlinear relationship with the chlorophyll-a concentration or may interact with other spectral bands in a way that contributes meaningfully to the prediction, thus still possessing predictive value.

3.1.2. Feature Engineering and Evaluation

The model is trained using simulated hyperspectral narrowband reflectance data, with 85 samples divided into training and testing sets in a 73:12 ratio. The test set, randomly stratified by sampling time, is kept completely independent and is not involved in model training. It is used solely to provide an unbiased evaluation of the final model’s performance. During training, AutoGluon further partitions the 73 training samples using k-fold cross-validation. This approach ensures that all data are used for both training and validation, thereby mitigating potential errors arising from an improper validation set—especially critical when the sample size is limited [38]. In this experiment, the number of folds is set to seven, automatically determined by AutoGluon.
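To make the fold arithmetic concrete, the sketch below partitions 73 training samples into seven folds. It is a generic k-fold splitter written with the standard library, not the partitioning code used inside AutoGluon; the function name and the random seed are hypothetical.

```python
import random

def k_fold_indices(n_samples, n_folds, seed=0):
    """Yield (train_indices, validation_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin so that fold sizes differ by at most one.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i, val_idx in enumerate(folds):
        train_idx = [j for k, fold in enumerate(folds) if k != i for j in fold]
        yield train_idx, val_idx

splits = list(k_fold_indices(n_samples=73, n_folds=7))
fold_sizes = [len(val_idx) for _, val_idx in splits]
```

With 73 samples and seven folds, three folds hold 11 samples and four hold 10, so every sample serves as validation data exactly once.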
Based on the feature importance scores, various spectral band combinations are designed for model training. In scenarios where red-edge bands are unavailable (e.g., in Landsat-8 data), a more general RGB + NIR band combination is used. In total, nine different spectral band combinations are tested. Model performance is evaluated using the coefficient of determination (R²), Root Mean Square Error (RMSE), and Mean Absolute Percentage Error (MAPE).
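The three evaluation metrics can be computed as shown in the sketch below. The definitions used are the common ones (R² as one minus the ratio of residual to total sum of squares, MAPE expressed in percent) and are assumed here, since the exact formulas are not reproduced in the text; the sample values are hypothetical.

```python
import math

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root mean square error, in the units of the target (here ug/L)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent; observations must be non-zero."""
    return 100.0 * sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical observed and predicted Chl-a concentrations (ug/L).
observed = [4.0, 10.0, 16.0]
predicted = [5.0, 10.0, 15.0]
scores = (r_squared(observed, predicted), rmse(observed, predicted), mape(observed, predicted))
```

Because MAPE divides by the observed value, the same absolute error counts more heavily at low concentrations, which is worth keeping in mind when comparing it with RMSE.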
Figure 5 presents a comparison of model performance between base models and ensemble models within the AutoGluon framework across different feature band combinations. As shown in Figure 5, AutoGluon’s stacking and ensemble strategies consistently exhibit superior performance across most schemes. Except for Schemes II, VII, and IX, the ensemble model outperforms all individual base models; in these three cases, the ensemble model performs comparably to the best-performing base model. By assigning different weights to multiple models in the second layer, the ensemble approach effectively integrates the strengths of its component models, thereby significantly enhancing prediction accuracy and robustness.
Figure 6 shows scatter plots comparing the predicted and observed chlorophyll-a concentrations for the ensemble model under different feature schemes. According to the distribution of data points, Scheme VIII (B6, B7) achieves the best performance across all evaluation metrics, with an R² of 0.94, an RMSE of 1.67 μg/L, and a MAPE of 20.82%. In contrast, other combinations show only marginal improvements when additional bands are included. Furthermore, the high intercorrelation among spectral bands increases the computational complexity and may introduce redundant information. Therefore, Scheme VIII (B6, B7) is selected as the optimal inversion model for chlorophyll-a concentration.
It is noteworthy that the B6 (740 nm) and B7 (783 nm) bands of Sentinel-2 are located in the typical red-edge region, which reflects the transition from strong red light absorption to strong near-infrared reflectance and is highly sensitive to changes in chlorophyll-a concentration. Relevant studies indicate that the red-edge bands can effectively capture spectral changes caused by chlorophyll-a in water bodies, with a clear biophysical basis [39,40,41]. Therefore, from the feature selection results, the combination of B6 and B7 not only exhibits the best model performance but also aligns with the theoretical expectations regarding the absorption mechanisms of chlorophyll-a in spectral inversion.

3.2. Remote Sensing-Based Inversion of Chlorophyll-a Concentration

Lake water environments are influenced by both human activities and natural factors, resulting in rapid changes that make it challenging for single-time retrieval results to accurately reflect the spatial distribution of Chl-a concentrations. Atmospheric correction was performed on Sentinel-2 Level 1C images from five different dates in October 2022 using ACOLITE. The corrected images were subsequently used for inversion to generate the average Chl-a concentration distribution in Nanyi Lake for October 2022. To classify the results, the geometric interval method is applied in ArcGIS, ensuring that the sum of squares of the number of elements in each class is minimized. The retrieval results are illustrated in Figure 7.
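The monthly averaging step can be sketched as a pixel-wise mean over the per-date retrieval maps. The sketch below assumes that masked pixels (cloud or non-water) are simply skipped when averaging, which is an assumption because the handling of masked pixels is not described in the text; the function name and the tiny example grids are hypothetical.

```python
def temporal_mean(maps):
    """Pixel-wise mean over dates; masked pixels are given as None and skipped."""
    n_rows, n_cols = len(maps[0]), len(maps[0][0])
    result = []
    for i in range(n_rows):
        row = []
        for j in range(n_cols):
            valid = [m[i][j] for m in maps if m[i][j] is not None]
            # A pixel masked on every date stays masked in the output.
            row.append(sum(valid) / len(valid) if valid else None)
        result.append(row)
    return result

# Two hypothetical 2 x 2 Chl-a maps (ug/L) from different dates.
date_1 = [[4.0, None], [10.0, 6.0]]
date_2 = [[6.0, None], [None, 8.0]]
monthly_mean = temporal_mean([date_1, date_2])
```

Pixels that are valid on only some dates are averaged over fewer observations, so the effective sample size can vary across the lake.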
The retrieval results show that the average Chl-a concentration in Nanyi Lake for October 2022 ranges from 3.66 to 21.39 μg/L. The Chl-a concentration distribution across the lake exhibits spatial heterogeneity, likely due to a combination of factors such as water depth, light penetration, sediment deposition, flow velocity, and human activities from surrounding areas. The northern part of the West Lake area is masked by ACOLITE as non-water pixels, while the southern part exhibits generally higher concentrations. In the East Lake area, the Chl-a concentration is higher on the western side, with a gradual transition outward. This spatial pattern may be attributed to the more uniform influence of nutrients and environmental conditions on phytoplankton proliferation. The central part of the lake shows relatively low concentrations, potentially due to reduced light availability or nutrient-poor conditions that inhibit phytoplankton growth. In the southernmost part of the East Lake area, the Chl-a concentrations increase.

3.3. Discussion

3.3.1. Advantages and Limitations of AutoGluon

Traditional water quality parameter inversion methods typically rely on expert knowledge and experience, resulting in complex and time-consuming model construction processes. This challenge is particularly pronounced when addressing different aquatic environments or when inverting multiple parameters simultaneously. These traditional methods exhibit significant limitations in terms of adaptability and scalability. Moreover, they often overlook the potential for complex nonlinear relationships between spectral signals and water quality parameters, requiring extensive regional calibration for each specific water body. In contrast, the AutoGluon framework reduces the dependence on expert knowledge and enhances model development efficiency by automating feature engineering, model selection, and hyperparameter optimization. Its multi-model ensemble capability allows the model to better capture complex nonlinear relationships within the data, thus improving its robustness. For smaller datasets, AutoGluon ensures model stability and accuracy through techniques such as repeated ensemble methods, skipping hyperparameter tuning by default, early stopping control, and nested cross-validation. Its high-level encapsulated API and automated modeling features enable users from non-remote sensing fields to quickly construct high-performance models, significantly lowering the barrier to model development. In this study, training was conducted on 2 × Tesla V100 GPUs (NVIDIA Corp., Santa Clara, CA, USA) and an Intel Xeon Gold 5220 CPU (Intel Corp., Santa Clara, CA, USA), and the average training time for the ensemble models was approximately 25 s.
Despite AutoGluon’s superior performance in water quality inversion tasks, the interpretability of its models remains relatively limited. Recent research has explored interpretability techniques such as SHAP and LIME to better understand the decision-making mechanisms and internal logic of machine learning models [42,43]. These efforts provide valuable directions for optimizing and expanding the practical applications of these models in the future. Future research will explore integrating the AutoGluon framework with typical physical process models, such as the Environmental Fluid Dynamics Code (EFDC), to develop a hybrid water quality parameter inversion model combining physics-driven and data-driven approaches [44,45]. For instance, hydrodynamic and water quality process simulation results from EFDC (e.g., time-series predictions of chlorophyll-a) can be used as input or prior information for the AutoGluon Chronos model [46], enhancing model prediction accuracy and physical interpretability through residual learning or multi-source fusion strategies.
Furthermore, AutoGluon’s automation capabilities make it not only suitable for Chl-a concentration inversion tasks but also highly versatile with broad potential applications. By extending the framework to incorporate additional water quality parameters (e.g., total suspended matter, dissolved organic matter) and integrating multi-source remote sensing data (e.g., Landsat, Gaofen series), AutoGluon can be adapted to more complex water quality monitoring tasks. Its potential in remote sensing big data processing warrants further exploration, particularly in time series analysis and cross-regional applications. In the future, the combination of multi-source data and long-term sequence data will enable the full utilization of remote sensing big data, thereby advancing the widespread use of AutoGluon in lake ecosystem protection, environmental management, and other related fields.

3.3.2. Effects of Atmospheric Correction on Inversion Accuracy

The statistics of the Sentinel-2 overpasses synchronized with in situ water body observations are summarized in Table 2. In this study, statistical metrics including the coefficient of determination (R²), Mean Absolute Percentage Error (MAPE), and Median Symmetric Accuracy (MdSA) are used to evaluate the performance of atmospheric correction results obtained using ACOLITE. Based on the matching conditions outlined in Table 2, these metrics are calculated separately for both Sentinel-2A (S2A) and Sentinel-2B (S2B), with the detailed evaluation results presented in Table 3.
The Global Climate Observing System (GCOS) has established explicit requirements for lake observations within its Essential Climate Variable (ECV) products. Specifically, the uncertainty of lake water reflectance must satisfy the criterion of a Median Symmetric Accuracy (MdSA) below 30% [47,48].
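For reference, MdSA is commonly defined as 100 × (exp(median(|ln(E/O)|)) − 1), where E is the estimated (here, atmospherically corrected) reflectance and O is the in situ observation. The sketch below implements that common definition, which is assumed to match the one used for Table 3; the function name and sample values are hypothetical.

```python
import math
import statistics

def mdsa(observed, estimated):
    """Median Symmetric Accuracy in percent; all values must be positive."""
    log_ratios = [abs(math.log(e / o)) for o, e in zip(observed, estimated)]
    return 100.0 * (math.exp(statistics.median(log_ratios)) - 1.0)

# Hypothetical in situ and satellite-derived reflectance for one band.
in_situ = [0.010, 0.020, 0.030]
satellite = [0.011, 0.020, 0.036]
band_mdsa = mdsa(in_situ, satellite)
meets_gcos = band_mdsa < 30.0   # threshold cited in the text
```

Because the metric is built on the log ratio, overestimation and underestimation by the same factor are penalized equally, and the median makes it robust to a few poorly matched pixels.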
ACOLITE performs atmospheric correction by automatically acquiring auxiliary data from NASA to retrieve parameters such as ozone concentration, water vapor content, air pressure, and wind speed. It estimates aerosol optical thickness (AOT) using the Dark Spectrum Fitting (DSF) method. This approach is highly automated and well-suited for large-scale processing. However, as shown in Table 3, even under consistent production environments, differences in sensors, image quality, and limitations in auxiliary data can cause certain corrected bands in Sentinel-2 imagery to fall short of GCOS standards [47].
The Atmospheric Correction Inter-Comparison Exercise (ACIX) is an international initiative aimed at evaluating surface reflectance products generated by various state-of-the-art atmospheric correction processors. Results from ACIX indicate that for inland waters, even the best-performing algorithms, such as ACOLITE and iCOR, introduce uncertainties of 20–30% in reflectance within the visible spectrum (490–743 nm). This level of uncertainty can propagate into downstream products, resulting in 25–70% uncertainty in parameters such as chlorophyll-a concentration and total suspended matter [48]. To minimize the impact of atmospheric correction on such downstream products, it is recommended that the MdSA for key spectral bands be maintained below 10% [49].
Enhancing the accuracy of atmospheric correction is an effective strategy to improve the precision of retrieval results. Aerosol models play a critical role in the atmospheric correction process; however, current models may still exhibit limitations in accuracy and generalizability across different environments. Further research is required to improve their performance under varying conditions [50]. Additionally, considering the optically complex and heterogeneous nature of inland waters, region-specific fine-tuning of correction parameters can enhance model adaptability and more effectively reduce retrieval errors.

4. Conclusions

This study successfully developed and validated a chlorophyll-a (Chl-a) concentration inversion model based on the AutoGluon framework, utilizing in situ water body spectral data and Sentinel-2 imagery from Nanyi Lake. The results demonstrate that AutoGluon effectively leverages the strengths of multiple machine learning algorithms through automated feature engineering, model selection, and hyperparameter optimization. Its stacking and ensemble strategies achieve robust inference performance across various feature combinations, with the ensemble model significantly outperforming traditional methods. Based on feature scores derived from simulated narrowband reflectance and in situ Chl-a concentrations, as well as experimental validation, bands B6 (740 nm) and B7 (783 nm) were identified as the key band combination for Chl-a concentration inversion. Compared to including additional bands, this streamlined feature combination maintained high model accuracy (R² = 0.94, RMSE = 1.67 μg/L) while reducing computational complexity. The retrieval results further revealed the spatial heterogeneity of Chl-a concentration distribution in Nanyi Lake, ranging from 3.66 to 21.39 μg/L, with higher concentrations in the southern part of the West Lake area and the western side of the East Lake area, and relatively lower concentrations in the central lake area. These patterns may be influenced by a combination of hydrological conditions, light penetration, and human activities.
This study introduces the AutoGluon framework to water color remote sensing inversion, significantly improving inversion accuracy and efficiency while lowering the barrier to model development. It demonstrates promising application potential for lake chlorophyll concentration analysis, particularly in low-concentration scenarios. However, we acknowledge the limitations of the current dataset, which may not fully represent a broader range of conditions. To address this, future research will expand to include additional lakes, incorporating datasets with diverse trophic states and chlorophyll concentration ranges (e.g., A Database of Chlorophyll and Water Chemistry in Freshwater Lakes) to further test the robustness and generalizability of the method. Furthermore, future work will explore integrating physical models with multi-source data to enhance model interpretability and adaptability, as well as extending the application potential of AutoGluon to monitoring other water quality parameters.

Author Contributions

Conceptualization, W.G.; data curation, W.G.; funding acquisition, J.L.; investigation, W.G.; methodology, W.G.; project administration, J.L.; resources, S.G.; validation, W.G., J.L., L.Y. and R.J.; visualization, W.G.; writing—original draft, W.G.; writing—review and editing, W.G. and J.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Hunan Environmental Protection Research Project (No. HBKT-2022018); the Research Foundation of the Department of Natural Resources of Hunan Province (No. 20230127DZ); and the Student Research and Innovation Program (SRIP) of Hunan University of Science and Technology (No. ZX2335, S202310534185).

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

Author Shanshan Guo was employed by Hunan Piesat Hongtu UAV System Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Free, G.; Bresciani, M.; Trodd, W.; Tierney, D.; O’Boyle, S.; Plant, C.; Deakin, J. Estimation of lake ecological quality from Sentinel-2 remote sensing imagery. Hydrobiologia 2020, 847, 1423–1438. [Google Scholar] [CrossRef]
  2. Huang, W.; Mukherjee, D.; Chen, S. Assessment of Hurricane Ivan impact on chlorophyll-a in Pensacola Bay by MODIS 250 m remote sensing. Mar. Pollut. Bull. 2011, 62, 490–498. [Google Scholar] [CrossRef] [PubMed]
  3. Schaeffer, B.A.; Hagy, J.D.; Conmy, R.N.; Lehrter, J.C.; Stumpf, R.P. An approach to developing numeric water quality criteria for coastal waters using the SeaWiFS satellite data record. Environ. Sci. Technol. 2012, 46, 916–922. [Google Scholar] [CrossRef] [PubMed]
  4. Mishra, D.R.; Schaeffer, B.A.; Keith, D. Performance evaluation of normalized difference chlorophyll index in northern Gulf of Mexico estuaries using the Hyperspectral Imager for the Coastal Ocean. GIScience Remote Sens. 2014, 51, 175–198. [Google Scholar] [CrossRef]
  5. Mineeva, N. Chlorophyll and its role in freshwater ecosystem on the example of the Volga River reservoirs. In Chlorophylls; IntechOpen: London, UK, 2022. [Google Scholar]
  6. Liu, Y.; Li, J.; Xiao, C.; Zhang, F.; Wang, S. Inland water chlorophyll-a retrieval based on ZY-1 02D satellite hyperspectral observations. Natl. Remote Sens. Bull. 2022, 26, 168–178. [Google Scholar] [CrossRef]
  7. Shi, K.; Zhang, Y.; Zhu, G.; Liu, X.; Zhou, Y.; Xu, H.; Qin, B.; Liu, G.; Li, Y. Long-term remote monitoring of total suspended matter concentration in Lake Taihu using 250 m MODIS-Aqua data. Remote Sens. Environ. 2015, 164, 43–56. [Google Scholar] [CrossRef]
  8. Gidudu, A.; Letaru, L.; Kulabako, R.N. Empirical modeling of chlorophyll a from MODIS satellite imagery for trophic status monitoring of Lake Victoria in East Africa. J. Great Lakes Res. 2021, 47, 1209–1218. [Google Scholar] [CrossRef]
  9. Ouma, Y.O.; Noor, K.; Herbert, K. Modelling Reservoir Chlorophyll-a, TSS, and Turbidity Using Sentinel-2A MSI and Landsat-8 OLI Satellite Sensors with Empirical Multivariate Regression. J. Sens. 2020, 2020, 8858408. [Google Scholar] [CrossRef]
  10. Dorji, P.; Fearns, P.; Broomhall, M. A semi-analytic model for estimating total suspended sediment concentration in turbid coastal waters of northern Western Australia using MODIS-Aqua 250 m data. Remote Sens. 2016, 8, 556. [Google Scholar] [CrossRef]
  11. Watanabe, F.; Alcântara, E.; Imai, N.; Rodrigues, T.; Bernardo, N. Estimation of chlorophyll-a concentration from optimizing a semi-analytical algorithm in productive inland waters. Remote Sens. 2018, 10, 227. [Google Scholar] [CrossRef]
  12. Allan, M.G.; Hamilton, D.P.; Hicks, B.; Brabyn, L. Empirical and semi-analytical chlorophyll a algorithms for multi-temporal monitoring of New Zealand lakes using Landsat. Environ. Monit. Assess. 2015, 187, 364. [Google Scholar] [CrossRef] [PubMed]
  13. Blix, K.; Li, J.; Massicotte, P.; Matsuoka, A. Developing a new machine-learning algorithm for estimating chlorophyll-a concentration in optically complex waters: A case study for high northern latitude waters by using Sentinel 3 OLCI. Remote Sens. 2019, 11, 2076. [Google Scholar] [CrossRef]
  14. Zhang, T.; Huang, M.; Wang, Z. Estimation of chlorophyll-a Concentration of lakes based on SVM algorithm and Landsat 8 OLI images. Environ. Sci. Pollut. Res. 2020, 27, 14977–14990. [Google Scholar] [CrossRef] [PubMed]
  15. Shen, M.; Luo, J.; Cao, Z.; Xue, K.; Qi, T.; Ma, J.; Liu, D.; Song, K.; Feng, L.; Duan, H. Random forest: An optimal chlorophyll-a algorithm for optically complex inland water suffering atmospheric correction uncertainties. J. Hydrol. 2022, 615, 128685. [Google Scholar] [CrossRef]
  16. Syariz, M.A.; Lin, C.H.; Nguyen, M.V.; Jaelani, L.M.; Blanco, A.C. WaterNet: A convolutional neural network for chlorophyll-a concentration retrieval. Remote Sens. 2020, 12, 1966. [Google Scholar] [CrossRef]
  17. Yang, L.; Shami, A. On hyperparameter optimization of machine learning algorithms: Theory and practice. Neurocomputing 2020, 415, 295–316. [Google Scholar] [CrossRef]
  18. Bischl, B.; Binder, M.; Lang, M.; Pielok, T.; Richter, J.; Coors, S.; Thomas, J.; Ullmann, T.; Becker, M.; Boulesteix, A.L.; et al. Hyperparameter optimization: Foundations, algorithms, best practices, and open challenges. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2023, 13, e1484. [Google Scholar] [CrossRef]
  19. Claesen, M.; De Moor, B. Hyperparameter search in machine learning. arXiv 2015, arXiv:1502.02127. [Google Scholar] [CrossRef]
  20. Erickson, N.; Mueller, J.; Shirkov, A.; Zhang, H.; Larroy, P.; Li, M.; Smola, A. Autogluon-tabular: Robust and accurate automl for structured data. arXiv 2020, arXiv:2003.06505. [Google Scholar]
  21. Madni, H.A.; Umer, M.; Ishaq, A.; Abuzinadah, N.; Saidani, O.; Alsubai, S.; Hamdi, M.; Ashraf, I. Water-quality prediction based on H2O AutoML and explainable AI techniques. Water 2023, 15, 475. [Google Scholar] [CrossRef]
  22. Kim, G.E.; Steller, M.; Olson, S. Modeling watershed nutrient concentrations with AutoML. In Proceedings of the 10th International Conference on Climate Informatics, Online, 22–25 September 2020; pp. 86–90. [Google Scholar]
  23. Prasad, D.V.V.; Kumar, P.S.; Venkataramana, L.Y.; Prasannamedha, G.; Harshana, S.; Srividya, S.J.; Harrinei, K.; Indraganti, S. Automating water quality analysis using ML and auto ML techniques. Environ. Res. 2021, 202, 111720. [Google Scholar] [CrossRef] [PubMed]
  24. Prasad, D.V.V.; Venkataramana, L.Y.; Kumar, P.S.; Prasannamedha, G.; Harshana, S.; Srividya, S.J.; Harrinei, K.; Indraganti, S. Analysis and prediction of water quality using deep learning and auto deep learning techniques. Sci. Total Environ. 2022, 821, 153311. [Google Scholar] [CrossRef] [PubMed]
  25. Ding, L.; Qi, C.; Li, G.; Zhang, W. TP concentration inversion and pollution sources in Nanyi Lake based on Landsat 8 data and InVEST model. Sustainability 2023, 15, 9678. [Google Scholar] [CrossRef]
  26. Drusch, M.; Del Bello, U.; Carlier, S.; Colin, O.; Fernandez, V.; Gascon, F.; Hoersch, B.; Isola, C.; Laberinti, P.; Martimort, P.; et al. Sentinel-2: ESA’s optical high-resolution mission for GMES operational services. Remote Sens. Environ. 2012, 120, 25–36. [Google Scholar] [CrossRef]
  27. Vanhellemont, Q.; Ruddick, K. Acolite for Sentinel-2: Aquatic applications of MSI imagery. In Proceedings of the 2016 ESA Living Planet Symposium, Prague, Czech Republic, 9–13 May 2016; Volume 9. [Google Scholar]
  28. Vanhellemont, Q. Adaptation of the dark spectrum fitting atmospheric correction for aquatic applications of the Landsat and Sentinel-2 archives. Remote Sens. Environ. 2019, 225, 175–192. [Google Scholar] [CrossRef]
  29. Castagna, A.; Vanhellemont, Q. A generalized physics-based correction for adjacency effects. Appl. Opt. 2025, 64, 2719–2743. [Google Scholar] [CrossRef]
  30. HJ 897-2017; Water Quality—Determination of Chlorophyll a—Spectrophotometric Method. Ministry of Ecology and Environment of People’s Republic of China: Beijing, China, 2017. Available online: https://english.mee.gov.cn/Resources/standards/water_environment/method_standard2/201801/t20180105_429208.shtml (accessed on 18 July 2025).
  31. Johan, F.; Jafri, M.; Lim, H.; Maznah, W.W. Laboratory measurement: Chlorophyll-a concentration measurement with acetone method using spectrophotometer. In Proceedings of the 2014 IEEE International Conference on Industrial Engineering and Engineering Management, Selangor Darul Ehsan, Malaysia, 9–12 December 2014; IEEE: Piscataway, NJ, USA, 2014; pp. 744–748. [Google Scholar]
  32. Tang, J.W.; Tian, G.L.; Wang, X.Y.; Wang, X.M.; Song, Q.J. The methods of water spectra measurement and analysis I: Above-water method. J. Natl. Remote Sens. Bull. 2004, 8, 37–44. [Google Scholar]
  33. Qian-cheng, D.; Yong, X.; Zui, T.; Wen, S.; Fei-yu, P.; Yi, S.; Bang-hui, Y. Research on fluorescence retrieval algorithm of chlorophyll a concentration in Nanyi lake. Spectrosc. Spectr. Anal. 2022, 42, 3941–3947. [Google Scholar]
  34. Cui, T.; Ding, J.; Jia, F.; Mu, B.; Liu, R.; Xu, P.; Liu, J.; Zhang, J. Out-of-band response for the Coastal Zone Imager (CZI) onboard China’s ocean color satellite HY-1C: Effect on the observation just above the sea surface. Sensors 2018, 18, 3067. [Google Scholar] [CrossRef] [PubMed]
  35. Zounemat-Kermani, M.; Batelaan, O.; Fadaee, M.; Hinkelmann, R. Ensemble machine learning paradigms in hydrology: A review. J. Hydrol. 2021, 598, 126266. [Google Scholar] [CrossRef]
  36. Ting, K.M.; Witten, I.H. Stacking Bagged and Dagged Models. In Proceedings of the 14th International Conference on Machine Learning, San Francisco, CA, USA, 8–12 July 1997; pp. 367–375. [Google Scholar]
  37. Cohen, I.; Huang, Y.; Chen, J.; Benesty, J.; Benesty, J.; Chen, J.; Huang, Y.; Cohen, I. Pearson correlation coefficient. In Noise Reduction in Speech Processing; Springer: Berlin/Heidelberg, Germany, 2009; pp. 1–4. [Google Scholar]
  38. Fushiki, T. Estimation of prediction error by using K-fold cross-validation. Stat. Comput. 2011, 21, 137–146. [Google Scholar] [CrossRef]
  39. Delegido, J.; Verrelst, J.; Alonso, L.; Moreno, J. Evaluation of sentinel-2 red-edge bands for empirical estimation of green LAI and chlorophyll content. Sensors 2011, 11, 7063–7081. [Google Scholar] [CrossRef] [PubMed]
  40. Clevers, J.G.; Gitelson, A.A. Remote estimation of crop and grass chlorophyll and nitrogen content using red-edge bands on Sentinel-2 and -3. Int. J. Appl. Earth Obs. Geoinf. 2013, 23, 344–351. [Google Scholar] [CrossRef]
  41. Bramich, J.; Bolch, C.J.; Fischer, A. Improved red-edge chlorophyll-a detection for Sentinel 2. Ecol. Indic. 2021, 120, 106876. [Google Scholar] [CrossRef]
  42. Al-Najjar, H.A.; Pradhan, B.; Beydoun, G.; Sarkar, R.; Park, H.J.; Alamri, A. A novel method using explainable artificial intelligence (XAI)-based Shapley Additive Explanations for spatial landslide prediction using Time-Series SAR dataset. Gondwana Res. 2023, 123, 107–124. [Google Scholar] [CrossRef]
  43. Sun, D.; Gu, Q.; Wen, H.; Xu, J.; Zhang, Y.; Shi, S.; Xue, M.; Zhou, X. Assessment of landslide susceptibility along mountain highways based on different machine learning algorithms and mapping units by hybrid factors screening and sample optimization. Gondwana Res. 2023, 123, 89–106. [Google Scholar] [CrossRef]
  44. Hamrick, J.M. A Three-Dimensional Environmental Fluid Dynamics Computer Code: Theoretical and Computational Aspects; Virginia Institute of Marine Science: Gloucester Point, VA, USA, 1992. [Google Scholar]
  45. Chen, C.; Chen, Q.; Yao, S.; He, M.; Zhang, J.; Li, G.; Lin, Y. Combining physical-based model and machine learning to forecast chlorophyll-a concentration in freshwater lakes. Sci. Total Environ. 2024, 907, 168097. [Google Scholar] [CrossRef] [PubMed]
  46. Ansari, A.F.; Stella, L.; Turkmen, C.; Zhang, X.; Mercado, P.; Shen, H.; Shchur, O.; Rangapuram, S.S.; Arango, S.P.; Kapoor, S.; et al. Chronos: Learning the language of time series. arXiv 2024, arXiv:2403.07815. [Google Scholar] [CrossRef]
  47. Zemp, M.; Chao, Q.; Han Dolman, A.J.; Herold, M.; Krug, T.; Speich, S.; Suda, K.; Thorne, P.; Yu, W. GCOS 2022 Implementation Plan; Technical Report; World Meteorological Organization: Geneva, Switzerland, 2022. [Google Scholar]
  48. Pahlevan, N.; Mangin, A.; Balasubramanian, S.V.; Smith, B.; Alikas, K.; Arai, K.; Barbosa, C.; Bélanger, S.; Binding, C.; Bresciani, M.; et al. ACIX-Aqua: A global assessment of atmospheric correction methods for Landsat-8 and Sentinel-2 over lakes, rivers, and coastal waters. Remote Sens. Environ. 2021, 258, 112366. [Google Scholar] [CrossRef]
  49. Cetinic, I.; McClain, C.R.; Werdell, P.J.; Ahmad, Z.; Franz, B.A.; Karakoylu, E.M.; McKinna, L.I.; Patt, F.S. PACE Technical Report Series, Volume 6: Data Product Requirements and Error Budgets Consensus Document; Technical Report; NASA Goddard Space Flight Center: Greenbelt, MD, USA, 2019.
  50. Frouin, R.J.; Franz, B.A.; Ibrahim, A.; Knobelspiesse, K.; Ahmad, Z.; Cairns, B.; Chowdhary, J.; Dierssen, H.M.; Tan, J.; Dubovik, O.; et al. Atmospheric correction of satellite ocean-color imagery during the PACE era. Front. Earth Sci. 2019, 7, 145. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Nanyi Lake in Anhui Province as a study area.
Figure 2. The measured data of water spectra and Chl-a.
Figure 3. Retrieval of Chl-a concentration using the AutoGluon intelligent framework.
Figure 4. The score of characteristic bands.
Figure 5. Comparison of basic models and ensemble models under the AutoGluon framework in different schemes.
Figure 6. Scatter comparison of ensemble models across different schemes.
Figure 7. The average Chl-a concentration in Nanyi Lake in October 2022.
Table 1. Sentinel-2 satellite band parameters (Version 3.2).

| Band | Wavelength Range (nm) | Central Wavelength (nm) | Bandwidth (nm) | Spatial Resolution (m) |
|---|---|---|---|---|
| B1 | 411–456 | 443 | 20 | 60 |
| B2 | 456–532 | 490 | 65 | 10 |
| B3 | 536–582 | 560 | 35 | 10 |
| B4 | 646–685 | 665 | 30 | 10 |
| B5 | 694–714 | 705 | 15 | 20 |
| B6 | 730–748 | 740 | 15 | 20 |
| B7 | 766–794 | 783 | 20 | 20 |
| B8 | 774–907 | 842 | 105 | 10 |
| B8A | 848–880 | 865 | 20 | 20 |
| B9 | 930–957 | 945 | 20 | 60 |
| B10 | 1339–1415 | 1375 | 30 | 60 |
| B11 | 1538–1679 | 1610 | 90 | 20 |
| B12 | 2065–2303 | 2190 | 180 | 20 |
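The band limits in Table 1 suggest one simple way to simulate narrowband reflectance from an in situ spectrum: average the measured reflectance over each band's wavelength range. The sketch below does this for B6 and B7. It assumes a rectangular (boxcar) band response, which is a simplification; the actual Sentinel-2 spectral response functions are not rectangular, and the procedure used in this study may differ. The function name, the 1 nm sampling, and the synthetic spectrum are all hypothetical.

```python
def band_average(wavelengths_nm, reflectance, lower_nm, upper_nm):
    """Mean reflectance of the samples whose wavelength lies in [lower_nm, upper_nm].

    Assumes a boxcar response: every sample inside the band gets equal weight.
    """
    inside = [r for w, r in zip(wavelengths_nm, reflectance)
              if lower_nm <= w <= upper_nm]
    if not inside:
        raise ValueError("no spectral samples fall inside the band")
    return sum(inside) / len(inside)


# Band limits (nm) copied from Table 1.
S2_BANDS = {"B6": (730, 748), "B7": (766, 794)}

# Synthetic spectrum sampled every 1 nm: a linear ramp, NOT a measured spectrum.
wavelengths = list(range(700, 801))
spectrum = [0.020 + 0.0001 * (w - 700) for w in wavelengths]

simulated = {
    name: band_average(wavelengths, spectrum, lower, upper)
    for name, (lower, upper) in S2_BANDS.items()
}

for name, value in simulated.items():
    print(f"{name}: {value:.4f}")
```

Replacing the equal weights with the sensor's published response curve turns the same loop into a weighted average, which is the more faithful approach when the response data are available.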
Table 2. Statistics of observation sampling and synchronous satellite overpasses.

| Date | Number of Samples | Overpassing Satellite | Matching Samples | Cloud Cover |
|---|---|---|---|---|
| 2 June 2022 | 10 | Sentinel-2A | 4 | 24.2% |
| 15 June 2022 | 12 | Sentinel-2A | 6 | 2.96% |
| 1 August 2022 | 12 | Sentinel-2A | 12 | 28.1% |
| 2 August 2022 | 12 | - | - | - |
| 19 September 2022 | 13 | - | - | - |
| 18 October 2022 | 26 | Sentinel-2B | 26 | 0.01% |
Table 3. Atmospheric correction accuracy of ACOLITE (S2B and S2A).

| Band | R² (S2B) | MAPE (S2B) | MdSA (S2B) | R² (S2A) | MAPE (S2A) | MdSA (S2A) |
|---|---|---|---|---|---|---|
| B2 | 0.92 | 13.19% | 14.31% | 0.28 | 25.12% | 24.44% |
| B3 | 0.92 | 6.89% | 5.29% | 0.61 | 20.22% | 19.33% |
| B4 | 0.94 | 6.61% | 6.95% | 0.80 | 13.32% | 11.50% |
| B5 | 0.92 | 11.20% | 9.51% | 0.70 | 12.05% | 7.44% |
| B6 | 0.90 | 9.34% | 9.46% | 0.12 | 36.73% | 29.44% |
| B7 | 0.90 | 11.28% | 10.35% | 0.37 | 47.92% | 36.56% |
| B8 | 0.89 | 12.09% | 9.04% | 0.03 | 42.77% | 44.30% |
| B8A | 0.84 | 11.10% | 6.92% | 0.002 | 76.25% | 66.84% |
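The two percentage metrics in Table 3 can be computed as sketched below. This is an illustration under stated assumptions: MAPE is taken as the mean absolute percentage error relative to the in situ reference, and MdSA as the median symmetric accuracy in its commonly used form, 100 × (exp(median(|ln(estimate/reference)|)) − 1). The definitions used in this study may differ in detail. The function names and sample reflectances are hypothetical.

```python
import math
from statistics import median


def mape(reference, estimate):
    """Mean absolute percentage error (%), relative to the reference values."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [abs(e - r) / abs(r) for r, e in zip(reference, estimate)]
    return 100.0 * sum(errors) / len(errors)


def mdsa(reference, estimate):
    """Median symmetric accuracy (%); requires strictly positive inputs."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    log_ratios = [abs(math.log(e / r)) for r, e in zip(reference, estimate)]
    return 100.0 * (math.exp(median(log_ratios)) - 1.0)


# Made-up reflectance pairs, NOT values from this study.
in_situ = [0.020, 0.025, 0.040]
satellite = [0.022, 0.025, 0.036]

print(f"MAPE = {mape(in_situ, satellite):.2f}%")
print(f"MdSA = {mdsa(in_situ, satellite):.2f}%")
```

Because MdSA is built on the median of log ratios, it treats over- and underestimation symmetrically and is less sensitive to a few large outliers than MAPE, which is one reason the two columns in Table 3 can diverge.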
Share and Cite

MDPI and ACS Style

Gu, W.; Liang, J.; Yang, L.; Guo, S.; Jia, R. Retrieval of Chlorophyll-a Concentration in Nanyi Lake Using the AutoGluon Framework. Water 2025, 17, 2190. https://doi.org/10.3390/w17152190
