PlanetScope Imagery and Hybrid AI Framework for Freshwater Lake Phosphorus Monitoring and Water Quality Management

Deng, Ying; Pan, Daiwei; Yang, Simon X.; Gharabaghi, Bahram

doi:10.3390/w18020261


College of Engineering, University of Guelph, 50 Stone Road East, Guelph, ON N1G 2W1, Canada

* Authors to whom correspondence should be addressed.

Water 2026, 18(2), 261; https://doi.org/10.3390/w18020261

Submission received: 17 December 2025 / Revised: 10 January 2026 / Accepted: 14 January 2026 / Published: 19 January 2026

(This article belongs to the Section Water Quality and Contamination)


Abstract

Accurate estimation of Total Phosphorus, referred to as “Phosphorus, Total” (PPUT; µg/L) in the sourced monitoring data, is essential for understanding eutrophication dynamics and guiding water-quality management in inland lakes. However, lake-wide PPUT mapping at high resolution is challenging to achieve using conventional in-situ sampling, and nearshore gradients are often poorly resolved by medium- or low-resolution satellite sensors. This study exploits multi-generation PlanetScope imagery (Dove Classic, Dove-R, and SuperDove; 3–5 m, near-daily revisit) to develop a hybrid AI framework for PPUT retrieval in Lake Simcoe, Ontario, Canada. PlanetScope surface reflectance, short-term meteorological descriptors (3 to 7-day aggregates of air temperature, wind speed, precipitation, and sea-level pressure), and in-situ Secchi depth (SSD) were used to train five ensemble-learning models (HistGradientBoosting, CatBoost, RandomForest, ExtraTrees, and GradientBoosting) across eight feature-group regimes that progressively extend from bands-only, to combinations with spectral indices and day-of-year (DOY), and finally to SSD-inclusive full-feature configurations. The inclusion of SSD led to a strong and systematic performance gain, with mean R² increasing from about 0.67 (SSD-free) to 0.94 (SSD-aware), confirming that vertically integrated optical clarity is the dominant constraint on PPUT retrieval and cannot be reconstructed from surface reflectance alone. To enable scalable SSD-free monitoring, a knowledge-distillation strategy was implemented in which an SSD-aware teacher transfers its learned representation to a student using only satellite and meteorological inputs. The optimal student model, based on a compact subset of 40 predictors, achieved R² = 0.83, RMSE = 9.82 µg/L, and MAE = 5.41 µg/L, retaining approximately 88% of the teacher’s explanatory power. 
Application of the student model to PlanetScope scenes from 2020 to 2025 produces meter-scale PPUT maps; a 26 July 2024 case study shows that >97% of the lake surface remains below 10 µg/L, while rare (<1%) but coherent hotspots above 20 µg/L align with tributary mouths and narrow channels. The results demonstrate that combining commercial high-resolution imagery with physics-informed feature engineering and knowledge transfer enables scalable and operationally relevant monitoring of lake phosphorus dynamics. These high-resolution PPUT maps enable lake managers to identify nearshore nutrient hotspots and tributary plume structures. In doing so, the proposed framework supports targeted field sampling, early warning for eutrophication events, and more robust, lake-wide nutrient budgeting.

Keywords: PlanetScope; Lake Simcoe; total phosphorus; knowledge distillation; machine learning; deep learning; remote sensing; water quality retrieval

1. Introduction

Eutrophication of inland lakes remains one of the most widespread environmental challenges worldwide, driven primarily by excessive phosphorus and nitrogen inputs that stimulate algal blooms, reduce water transparency, and degrade aquatic ecosystems [1,2,3]. Accurate and spatially continuous estimation of total phosphorus is essential for understanding nutrient dynamics and supporting effective lake management.

However, obtaining lake-wide distributions of Total Phosphorus (PPUT; µg/L) through conventional in-situ sampling remains challenging due to the sparse spatial and temporal coverage of monitoring stations [4,5]. Furthermore, nearshore phosphorus concentrations are difficult to accurately retrieve from medium- or low-resolution satellite imagery, where pixel mixing and land-water adjacency effects severely distort spectral signals [6,7,8].

Remote sensing provides an effective means to complement in-situ observations by enabling repetitive, synoptic coverage of large water bodies [9,10]. Many studies have used satellite optical reflectance and derived indices to estimate water-quality parameters such as chlorophyll-a, turbidity, suspended sediments and, to some extent, phosphorus or nitrogen [11,12,13]. However, phosphorus is a non-optically active constituent whose spectral expression is indirect, primarily mediated by its relationship with other optically detectable variables such as phytoplankton and turbidity [14,15,16]. This indirect linkage complicates retrieval: spectral signals are weak, and model relationships are often site-specific [17,18].

Another constraint is spatial resolution: many water-quality retrieval studies rely on sensors such as Sentinel-2 MSI (≈10 m resolution) or Landsat 8 OLI (≈30 m resolution). These resolutions may fail to resolve narrow bays, tributary plumes or near-shore mixing zones, and mixed-pixel effects can degrade retrieval accuracy in heterogeneous lake zones [19,20]. In this context, high spatial resolution and high temporal revisit satellite data become highly desirable for fine-scale inland lake nutrient mapping.

The PlanetScope satellite constellation (Dove Classic, Dove-R, SuperDove) offers distinct advantages: spatial resolution of ~3–5 m, near-daily revisit frequency, and continuity across multi-generation sensors, enabling high-resolution and frequent monitoring of lake surfaces and fine-scale features such as shorelines and tributary plumes [21,22,23]. Comparative work has shown that PlanetScope imagery outperforms coarser sensors (Sentinel-2, Landsat-8) in retrieving specific water-quality parameters in optically complex inland waters [24,25]. Consequently, PlanetScope is especially suited for whole-lake PPUT retrieval and capturing near-shore nutrient heterogeneity, offering improved spatial detail and repeat coverage.

On the modelling front, machine-learning (ML) and deep-learning (DL) approaches have seen increased adoption in water-quality retrieval thanks to their ability to model nonlinear relationships between spectral, environmental and in-situ data [21,26,27]. Algorithms such as Random Forest (RF), Light Gradient Boosting Machine (LightGBM), Support Vector Regression (SVR), and histogram-based gradient boosting (HGBR) have been successfully applied to estimate parameters such as Chl-a, TSS, and even Total Nitrogen (TN)/Total Phosphorus (TP) in inland waters [28,29,30].

More recently, deep learning models including Multilayer Perceptron (MLP), one-dimensional Convolutional Neural Networks (1D-CNN), and Transformer-based architectures have shown additional potential for handling high-dimensional inputs, temporal sequences, and spatial patterns [31,32,33,34].

However, the performance of such models often degrades when key auxiliary in-situ variables, such as Secchi depth (SSD), which reflect optical clarity and vertical attenuation, are unavailable. These auxiliary variables are typically measured at only a limited number of monitoring stations and thus cannot support continuous spatial mapping across entire lake surfaces. Developing models that maintain high predictive accuracy without reliance on in-situ auxiliary features, therefore, remains a significant challenge for whole-lake nutrient retrieval [35,36,37].

To address this limitation, knowledge distillation (KD) offers a promising solution. In KD, a high-capacity “teacher” model trained on full-feature information (including auxiliary in-situ data) transfers knowledge to a “student” model that uses only readily available features (e.g., remote sensing + meteorology) and can achieve comparable accuracy [38]. KD has been applied in remote-sensing domains for image classification, segmentation and multi-task learning [39], although its application to inland water-quality retrieval remains scarce.

Recent advances in machine learning have substantially improved the retrieval of optically inactive parameters such as TP from multispectral imagery. Xiong et al. [5] developed machine-learning algorithms for TP in eutrophic Lake Taihu using Landsat-8, demonstrating that random forest and other non-linear models can outperform conventional band-ratio approaches, with typical R² values around 0.70 for TP estimation. Similarly, Cui et al. [40] and Qin et al. [41] used Sentinel-2 imagery combined with tree-based models such as random forest and XGBoost to retrieve TP and other nutrients in large Chinese lakes, reporting R² in the range of 0.65–0.80 and highlighting the importance of integrating spectral indices and hydrometeorological variables.

While these studies confirm the feasibility of satellite-based TP retrieval at 10–30 m spatial resolution, they provide limited insight into nearshore gradients and tributary plumes, and they do not explicitly address how to leverage auxiliary clarity measurements such as SSD within a transferable, SSD-free prediction framework. The present study extends this line of work by exploiting 3 to 5 m PlanetScope imagery and a knowledge-distillation strategy to transfer information from SSD-aware models to operational SSD-free students.

Although some studies have used machine-learning or deep-learning for TP retrieval in inland waters [42,43,44,45,46], very few have leveraged high spatial–temporal PlanetScope imagery for full-lake PPUT mapping, combined systematic comparison of multiple algorithms under varying feature sets (remote sensing only; remote sensing and meteorology; remote sensing, meteorology, and SSD), and then applied a KD framework to enable SSD-free full-lake prediction. Therefore, this research aims to undertake the following:

Utilize PlanetScope multi-generation imagery (Dove Classic, Dove-R, SuperDove) to achieve high-resolution, high-frequency retrieval of PPUT across the entire lake surface (including near-shore and tributary zones).
Systematically compare five machine-learning algorithms (HistGradientBoosting, CatBoost, RandomForest, ExtraTrees, and GradientBoosting) under eight distinct feature settings that combine remote-sensing bands, spectral indices, DOY, meteorological variables, and SSD.
Quantify performance degradation when SSD is removed and implement a KD-based framework to transfer knowledge from the full-feature teacher model to a reduced-feature student model, thereby enabling full-lake mapping without SSD.
Apply the optimal student model to generate whole-lake PPUT distribution maps for spring (March–April) and summer (July–August) periods and analyze spatial and seasonal variability of phosphorus in the study lake.

By integrating PlanetScope’s high spatial–temporal resolution, advanced AI modelling, and knowledge distillation strategies, this work seeks to advance the feasibility of fine-scale, lake-wide nutrient retrieval and to support near-real-time eutrophication monitoring and management.

2. Methods

This section describes the general methodological framework developed for high-resolution retrieval of PPUT from PlanetScope imagery, and its application to Lake Simcoe as a representative case study. We first outline the overall framework, including data sources, preprocessing, feature construction, model configurations, and evaluation design. We then detail how this framework is applied to Lake Simcoe—covering the study area characteristics, available monitoring networks, PlanetScope acquisition strategy, and the specific experimental regimes used to train teacher and student models for lake-wide PPUT mapping.

2.1. Problem Formulation

We aimed to retrieve lake-wide PPUT from multi-source predictors. Let $x \in \mathbb{R}^{d}$ denote the fused feature vector per sample, composed of the following:

(i): Satellite reflectance-derived features (RS);
(ii): The day of the year (DOY) when the sample data were collected;
(iii): Meteorological descriptors (MET);
(iv): PPUT-related remote sensing indices (IDX);
(v): Auxiliary in-situ variables (e.g., SSD).

We compared eight regimes, organized as two groups and four training stages (Table 1). We then developed a distillation model with $x_{4A}$ as the input to its teacher model and $x_{4B}$ as the input to its student model. The goal is to deploy an SSD-free model for full-lake mapping with accuracy comparable to that of the SSD-aware models.

2.2. Data Modalities

The following features were incorporated into the model to capture diverse environmental and biological information relevant to this study:

Remote sensing (RS): Surface reflectance bands (blue, green, red, and NIR) from PlanetScope imagery were used to characterize spatial variability in lake optical properties. From these bands, we derived remote-sensing indices that capture absorption/scattering contrasts relevant to PPUT retrieval, including normalized-difference indices (e.g., NDWI = (Green − NIR)/(Green + NIR); NDVI = (NIR − Red)/(NIR + Red); GNDVI = (NIR − Green)/(NIR + Green)), band-ratio indices (e.g., NIR/Red, Red/Green), and visible-band greenness/algal-enhancement indices (e.g., VARI = (Green − Red)/(Green + Red − Blue); Excess Green, ExG = 2·Green − Red − Blue; ExGR), along with intensity/contrast measures (e.g., sum/mean of RGB, redness = Red/(Blue + Green), greenness = Green/(Red + Blue), and NIR-to-visible contrast).
Meteorology (MET): Daily near-surface air temperature (average, minimum, and maximum), precipitation, wind (speed and gust), air pressure, snowfall, and sunshine duration, together with short-term aggregates over 3- and 7-day windows (means, with precipitation accumulated over the window).
Auxiliary in-situ variable (SSD): SSD serves as an auxiliary variable in the teacher model for KD, reflecting water optical clarity and vertical attenuation. The teacher model uses SSD to capture this clarity information, whereas the student model aims to achieve comparable performance without it.
Day of the year (DOY): A feature derived from sampling dates to capture seasonal patterns such as light availability and biological productivity.

Collectively, these predictors describe optical conditions (RS), short-term environmental forcing (MET), seasonal timing (DOY), and an optional in-situ clarity proxy (SSD, available only to the teacher).
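As an illustration of the index definitions listed above, the following minimal sketch computes several of them for a single sample. The function name `spectral_indices`, the `eps` guard, the dictionary keys, and the example reflectance values are assumptions made for illustration; they are not taken from the study's code.

```python
def spectral_indices(blue, green, red, nir, eps=1e-9):
    """Compute a few of the indices named in the text for one sample.

    Inputs are surface reflectances as floats. `eps` is an assumed guard
    against division by zero (not specified in the paper).
    """
    def safe_div(num, den):
        # Return NaN instead of raising when the denominator vanishes.
        return num / den if abs(den) > eps else float("nan")

    return {
        "NDWI": safe_div(green - nir, green + nir),
        "NDVI": safe_div(nir - red, nir + red),
        "GNDVI": safe_div(nir - green, nir + green),
        "VARI": safe_div(green - red, green + red - blue),
        "ExG": 2.0 * green - red - blue,
        "NIR_over_Red": safe_div(nir, red),
        "Red_over_Green": safe_div(red, green),
        "redness": safe_div(red, blue + green),
        "greenness": safe_div(green, red + blue),
    }


# Hypothetical open-water reflectances (illustrative values only).
idx = spectral_indices(blue=0.04, green=0.06, red=0.03, nir=0.01)
print(idx["NDWI"], idx["NDVI"], idx["ExG"])
```

For lake-wide mapping the same formulas would be applied per pixel, typically with an array library rather than scalar floats.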

2.3. Pre-Processing and Feature Engineering

The pre-processing and feature engineering pipeline encompasses three overarching stages to systematically prepare input data for subsequent modeling:

Step 1: Data Validation and Preparatory Assessment: The 4-band data provided by PlanetScope have already undergone surface reflectance (SR) processing and been normalized to Sentinel-2 bands for consistent radiometry. This eliminates the need for additional SR correction or cross-sensor harmonization, ensuring that the foundational spectral data meet radiometric consistency requirements for downstream analysis.
Step 2: Multi-Dimensional Feature Construction: A comprehensive spectral index library is constructed to capture vegetation, water, and other land surface properties. This library includes representative indices adaptable across sensors, such as the Normalized Difference Water Index (NDWI), Normalized Difference Vegetation Index (NDVI), Enhanced Vegetation Index 2 (EVI2), Soil-Adjusted Vegetation Index (SAVI), Optimized SAVI (OSAVI), Excess Green (ExG), Excess Green minus Excess Red (ExGR), and Visible Atmospherically Resistant Index (VARI). Additionally, ratios and normalized differences in individual spectral bands (NIR, Red, Green, Blue) are included, along with intensity and contrast terms to enhance texture-related information. To incorporate seasonality, Day-of-Year (DOY) is used as a direct temporal feature, complemented by cyclic encoding via sine (sin) and cosine (cos) transformations to capture periodic patterns across years. Meteorological data are integrated using aggregated windows, including same-day observations, 3-day and 7-day moving averages (with precipitation summed over these periods), to account for short-term climatic influences on surface conditions.
Step 3: Feature Refinement and Quality Control: Standardization (e.g., z-score normalization) is applied to features intended for linear model components to ensure consistent scaling, while outlier handling (e.g., clipping or Winsorization) is performed as needed to enhance the robustness of the dataset against extreme values. This final stage ensures features are numerically stable and resilient to noise, optimizing their suitability for model training.
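The seasonal encoding and meteorological windows described in Step 2 can be sketched as follows. This is a minimal illustration, not the study's implementation: the function names, the 365.25-day period, and the example temperature and precipitation series are assumptions.

```python
import math
from datetime import date


def doy_features(d, period=365.25):
    """Day-of-year plus its cyclic sine/cosine encoding.

    `period` is an assumed year length; the paper does not state one.
    """
    doy = d.timetuple().tm_yday
    angle = 2.0 * math.pi * doy / period
    return {"doy": doy, "doy_sin": math.sin(angle), "doy_cos": math.cos(angle)}


def window_aggregate(daily_values, window, how="mean"):
    """Aggregate the last `window` daily values ending on the sampling day.

    Use how="mean" for variables such as temperature or wind speed and
    how="sum" for precipitation, as described in the text.
    """
    if len(daily_values) < window:
        raise ValueError("not enough days for the requested window")
    recent = daily_values[-window:]
    total = sum(recent)
    return total if how == "sum" else total / window


# Hypothetical 7-day series ending on the sampling day.
tavg = [18.0, 19.0, 21.0, 22.0, 20.0, 19.0, 21.0]
prcp = [0.0, 5.0, 0.0, 0.0, 12.0, 1.0, 0.0]

feats = doy_features(date(2024, 7, 26))
feats["tavg_3d_mean"] = window_aggregate(tavg, 3, "mean")
feats["prcp_7d_sum"] = window_aggregate(prcp, 7, "sum")
print(feats)
```

The sine and cosine terms place 31 December next to 1 January, which the raw DOY integer alone does not.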

The entire pre-processing and feature engineering pipeline is designed as an end-to-end workflow that integrates data validation, multi-dimensional feature extraction, and quality-driven refinement. By leveraging preprocessed spectral data, constructing a diverse set of biophysical and temporal features, and applying rigorous normalization and outlier-mitigation techniques, the pipeline ensures the input dataset is both comprehensive and robust. This systematic approach not only enhances the predictive power of subsequent models but also improves their generalizability across different geographic regions and environmental conditions, ultimately supporting more accurate and reliable land surface monitoring and analysis.

2.4. Model Zoo and Training Regimes

This paper benchmarked a representative set of tree-based ensemble regressors to evaluate predictive performance under the eight input regimes defined in Section 2.1 and to quantify the incremental value of MET and SSD. The models being assessed include HistGradientBoosting (HistGBM), CatBoost, RandomForest, ExtraTrees, and GradientBoosting, covering both boosting- and bagging-based paradigms commonly used for nonlinear regression with heterogeneous predictors. To ensure a fair comparison, all models were trained using the same data split and preprocessing and evaluated using the same metrics (R², RMSE, and MAE as primary metrics; MAPE reported only in the stratified analyses). To maximize validation R², the five machine learning models were trained using Optuna for hyperparameter tuning. For each feature group and model, a Bayesian optimization search was conducted over the model’s hyperparameter space (e.g., learning rate, tree depth, subsampling ratio) using stratified 5-fold cross-validation within the training set.

Our emphasis is on performance sensitivity to input availability (e.g., with vs. without SSD/MET), rather than on detailed algorithmic exposition or model-specific optimizations. Additional model descriptions, equations, and implementation settings are provided in Appendix A. In addition to serving as benchmarks, the ensemble learners described above also provide candidates for the backbone model used in the subsequent knowledge distillation (KD) framework (Section 2.5). Rather than fixing a priori which algorithm to distill, we first compare the five models across the eight feature-group regimes and then adopt the ensemble with the best overall performance as the common backbone for both the teacher and the student. This design ensures that KD is built on the strongest available base learner, while keeping the architectural choice data-driven and empirically justified. Details of the backbone selection are reported in Section 4.3.

2.5. Knowledge Distillation

Building on the model zoo described in Section 2.4, we implement the teacher–student knowledge-distillation (KD) framework using a two-stage regression architecture. In the first stage, a tree-based ensemble regressor is used as the backbone model; specifically, we choose the ensemble learner that exhibits the best overall predictive performance in the benchmark comparison (Section 4.3). This backbone captures nonlinear interactions among spectral bands, spectral indices, meteorological descriptors, and (for the teacher only) SSD.

In the second stage, we append a linear calibration head to the backbone predictions. Concretely, we concatenate the backbone output with a small subset of physically interpretable predictors (e.g., selected spectral indices and short-term meteorological aggregates) and fit a Ridge regression (RidgeCV) model with ℓ₂ regularization. This two-stage design decouples high-capacity nonlinear feature learning from final prediction calibration, improving numerical stability, reducing variance at the distribution tails, and providing an interpretable linear combination of a few key drivers. Both the teacher and the student share the same two-stage architecture; they differ only in their input feature sets and in the target used to train the student, as described below. The soft target for the student is constructed as:

$$y_{KD} = \alpha \, y_{true} + (1 - \alpha) \, \hat{y}_{teacher}$$

(1)

where $\alpha \in [0, 1]$ balances the contribution of ground-truth labels $y_{true}$ and teacher predictions $\hat{y}_{teacher}$. For feature selection, permutation-based $\Delta R^{2}$ values are computed for the candidate features to quantify their importance; student models are then trained using the top-K features (with K swept) to enhance compactness and generalization. To optimize $\alpha$, we perform a grid search over $\alpha = 0.0, 0.1, \dots, 0.9$ and validate via station-grouped K-fold cross-validation to prevent spatial leakage.
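The soft-target construction in Equation (1) and the α grid can be written out directly. The sketch below is illustrative only: the function name `kd_soft_targets` and the example concentrations are assumptions, and the student regressor that would be fitted to these targets is not shown.

```python
def kd_soft_targets(y_true, y_teacher, alpha):
    """Blend ground-truth labels with teacher predictions as in Equation (1).

    alpha = 1 trains the student on labels only; alpha = 0 trains it on
    teacher predictions only.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(y_true) != len(y_teacher):
        raise ValueError("label and teacher arrays must have equal length")
    return [alpha * yt + (1.0 - alpha) * yh for yt, yh in zip(y_true, y_teacher)]


# Grid searched in the text: alpha = 0.0, 0.1, ..., 0.9.
ALPHA_GRID = [round(0.1 * k, 1) for k in range(10)]

# Hypothetical PPUT labels and teacher predictions (µg/L).
y_true = [10.0, 20.0]
y_teacher = [12.0, 16.0]
y_kd = kd_soft_targets(y_true, y_teacher, alpha=0.5)
print(y_kd)
```

In a full pipeline, each α in `ALPHA_GRID` would yield one set of soft targets, one fitted student, and one station-grouped cross-validation score, with the best-scoring α retained.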

2.6. Validation and Metrics

Model performance was primarily evaluated using three standard regression metrics:

1.: Coefficient of Determination (R²), representing the proportion of variance explained by the model:

$$R^{2} = 1 - \frac{\sum_{i} (y_{i} - \hat{y}_{i})^{2}}{\sum_{i} (y_{i} - \bar{y})^{2}}$$

(2)

2.: Root Mean Square Error (RMSE), quantifying the average magnitude of prediction error:

$$RMSE = \sqrt{\frac{1}{n} \sum_{i} (y_{i} - \hat{y}_{i})^{2}}$$

(3)

3.: Mean Absolute Error (MAE), measuring the average absolute deviation:

$$MAE = \frac{1}{n} \sum_{i} |y_{i} - \hat{y}_{i}|$$

(4)

In addition, mean absolute percentage error (MAPE) was reported only in the stratified analyses to facilitate relative error comparisons across segments with different concentration ranges:

$$MAPE = \frac{100}{n} \sum_{i = 1}^{n} \left| \frac{y_{i} - \hat{y}_{i}}{y_{i} + \epsilon} \right|$$

(5)

where $y_{i}$ and $\hat{y}_{i}$ are the observed and predicted values for sample $i$, $n$ is the number of samples, $\bar{y}$ is the mean of the observed values, and $\epsilon$ is a small constant added to avoid instability when $y_{i}$ is close to zero.
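Equations (2)–(5) translate directly into code. The following is a minimal sketch; the function name `regression_metrics`, the default `eps` value, and the example values are assumptions, since the paper does not report the constant it used.

```python
import math


def regression_metrics(y_true, y_pred, eps=1e-6):
    """Return R², RMSE, MAE, and MAPE following Equations (2)-(5).

    `eps` is an assumed small constant for the MAPE denominator.
    R² is undefined when all observations are identical (zero variance).
    """
    n = len(y_true)
    y_bar = sum(y_true) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - y_bar) ** 2 for y in y_true)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(y - p) for y, p in zip(y_true, y_pred)) / n,
        "MAPE": 100.0 / n * sum(abs((y - p) / (y + eps))
                                for y, p in zip(y_true, y_pred)),
    }


# Hypothetical observed and predicted PPUT values (µg/L).
m = regression_metrics([10.0, 20.0, 30.0], [12.0, 18.0, 33.0])
print(m)
```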

Our complete modelling dataset consists of 544 water-quality samples from 11 long-term monitoring stations across Lake Simcoe. Each sample pairs an in-situ PPUT measurement with a temporally matched PlanetScope scene and the corresponding meteorological descriptors. To prevent spatial leakage and to evaluate the models on genuinely unseen locations, we adopted a station-based grouped splitting strategy implemented via scikit-learn’s GroupShuffleSplit. The grouping variable was the station identifier; all samples from a given station were assigned exclusively to either the training or the test set. Using a nominal 80/20 split (test_size = 0.2, random_state = 42), this procedure produced a training set of 399 samples (73.3%) from 8 stations and a test set of 145 samples (26.7%) from 3 distinct stations, with no station appearing in both sets. The slight deviation from a perfect 80/20 ratio reflects the constraint that splits are performed at the station level rather than at the individual-sample level.

Within the training set, we further employed station-grouped 5-fold cross-validation (GroupKFold, n_splits = 5) for all hyperparameter tuning and feature-selection experiments, including (i) the K-sweep over the number of retained features (K = 10, 15, …, 60) and (ii) the grid search over the distillation weight α ∈ {0.0, 0.1, …, 0.9}. This nested design separates (i) the outer train–test split, which measures generalization to new spatial locations (unseen stations), from (ii) the inner station-grouped CV, which selects model hyperparameters without leaking information across stations.

Such a station-grouped strategy is critical in inland water-quality modelling, because neighbouring samples at the same station are strongly spatially autocorrelated; random sample-wise splits would artificially inflate performance estimates and fail to reflect the true difficulty of transferring models to new monitoring locations.
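The guarantee provided by a station-grouped split can be illustrated with a dependency-free sketch. This is not the study's code, which used scikit-learn's GroupShuffleSplit; the function below is a simplified stand-in, its name and the synthetic station labels are assumptions, and it will not reproduce the same station assignment as the paper. The number of test stations is rounded up, so 11 stations at a 0.2 test fraction yield 3 test stations, consistent with the split reported above.

```python
import math
import random


def grouped_train_test_split(station_ids, test_fraction=0.2, seed=42):
    """Split sample indices so that no station appears in both sets."""
    stations = sorted(set(station_ids))
    rng = random.Random(seed)
    rng.shuffle(stations)
    # Round up: the split is made at station level, not sample level.
    n_test = math.ceil(test_fraction * len(stations))
    test_stations = set(stations[:n_test])
    train_idx = [i for i, s in enumerate(station_ids) if s not in test_stations]
    test_idx = [i for i, s in enumerate(station_ids) if s in test_stations]
    return train_idx, test_idx


# Synthetic example: 55 samples spread evenly over 11 hypothetical stations.
station_ids = [f"S{i % 11:02d}" for i in range(55)]
train_idx, test_idx = grouped_train_test_split(station_ids)

train_stations = {station_ids[i] for i in train_idx}
test_stations = {station_ids[i] for i in test_idx}
print(len(train_stations), len(test_stations))
```

Because stations contribute unequal numbers of samples in practice, the realized sample ratio generally differs from the nominal station ratio, as with the 73.3/26.7 split described above.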

2.7. Lake-Wide Prediction

Based on PlanetScope surface reflectance (SR) inputs, the lake-wide PPUT retrieval workflow followed a systematic, reproducible sequence that mirrors the data-processing logic implemented in our training pipeline, including mosaicking, masking, and model inference.

First, for each target prediction date, cloud-filtered SR mosaics were generated using the matched UDM2 quality mask. This step removed pixels flagged as clouds, cloud shadows, haze, or bright artifacts, following the same masking approach applied during model training. The individual tiles of a given date were then seamlessly merged into a spatial mosaic to ensure consistent coverage across the entire Lake Simcoe region.

Second, lake boundary polygons were obtained from the Ontario Ministry of Natural Resources and Forestry’s Land Information Ontario (LIO) geospatial data portal [47]. The “Waterbody” dataset provides authoritative GIS lake outlines for Ontario and was used as the reference geometry for delineating Lake Simcoe. These polygons served multiple purposes: (i) clipping and isolating the lake surface from the PlanetScope mosaics, (ii) defining valid prediction regions, and (iii) serving as the base layer for cartographic visualization.

Third, the SSD-free student model, trained with KD, was applied to infer PPUT values for every valid water pixel within the lake mask. This inference step follows the same preprocessing pipeline as the training code, including feature extraction (all spectral indices, DOY encodings, meteorological features, if applicable) and the standardized input structure used during student training.

Fourth, a single representative mid-summer PlanetScope scene (26 July 2024) was selected to characterize low external loading and thermally stratified conditions. Pixel-level predictions from this scene were used to generate a high-resolution lake-wide PPUT map for spatial pattern analysis.

Finally, the resulting maps were analyzed to extract key spatial patterns. This included characterizing contrasts between nearshore and offshore zones, evaluating the spatial extent and strength of tributary-driven phosphorus plumes (e.g., the Holland River and Black River), and comparing seasonal changes in the distribution of lake-wide PPUT.

These analyses provide contextual understanding of how hydrologic conditions and nutrient delivery pathways shape the spatio-temporal structure of phosphorus concentrations in Lake Simcoe.

Overall, the methodological framework presented in this section establishes a general, dataset-agnostic pipeline for retrieving total phosphorus (PPUT) from multi-source predictors. The workflow integrates standardized preprocessing of multispectral surface reflectance, construction of an extensive feature library, including spectral indices, meteorological descriptors, and seasonal variables, and the implementation of a comprehensive model comparison across both machine-learning and deep-learning families.

The framework further incorporates a knowledge-distillation strategy to enable high-accuracy nutrient prediction even when auxiliary in-situ variables (e.g., SSD) are unavailable across the full spatial domain. Through unified feature engineering, grouped cross-validation, α-grid optimization, and Top-K feature selection, the proposed approach ensures robust generalization and consistent deployment across varied hydrological or optical environments.

3. Case Study: Lake Simcoe, Ontario, Canada

The general methodological framework described in Section 2 provides a flexible, sensor-independent pipeline for PPUT retrieval that leverages multisource predictors, feature engineering, multi-model benchmarking, and knowledge distillation-enhanced learning.

To demonstrate the practical applicability and effectiveness of this framework in a real-world inland lake environment, we conducted a case study on Lake Simcoe, Ontario, Canada. This section details the study area characteristics, the construction of the multi-source dataset (PlanetScope SR, in-situ water quality, meteorology), the implementation of the teacher-student training paradigm, and the deployment of the final SSD-free student model for lake-wide PPUT mapping under different hydrological seasons.

3.1. Study Area

As Figure 1 shows, Lake Simcoe (44.3–44.6° N, 79.2–79.6° W) is the largest inland lake in southern Ontario outside the Great Lakes system, covering ~722 km² with a mean depth of ~15 m and a maximum depth of 41 m. The watershed includes more than 30 tributaries, the most important of which are the Holland River, Black River, Beaver River, and Talbot River. The lake drains northward through Lake Couchiching and the Severn River to Georgian Bay (Lake Huron), along the route of the Trent-Severn Waterway. Lake Simcoe supports a rapidly growing watershed population and provides multiple ecosystem services and socio-economic benefits. The lake supplies drinking water to several municipalities, assimilates municipal and agricultural wastewater, and supports a provincially significant year-round sport fishery, boating, and recreational tourism industry [48]. The watershed includes a mosaic of urban, agricultural, and forested land uses and is home to diverse communities, including rapidly expanding suburban populations and Indigenous communities. Recent demographic growth and land-use intensification have increased pressure on nutrient management and shoreline ecosystems. In this context, high-resolution, spatially explicit PPUT mapping can directly inform source control, shoreline restoration, and adaptive management actions to protect both ecosystem health and the well-being of local beneficiaries.

At the same time, Lake Simcoe has experienced sustained phosphorus loading and eutrophication pressures over recent decades, prompting intensive monitoring and the implementation of the Lake Simcoe Protection Plan. The combination of long-term in-situ water-quality and SSD datasets, diverse watershed land uses, strong management relevance, and ongoing policy initiatives makes Lake Simcoe an ideal testbed for evaluating high-resolution, AI-based phosphorus retrieval frameworks and for demonstrating how such methods can support evidence-based lake and watershed management.

3.2. Data Sources

The implementation of the proposed hybrid AI framework for Lake Simcoe required integrating three primary data sources: PlanetScope multi-generation satellite imagery, long-term in-situ water-quality observations, and meteorological data from the nearest weather stations. These data layers were assembled into a harmonized, case-specific dataset that captures the optical, hydrological, and atmospheric conditions of Lake Simcoe.

Figure 1. Lake Simcoe study area showing the LSRCA in-situ water-quality sampling stations (blue dots) and Environment Canada meteorological stations (red dots) used to construct daily and short-term (3–7 day) aggregated meteorological predictors for each satellite–in-situ match-up.

PlanetScope imagery (Dove Classic, Dove-R, and SuperDove) covering the period 2018–2023 was obtained for all acquisition dates intersecting the lake. Surface Reflectance (SR) products were used to ensure radiometric consistency, and cloud-covered areas were removed using the accompanying UDM2 quality mask. For dates with multiple overlapping scenes, mosaics were produced to achieve complete spatial coverage. Only acquisitions with less than 10% cloud cover were retained for lake-wide prediction and for constructing training samples. Each in-situ measurement was temporally matched to the nearest PlanetScope acquisition within a ±1-day window to minimize satellite–in-situ mismatch.
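The ±1-day match-up rule can be sketched as a nearest-date search with a tolerance. The function name `match_scene`, the tie-breaking behaviour (the first listed scene wins when two are equally close), and the example dates are assumptions for illustration.

```python
from datetime import date


def match_scene(sample_date, scene_dates, max_gap_days=1):
    """Return the scene date closest to `sample_date` within the tolerance.

    Returns None when no scene falls inside the +/- max_gap_days window,
    in which case the in-situ sample would be left unmatched.
    """
    best, best_gap = None, None
    for scene in scene_dates:
        gap = abs((scene - sample_date).days)
        if gap <= max_gap_days and (best_gap is None or gap < best_gap):
            best, best_gap = scene, gap
    return best


# Hypothetical acquisition dates for one station.
scenes = [date(2021, 7, 8), date(2021, 7, 11), date(2021, 7, 14)]
matched = match_scene(date(2021, 7, 10), scenes)
unmatched = match_scene(date(2021, 7, 17), scenes)
print(matched, unmatched)
```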

In-situ water-quality data were provided by the Lake Simcoe Region Conservation Authority (LSRCA) [49], covering 15 monitoring stations distributed across embayments, nearshore zones, and offshore basins. The target variable used for model development was PPUT, while SSD served as an auxiliary variable incorporated exclusively into the teacher model. Records underwent quality control to remove invalid or duplicate samples and were spatially harmonized using the official station coordinates.

Meteorological variables were collected using the Meteostat API [50] and linked to each in-situ record by selecting the five nearest available stations. The meteorological fields included daily air temperature (tavg, tmin, tmax), precipitation, wind speed and gust, atmospheric pressure, snowfall, and sunshine duration. To capture short-term hydrometeorological variability associated with watershed loading and water-column mixing, 3-day and 7-day aggregated metrics were derived. These variables were subsequently merged with SR-based features and in-situ observations to create the final multi-modal training dataset.
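As a rough illustration of the 3-day and 7-day aggregates, the sketch below computes trailing means over a daily series. It assumes that the window ends on the sampling day and that the aggregate is a mean; the text does not state either choice, and a sum may be more natural for precipitation. The helper name and the temperature values are hypothetical.

```python
def trailing_mean(daily_values, window):
    """Mean of the `window` most recent daily values ending at each day.
    Days without a full window yield None."""
    out = []
    for i in range(len(daily_values)):
        if i + 1 < window:
            out.append(None)
        else:
            chunk = daily_values[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out

# Hypothetical daily mean air temperatures (deg C) for seven consecutive days
tavg = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0]
tavg_3d = trailing_mean(tavg, 3)
tavg_7d = trailing_mean(tavg, 7)
```

The resulting series play the role of predictors such as tmax_3d or wspd_7d, which are attached to each in-situ record by date.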

3.3. Case-Specific Data Preparation

Data preparation for the Lake Simcoe case followed the general procedures outlined in Section 2 and was adapted to account for the lake’s specific optical and hydrological characteristics. All PlanetScope SR scenes were clipped using the Lake Simcoe boundary polygon obtained from the Land Information Ontario (LIO) Waterbody dataset. This ensured that subsequent modeling and mapping were restricted strictly to lake pixels, thereby preventing land reflectance contamination in littoral zones, which are abundant along the lake’s complex shoreline.

Temporal alignment between PlanetScope SR, meteorological observations, and in-situ sampling was a central requirement. Because Lake Simcoe undergoes rapid changes during spring melt, strict temporal-matching criteria (±1 day) were applied to minimize discrepancies caused by rapidly changing optical and hydrologic conditions. Spatially, samples located near river mouths—especially those of the Holland River and Black River—were evaluated to ensure that their surrounding SR values were not affected by adjacency effects or mixed land-water pixels. Where necessary, edge pixels were removed during preprocessing to maintain data integrity.
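One simple way to remove edge pixels is to erode the water mask by a small number of pixels. The sketch below is an assumption about how such a step could be implemented, not the procedure used in this study; the function name and the one-pixel buffer are illustrative.

```python
import numpy as np

def erode_water_mask(water, n_pixels=1):
    """Shrink a boolean water mask by n_pixels using 4-neighbour erosion,
    so pixels touching land (or the raster edge) are dropped."""
    mask = water.astype(bool)
    for _ in range(n_pixels):
        padded = np.pad(mask, 1, constant_values=False)
        mask = (
            padded[1:-1, 1:-1]
            & padded[:-2, 1:-1]   # neighbour above
            & padded[2:, 1:-1]    # neighbour below
            & padded[1:-1, :-2]   # neighbour to the left
            & padded[1:-1, 2:]    # neighbour to the right
        )
    return mask

# Toy raster: a 3 x 3 block of water surrounded by land
water = np.zeros((5, 5), dtype=bool)
water[1:4, 1:4] = True
interior = erode_water_mask(water)
```

Only the central pixel survives in this toy raster, because every other water pixel has at least one land neighbour.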

The final prepared dataset retained only high-quality, temporally synchronized, and spatially validated pixel-station pairs, serving as the foundation for model-training and distillation stages.

3.4. Application of the Modeling Framework to Lake Simcoe

The methodological framework described in Section 2 was operationalized for Lake Simcoe by training machine-learning and deep-learning models using the prepared multi-modal dataset (Figure 2). While the underlying algorithms, feature categories, and evaluation metrics follow the general design, their application in this case study reflects the unique spectral and hydrological characteristics of Lake Simcoe.

A comprehensive multi-model comparison was conducted using remote sensing only, remote sensing combined with meteorology, and the full RS, INDICES, DOY, MET, and SSD feature set. As expected, the inclusion of SSD substantially improved model accuracy, highlighting the importance of optical clarity indicators for phosphorus prediction. Because SSD is measured only at monitoring stations and is not spatially continuous across the lake, an SSD-dependent model cannot be applied pixel by pixel; a knowledge-distillation (KD) scheme was therefore employed. A high-capacity teacher model utilizing RS + INDICES + DOY + MET + SSD inputs was first trained, and its soft predictions were then used to guide the SSD-free student model. This approach enabled the student model to inherit domain knowledge about water clarity even when operating without SSD.
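For regression, one common way to let soft teacher predictions guide a student is to train the student on a blend of the measured label and the teacher output. The sketch below assumes a squared-error objective, for which minimizing (1 − α)(y − f)² + α(t − f)² is equivalent, up to a constant, to regressing on (1 − α)y + αt. The exact loss used in this study is defined in Section 2; the function name and the numbers here are hypothetical.

```python
import numpy as np

def distillation_target(y_true, teacher_pred, alpha):
    """Blend hard labels with soft teacher predictions.
    alpha = 0 ignores the teacher; alpha = 1 ignores the labels."""
    y_true = np.asarray(y_true, dtype=float)
    teacher_pred = np.asarray(teacher_pred, dtype=float)
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (1.0 - alpha) * y_true + alpha * teacher_pred

# Hypothetical PPUT labels and teacher predictions (ug/L)
y = [10.0, 20.0, 40.0]
teacher = [12.0, 18.0, 35.0]
target = distillation_target(y, teacher, alpha=0.2)
```

The student is then fitted to `target` using SSD-free features only, so the SSD information reaches it indirectly through the teacher's predictions.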

To adapt the framework to the Lake Simcoe setting, feature importance rankings were explicitly generated for this dataset. A Top-K feature selection analysis revealed that a compact set of forty features provided the best balance between model complexity and predictive performance. Using these Lake Simcoe-specific configurations, the final student model achieved an R² of 0.83 on held-out stations, demonstrating the robustness of the KD-enhanced workflow for practical deployment.

3.5. Lake-Wide PPUT Mapping for Lake Simcoe

Following model development, the SSD-free student model was applied to full-lake PlanetScope SR mosaics to produce spatially continuous PPUT maps at a 3–5 m resolution. All mosaics were screened with UDM2 masks and clipped to the Lake Simcoe polygon. Only pixels flagged as clear in the UDM2 clear band (band 1 = 1), which implicitly excludes pixels classified as cloud, cloud shadow, haze, or snow/ice in the other UDM2 classes, were retained. For each valid water pixel, the complete set of required input features, including spectral indices, seasonal attributes, and any available meteorological descriptors, was reconstructed using the same transformations employed during model training, ensuring methodological consistency. A single representative mid-summer PlanetScope scene (26 July 2024) was selected to characterize low external loading and thermally stratified lake conditions.
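The per-pixel feature reconstruction can be sketched as follows. The index formulas are common textbook forms (Excess Green, Excess Green minus Excess Red, a NIR/Red simple ratio, and a NIR/Blue normalized difference) and are assumptions here; the definitions actually used for ExG, ExGR, SR_NR, and ND_NB are those given in Section 2. The band order blue, green, red, NIR and the reflectance values are likewise illustrative.

```python
import numpy as np

def spectral_indices(blue, green, red, nir, clear):
    """Compute index rasters and set pixels that are not UDM2-clear to NaN."""
    eps = 1e-12  # guards against division by zero over very dark water
    indices = {
        "ExG": 2.0 * green - red - blue,                      # Excess Green
        "ExGR": (2.0 * green - red - blue) - (1.4 * red - green),
        "SR_NR": nir / (red + eps),                           # NIR / Red ratio
        "ND_NB": (nir - blue) / (nir + blue + eps),           # NIR vs Blue
    }
    keep = clear == 1  # UDM2 clear band: 1 means clear
    return {name: np.where(keep, arr, np.nan) for name, arr in indices.items()}

# Two hypothetical pixels: the first is clear, the second is flagged as not clear
blue = np.array([[0.04, 0.05]])
green = np.array([[0.06, 0.07]])
red = np.array([[0.03, 0.04]])
nir = np.array([[0.02, 0.03]])
clear = np.array([[1, 0]])
features = spectral_indices(blue, green, red, nir, clear)
```

Applying the same function at training time and at mapping time is what keeps the student's inputs consistent between station match-ups and lake-wide prediction.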

In summary, this case study operationalized the proposed workflow for Lake Simcoe by constructing the multi-source dataset, training the teacher–student KD models, and generating lake-wide PPUT predictions from a representative mid-summer PlanetScope mosaic. The quantitative evaluation across feature regimes and the resulting spatial patterns in the lake-wide map are presented in Section 4.

4. Results

4.1. Influence of Feature Groups and SSD Availability on Model Performance

The multi-model evaluation across eight feature-group regimes was designed to systematically examine how different categories of predictors influence the stability, accuracy, and generalizability of PPUT retrieval. Because phosphorus concentrations are governed by a mixture of hydrological, biogeochemical, and optical processes, the effectiveness of machine-learning models depends not only on algorithmic design but also on the observability of physically meaningful variables.

SSD represents a vertically integrated clarity metric that encodes water-column light attenuation and particulate loads, both of which are tightly linked to phosphorus dynamics in inland lakes. Table 2 presents a comparison of model-prediction error statistics (R², RMSE, MAE) across all models and feature groups, revealing a clear structural separation between SSD-aware and SSD-free regimes.

While the raw numerical contrast shows that mean R² increased from 0.6741 to 0.9364 when SSD was added, the broader implication is that SSD introduces a form of physical regularization into the prediction problem. Because SSD encapsulates information about water-column scattering and absorption processes strongly influenced by suspended sediments, algal biomass, and particulate phosphorus, including SSD provides a constraint that effectively reduces the model’s solution space. For completeness, we evaluated an SSD-only model as a station-scale reference. Using SSD as the sole predictor, the SSD-only models achieved test R² of approximately 0.80–0.85 (best ≈ 0.846), with RMSE ≈ 13.7–15.8 and MAE ≈ 7.44–7.81. While SSD alone provides a strong predictive signal for PPUT at monitoring stations, it is not deployable for lake-wide mapping because SSD is not available wall-to-wall; therefore, we report SSD-only results only as an information-content reference rather than a competing lake-wide baseline.
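The size of the SSD effect depends on which regime is taken as the baseline. A short worked calculation using only the two mean R² values quoted above makes the distinction explicit:

```latex
\Delta R^2 = 0.9364 - 0.6741 = 0.2623, \qquad
\frac{\Delta R^2}{0.6741} \approx 0.389 \ \text{(gain relative to the SSD-free mean)}, \qquad
\frac{\Delta R^2}{0.9364} \approx 0.280 \ \text{(drop relative to the SSD-aware mean)}
```

Adding SSD therefore raises mean R² by about 39%, whereas removing it lowers mean R² by about 28%.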

The high consistency of SSD-aware performance across the five models, which span bagging-style tree ensembles and gradient-boosting variants (including CatBoost), indicates that the improvement is not algorithm-dependent but arises from the biophysical relevance of SSD itself. The sharp decline in accuracy in SSD-free scenarios (mean R² falls from 0.9364 to 0.6741, a ≈28% relative drop, or equivalently a ≈39% relative gain when SSD is added) further highlights the difficulty of inferring vertical water clarity from surface-only features.

The boxplots in Figure 3 deepen this interpretation by demonstrating that SSD influences not only mean model accuracy but also the distribution of prediction errors. The substantial reduction in RMSE (from 0.49 to 0.22 µg/L) and MAE (from 0.33 to 0.16 µg/L) suggests that SSD reduces both bias and variance in the predictive outputs. This stabilizing effect stems from SSD’s ecological role: because phosphorus concentrations co-vary strongly with suspended particulate loads and algal biomass, SSD acts as a proxy for the processes that modulate phosphorus availability.

The fact that even minimal SSD-aware feature sets (e.g., SSD + Bands) outperform feature-rich SSD-free models indicates that no combination of spectral indices or meteorological variables can fully substitute for the depth-integrated information that SSD provides. SSD constrains predictions across heterogeneous optical conditions—such as nearshore vs. offshore waters—thereby preventing error inflation in areas where surface reflectance alone is ambiguous.

The R² heatmap (Figure 4) demonstrates that SSD’s influence is robust across model architectures with differing inductive biases. All models achieved R² ≥ 0.93 under SSD-aware regimes, demonstrating that SSD reduces the complexity of the prediction task to a level where algorithmic differences become nearly irrelevant. This convergence suggests that SSD effectively linearizes or simplifies the multidimensional mapping between observable features and PPUT, making the problem well-defined even for models that otherwise underperform in high-variance conditions (e.g., Random Forest).

In SSD-free settings, the wide performance spread (0.55–0.75 R²) reflects the inherent ambiguity of estimating phosphorus without clarity information. Here, models must rely on indirect proxies, reflectance-based indices or meteorological drivers, whose relationships to PPUT vary seasonally and spatially. The higher sensitivity to model architecture in this regime highlights the extent to which SSD provides structural information that the model would otherwise have to infer.

The analyses in Table 2, Figure 3, and Figure 4 indicate that SSD is the single most influential variable for PPUT prediction, not merely because it improves accuracy, but because it encodes the fundamental optical and particulate processes that shape phosphorus dynamics. SSD transforms the PPUT retrieval task from an underdetermined surface-reflectance inversion into a physically grounded, well-constrained prediction problem. The strong cross-model consistency under SSD-aware regimes and the pronounced instability of SSD-free models demonstrate that high-fidelity phosphorus estimation in inland lakes requires access to depth-integrated clarity information (either measured directly or approximated through advanced techniques such as KD). These insights motivate the hybrid AI framework developed in this study and lay the foundation for operational phosphorus monitoring at high spatial and temporal resolution.

4.2. Performance Gains from Feature Accumulation

To evaluate the contribution of additional feature classes, we examined prediction accuracy across eight progressively complex feature groups. For SSD-aware groups, adding Day-of-Year (DOY), remote-sensing indices, and meteorological features resulted in incremental improvements. Transitioning from SSD + Bands to SSD + Bands + DOY increased R² by approximately 0.01, and further inclusion of spectral indices and meteorological variables yielded diminishing yet consistent gains. This pattern suggests that SSD already encapsulates a large proportion of the optically relevant environmental variability, while auxiliary features provide refinement rather than fundamental explanatory power.

SSD-free models showed a more apparent benefit from incremental features. Raw spectral bands alone produced modest performance (mean R² ≈ 0.59), but incorporating temporal descriptors (Bands + DOY) increased accuracy to 0.70, and the addition of spectral indices (Bands + DOY + Indices) further increased R² to 0.75 (Figure 5). Inclusion of meteorological variables yielded marginal improvements (<0.03). These patterns indicate that derived indices more effectively capture water-color changes driven by suspended solids, CDOM, and biomass, which act as partial proxies for phosphorus dynamics.

4.3. Cross-Model Comparison and Algorithmic Behavior

As shown in Table 2, across all feature-group regimes, the comparative behaviour of the five single models (CatBoost, HistGradientBoosting, GradientBoosting, ExtraTrees, and RandomForest) revealed systematic and interpretable differences in predictive skill that align with their algorithmic properties and the underlying optical complexity of Lake Simcoe.

In the absence of SSD, model performance diverged substantially. CatBoost, HistGradientBoosting, and GradientBoosting consistently outperformed ExtraTrees and RandomForest, with gains of approximately 0.05–0.10 in R² and reductions of 0.03–0.06 in RMSE across the Bands, Bands + DOY, and Bands + DOY + Indices regimes.

This advantage reflects the ability of boosting-based methods to adaptively refine decision boundaries and capture second-order interactions among spectral indices, meteorological descriptors, and temporal variables—interactions that are important when the signal-to-noise ratio of surface reflectance is moderate and auxiliary depth-based clarity information is absent.

RandomForest, by comparison, exhibited the lowest accuracy among the five models in all SSD-free settings. This behaviour is consistent with the known limitations of bagging ensembles in high-dimensional regression, where subtle gradients rather than coarse class separations drive predictive performance. The model’s tendency to average over many weak learners, combined with its limited capacity to model fine-grained feature interactions, led to underfitting in regimes that rely heavily on spectral proxies for water clarity, phytoplankton concentration, and suspended solids.

The performance gap between model families shrank markedly once SSD was introduced. In all SSD-aware regimes—SSD + Bands, SSD + Bands + DOY, SSD + Bands + DOY + Indices, and SSD Full—differences in R² across the five models were reduced to within 0.01–0.02, with all models achieving R² > 0.93. This convergence reflects SSD’s dominant explanatory value as a vertically integrated optical clarity metric.

The inclusion of SSD effectively linearizes the remaining feature–response relationships, reduces the dependence on multi-order interactions, and stabilizes predictions across models with different inductive biases. Under these conditions, even RandomForest achieved R² values comparable to CatBoost and HistGradientBoosting, demonstrating that SSD acts as a strong stabilizing variable capable of diminishing algorithmic differences.

HistGradientBoosting and GradientBoosting emerged as the most consistently stable high performers across the full spectrum of feature groups. HistGradientBoosting offered near-optimal accuracy in both SSD-free and SSD-aware settings while maintaining relatively low computational cost. CatBoost showed similar strength, especially in SSD-aware regimes, benefiting from its native handling of categorical interactions and ordered boosting. ExtraTrees exhibited slightly higher variance across feature groups, an effect likely attributable to its randomized split-selection strategy, which introduces additional noise when feature importance is highly uneven.

The single-model comparison across the five ensemble learners (Figure 3, Table 2) shows that HistGradientBoosting (HistGBM) provides the best overall trade-off between accuracy and robustness. Under the SSD-inclusive feature regime, HistGBM attains the highest test R² and the lowest error among all candidates, while in SSD-free regimes it remains among the top performers and exhibits relatively small variability across feature groups.

RandomForest and ExtraTrees tend to plateau at slightly lower R², and CatBoost offers only marginal gains at substantially higher computational cost. Based on this consistent behaviour, we selected HistGBM as the backbone regressor in the teacher–student KD framework described in Section 2.5. In all KD experiments, both the teacher and the student therefore share the same HistGBM-based backbone and RidgeCV calibration head, differing only in their input feature sets (with vs. without SSD) and training objectives.

4.4. Feature Importance and Physical Interpretation

Table 3 lists the most influential predictors, namely short-term meteorological aggregates (e.g., tmax_3d, pres_3d, wspd_7d), NIR reflectance (band4_mean), and water-color indices (e.g., ExG, ExGR, SR_NR, ND_NB), together with their hypothesized physical relevance to PPUT dynamics in Lake Simcoe.

A category-level aggregation (Figure 6) further revealed that meteorological variables contributed the largest cumulative ΔR² (0.081), followed by remote-sensing indices (0.039), spectral bands (0.007), and temporal descriptors (0.000). This hierarchy is not merely a numerical ranking but reflects the underlying physical mechanisms governing phosphorus dynamics in Lake Simcoe.
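A category-level ΔR² of this kind can be obtained by summing per-feature permutation importances within each category. The sketch below illustrates the mechanics on purely synthetic data with a least-squares model standing in for the trained regressor; the three columns are arbitrarily labelled MET, INDICES, and BANDS, and none of the numbers correspond to the Lake Simcoe results.

```python
import numpy as np

def r2(y, pred):
    """Coefficient of determination."""
    ss_res = np.sum((y - pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def permutation_delta_r2(predict, X, y, rng):
    """Drop in R^2 when each column is shuffled in turn."""
    base = r2(y, predict(X))
    deltas = []
    for j in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])  # break the feature-target link
        deltas.append(base - r2(y, predict(Xp)))
    return np.array(deltas)

# Synthetic data: column 0 matters most, column 2 not at all
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(scale=0.1, size=400)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)   # stand-in for the fitted model
deltas = permutation_delta_r2(lambda A: A @ coef, X, y, rng)

# Hypothetical mapping of column indices to feature categories
categories = {"MET": [0], "INDICES": [1], "BANDS": [2]}
by_category = {name: float(deltas[cols].sum()) for name, cols in categories.items()}
```

Summing within categories rewards groups that contain many moderately useful features, which is worth keeping in mind when comparing a large meteorological block against a handful of raw bands.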

Meteorological drivers—particularly wind, temperature, and precipitation—directly regulate hydrodynamic mixing, sediment resuspension, tributary inflows, and catchment runoff, all of which act as primary pathways for phosphorus mobilization and redistribution. Consequently, short-term meteorological variability introduces real biogeochemical forcing that cannot be inferred from surface reflectance alone, explaining why this category produces the most significant marginal gain in model performance.

Remote-sensing indices formed the second most influential category because they capture higher-order spectral relationships—such as the balance between scattering and absorption—that are sensitive to turbidity, algal biomass, and CDOM. These processes, while closely linked to phosphorus availability, often manifest in the reflectance domain through nonlinear transformations rather than through individual raw bands. Thus, indices such as ExG, ND-based ratios, and greenness amplify subtle optical cues associated with P-related ecological states, enabling the model to resolve fine-scale gradients even when depth-dependent clarity information is absent.

In contrast, raw spectral bands provided only a modest ΔR² (0.007), consistent with the fact that unprocessed reflectance signals blend multiple optical constituents and are more susceptible to noise, illumination variability, and confounding factors such as glint or adjacency effects. Temporal descriptors contributed effectively zero incremental explanatory power because seasonality is implicitly encoded within the spectral and meteorological data themselves; once these categories are included, DOY adds little new information.

Together, these findings clarify why the model derives the greatest benefit from meteorological and index-based predictors: one category captures the physical drivers of phosphorus transport, while the other captures the optical manifestations of its ecological consequences. The synergy between these two categories underpins the hybrid AI framework’s strong performance.

4.5. KD and Optimal Feature Subset

The K-sweep analysis (Figure 7) reveals a typical diminishing-returns profile for the SSD-free student model. Accuracy increases steeply when the number of retained features grows from K = 10 to K ≈ 20, and then enters a broad performance plateau between K ≈ 32 and K = 40. Within this plateau, individual K values exhibit small oscillations due to sampling variability in the station-grouped cross-validation.

For example, K = 32 and K = 40 achieve similar validation accuracy (K = 32: R² = 0.8306, RMSE = 9.853 µg/L; K = 40: R² = 0.8318, RMSE = 9.818 µg/L), whereas K = 33 shows a local dip (R² = 0.73, RMSE = 12.49 µg/L) associated with a less favourable feature subset. The global optimum occurs at K = 40, which attains the highest cross-validated R² (0.8318) and the lowest RMSE (9.82 µg/L) among all tested K values. For K > 40, both R² and RMSE systematically deteriorate (e.g., K = 50 yields R² ≈ 0.71–0.75 and RMSE ≈ 12.9–17.6 µg/L), indicating overfitting and redundancy when too many features are retained.
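The Top-K sweep itself reduces to scoring successive prefixes of a ranked feature list and keeping the best one. The sketch below shows that loop; the feature names are placeholders, and the evaluator simply looks up the three R² values reported above instead of retraining a model, so it illustrates the selection logic only.

```python
def k_sweep(ranked_features, evaluate, k_values):
    """Score the top-K prefix of a ranked feature list for each K and
    return (best_k, scores)."""
    scores = {k: evaluate(ranked_features[:k]) for k in k_values}
    best_k = max(scores, key=scores.get)
    return best_k, scores

# Placeholder feature names, ordered from most to least important
ranked = [f"feature_{i}" for i in range(40)]

# Cross-validated R^2 values quoted in the text for K = 32, 33, and 40
reported_r2 = {32: 0.8306, 33: 0.73, 40: 0.8318}

best_k, scores = k_sweep(
    ranked,
    evaluate=lambda feats: reported_r2[len(feats)],  # stand-in for grouped CV
    k_values=[32, 33, 40],
)
```

In practice `evaluate` would run the station-grouped cross-validation for the given subset, and the sweep would cover every candidate K.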

We therefore select K = 40 as the final configuration, representing the upper end of the stable plateau and a sparse yet expressive 40-feature subset for the SSD-free student model. Figure 8 confirms the feasibility of deploying SSD-free models for operational, lake-wide nutrient monitoring with reasonable accuracy. For the K = 40 feature subset, we performed a grid search over α ∈ {0.0, 0.1, …, 0.9} using station-grouped 5-fold cross-validation (Group K-Fold) within the training set. The value α = 0.2 yielded the highest mean cross-validated R² (0.676) and the lowest RMSE among all tested values, and was therefore adopted for all reported student-model results.

To further evaluate model performance across different PPUT concentration ranges, we conducted a stratified analysis by dividing the dataset into concentration bins (Table 4). Because R² is sensitive to the variance of the dependent variable and can yield misleading values when applied to restricted-range data, we instead used RMSE, MAE, and MAPE (Mean Absolute Percentage Error) to assess model accuracy within each concentration range.
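The stratified metrics can be computed as in the following sketch, which uses the bin limits reported for this analysis (<25, 25–50, and ≥50 µg/L). The function name and the four example values are hypothetical, and MAPE is only meaningful here because measured concentrations are strictly positive.

```python
import numpy as np

EDGES = [25.0, 50.0]                 # bin limits in ug/L
LABELS = ["<25", "25-50", ">=50"]

def binned_errors(y_true, y_pred):
    """RMSE, MAE, and MAPE (%) within each concentration bin of y_true."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    bins = np.digitize(y_true, EDGES)  # 0: <25, 1: 25 to <50, 2: >=50
    out = {}
    for b, label in enumerate(LABELS):
        sel = bins == b
        if not sel.any():
            continue                   # skip empty bins
        err = y_pred[sel] - y_true[sel]
        out[label] = {
            "n": int(sel.sum()),
            "RMSE": float(np.sqrt(np.mean(err ** 2))),
            "MAE": float(np.mean(np.abs(err))),
            "MAPE": float(100.0 * np.mean(np.abs(err) / y_true[sel])),
        }
    return out

# Hypothetical measured and predicted PPUT values (ug/L)
stats = binned_errors([10.0, 20.0, 30.0, 60.0], [12.0, 18.0, 36.0, 54.0])
```

Reporting the sample count alongside each bin helps flag ranges where the metrics rest on very few observations.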

In the low-concentration range (<25 µg/L, comprising 75.6% of all samples), the model achieved RMSE values of 2.81 µg/L (training) and 6.29 µg/L (testing), with corresponding MAPE values of 14.0% and 35.6%, respectively. These modest error magnitudes indicate acceptable predictive performance in the concentration range most frequently observed in Lake Simcoe. The high-concentration range (≥50 µg/L) showed RMSE values of 15.38 µg/L (training) and 14.08 µg/L (testing), with MAPE values of 11.5% and 14.3%, demonstrating consistent relative accuracy despite the larger absolute errors inherent to higher concentration values. The middle range (25–50 µg/L) exhibited higher variability in error metrics, with testing RMSE of 24.19 µg/L and MAPE of 52.2%, likely due to the limited sample size (23 in training, 7 in testing) and greater uncertainty in this transitional concentration zone.

4.6. Lake-Wide High-Resolution PPUT Prediction

Figure 9 presents a representative mid-summer snapshot of lake-wide PPUT predictions for 26 July 2024 derived from the SSD-free student model. Land and cloud pixels are masked out by the filtering step. The map shows spatially coherent fields at 3 m resolution, with a scene-wide mean concentration of 7.57 µg/L and a predicted range from 5.00 to 68.25 µg/L. The pelagic surface is dominated by low to moderate concentrations forming a broad mesotrophic background, while distinct high-value patches emerge along the shoreline, in narrow channels, and at tributary confluences.

The statistical distribution confirms that most of the lake surface remains in a relatively low-concentration regime. Median and upper-quartile values are p₅₀ = 7.15 µg/L and p₇₅ = 8.22 µg/L, with the 90th and 95th percentiles reaching 8.97 and 9.49 µg/L, respectively. Across all water pixels (n ≈ 7.89 × 10⁷), 97.40% fall within the 5–10 µg/L bin, and only 1.88% and 0.19% lie in the 10–20 µg/L and 20–30 µg/L ranges. Pixels exceeding 30 µg/L account for just 0.53% of the lake area, yet they represent ecologically critical hotspots where particulate phosphorus loading and/or resuspension are strongly enhanced.
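Distribution summaries of this kind follow directly from the predicted raster. The sketch below computes percentiles and the share of valid pixels per concentration bin, using half-open intervals so that every value at or above 5 µg/L falls in exactly one bin. The function name and the toy array are hypothetical and do not reproduce the Lake Simcoe map.

```python
import numpy as np

# Concentration bins in ug/L, each interpreted as [low, high)
BINS = {
    "5-10": (5.0, 10.0),
    "10-20": (10.0, 20.0),
    "20-30": (20.0, 30.0),
    ">=30": (30.0, np.inf),
}

def summarize_field(pput):
    """Percentiles and per-bin pixel shares (%) over valid (finite) pixels."""
    valid = pput[np.isfinite(pput)]    # masked pixels are stored as NaN
    percentiles = {p: float(np.percentile(valid, p)) for p in (50, 75, 90, 95)}
    shares = {
        label: float(100.0 * np.mean((valid >= lo) & (valid < hi)))
        for label, (lo, hi) in BINS.items()
    }
    return percentiles, shares

# Toy prediction raster with one masked pixel
pput = np.array([[6.0, 7.0, 8.0, 9.0],
                 [12.0, 25.0, 40.0, np.nan]])
percentiles, shares = summarize_field(pput)
```

Because the shares are computed over valid pixels only, they sum to 100% regardless of how much of the scene is masked.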

At the basin scale, quadrant-wise averages indicate relatively modest cross-lake contrasts: the northeast and northwest quadrants show similar means (7.42 and 7.41 µg/L), whereas the southeast and southwest quadrants are slightly elevated (7.78 and 7.70 µg/L), consistent with the influence of significant inflows and shallow embayments in the southern basins. Overall, the 26 July snapshot demonstrates that the distilled student model is capable of generating physically plausible, spatially detailed PPUT fields: the lake interior is characterized mainly by low to moderate concentrations, while high-PPUT waters are confined to structurally and hydrologically meaningful nearshore and tributary-influenced zones.

5. Discussion

The experimental results obtained in this study provide an opportunity not only to benchmark predictive performance but also to understand the physical, optical, and algorithmic mechanisms that govern phosphorus retrieval from high-resolution satellite data. Rather than viewing the models as black boxes, we interpreted their behavior in the context of Lake Simcoe’s bio-optical regime, the contrasting roles of surface reflectance and SSD, and the influence of short-term meteorological forcing. By jointly analyzing multi-model feature groups, permutation-based importances, K-sweep dimensionality patterns, and teacher-student knowledge-distillation behaviour, we can link quantitative metrics such as R², RMSE, and MAE to underlying processes such as light attenuation, sediment resuspension, watershed inputs, and stratification dynamics.

The following subsections synthesize these lines of evidence, with a focus on (i) explaining SSD’s dominant predictive role, (ii) assessing the extent to which SSD-free monitoring is feasible through distillation, (iii) clarifying how feature engineering and model class shape performance, and (iv) discussing the implications, limitations, and broader applicability of the proposed PlanetScope-based AI framework.

5.1. Mechanisms Underlying SSD’s Dominant Predictive Role

The experimental findings confirm that SSD carries unique optical and biogeochemical information that is seldom recoverable from multispectral satellite data alone. SSD integrates the effects of light attenuation through the water column, which, in inland waters, is controlled primarily by suspended particulate matter, chromophoric dissolved organic matter (CDOM), and phytoplankton biomass. Each of these constituents is directly or indirectly linked to phosphorus loading: particulate phosphorus is adsorbed onto mineral particles and organic detritus, which together dominate scattering, whereas dissolved phosphorus often co-varies with CDOM originating from terrestrial runoff. Consequently, SSD provides a vertically integrated proxy for both particulate and dissolved phosphorus pools. PlanetScope’s spectral bands, by contrast, capture only surface reflectance, with the effective water-leaving signal typically limited to the top 3 m in such inland water environments [51].

This depth integration is crucial in Lake Simcoe, where internal loading, sediment resuspension, and near-bed processes contribute substantially to the total phosphorus budget. Satellite reflectance cannot fully resolve these processes because the optical signal is dominated by the upper layer. In contrast, SSD effectively “compresses” the combined effects of particle backscattering, CDOM absorption, and pigment absorption into a single, observable variable. The considerable R² improvement when SSD is added (on the order of +0.22–0.26, depending on model family and feature group) therefore reflects fundamental observational physics rather than algorithmic artifacts. The inability of raw spectral bands or indices to compensate for SSD suggests that no combination of band ratios, temporal descriptors, or short-term meteorology can fully substitute for vertically integrated clarity metrics in optically complex lakes.

From an information-content perspective, the residual performance gap between SSD-aware and SSD-free configurations can be interpreted as the portion of the variance attributable to truly unobservable vertical structure. Even ensemble methods show only marginal gains once SSD is included, indicating that the remaining error is not dominated by model inadequacy but by the inherent observational limits of 4-band surface reflectance. In this sense, SSD acts as a rate-limiting information source for PPUT prediction, transforming an underdetermined inversion problem into a well-constrained regression task.

5.2. Implications for Scalable, SSD-Free Monitoring

The superior performance of the SSD-aware teacher model underscores the importance of strategic in-situ SSD sampling for model calibration. At the same time, the strongly performing student model (R² ≈ 0.83, RMSE = 9.82 µg/L) demonstrates that KD provides a feasible bridge between accuracy and scalability. By training the student to reproduce both hard PPUT labels and soft teacher outputs, the framework allows the teacher-learned SSD-driven structure to be embedded in a reduced feature space consisting solely of remote-sensing and meteorological variables.

The retention of approximately 88% of the teacher’s explanatory power indicates that the student model internalizes key relationships between clarity, optical signatures, and hydrometeorological forcing, effectively mimicking SSD’s predictive role without requiring SSD as an explicit input.
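The retention figure can be checked approximately from values reported in Section 4. The teacher's own R² is not restated here, so the calculation below assumes it is close to the mean SSD-aware R² of 0.9364 and uses the student's cross-validated R² of 0.8318:

```latex
\frac{R^2_{\text{student}}}{R^2_{\text{teacher}}} \approx \frac{0.8318}{0.9364} \approx 0.888
```

Under that assumption the ratio is consistent with the stated retention of roughly 88%.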

This behavior can be viewed through the lens of an information bottleneck: the teacher model uses SSD to learn a high-dimensional manifold linking depth-integrated optical conditions to PPUT, while the student approximates this manifold using a lower-dimensional set of observable variables. The chosen distillation configuration (with a relatively modest teacher weight in the loss) encourages the student to balance fidelity to the measured PPUT with alignment with the teacher’s smoother decision boundaries. As a result, the student benefits from the teacher’s physics-informed structure yet avoids overfitting to noise in the limited in-situ dataset.

The error increase associated with the student model (e.g., its MAE is ~1.5 µg/L higher than the teacher’s) is non-negligible for fine-grained regulatory assessment, but it remains acceptable for regional surveillance, long-term trend analysis, or early-warning applications. In particular, the moderate MAPE of the student model in the dominant low-concentration range (below 36%) indicates that it provides operationally useful predictions for water-quality monitoring and management across the lake’s typical concentration spectrum, and that its accuracy is sufficient to distinguish broad trophic classes, whose thresholds are typically separated by >20 µg/L. Thus, the distillation framework delineates a practical division of labour: teacher models anchored by SSD are reserved for high-stakes calibration and validation, whereas SSD-free student models support routine, spatially extensive monitoring across larger lake networks where dense in-situ sampling is economically or logistically infeasible.

5.3. Feature Engineering, Physical Interpretability, and Dimensionality Trade-Off

The superior performance of derived spectral indices over raw bands supports the integration of domain knowledge in feature engineering. Ratio- and normalized-difference-based indices mitigate illumination, atmospheric, and viewing-geometry effects, and they more directly express the balance between scattering and absorption that underpins water-column optical behavior. Indices such as ExGR, ND-based ratios, and SR_NR emphasize differences between green, red, and NIR reflectance that are sensitive to phytoplankton, suspended sediments, and CDOM. Their prominence in the permutation-based importance rankings indicates that the model relies on these transformations to extract physically meaningful signals that raw bands only weakly encode.

Feature-level rankings show that spectral transformations (e.g., Excess Green, NIR/Red ratios, and normalized-difference ratios) consistently appear as a second tier of influential predictors after the dominant meteorological drivers. This pattern is consistent with indices amplifying the nonlinear balance between scattering and absorption across the visible–NIR region, thereby isolating optical signatures of suspended particles, phytoplankton biomass, and CDOM that are only weakly expressed in individual raw bands.

The feature-importance hierarchy indicates a complementary division of information content: meteorological variables encode the short-term forcing that redistributes phosphorus (runoff pulses, mixing, and resuspension), while reflectance-based indices capture the optical manifestation of these processes in the surface layer. This coupling helps explain the strong performance of the hybrid predictor set and supports the physical interpretability of the learned relationships.

At the category level, meteorological variables emerged as the strongest contributors to ΔR², ahead of remote-sensing indices, with spectral bands and temporal descriptors playing secondary roles. This ordering is physically consistent: short-term temperature, wind, and pressure patterns govern stratification, resuspension, and runoff, which in turn mobilize and redistribute phosphorus. Spectral indices then capture the optical manifestations of these processes through changes in turbidity, pigment concentration, and water colour. The relatively small incremental benefit of stand-alone bands and DOY implies that once physically meaningful meteorology and indices are present, additional raw reflectance or calendar information adds little unique explanatory power.
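A category-level ΔR² of this kind can be obtained by permuting all columns of a feature group jointly and recording the resulting drop in R². The sketch below is a generic, standard-library illustration of that idea and is not the authors’ implementation; the toy model, the synthetic data, and the group-to-column mapping are all assumptions.

```python
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def group_permutation_importance(predict, X, y, groups, n_repeats=20, seed=0):
    """Mean drop in R^2 when all columns of a group are shuffled together."""
    rng = random.Random(seed)
    baseline = r2_score(y, [predict(row) for row in X])
    importance = {}
    for name, cols in groups.items():
        drops = []
        for _ in range(n_repeats):
            order = list(range(len(X)))
            rng.shuffle(order)
            X_perm = [list(row) for row in X]
            for i, j in enumerate(order):
                for c in cols:            # same row permutation for every column
                    X_perm[i][c] = X[j][c]
            permuted = r2_score(y, [predict(row) for row in X_perm])
            drops.append(baseline - permuted)
        importance[name] = sum(drops) / n_repeats
    return importance


def toy_model(row):
    """Hypothetical fitted model: strong on column 0, weak on column 1."""
    return 2.0 * row[0] + 0.5 * row[1]


data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(300)]
y = [toy_model(row) + data_rng.gauss(0, 0.1) for row in X]

# Hypothetical mapping of feature categories to column indices.
groups = {"MET": [0], "INDICES": [1], "DOY": [2]}
delta_r2 = group_permutation_importance(toy_model, X, y, groups)
```

In this toy setting the group that the model depends on most shows the largest ΔR², and a group the model ignores shows none.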

The K-sweep analysis provides a complementary perspective on feature-space complexity. Accuracy increased rapidly as K rose from 10 to about 20, reflecting the addition of genuinely informative features that capture independent aspects of phosphorus dynamics. Between K = 20 and K = 40, performance gains became more modest, indicating a regime of diminishing returns where newly added predictors were increasingly correlated with variants of existing ones.

The optimal configuration at K = 40 corresponds to a point at which the model retains sufficient feature diversity to approximate the teacher’s manifold while avoiding over-parameterization. Beyond K > 50, performance degrades, consistent with overfitting in a setting with modest sample size and many highly collinear features. These patterns support the use of carefully curated, physically interpretable feature subsets in operational scenarios, rather than indiscriminately including all available predictors.
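The K-sweep logic can be sketched as a loop over nested top-K feature subsets. The example below is illustrative only: `evaluate` stands in for a cross-validated score of the student model, and the synthetic score curve (saturating gain minus a linear complexity penalty) is an assumption chosen to reproduce the qualitative rise, plateau, and decline. Its optimum is not the study’s.

```python
import math


def k_sweep(ranked_features, evaluate, k_values):
    """Score the top-K features for each K; return (best_k, scores)."""
    scores = {}
    for k in k_values:
        scores[k] = evaluate(ranked_features[:k])
    best_k = max(scores, key=scores.get)
    return best_k, scores


def toy_evaluate(subset):
    """Synthetic stand-in for a validation R^2 as a function of subset size."""
    k = len(subset)
    return 1.0 - math.exp(-k / 10.0) - 0.004 * k


# Hypothetical feature names, assumed to be sorted by importance already.
ranked = [f"feature_{i:02d}" for i in range(70)]
best_k, scores = k_sweep(ranked, toy_evaluate, k_values=range(10, 71, 10))
```

In practice `evaluate` would refit the model on each subset with the same cross-validation folds, so that differences between K values reflect the features rather than the data split.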

5.4. Cross-Model Implications and Underlying Machine Learning Mechanisms

The multi-model comparisons highlight several broader implications for the use of machine learning in inland water-quality retrieval [52]. First, the stark performance improvement produced by SSD across all algorithms confirms that vertically integrated optical information fundamentally constrains the phosphorus estimation problem. In SSD-free regimes, models must infer water-column clarity and nutrient status from a combination of surface reflectance and meteorological context—an inherently high-variance task, particularly in morphometrically complex systems like Lake Simcoe. In this context, boosting-based methods outperform bagging because they iteratively refine decision boundaries to capture weak, nonlinear interactions between spectral indices and short-term atmospheric forcing. Bagging methods, by averaging over many decorrelated trees, tend to underfit subtle structures that are essential for distinguishing overlapping optical states with different phosphorus signatures.

Second, the convergence of all models in SSD-aware regimes, with R² values tightly clustered above 0.93, indicates that when a strong physical predictor is available, model architecture becomes significantly less important. Once SSD is present, even comparatively simple ensembles approach the performance of more sophisticated boosting methods, and additional architectural complexity yields negligible performance gains. This pattern suggests that the primary bottleneck in PPUT retrieval lies in the observability of key state variables rather than in the representational capacity of modern machine-learning models. It also reinforces the idea that improvements in auxiliary data streams—such as routine SSD sampling or other clarity proxies—may deliver greater benefits than further algorithmic tuning.

Third, the success of the distilled student model demonstrates how teacher-student frameworks can encode biophysical relationships, such as depth-dependent attenuation and particulate scattering, into a compressed feature space. The teacher effectively learns a high-dimensional representation of optical-biogeochemical structure underpinned by SSD, while the student approximates this representation using only remote-sensing and meteorological variables. This process resembles manifold learning, in which the student is constrained to follow the teacher’s learned decision surface rather than exploring spurious correlations in the SSD-free feature space.

Finally, the alignment between feature-importance patterns and established limnological processes (e.g., the influence of wind-driven resuspension, storm-driven inflows, and algal growth on phosphorus distribution) suggests that the models are learning mechanistic relationships rather than arbitrary statistical associations. This enhances confidence in the scientific interpretability and robustness of the proposed AI framework.

5.5. Comparison with Previous TP Retrieval Studies and Rationale for Backbone Selection

Previous work has made substantial progress in satellite-based retrieval of total phosphorus, but most studies face one or more constraints related to spatial resolution, feature completeness, or model deployment ability. Xiong et al. developed a remote-sensing algorithm for TP in eutrophic lakes using MODIS FAI-type indices combined with both conventional and machine-learning models, achieving R² ≈ 0.60 for a Taihu-specific model and R² ≈ 0.64 (RMSE ≈ 0.06 mg·L⁻¹) for a generalized multi-lake algorithm [5,53]. This work is valuable because it systematically compares semi-analytical and data-driven approaches and explicitly addresses generalization across lakes. However, the coarse 250 m MODIS resolution limits its ability to resolve nearshore gradients and tributary plumes, and the feature set is largely restricted to surface reflectance and FAI-type indices without vertically integrated clarity metrics or short-term meteorological drivers.

Qiao et al. used Landsat-8 imagery for the Miyun Reservoir and conducted a comprehensive comparison of twelve machine-learning algorithms, showing that Extra Trees (ETRs) yielded the best TP retrieval performance with R² > 0.85 and very low MAE on a single-reservoir dataset [53]. Their study is a strong benchmark for algorithmic comparison under medium spatial resolution (30 m), but the feature space is dominated by spectral bands and simple indices. The model is site-specific and operates at a scale where many littoral processes remain subpixel, and there is no explicit treatment of auxiliary vertically integrated indicators such as SSD or of temporal meteorological context.

Several recent studies have adopted gradient-boosting ensembles and Sentinel-class imagery to improve TP retrievals at regional scales. Wang et al. used Sentinel-3/OLCI images across the Yangtze–Huaihe lake region and found that an XGBoost-based model outperformed empirical approaches but still achieved only moderate accuracy (R² ≈ 0.53, RMSE ≈ 0.08 mg·L⁻¹) when generalized across many lakes [54]. Cui et al. optimized an XGBoost model for TP retrieval in Taihu Lake using Sentinel-2 and a carefully selected feature combination, achieving R² ≈ 0.72 with improved stability relative to using all variables [40]. Lin et al. further advanced this line of work by developing an interpretable LightGBM-based model that reconstructs long-term TP dynamics (2005–2024) in Lake Taihu, emphasizing explainability and driver attribution, yet still within a single large lake and at 10–30 m resolution [55]. These studies collectively demonstrate that boosted tree ensembles (XGBoost, LightGBM and related methods) are consistently among the top performers for TP retrieval, and that adding carefully engineered indices improves robustness. However, they generally operate at medium resolution, rarely integrate in-situ clarity proxies such as SSD into the retrieval model, and do not address how to deploy a “high-information” model when such auxiliaries are absent.

Other recent work has begun to incorporate meteorological variables into TP or nutrient modelling. For example, Li et al. proposed a multimodal framework that combined satellite data and meteorological forcings to estimate several water-quality parameters, including TP, obtaining R² ≈ 0.50 for TP at the regional scale [56], while Qin et al. showed that air temperature is a key driver for TP and TN variability in northeastern lakes when combined with Sentinel-2 and machine-learning methods [57]. These studies highlight the importance of meteorological context but typically treat meteorological features as simple covariates; they neither integrate vertically integrated optical measures (SSD) nor explore how such rich feature spaces can be transferred to operational settings where some inputs are missing.

Against this backdrop, the present study differs from and extends prior work in three main ways. First, it exploits multi-generation PlanetScope imagery (3–5 m) to explicitly resolve nearshore and tributary structures that are unresolved by MODIS, OLCI, or even Sentinel-2 in many lakes. This enables detection and mapping of narrow plume-like phosphorus hotspots and embayment gradients that previous medium-resolution studies could only infer indirectly or at coarse scales.

Second, the feature space explicitly integrates a vertically integrated clarity metric (SSD), short-term meteorological descriptors (3–7 day aggregates of temperature, wind speed, precipitation, and pressure), and physically informed spectral indices. This design directly addresses two major gaps identified in the TP retrieval literature: the lack of water-column integrated optical information and the limited incorporation of short-term hydrometeorological drivers.

Third, the use of a teacher–student knowledge-distillation framework allows the high-accuracy, SSD-informed teacher to be “compressed” into an SSD-free student that can operate at any pixel where only remote-sensing and meteorological variables are available, thereby resolving the common operational dilemma of sparse in-situ data.

The choice of HistGradientBoosting as the backbone model for both teacher and student is also grounded in and consistent with the broader literature. Gradient-boosting ensembles (including XGBoost, LightGBM, and related variants) have repeatedly emerged as top performers in TP and nutrient retrieval tasks across lakes and regions, often outperforming random forests and support-vector regressors because they can capture complex, non-linear interactions while remaining relatively robust on tabular feature sets [57,58,59].

In this study, an initial benchmark across five ensemble learners (HistGBM, CatBoost, RandomForest, ExtraTrees, and GradientBoosting) and eight feature regimes showed that HistGradientBoosting systematically offered the best or near-best trade-off between accuracy, stability across feature groups, and computational efficiency. On this basis, the hybrid two-stage model used for knowledge distillation employs a HistGradientBoostingRegressor as the base learner, augmented by a standardized Ridge regression “linear head” trained on out-of-fold residuals and teacher predictions. This configuration benefits from the strong non-linear fitting capacity of boosting while allowing a lightweight linear correction layer to absorb residual structure and KD signals, providing a good balance between accuracy, stability, and interpretability for both the SSD-informed teacher and the SSD-free student.

By explicitly positioning the proposed framework relative to these earlier studies, the contribution of this work can be summarized as follows: it brings high-resolution (3–5 m) TP mapping into an optically complex lake, leverages both vertically integrated clarity and meteorological forcing in a unified feature space, and introduces a knowledge-distillation mechanism that preserves most of the teacher’s accuracy in an operationally scalable, SSD-free student. In doing so, it addresses several key limitations of previous TP retrieval studies—limited spatial resolution, incomplete driver sets, and non-deployable model architectures—while remaining consistent with the demonstrated strengths of gradient-boosting ensembles for water-quality retrieval.

5.6. Limitations and Future Research

Despite the encouraging performance of both teacher and student models, several limitations warrant careful consideration. The first constraint concerns optical water type dependence. The current models are calibrated primarily on mesotrophic to eutrophic conditions in Lake Simcoe, where PPUT ranges from several to tens of µg/L. In oligotrophic lakes with very low phosphorus and correspondingly weak optical signals, multispectral sensors may lack the sensitivity necessary to resolve phosphorus-related variability, and the learned relationships may not transfer without recalibration or the addition of hyperspectral data.

Second, temporal resolution and timing impose constraints. Although PlanetScope provides near-daily nominal revisit, effective temporal coverage is reduced by cloud contamination, data gaps, and mismatches with in-situ sampling times. Short-lived events such as storm-induced turbidity spikes, rapid tributary pulses, or internal waves can therefore be underrepresented or entirely missed in the satellite record. Integrating PlanetScope with missions such as Sentinel-2 and Landsat-8/9 could mitigate some of these limitations by providing a denser, multi-sensor time series, but would introduce additional challenges in cross-sensor harmonization.

Third, the framework is subject to vertical ambiguity. The models predict surface or near-surface PPUT-like conditions and are largely insensitive to deep-water phosphorus accumulation below the optical penetration depth. In strongly stratified periods, deep hypolimnetic phosphorus enrichment may therefore remain undetected, requiring complementary profiling or moored sensor data for comprehensive assessment.

Fourth, spatial heterogeneity in nearshore zones remains a challenge. The use of pixel aggregation and limited spatial predictors can smooth sharp gradients at the land-water interface, where human activities, shoreline morphology, and local hydrodynamics can create strong PPUT contrasts over short distances. Incorporating explicit spatial descriptors—such as distance to shore, bathymetric depth, or proximity to tributary mouths—may help improve model performance in these transition regions.

Fifth, these comparisons indicate that feature observability is the primary determinant of performance, and algorithmic differences are most consequential only when SSD is unavailable. For lake-wide inference over large PlanetScope mosaics, HistGradientBoosting provides a pragmatic balance of predictive accuracy, computational efficiency, and robustness, while the SSD-aware regime mainly serves as a high-fidelity calibration and validation benchmark.

Finally, the knowledge-distillation framework introduces its own form of uncertainty propagation. Because student models inherit part of the teacher’s structure, any systematic biases or unresolved uncertainties in the teacher will be partially transferred. Furthermore, distillation tends to smooth extreme values, which may modestly reduce sensitivity to rare but ecologically significant high-phosphorus events. Future work could explore confidence-aware distillation, Bayesian teacher-student architectures, and explicit uncertainty quantification to better characterize and propagate predictive uncertainty.

Also, the distillation weight α was kept constant throughout training, consistent with standard KD formulations. Future work could explore adaptive or schedule-based α schemes (e.g., annealing α across epochs or modulating it as a function of student confidence), which may further enhance the balance between teacher guidance and data-driven learning.
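The role of α can be made concrete with a small sketch. The exact loss used in this study is not restated in this section, so the form below is an assumption: a convex combination of a data term (student versus observations) and a teacher term (student versus teacher predictions). The linear annealing schedule and its start and end values are likewise hypothetical, included only to illustrate the schedule-based option mentioned above.

```python
def kd_loss(y_true, y_teacher, y_student, alpha):
    """(1 - alpha) * MSE(student, truth) + alpha * MSE(student, teacher)."""
    n = len(y_true)
    data_term = sum((y - s) ** 2 for y, s in zip(y_true, y_student)) / n
    teacher_term = sum((t - s) ** 2 for t, s in zip(y_teacher, y_student)) / n
    return (1.0 - alpha) * data_term + alpha * teacher_term


def linear_alpha(step, total_steps, alpha_start=0.8, alpha_end=0.2):
    """Hypothetical schedule: move alpha linearly from start to end."""
    frac = step / max(total_steps - 1, 1)
    return alpha_start + frac * (alpha_end - alpha_start)


# Toy values in µg/L; the student here reproduces the teacher exactly.
y_true = [10.0, 20.0]
y_teacher = [12.0, 18.0]
y_student = [12.0, 18.0]

loss_constant = kd_loss(y_true, y_teacher, y_student, alpha=0.25)
schedule = [linear_alpha(step, 5) for step in range(5)]
```

With a constant α the relative weight of teacher guidance is fixed, whereas a decreasing schedule would shift emphasis from the teacher toward the observations as training proceeds.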

5.7. Broader Implications

The integration of PlanetScope’s high-resolution multispectral imagery with machine learning and KD techniques has substantial implications for the future of inland water-quality monitoring. By demonstrating that lake-wide phosphorus distributions can be reconstructed at 3–5 m spatial resolution without reliance on dense auxiliary in-situ measurements such as SSD, the proposed framework addresses one of the most persistent gaps in operational monitoring programs: limited spatial coverage. The ability to generate stable, high-frequency phosphorus estimates from commercial satellite constellations enables a transition from episodic, labour-intensive sampling campaigns to a more proactive, spatially continuous monitoring paradigm.

In practice, high-resolution PPUT predictions improve situational awareness for lake managers. Fine-scale maps enhance the detection of nearshore nutrient hotspots, tributary plume dynamics, and localized resuspension events, features often missed by medium-resolution satellite products or sparse station networks. The temporal density of PlanetScope acquisitions also supports earlier detection of emerging problems, which can aid in mitigating eutrophication episodes before they fully develop. Such capabilities are particularly valuable for targeting field campaigns toward critical periods and locations, optimizing limited monitoring resources, and refining basin-wide nutrient budgets that account for spatial heterogeneity in loading and retention.

The lake-wide map for 26 July 2024 further illustrates how such SSD-free predictions can be used in practice to support spatially targeted management. Although nearly 97% of surface pixels fall within a relatively low 5–10 µg/L range, high-concentration waters form compact yet coherent clusters aligned with tributary mouths, narrow channels, and sheltered embayments. This pattern indicates that a small fraction of the lake area exerts a disproportionate influence on phosphorus loading and ecological risk. From a management perspective, these hotspot polygons provide natural candidates for intensified in-situ sampling, source apportionment studies, and land-use interventions in the corresponding sub-catchments.
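Area shares of this kind follow directly from counting pixels in a predicted PPUT raster. The sketch below assumes the raster has been flattened to a list of values in µg/L with masked (land or cloud) pixels stored as `None`. The band limits and the hotspot threshold are passed as parameters, and the toy values are synthetic, not the study’s map.

```python
def concentration_shares(values, low=5.0, high=10.0, hotspot=20.0):
    """Fraction of valid pixels inside [low, high] and above the hotspot threshold."""
    valid = [v for v in values if v is not None]
    n = len(valid)
    in_band = sum(1 for v in valid if low <= v <= high)
    hot = sum(1 for v in valid if v > hotspot)
    return {"background_share": in_band / n, "hotspot_share": hot / n}


# Synthetic example: mostly low values, a few elevated and a few masked pixels.
toy_map = [7.0] * 970 + [15.0] * 25 + [30.0] * 5 + [None] * 40
shares = concentration_shares(toy_map)
```

The same counts, multiplied by the pixel area, give the surface area of each class, which is the quantity a manager would use to size a targeted sampling campaign.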

Conversely, the broad, low-variability background in the open lake suggests that additional routine sampling in the pelagic interior would yield diminishing returns compared with targeted efforts in nearshore and tributary corridors. In this way, high-resolution PPUT maps produced by the distilled model do not simply visualize nutrient status; they operationalize a tiered monitoring strategy in which scarce field and remediation resources can be preferentially allocated to the small but dynamically important zones where phosphorus actually accumulates and propagates.

Beyond Lake Simcoe, the findings suggest a scalable implementation pathway. A tiered operational concept can be envisioned in which high-accuracy, SSD-aware teacher models are deployed on a limited set of benchmark lakes for calibration and regulatory assessment, whereas distilled SSD-free student models are applied across larger regional lake networks for routine surveillance. In data-sparse regions or for historical reconstruction, further simplified models using a reduced set of spectral indices and temporal descriptors could still provide qualitative assessments of nutrient status at minimal cost. Collectively, these advances highlight the broader potential of combining commercial high-resolution satellite archives with AI-based knowledge-transfer methods to support scalable, accurate, and operational inland water-quality assessment, while maintaining a scientifically interpretable link to the underlying hydrological and biogeochemical processes.

Taken together, the analyses in Section 5 show that the success of PPUT retrieval in Lake Simcoe is less determined by the choice of machine-learning architecture than by which aspects of the lake system are made observable to the model. SSD emerges as a pivotal depth-integrated constraint that renders the inversion problem well-posed, while meteorological drivers and carefully designed spectral indices encode, respectively, the forcing and the optical expression of phosphorus dynamics. KD then provides a principled way to transfer this structure into an SSD-free student model, enabling scalable mapping across space and time without fully sacrificing accuracy. At the same time, the identified limitations—vertical ambiguity, optical water-type dependence, temporal mismatch, and uncertainty propagation—highlight the need to view such models as components of an integrated observing system rather than as stand-alone decision tools.

Thus, combining high-resolution PlanetScope imagery, physics-informed feature design, and teacher-student learning can bridge long-standing monitoring gaps in inland waters, while preserving a clear mechanistic link between model outputs and the hydrological and biogeochemical processes they are intended to represent.

6. Conclusions

This study demonstrates that integrating multi-generation PlanetScope imagery with a hybrid machine-learning and knowledge-distillation framework provides a robust pathway for high-resolution, lake-wide retrieval of total phosphorus (PPUT) in an optically complex inland lake.

By systematically evaluating five ensemble learners across eight feature-group regimes, we showed that the dominant constraint on PPUT retrieval is not model architecture but feature observability—most notably the availability of SSD as a vertically integrated indicator of water clarity. Across all models, including SSD increased the mean R² from approximately 0.67 to 0.94 and reduced error metrics by more than half, confirming that water-column transparency strongly governs the learnable structure of the PPUT prediction problem.

Building on this insight, we developed a two-stage teacher–student framework in which a physically informed teacher model is trained with SSD and transfers its representation to an SSD-free student via knowledge distillation. The distilled student retained about 88% of the teacher’s accuracy (R² = 0.83, RMSE = 9.82 µg/L, MAE = 5.41 µg/L) while relying only on PlanetScope reflectance, derived spectral indices, and short-term meteorological descriptors. The K-sweep analysis further revealed that an intermediate subset of 40 features provides an optimal balance between predictive skill and parsimony, indicating that a compact combination of optical and meteorological drivers can encode the essential dynamics of phosphorus transport, resuspension, and biological uptake.
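As a rough check on the retention figure, and assuming that retention is measured as the ratio of the student’s R² to the teacher’s R², with the teacher taken at the mean SSD-aware value of about 0.94 reported above:

```latex
\frac{R^{2}_{\text{student}}}{R^{2}_{\text{teacher}}} \approx \frac{0.83}{0.94} \approx 0.88
```

Under these assumptions the ratio agrees with the stated retention of about 88%.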

Application of the SSD-free student model to PlanetScope SuperDove imagery from 2020 to 2025 produced metre-scale PPUT maps that resolve spatial patterns far beyond the capability of traditional monitoring and medium-resolution satellites. A representative case study for 26 July 2024 showed that the vast majority of lake-surface pixels (>97%) fall within a low-concentration band of 5–10 µg/L, whereas rare (<1%) but spatially coherent hotspots exceeding 20 µg/L occur near tributary mouths, sheltered embayments, and narrow channels. These hotspots form contiguous clusters that coincide with known river-inflow corridors and shallow, wind-exposed shorelines, highlighting the role of nearshore processes and watershed inputs in structuring phosphorus heterogeneity across Lake Simcoe.

In general, the findings indicate that lake-wide nutrient monitoring based on labour-intensive field campaigns can be complemented and enhanced by high-resolution satellite observations and distilled AI models. The proposed framework is readily transferable to other lakes where SSD or similar auxiliary measurements are collected at a limited number of stations but cannot be mapped wall-to-wall. By enabling near-real-time estimation of PPUT at 3 m resolution, the approach provides a foundation for adaptive monitoring programs, early-warning tools for eutrophication, and improved watershed-scale nutrient budgeting. Overall, the combination of commercial high-resolution imagery, physics-informed feature engineering, and knowledge-transfer algorithms constitutes a robust and generalizable strategy for inland water-quality retrieval.

Author Contributions

Y.D. contributed to the conceptualization, methodology, and the original and final writing of the manuscript. S.X.Y. and B.G. both contributed to the conceptualization, writing—review and editing, supervision of the manuscript, project administration, and funding acquisition. D.P. contributed to the writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Natural Sciences and Engineering Research Council of Canada (NSERC) Alliance Grant #401643.

Data Availability Statement

With the exception of the licensed PlanetScope imagery, the datasets used in this study are publicly available online. PlanetScope imagery was obtained from Planet Labs PBC under an academic research licence and is not publicly shareable. Interested researchers can obtain similar imagery directly from Planet Labs (https://www.planet.com, accessed on 28 August 2025), subject to licensing conditions. In-situ water-quality and Secchi depth data for Lake Simcoe and its tributaries were provided by the Lake Simcoe Region Conservation Authority’s Open Data portal (https://lsrca.on.ca/index.php/home/open-data/, accessed on 28 August 2025) and the Ontario Ministry of the Environment, Conservation and Parks Open Data Portal (https://open.canada.ca/data/en/dataset/, accessed on 28 August 2025). The lake polygon boundaries were obtained from the Land Information Ontario geospatial data portal (https://geohub.lio.gov.on.ca/, accessed on 15 October 2025), operated by the Ontario Ministry of Natural Resources and Forestry. Historical meteorological data were retrieved from the Meteostat project (https://meteostat.net, accessed on 28 August 2025). Derived PPUT prediction maps and analysis scripts are available from the corresponding author on reasonable request.

Acknowledgments

The authors gratefully acknowledge the Lake Simcoe Region Conservation Authority (LSRCA) and the Ontario Ministry of the Environment, Conservation and Parks (MECP) for providing in-situ water-quality and Secchi depth observations for Lake Simcoe. Land Information Ontario (LIO) of the Ontario Ministry of Natural Resources and Forestry is thanked for access to lake-boundary geospatial data used for lake delineation and masking. The authors also thank the Meteostat project for providing open, quality-controlled meteorological data and Planet Labs PBC for supplying PlanetScope imagery under an academic research license. The authors sincerely thank Andrew Paterson (Ontario Ministry of the Environment, Conservation and Parks; Dorset Environmental Science Centre) for his careful review and constructive comments, which helped improve the manuscript. The authors also appreciate helpful feedback from colleagues and anonymous reviewers.

Conflicts of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as potential conflicts of interest.

Appendix A. Model Descriptions and Implementation Details

Appendix A.1. HistGradientBoosting Regressor (HistGBM)

HistGBM is an optimized implementation of gradient boosting machines that leverages histogram-based binning of continuous features. Instead of evaluating each feature value as a potential split point, HistGBM first groups feature values into a fixed number of bins (typically 256). For a feature with values $x \in \mathbb{R}$, the binning process maps $x$ to a discrete bin index $b(x) \in \{1, 2, \dots, B\}$, where $B$ is the number of bins. This reduces the computational complexity of finding optimal splits from $O(n)$ to $O(B)$ per feature, where $n$ is the number of samples. The histogram construction also enables efficient parallelization. For our study, HistGBM is particularly valuable for handling high-dimensional input spaces (a potential consequence of including MET and SSD features) and large datasets, enabling faster experimentation across the 8 input regimes. Its ability to model non-linear relationships through additive trees:

$$F(x) = \sum_{t=1}^{T} h_{t}(x) \tag{A1}$$

where each $h_{t}$ is a decision tree and $T$ is the number of trees, aligns with our goal of capturing complex dependencies between input features and the target variable.
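The binning map $b(x)$ can be sketched in a few lines. This is a simplified illustration rather than the scikit-learn implementation: the bin edges are taken at evenly spaced ranks of the sorted training values, and the function names are hypothetical.

```python
from bisect import bisect_right


def make_bin_edges(values, n_bins):
    """Interior bin edges at evenly spaced ranks of the sorted values."""
    ordered = sorted(values)
    n = len(ordered)
    edges = []
    for k in range(1, n_bins):
        edge = ordered[min(n - 1, (k * n) // n_bins)]
        if not edges or edge > edges[-1]:   # skip duplicate edges
            edges.append(edge)
    return edges


def bin_index(x, edges):
    """Map x to b(x) in {1, ..., B}, where B = len(edges) + 1."""
    return bisect_right(edges, x) + 1


values = [0.1 * i for i in range(100)]      # toy feature with n = 100 samples
edges = make_bin_edges(values, n_bins=4)
binned = [bin_index(v, edges) for v in values]
```

After binning, a split search only needs to consider the `len(edges)` bin boundaries instead of every distinct feature value, which is the source of the reduction from $O(n)$ to $O(B)$ candidate thresholds per feature.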

Appendix A.2. CatBoost Regressor (CatBoost)

CatBoost is a gradient boosting library renowned for its robust handling of categorical features and for mitigating prediction shift. A key innovation in CatBoost is its ordered boosting mechanism, which uses a permutation-driven approach to decorrelate the trees. For each tree and each sample $i$, CatBoost uses only a subset of the data (those samples appearing before $i$ in a random permutation) to train the model when making predictions for $i$. This is formalized by defining, for each tree $t$, a permutation $\sigma_{t}$ of the training samples. The prediction for sample $i$ at tree $t$ is then based on the model trained on $\sigma_{t}(1), \dots, \sigma_{t}(i-1)$. This reduces overfitting to noise in the training data. Additionally, CatBoost automatically handles categorical features by generating target-based statistics (e.g., mean target value per category) and combining them with one-hot encoding for low-cardinality features. In our study, CatBoost serves as a strong boosting baseline for comparing the effect of different input regimes under a consistent training and evaluation protocol. (If any categorical metadata are used in an extended setting, CatBoost can natively handle categorical variables; however, our primary use here is as a robust boosting regressor under the same numerical feature space.) Furthermore, its strong generalization performance and resistance to overfitting make it a reliable baseline for assessing the incremental value of MET across different input configurations.
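The “preceding samples only” principle can be illustrated with ordered target statistics for a single categorical feature. This is a simplified sketch, not CatBoost’s internal code: each sample is encoded from the targets of same-category samples that precede it in a random permutation, smoothed toward a prior. The function name, the prior, and the smoothing weight are assumptions for illustration.

```python
import random


def ordered_target_statistics(categories, targets, prior, prior_weight=1.0, seed=0):
    """Encode each sample using only samples that precede it in a permutation."""
    n = len(categories)
    perm = list(range(n))
    random.Random(seed).shuffle(perm)
    sums, counts = {}, {}
    encoded = [0.0] * n
    for i in perm:                              # visit samples in permuted order
        c = categories[i]
        s, k = sums.get(c, 0.0), counts.get(c, 0)
        encoded[i] = (s + prior_weight * prior) / (k + prior_weight)
        sums[c] = s + targets[i]                # own target is added only afterwards
        counts[c] = k + 1
    return encoded, perm


categories = ["a", "b", "a", "a", "b"]
targets = [1.0, 2.0, 3.0, 4.0, 5.0]
encoded, perm = ordered_target_statistics(categories, targets, prior=3.0)

# Changing the target of the last sample in the permutation cannot affect
# any encoding, because no sample comes after it.
altered = list(targets)
altered[perm[-1]] = 100.0
encoded_altered, _ = ordered_target_statistics(categories, altered, prior=3.0)
```

Because a sample’s own target never enters its encoding, the statistic cannot leak the label it is later used to predict.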

Appendix A.3. RandomForest Regressor

RandomForest is an ensemble method that constructs multiple decision trees during training and outputs the mean prediction of the individual trees. The “random” aspect comes from two primary sources:

(i): bagging (bootstrap aggregation), where each tree is trained on a random subset of the training data sampled with replacement, and
(ii): random feature selection, where each node split is chosen from a random subset of the available features.

For a forest with $T$ trees, the final prediction is:

$$F(x) = \frac{1}{T} \sum_{t=1}^{T} h_{t}(x; \theta_{t}, D_{t}) \tag{A2}$$

where $\theta_{t}$ are the tree parameters (split points, etc.) learned using a random subset of features, and $D_{t}$ is the bootstrap sample for tree $t$. This inherent randomness decorrelates the trees, reducing variance and improving generalization. RandomForest is well-suited for our study because it provides a stable, non-parametric baseline that is less prone to overfitting compared to single decision trees. Its ability to capture non-linear interactions and its insensitivity to feature scaling make it ideal for analyzing the impact of adding MET features across various SSD contexts, as it can robustly model both simple and complex relationships without strong assumptions about the data distribution.
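The two sources of randomness and the averaging in Equation (A2) can be sketched with depth-1 trees (stumps) as base learners. This is a deliberately reduced illustration, not the scikit-learn implementation; real forests grow deep trees and resample features at every node. The data and all function names are hypothetical.

```python
import random


def fit_stump(X, y, feature):
    """Exhaustive search for the best threshold on one feature (depth-1 tree)."""
    mean_y = sum(y) / len(y)
    # Fallback if no valid split exists: predict the mean on both sides.
    best = (sum((t - mean_y) ** 2 for t in y), feature, float("inf"), mean_y, mean_y)
    values = sorted(set(row[feature] for row in X))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2.0
        left = [t for row, t in zip(X, y) if row[feature] <= thr]
        right = [t for row, t in zip(X, y) if row[feature] > thr]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
        if sse < best[0]:
            best = (sse, feature, thr, lm, rm)
    return best


def fit_forest(X, y, n_trees=15, max_features=2, seed=0):
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]        # (i) bootstrap sample D_t
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        candidates = rng.sample(range(d), max_features)   # (ii) random feature subset
        stumps = [fit_stump(Xb, yb, f) for f in candidates]
        forest.append(min(stumps, key=lambda s: s[0]))    # best split among candidates
    return forest


def predict_forest(forest, row):
    """Average of the individual tree predictions, as in Equation (A2)."""
    preds = [lm if row[f] <= thr else rm for _, f, thr, lm, rm in forest]
    return sum(preds) / len(preds)


data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(120)]
y = [(10.0 if row[0] > 0 else 0.0) + data_rng.gauss(0, 0.5) for row in X]

forest = fit_forest(X, y)
high = predict_forest(forest, [1.0, 0.0, 0.0])
low = predict_forest(forest, [-1.0, 0.0, 0.0])
```

Trees whose feature subset excludes the informative column contribute uninformative splits, and averaging over the ensemble dilutes their effect.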

Appendix A.4. Extremely Randomized Trees (ExtraTrees)

ExtraTrees takes the randomization of RandomForest a step further by introducing randomness into split-point selection. While RandomForest selects the best split point among a random subset of features by evaluating potential split points (e.g., midpoints between sorted feature values), ExtraTrees randomly selects the split point itself. For a given feature and node, instead of searching for the optimal threshold that minimizes impurity (e.g., MSE for regression), a threshold is uniformly sampled from the range of feature values present in the node. The tree is then split based on this random threshold. This additional randomization further reduces the ensemble’s variance at the cost of a slight increase in bias, often leading to better generalization on noisy datasets. The model is defined similarly to RandomForest:

F (x) = \frac{1}{T} \sum_{t = 1}^{T} h_{t} (x; {\tilde{θ}}_{t}, D_{t})

(A3)

where ${\tilde{θ}}_{t}$ now includes randomly selected split thresholds. For our research, ExtraTrees is valuable as it provides a different inductive bias compared to RandomForest and gradient boosting methods. Its faster training time (due to avoiding exhaustive split point search) and potential to find novel splits that a greedy search might miss make it a complementary model for evaluating the robustness of MET’s added value across diverse input regimes, including those with high levels of SSD-aware noise or variability.
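The contrast between the two split rules can be illustrated for a single feature in a single node. This is a simplified sketch with hypothetical helper names and synthetic data: it compares an exhaustive midpoint search with a uniformly drawn threshold. Real implementations also randomize over candidate features and keep the best of the random draws.

```python
import numpy as np

def sse_after_split(x, y, threshold):
    """Sum of squared errors of the two children produced by x <= threshold."""
    left, right = y[x <= threshold], y[x > threshold]
    return sum(((part - part.mean()) ** 2).sum() for part in (left, right) if part.size)

def best_threshold(x, y):
    """Greedy search over midpoints between sorted unique feature values."""
    values = np.unique(x)
    midpoints = (values[:-1] + values[1:]) / 2.0
    return min(midpoints, key=lambda t: sse_after_split(x, y, t))

def random_threshold(x, rng):
    """ExtraTrees-style draw: uniform over the feature's range within the node."""
    return rng.uniform(x.min(), x.max())

# Synthetic node data with a step in the target near x = 0.6.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=100)
y = np.where(x > 0.6, 2.0, 0.0) + rng.normal(scale=0.1, size=100)

t_best = best_threshold(x, y)
t_rand = random_threshold(x, rng)
sse_best = sse_after_split(x, y, t_best)
sse_rand = sse_after_split(x, y, t_rand)
```

The random draw costs a single evaluation instead of one per candidate midpoint, which is where the training-time saving comes from. Its impurity can never be lower than that of the greedy optimum for the same feature.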

Appendix A.5. Gradient Boosting Regressor (Vanilla GBR)

Gradient Boosting is a sequential ensemble method in which each new tree is trained to correct the residuals from the previous ensemble. It starts with an initial model, often a constant value:

F_{0} (x) = \arg \min_{γ} \sum_{i = 1}^{n} L (y_{i}, γ)

(A4)

where $L$ is the loss function, typically the MSE, $L (y, \hat{y}) = (y - \hat{y})^{2}$. Subsequent trees $h_{t}$ are fit to the negative gradient of the loss with respect to the current prediction. The ensemble is updated as:

F_{t} (x) = F_{t - 1} (x) + ν h_{t} (x)

(A5)

where $ν$ is the learning rate. Unlike HistGBM or CatBoost, this implementation typically uses exact split finding (evaluating all possible split points for each feature) and does not include specialized handling for categorical features or advanced regularization beyond subsampling. This model serves as a crucial baseline in our study because it represents the traditional gradient boosting approach without the optimizations of HistGBM or CatBoost. By comparing its performance across input regimes with that of the more optimized variants, we can better isolate the specific contributions of DOY, INDICES, MET, and SSD features from the effects of model-specific optimizations. Its tendency to overfit if not properly regularized also makes it a good test of the discriminative power of these features: if they truly add signal, they should help even a simpler GBR model generalize better across different SSD conditions.
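Equations (A4) and (A5) can be sketched from scratch for the squared-error case. The code below is an illustrative assumption, not the study's implementation: it uses scikit-learn's `DecisionTreeRegressor` as the base learner, synthetic data, and arbitrary hyperparameters. Under squared error, the constant minimizer in (A4) is the mean of the targets and the negative gradient is proportional to the residual.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gbr(X, y, n_trees=50, learning_rate=0.1, max_depth=2):
    """Fit F_0 and a list of trees following Eqs. (A4) and (A5) with squared error."""
    f0 = float(np.mean(y))  # Eq. (A4): the mean minimizes the squared-error loss
    pred = np.full(len(y), f0)
    trees = []
    for _ in range(n_trees):
        residual = y - pred  # proportional to the negative gradient of (y - F)^2
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0).fit(X, residual)
        pred = pred + learning_rate * tree.predict(X)  # Eq. (A5)
        trees.append(tree)
    return f0, trees

def predict_gbr(f0, trees, X, learning_rate=0.1):
    """Evaluate F_T(x) = F_0 + nu * sum_t h_t(x)."""
    return f0 + learning_rate * sum(tree.predict(X) for tree in trees)

# Synthetic stand-in data (NOT the Lake Simcoe dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = 3.0 * X[:, 0] + np.sin(2.0 * X[:, 1]) + rng.normal(scale=0.2, size=300)

f0, trees = fit_gbr(X, y)
mse_start = float(np.mean((y - f0) ** 2))
mse_end = float(np.mean((y - predict_gbr(f0, trees, X)) ** 2))
```

Here `mse_start` is the training error of the constant model and `mse_end` the error after 50 boosting rounds. The same `learning_rate` must be passed to both functions.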

By training these five diverse models under each of the eight input regimes, we aim to systematically assess how the inclusion of DOY, INDICES, and MET features, under varying SSD contexts, affects predictive performance across different learning paradigms, thus ensuring the robustness and generalizability of our findings.


Figure 2. Teacher–student KD framework for Lake Simcoe PPUT prediction.

Figure 3. Boxplots summarizing model performance under SSD-aware versus SSD-free regimes across the evaluated models and feature groups.

Figure 4. Heatmap of R² by model and feature-group regime.

Figure 5. Effect of feature accumulation on predictive performance.

Figure 6. Category-level aggregation of permutation importance (cumulative ΔR²) for meteorological variables, remote-sensing indices, raw spectral bands, and temporal descriptors.

Figure 7. Student-model performance as a function of the number of selected predictors.

Figure 8. Predicted versus measured PPUT for the distilled student model on training and held-out test subsets.

Figure 9. Lake-wide PPUT prediction map for Lake Simcoe on 26 July 2024 generated by the SSD-free distilled student model from cloud-/land-masked PlanetScope SR mosaics.

Table 1. Definition of the eight feature-group regimes used for benchmarking and KD training across Groups A/B and Stages 1–4.

	Group A	Group B
Stage 1	$x_{1A} = [x_{SSD}, x_{RS}]$	$x_{1B} = [x_{RS}]$
Stage 2	$x_{2A} = [x_{SSD}, x_{RS}, x_{DOY}]$	$x_{2B} = [x_{RS}, x_{DOY}]$
Stage 3	$x_{3A} = [x_{SSD}, x_{RS}, x_{DOY}, x_{IDX}]$	$x_{3B} = [x_{RS}, x_{DOY}, x_{IDX}]$
Stage 4	$x_{4A} = [x_{SSD}, x_{RS}, x_{DOY}, x_{IDX}, x_{MET}]$	$x_{4B} = [x_{RS}, x_{DOY}, x_{IDX}, x_{MET}]$
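The eight regimes in Table 1 can be assembled programmatically from the feature groups. In the sketch below only the group structure follows the table. The column names (other than those listed in Table 3) and the helper `build_regimes` are hypothetical placeholders.

```python
# Hypothetical column names; only the group structure follows Table 1.
FEATURE_GROUPS = {
    "SSD": ["ssd"],
    "RS": ["band1_mean", "band2_mean", "band3_mean", "band4_mean"],
    "DOY": ["doy"],
    "IDX": ["ExG", "ExGR", "SR_NR"],
    "MET": ["tavg_3d", "wspd_7d", "prcp_7d"],
}

# Groups accumulated at each stage, before SSD is optionally prepended.
STAGES = {
    1: ["RS"],
    2: ["RS", "DOY"],
    3: ["RS", "DOY", "IDX"],
    4: ["RS", "DOY", "IDX", "MET"],
}

def build_regimes(groups=FEATURE_GROUPS, stages=STAGES):
    """Return a dict mapping regime labels ('1A' ... '4B') to column lists."""
    regimes = {}
    for stage, names in stages.items():
        ssd_free = [col for name in names for col in groups[name]]
        regimes[f"{stage}A"] = groups["SSD"] + ssd_free  # Group A: SSD-aware
        regimes[f"{stage}B"] = ssd_free                  # Group B: SSD-free
    return regimes

REGIMES = build_regimes()
```

Each entry of `REGIMES` can then be used to select the training columns for one benchmarking run, for example `REGIMES["4B"]` for the full SSD-free configuration.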

Table 2. Single-model performance (R², RMSE, MAE) for SSD-aware and SSD-free regimes across the five evaluated ensemble learners.

Model	Feature Group	R² (with SSD)	RMSE (with SSD)	MAE (with SSD)	R² (without SSD)	RMSE (without SSD)	MAE (without SSD)
CatBoost	Bands	0.925	0.240	0.170	0.545	0.580	0.392
	Bands + DOY	0.936	0.215	0.152	0.685	0.490	0.325
	Bands + DOY + Indices	0.940	0.208	0.149	0.750	0.432	0.284
	Bands + DOY + Indices + MET	0.945	0.202	0.142	0.832	0.354	0.249
HistGBM	Bands	0.927	0.235	0.169	0.561	0.571	0.390
	Bands + DOY	0.939	0.212	0.150	0.704	0.471	0.318
	Bands + DOY + Indices	0.942	0.207	0.148	0.741	0.440	0.291
	Bands + DOY + Indices + MET	0.947	0.199	0.140	0.810	0.366	0.265
GradientBoosting	Bands	0.930	0.229	0.166	0.588	0.554	0.375
	Bands + DOY	0.939	0.210	0.149	0.696	0.472	0.316
	Bands + DOY + Indices	0.942	0.207	0.147	0.748	0.432	0.287
	Bands + DOY + Indices + MET	0.947	0.200	0.139	0.807	0.371	0.262
ExtraTrees	Bands	0.930	0.228	0.168	0.520	0.60	0.409
	Bands + DOY	0.940	0.211	0.152	0.672	0.495	0.330
	Bands + DOY + Indices	0.943	0.206	0.148	0.732	0.448	0.295
	Bands + DOY + Indices + MET	0.946	0.199	0.141	0.802	0.380	0.269
RandomForest	Bands	0.928	0.233	0.173	0.510	0.620	0.418
	Bands + DOY	0.937	0.216	0.155	0.648	0.505	0.339
	Bands + DOY + Indices	0.941	0.210	0.150	0.710	0.458	0.303
	Bands + DOY + Indices + MET	0.944	0.203	0.143	0.780	0.395	0.276

Table 3. Definitions of key predictors and their hypothesized physical relevance to PPUT dynamics in Lake Simcoe.

Feature Name	Definition/Formula	Physical Meaning for PPUT
tmax_3d	3-day mean of daily maximum air temperature	Short-term thermal forcing: controls stratification strength, snowmelt timing, and metabolic rates that influence phosphorus cycling.
tavg_3d	3-day mean air temperature	Similar to tmax_3d: integrates general warming/cooling trends affecting runoff and internal loading.
tavg	Same-day mean air temperature	Day-scale thermal signal: complements 3-day aggregates.
pres_3d	3-day mean surface pressure	Low-pressure systems are associated with storm events and enhanced watershed runoff → increased P delivery and turbidity.
pres	Same-day surface pressure	Captures individual storm passages and synoptic conditions.
wspd_7d	7-day mean wind speed	Controls wave action and sediment resuspension in shallow areas, releasing particulate and dissolved phosphorus.
wspd_3d	3-day mean wind speed	Shorter-term wind forcing: affects mixing of river plumes and nearshore zones.
wpgt_7d	7-day mean wind gust	Episodic strong winds: important for resuspension events and internal P loading.
prcp_7d	7-day cumulative precipitation	Proxy for watershed runoff and river discharge, which transport phosphorus from the catchment to the lake.
band4_mean	Mean NIR reflectance	Sensitive to backscattering by suspended particles: higher NIR often indicates higher TSS and particulate P.
ExG	ExG = 2G − R − B	“Excess Green” index: enhances green reflectance relative to red and blue, widely used as a proxy for algal greenness and bloom intensity.
ExGR	ExGR = ExG − (1.4R − G)	Refines ExG by subtracting a red-dominant term: better separates green, algae-rich water from turbid or reddish waters.
SR_NR	SR_NR = N/R	NIR/Red ratio: high values indicate strong NIR backscatter relative to red absorption, typical of high TSS and phytoplankton—both linked to phosphorus.
SR_RG	SR_RG = R/G	Red/Green ratio: increases with higher chlorophyll-a and suspended matter, reflecting eutrophication.
ND_NB	ND_NB = (N − B)/(N + B)	Normalized difference between NIR and Blue: distinguishes CDOM-dominated (blue absorption) vs. particle-dominated (NIR scattering) waters.
ratio_R_N	ratio_R_N = R/N	Complement to SR_NR: highlights regimes where red absorption dominates over NIR scattering.
greenness	greenness = G/(R + B)	Generic greenness indicator: elevated greenness is usually associated with higher phytoplankton biomass.
diff_R_G	diff_R_G = R − G	Simple red-green difference: increases when red reflectance rises due to algal pigments or inorganic particles.
diff_N_R	diff_N_R = N − R	NIR-red contrast: strong positive values indicate high scattering by suspended solids or dense algal blooms.
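The index formulas in Table 3 translate directly into array operations. The sketch below is a minimal illustration in which B, G, R, and N denote blue, green, red, and NIR surface reflectance. The function name is hypothetical, and no guard against zero denominators is included, which a production pipeline would need.

```python
import numpy as np

def spectral_indices(B, G, R, N):
    """Compute the Table 3 indices from blue, green, red, and NIR reflectance."""
    B, G, R, N = (np.asarray(v, dtype=float) for v in (B, G, R, N))
    exg = 2.0 * G - R - B
    return {
        "ExG": exg,
        "ExGR": exg - (1.4 * R - G),
        "SR_NR": N / R,
        "SR_RG": R / G,
        "ND_NB": (N - B) / (N + B),
        "ratio_R_N": R / N,
        "greenness": G / (R + B),
        "diff_R_G": R - G,
        "diff_N_R": N - R,
    }

# Illustrative reflectance values (not taken from the study data).
example = spectral_indices(B=0.1, G=0.2, R=0.1, N=0.3)
```

The function accepts scalars or arrays of equal shape, so the same code applies to per-station means and to full image rasters.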

Table 4. Stratified performance of the student model of Lake Simcoe PPUT prediction.

Concentration Range (µg/L)	Number of Samples (Training)	RMSE (Training, µg/L)	MAE (Training, µg/L)	MAPE (Training, %)	Number of Samples (Testing)	RMSE (Testing, µg/L)	MAE (Testing, µg/L)	MAPE (Testing, %)
5–120	435	7.6	3.14	13.6	109	9.82	5.41	33.7
<25	324	2.81	1.26	14	87	6.29	3.51	35.6
25–50	23	8.83	5.65	15.9	7	24.19	17.62	52.2
≥50	88	15.38	9.39	11.5	15	14.08	10.71	14.3
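A stratified evaluation of the kind summarized in Table 4 can be sketched as follows. The bin edges mirror the table, but whether each boundary value is inclusive is an assumption here. The function names and the example values are illustrative only.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return sample count, RMSE, MAE, and MAPE (in percent)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    return {
        "n": int(y_true.size),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAE": float(np.mean(np.abs(err))),
        "MAPE": float(100.0 * np.mean(np.abs(err) / np.abs(y_true))),
    }

def stratified_metrics(y_true, y_pred, bins):
    """Compute metrics within each [low, high) range of the measured values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    out = {}
    for label, (low, high) in bins.items():
        mask = (y_true >= low) & (y_true < high)
        if mask.any():  # skip empty strata
            out[label] = regression_metrics(y_true[mask], y_pred[mask])
    return out

# Assumed bin convention: lower edge inclusive, upper edge exclusive (µg/L).
BINS = {"<25": (0.0, 25.0), "25-50": (25.0, 50.0), ">=50": (50.0, np.inf)}

# Made-up measured and predicted values for illustration.
y_true = [10.0, 20.0, 30.0, 40.0, 60.0, 100.0]
y_pred = [12.0, 18.0, 33.0, 36.0, 54.0, 110.0]
results = stratified_metrics(y_true, y_pred, BINS)
```

Reporting MAPE alongside RMSE and MAE matters because the same absolute error carries very different relative weight at low and high concentrations.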

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Deng, Y.; Pan, D.; Yang, S.X.; Gharabaghi, B. PlanetScope Imagery and Hybrid AI Framework for Freshwater Lake Phosphorus Monitoring and Water Quality Management. Water 2026, 18, 261. https://doi.org/10.3390/w18020261

