Satellite-Based Machine Learning for Temporal Assessment of Water Quality Parameter Prediction in a Coastal Shallow Lake

Batina, Anja; Šerić, Ljiljana; Krtalić, Andrija; Šiljeg, Ante

doi:10.3390/jmse14060566

¹ Center for Geospatial Technologies, University of Zadar, Trg kneza Višeslava 9, 23000 Zadar, Croatia

² Faculty of Electrical Engineering, Mechanical Engineering and Naval Architecture, University of Split, Ruđera Boškovića 32, 21000 Split, Croatia

³ Faculty of Geodesy, University of Zagreb, Kačićeva 26, 10000 Zagreb, Croatia

* Author to whom correspondence should be addressed.

J. Mar. Sci. Eng. 2026, 14(6), 566; https://doi.org/10.3390/jmse14060566

Submission received: 27 February 2026 / Revised: 10 March 2026 / Accepted: 16 March 2026 / Published: 18 March 2026

(This article belongs to the Special Issue Assessment and Monitoring of Coastal Water Quality)

Abstract

Satellite remote sensing increasingly supports water quality monitoring, yet the temporal transferability of machine learning (ML) models remains insufficiently tested, particularly in coastal shallow lakes subject to hydrological variability. This study evaluates the predictive robustness of satellite-based ML models for electrical conductivity (EC), turbidity (TUR), water temperature (WT), and dissolved oxygen (DO) in Vrana Lake, Croatia. A total of 409 in situ measurements collected during 2023–2024 and 2025 were paired with Sentinel-2 and Landsat 8–9 imagery. Pearson, Spearman, and Kendall correlation analyses were applied for parameter-specific band selection using original, inverse, quadratic, and logarithmic feature transformations. Seventeen regression algorithms were evaluated under six training–testing split strategies, including strict temporal projection. WT exhibited high robustness (R² ≈ 0.90 under temporal projection) due to its strong dependence on thermal bands, while DO achieved moderate temporal stability (R² = 0.51) using log-transformed predictors. EC and TUR demonstrated substantial performance degradation under temporal separation (R² = 0.14 and −4.62, respectively), reflecting sensitivity to distribution shifts. For parameters showing sufficient stability, interpretable band-based retrieval equations were derived using the most strongly correlated spectral predictors. These findings highlight the importance of temporally structured validation and demonstrate that model complexity does not guarantee operational robustness in shallow, dynamically evolving lake systems.

Keywords: Sentinel-2; Landsat 8–9; band selection; Mediterranean coastal lake; temporal validation

1. Introduction

Satellite remote sensing has become an increasingly important tool for water quality monitoring because it provides spatially continuous observations that complement traditional in situ measurements [1,2]. Recent advances in multispectral satellite sensors, particularly the Sentinel-2 and Landsat 8–9 missions, have improved the detection of key water quality parameters in inland and coastal waters [3,4]. These developments have been accompanied by rapid growth in the application of machine learning (ML) methods, which have shown strong potential for capturing complex, non-linear relationships between spectral reflectance and water quality variables [5,6].

Numerous recent studies have demonstrated that ML approaches can often outperform classical empirical and semi-analytical models in optically complex aquatic environments [6,7]. However, the majority of these studies rely on randomly shuffled training and validation datasets, which may lead to overly optimistic performance estimates and limit the operational applicability of developed models [8]. In practice, water quality monitoring requires models that can predict future conditions rather than merely interpolate within historically observed distributions. Temporal non-stationarity caused by seasonal variability, hydrological changes, and meteorological forcing remains a critical challenge for satellite-based water quality retrieval [9].

The issue of model generalization and transferability is particularly pronounced in shallow coastal lakes, where strong coupling between water level, salinity, and biogeochemical processes results in rapid changes in optical properties [10,11]. Such systems are often characterized by mixed pixels, variable bottom influence, and regime shifts that violate assumptions of temporal stability commonly adopted in remote sensing models [12]. These characteristics complicate the interpretation of satellite spectral signals and may reduce the transferability of empirical or ML retrieval models. Under these conditions, models trained on historical data may fail when applied to independent future periods, especially if environmental boundary conditions shift. Recent literature highlights the need for validation strategies that explicitly account for temporal separation between training and testing datasets to more realistically assess model robustness [8].

In addition to validation design, feature engineering and parameter-specific band selection play a crucial role in improving model stability. Correlation-based feature selection has been increasingly applied as a preprocessing step to improve model interpretability and reduce overfitting in ML-based water quality studies [1,5]. By identifying spectrally relevant bands prior to model training, correlation analysis can improve parameter-specific retrieval performance and facilitate the derivation of interpretable band-based formulations [1,13]. Such approaches are particularly valuable in operational monitoring contexts, where transparency, physical interpretability, and temporal robustness are required alongside predictive accuracy [8].

Several recent studies have explored integrated frameworks combining geographic information systems (GIS), satellite remote sensing, and ML for monitoring lakes and coastal waters, emphasizing the importance of sensor selection, parameter-specific modelling strategies, and appropriate validation design [9,10,11]. Previous work has demonstrated that while certain parameters such as water temperature (WT) exhibit high temporal robustness, others (particularly turbidity (TUR) and salinity-related proxies) are more sensitive to environmental variability and distribution shifts [9,12]. However, systematic evaluation of how different parameters respond to temporally structured validation and out-of-time prediction scenarios remains limited, particularly in shallow Mediterranean coastal lakes. This limitation makes it difficult to assess whether models that perform well under conventional validation can reliably support operational monitoring of future water quality conditions.

To address these challenges, and unlike many previous studies, this study systematically evaluates the temporal predictive capability of satellite-based ML models for electrical conductivity (EC), TUR, WT, and dissolved oxygen (DO) in Vrana Lake (Croatia), a shallow Mediterranean coastal lake characterized by strong hydrological variability and salinity influence. By combining multi-year in situ observations, multi-sensor satellite data, correlation-driven band selection, feature transformations, and both random and temporally separated validation strategies within a unified ML framework, this study aims to:

Quantify how model performance changes under independent temporal projection,
Identify parameter- and sensor-specific strengths and limitations in out-of-time prediction, and
Derive interpretable band-based retrieval formulations only for parameter–sensor combinations demonstrating sufficient temporal robustness.

Through this validation-aware framework, the study addresses the operational challenge of predicting future water quality dynamics in shallow coastal lakes. Its novelty lies in systematically evaluating the temporal transferability of satellite-based ML water quality models across multiple validation strategies, and in restricting interpretable band-based formulations to parameter–sensor combinations that demonstrate sufficient temporal robustness.

The following sections describe the methodological framework of the study, including the characteristics of the study area, in situ data collection and harmonization procedures, satellite data acquisition, and the statistical and ML approaches applied. Particular emphasis is placed on assessing temporal robustness and deriving interpretable band-based retrieval formulations for water quality estimation in a shallow coastal lake system.

2. Materials and Methods

The dataset used in this study originates from a broader research initiative focused on improving water quality monitoring in Vrana Lake through the integration of in situ measurements, GIS-based spatial analysis, satellite remote sensing, and ML methods. The monitoring network design and in situ data collection and processing procedures were previously described in Batina et al. (2025) [14], where a grid of 20 monitoring stations and monthly measurements of key physicochemical parameters were established.

This study builds on these previously published methodological components and focuses on evaluating the temporal transferability of ML models for predicting selected water quality parameters from multispectral satellite data (Figure 1).

2.1. Study Area

Vrana Lake is a shallow coastal lake located in the Mediterranean region of Croatia (Figure 2). Due to its shallowness, proximity to the Adriatic Sea coast, and sensitivity to hydrological and meteorological variability, the lake represents an aquatic system with pronounced temporal and spatial variability. Variations in water level, salinity intrusion, and seasonal stratification strongly influence the lake’s physical and chemical properties, making it a suitable case study for evaluating satellite-based water quality monitoring approaches in coastal shallow environments [14].

The influence of water level and meteorological drivers such as precipitation and wind on water quality dynamics in Vrana Lake has been analysed in detail in Batina et al. (2025) [14], providing environmental context for the satellite-based modelling approach used in this study.

2.2. In Situ Data Collection and Harmonization

In situ water quality monitoring was conducted at 20 sampling stations distributed across Vrana Lake. Measurements included EC, TUR, WT, and DO. A total of 409 measurements were collected during two continuous monitoring periods. The first monitoring period, from July 2023 to July 2024, comprised 250 measurements collected using a YSI EXO2 multiparameter probe (YSI Inc., Yellow Springs, OH, USA). The second monitoring period, from May to December 2025, comprised 159 measurements collected using a YSI EXO2s multiparameter probe. To ensure comparability between monitoring periods, measurements from 2023 to 2024 were harmonized by extracting values at approximately 30–40 cm depth, corresponding to the dominant depth range of measurements collected in 2025. This harmonization strategy is consistent with approaches used in other coastal water quality studies to minimize depth-related spectral differences [3]. The suitability of using shallow-depth measurements and median values as representative station-level indicators is supported by a detailed analysis of vertical stratification and temporal variability in Vrana Lake, which demonstrated a generally well-mixed water column and limited depth-dependent variability of EC, WT, DO, and TUR [14].

Eighteen monitoring stations retained identical spatial locations across both periods. Two stations were slightly relocated in 2025 and were treated as separate monitoring locations to avoid spatial bias (Figure 3).

2.3. Satellite Data Acquisition and Preprocessing

Multispectral satellite data from the Sentinel-2 MultiSpectral Instrument (MSI; European Space Agency, Paris, France) and the Landsat 8–9 Operational Land Imager/Thermal Infrared Sensor (OLI/TIRS; NASA and U.S. Geological Survey, Washington, DC, USA) were used in this study. Atmospherically corrected Level-2 surface reflectance products were used to ensure methodological consistency between sensors and to focus the analysis on the integration of remote sensing data with ML modelling rather than on the evaluation or comparison of atmospheric correction algorithms. Sentinel-2 imagery was obtained through the Copernicus Browser and Landsat 8–9 data through the USGS Earth Explorer platform.

Although the use of pre-processed imagery simplifies the workflow, some limitations remain, particularly in inland water environments where atmospheric variability, adjacency effects from surrounding land, aerosol uncertainty, and water surface reflections can influence reflectance retrieval accuracy [15,16]. However, previous research indicates that residual atmospheric uncertainties may have a limited impact in large-scale statistical modelling studies, where predictions rely primarily on statistical relationships rather than on pixel-level physical retrievals [15,16]. Therefore, additional custom atmospheric correction procedures were not applied in this study, as the objective was to evaluate satellite-based ML modelling of water quality parameters rather than to assess atmospheric correction performance.

Previous research indicates that a 1-day temporal window between satellite acquisition and field measurements is optimal, although it may be extended up to 10 days when environmental conditions remain stable [17,18]. The acceptable time window also depends on sensor characteristics, as higher spatial, spectral, and radiometric resolution increases the reliability of pairing satellite and in situ observations over longer intervals [19].

Accordingly, a ±10-day temporal tolerance was adopted as a practical and commonly used compromise in satellite-based water quality studies to minimize data loss due to cloud cover while preserving ecological comparability between field and satellite observations. Considering the shallow and well-mixed nature of Vrana Lake, which exhibits limited vertical stratification and relatively stable water column conditions [14], the selected temporal tolerance was considered acceptable for satellite–in situ matching.

For each in situ measurement, the satellite image acquired on the nearest cloud-free overpass was selected. Only one image per monitoring event was used, specifically the nearest scene with minimal cloud cover from a relative orbit fully covering the entire lake area. Cloud- and cloud-shadow-contaminated pixels were excluded using mission-specific quality assessment layers, an approach aligned with established procedures for satellite-derived water quality retrieval [2].

All satellite imagery was temporally matched to in situ measurements based on acquisition date and subsequently standardized prior to statistical and ML analysis, following typical practice in remote sensing regression workflows [7]. The realized temporal offset between in situ monitoring and satellite acquisition dates ranged from −13 to +14 days, with a median difference of −1 day (Table 1), so some matchups exceeded the nominal ±10-day tolerance. Most satellite–in situ matchups nevertheless fell within 10 days of the field measurement, supporting the ecological relevance of the satellite-based analysis under typical conditions.
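The matching rule described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `nearest_scene`, the example dates, and the sign convention (negative offset when the scene precedes sampling) are assumptions, and the list of scene dates is assumed to contain only cloud-free overpasses that cover the whole lake.

```python
from datetime import date

def nearest_scene(sample_date, scene_dates, tolerance_days=10):
    """Return (scene_date, offset_days) for the scene closest to sample_date,
    or None when no scene falls within the tolerance window."""
    best = None
    for scene in scene_dates:
        # Assumed convention: negative offset = scene acquired before sampling.
        offset = (scene - sample_date).days
        if abs(offset) > tolerance_days:
            continue
        if best is None or abs(offset) < abs(best[1]):
            best = (scene, offset)
    return best

# Hypothetical acquisition dates, for illustration only.
scenes = [date(2025, 5, 3), date(2025, 5, 13), date(2025, 6, 2)]
match = nearest_scene(date(2025, 5, 12), scenes)    # closest scene is one day later
no_match = nearest_scene(date(2025, 7, 1), scenes)  # nearest scene is 29 days away
```

Returning `None` instead of the closest out-of-window scene makes the tolerance explicit, so any matchup that exceeds it has to be admitted deliberately.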

2.4. Statistical and Correlation Analysis

To quantify relationships between satellite spectral information and in situ water quality parameters, Pearson, Spearman, and Kendall correlation coefficients were calculated for each parameter–band combination. This multi-metric approach enabled the identification of both linear and monotonic non-linear relationships while reducing sensitivity to outliers, in line with recent studies emphasizing robust feature selection strategies in water quality retrieval [8,9].
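To illustrate why the three coefficients are informative together, the following sketch computes them in plain Python for a monotonic but non-linear relationship. The data are hypothetical, the helper names are illustrative, and the tie-corrected Kendall variant (tau-b) is an assumption, since the text does not state which variant was used.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def kendall_tau_b(x, y):
    """Kendall tau-b from concordant and discordant pairs, corrected for ties."""
    n = len(x)
    concordant = discordant = tied_x = tied_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0:
                tied_x += 1
            if dy == 0:
                tied_y += 1
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    pairs = n * (n - 1) / 2
    return (concordant - discordant) / math.sqrt((pairs - tied_x) * (pairs - tied_y))

# Hypothetical reflectance values and a target that falls as 1/reflectance.
reflectance = [0.02, 0.03, 0.05, 0.08, 0.12]
target = [1.0 / r for r in reflectance]
r_p = pearson(reflectance, target)
r_s = spearman(reflectance, target)
r_k = kendall_tau_b(reflectance, target)
```

For this perfectly monotonic example the rank-based coefficients reach −1 while Pearson stays weaker in magnitude, which is the kind of non-linear signal the multi-metric comparison is meant to reveal.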

Correlation analysis served as a feature selection step prior to ML modelling. For each water quality parameter, spectral bands showing the strongest and most consistent correlations across correlation metrics were retained. This parameter-specific band selection strategy was applied to reduce input dimensionality, improve interpretability, and limit overfitting, consistent with best practices in ML pre-processing.

In addition to original spectral reflectance values, an expanded feature space was constructed by applying logarithmic, inverse, and quadratic transformations to all spectral bands. These transformations were introduced to account for potential non-linear relationships between spectral response and water quality parameters, which are frequently observed in shallow and optically complex aquatic systems.
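A minimal sketch of such an expanded feature space is given below. The `_log` suffix follows the band naming used later in the text (e.g., L_B10_log); the `_inv` and `_sq` suffixes, the use of the natural logarithm, and the handling of non-positive values are assumptions made for illustration.

```python
import math

def expand_features(bands):
    """Add logarithmic, inverse, and quadratic versions of each band value."""
    expanded = {}
    for name, value in bands.items():
        # Assumption: log and inverse are left undefined (NaN) for
        # non-positive inputs; the source does not say how these were handled.
        positive = value > 0
        expanded[name] = value
        expanded[f"{name}_log"] = math.log(value) if positive else math.nan
        expanded[f"{name}_inv"] = 1.0 / value if positive else math.nan
        expanded[f"{name}_sq"] = value ** 2
    return expanded

# Hypothetical band values, for illustration only.
features = expand_features({"L_B10": 2.0, "S_B04": 0.0})
```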

Correlation analysis was performed for both original and transformed feature sets, applying the same Pearson, Spearman, and Kendall screening procedure to the transformed variables. Bands (or transformed bands) demonstrating strong and consistent relationships across correlation metrics were retained as candidate predictors for subsequent modelling.
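One way to make "strong and consistent" operational is sketched below. The rule used here (same sign for all three coefficients and a minimum absolute value) and the threshold of 0.5 are assumptions for illustration; the text does not state a numerical criterion, and the coefficient values in the example are hypothetical.

```python
def screen_bands(coefficients, min_abs=0.5):
    """Keep bands whose Pearson, Spearman, and Kendall coefficients agree in
    sign and all reach min_abs; return them ranked by mean absolute value."""
    kept = []
    for band, (p, s, k) in coefficients.items():
        same_sign = (p > 0 and s > 0 and k > 0) or (p < 0 and s < 0 and k < 0)
        if same_sign and min(abs(p), abs(s), abs(k)) >= min_abs:
            kept.append((band, (abs(p) + abs(s) + abs(k)) / 3))
    kept.sort(key=lambda item: item[1], reverse=True)
    return [band for band, _ in kept]

# Hypothetical (Pearson, Spearman, Kendall) triples, for illustration only.
coefficients = {
    "S_B04": (0.70, 0.72, 0.55),
    "L_B05": (0.30, -0.10, 0.05),
    "L_B10": (0.95, 0.94, 0.80),
}
selected = screen_bands(coefficients)
```

Because Kendall coefficients are typically smaller in magnitude than Pearson or Spearman values for the same data, a single shared threshold is a strict choice; a per-metric threshold would be an equally defensible variant.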

2.5. Machine Learning Modelling and Validation Strategy

Correlation analysis (Pearson, Spearman, and Kendall) was used as an initial feature screening step to reduce redundant predictors. Several of the applied ML models (e.g., Lasso, ElasticNet, Random Forest, and Gradient Boosting) inherently perform variable selection or regularization during training, allowing them to account for multivariate interactions among spectral predictors. Given the relatively limited number of spectral predictors derived from multispectral satellite imagery and their clear physical interpretability, a correlation-based screening approach was considered appropriate prior to model training. Advanced feature importance and interpretability techniques such as SHAP (SHapley Additive exPlanations) could provide additional insights into predictor contributions; however, their implementation was beyond the scope of the present study and is left for future research.

Following this initial feature screening, multiple regression-based ML models were implemented to capture a broad range of linear and non-linear relationships between satellite spectral data and in situ water quality parameters. A total of 17 regression algorithms were evaluated, including linear models (Linear, Ridge, Lasso, ElasticNet, Bayesian Ridge), polynomial regression, ensemble tree-based models (Random Forest, Extra Trees, Gradient Boosting, HistGradientBoosting, AdaBoost, XGBoost), kernel-based models (Support Vector Regression (SVR) and Kernel Ridge with radial basis function kernels), Gaussian Process Regression, k-Nearest Neighbours (KNN), and a multilayer perceptron artificial neural network (ANN/MLP). Models were developed separately for each water quality parameter and satellite mission, consistent with recent applications of ML in satellite-based water quality monitoring [10,11].

Hyperparameters of the evaluated ML models were specified a priori using standard implementations and fixed model-specific settings, which were kept constant across all validation scenarios to ensure a consistent comparison among algorithms. The main parameter settings used in the model implementation are provided in the publicly available code repository accompanying the manuscript. The aim of this study was to compare model behaviour under different temporal validation scenarios rather than to maximize the performance of each individual algorithm through extensive hyperparameter tuning. Therefore, fixed hyperparameter settings were used to provide a consistent benchmarking framework across models.

All models were evaluated under six training–testing split strategies (Table 2). Performance was assessed using the coefficient of determination (R²), mean absolute error (MAE), and root mean squared error (RMSE). Evaluation metrics for all models, parameters, split strategies, and feature configurations were compiled and compared to assess model robustness under varying temporal scenarios [12]. To quantify sensitivity to temporal non-stationarity, model performance under random splits was compared with performance under temporally separated splits (Table 2). The use of multiple split strategies enabled a systematic comparison of model performance under both random and temporally structured validation scenarios. This framework provides an empirical comparison of temporal robustness across models, although formal statistical testing of differences between split strategies was beyond the scope of the present study.
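The three metrics and a strict temporal split can be sketched as follows. The example uses hypothetical values and a deliberately naive predictor (the training mean) to show how R² becomes negative when the test period lies outside the training distribution; the record layout and function names are assumptions, and only the temporal projection strategy is illustrated rather than all six splits.

```python
import math

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def temporal_split(records, first_test_year):
    """Train on all records before first_test_year, test on the rest."""
    train = [r for r in records if r["year"] < first_test_year]
    test = [r for r in records if r["year"] >= first_test_year]
    return train, test

# Hypothetical measurements with a shift in the final year.
records = [
    {"year": 2023, "y": 2.0}, {"year": 2024, "y": 3.0}, {"year": 2024, "y": 4.0},
    {"year": 2025, "y": 5.0}, {"year": 2025, "y": 6.0}, {"year": 2025, "y": 7.0},
]
train, test = temporal_split(records, first_test_year=2025)
train_mean = sum(r["y"] for r in train) / len(train)
y_test = [r["y"] for r in test]
y_pred = [train_mean] * len(y_test)  # naive baseline: predict the training mean

r2_temporal = r2_score(y_test, y_pred)
mae_temporal = mae(y_test, y_pred)
rmse_temporal = rmse(y_test, y_pred)
```

A negative R² means the predictions are worse than simply using the mean of the test period, which is how strongly negative values under temporal projection should be read.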

ML models were trained exclusively using spectral bands (or their transformed counterparts) that demonstrated the strongest and most consistent correlations with each individual water quality parameter in the preceding correlation analysis. This parameter-specific feature selection ensured that only statistically relevant predictors were included in model development, thereby reducing input dimensionality, improving interpretability, and minimizing the risk of overfitting. Feature subsets were defined separately for each parameter and feature configuration (original and transformed bands), meaning that different predictors were used for WT, EC, DO, and TUR, depending on their correlation profiles.

2.6. Derivation of Band-Based Retrieval Formulations

For parameters that demonstrated stable behaviour and acceptable predictive performance under temporal validation, interpretable band-based retrieval formulations were derived using regression models. These formulations express water quality parameters as functions of selected satellite spectral bands and provide transparent relationships suitable for operational application, a strategy that has been successfully applied in similar studies of water quality estimation with multispectral sensors [3,7].

Formulations were generated for both original and transformed feature sets when applicable, and coefficient values were extracted directly from the fitted models without additional post-processing.
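As an illustration of how a fitted model translates into an explicit retrieval equation, the sketch below fits a single-predictor ridge regression in closed form and prints the resulting formula. The data, the penalty value `alpha`, and the function names are hypothetical, and the intercept is left unpenalized by centring the data, which is a common convention rather than a confirmed detail of the study.

```python
def fit_ridge_1d(x, y, alpha=1.0):
    """Closed-form ridge regression with one predictor and an unpenalized
    intercept: slope = Sxy / (Sxx + alpha)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / (sxx + alpha)
    intercept = mean_y - slope * mean_x
    return slope, intercept

def as_equation(target, band, slope, intercept):
    """Format the fitted coefficients as a readable retrieval equation."""
    sign = "+" if intercept >= 0 else "-"
    return f"{target} = {slope:.3f} * {band} {sign} {abs(intercept):.3f}"

# Hypothetical thermal-band values and water temperatures (target = band + 1).
band_values = [10.0, 15.0, 20.0, 25.0, 30.0]
water_temp = [11.0, 16.0, 21.0, 26.0, 31.0]

slope_ols, intercept_ols = fit_ridge_1d(band_values, water_temp, alpha=0.0)
slope_ridge, intercept_ridge = fit_ridge_1d(band_values, water_temp, alpha=1.0)
equation = as_equation("WT", "L_B10", slope_ols, intercept_ols)
```

With `alpha=0.0` the fit reduces to ordinary least squares; a positive `alpha` shrinks the slope slightly toward zero, which is the regularization effect associated with Ridge.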

The derivation process was restricted to models exhibiting physically interpretable relationships and consistent validation behaviour across split strategies. Parameters that did not meet these criteria were not simplified into explicit retrieval equations, consistent with current recommendations in the literature for robust model interpretation [8].

3. Results

3.1. Temporal Patterns and Value Ranges of In Situ Water Quality Measurements

3.1.1. Temporal Variability of In Situ Water Quality Parameters

Monthly analysis of in situ measurements revealed distinct differences in temporal behaviour between the two monitoring periods (2023–2024 and 2025) (Figure 4). The most pronounced deviations were observed for TUR, which showed substantial month-to-month variability, particularly during the warmer period. These differences are likely driven by meteorological forcing, including wind-induced sediment resuspension and episodic precipitation events.

WT exhibited clear seasonal dynamics in both periods; however, noticeable differences in monthly mean values were observed between years, reflecting interannual variability in atmospheric conditions. Despite these differences, the overall seasonal pattern remained consistent. DO showed a highly similar temporal trend across both monitoring periods, indicating relatively stable oxygen dynamics at the seasonal scale. In contrast, EC displayed a marked increase during November and December 2025 compared to the corresponding period in 2023–2024. This increase is likely associated with reduced water levels and enhanced salinity influence during late 2025, suggesting altered hydrological conditions relative to the reference period.

3.1.2. Spatial Variability of In Situ Water Quality Parameters

The spatial distribution maps shown in Figure 5 were generated by interpolating median values from the monitoring stations using the Simple Kriging—Trend method, which was identified as the most suitable interpolation approach for Vrana Lake water quality parameters in Batina et al. (2025) [14]. The maps represent the median values of the analysed in situ water quality parameters calculated for each monitoring station over the entire observation period (21 months across three years). The use of median values reduces the influence of short-term variability and extreme measurements, allowing a clearer representation of the long-term spatial patterns of water quality across Vrana Lake.
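The station-level aggregation that precedes interpolation can be sketched as follows. The station identifiers and values are hypothetical, and the kriging step itself is not reproduced here.

```python
from collections import defaultdict
from statistics import median

def station_medians(measurements):
    """Group (station, value) pairs by station and return the median per station."""
    grouped = defaultdict(list)
    for station, value in measurements:
        grouped[station].append(value)
    return {station: median(values) for station, values in grouped.items()}

# Hypothetical values; the 9.9 reading stands in for an extreme measurement.
measurements = [
    ("S01", 3.6), ("S01", 3.7), ("S01", 9.9),
    ("S02", 4.0), ("S02", 4.2), ("S02", 4.1),
]
medians = station_medians(measurements)
```

In this example the extreme reading at the first station leaves its median at 3.7, whereas the mean would be pulled above 5, which illustrates why medians give a more stable input for the maps.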

The results indicate consistent spatial gradients along the northwest–southeast axis of the lake. EC showed a clear spatial pattern, increasing from the northwestern part of the lake (approximately 3.64 dS/m) toward the southeastern region (up to 4.16 dS/m) (Figure 5a). This gradient is consistent with the hydrological influence of seawater intrusion through the Prosika canal located near the southeastern part of the lake.

In contrast, TUR exhibited higher values in the northwestern part of the lake (up to 3.17 FNU) and lower values toward the southeastern sector (approximately 2.74 FNU) (Figure 5b). This pattern may reflect the influence of freshwater inflows, sediment resuspension, and runoff from surrounding agricultural areas in the northern part of the catchment.

WT ranged from approximately 20.20 to 21.23 °C and followed a spatial pattern similar to EC, with slightly higher values in the southeastern part of the lake (Figure 5c). This distribution may be related to local bathymetry, solar heating conditions, and the influence of saline water inflows.

DO values ranged from approximately 9.61 to 9.95 mg/L, with slightly higher concentrations observed in the southern part of the lake and lower values toward the northwestern region (Figure 5d). Although the spatial variability of DO was relatively small, the gradual gradient suggests differences in water circulation and mixing conditions across the lake.

The results indicate that the spatial variability of water quality parameters in Vrana Lake is influenced by the combined effects of freshwater inflows, seawater intrusion, local morphometry, and wind-driven mixing processes typical for shallow coastal lake systems.

3.1.3. Comparison of Minimum, Average, and Maximum Values

Analysis of minimum, average, and maximum values further highlighted parameter-specific differences between monitoring periods (Figure 6). For WT and DO, the minimum and maximum values observed in 2025 remained within the range recorded during 2023–2024, indicating that interannual variability did not exceed historical bounds for these parameters. In contrast, EC exhibited a substantial increase in maximum values during 2025, reaching levels approximately 50% higher than those observed in the reference period. TUR showed the opposite behaviour, with minimum values in May 2025 reaching approximately 0.01 FNU, which is lower than the minimum values recorded during 2023–2024. These deviations indicate that both EC and TUR extended beyond the value ranges used for model training.

Such out-of-range conditions are expected to adversely affect model performance, particularly under temporal validation, and represent a key challenge for the reliable modelling of EC and TUR using satellite-based ML approaches.
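A simple diagnostic for this situation is the share of test-period values that fall outside the range seen during training. The sketch below uses hypothetical values; the function name is illustrative and the check is not described in the source as part of the workflow.

```python
def out_of_range_fraction(train_values, test_values):
    """Fraction of test values below the training minimum or above the
    training maximum."""
    low, high = min(train_values), max(train_values)
    outside = [v for v in test_values if v < low or v > high]
    return len(outside) / len(test_values)

# Hypothetical training-period and test-period values, for illustration only.
train_values = [3.0, 3.5, 4.0]
test_values = [3.2, 4.5, 6.0, 3.9]
fraction = out_of_range_fraction(train_values, test_values)
```

A non-zero fraction signals that the model must extrapolate for part of the test period, which tree-based ensembles in particular cannot do beyond the target values seen in training.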

3.2. Correlation Analysis Between Spectral Bands and In Situ Parameters

Correlation analysis revealed clear parameter- and sensor-specific patterns in the relationships between satellite spectral bands and in situ water quality measurements (Figure 7). For Sentinel-2, strong and consistent correlations were observed for WT and DO across all three correlation metrics. WT exhibited high positive correlations with multiple visible and near-infrared bands (Pearson r up to ~0.76), while DO showed strong negative correlations with the same bands (r down to ~−0.75), reflecting their inverse seasonal relationship.

EC exhibited moderate to strong negative correlations across a wide range of Sentinel-2 spectral bands, including visible (B01, B03, B04), near-infrared, and short-wave infrared bands. The consistent correlation signs across Pearson, Spearman, and Kendall coefficients indicate a system-wide optical response associated with salinity-driven and hydrological variability. In contrast, TUR showed weak correlations across all Sentinel-2 bands and metrics, with values close to zero, suggesting limited sensitivity of multispectral reflectance to TUR under the observed conditions.

For Landsat 8–9, the strongest relationships were observed for WT, driven primarily by the thermal infrared bands (Bands 10 and 11), which exhibited very high positive correlations across all metrics (Pearson r ≈ 0.96). DO showed strong negative correlations with the same thermal bands, consistent with temperature-driven oxygen dynamics. Correlations between Landsat multispectral bands and EC or TUR were generally weak and inconsistent, indicating lower sensitivity compared to Sentinel-2 for these parameters.

The agreement in correlation magnitude and sign across Pearson, Spearman, and Kendall coefficients indicates that the identified relationships are robust and not driven by outliers or purely linear effects. Based on the correlation analysis, WT was primarily associated with the Landsat 8–9 thermal bands (B10–B11). DO exhibited strong negative correlations with those thermal bands; however, this relationship is primarily indirect, driven by the inverse dependence of oxygen solubility on WT rather than by a direct spectral sensitivity to DO. EC showed moderate to strong and spectrally broad correlations with Sentinel-2 bands, with the strongest and most consistent relationships observed for the water vapour band (B09), followed by the visible (B01–B04) and short-wave infrared (SWIR) bands (B11–B12), indicating an integrated optical response to salinity-driven and hydrological variability.

In addition to the original spectral reflectance values (Figure 7), correlation analysis was repeated using three transformed feature spaces (inverse, quadratic, and logarithmic transformations of all spectral bands) to examine potential non-linear spectral–parameter relationships. Although all three transformations were evaluated, only the logarithmically transformed correlation results are presented (Figure 8), as this feature space demonstrated the most consistent improvement in model performance across split strategies (Section 3.3; Table S1), particularly for EC and DO.

For WT, the strongest associations remained concentrated in the Landsat thermal bands (L_B10_log and L_B11_log), which continued to exhibit very high positive correlations across all three metrics (Spearman r up to ~0.95). This confirms that the temperature signal remains dominant and robust even under logarithmic scaling.

For DO, log-transformed thermal bands showed strengthened and more consistent negative correlations compared to their original counterparts (Spearman r ≈ −0.81 to −0.82), indicating a monotonic but non-linear dependence consistent with temperature-mediated oxygen dynamics.

A notable shift was observed for EC. While Sentinel-2 bands dominated the correlation structure in the original feature space, the log-transformed analysis revealed stronger associations with multiple Landsat reflectance bands (L_B04_log–L_B07_log) and the thermal band L_B10_log. This suggests that EC-related spectral responses exhibit multiplicative or non-linear behaviour that becomes more pronounced under logarithmic scaling. In contrast, TUR continued to show weak and inconsistent correlations across both original and log-transformed feature spaces, reinforcing the limited sensitivity of multispectral reflectance to TUR variability under the observed environmental conditions.

The overall agreement in correlation direction and relative ranking across Pearson, Spearman, and Kendall coefficients for both original and transformed bands indicates that the identified spectral–parameter relationships are structurally consistent rather than artefacts of linear scaling.
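The multi-metric screening can be illustrated with a small pure-Python sketch that computes the three coefficients for one band and checks whether they agree in sign. This is not the study's code: the helper names are hypothetical, the sample values are invented for illustration, and the Kendall variant shown is tau-a (no tie correction). In practice, library routines such as `scipy.stats.pearsonr`, `spearmanr`, and `kendalltau` would normally be used.

```python
import math

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def spearman(x, y):
    # Spearman is Pearson applied to the ranks.
    return pearson(ranks(x), ranks(y))

def kendall_tau_a(x, y):
    # Concordant minus discordant pairs over all pairs (tau-a, ties not corrected).
    n, score = len(x), 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            score += (prod > 0) - (prod < 0)
    return score / (n * (n - 1) / 2)

def screen(band_values, target):
    scores = {
        "pearson": pearson(band_values, target),
        "spearman": spearman(band_values, target),
        "kendall": kendall_tau_a(band_values, target),
    }
    signs = {value > 0 for value in scores.values()}
    scores["same_sign"] = len(signs) == 1
    return scores

# Hypothetical monotonic but non-linear relationship (values are illustrative only).
result = screen([1, 2, 3, 4, 5], [25, 16, 9, 4, 1])
print(result)
```

For a monotonic but non-linear relationship such as this one, the rank-based coefficients reach their extreme value while Pearson does not, which is the kind of difference the three-metric comparison is designed to expose.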

Based on the multi-metric correlation screening (Figure 7 and Figure 8), a parameter-specific subset of spectral bands was retained for subsequent modelling. The selection was performed separately for each water quality parameter and considered both original and logarithmically transformed feature spaces. The final selected predictors reflect the dominant spectral domains associated with each parameter and are summarized in Table 3.

3.3. Machine Learning Model Performance Across Split Strategies

ML performance varied substantially across validation strategies, highlighting the influence of temporal structure and distribution shifts on predictive stability (Table S1). Model performance was further influenced by regression algorithm type and feature representation. Differences between linear, regularized, ensemble, and kernel-based approaches became particularly evident under temporally structured splits.

Across all parameters, random splits consistently produced the highest predictive performance, reflecting the similarity between training and testing distributions. Under this configuration, ensemble tree-based models (e.g., Gradient Boosting, HistGradientBoosting, AdaBoost) frequently achieved the highest $R_{test}^{2}$, reflecting their ability to capture complex non-linear relationships when training and testing data share similar distributions. However, when temporally separated splits were applied, model rankings shifted noticeably. Although ensemble methods often achieved the highest apparent performance under random validation, regularized linear models (Ridge) demonstrated comparatively greater temporal stability and were therefore retained for final formulation derivation.
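The contrast between random and temporally separated splits can be reproduced on synthetic data. The sketch below is an illustration under stated assumptions, not the study's pipeline: the records are simulated, the later year is given a predictor range outside the earlier envelope, and a one-predictor ridge fit in closed form stands in for the Ridge models discussed above. All function and variable names are hypothetical.

```python
import random

def fit_ridge_1d(x, y, alpha=1.0):
    """Closed-form ridge fit for one predictor; the intercept is not penalized."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / (sxx + alpha)
    return slope, my - slope * mx

def r2_score(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def evaluate(train, test):
    slope, intercept = fit_ridge_1d([r["x"] for r in train], [r["y"] for r in train])
    predictions = [intercept + slope * r["x"] for r in test]
    return r2_score([r["y"] for r in test], predictions)

rng = random.Random(0)

def make_records(year, low, high, n):
    # SYNTHETIC data: a non-linear response plus small Gaussian noise.
    records = []
    for _ in range(n):
        x = rng.uniform(low, high)
        records.append({"year": year, "x": x, "y": x ** 2 + rng.gauss(0, 0.05)})
    return records

# The later period deliberately extends beyond the range seen in the earlier one.
records = make_records(2024, 0.0, 1.0, 80) + make_records(2025, 1.0, 2.0, 40)

# Random split: shuffle all records and hold out 30 %.
shuffled = records[:]
rng.shuffle(shuffled)
cut = int(0.7 * len(shuffled))
r2_random = evaluate(shuffled[:cut], shuffled[cut:])

# Strict temporal projection: train on the earlier period, test on 2025 only.
r2_temporal = evaluate([r for r in records if r["year"] < 2025],
                       [r for r in records if r["year"] == 2025])

print(f"random split R2: {r2_random:.2f}, temporal projection R2: {r2_temporal:.2f}")
```

Because the random split lets the model see both periods during training, it reports a much more favourable score than the temporal projection, where the model has to extrapolate beyond its training range.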

WT showed the highest overall robustness across algorithms and feature representations. Under random splits, non-linear ensemble methods achieved excellent performance ($R_{test}^{2} > 0.98$), particularly when feature transformations were applied. Under strict temporal projection, performance decreased moderately but remained high ($R_{test}^{2} \approx 0.90$), corresponding to a modest decline ($\Delta R^{2} \approx 0.08$).

Importantly, performance differences between linear (Ridge) and non-linear models narrowed under temporal validation, indicating that increased model complexity did not substantially improve generalization beyond the strong physical signal captured by Landsat thermal bands. Feature transformations had limited impact on WT, confirming the largely linear physical relationship between thermal radiance and surface temperature. Consequently, the final WT formulation was derived from a linear Ridge configuration.

DO exhibited greater sensitivity to both model complexity and feature representation. While ensemble models performed best under random splits ($R_{test}^{2} \approx 0.88$), strict temporal projection showed reduced accuracy of $R_{test}^{2} = 0.51$ ($\Delta R^{2} \approx 0.37$). Temporally structured splits favoured simpler and regularized approaches, particularly Ridge regression. Logarithmic feature transformations consistently improved temporal stability for DO. The improvement under log transformation indicates a non-linear but monotonic relationship between DO and thermally driven dynamics. The final DO formulation was therefore derived using log-transformed thermal predictors within a Ridge framework.

EC showed the strongest dependence on feature representation. Under random splits, squared or non-linearly expanded feature spaces combined with ensemble methods achieved the highest performance ($R_{test}^{2} = 0.73$). However, these models exhibited substantial degradation under temporal projection ($R_{test}^{2} = 0.14$; $\Delta R^{2} \approx 0.59$), indicating sensitivity to distribution shifts observed between monitoring periods.

While ensemble models combined with expanded feature spaces performed well under random validation, these configurations exhibited limited temporal robustness. Logarithmic transformations improved robustness under temporally separated splits, particularly when combined with regularized linear models. Unlike WT, where model complexity had limited effect, EC performance varied considerably across both algorithm type and feature representation. These results suggest that EC-related spectral responses contain non-linear structure but remain vulnerable to interannual variability and range shifts. The final EC formulation was therefore derived from a log-transformed Ridge configuration to balance predictive accuracy and temporal stability.

TUR models demonstrated high apparent performance under random splits when non-linear ensemble methods were used ($R_{test}^{2} = 0.84$). However, performance collapsed under temporally structured validation ($R_{test}^{2} = -4.62$), regardless of algorithm or feature transformation. Neither logarithmic nor quadratic expansion substantially improved temporal generalization. This instability suggests that TUR-related spectral variability is either weak relative to measurement noise or highly dependent on short-term environmental conditions not captured consistently by satellite reflectance. Despite acceptable random-split performance, TUR lacked stable transferability. The final formulation was therefore restricted to a simple linear representation reflecting its limited robustness.
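A negative coefficient of determination can be read as follows, assuming the standard definition (the text does not state which implementation was used), in which the model's squared error on the test set is compared with that of a constant predictor equal to the mean of the test observations:

```latex
R_{test}^{2} = 1 - \frac{\sum_{i} (y_i - \hat{y}_i)^{2}}{\sum_{i} (y_i - \bar{y})^{2}},
\qquad
R_{test}^{2} = -4.62
\;\Rightarrow\;
\frac{\sum_{i} (y_i - \hat{y}_i)^{2}}{\sum_{i} (y_i - \bar{y})^{2}} = 1 - (-4.62) = 5.62 .
```

Under that definition, the squared error of the TUR model on the temporally separated test set is about 5.6 times that of simply predicting the test-set mean.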

Across all parameters, the magnitude of performance decline between random and strict temporal splits increased progressively from WT ($\Delta R^{2} \approx 0.08$) to DO (0.37) and EC (0.59), and was most extreme for TUR (5.46). These results demonstrate that model complexity improves apparent accuracy under random validation but does not guarantee temporal robustness. Regularized linear approaches, particularly when combined with logarithmic feature transformations for EC and DO, provided more stable behaviour under independent temporal validation. Parameters with direct physical spectral drivers (WT) retained high transferability, whereas parameters governed by indirect or system-level processes (DO) showed moderate sensitivity to non-stationarity. Parameters lacking stable spectral relationships (EC, TUR) exhibited poor generalization regardless of algorithm complexity.
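The declines quoted above follow directly from the random-split and temporal-projection values reported in this section. The short check below recomputes them; the dictionary layout and function name are illustrative, and the WT random-split value is entered as 0.98 although the text reports it as "> 0.98".

```python
# R2_test values as reported in Section 3.3: (random split, strict temporal projection).
reported = {
    "WT": (0.98, 0.90),   # random value reported as "> 0.98", temporal as "approx. 0.90"
    "DO": (0.88, 0.51),
    "EC": (0.73, 0.14),
    "TUR": (0.84, -4.62),
}

def r2_decline(random_r2, temporal_r2):
    """Drop in R2 when moving from random to temporally separated validation."""
    return round(random_r2 - temporal_r2, 2)

declines = {name: r2_decline(*pair) for name, pair in reported.items()}
ranked = sorted(declines, key=declines.get)  # most to least temporally stable

print(declines)
print(ranked)
```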

Based on the observed temporal stability and interpretability of selected model configurations, explicit band-based retrieval formulations were derived for parameters demonstrating sufficient robustness under independent validation.

The relationship between observed and predicted values for the Ridge regression models under strict temporal projection is illustrated in Figure 9. The scatter plots provide a visual assessment of model performance for both training and testing datasets. WT shows the strongest agreement between observed and predicted values, with points closely distributed along the 1:1 line, confirming its high temporal robustness. DO demonstrates moderate agreement, reflecting its indirect relationship with thermal predictors. In contrast, EC and TUR exhibit weak correspondence between observations and predictions, with predicted values clustered within a narrow range, indicating limited model sensitivity and reduced temporal transferability for these parameters.

3.4. Band-Based Retrieval Formulations

The derivation of explicit band-based retrieval formulations was restricted to parameter–sensor combinations that demonstrated both predictive stability under temporally independent validation and physically interpretable spectral behaviour. This selective approach ensures that the resulting equations are not merely statistically optimized representations of the training data, but structurally robust formulations suitable for operational application in temporally evolving shallow lake systems.

Unlike purely performance-driven model selection under random validation, the derivation process prioritized temporal robustness, stability across split strategies, and spectral coherence. As a result, final formulations were derived from regularized linear (Ridge) models, which demonstrated superior generalization compared to more flexible ensemble approaches under temporally separated validation.

3.4.1. WT

WT retained strong linear behaviour across validation scenarios and was therefore expressed using Landsat 8–9 thermal bands:

WT = −113.390 + 0.424 × L_B10 + 0.424 × L_B11,    (1)

where L_B10 and L_B11 represent the values of Landsat bands 10 and 11, respectively. The identical coefficients assigned to both thermal bands confirm the dominant and physically direct relationship between thermal radiance and surface WT. The stability of this formulation across validation scenarios reflects the persistence of the underlying thermodynamic control mechanism. WT therefore represents a direct spectral retrieval regime, characterized by strong physical determinism and minimal dependence on feature transformation.

3.4.2. EC

EC displayed pronounced sensitivity to temporal distribution shifts when modelled in its original linear feature space. However, logarithmic transformation slightly enhanced stability under independent validation, indicating that EC-related spectral responses follow a non-linear scaling behaviour. The final formulation was therefore derived using log-transformed Landsat reflectance and thermal bands:

EC = 7.699 + 0.296 × L_B04_log + 0.483 × L_B05_log + 0.279 × L_B06_log + 0.168 × L_B07_log − 0.672 × L_B10_log,    (2)

where L_B04_log, L_B05_log, L_B06_log, L_B07_log, and L_B10_log represent the logarithmic values of Landsat bands 4, 5, 6, 7, and 10, respectively. The combination of visible, near-infrared, short-wave infrared, and thermal predictors suggests that EC is not governed by a single spectral mechanism, but rather reflects an integrated optical response associated with salinity-driven and hydrological variability. The necessity of logarithmic scaling indicates a multiplicative or heteroscedastic relationship between spectral reflectance and conductivity. However, the reduced temporal transferability observed in Section 3.3 limits the reliability of this formulation for independent or operational deployment.

3.4.3. DO

DO also demonstrated improved stability under logarithmic transformation, leading to a log-based thermal formulation:

DO = 18.522 − 0.553 × L_B10_log − 1.902 × L_B11_log,    (3)

where L_B10_log and L_B11_log represent the logarithmic values of Landsat bands 10 and 11, respectively. The dominance of log-transformed thermal bands reflects the temperature-mediated nature of oxygen solubility. Unlike WT, DO does not represent a direct spectral signal, but rather an indirect thermodynamic dependency. The improved stability under log transformation indicates a monotonic but non-linear dependence consistent with oxygen–temperature coupling.

3.4.4. TUR

Despite limited temporal robustness compared to other parameters, TUR exhibited sufficient internal consistency under selected validation scenarios to allow derivation of a simplified formulation using Sentinel-2 multispectral bands:

TUR = 5.697 − 2.814 × S_B02 − 0.085 × S_B04 + 1.711 × S_B07 + 0.844 × S_B12,    (4)

where S_B02, S_B04, S_B07, and S_B12 represent the values of Sentinel bands 2, 4, 7, and 12, respectively. The inclusion of visible, red-edge, and SWIR bands reflects sensitivity to scattering-related optical variability. However, the limited temporal robustness observed in Section 3.3 suggests that this formulation should be interpreted cautiously for operational extrapolation.
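For readers who want to apply Equations (1)–(4) programmatically, the sketch below encodes the published coefficients in a lookup table and evaluates them for one pixel. Several points are assumptions rather than facts from the text: the logarithm is taken as the natural log (the base is not stated in this section), the units and scaling of the band values are not specified here, and the names `FORMULATIONS` and `retrieve` as well as the example inputs are hypothetical.

```python
import math

# Coefficients copied from Equations (1)-(4); "log" marks log-transformed predictors.
FORMULATIONS = {
    "WT": {"intercept": -113.390, "log": False,
           "coef": {"L_B10": 0.424, "L_B11": 0.424}},
    "EC": {"intercept": 7.699, "log": True,
           "coef": {"L_B04": 0.296, "L_B05": 0.483, "L_B06": 0.279,
                    "L_B07": 0.168, "L_B10": -0.672}},
    "DO": {"intercept": 18.522, "log": True,
           "coef": {"L_B10": -0.553, "L_B11": -1.902}},
    "TUR": {"intercept": 5.697, "log": False,
            "coef": {"S_B02": -2.814, "S_B04": -0.085,
                     "S_B07": 1.711, "S_B12": 0.844}},
}

def retrieve(parameter, bands):
    """Evaluate the linear formulation for one parameter from a dict of band values."""
    spec = FORMULATIONS[parameter]
    total = spec["intercept"]
    for band, weight in spec["coef"].items():
        value = bands[band]
        if spec["log"]:
            # ASSUMPTION: natural logarithm; the log base is not given in the text.
            value = math.log(value)
        total += weight * value
    return total

# Hypothetical band values, chosen only to exercise the arithmetic.
wt = retrieve("WT", {"L_B10": 150.0, "L_B11": 150.0})
do = retrieve("DO", {"L_B10": 1.0, "L_B11": 1.0})  # log(1) = 0, so DO equals the intercept
print(round(wt, 3), round(do, 3))
```

Given the limited temporal transferability reported for EC and TUR, outputs for those two parameters should be treated as illustrative rather than operational.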

4. Discussion

4.1. Temporal Variability as a Driver of Model Transferability

The in situ time series revealed that WT and DO remained within comparable ranges across monitoring periods (Section 3.1), whereas EC and TUR extended beyond the 2023–2024 envelope in 2025. Such regime-dependent variability is a recognized challenge for satellite-based water quality modelling, particularly when models are evaluated under temporally structured validation. When environmental conditions extend beyond the range represented in the training data, ML models may exhibit substantial performance degradation under temporally independent validation. Recent work emphasizes that apparent ML skill under random splits can overestimate operational performance when temporal non-stationarity and distribution shifts occur, and recommends validation strategies that explicitly test temporal transferability [20].

In Vrana Lake, the late 2025 increase in EC is consistent with behaviour reported in other coastal lagoon and transitional systems, where water level fluctuations and hydrological connectivity can alter salinity-related conditions and associated optical regimes. In line with Caballero et al. (2022) [21], this supports the interpretation that the observed degradation under strict temporal projection is not purely algorithmic, but driven by a shift in environmental boundary conditions between the training and testing periods.

4.2. Sensor–Parameter Suitability and Spectral Observability

The correlation analysis (Section 3.2) and subsequent model outcomes (Section 3.3) highlight strong parameter- and sensor-specific differences in observability. WT showed the most robust behaviour, with dominant dependence on Landsat thermal bands (B10–B11) and consistently high performance even under strict temporal projection. This is expected because WT has a direct physical basis in the thermal domain, and Landsat-based surface temperature products have been shown to support multi-year coastal-lagoon temperature analyses, as seen in a study by Talavera et al. (2024) [22].

DO exhibited strong negative relationships with Landsat thermal bands, which is consistent with a temperature-mediated control (solubility and seasonal co-variability) rather than direct spectral sensitivity. A review of ML for water quality prediction by Yan et al. (2024) [23] notes that non-optically active variables often rely on indirect proxies, which can retain moderate predictability but are more sensitive to regime changes than physically direct targets such as temperature.

EC is also non-optically active, and its association with satellite spectral observations is therefore indirect. It reflects hydro-optical interactions in which salinity-driven hydrological processes influence optically active constituents such as coloured dissolved organic matter (CDOM) and total suspended solids (TSS). The successful remote sensing of CDOM and TSS and their observed relationships with salinity suggest that EC can be mapped indirectly using satellite data, either explicitly using these constituents as proxies or implicitly through wavelengths affected by their concentrations [24]. Comparable coastal-lagoon and crisis-monitoring studies that integrate Sentinel-2 and Landsat, such as Caballero et al. (2022) [21], show that multi-sensor approaches can capture system-level changes, but that relationships may shift across events and years.

TUR displayed weak and inconsistent correlations in both original and log feature spaces, and its performance collapsed under temporal validation. TUR is directly related to suspended particulate matter, which modifies light scattering and absorption within the water column and can therefore produce detectable changes in water reflectance captured by multispectral satellite sensors [25,26]. The limited transferability observed here is nevertheless consistent with findings from lagoonal and optically complex waters by Molner et al. (2023) [27], where TUR retrieval can be feasible but often remains highly system- and regime-dependent due to resuspension dynamics, particle composition variability, shallow-water bottom influence, and timing mismatches between satellite overpasses and short-lived TUR events.

In addition to parameter-specific observability, several environmental and methodological factors may influence satellite-derived reflectance signals in shallow lake systems. Although atmospherically corrected Level-2 surface reflectance products were used in this study, residual atmospheric uncertainties may still affect spectral observations. Water surface conditions such as wind-driven roughness or sun glint can also introduce variability in reflectance measurements. Furthermore, in shallow systems such as Vrana Lake, bottom reflectance may contribute to the recorded signal under clear-water conditions where light penetration reaches the lakebed. These effects can complicate the interpretation of spectral relationships and should be considered when evaluating satellite-based water quality models in shallow coastal environments. Such environmental influences may partly explain the reduced temporal stability observed for parameters such as EC and TUR, where spectral responses are driven by indirect hydro-optical processes rather than by direct physical signals.

4.3. Feature Transformations Matter for EC and DO

A distinctive outcome of this study is the role of feature transformations: the log-transformed correlation structure (Figure 8) and final formulations (Section 3.4) indicate that EC and DO are more stably represented in a logarithmic feature space, while WT and TUR retained linear band selections. This is consistent with broader ML/remote sensing literature noting that scaling and transformation can improve stability where relationships are monotonic but non-linear or heteroscedastic, especially for indirectly observable parameters [1].

The observed shift in EC associations from Sentinel-dominated signals in the original feature space to Landsat reflectance–thermal dominance under logarithmic transformation suggests that EC retrieval in this system is not tied to a single spectral mechanism, but reflects a multiplicative response across hydro-optical conditions. Similar multi-model lake studies report that band selection and feature engineering can be as influential as algorithm choice, especially when optical regimes vary [28].

4.4. Model Complexity Versus Temporal Robustness

The comparison across split strategies (Section 3.3) reinforces the emerging view that model complexity does not guarantee operational robustness. This behaviour is also visible in the observed–predicted relationships shown in Figure 9, where WT predictions closely follow the 1:1 line under strict temporal projection, while EC and TUR predictions collapse into narrow value ranges, indicating weak sensitivity of the models to independent temporal conditions. The consistent decline in model performance from random to temporally separated splits across multiple algorithms further highlights the importance of validation strategies that explicitly account for temporal non-stationarity in operational water quality monitoring. Ensemble tree-based models often achieved the highest performance under random splits, but rankings changed under temporally separated splits, where regularized linear approaches were comparatively more stable and interpretable. This aligns with recent work by Filippelli et al. (2024) [20], which focuses specifically on temporal transferability in remote sensing models and demonstrates that temporal cross-validation and out-of-time testing can reveal bias and degradation not visible under random sampling.

Importantly, the final band-based formulations derived here (Section 3.4) reflect a validation-aware selection principle: stable, interpretable Ridge-based equations were retained only when behaviour remained sufficiently consistent under temporal separation. This approach is consistent with the recommendations of Yan et al. (2024) [23] for water-quality remote sensing, namely to balance predictive power with interpretability and operational reliability.

4.5. Implications for Operational Monitoring of Shallow Coastal Lakes

Collectively, the results indicate three practical implications for monitoring shallow coastal lakes:

WT retrieval is operationally reliable in this setting due to direct thermal observability and strong temporal transferability.
DO retrieval can be feasible but requires regime-aware modelling, where feature transformation (log) and regularization help stabilize relationships under temporal shifts.
EC and TUR retrieval remain high-risk for temporal generalization in shallow, event-driven systems; operational use should explicitly account for episodic variability and mismatch effects.

These conclusions are consistent with multi-sensor coastal lagoon applications by Caballero et al. (2022) [21], which demonstrate that successful satellite monitoring depends on matching sensor capabilities to parameter observability and on validating under conditions that reflect real deployment scenarios.

5. Conclusions

This study evaluated the temporal predictive robustness of satellite-based ML models for estimating WT, EC, DO, and TUR in a shallow Mediterranean coastal lake. By combining multi-year in situ observations, multisensor satellite data (Sentinel-2 and Landsat 8–9), correlation-driven band selection, feature transformations, and six validation split strategies, the analysis explicitly assessed model behaviour under temporally independent conditions.

The results demonstrate that model performance under random validation substantially overestimates operational capability. While ensemble models achieved high apparent accuracy under random splits, temporal projection revealed marked performance degradation for several parameters. WT exhibited the highest temporal robustness ($R_{test}^{2} \approx 0.90$), driven by the strong physical relationship with Landsat thermal bands (L_B10 and L_B11), and retained high predictive performance under independent validation. DO achieved moderate transferability ($R_{test}^{2} = 0.51$) when log-transformed thermal bands (L_B10_log, L_B11_log) were used, reflecting its indirect temperature-mediated dependence. In contrast, EC and TUR showed limited temporal generalization, with substantial sensitivity to distribution shifts and interannual variability.

Regularized linear models, particularly when combined with parameter-specific feature transformations, provided the most stable behaviour across validation scenarios. The derived band-based retrieval formulations therefore represent validation-aware, interpretable solutions rather than purely performance-optimized models.

Beyond parameter-specific findings, this study contributes methodologically by integrating multi-sensor data, feature transformations, and six validation split strategies, including strict temporal projection, within a unified framework focused on transferability rather than apparent accuracy. By explicitly comparing random and temporally independent validation, the results demonstrate how model ranking, complexity preference, and interpretability change under realistic deployment conditions. The validation-aware derivation of simplified Ridge-based retrieval formulations further advances operational applicability by prioritizing temporal robustness over purely performance-driven optimization.

This study has several limitations that should be acknowledged. First, the analysis focuses on a single shallow coastal lake, which may limit the direct generalizability of the results to other aquatic systems with different hydrological or optical conditions. In addition, the temporal matching between satellite imagery and in situ measurements occasionally exceeded the ideal acquisition window due to cloud cover and satellite revisit constraints, which may introduce additional environmental variability. Although turbidity can exhibit short-term variability due to hydrological or meteorological events, the ML models were trained on multi-temporal datasets covering different seasons and environmental conditions, which reduces the influence of individual temporal mismatches between satellite and in situ observations.

Therefore, future research should extend this framework to additional shallow coastal and transitional systems to evaluate model transferability and generalizability, incorporate longer multi-year datasets and extreme hydrological conditions, explore hybrid or regime-adaptive modelling strategies to improve robustness for parameters sensitive to boundary-condition shifts, such as EC and TUR, and further investigate model interpretability using advanced feature importance techniques such as SHAP. Future research could also incorporate explicit uncertainty quantification approaches, such as prediction intervals or ensemble-based uncertainty estimation, to further improve the interpretability and operational reliability of satellite-based water quality predictions.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/jmse14060566/s1, Table S1: Satellite-Based Machine Learning Temporal Assessment_SI_Table S1.

Author Contributions

Conceptualization, A.B., L.Š., A.K. and A.Š.; methodology, A.B. and L.Š.; software, L.Š.; validation, A.B., L.Š., A.K. and A.Š.; formal analysis, L.Š.; investigation, A.B. and L.Š.; resources, A.B. and A.Š.; data curation, A.B. and L.Š.; writing—original draft preparation, A.B.; writing—review and editing, L.Š., A.K. and A.Š.; visualization, A.B. and L.Š.; supervision, A.K. and A.Š.; project administration, A.B.; funding acquisition, A.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original data presented in the study are openly available in Zenodo as doi: 10.5281/zenodo.19062952. The code used for water quality predictions from satellite data is available at: https://github.com/ljiljana44/LakesWQ_evaluation (accessed on 26 February 2026).

Acknowledgments

The study was supported by the Institutional research project GEOSKLAD, University of Zagreb, Faculty of Geodesy, from the quota of Program Agreements of the Ministry of Science, Education and Youth of the Republic of Croatia with the University of Zagreb, Croatia and by the Interreg VI-A IPA Croatia-Bosnia and Herzegovina-Montenegro program 2021–2027 under Interreg Self-sustainable Multisensor System for Monitoring Water Quality in Inland Waterbodies (SMART-Water) project, grant number HR-BA-ME00330. No specific funding was received. During the preparation of this manuscript, the authors used ChatGPT 5.2 for language refinement and text editing support. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

ANN	Artificial Neural Network
DO	Dissolved Oxygen
EC	Electrical Conductivity
CDOM	Coloured Dissolved Organic Matter
GIS	Geographic Information System
KNN	k-Nearest Neighbours
L_Bxx	Landsat 8–9 Spectral Band (e.g., L_B10 = Landsat Band 10)
MAE	Mean Absolute Error
ML	Machine Learning
MLP	Multilayer Perceptron
R²	Coefficient of Determination
RMSE	Root Mean Squared Error
S_Bxx	Sentinel-2 Spectral Band (e.g., S_B02 = Sentinel-2 Band 2)
SHAP	SHapley Additive exPlanations
SVR	Support Vector Regression
SWIR	Short-Wave Infrared
TSS	Total Suspended Solids
TUR	Turbidity
WT	Water Temperature

References

1. Deng, Y.; Zhang, Y.; Pan, D.; Yang, S.X.; Gharabaghi, B. Review of Recent Advances in Remote Sensing and Machine Learning Methods for Lake Water Quality Management. Remote Sens. 2024, 16, 4196.
2. Sun, Y.; Wang, D.; Li, L.; Ning, R.; Yu, S.; Gao, N. Application of Remote Sensing Technology in Water Quality Monitoring: From Traditional Approaches to Artificial Intelligence. Water Res. 2024, 267, 122546.
3. Kong, Y.; Jimenez, K.; Lee, C.M.; Winter, S.; Summers-Evans, J.; Cao, A.; Menczer, M.; Han, R.; Mills, C.; McCarthy, S.; et al. Monitoring Coastal Water Turbidity Using Sentinel2—A Case Study in Los Angeles. Remote Sens. 2025, 17, 201.
4. Assaf, M.N.; Abdelal, Q.; Hussein, N.M.; Halaweh, G.; Alzubaidi, A.J. Water Quality Monitoring and Management: Integration of Machine Learning Algorithms and Sentinel-2 Images for the Estimation of Chlorophyll-a. Model. Earth Syst. Environ. 2025, 11, 348.
5. Wu, Z.; Pang, J.; Li, J.; Wang, Y.; Ruan, J.; Zhang, X.; Yang, L.; Pang, Y.; Gao, Y. A Review of Remote Sensing-Based Water Quality Monitoring in Turbid Coastal Waters. Intell. Mar. Technol. Syst. 2025, 3, 24.
6. Jaywant, S.A.; Arif, K.M. Remote Sensing Techniques for Water Quality Monitoring: A Review. Sensors 2024, 24, 8041.
7. Chen, H.; Gao, X.; Yuan, R. Advances in Remote Sensing and Sensor Technologies for Water-Quality Monitoring: A Review. Water 2025, 17, 3000.
8. Nikoo, M.R.; Zamani, M.G.; Zadeh, M.M.; Al-Rawas, G.; Al-Wardy, M.; Gandomi, A.H. Mapping Reservoir Water Quality from Sentinel-2 Satellite Data Based on a New Approach of Weighted Averaging: Application of Bayesian Maximum Entropy. Sci. Rep. 2024, 14, 16438.
9. Arantes, A.E.; De Castro, B.R.F.; Martins, A.B.; Capelo-Neto, J.; Gonçalves Barros, M.U. Satellite-Based Water Quality Assessment of Castanhão Reservoir Using Machine Learning and Genetic Algorithms. Next Res. 2025, 2, 100340.
10. Batina, A.; Šiljeg, A.; Krtalić, A.; Šerić, L. SIGMaL: An Integrated Framework for Water Quality Monitoring in a Coastal Shallow Lake. Remote Sens. 2026, 18, 312.
11. Shamloo, A.; Sima, S. Investigating the Potential of Remote Sensing-Based Machine-Learning Algorithms to Model Secchi-Disk Depth, Total Phosphorus, and Chlorophyll-a in Lake Urmia. J. Great Lakes Res. 2024, 50, 102370.
12. Batina, A.; Šiljeg, A. Enhancing Water Quality Monitoring in a Coastal Shallow Lake Using GIS and Multi-Criteria Decision Analysis. Environ. Sustain. Indic. 2025, 28, 100881.
13. Šerić, L.; Pinjušić, T.; Topić, K.; Blažević, T. Lost Person Search Area Prediction Based on Regression and Transfer Learning Models. ISPRS Int. J. Geo-Inf. 2021, 10, 80.
14. Batina, A.; Cukrov, N.; Ćuže Denona, M. Spatiotemporal Water Quality Analysis of Vrana Lake, Croatia. Open Geosci. 2025, 17, 20250817.
15. Pan, Y.; Bélanger, S.; Huot, Y. Evaluation of Atmospheric Correction Algorithms over Lakes for High-Resolution Multispectral Imagery: Implications of Adjacency Effect. Remote Sens. 2022, 14, 2979.
16. Zhu, W.; Xia, W. Effects of Atmospheric Correction on Remote Sensing Statistical Inference in an Aquatic Environment. Remote Sens. 2023, 15, 1907.
17. Andrzej Urbanski, J.; Wochna, A.; Bubak, I.; Grzybowski, W.; Lukawska-Matuszewska, K.; Łącka, M.; Śliwińska, S.; Wojtasiewicz, B.; Zajączkowski, M. Application of Landsat 8 Imagery to Regional-Scale Assessment of Lake Water Quality. Int. J. Appl. Earth Obs. Geoinf. 2016, 51, 28–36.
18. Kuhn, C.; De Matos Valerio, A.; Ward, N.; Loken, L.; Sawakuchi, H.O.; Kampel, M.; Richey, J.; Stadler, P.; Crawford, J.; Striegl, R.; et al. Performance of Landsat-8 and Sentinel-2 Surface Reflectance Products for River Remote Sensing Retrievals of Chlorophyll-a and Turbidity. Remote Sens. Environ. 2019, 224, 104–118.
19. Kayastha, P.; Dzialowski, A.R.; Stoodley, S.H.; Wagner, K.L.; Mansaray, A.S. Effect of Time Window on Satellite and Ground-Based Data for Estimating Chlorophyll-a in Reservoirs. Remote Sens. 2022, 14, 846.
20. Filippelli, S.K.; Schleeweis, K.; Nelson, M.D.; Fekety, P.A.; Vogeler, J.C. Testing Temporal Transferability of Remote Sensing Models for Large Area Monitoring. Sci. Remote Sens. 2024, 9, 100119.
21. Caballero, I.; Roca, M.; Santos-Echeandía, J.; Bernárdez, P.; Navarro, G. Use of the Sentinel-2 and Landsat-8 Satellites for Water Quality Monitoring: An Early Warning Tool in the Mar Menor Coastal Lagoon. Remote Sens. 2022, 14, 2744.
22. Talavera, L.; Domínguez-Gómez, J.A.; Navarro, N.; Rodríguez-Santalla, I. Analysing Spatiotemporal Variability of Chlorophyll-a Concentration and Water Surface Temperature in Coastal Lagoons of the Ebro Delta (NW Mediterranean Sea, Spain). J. Mar. Sci. Eng. 2024, 12, 941.
23. Yan, X.; Zhang, T.; Du, W.; Meng, Q.; Xu, X.; Zhao, X. A Comprehensive Review of Machine Learning for Water Quality Prediction over the Past Five Years. J. Mar. Sci. Eng. 2024, 12, 159.
24. Ansari, M.; Knudby, A.; Homayouni, S. River Salinity Mapping through Machine Learning and Statistical Modeling Using Landsat 8 OLI Imagery. Adv. Space Res. 2025, 75, 6981–7002.
25. Dogliotti, A.I.; Ruddick, K.G.; Nechad, B.; Doxaran, D.; Knaeps, E. A Single Algorithm to Retrieve Turbidity from Remotely-Sensed Data in All Coastal and Estuarine Waters. Remote Sens. Environ. 2015, 156, 157–168.
26. Kutser, T. Quantitative Detection of Chlorophyll in Cyanobacterial Blooms by Satellite Remote Sensing. Limnol. Oceanogr. 2004, 49, 2179–2189.
27. Molner, J.V.; Soria, J.M.; Pérez-González, R.; Sòria-Perpinyà, X. Measurement of Turbidity and Total Suspended Matter in the Albufera of Valencia Lagoon (Spain) Using Sentinel-2 Images. J. Mar. Sci. Eng. 2023, 11, 1894.
28. Villota-González, F.H.; Sulbarán-Rangel, B.; Zurita-Martínez, F.; Gurubel-Tun, K.J.; Zúñiga-Grajeda, V. Assessment of Machine Learning Models for Remote Sensing of Water Quality in Lakes Cajititlán and Zapotlán, Jalisco—Mexico. Remote Sens. 2023, 15, 5505.

Figure 1. Methodological workflow for satellite-based ML temporal assessment of water quality parameter prediction.

Figure 2. Study area.

Figure 3. In situ stations.

Figure 4. Monthly mean values of EC, TUR, WT, and DO measured during the 2023–2024 and 2025 monitoring periods.

Figure 5. Spatial distribution of median in situ water quality parameters in Vrana Lake during the monitoring period (21 months): (a) EC, (b) TUR, (c) WT, and (d) DO.

Figure 6. Minimum, average, and maximum values of EC, TUR, WT, and DO for the 2023–2024 and 2025 monitoring periods, highlighting changes in parameter ranges.

Figure 7. Spectral band correlations with EC, TUR, WT, and DO (S = Sentinel-2, L = Landsat 8–9).

Figure 8. Log-transformed spectral band correlations with EC, TUR, WT, and DO (S = Sentinel-2, L = Landsat 8–9).

Figure 9. Scatter plots of observed versus predicted values for Ridge regression models under strict temporal projection (training: 2023–2024; testing: 2025).

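Table 1. Temporal difference (days) between in situ measurement dates and corresponding satellite overpass dates, computed as overpass date minus measurement date (negative values indicate that the overpass preceded sampling).

Measurement Date	Satellite Overpass Date	Difference (days)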
17 July 2023	11 July 2023	−6
18 August 2023	20 August 2023	2
27 September 2023	29 September 2023	2
13 October 2023	9 October 2023	−4
4 December 2023	18 December 2023	14
19 December 2023	18 December 2023	−1
11 January 2024	12 January 2024	1
19 February 2024	21 February 2024	2
14 March 2024	17 March 2024	3
29 April 2024	21 April 2024	−8
24 May 2024	11 May 2024	−13
17 June 2024	15 June 2024	−2
19 July 2024	10 July 2024	−9
28 May 2025	7 June 2025	10
26 June 2025	25 June 2025	−1
24 July 2025	15 July 2025	−9
27 August 2025	19 August 2025	−8
16 September 2025	18 September 2025	2
28 October 2025	4 November 2025	7
12 November 2025	4 November 2025	−8
12 December 2025	12 December 2025	0
	Min	−13
	Max	14
	Median	−1
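
The offsets in Table 1 can be reproduced with a few lines of date arithmetic. The following is a minimal sketch, not the authors' processing code: the date pairs are copied from Table 1, while the names PAIRS, day_offsets, and summary are illustrative assumptions.

```python
from datetime import date
from statistics import median

# (in situ measurement date, satellite overpass date) pairs copied from Table 1
PAIRS = [
    (date(2023, 7, 17), date(2023, 7, 11)),
    (date(2023, 8, 18), date(2023, 8, 20)),
    (date(2023, 9, 27), date(2023, 9, 29)),
    (date(2023, 10, 13), date(2023, 10, 9)),
    (date(2023, 12, 4), date(2023, 12, 18)),
    (date(2023, 12, 19), date(2023, 12, 18)),
    (date(2024, 1, 11), date(2024, 1, 12)),
    (date(2024, 2, 19), date(2024, 2, 21)),
    (date(2024, 3, 14), date(2024, 3, 17)),
    (date(2024, 4, 29), date(2024, 4, 21)),
    (date(2024, 5, 24), date(2024, 5, 11)),
    (date(2024, 6, 17), date(2024, 6, 15)),
    (date(2024, 7, 19), date(2024, 7, 10)),
    (date(2025, 5, 28), date(2025, 6, 7)),
    (date(2025, 6, 26), date(2025, 6, 25)),
    (date(2025, 7, 24), date(2025, 7, 15)),
    (date(2025, 8, 27), date(2025, 8, 19)),
    (date(2025, 9, 16), date(2025, 9, 18)),
    (date(2025, 10, 28), date(2025, 11, 4)),
    (date(2025, 11, 12), date(2025, 11, 4)),
    (date(2025, 12, 12), date(2025, 12, 12)),
]


def day_offsets(pairs):
    """Return overpass-minus-measurement offsets in whole days."""
    return [(overpass - measured).days for measured, overpass in pairs]


offsets = day_offsets(PAIRS)
summary = {"min": min(offsets), "max": max(offsets), "median": median(offsets)}
print(summary)
```

With this sign convention, a negative offset means the satellite image was acquired before the field measurement.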

Table 2. Training and testing data split strategies used for model evaluation.

Split Name	Training Set	Testing Set	Purpose
Random split (baseline)	Randomly sampled observations from all periods	Remaining random observations	Baseline performance under stationary data distribution
Strict temporal projection	2023–2024	2025	Operational scenario with fully independent future data
Partial forward projection	2023–2024 + first 3 months of 2025	Last 5 months of 2025	Evaluate limited forward transfer
Late-season calibration	2023–2024 + last 3 months of 2025	First 5 months of 2025	Assess sensitivity to late-season training data
Backward projection	All available data excluding first 4 months of 2023–2024	First 4 months of 2023–2024	Test model robustness when projecting backward in time
Extended calibration	All available data excluding last 4 months of 2023–2024	Last 4 months of 2023–2024	Assess late-season variability within reference period
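
The five temporally structured strategies in Table 2 all amount to holding out one contiguous time window for testing and training on everything else. The sketch below illustrates that idea under stated assumptions: the month boundaries in TEST_WINDOWS are inferred from the sampling dates in Table 1 rather than given by the authors, and the names TEST_WINDOWS, window_split, and records are hypothetical.

```python
from datetime import date

# Test windows [start, end) per strategy. The boundaries are assumptions
# inferred from the sampling dates in Table 1, not values stated in the paper.
TEST_WINDOWS = {
    "strict_temporal_projection": (date(2025, 1, 1), date(2026, 1, 1)),
    "partial_forward_projection": (date(2025, 8, 1), date(2026, 1, 1)),
    "late_season_calibration": (date(2025, 1, 1), date(2025, 10, 1)),
    "backward_projection": (date(2023, 7, 1), date(2023, 11, 1)),
    "extended_calibration": (date(2024, 4, 1), date(2024, 8, 1)),
}


def window_split(records, strategy):
    """Put records dated inside the strategy's window in test, the rest in train."""
    start, end = TEST_WINDOWS[strategy]
    train, test = [], []
    for rec in records:
        (test if start <= rec["date"] < end else train).append(rec)
    return train, test


# Toy records: one per in situ sampling date in Table 1 (measurements omitted).
SAMPLING_DATES = [
    date(2023, 7, 17), date(2023, 8, 18), date(2023, 9, 27), date(2023, 10, 13),
    date(2023, 12, 4), date(2023, 12, 19), date(2024, 1, 11), date(2024, 2, 19),
    date(2024, 3, 14), date(2024, 4, 29), date(2024, 5, 24), date(2024, 6, 17),
    date(2024, 7, 19), date(2025, 5, 28), date(2025, 6, 26), date(2025, 7, 24),
    date(2025, 8, 27), date(2025, 9, 16), date(2025, 10, 28), date(2025, 11, 12),
    date(2025, 12, 12),
]
records = [{"date": d} for d in SAMPLING_DATES]

for name in TEST_WINDOWS:
    train, test = window_split(records, name)
    print(f"{name}: {len(train)} training dates, {len(test)} testing dates")
```

Unlike the random baseline, none of these splits lets observations from the test window leak into training, which is what makes them a test of temporal transferability.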

Table 3. Final selected spectral predictors by parameter (original and log feature spaces).

Parameter	Satellite Mission	Selected Bands	Dominant Spectral Region
WT	Landsat 8–9	L_B10, L_B11	Thermal infrared
TUR	Sentinel-2	S_B02, S_B04, S_B07, S_B12	Visible, red-edge, SWIR
EC	Landsat 8–9	L_B04_log, L_B05_log, L_B06_log, L_B07_log, L_B10_log	Red, near-infrared, SWIR, thermal
DO	Landsat 8–9	L_B10_log, L_B11_log	Thermal infrared
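
As an illustration of how the predictor names in Table 3 map to model inputs, the sketch below builds a feature vector for one sample, applying a logarithm to bands carrying the _log suffix. It rests on several assumptions: the natural logarithm is used (the base is not specified here), the band values in the example are hypothetical, and the names SELECTED and feature_vector are illustrative rather than taken from the authors' code.

```python
import math

# Predictor sets from Table 3; a "_log" suffix marks a log-transformed band.
SELECTED = {
    "WT": ["L_B10", "L_B11"],
    "TUR": ["S_B02", "S_B04", "S_B07", "S_B12"],
    "EC": ["L_B04_log", "L_B05_log", "L_B06_log", "L_B07_log", "L_B10_log"],
    "DO": ["L_B10_log", "L_B11_log"],
}


def feature_vector(parameter, bands):
    """Build the predictor vector for one sample.

    `bands` maps raw band names (e.g. "L_B10") to positive band values.
    """
    features = []
    for name in SELECTED[parameter]:
        if name.endswith("_log"):
            raw = bands[name[: -len("_log")]]
            features.append(math.log(raw))  # natural log: an assumption
        else:
            features.append(bands[name])
    return features


# Hypothetical thermal-band values for a single lake pixel.
sample = {"L_B10": 295.0, "L_B11": 293.5}
print(feature_vector("WT", sample))
print(feature_vector("DO", sample))
```

Because math.log raises a ValueError for zero or negative inputs, masked or invalid pixels would need to be filtered out before the transformation.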

Share and Cite

MDPI and ACS Style

Batina, A.; Šerić, L.; Krtalić, A.; Šiljeg, A. Satellite-Based Machine Learning for Temporal Assessment of Water Quality Parameter Prediction in a Coastal Shallow Lake. J. Mar. Sci. Eng. 2026, 14, 566. https://doi.org/10.3390/jmse14060566
