SIGMaL: An Integrated Framework for Water Quality Monitoring in a Coastal Shallow Lake

Batina, Anja; Šiljeg, Ante; Krtalić, Andrija; Šerić, Ljiljana

doi:10.3390/rs18020312


¹ Center for Geospatial Technologies, University of Zadar, Trg kneza Višeslava 9, 23000 Zadar, Croatia

² Faculty of Geodesy, University of Zagreb, Kačićeva 26, 10000 Zagreb, Croatia

³ Faculty of Electrical Engineering, Mechanical Engineering and Naval Architecture, University of Split, Ruđera Boškovića 32, 21000 Split, Croatia

* Author to whom correspondence should be addressed.

Remote Sens. 2026, 18(2), 312; https://doi.org/10.3390/rs18020312

Submission received: 10 December 2025 / Revised: 13 January 2026 / Accepted: 15 January 2026 / Published: 16 January 2026

(This article belongs to the Special Issue Remote Sensing in Water Quality Monitoring)


Highlights

What are the main findings?

The SIGMaL framework, integrating in situ data, GIS–MCDA WQI, satellite imagery, and ML, successfully modelled lake water quality, with WQI-based CNNs clearly outperforming raw-parameter regression across all sensors.
Sentinel-2 provided the strongest overall performance for WQI classification (AUC ≈ 1.00; R² up to 0.84), while PlanetScope offered the most detailed spatial differentiation; Landsat 8–9 performed very well for water temperature but was less effective for multi-class WQI discrimination, especially in temporal CNN models.

What are the implications of the main findings?

Using integrated WQI targets within the SIGMaL framework enhances the robustness of ML-based remote sensing workflows for coastal lakes, enabling scalable and transferable water quality monitoring across heterogeneous satellite platforms.
The combination of GIS–MCDA raster densification and CNN-based WQI assessment provides a practical framework for continuous water quality assessment in lakes with limited in situ data, complementing and extending traditional in situ monitoring programs.

Abstract

Coastal lakes require monitoring approaches that capture spatial and temporal variability beyond the limits of conventional in situ measurements. This study developed SIGMaL (Satellite–In situ–GIS-multicriteria decision analysis (MCDA)–Machine Learning (ML)), a unified framework that integrates in situ monitoring, a GIS–MCDA-derived water quality index (WQI), satellite imagery, and ML models for comprehensive coastal lake water quality assessment. A WQI, derived from a 12-month series of in situ measurements and environmental parameters, was used alongside four physicochemical parameters measured by a multiparameter probe. First, satellite reflectance from each sensor was used to train a set of nine regression models for modelling electrical conductivity (EC), turbidity, water temperature (WT), and dissolved oxygen (DO). Second, convolutional neural networks (CNNs) with spectral and temporal inputs were trained to classify WQI classes, enabling a cross-sensor evaluation of their suitability for lake water quality monitoring. Third, the trained CNNs were applied to generate WQI maps for a subsequent 12-month period without in situ data. Across all analyses, WQI-based models provided more stable and accurate predictions than those trained on raw parameters. Sentinel-2 achieved the most consistent WQI performance (AUC ≈ 1.00, R² ≈ 0.84), PlanetScope captured fine-scale spatial detail (R² ≈ 0.77), while Landsat 8–9 was most effective for WT but less reliable for multi-class WQI discrimination. Sentinel-2 is recommended as the primary satellite sensor for WQI mapping within the SIGMaL framework. These findings demonstrate the advantages of WQI-based modelling and highlight the potential of ML–remote sensing integration to support coastal lake water quality monitoring.

Keywords: convolutional neural networks; machine learning; satellite data; regression modelling; WQI assessment

1. Introduction

Freshwater and coastal lake ecosystems are globally vulnerable to accelerating pressures from climate change, land use transformations, and hydrological instability, making accurate and timely water quality monitoring increasingly essential [1,2,3,4]. Traditional in situ measurements remain the foundation of ecological assessment, yet their limited spatial and temporal coverage creates significant gaps in understanding ecosystem-wide dynamics, particularly in shallow coastal lakes where conditions can change rapidly [5].

Parallel advances in geospatial technologies, satellite remote sensing, and machine learning (ML) have created opportunities to bridge these gaps by integrating multi-source environmental information into coherent predictive frameworks [6]. Although many recent studies have explored remote sensing-based retrieval of single physicochemical parameters [7,8], inconsistencies in temporal matching, sparse in situ measurement networks, and spatial heterogeneity frequently constrain predictive robustness and limit operational use.

Vrana Lake in Dalmatia, Croatia, a shallow coastal freshwater system hydrologically connected to the Adriatic Sea, represents an ecologically sensitive environment where seawater intrusion, seasonal fluctuations, and nutrient inputs interact to shape water quality [9]. Previous research has characterized these dynamics [10,11,12], yet monitoring efforts remained spatially limited and did not fully capture system-wide variability [9]. To overcome this challenge, GIS-based multicriteria decision analysis (MCDA) has proven effective for synthesizing environmental factors into spatial models of water quality, enabling the identification of critical zones vulnerable to eutrophication and pollution pressures [10]. Similarly, satellite missions such as Landsat 8–9, Sentinel-2, and PlanetScope offer high-frequency, multispectral imagery that can be correlated with in situ data to map key water quality parameters across the entire lake surface [13].

These methods provide valuable complementary perspectives, but their full potential lies in their integration with advanced data-driven techniques. To ensure reliable temporal matching, water quality index (WQI) values were compared with satellite imagery acquired within a 10-day window from each field campaign. Previous studies have shown that a 1-day time window is ideal, with the possibility of extending up to 10 days if conditions do not significantly change [14,15], and further research has suggested that this window also depends on satellite resolution, where higher spatial, spectral, and radiometric resolution increases the reliability of extending the time window for pairing satellite and ground-based data [16]. A 10-day temporal tolerance was selected in this study to minimize the impact of cloud cover and to maximize the availability of usable satellite scenes, while still maintaining ecological relevance in the comparison of in situ and remote sensing data, without major weather changes between the measurement day and satellite overpass. In practice, most satellite–in situ matchups occurred within 0–3 days of the field measurements.
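The matching rule described above can be sketched in a few lines. This is a minimal illustration and not the authors' code: the function name `match_scenes`, the example dates, and the choice to keep the first scene on a tie are all assumptions made here.

```python
from datetime import date


def match_scenes(campaign_day, scene_days, max_offset_days=10):
    """Return (scene_day, offset_days) for the closest scene inside the window, or None."""
    best = None
    for scene_day in scene_days:
        # Negative offset: the scene was acquired before the field campaign.
        offset = (scene_day - campaign_day).days
        if abs(offset) > max_offset_days:
            continue  # outside the temporal tolerance
        if best is None or abs(offset) < abs(best[1]):
            best = (scene_day, offset)
    return best


# Hypothetical dates for illustration only (not taken from Table 2).
campaign = date(2023, 7, 10)
scenes = [date(2023, 6, 25), date(2023, 7, 9), date(2023, 7, 14)]
closest = match_scenes(campaign, scenes)
```

With these made-up dates the scene from the day before the campaign is selected, and a campaign whose only candidate scene lies 15 days away returns `None`.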

Most remote sensing and ML studies addressing water quality rely on a direct comparison between raw in situ measurements and satellite imagery, with models trained to predict individual physicochemical parameters [17]. While effective for local analyses, this approach is often constrained by sparse monitoring networks and high spatial variability within inland waters, resulting in limited predictive robustness. In contrast, the present study emphasizes the use of an integrated WQI, generated through GIS-MCDA by Batina and Šiljeg (2025) [10], as the primary reference for model training (Figure A1). This index aggregates multiple parameters into a single, spatially continuous representation of water quality, thereby providing a more robust basis for predictive modelling. As a secondary comparison, ML models were also trained on raw in situ data, allowing the study to evaluate differences between the conventional approach and the proposed WQI-driven framework.

ML offers a powerful toolset for bridging the gap between point-based field data and spatially continuous remote sensing observations [17]. By leveraging statistical learning algorithms, ML can identify complex nonlinear relationships between spectral reflectance and water quality parameters [18], as well as classify ecological conditions into distinct quality classes. In the context of Vrana Lake, ML enables the combination of in situ measurements, raster-based GIS models, and satellite imagery into a unified monitoring framework, enhancing monitoring accuracy and supporting long-term ecological assessment in sensitive protected areas.

This study introduces SIGMaL, a unified framework for lake water quality monitoring that integrates satellite imagery, in situ measurements, GIS-MCDA, and ML into a single analytical pipeline. SIGMaL combines four complementary components: (i) a year-long series of monthly in situ measurements of key physicochemical parameters; (ii) raster-based WQI derived from GIS-MCDA, providing spatially continuous water quality classes; (iii) multi-sensor satellite observations from Sentinel-2, Landsat 8–9, and PlanetScope (acquired and evaluated independently); and (iv) ML models, including convolutional neural networks (CNNs) for WQI classification, trained separately for each sensor to enable a fair cross-sensor comparison.

To overcome the spatial limitations of the 20 in situ monitoring stations, SIGMaL uses MCDA-derived raster densification, expanding the dataset to 318 samples that better represent lake water quality variability. Satellite reflectance data from each sensor were paired with these samples and used to train ML models under identical modelling settings, enabling systematic comparison of spectral, spatial, and temporal performance across sensors. Designed as a modular, scalable, and reproducible workflow, SIGMaL enhances spatial and temporal coverage beyond conventional point-based monitoring and provides a robust basis for evaluating the suitability of different satellite platforms for coastal shallow lake environments. The framework supports improved ecological assessment and offers a transferable methodology for data-limited freshwater and coastal shallow lake systems.

In contrast to most lake studies in which ML is trained directly on raw in situ measurements [13,19,20,21], the GIS–MCDA WQI is adopted in this study as the primary modelling target. Two parallel strategies are employed: (A) CNN-based classification of WQI classes (principal track), and (B) regression on raw in situ parameters (comparison track). We hypothesize that (1) the WQI derived from GIS–MCDA provides a more stable modelling target than individual in situ parameters, and (2) the SIGMaL framework can accurately classify spatial water quality patterns across a complex coastal lake. The main findings indicate that the SIGMaL framework integrating in situ data, GIS–MCDA WQI, satellite imagery, and machine learning provides a more stable and spatially comprehensive approach to lake water quality monitoring than traditional parameter-based models, with Sentinel-2 offering the strongest overall performance and WQI-based CNNs consistently outperforming raw-parameter regression across all sensors.

2. Materials and Methods

2.1. Study Area

Vrana Lake, located in Dalmatia near the eastern Adriatic coast, is the largest natural freshwater lake in Croatia, covering an area of about 30 km² [22]. The lake extends between 43°51′–43°57′N and 15°30′–15°39′E (WGS84) (Figure 1). It is characterized by shallow water that undergoes strong seasonal fluctuations, with higher water levels in winter and spring, and lower levels in summer and autumn [23]. Due to its ecological importance and species richness, the lake and its surroundings are protected within the Vrana Lake Nature Park.

The lake’s hydrological regime is influenced by multiple factors, including precipitation, tributary inflows, groundwater exchange, evaporation, and its artificial connection to the Adriatic Sea through the Prosika canal [9]. During periods of low water levels, seawater intrusion increases salinity, whereas freshwater inputs from surrounding karst fields and springs reduce salinity during wetter periods [24]. These dynamics, combined with wind-driven mixing, strongly affect the water quality and ecosystem health of the lake.

2.2. Data Collection

Field surveys were conducted on a monthly basis from July 2023 to June 2024. Measurements were performed in the morning hours (08:00–13:00 local time) to minimize diurnal variability in water temperature (WT), dissolved oxygen (DO), and chlorophyll-a concentrations. Measurement days were chosen to coincide with stable meteorological conditions, avoiding precipitation and strong winds that could compromise data comparability [9]. Each campaign was carried out by a team aboard a small research vessel using a YSI EXO2 multiparameter probe (YSI Inc., Yellow Springs, OH, USA). The probe was calibrated before every campaign following manufacturer guidelines (explained in Section 2.2.2). Due to adverse weather conditions, the November 2023 survey was postponed and conducted on 4 December, while the regular December survey took place on 19 December [9]. All other campaigns were carried out as scheduled.

2.2.1. In Situ Measurements

Batina et al. (2025) [9] established a network of 20 fixed monitoring stations across the lake to ensure sufficient spatial coverage and representation of hydrological and ecological variability (Figure 1). Over the 12-month monitoring period, a maximum of 20 stations was measured in each of the 12 campaigns, resulting in 230 valid station measurements per parameter, as not all sites could be measured every month due to weather or equipment constraints.

Although several physicochemical and biological parameters were measured [9], this study includes DO, WT, turbidity, and electrical conductivity (EC), resulting in 230 observations for each of these four parameters over the one-year research period. The selection of these parameters is supported by a year-long multiparameter analysis and correlation study conducted in Vrana Lake [9], where they were identified as dominant drivers of lake water quality dynamics. Their selection was further supported by expert input from the Ruđer Bošković Institute and the Public Institution Vrana Lake Nature Park. Moreover, these parameters constitute the core physicochemical inputs of the GIS–MCDA-based WQI model, whose robustness was validated through sensitivity analysis and Monte Carlo simulations by Batina and Šiljeg (2025) [10] (Figure A1).

2.2.2. Multiparameter Probe

The YSI EXO2 multiparameter probe is designed to measure a wide range of physicochemical indicators, including WT, DO, EC, salinity, turbidity, and chlorophyll-a, with manufacturer-specified accuracies that vary by parameter (e.g., ±0.01 °C for WT, ±0.5% or 0.001 dS/m for EC, and ±1% or 0.1 mg/L for DO) [25]. Such precision makes the instrument highly suitable for ecological and hydrological monitoring; however, its reliability depends heavily on proper handling and routine calibration. Regular maintenance is essential to mitigate external influences such as sensor fouling, sediment deposition, or biological growth, which can compromise data quality. Calibration should be carried out using standard solutions of known conductivity, oxygen reference standards, and systematic cleaning of optical sensors to ensure that field measurements remain both accurate and reproducible [25].

The calibration procedure of the YSI EXO2 multiparameter probe was applied in accordance with the instructions provided in the official EXO User Manual [25] (Figure 2). Prior to each calibration step, the EXO calibration cup and sensors were rinsed two to three times with the appropriate standard for the parameter being adjusted, with the rinse solutions discarded and replaced with fresh calibration standard. When calibration standards were not used immediately, the sensors and cup were rinsed with deionized water and dried with a lint-free cloth before refilling. The calibration cup was filled to the recommended level to ensure that all sensors were fully submerged, while precautions were taken to avoid cross-contamination. Clean, dry probes were mounted on the sonde, and a calibration-dedicated guard was installed and tightened; a separate guard was reserved for field deployments to maintain accuracy and cleanliness. The sequence followed the prescribed order from the manual: verification of the temperature sensor against a certified reference thermometer, calibration of EC first, then pH and ORP, followed by turbidity, and finally the optical sensors such as DO and depth. This order reflects sensor interdependencies and is designed to minimize error propagation, ensuring that the EXO2 provides reliable and reproducible field measurements [25].

2.3. Satellite Data Acquisition and Preprocessing

This study used atmospherically corrected Level-2 (surface reflectance) imagery from Sentinel-2 MultiSpectral Instrument (MSI; European Space Agency, Paris, France), Landsat 8–9 Operational Land Imager (OLI)/Thermal Infrared Sensor (TIRS; National Aeronautics and Space Administration and U.S. Geological Survey, Washington, DC, USA), and PlanetScope SuperDove satellites (Planet Labs PBC, San Francisco, CA, USA). Sentinel-2 images were obtained through the Copernicus Browser (European Space Agency, Paris, France), Landsat 8–9 through the USGS Earth Explorer (U.S. Geological Survey, Reston, VA, USA), and PlanetScope from Planet Explorer (Planet Labs PBC, San Francisco, CA, USA) [26]. Because the goal of the study was not to compare atmospheric correction algorithms, pre-processed Level-2 data were adopted to ensure consistency across sensors and to focus computational effort on the integration of remote sensing, GIS–MCDA, and ML.

Although employing pre-processed imagery simplifies the workflow, the authors are aware of potential limitations, especially over inland waters where atmospheric conditions, adjacency effects, aerosol variability, and water surface reflections complicate correction accuracy [27,28]. Previous research has shown the potential of advanced remote sensing and ML approaches to enhance water quality monitoring when atmospheric corrections are adequately addressed [29]. Furthermore, Pan et al. (2022) [27] evaluated ten atmospheric correction algorithms over lakes and highlighted that adjacency effects near land and inconsistent aerosol modelling can reduce the fidelity of water reflectance retrievals. Similarly, Zhu and Xia (2023) [28] discuss that while atmospheric correction is generally beneficial for remote sensing inversion tasks, in large-scale statistical inference studies small residual atmospheric errors may have limited impact on performance when models rely on strong statistical correlations rather than pixel-level physical retrievals. Because the focus of this study is on the integrated WQI rather than individual water quality parameters, this approach was considered acceptable: it maintains consistency across Sentinel-2, Landsat 8–9, and PlanetScope datasets, and ensures that computational complexity is focused on the ML and MCDA integration stages rather than on refining atmospheric correction.

Moderate-resolution sensors such as Sentinel-2 (10–60 m, 5-day revisit) and Landsat 8–9 (30–100 m, 8-day aggregate revisit) have been widely used in water quality monitoring [13,30,31]. Recently, PlanetScope has emerged as a valuable alternative for small or narrow waterbodies due to its daily revisit and 3 m spatial resolution, despite its limited spectral depth relative to Sentinel-2 and Landsat (Table 1). The Landsat 8–9 collection includes OLI optical and TIRS thermal bands, with 30 m and 100 m reflectance and 15 m panchromatic resolution. Sentinel-2 MSI provides 13 spectral bands (10–60 m) in visible and near-infrared (NIR), including three Red-Edge bands critical for aquatic applications. PlanetScope Level-3B products consist of 8 spectral bands at 3 m resolution.

Field measurement dates were aligned with predicted Sentinel-2 and Landsat 8–9 overpasses, while PlanetScope was excluded from planning due to its daily revisit capability. Because Vrana Lake is shallow and highly exposed to wind, currents and waves can mix the entire water column, especially during strong Bora and Jugo events in winter and Maestral winds in summer, ensuring relatively uniform temperature and nutrient conditions [10]. Favourable meteorological conditions were therefore essential; satellite scenes had to be cloud-free and precipitation-free, and fieldwork had to be conducted under safe wind conditions for the vessel crew. Table 2 summarizes the dates of in situ measurements alongside the closest available satellite acquisitions without clouds. As shown, satellite imagery did not always coincide with field measurements, and scenes from different sensors were often available on different days. Columns Max prior (days) and Max after (days) in Table 2 indicate the maximum temporal offset between each in situ campaign and corresponding satellite scenes, illustrating, for example, that July measurements coincided with a Landsat 9 overpass and were preceded by Sentinel-2 and PlanetScope acquisitions by one day.

Quantitative evidence of rapid temporal variability in Vrana Lake is provided by Batina et al. (2025) [9], who reported pronounced seasonal and intra-annual fluctuations in turbidity, EC, WT, and DO across monthly campaigns at 20 stations. That study further demonstrated that the lake behaves as a well-mixed shallow system with minimal vertical stratification but strong horizontal and temporal variability driven by meteorological forcing and seawater intrusion. Although maximum temporal offsets between satellite overpasses and in situ measurements reached up to 11 days, the majority of satellite–in situ matchups occurred within 0–3 days, with a mean offset of approximately 0.3 days, supporting the ecological relevance of the satellite-based analysis under typical conditions.

Although the monitoring network consisted of 20 fixed stations, the number of stations measured during each monthly campaign varied because adverse weather conditions or occasional equipment malfunction prevented safe access to all sites. The total number of valid measurements per month is visible in Table 2.

2.4. Dataset Development

To strengthen the dataset for statistical analysis and ML applications, the original network of 20 in situ stations was densified to 318 points (Figure 3), corresponding to 318 pixels derived from the final water quality raster presented in Batina and Šiljeg (2025) [10]. Importantly, this procedure does not represent statistical resampling of point-based field observations, nor an attempt to create independent in situ measurements. Instead, the 318 samples serve as spatial reference points derived from a GIS–MCDA-based WQI surface. The rationale was that, instead of directly comparing a maximum of 20 in situ measurements per parameter with satellite imagery each month, a larger number of measurements was required to ensure effective model training and testing.

The raster used for densification was generated by an MCDA approach, which aggregated multiple weighted criteria into a final water quality map using the Weighted Linear Combination (WLC) method [10]. The raster cells had a resolution of 300 × 300 m, each representing spatially explicit information on water quality across the lake. This rasterization provided a consistent framework for extracting 318 evenly distributed pixel values, which served as additional measurement points. By integrating these raster-derived points, the analysis was able to capture greater spatial variability and provide a larger sample size to support robust ML model development. These raster-derived samples should therefore be interpreted as spatial representations of relative water quality patterns, not as statistically independent observations. Accordingly, the ML models are trained to recognize relative spatial WQI patterns, emphasizing spatial differentiation across the lake.
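For readers unfamiliar with WLC, the aggregation step amounts to a weighted sum of standardized criterion rasters, cell by cell. The sketch below illustrates that idea only: the function name, the 2 × 2 rasters, and the equal weights are hypothetical and are not the criteria or weights used in [10].

```python
def weighted_linear_combination(criteria, weights):
    """Combine standardized criterion rasters (same shape, values in 0-1) cell by cell."""
    if len(criteria) != len(weights):
        raise ValueError("one weight per criterion raster is required")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights are expected to sum to 1")
    rows, cols = len(criteria[0]), len(criteria[0][0])
    # Each output cell is the weighted sum of the same cell across all layers.
    return [
        [sum(w * layer[r][c] for layer, w in zip(criteria, weights)) for c in range(cols)]
        for r in range(rows)
    ]


# Hypothetical standardized layers and weights, for illustration only.
layer_a = [[0.8, 0.6], [0.7, 0.5]]
layer_b = [[0.6, 0.6], [0.9, 0.7]]
wqi = weighted_linear_combination([layer_a, layer_b], [0.5, 0.5])
```

Because the weights sum to 1 and the inputs are standardized, the output stays on the same 0–1 scale as the inputs.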

A water quality raster of Vrana Lake was classified into seven discrete classes by Batina and Šiljeg (2025) [10], representing different levels of water quality across the lake surface. These classes provided a spatially explicit framework for distinguishing areas of higher and lower water quality, reflecting the heterogeneity of environmental conditions within the lake. The underlying raster values ranged from 0.596 (Class 7) to 0.737 (Class 1), indicating overall good water quality, but with detectable spatial variation that served as the basis for ranking and differentiating classes across the system.

In this study, the seven raster-derived classes were used as reference categories for ML, serving to identify which parts of the lake belong to each water quality class based on satellite imagery. To enable model training and testing, 318 raster cells (pixels) were extracted from the classified surface and used as measurement points, ensuring sufficient spatial coverage and dataset size for robust model development. The pixels were distributed across the classes as follows: Class 1—55 pixels, Class 2—64 pixels, Class 3—49 pixels, Class 4—48 pixels, Class 5—46 pixels, Class 6—29 pixels, and Class 7—27 pixels. This quantitative distribution ensured that all water quality categories were represented, allowing the ML models to be trained on the full spectrum of observed lake conditions and to capture subtle differences in relative water quality across the system.

The seven WQI classes are not intended to represent fine-scale or instantaneous water quality variability. Instead, they reflect integrated, lake-scale ecological conditions derived from annual averages and GIS-MCDA synthesis. Accordingly, the WQI serves as a spatial reference framework for identifying persistent patterns and relative gradients rather than micro-scale heterogeneity.

2.5. ML Framework

To model lake water quality, two modelling tracks were implemented: (A) regression of individual in situ parameters and (B) CNN-based WQI classification. Models were trained and evaluated separately for each satellite sensor (Sentinel-2, Landsat 8–9, PlanetScope) to allow a sensor-specific assessment of predictive capability under a consistent experimental design.

2.5.1. Regressors for Water Quality Parameters Modelling

In the task of water quality parameters modelling, a diverse set of regression algorithms was considered to balance linear baselines and nonlinear learners:

Linear Regression [32] is a fundamental supervised learning method that models a linear relationship between variables by fitting a line (or hyperplane) to the observed data. The primary objective is to estimate model parameters that minimize the Sum of Squared Errors (SSE) between predicted and actual values. Implementing the algorithm involves data preprocessing, feature selection, model fitting using the least squares method, and subsequent evaluation and diagnosis.

Ridge Regression [33], also known as L2-regularized regression, is an advanced form of linear regression designed for situations where the dataset has many features relative to the number of data points or when features are highly correlated (multicollinearity). Its primary function is to prevent overfitting and improve the model’s robustness. It achieves this by adding an L2 penalty term to the standard linear regression cost function.

Random Forest [34] is a ML technique that belongs to the ensemble family of algorithms, meaning it uses multiple models to get a better overall result. Its fundamental goal is to build a “forest” of many simple decision trees and combine their individual predictions to produce an outcome that is more accurate and less prone to errors than any single tree. Random Forest achieves this stability by purposefully introducing randomness; it trains each tree on a slightly different random subset of the data and features.

Gradient Boosting [35] is an ensemble method in ML that builds its predictive model as a series of sequential steps. It works by creating new, simple decision trees that are designed to fix the prediction errors of the trees that came before them. This process uses a gradient descent approach to gradually improve accuracy by minimizing a chosen measure of error. The technique is flexible and effective across various applications but does require careful setting of its parameters to achieve good performance.

The eXtreme Gradient Boosting (XGBoost) [36] is a highly efficient and scalable implementation of the gradient boosting framework, often favoured for its speed and performance in structured data competitions. It introduces several enhancements, such as regularization (L1 and L2) to prevent overfitting and parallel processing of the tree construction. Due to its advanced optimization and handling of missing values, it has become a leading choice for complex regression tasks.

The Support Vector Machine (SVM) [37] is a classification algorithm that works by finding the most distinct boundary to separate two classes of data. The main idea is to maximize the margin, which is the empty space between the separating line (hyperplane) and the closest data points from each class. These closest points are called support vectors because they are the only ones that “support” or define the final position of the boundary. By maximizing this gap, the SVM creates a robust model that generalizes well and makes more reliable predictions on new, unseen data. For complex data that cannot be separated by a straight line, SVM uses the Kernel Trick. This mathematical technique allows the algorithm to effectively transform the data into a higher dimension where a straight separation is possible, enabling it to fit non-linear patterns. Its regression counterpart, support vector regression, applies the same margin and kernel ideas to continuous targets.

The Random Sample Consensus (RANSAC) algorithm [38] is an iterative algorithm that estimates model parameters by fitting candidate solutions to randomly selected data subsets and retaining the solution supported by the largest consensus set. Although robust to outliers, RANSAC is known to be computationally expensive and sensitive to noise and the correct selection of the true dimension.

K-Nearest Neighbours (KNN) Regression is a simple nonparametric method valued for its effectiveness with complex data structures [39]. It predicts the value for a new data point as the average (or a weighted average) of the k closest data points in the training set. While easy to implement, standard KNN regression is susceptible to overfitting and discontinuity in the fit. Extensions of KNN have been proposed to enhance its accuracy and robustness in big data applications by integrating techniques like kernel smoothing and bootstrap sampling.

Poisson Regression [40] is a generalized linear model for count data that uses a log link function, which ensures that predictions are non-negative for any combination of input factors. Poisson regression was included to provide a statistical baseline for comparison across modelling approaches.
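Two of the formulations in the list above are compact enough to state explicitly. The notation here is introduced for illustration and is not taken from the paper: x_{ij} is feature j of sample i, β are the coefficients, p is the number of features, and λ ≥ 0 is the regularization strength. The first line is the standard ridge objective (SSE plus the L2 penalty), and the second is the log link of Poisson regression, which keeps the predicted mean positive.

```latex
\hat{\beta}^{\mathrm{ridge}}
  = \arg\min_{\beta}\;
    \sum_{i=1}^{N} \Bigl( y_{i} - \beta_{0} - \sum_{j=1}^{p} x_{ij}\,\beta_{j} \Bigr)^{2}
    + \lambda \sum_{j=1}^{p} \beta_{j}^{2},
  \qquad \lambda \ge 0

\log \mathbb{E}\,[\, y_{i} \mid x_{i} \,]
  = \beta_{0} + \sum_{j=1}^{p} x_{ij}\,\beta_{j}
  \quad\Longrightarrow\quad
  \hat{y}_{i} = \exp\Bigl( \beta_{0} + \sum_{j=1}^{p} x_{ij}\,\beta_{j} \Bigr) > 0
```

Setting λ = 0 in the first expression recovers ordinary least squares.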

This extended set of models allowed for a comprehensive benchmarking of both classical statistical approaches and modern ensemble learners, ensuring that the analysis captured linear, nonlinear, and instance-based perspectives on the relationship between satellite reflectance and water quality parameters.
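Of the models listed above, KNN regression is simple enough to sketch without any library. The example below is an illustration under stated assumptions, not the implementation used in the study: the function name `knn_predict`, the two-band reflectance vectors, and the target values are hypothetical.

```python
import math


def knn_predict(train_x, train_y, query, k=3, weighted=False):
    """Predict as the (optionally inverse-distance weighted) mean of the k nearest targets."""
    # Pair every training target with its Euclidean distance to the query point.
    nearest = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )[:k]
    if not weighted:
        return sum(y for _, y in nearest) / len(nearest)
    # A small constant avoids division by zero when the query equals a training point.
    weights = [1.0 / (d + 1e-12) for d, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)


# Hypothetical reflectance vectors (two bands) and turbidity-like targets.
bands = [(0.02, 0.05), (0.03, 0.06), (0.10, 0.20), (0.11, 0.22)]
target = [1.0, 2.0, 9.0, 11.0]
estimate = knn_predict(bands, target, query=(0.025, 0.055), k=2)
```

The query sits midway between the first two training points, so with k = 2 the estimate is the mean of their targets. The step-like behaviour of such averages is the "discontinuity in the fit" mentioned in the list.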

In situ parameter modelling was carried out in Python 3.12.4, with raw in situ measurements as regression targets (computer code is available, as stated in the section Data Availability Statement). Performance was quantified using the mean absolute error (MAE, Equation (1)), root mean square error (RMSE, Equation (2)), and the coefficient of determination (R², Equation (3)) [41].

\mathrm{MAE}(y, \hat{y}) = \frac{1}{N} \sum_{i = 1}^{N} |y_{i} - \hat{y}_{i}|,

(1)

\mathrm{RMSE}(y, \hat{y}) = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} {(y_{i} - \hat{y}_{i})}^{2}},

(2)

R^{2}(y, \hat{y}) = 1 - \frac{\sum_{i = 1}^{N} {(y_{i} - \hat{y}_{i})}^{2}}{\sum_{i = 1}^{N} {(y_{i} - \bar{y})}^{2}},

(3)

where \hat{y}_{i} is the estimated value, y_{i} is the observed value, \bar{y} is the mean of observed values, and N is the number of samples.

The R² quantifies the proportion of variance in the dependent variable that is explained by a regression model. Its values range from 1 (perfect prediction) to negative infinity. While values close to 1 indicate strong predictive performance, negative R² values occur when a model performs worse than a baseline predictor that simply returns the mean of the observed data [42]. Because R² is dependent on the variance of the underlying dataset, it is not directly comparable across datasets with different distributions. The score is undefined when the true target has zero variance; in such cases, implementations typically assign 1.0 for perfect predictions or 0.0 when predictions deviate from the constant target.
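Equations (1)–(3) and the behaviour of R² can be checked with a short pure-Python sketch. The function names and toy values are illustrative, and the zero-variance branch simply mirrors the convention described in the preceding paragraph; it is not claimed to reproduce the study's code.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error, Equation (1)."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error, Equation (2)."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination, Equation (3)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    if ss_tot == 0.0:
        # Zero-variance target: R² is undefined, so fall back to the convention above.
        return 1.0 if ss_res == 0.0 else 0.0
    return 1.0 - ss_res / ss_tot


# Toy values (illustrative only).
observed = [2.0, 4.0, 6.0, 8.0]
good_fit = [2.5, 3.5, 6.5, 7.5]
mean_baseline = [5.0, 5.0, 5.0, 5.0]   # always predicts the observed mean
worse_than_mean = [8.0, 8.0, 8.0, 8.0]  # constant prediction far from the mean

print(mae(observed, good_fit), rmse(observed, good_fit), r2(observed, good_fit))
print(r2(observed, mean_baseline), r2(observed, worse_than_mean))
```

The mean baseline gives R² = 0, and the constant prediction far from the mean gives a negative value, matching the interpretation given above.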

2.5.2. CNNs for WQI Assessment

To predict the WQI from satellite observations, the dataset of 318 raster-derived samples was randomly partitioned into two subsets, 80% for model training and 20% for independent testing, while preserving the distribution of WQI classes. The partitioning was performed in Python (computer code is available, as stated in the section Data Availability Statement).
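A class-preserving (stratified) 80/20 split can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the helper `stratified_split`, the random seed, and the per-class sample counts are hypothetical placeholders chosen only so that they sum to 318.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Split sample indices so each class contributes about test_fraction to the test set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)  # random assignment within the class
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx


# Placeholder class sizes (NOT the study's distribution); they sum to 318.
class_sizes = {1: 30, 2: 45, 3: 60, 4: 70, 5: 55, 6: 38, 7: 20}
labels = [c for c, n in class_sizes.items() for _ in range(n)]
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Splitting within each class keeps rare WQI classes represented in both subsets, which a purely random split does not guarantee for small samples.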

The candidate models for WQI prediction were developed based on a one-dimensional (1D) CNN architecture. CNNs are a class of ML algorithms that combine convolutional layers with fully connected dense layers. The convolutional layers excel at extracting features from raw signals or imagery without requiring prior preprocessing, while the dense layers serve primarily for classification tasks. Given that WQI measurements were collected over a one-year period, each sampling point was represented by multiple satellite image snapshots spanning that time frame. Specifically, a single WQI value was predicted from a matrix consisting of 12 temporal snapshots for each spectral band, where the dimension is bands × 12 months, reflecting consistent temporal coverage used throughout the study.

The neural network architecture consisted of two convolutional layers, followed by normalization, pooling, and dense layers. Initially, the network was trained to learn spectral features by applying 1D convolutions in the spectral dimension, capturing spectral characteristics and their temporal variations. Subsequently, 1D convolutions were applied along the temporal dimension, enabling the model to identify significant feature dynamics over time for each spectral band separately. This two-fold convolution strategy makes it possible to evaluate how both the spectral and the temporal information in the satellite time series contribute to the prediction accuracy of the WQI, in line with common practice in deep learning for environmental and remote sensing applications.
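The difference between convolving along the spectral and the temporal axis of a bands × 12 months matrix can be made concrete with a small sketch. It uses a fixed, hand-picked kernel purely for illustration; in a CNN the kernel weights are learned, and the band count, kernel, and helper names below are assumptions rather than the architecture used in the study.

```python
def conv1d_valid(signal, kernel):
    """Valid-mode 1D cross-correlation, the operation used in CNN layers."""
    width = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(width))
        for i in range(len(signal) - width + 1)
    ]


def conv_along_bands(matrix, kernel):
    """Filter each month's spectrum; matrix[b][t] is band b at month t."""
    n_months = len(matrix[0])
    columns = [[row[t] for row in matrix] for t in range(n_months)]
    filtered = [conv1d_valid(column, kernel) for column in columns]
    # Transpose back to (spectral features x months).
    return [list(values) for values in zip(*filtered)]


def conv_along_time(matrix, kernel):
    """Filter each row's monthly series separately."""
    return [conv1d_valid(row, kernel) for row in matrix]


# Hypothetical input: 4 bands x 12 monthly snapshots for one sampling point.
n_bands, n_months = 4, 12
matrix = [[float(b + t) for t in range(n_months)] for b in range(n_bands)]
kernel = [-1.0, 0.0, 1.0]  # illustrative difference filter, not a learned weight

spectral_features = conv_along_bands(matrix, kernel)            # shape 2 x 12
temporal_features = conv_along_time(spectral_features, kernel)  # shape 2 x 10
print(len(spectral_features), len(spectral_features[0]))
print(len(temporal_features), len(temporal_features[0]))
```

Applying the spectral filter first and the temporal filter second follows the order described above; each valid-mode convolution shortens the filtered axis by the kernel width minus one.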

For the WQI-based modelling, predicted classes were compared against reference labels derived from the classified MCDA raster, and the following accuracy metrics were used to quantitatively assess the performance of the classification models. Overall accuracy was evaluated using confusion matrices, where correct predictions correspond to the main diagonal and all misclassifications are treated equally, regardless of how close the predicted class is to the correct one [43]. The area under the receiver operating characteristic (ROC) curve (AUC) [44] was used as the principal performance metric because, unlike accuracy, it evaluates the model based on the predicted probabilities of class membership, capturing how well the model ranks and separates water quality categories across all decision thresholds.
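The distinction between the two metrics can be illustrated with a toy three-class example in which the class probabilities rank every sample correctly (AUC = 1) although a third of the hard class assignments are wrong. The sketch assumes a one-vs-rest, macro-averaged multi-class AUC, which the text does not specify; all function names and probability values are hypothetical.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def overall_accuracy(matrix):
    """Share of samples on the main diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(sum(row) for row in matrix)


def binary_auc(scores, is_positive):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, flag in zip(scores, is_positive) if flag]
    neg = [s for s, flag in zip(scores, is_positive) if not flag]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true, probabilities, n_classes):
    """Assumed scheme: unweighted mean of one-vs-rest AUCs."""
    aucs = [
        binary_auc([p[c] for p in probabilities], [t == c for t in y_true])
        for c in range(n_classes)
    ]
    return sum(aucs) / len(aucs)


# Toy predicted probabilities for six samples and three classes (illustrative only).
y_true = [0, 0, 1, 1, 2, 2]
probabilities = [
    [0.50, 0.30, 0.20],
    [0.40, 0.45, 0.15],  # true class 0, assigned to class 1
    [0.30, 0.60, 0.10],
    [0.35, 0.50, 0.15],
    [0.10, 0.30, 0.60],
    [0.20, 0.45, 0.35],  # true class 2, assigned to class 1
]
y_pred = [row.index(max(row)) for row in probabilities]
accuracy = overall_accuracy(confusion_matrix(y_true, y_pred, 3))
auc = macro_ovr_auc(y_true, probabilities, 3)
print(round(accuracy, 3), auc)
```

A model can therefore reach a very high AUC while its accuracy remains moderate, because AUC rewards correct ranking of probabilities rather than correct hard labels.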

Generalization performance was assessed on the held-out 20% test subset. Testing was conducted independently for Sentinel-2, Landsat 8–9, and PlanetScope to ensure a fair, sensor-specific comparison under identical evaluation criteria.

2.6. Workflow Overview

The overall SIGMaL workflow (Figure 4) integrates four main components: (i) in situ water quality monitoring and probe calibration, (ii) GIS–MCDA and raster-based derivation of the WQI, (iii) satellite image acquisition and preprocessing, and (iv) ML model training, testing, and spatial prediction.

This stepwise design ensured consistency across heterogeneous data sources (in situ measurements, GIS–MCDA models, and satellite imagery) and facilitated reproducible ML experiments. The 80/20 split of raster-derived samples into training and testing subsets provided the foundation for robust model evaluation, while comparative benchmarking across algorithms and sensors allowed systematic identification of the most effective predictive approach.

3. Results

3.1. Regression of In Situ Parameters

3.1.1. Sentinel-2 Results

Across the Sentinel-2 dataset, ensemble methods outperformed linear and kernel-based approaches, with clear advantages in modelling nonlinear spectral–water quality relationships (Table 3). Gradient Boosting delivered the strongest overall performance for most variables, achieving the highest accuracy for WT (R² = 0.816, MAE = 15.218, RMSE = 25.675) and competitive fits for turbidity (R² = 0.765) and DO (R² = 0.682). For EC, Random Forest and Gradient Boosting performed almost identically: Random Forest achieved the lowest MAE (0.178), while its R² (0.650) and RMSE (0.247) were marginally behind those of Gradient Boosting (R² = 0.652, RMSE = 0.246, MAE = 0.185), indicating strong robustness to spectral heterogeneity and stable predictive performance for both ensemble learners.

The KNN Regressor showed particularly strong behaviour for turbidity, achieving the highest R² across all models (0.806) and low MAE and RMSE values, suggesting that local neighbourhood patterns in reflectance strongly benefit turbidity estimation. Linear Regression produced moderate fits across all variables, while Ridge and Poisson regression models consistently resulted in near-zero or negative R² values, confirming their limited suitability for modelling nonlinear satellite reflectance–water quality relationships. SVM and RANSAC exhibited unstable performance, especially for turbidity and DO, with negative or low R², reflecting sensitivity to noise and high-variance spectral conditions.

3.1.2. Landsat 8–9 Results

For Landsat 8–9, ensemble tree-based models again provided superior performance relative to linear and kernel methods, with clearer separation among algorithms for individual water quality parameters (Table 4). Gradient Boosting achieved the best overall performance for EC, delivering the highest R² (0.728) and, overall, the lowest MAE and RMSE among the tested learners. For turbidity, Random Forest achieved the best overall performance (R² = 0.591, MAE = 6.245), outperforming Gradient Boosting and XGBoost.

WT prediction showed exceptionally strong accuracy across the board, with both Random Forest and Gradient Boosting achieving R² = 0.996 and very low error values (<4 °C RMSE). This indicates that Landsat’s thermal bands (B10 and B11) provide highly stable temperature information for the study area. For DO, Random Forest achieved the best result (R² = 0.921, RMSE = 0.368) and outperformed most other models by a large margin, with only Gradient Boosting performing similarly, though slightly weaker. Linear Regression provided moderate fits, while Ridge Regression suffered degraded performance for all variables except WT. Kernel-based SVM regression, Poisson regression, and RANSAC performed poorly, often yielding negative or near-zero R² values, highlighting their sensitivity to nonlinear and noisy spectral–ecological relationships.

3.1.3. PlanetScope Results

PlanetScope produced more variable model performance due to its limited spectral range and sensitivity to atmospheric and adjacency effects (Table 5). Nevertheless, several algorithms achieved strong predictive capability. The KNN Regressor was the strongest overall performer, delivering the highest R² values for EC (0.713), turbidity (0.661), and DO (0.613), indicating that PlanetScope’s fine spatial resolution (3 m) enables effective exploitation of local spectral neighbourhoods despite the restricted spectral configuration.

For WT, Linear Regression surprisingly outperformed all nonlinear models (R² = 0.685), suggesting that under stable atmospheric conditions the reflectance–temperature relationship behaves more linearly than for other variables. Ensemble tree-based models such as Random Forest and Gradient Boosting produced moderate and consistent results across most parameters (R² between 0.48 and 0.60), confirming their robustness to noise but also highlighting the constraints imposed by PlanetScope’s narrow spectral range. Ridge, Poisson, SVM, and RANSAC frequently yielded low or negative R², particularly for turbidity, where adjacency contamination and radiometric instability were most pronounced.

3.2. WQI CNN Models

When using the integrated WQI derived from the GIS–MCDA raster, the problem was reformulated as a supervised classification task. The seven WQI classes served as categorical labels for model training and testing.

Classical classification algorithms are generally unsuitable for complex inputs such as time series of spectral vectors because they treat each feature independently and cannot capture the inherent spectral and temporal dependencies in the data. This results in suboptimal performance since these dependencies carry crucial information about the underlying processes. Moreover, the high dimensionality of such inputs significantly complicates the optimization process, especially when the number of training samples is limited. The large feature space can lead to overfitting and poor generalization, making convergence during training unlikely. In contrast, methods like CNNs are better suited to this type of data because they can learn local spectral features and temporal patterns through convolutional operations, preserving dependencies and reducing dimensional complexity via shared weights and pooling layers.

The following WQI results are produced by the CNN models; “spectral” refers to single-date band stacks, and “temporal” refers to band-wise concatenation of monthly windows.

Model evaluation was based on confusion matrices (Figure 5, Figure 6, Figure 7, Figure 8, Figure 9 and Figure 10), with overall accuracy and AUC used as the principal metrics of predictive accuracy (Table 6). Figure 5, Figure 6, Figure 7, Figure 8, Figure 9 and Figure 10 show confusion matrices illustrating the classification performance of WQI prediction models across the three satellite datasets (Sentinel-2, Landsat 8–9, and PlanetScope). Each figure contains four panels representing the training and test subsets for both spectral and temporal feature configurations.

The performance of ML models applied for WQI classification across Sentinel-2, Landsat 8–9, and PlanetScope datasets, using both spectral and temporal features, is listed in Table 6. Overall, the models based on spectral inputs consistently outperformed those relying on temporal composites, indicating that spectral variability provides a more stable and discriminative basis for estimating integrated water quality conditions.

3.2.1. Sentinel-2 CNN Performance

In the spectral configuration, both the training and test confusion matrices (Figure 5) show a dominant diagonal, indicating generally correct class assignments. However, the model frequently confused neighbouring classes, mostly within the mid-range WQI categories (Classes 3–5). This is consistent with the moderate test accuracy of 0.53 and high AUC of 1.00 reported in Table 6. Misclassifications rarely extend far from the diagonal, suggesting that the model captured the overall ordinal structure of the WQI but struggled to resolve subtle class boundaries.

The temporal model exhibits even stronger diagonal structure. Training performance is notably higher (accuracy 0.85, AUC 1.00), and the test matrices show fewer off-diagonal entries than in the spectral case. Although test accuracy (0.53) matches that of the spectral model, the temporal model achieves higher test R² (0.82) and more concentrated diagonal predictions, indicating better preservation of ordinal class relationships. This suggests that temporal aggregation stabilized spectral variability and enhanced class separability, reducing confusion among adjacent WQI classes.

The learning curves for the Sentinel-2 spectral and temporal models are shown in Figure 6. In both configurations, training accuracy increased steadily and reached values above 0.85, accompanied by a consistent decrease in training loss throughout the 500 epochs. Validation accuracy remained lower, fluctuating mostly between 0.45 and 0.60 in both cases, without a strong upward trend. Validation loss showed pronounced variability, including frequent spikes that increased in magnitude at later epochs. These patterns indicate that the model learned stable representations on the training data, while validation performance remained less consistent.

3.2.2. Landsat 8–9 CNN Performance

For Landsat 8–9 (Figure 7), the spectral model showed moderate classification capability, achieving high AUC values (0.98 train, 0.97 test) but a test accuracy of only 0.53 (Table 6). The test confusion matrix confirms this mismatch: although the model correctly follows the overall WQI gradient, misclassifications remain frequent across several classes, including errors beyond neighbouring categories. This indicates that, despite good probabilistic separation reflected in the AUC, the spectral model struggled to assign discrete class labels with high reliability. This is likely a consequence of Landsat’s coarser spatial resolution and fewer narrow spectral bands compared with Sentinel-2, limiting its ability to resolve subtle differences between adjacent WQI classes.

Temporal modelling resulted in severe degradation of performance. Both train and test AUC values collapsed to 0.50, with test accuracy decreasing to 0.08 and R² reaching −3.56, clearly indicating prediction collapse (Table 6). The temporal confusion matrices corroborate that nearly all samples were assigned to a single WQI class, with almost no differentiation across the seven classes. This behaviour suggests that temporal stacking introduced noise rather than informative temporal structure. The likely cause is Landsat’s long revisit interval combined with inconsistent atmospheric and illumination conditions between acquisition dates, which reduced temporal coherence and led the CNN to overfit the training set while failing entirely to generalize to unseen data.
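Why a collapsed classifier produces an AUC of 0.50 together with a strongly negative R² can be shown with a toy calculation. The example assumes that R² is computed on the ordinal class indices and uses a balanced, made-up label set; it illustrates the mechanism only and does not reproduce the values in Table 6.

```python
def accuracy(y_true, y_pred):
    """Share of exactly matching class labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination computed on ordinal class indices (assumption)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def binary_auc(scores, is_positive):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, flag in zip(scores, is_positive) if flag]
    neg = [s for s, flag in zip(scores, is_positive) if not flag]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels: seven WQI classes, two samples each (illustrative only).
y_true = [1, 2, 3, 4, 5, 6, 7] * 2
y_pred = [7] * len(y_true)            # every sample assigned to one class
constant_scores = [1.0] * len(y_true)  # identical class probability for all samples

collapsed_accuracy = accuracy(y_true, y_pred)
collapsed_r2 = r2_score(y_true, y_pred)
collapsed_auc = binary_auc(constant_scores, [t == 7 for t in y_true])
print(round(collapsed_accuracy, 3), collapsed_r2, collapsed_auc)
```

When all samples receive the same score, no ranking information remains, so the AUC equals 0.5; and because the single predicted class lies far from the mean class, the residual sum of squares exceeds the total variance and R² becomes negative.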

The learning curves for the Landsat 8–9 temporal model are shown in Figure 8. Training accuracy increased gradually to approximately 0.70, while training loss decreased smoothly over epochs. In contrast, validation accuracy remained low and highly variable, fluctuating mostly between 0.05 and 0.25 without a clear upward trend. Validation loss showed substantial instability, with frequent large spikes throughout training. These patterns indicate that, although the model fitted the training data, its performance on the validation set was inconsistent under the temporal configuration. Furthermore, the validation predictions tended to collapse into a single WQI class for extended periods during training, with the dominant predicted class shifting from epoch to epoch, reflecting unstable class separation under temporal inputs.

3.2.3. PlanetScope CNN Performance

For PlanetScope (Figure 9), the spectral model showed strong classification ability, consistent with its test AUC of 0.97, accuracy of 0.42, and R² of 0.77 (Table 6). The spectral confusion matrices display a clear diagonal trend, with most predictions falling into the correct WQI class. Misclassifications occur primarily between adjacent classes (especially around Classes 2–3 and 4–5), which indicates that the model successfully captured the underlying ordinal gradient while occasionally struggling with fine boundary transitions. This behaviour aligns with the high spatial resolution of PlanetScope imagery (3 m), which enables discrimination of small-scale spatial patterns relevant to water quality.

The temporal model performed worse overall, with a lower test AUC (0.94) and a low R² of 0.10, although its test accuracy (0.44) was similar to that of the spectral model (Table 6). The temporal confusion matrix reveals considerable class mixing: several classes show dispersion into multiple neighbouring categories, and true classes 3–5 exhibit notable overlap. Although diagonal structure is still present, class separability is reduced compared with the spectral model. This degradation likely reflects PlanetScope’s limited spectral range combined with day-to-day variations in illumination and atmospheric conditions, which introduce noise into temporal features and reduce their predictive stability.

The learning curves for the PlanetScope spectral and temporal models are shown in Figure 10. For both configurations, training accuracy increased steadily, reaching approximately 0.90, while training loss decreased smoothly across epochs. Validation accuracy remained notably lower, fluctuating mostly between 0.35 and 0.55 without a clear long-term upward trend. Validation loss exhibited substantial variability, with frequent spikes that persisted throughout training. Compared to the spectral configuration, the temporal model showed similar behaviour, with slightly larger oscillations in validation loss but comparable validation accuracy ranges. The curves indicate stable convergence on the training data but limited consistency in validation performance for both PlanetScope configurations.

3.3. CNN-Based Predictions of WQI

Figure 11 shows the CNN-based WQI predictions for the year following the in situ monitoring period, generated separately for Sentinel-2, Landsat 8–9, and PlanetScope under spectral and temporal model configurations. Since no ground-truth data exist for this period, the maps represent forward predictions of spatial water quality patterns derived from the trained models.

Across all sensors, the maps in Figure 11 display a broadly consistent spatial structure: higher-quality WQI classes (1–3) occur mainly in the western and central parts of Vrana Lake, while lower-quality classes (5–7) are more frequent along the eastern and southeastern margins. This gradient mirrors the dominant spatial trend captured during model training.

For Sentinel-2, the spectral model (top left) produces a smooth but spatially detailed gradient, with noticeable internal class transitions. In contrast, the temporal model (top right) yields a more uniform surface, with reduced fine-scale variability and more clustered class regions.

For Landsat 8–9, spectral predictions (middle left) exhibit stronger heterogeneity and a wider distribution of mid- to low-quality classes across the lake. The temporal model (middle right) produces highly homogenized outputs, with most pixels assigned to a narrow range of lower-quality classes, reflecting reduced class discrimination.

For PlanetScope, the spectral model (bottom left) generates the most spatially detailed output among all sensors, with well-defined class boundaries and visible local variation. The temporal model (bottom right) preserves the general lake-wide gradient but presents a smoother pattern with less within-lake differentiation.

4. Discussion

4.1. Cross-Sensor Comparison

Across all sensor–model configurations, Sentinel-2 demonstrated the strongest and most consistent performance for both regression of individual in situ parameters and CNN-based WQI classification. Its combination of dense visible and NIR spectral coverage and moderate spatial resolution allowed the models to capture both the optical complexity and spatial gradients of Vrana Lake. Ensemble regression models reached the highest R² values for EC, turbidity and DO, while CNN classification achieved AUC = 1.00 and stable WQI class separation. These findings align with recent studies by Pizani et al. (2020) [19] and Toming et al. (2016) [20] showing that Sentinel-2 reliably estimates water quality indicators across rivers, lakes and reservoirs. They highlight the suitability of Sentinel-2 as the primary remote-sensing component within the SIGMaL framework.

PlanetScope performed very well in tasks requiring fine spatial discrimination. Its spectral CNN model produced the most detailed WQI boundary delineation among all sensors, which is consistent with previous work showing that PlanetScope’s 3 m resolution excels at mapping small-scale spatial heterogeneity despite its limited spectral range [21,45]. However, because it carries only a few broad multispectral bands, it is more susceptible to atmospheric variation and less robust when modelling temporally aggregated features. This behaviour is fully reflected in the SIGMaL experiments and matches patterns observed in an earlier comparative water-quality study by Di Francesco et al. (2025) [46].

For Landsat 8–9 (OLI/TIRS), the results diverged between regression and classification tasks. In situ temperature regression achieved exceptionally high accuracy (R² ≈ 0.996), consistent with many studies demonstrating that Landsat’s thermal bands provide highly reliable surface water temperature retrievals [19,47]. In contrast, Landsat’s CNN classification performance was modest in the spectral configuration (test accuracy = 0.53) and collapsed almost entirely in the temporal configuration (accuracy ≈ 0.08; strongly negative R²). This behaviour reflects Landsat’s coarser 30 m spatial resolution, lower revisit frequency, and fewer narrow spectral bands. These characteristics limit its ability to resolve the subtle water-quality gradients needed for seven-class WQI discrimination within the SIGMaL workflow. Similar shortcomings of Landsat relative to Sentinel-2 in inland waters have been observed broadly in recent comparisons by Deng et al. (2024), Pizani et al. (2020), and Parida et al. (2025) [17,19,21].

Across all sensors, spectral CNN models consistently outperformed temporal models [29]. Spectral snapshots preserve instantaneous optical conditions, whereas temporal composites blend scenes captured under different illumination, atmospheric states, and hydrodynamic conditions, reducing contrast and adding noise. This aligns with recent studies by Deng et al. (2024), Pizani et al. (2020), and Toming et al. (2016) [17,19,20] emphasizing that, despite growing interest in temporal deep learning, snapshot-based spectral models remain more accurate for WQI estimation in optically complex inland waters.

An important explanatory factor in this study is the temporal offset between field surveys and satellite overpasses (Table 2). Maximal offsets ranged from –11 to +7 days, which was especially problematic for Landsat. Vrana Lake is shallow, and strong Bora, Jugo or Maestral winds can change temperature and nutrient distributions within hours. Temporal inputs thus often combined reflectance measurements that possibly no longer corresponded to the in situ water state. This weakened temporal coherence and degraded CNN performance across all temporal SIGMaL configurations, particularly given Landsat’s already sparse revisit schedule.

Within this framework, satellite observations are essential because they provide spatially exhaustive, synoptic measurements that allow integrated WQI patterns to be mapped consistently across the entire lake surface. Repeated sampling of the same 20 in situ stations, even when combined with spatial interpolation, cannot provide sensor-comparable, wall-to-wall coverage or capture spatial organization at the resolution and extent enabled by satellite imagery. Satellite data are therefore not used to increase the number of independent observations, but to enable spatial generalization and pattern recognition beyond the discrete sampling network.

4.2. WQI Outperforms Modelling Individual Parameters

A central methodological finding is that using the integrated WQI as the modelling target substantially improved predictive stability relative to direct regression of raw physicochemical parameters. Within the SIGMaL framework, CNN classification produced clearer ordinal structure, more stable confusion matrices, and better cross-sensor consistency than parameter-specific models. This confirms that WQI acts as a noise-reduced, integrated ecological signal, smoothing short-term fluctuations and reducing the influence of measurement noise or parameter-specific anomalies.

Recent studies similarly show that ML/WQI models provide greater robustness and interpretability than models predicting individual parameters. For example, Wong et al. (2022) [48] demonstrated that WQI-based machine-learning models (particularly modified Random Forest) outperform raw parameter prediction by providing higher accuracy and more stable explanatory structure. Pang et al. (2025) [49] showed that deep-learning approaches in remote sensing similarly benefit from using integrated indices such as WQI, which improve model robustness and cross-sensor transferability. The results of this study support these findings and show that WQI provides a superior modelling target within SIGMaL.

4.3. Spatial Predictions of WQI

CNN-based annual predictions for the post-monitoring period showed a consistent lake-wide west–east gradient across all sensors, with higher-quality classes (1–3) dominating the central and western areas and lower-quality classes (5–7) occurring more frequently along the eastern margins. This pattern matches field observations and known hydrodynamic processes in Vrana Lake, where nutrient inputs and restricted water exchange influence eastern basin conditions.

The Sentinel-2 and PlanetScope spectral models provided the clearest spatial structure. Sentinel-2 produced smooth, ecologically meaningful gradients, whereas PlanetScope highlighted small-scale shoreline and central-basin heterogeneities. Landsat 8–9 reproduced the general gradient but produced smoother, more spatially homogeneous maps consistent with its coarser spatial resolution. Temporal models, especially for Landsat and PlanetScope, yielded more uniform spatial fields and reduced internal variability, which is consistent with the confusion matrices and learning curves showing diminished class separability under temporal input conditions.

Visual differences among the WQI maps derived from Sentinel-2, Landsat 8–9, and PlanetScope do not contradict the relatively high quantitative performance metrics reported in Table 6. The WQI prediction is formulated as an ordinal classification problem, where classes represent ordered categories derived from continuous GIS–MCDA scores rather than exact spatial boundaries. High AUC and R² values therefore indicate consistent discrimination and correct ranking of relative water quality conditions, even when the spatial expression of class boundaries differs among sensors. These differences primarily reflect sensor-specific characteristics, including spatial resolution, spectral configuration, and revisit frequency, which influence the level of spatial detail and smoothness in the predicted maps. Consequently, the observed map discrepancies represent variations in spatial sensitivity rather than inconsistencies in model performance.

Finally, the one-year temporal offset between the in situ–based WQI modelling and satellite-based prediction may influence model accuracy due to potential domain shifts in key input parameters. Specifically, changes in the minimum and maximum values, distribution characteristics, or inter-parameter relationships driven by differing hydrological, meteorological, or anthropogenic conditions could affect model generalization. Consequently, the spatial predictions presented here are interpreted as a scenario-based extrapolation of lake water quality patterns rather than a strict temporal validation.

4.4. Methodological Limitations and Future Work

Using Level-2 surface reflectance products (rather than performing atmospheric corrections based on date and lake specifications) likely introduced residual atmospheric and adjacency artefacts. Such effects can be significant in shallow, optically complex lakes. Although ML models are often robust to moderate atmospheric errors, employing algorithms tailored for inland waters (e.g., ACOLITE, iCOR, C2RCC) could further improve physical consistency in future work.

As shown in Table 2, temporal offsets of up to 11 days were unavoidable due to cloud cover, satellite revisit constraints, and safety considerations for fieldwork. Because Vrana Lake mixes rapidly under strong wind conditions, water quality can change significantly within these time windows. Thus, “temporal” stacks often aggregated reflectance signals that no longer matched in situ conditions, explaining the instability and class-collapse seen especially in the Landsat temporal CNN models.

WQI simplifies ecological interpretation but conceals short-term or parameter-specific extremes (e.g., chlorophyll-a spikes). Future SIGMaL implementations should pair WQI-based classification with selective regression of critical parameters.

While promising, the results presented here reflect conditions in a single, moderately productive lake. The SIGMaL framework is designed to support spatial pattern recognition, comparative assessment, and monitoring prioritization, particularly in data-limited coastal lakes. It is not intended to replace in situ measurements or to provide fully quantitative water quality estimates that are directly transferable to other waterbodies without local calibration. Applying SIGMaL to other lakes will therefore require recalibration of the WQI, additional in situ sampling, and potentially model retraining to accommodate different optical environments.

Within this context, SIGMaL evaluates the ability of satellite sensors to reproduce the relative spatial organization of water quality across the lake, rather than fine-scale or instantaneous variability at individual locations. Nonetheless, the cross-sensor evaluation presented here provides a strong basis for generalizing the approach to shallow coastal lakes.

5. Conclusions

This study demonstrates that integrating in situ monitoring, GIS–MCDA, satellite remote sensing, and ML within the proposed SIGMaL framework provides a robust and scalable approach for assessing water quality in shallow and dynamic freshwater ecosystems such as Vrana Lake. Across all modelling approaches, WQI-based prediction consistently outperformed regression of individual physicochemical parameters, confirming that integrated ecological indices offer a more stable and noise-resistant modelling target for remote sensing applications.

Among the evaluated satellite systems, Sentinel-2 emerged as the most suitable sensor for integrated WQI mapping, combining the highest and most consistent classification performance (AUC ≈ 1.00, R² ≈ 0.84) with its rich visible and NIR spectral configuration. PlanetScope excelled in capturing fine-scale spatial variability (R² ≈ 0.77) due to its high spatial resolution. Landsat 8–9 performed best for WT retrieval but showed reduced capability for multi-class WQI discrimination, particularly in temporal CNN models, largely due to revisit limitations and temporal mismatches with field campaigns. Accordingly, Sentinel-2 is recommended as the primary sensor for operational WQI-based monitoring within the SIGMaL framework, with PlanetScope serving as a complementary data source for high-resolution spatial analyses and Landsat 8–9 supporting temperature-focused or long-term monitoring applications.

Temporal modelling was generally less effective than spectral modelling across all sensors, partly due to inconsistent overpass timing and rapid hydrodynamic changes in the lake, which weakened temporal coherence. Despite these challenges, CNN-based WQI predictions successfully reproduced the known west–east water quality gradient of Vrana Lake, demonstrating the ecological relevance of the integrated modelling framework.

The results of this study highlight that the SIGMaL framework offers a scalable, transferable, and operationally practical approach for water quality monitoring in coastal shallow lakes. Future work should expand the framework to multiple lakes, incorporate more advanced atmospheric correction, and explore hybrid approaches that pair WQI classification with parameter-specific retrievals.

Author Contributions

Conceptualization, A.B., A.Š., L.Š. and A.K.; methodology, A.B., A.Š. and L.Š.; software, L.Š.; validation, A.B. and L.Š.; formal analysis, A.B. and L.Š.; investigation, A.B. and L.Š.; resources, A.B., A.Š., L.Š. and A.K.; data curation, L.Š.; writing—original draft preparation, A.B.; writing—review and editing, A.B., A.Š., A.K. and L.Š.; visualization, A.B. and L.Š.; supervision, A.Š., L.Š. and A.K.; project administration, A.B.; funding acquisition, A.Š. and A.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Interreg VI-A IPA Croatia-Bosnia and Herzegovina-Montenegro programme 2021–2027 under the Interreg project Self-sustainable Multisensor System for Monitoring Water Quality in Inland Waterbodies (SMART-Water), grant number HR-BA-ME00330.

Data Availability Statement

The original data presented in the study are openly available in Mendeley at https://data.mendeley.com/datasets/82crs2ssss/1 (accessed on 2 October 2025). The code used for regression analyses and CNN predictions is available at: https://github.com/ljiljana44/WaterQuality_Vransko_jezero (accessed on 3 December 2025).

Acknowledgments

The study was supported by the SMART-Water project HR-BA-ME00330 and the institutional research project GEOSKLAD, University of Zagreb, Faculty of Geodesy, from the quota of Program Agreements of the Ministry of Science, Education and Youth of the Republic of Croatia with the University of Zagreb, Croatia. During the preparation of this manuscript, the authors used ChatGPT 5.1 for the purposes of interpretation of data. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:

WQI	water quality index
CNN	convolutional neural network
ML	machine learning
EC	electrical conductivity
WT	water temperature
DO	dissolved oxygen
ROC	receiver operating characteristic
AUC	area under the curve
MAE	mean absolute error
RMSE	root mean square error
R²	coefficient of determination
SSE	sum of squared errors
XGBoost	eXtreme gradient boosting
SVM	support vector machine
RANSAC	random sample consensus
KNN	k-nearest neighbours
NIR	near-infrared
MCDA	multicriteria decision analysis

Appendix A

The WQI applied in this study was originally developed and fully described by Batina and Šiljeg (2025) [10]. For methodological transparency, and to keep this article self-contained, this appendix presents a concise summary of the WQI formulation, including the criteria set, weighting approach, aggregation method, and validation framework. The complete methodological details and extended analyses are available in the original reference.

Figure A1. Workflow of the GIS–MCDA-based WQI model, showing criteria selection, spatial data preparation, fuzzy standardization, F-AHP weighting, WLC aggregation, and sensitivity analysis.
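The standardization and aggregation steps named in Figure A1 can be illustrated with a minimal sketch. It assumes a simple linear fuzzy membership function and a weighted linear combination (WLC) of the standardized scores; the criteria names, value ranges, and weights below are hypothetical placeholders and are not the F-AHP weights or criteria of the original WQI model [10].

```python
def linear_membership(value, worst, best):
    """Linear fuzzy standardization: 0 at `worst`, 1 at `best`, clamped to [0, 1].

    Works for criteria where lower is better (best < worst) and higher is better.
    """
    if best == worst:
        raise ValueError("best and worst must differ")
    score = (value - worst) / (best - worst)
    return min(1.0, max(0.0, score))


def weighted_linear_combination(scores, weights):
    """WLC aggregation: weighted mean of standardized criterion scores."""
    if scores.keys() != weights.keys():
        raise ValueError("scores and weights must cover the same criteria")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[name] * scores[name] for name in scores) / total_weight


# Hypothetical raw values for one raster cell: (value, worst, best).
raw = {
    "turbidity": (10.0, 50.0, 0.0),  # lower is better
    "do": (8.0, 2.0, 10.0),          # higher is better
    "ec": (1.0, 5.0, 0.0),           # lower is better
}
# Hypothetical weights; in the workflow of Figure A1 these come from F-AHP.
weights = {"turbidity": 0.5, "do": 0.3, "ec": 0.2}

scores = {name: linear_membership(*triple) for name, triple in raw.items()}
wqi = weighted_linear_combination(scores, weights)
```

Applied cell by cell to the standardized criterion rasters, the same aggregation would yield a continuous WQI surface of the kind shown in Figure 3a.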

References

1. Adjovu, G.E.; Stephen, H.; Ahmad, S. Spatial and Temporal Dynamics of Key Water Quality Parameters in a Thermal Stratified Lake Ecosystem: The Case Study of Lake Mead. Earth 2023, 4, 461–502.
2. Bănăduc, D.; Simić, V.; Cianfaglione, K.; Barinova, S.; Afanasyev, S.; Öktener, A.; McCall, G.; Simić, S.; Curtean-Bănăduc, A. Freshwater as a Sustainable Resource and Generator of Secondary Resources in the 21st Century: Stressors, Threats, Risks, Management and Protection Strategies, and Conservation Approaches. Int. J. Environ. Res. Public Health 2022, 19, 16570.
3. Capon, S.J.; Stewart-Koster, B.; Bunn, S.E. Future of Freshwater Ecosystems in a 1.5 °C Warmer World. Front. Environ. Sci. 2021, 9, 784642.
4. Sidle, R.C.; Gomi, T. Hydrological Systems. In Wetzel’s Limnology; Elsevier: Amsterdam, The Netherlands, 2024; pp. 57–73.
5. Batina, A.; Krtalić, A. Integrating Remote Sensing Methods for Monitoring Lake Water Quality: A Comprehensive Review. Hydrology 2024, 11, 92.
6. Talukdar, S.; Shahfahad; Ahmed, S.; Naikoo, M.W.; Rahman, A.; Mallik, S.; Ningthoujam, S.; Bera, S.; Ramana, G.V. Predicting Lake Water Quality Index with Sensitivity-Uncertainty Analysis Using Deep Learning Algorithms. J. Clean. Prod. 2023, 406, 136885.
7. Akar, A.U.; Sisman, S.; Ulku, H.; Yel, E.; Yalpir, S. Evaluating Lake Water Quality with a GIS-Based MCDA Integrated Approach: A Case in Konya/Karapınar. Environ. Sci. Pollut. Res. 2024, 31, 19478–19499.
8. Xiong, Y.; Ran, Y.; Zhao, S.; Zhao, H.; Tian, Q. Remotely Assessing and Monitoring Coastal and Inland Water Quality in China: Progress, Challenges and Outlook. Crit. Rev. Environ. Sci. Technol. 2020, 50, 1266–1302.
9. Batina, A.; Cukrov, N.; Ćuže Denona, M. Spatiotemporal Water Quality Analysis of Vrana Lake, Croatia. Open Geosci. 2025, 17, 20250817.
10. Batina, A.; Šiljeg, A. Enhancing Water Quality Monitoring in a Coastal Shallow Lake Using GIS and Multi-Criteria Decision Analysis. Environ. Sustain. Indic. 2025, 28, 100881.
11. Paule-Mercado, M.C.; Rabaneda-Bueno, R.; Porcal, P.; Kopacek, M.; Huneau, F.; Vystavna, Y. Climate and Land Use Shape the Water Balance and Water Quality in Selected European Lakes. Sci. Rep. 2024, 14, 8049.
12. Selak, L.; Marković, T.; Pjevac, P.; Orlić, S. Microbial Marker for Seawater Intrusion in a Coastal Mediterranean Shallow Lake, Lake Vrana, Croatia. Sci. Total Environ. 2022, 849, 157859.
13. Mansaray, A.S.; Dzialowski, A.R.; Martin, M.E.; Wagner, K.L.; Gholizadeh, H.; Stoodley, S.H. Comparing PlanetScope to Landsat-8 and Sentinel-2 for Sensing Water Quality in Reservoirs in Agricultural Watersheds. Remote Sens. 2021, 13, 1847.
14. Andrzej Urbanski, J.; Wochna, A.; Bubak, I.; Grzybowski, W.; Lukawska-Matuszewska, K.; Łącka, M.; Śliwińska, S.; Wojtasiewicz, B.; Zajączkowski, M. Application of Landsat 8 Imagery to Regional-Scale Assessment of Lake Water Quality. Int. J. Appl. Earth Obs. Geoinf. 2016, 51, 28–36.
15. Kuhn, C.; De Matos Valerio, A.; Ward, N.; Loken, L.; Sawakuchi, H.O.; Kampel, M.; Richey, J.; Stadler, P.; Crawford, J.; Striegl, R.; et al. Performance of Landsat-8 and Sentinel-2 Surface Reflectance Products for River Remote Sensing Retrievals of Chlorophyll-a and Turbidity. Remote Sens. Environ. 2019, 224, 104–118.
16. Kayastha, P.; Dzialowski, A.R.; Stoodley, S.H.; Wagner, K.L.; Mansaray, A.S. Effect of Time Window on Satellite and Ground-Based Data for Estimating Chlorophyll-a in Reservoirs. Remote Sens. 2022, 14, 846.
17. Deng, Y.; Zhang, Y.; Pan, D.; Yang, S.X.; Gharabaghi, B. Review of Recent Advances in Remote Sensing and Machine Learning Methods for Lake Water Quality Management. Remote Sens. 2024, 16, 4196.
18. Sun, X.; Zhang, Y.; Shi, K.; Zhang, Y.; Li, N.; Wang, W.; Huang, X.; Qin, B. Monitoring Water Quality Using Proximal Remote Sensing Technology. Sci. Total Environ. 2022, 803, 149805.
19. Pizani, F.M.C.; Maillard, P.; Ferreira, A.F.F.; De Amorim, C.C. Estimation of Water Quality in a Reservoir from Sentinel-2 MSI and Landsat-8 OLI Sensors. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2020, 3, 401–408.
20. Toming, K.; Kutser, T.; Laas, A.; Sepp, M.; Paavel, B.; Nõges, T. First Experiences in Mapping Lake Water Quality Parameters with Sentinel-2 MSI Imagery. Remote Sens. 2016, 8, 640.
21. Parida, B.R.; Tiwari, S.; Dwivedi, C.S.; Pandey, A.C.; Singh, B.; Behera, M.D.; Kumar, N. Comparative Assessment of Satellite-Based Models through Planetscope and Landsat-8 for Determining Physico-Chemical Water Quality Parameters in Varuna River (India). Appl. Water Sci. 2025, 15, 55.
22. Šiljeg, A. Digital Terrain Model in the Analysis of Geomorphometrical Parameters—The Example of Nature Park Lake Vrana. Doctoral Thesis, University of Zadar, Zadar, Croatia, 2013. (In Croatian)
23. Public Institution Vransko Jezero Nature Park. Management Plan for the Nature Park and Special Ornithological Reserve Vransko Lake and Its Associated Ecological Network Areas (PU 6163) 2023–2032. 2022. (In Croatian). Available online: https://www.pp-vransko-jezero.hr/documents/plan-upravljanja/plan-upravljanja-2023-2032.pdf (accessed on 10 October 2025).
24. Croatian Geological Survey. Vrana Lake—Hydrogeological Research; Croatian Geological Survey, Department of Hydrogeology and Engineering Geology: Zagreb, Croatia, 2012. (In Croatian)
25. Xylem. EXO User Manual. 2020. Available online: https://www.xylemanalytics.com/File%20Library/Resource%20Library/YSI/Manuals/EXO-User-Manual-Web.pdf (accessed on 19 November 2025).
26. Planet Team. Planet Application Program Interface: In Space for Life on Earth. 2022. Available online: https://api.planet.com (accessed on 14 August 2025).
27. Pan, Y.; Bélanger, S.; Huot, Y. Evaluation of Atmospheric Correction Algorithms over Lakes for High-Resolution Multispectral Imagery: Implications of Adjacency Effect. Remote Sens. 2022, 14, 2979.
28. Zhu, W.; Xia, W. Effects of Atmospheric Correction on Remote Sensing Statistical Inference in an Aquatic Environment. Remote Sens. 2023, 15, 1907.
29. Ivanda, A.; Šerić, L.; Braović, M. Exploring Applications of Convolutional Neural Networks in Analyzing Multispectral Satellite Imagery: A Systematic Review. Big Data Min. Anal. 2025, 8, 407–429.
30. Ansper, A.; Alikas, K. Retrieval of Chlorophyll a from Sentinel-2 MSI Data for the European Union Water Framework Directive Reporting Purposes. Remote Sens. 2018, 11, 64.
31. Nguyen, U.N.T.; Pham, L.T.H.; Dang, T.D. An Automatic Water Detection Approach Using Landsat 8 OLI and Google Earth Engine Cloud Computing to Map Lakes and Reservoirs in New Zealand. Environ. Monit. Assess. 2019, 191, 235.
32. Qu, K. Research on Linear Regression Algorithm. MATEC Web Conf. 2024, 395, 01046.
33. Liu, S.; Dobriban, E. Ridge Regression: Structure, Cross-Validation, and Sketching. arXiv 2019, arXiv:1910.02373.
34. Salman, H.A.; Kalakech, A.; Steiti, A. Random Forest Algorithm Overview. Babylon. J. Mach. Learn. 2024, 2024, 69–79.
35. Natekin, A.; Knoll, A. Gradient Boosting Machines, a Tutorial. Front. Neurorobot. 2013, 7, 21.
36. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; ACM: New York, NY, USA, 2016; pp. 785–794.
37. Basak, D.; Pal, S.; Patranabis, D. Support Vector Regression. Neural Inf. Process.–Lett. Rev. 2007, 11, 203–224.
38. Chen, G.; Ma, J.; Fattahi, S. RANSAC Revisited: An Improved Algorithm for Robust Subspace Recovery under Adversarial and Noisy Corruptions. arXiv 2025, arXiv:2504.09648.
39. Srisuradetchai, P.; Suksrikran, K. Random Kernel K-Nearest Neighbors Regression. Front. Big Data 2024, 7, 1402384.
40. Yang, S.; Berdine, G. Poisson Regression. Southwest Respir. Crit. Care Chron. 2015, 3, 61.
41. Villota-González, F.H.; Sulbarán-Rangel, B.; Zurita-Martínez, F.; Gurubel-Tun, K.J.; Zúñiga-Grajeda, V. Assessment of Machine Learning Models for Remote Sensing of Water Quality in Lakes Cajititlán and Zapotlán, Jalisco—Mexico. Remote Sens. 2023, 15, 5505.
42. Chicco, D.; Warrens, M.J.; Jurman, G. The Coefficient of Determination R-Squared Is More Informative than SMAPE, MAE, MAPE, MSE and RMSE in Regression Analysis Evaluation. PeerJ Comput. Sci. 2021, 7, e623.
43. Farhadpour, S.; Warner, T.A.; Maxwell, A.E. Selecting and Interpreting Multiclass Loss and Accuracy Assessment Metrics for Classifications with Class Imbalance: Guidance and Best Practices. Remote Sens. 2024, 16, 533.
44. Fawcett, T. An Introduction to ROC Analysis. Pattern Recognit. Lett. 2006, 27, 861–874.
45. Niroumand-Jadidi, M.; Bovolo, F.; Bruzzone, L.; Gege, P. Physics-Based Bathymetry and Water Quality Retrieval Using PlanetScope Imagery: Impacts of 2020 COVID-19 Lockdown and 2019 Extreme Flood in the Venice Lagoon. Remote Sens. 2020, 12, 2381.
46. Di Francesco, S.; Biondi, F.; Casentini, B.; Fazi, S.; Amalfitano, S.; D’Eugenio, M.; Todisco, F.; Casadei, S.; Giannone, F. Sentinel-2 and Planet-Scope as Reliable Tools for Water Quality Monitoring of Small Reservoirs. Egypt. J. Remote Sens. Space Sci. 2025, 28, 713–723.
47. Kramer, G.; Filho, W.P.; De Carvalho, L.A.S.; Trindade, P.M.P.; Da Rosa, C.N.; Dezordi, R. Performance and Validation of Water Surface Temperature Estimates from Landsat 8 of the Itaipu Reservoir, State of Paraná, Brazil. Environ. Monit. Assess. 2023, 195, 137.
48. Wong, W.Y.; Ibrahim Al-Ani, A.K.; Hasikin, K.; Mohd Khairuddin, A.S.; Razak, S.A.; Hizaddin, H.F.; Mokhtar, M.I.; Azizan, M.M. Water Quality Index Using Modified Random Forest Technique: Assessing Novel Input Features. Comput. Model. Eng. Sci. 2022, 132, 1011–1038.
49. Pang, Z.; Zhou, Z.; Fu, J.; Jiang, W.; Qin, X.; Sun, M. Deep Learning-Based Remote Sensing Retrieval of Inland Water Quality: A Review. J. Hydrol. Reg. Stud. 2025, 61, 102759.

Figure 1. Vrana Lake with monitoring stations and main hydrological features.

Figure 2. Calibration of the YSI EXO2 multiparameter probes.

Figure 3. Water quality raster of Vrana Lake derived from MCDA and its use for dataset densification. (a) Continuous raster surface classified into seven WQI classes. (b) Discretized raster with 318 extracted samples shown together with 20 in situ monitoring stations.

Figure 4. Integrated SIGMaL workflow combining in situ data, GIS-MCDA, satellite imagery, and ML for WQI prediction.

Figure 5. Confusion matrices showing WQI classification performance for Sentinel-2 spectral and temporal models.

Figure 6. Training and validation accuracy and loss curves for the Sentinel-2 CNN spectral and temporal models.

Figure 7. Confusion matrices showing WQI classification performance for Landsat 8–9 spectral and temporal models.

Figure 8. Training and validation accuracy and loss curves for the Landsat 8–9 CNN spectral and temporal models.

Figure 9. Confusion matrices showing WQI classification performance for PlanetScope spectral and temporal models.

Figure 10. Training and validation accuracy and loss curves for the PlanetScope CNN spectral and temporal models.

Figure 11. Predicted spatial distribution of WQI classes in Vrana Lake for the 12-month period following field measurements, based on CNN models using Sentinel-2, Landsat 8–9, and PlanetScope imagery under spectral and temporal modelling scenarios.

Table 1. Comparison of Sentinel-2 MSI, Landsat 8–9 OLI/TIRS, and PlanetScope sensor specifications.

Category	Sentinel-2 MSI	Landsat 8–9 OLI/TIRS	PlanetScope
Spatial resolution	10–60 m	15 m (pan), 30–100 m	3 m
Temporal resolution	5 days	16 days (8-day combined)	Daily
Spectral resolution	13 bands	11 bands	8 bands
Bandwidth (nm)
Coastal	B1: 433–453 (60 m)	B1: 433–453 (30 m)	B1: 431–452 (3 m)
Blue	B2: 460–525 (10 m)	B2: 450–515 (30 m)	B2: 465–515 (3 m)
Green I	-	-	B3: 513–549 (3 m)
Green	B3: 542–577 (10 m)	B3: 525–600 (30 m)	B4: 547–583 (3 m)
Yellow	-	-	B5: 600–620 (3 m)
Red	B4: 650–680 (10 m)	B4: 630–680 (30 m)	B6: 650–680 (3 m)
Red Edge 1	B5: 697–711 (20 m)	-	B7: 697–713 (3 m)
Red Edge 2	B6: 733–747 (20 m)	-	-
Red Edge 3	B7: 773–792 (20 m)	-	-
NIR (narrow)	B8: 780–885 (10 m)	-	-
NIR	B8A: 854–875 (20 m)	B5: 845–885 (30 m)	B8: 845–885 (3 m)
Water vapour	B9: 936–955 (60 m)	-	-
Cirrus	B10: 1359–1388 (60 m)	B9: 1360–1390 (30 m)	-
SWIR 1	B11: 1569–1659 (20 m)	B6: 1560–1660 (30 m)	-
SWIR 2	B12: 2115–2289 (20 m)	B7: 2100–2300 (30 m)	-
TIRS 1	-	B10: 10,600–11,200 (100 m)	-
TIRS 2	-	B11: 11,500–12,500 (100 m)	-
Panchromatic	-	B8: 500–680 (15 m)	-
Free Imagery	Unlimited	Unlimited	5000 km²/month (education) [26]

Table 2. Overview of dates of in situ measurements, corresponding satellite acquisitions, temporal offsets, and number of measured stations.

In Situ Date	Sentinel-2	Landsat 8–9	PlanetScope	Max Prior (Days)	Max After (Days)	Stations Measured
17 July 2023	16 Jul	17 Jul	16 Jul	−1	0	19
18 August 2023	20 Aug	18 Aug	18 Aug	0	2	20
27 September 2023	29 Sep	27 Sep	27 Sep	0	2	12
13 October 2023	09 Oct	05 Oct	16 Oct	−8	3	20
4 December 2023	23 Nov	08 Dec	08 Dec	−11	4	20
19 December 2023	18 Dec	16 Dec	19 Dec	−3	0	20
11 January 2024	12 Jan	09 Jan	11 Jan	−2	1	19
19 February 2024	21 Feb	18 Feb	20 Feb	−1	2	20
14 March 2024	12 Mar	13 Mar	14 Mar	−2	0	20
29 April 2024	21 Apr	30 Apr	30 Apr	−8	1	20
24 May 2024	24 May	01 Jun	18 May	−6	7	20
17 June 2024	15 Jun	17 Jun	17 Jun	−2	0	20
Extremes/Total	–	–	–	−11	7	230
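The "Max Prior" and "Max After" columns of Table 2 can be read as the extreme signed day differences between each satellite acquisition and the in situ date. The sketch below shows that reading for the 13 October 2023 campaign; the function names are illustrative, and it is an assumption that the published offsets were derived in this way.

```python
from datetime import date


def offsets_in_days(in_situ, acquisitions):
    """Signed offset per sensor: negative = image acquired before the field date."""
    return {sensor: (acquired - in_situ).days for sensor, acquired in acquisitions.items()}


def max_prior_and_after(offsets):
    """Largest offset before (<= 0) and after (>= 0) the field date."""
    values = list(offsets.values())
    return min(min(values), 0), max(max(values), 0)


# Acquisition dates for the 13 October 2023 campaign, as listed in Table 2.
in_situ = date(2023, 10, 13)
acquisitions = {
    "Sentinel-2": date(2023, 10, 9),
    "Landsat 8-9": date(2023, 10, 5),
    "PlanetScope": date(2023, 10, 16),
}

offsets = offsets_in_days(in_situ, acquisitions)
max_prior, max_after = max_prior_and_after(offsets)
```

For this row the result is −8 days (Landsat 8–9) and +3 days (PlanetScope), matching the values in the table.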

Table 3. Benchmark results of regression models for each target variable for Sentinel-2.

Model	EC			Turbidity			WT			DO
Model	MAE	RMSE	R²	MAE	RMSE	R²	MAE	RMSE	R²	MAE	RMSE	R²
Linear Regression	0.284	0.360	0.258	10.047	11.941	0.470	29.859	37.166	0.615	0.706	0.837	0.591
Ridge Regression	0.388	0.417	0.001	13.806	16.597	−0.024	51.677	57.568	0.076	1.077	1.308	0.003
Random Forest	0.178	0.247	0.650	4.977	8.474	0.733	18.255	33.239	0.692	0.450	0.740	0.680
Gradient Boosting	0.185	0.246	0.652	4.524	7.947	0.765	15.218	25.675	0.816	0.531	0.738	0.682
XGBoost	0.195	0.308	0.457	3.956	8.241	0.748	15.899	29.560	0.756	0.511	0.790	0.637
SVM Regressor	0.212	0.291	0.516	11.691	16.693	−0.035	37.662	46.899	0.387	0.717	0.992	0.426
RANSAC Regressor	0.518	0.751	−2.235	13.751	22.137	−0.821	34.574	44.278	0.453	0.770	0.944	0.481
KNN Regressor	0.177	0.265	0.598	4.889	7.228	0.806	18.739	30.386	0.742	0.519	0.770	0.655
Poisson Regressor	0.390	0.417	0.000	13.937	16.732	−0.040	54.602	59.889	0.000	1.100	1.331	0.032
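As a reading aid for the regression tables, the following sketch computes MAE, RMSE, and R² from the standard definitions, with R² = 1 − SSE/SST. It is an assumption that the reported values follow these definitions exactly; the sample values are invented for illustration. The second example shows how R² becomes negative when a model fits worse than predicting the mean of the observations, as in several Ridge, SVM, RANSAC, and Poisson entries.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R2) for paired observations and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [pred - obs for obs, pred in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    sse = sum(e * e for e in errors)               # sum of squared errors
    rmse = math.sqrt(sse / n)
    mean_obs = sum(y_true) / n
    sst = sum((obs - mean_obs) ** 2 for obs in y_true)  # total sum of squares
    if sst == 0:
        raise ValueError("R2 is undefined when all observations are identical")
    return mae, rmse, 1.0 - sse / sst


observed = [1.0, 2.0, 3.0, 4.0]

# A reasonable fit: small errors, R2 close to 1.
good = regression_metrics(observed, [1.5, 2.0, 2.5, 4.0])

# A fit worse than the mean of the observations: R2 drops below zero.
poor = regression_metrics(observed, [4.0, 3.0, 2.0, 1.0])
```

Because R² is normalized by the variance of the observations while MAE and RMSE keep the units of the target variable, only R² is directly comparable across EC, turbidity, WT, and DO.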

Table 4. Benchmark results of regression models for each target variable for Landsat 8–9.

Model	EC			Turbidity			WT			DO
Model	MAE	RMSE	R²	MAE	RMSE	R²	MAE	RMSE	R²	MAE	RMSE	R²
Linear Regression	0.227	0.272	0.576	10.097	13.961	0.276	6.887	8.233	0.981	0.476	0.576	0.807
Ridge Regression	0.379	0.404	0.061	13.784	16.378	0.003	9.224	12.439	0.957	0.522	0.616	0.779
Random Forest	0.168	0.262	0.605	6.245	10.494	0.591	2.658	3.663	0.996	0.268	0.368	0.921
Gradient Boosting	0.148	0.218	0.728	6.622	10.587	0.583	2.722	3.589	0.996	0.287	0.392	0.911
XGBoost	0.175	0.282	0.543	6.843	12.562	0.414	3.079	5.506	0.992	0.298	0.406	0.904
SVM Regressor	0.380	0.433	−0.076	13.495	19.497	−0.412	52.607	57.573	0.076	0.829	1.042	0.366
RANSAC Regressor	0.299	0.379	0.178	11.869	17.982	−0.202	6.922	8.260	0.981	0.623	1.159	0.217
KNN Regressor	0.211	0.326	0.389	7.763	13.064	0.366	4.308	6.871	0.987	0.365	0.519	0.843
Poisson Regressor	0.383	0.410	0.034	14.113	16.637	−0.029	13.073	16.727	0.922	0.529	0.624	0.773

Table 5. Benchmark results of regression models for each target variable for PlanetScope.

Model	EC			Turbidity			WT			DO
Model	MAE	RMSE	R²	MAE	RMSE	R²	MAE	RMSE	R²	MAE	RMSE	R²
Linear Regression	0.207	0.258	0.619	9.756	13.128	0.360	26.530	33.612	0.685	0.621	0.815	0.613
Ridge Regression	0.385	0.413	0.020	13.632	16.406	0.000	54.185	59.540	0.011	1.084	1.318	−0.012
Random Forest	0.185	0.271	0.578	7.264	11.810	0.482	26.112	40.752	0.537	0.612	0.890	0.538
Gradient Boosting	0.184	0.270	0.582	8.197	11.679	0.493	27.452	40.684	0.538	0.624	0.823	0.605
XGBoost	0.170	0.283	0.539	8.150	13.538	0.319	25.760	42.403	0.499	0.648	0.992	0.426
SVM Regressor	0.194	0.263	0.604	10.014	15.049	0.158	42.929	52.057	0.244	0.713	0.947	0.477
RANSAC Regressor	0.221	0.269	0.586	14.182	20.590	−0.575	32.051	43.412	0.474	0.779	0.987	0.432
KNN Regressor	0.139	0.223	0.713	5.731	9.546	0.661	25.088	39.978	0.554	0.544	0.815	0.613
Poisson Regressor	0.390	0.417	0.000	13.932	16.725	−0.039	54.877	60.116	−0.008	1.102	1.331	−0.033

Table 6. Performance of ML models for WQI classification across satellite sensors, based on spectral and temporal analyses.

Sensor	Analysis Type	Subset	AUC	Accuracy	R²
Sentinel-2	Spectral	Train	1.00	0.73	0.89
	Spectral	Test	1.00	0.53	0.84
	Temporal	Train	1.00	0.85	0.96
	Temporal	Test	0.99	0.53	0.82
Landsat 8–9	Spectral	Train	0.98	0.80	0.93
	Spectral	Test	0.97	0.53	0.51
	Temporal	Train	0.50	0.09	−3.41
	Temporal	Test	0.50	0.08	−3.56
PlanetScope	Spectral	Train	1.00	0.81	0.91
	Spectral	Test	0.97	0.42	0.77
	Temporal	Train	1.00	0.67	0.67
	Temporal	Test	0.94	0.44	0.10

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Batina, A.; Šiljeg, A.; Krtalić, A.; Šerić, L. SIGMaL: An Integrated Framework for Water Quality Monitoring in a Coastal Shallow Lake. Remote Sens. 2026, 18, 312. https://doi.org/10.3390/rs18020312

