1. Introduction
Eutrophication is one of the most persistent and widespread environmental challenges affecting aquatic ecosystems on a global scale. It occurs when water bodies receive excessive inputs of nutrients (primarily phosphorus and nitrogen), often from agricultural runoff, urban wastewater, and industrial discharges. These nutrients stimulate rapid growth of algae and phytoplankton, accelerating the natural aging process of lakes and rivers. When algal biomass increases beyond ecological balance, it leads to oxygen depletion through enhanced respiration and microbial decomposition, thereby triggering a cascade of adverse effects such as fish kills, loss of biodiversity, and degradation of aquatic habitats [1,2,3].
The consequences of eutrophication extend beyond ecological deterioration. Eutrophication can also impair drinking-water quality through the proliferation of harmful algal blooms (HABs), which produce toxins that pose risks to human health, affecting the skin, liver, and nervous system [4,5,6]. Additionally, eutrophication imposes substantial economic costs, including reduced fisheries productivity, loss of recreational value, and increased expenses for water treatment and ecosystem restoration [7]. As the climate warms and extreme precipitation events intensify, nutrient loading and bloom dynamics are expected to worsen, further elevating global concern [2,8].
Given the complexity and spatial heterogeneity of eutrophication processes, effective monitoring requires tools capable of capturing both spatial and temporal variability. Traditional field-based sampling, while accurate, is often labor-intensive, spatially limited, and inadequate for rapid ecosystem assessments. Remote sensing has therefore emerged as a powerful complementary approach, offering the ability to detect changes in water quality over large areas with high temporal frequency. Satellite and UAV-based sensors can detect chlorophyll-a, suspended particulate matter, and other optical properties associated with eutrophication, enabling early-warning systems and long-term monitoring frameworks [9]. In eutrophic conditions, elevated nutrient inputs stimulate phytoplankton growth, leading to increased chlorophyll-a concentrations and changes in the optical properties of the water column that can be detected remotely.
Modern advancements in remote sensing increasingly rely on Artificial Intelligence (AI) and machine learning (ML) techniques to extract meaningful information from complex spectral datasets. In particular, the integration of UAV-based multispectral imaging and machine learning provides a cost-effective, flexible, and near-real-time approach for assessing water quality, offering significant advantages over traditional manual monitoring and satellite remote sensing methods. UAV multispectral technology effectively captures the spectral response characteristics of specific water quality parameters, overcoming the spatial and temporal resolution limitations commonly associated with satellite data. Furthermore, ML algorithms can automatically learn complex patterns and extract meaningful features from multispectral imagery, enabling the accurate estimation of both optical and non-optical water quality parameters. For example, the study in [6] found that the Random Forest (RF) model outperformed other ML algorithms in predicting water quality parameters such as nitrite, nitrate, chlorophyll-a, phosphate, and suspended solids. That study also captured seasonal variations in water quality, with nutrient concentrations peaking in summer and declining in autumn, while chlorophyll-a peaked in autumn. According to Li et al. (2026), the k-means clustering algorithm is among the most widely used unsupervised learning approaches because of its computational simplicity and robustness [10]. By grouping pixels with similar spectral characteristics, k-means enables automated classification of water areas into categories reflecting varying degrees of eutrophication [9]. Several variants of k-means have been proposed over time [11], and the algorithm has been applied to many research topics [12,13,14].
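To make the grouping step concrete, the following is a minimal NumPy sketch of the standard Lloyd iteration behind k-means: each pixel is assigned to its nearest centroid, then each centroid moves to the mean of its pixels, until the centroids stop moving. It is an illustration under assumptions, not the implementation used in this study. The input is assumed to be a matrix with one row per water pixel and one column per spectral index, the two synthetic pixel groups are hypothetical, and the seeding is a simple deterministic farthest-first rule chosen here for reproducibility (a simplification of k-means++).

```python
import numpy as np


def farthest_first_init(X: np.ndarray, k: int, rng) -> np.ndarray:
    """Seed centroids: one random pixel, then repeatedly the pixel farthest from all seeds."""
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        # Distance of every pixel to its nearest already-chosen centroid.
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    return np.array(centroids, dtype=float)


def kmeans(X: np.ndarray, k: int, n_iter: int = 100, seed: int = 0):
    """Minimal Lloyd's k-means. X has shape (n_pixels, n_features)."""
    rng = np.random.default_rng(seed)
    centroids = farthest_first_init(X, k, rng)
    for _ in range(n_iter):
        # Assignment step: label each pixel with its nearest centroid (Euclidean distance).
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Update step: move each centroid to the mean of its pixels (keep it if the cluster is empty).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged: centroids no longer move
        centroids = new_centroids
    return labels, centroids


# Hypothetical example: two groups of pixels, each described by four index values.
rng = np.random.default_rng(1)
clear = rng.normal(0.0, 0.05, size=(50, 4))  # low index values (e.g., clear water)
algal = rng.normal(1.2, 0.05, size=(50, 4))  # high index values (e.g., algae-rich water)
X = np.vstack([clear, algal])
labels, centroids = kmeans(X, k=2)
```

In practice, library implementations such as scikit-learn's `KMeans` (which uses k-means++ seeding by default and can run several restarts via `n_init`) are usually preferred over a hand-written loop.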
Despite these advantages, the use of UAV-based remote sensing and unsupervised clustering techniques such as k-means also presents certain limitations. UAV data acquisition is sensitive to weather conditions, illumination variability, and flight altitude, which may introduce noise or inconsistencies in the spectral signal. Additionally, UAV surveys typically cover limited spatial extents, requiring careful planning to ensure representativeness. Regarding k-means clustering, the method requires the a priori selection of the number of clusters, which may influence the resulting classification and introduce subjectivity. Moreover, k-means assumes spherical cluster shapes and equal variance, which may not fully capture the complexity of natural aquatic systems. Finally, the algorithm is sensitive to outliers and noise, potentially leading to the formation of artefact clusters in areas affected by shallow water, turbulence, or mixed surface conditions.
The aim of this study is to (a) analyze high-resolution multispectral data collected by an unmanned aerial vehicle (UAV), (b) compute remote sensing indices sensitive to chlorophyll-a, and (c) identify and classify water areas according to their relative levels of eutrophication by applying the k-means clustering algorithm. By integrating remote sensing and machine learning, this work contributes to the development of efficient, cost-effective tools for monitoring, visualizing, and managing eutrophication in aquatic ecosystems. Overall, the study demonstrates that UAV-based multispectral imaging—combined with machine-learning classification—offers a dynamic and scalable approach to environmental monitoring, particularly for regions where conventional sampling is logistically challenging or insufficient.
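As an illustration of aim (b), the sketch below computes the Normalized Difference Chlorophyll Index (NDCI), one of the indices used later in this study, in its usual normalized-difference form (red-edge reflectance minus red reflectance, divided by their sum). The band arrays, their assumed wavelengths (a red-edge band near 705–720 nm and a red band near 665 nm, depending on the sensor), the small epsilon guarding against division by zero, and the function name are assumptions; the exact band combinations used for the indices in this study may differ.

```python
import numpy as np


def ndci(red_edge, red, eps: float = 1e-9) -> np.ndarray:
    """Normalized Difference Chlorophyll Index: (RedEdge - Red) / (RedEdge + Red)."""
    red_edge = np.asarray(red_edge, dtype=float)
    red = np.asarray(red, dtype=float)
    # eps avoids division by zero over very dark pixels (e.g., deep shadow).
    return (red_edge - red) / (red_edge + red + eps)


# Hypothetical surface reflectances for three water pixels.
red = np.array([0.040, 0.030, 0.050])
red_edge = np.array([0.060, 0.030, 0.040])
vals = ndci(red_edge, red)
print(np.round(vals, 3))
```

Higher NDCI values generally indicate stronger red-edge reflectance relative to red absorption, which is why the index is sensitive to chlorophyll-a.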
Research Site
The study area is Almyros, a suburban stream located in the western basin of Heraklion (Crete, Greece), approximately 8 km northwest of the city center (Figure 1). It is noted for its ecological significance and its unique karst hydrology [15].
The stream is about 1.8 km long, with a width ranging from 5 to 20 m [16]. The Almyros ecosystem holds significant environmental value and is included among UNESCO-protected natural sites [17]. At the stream’s source, a dam has been constructed to increase aquifer pressure, prevent seawater intrusion, and improve groundwater quality. Despite these measures, the stream’s water exhibits high salinity due to the mixing of seawater within the aquifer that feeds the spring. Additionally, the karstic nature of the region complicates water management, as water moves rapidly through subterranean channels with minimal natural filtration [16,17,18].
Although the broader Almyros area (Figure 1) is not highly urbanized, the Impervious Density Index [14] indicates that it is subject to considerable anthropogenic pressures [15]. Agricultural activities dominate the eastern part of the area near the dam and are closely linked to the eutrophication observed in the stream [18]. In the northern coastal zone, land uses related to tourism and recreation have become increasingly established [18].
Overall, the Almyros Stream (Figure 1) represents a valuable yet highly vulnerable hydrological system, playing an important role both in Crete’s natural ecology and in the local water supply. However, ongoing pressures from agriculture, tourism development, and water extraction underscore the need for effective management and protection strategies to ensure its long-term sustainability [18].
3. Results
In this section, the results obtained from the application of the k-means algorithm are presented. The primary objective of the clustering procedure was to achieve a spatial subdivision of water pixels into categories with distinct characteristics, enabling the assessment of eutrophication levels.
3.1. Cartographic Results
The cartographic analysis of the k-means clustering results highlights three key findings. First, the spatial representation of eutrophication is highly sensitive to the selected number of clusters. Second, low k values lead to excessive spatial generalization, whereas high k values result in over-fragmentation and reduced interpretability. Third, intermediate k values provide the most balanced representation of spatial variability, effectively capturing eutrophication patterns while preserving spatial coherence. The implementation of the k-means algorithm produced raster outputs in which each pixel was assigned to one of the derived clusters. These cluster maps illustrate the spatial distribution of the identified groups across the water surface, offering an initial depiction of the spatial variability of eutrophication. The following paragraphs provide a detailed cartographic analysis of these findings, examining how different k values influence the spatial representation and interpretability of eutrophication patterns.
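The step from index rasters to a cluster map can be sketched with scikit-learn's `KMeans`, as below. This is an assumed workflow rather than the authors' code: the four input rasters (standing in for the spectral indices), the boolean water mask, the -1 no-data value, and the helper name `cluster_index_stack` are all hypothetical, and random values replace real UAV data.

```python
import numpy as np
from sklearn.cluster import KMeans


def cluster_index_stack(stack: np.ndarray, water_mask: np.ndarray, k: int, seed: int = 0):
    """Cluster the water pixels of a (bands, H, W) index stack; return an (H, W) label map."""
    _, h, w = stack.shape
    # One row per water pixel, one column per index: shape (n_water_pixels, bands).
    X = stack[:, water_mask].T
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
    # Write labels back onto the grid; -1 marks pixels outside the water mask.
    label_map = np.full((h, w), -1, dtype=int)
    label_map[water_mask] = km.labels_ + 1  # 1-based cluster IDs, as in the cluster maps
    return label_map, km


# Hypothetical inputs: four index rasters on a 20 x 30 grid with a land strip on the left.
rng = np.random.default_rng(0)
stack = rng.random((4, 20, 30))
water_mask = np.ones((20, 30), dtype=bool)
water_mask[:, :5] = False
label_map, km = cluster_index_stack(stack, water_mask, k=5)
```

Calling the same function for each k from 2 to 11 would produce the series of maps compared in this section; fixing `random_state` keeps the maps reproducible across runs.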
The cartographic outputs reveal that the level of spatial detail varies substantially with the choice of k. For lower values (k = 2–4), the resulting maps exhibit highly generalized spatial patterns, characterized by extensive homogeneous areas and a limited ability to distinguish finer spatial variations (Figure A1, Appendix A). At this level, the clusters represent only the most prominent spatial contrasts, failing to capture internal heterogeneity within the water body. As a result, these maps provide a useful general overview but are insufficient for detailed spatial interpretation.
At higher k values (k = 8–11), the clusters become increasingly fragmented, producing overly detailed maps composed of small, isolated spatial units (Figure A2, Figure A3, Figure A4 and Figure A5, Appendix A). While this high granularity allows subtle differences to be detected, it often introduces excessive complexity. Previous studies have shown that chlorophyll-a is spatially correlated and tends to exhibit a clustered distribution [37,38]; small, isolated clusters are therefore unlikely to reflect genuine environmental differences. In many cases, they arise from minor data fluctuations, indicating over-fragmentation of the dataset. Between these two extremes, intermediate values (k = 5–7) offer a more balanced spatial representation (Figure A6 and Figure A7, Appendix A). Within this range, the clusters clearly reflect spatial variability while avoiding excessive fragmentation. Consequently, these k values strike a desirable balance by providing sufficient spatial detail while maintaining interpretability.
Overall, the cartographic analysis demonstrates that low k values lead to over-generalization, whereas high k values introduce unnecessary detail and fragmentation. Intermediate k values yield the most coherent and meaningful representation of spatial eutrophication patterns.
3.2. Assessment of Clustering Quality
Evaluating clustering quality is a critical component of any unsupervised learning workflow, as it supports the identification of the number of clusters that yield reliable and meaningful data partitions. In this study, the evaluation was based on a combination of graphical interpretations and quantitative metrics, with the goal of minimizing subjectivity and strengthening the robustness of the results.
Graphical representations were produced to visualize the behavior of the k-means algorithm across different values of k. Figure 3 presents the computed evaluation curves derived from the Elbow method, the Silhouette coefficient, the Calinski–Harabasz (CH) index, and the Davies–Bouldin (DBI) index.
In Figure 3, Graph 1 corresponds to the Elbow plot, which illustrates how the within-cluster sum of squares (WCSS, or inertia) changes as the number of clusters increases [9,31,39]. As expected, inertia decreases progressively with increasing k. The “elbow” point marks where further reductions in inertia become marginal, providing an indication of the optimal number of clusters.
Graph 2 in Figure 3 presents the Silhouette index, which incorporates both cluster cohesion and separation [9,32]. The highest Silhouette values occur at k = 2, indicating well-separated clusters but limited spatial differentiation. For k = 5–6, the index reaches moderate and stable values, suggesting a more balanced structure that captures spatial variability without excessive fragmentation. At higher k values, the Silhouette score declines, reflecting reduced coherence and a tendency toward over-segmentation.
Graph 3 in Figure 3 shows the Calinski–Harabasz (CH) index, which expresses the ratio of between-cluster dispersion to within-cluster dispersion, so that higher values indicate better-separated and more compact clusters [31]. Elevated CH values are observed for k > 5, with particularly high values at k = 10–11, indicating improved separability but also suggesting increasingly complex and fragmented cluster structures.
Graph 4 in Figure 3 displays the Davies–Bouldin Index (DBI), which measures the average similarity between each cluster and its most similar cluster [9,31]. Lower DBI values, observed at k = 2–3, correspond to compact, well-separated clusters, whereas increasing DBI values indicate reduced cluster distinction and lower classification quality.
The combined interpretation of all evaluation metrics indicates that lower k values (k = 2–4) yield high cohesion and strong separation but provide insufficient spatial detail. Conversely, higher k values (k = 8–11) lead to overly fragmented classifications that reduce interpretability. The intermediate range (k = 5–7) achieves a more balanced outcome, preserving meaningful spatial differentiation while maintaining adequate cohesiveness.
Based on these evaluation indicators, the optimal number of clusters was determined to be k = 5, as this value provides the best compromise between cohesion and separation. The Elbow curve shows a pronounced reduction in inertia up to k = 5, after which the curve begins to flatten, indicating diminishing returns and an increased risk of over-fragmentation. Although the Silhouette index reaches its highest values at lower k, these correspond to overly generalized maps; at k = 5, Silhouette values remain acceptable, while they decline for k > 5. The Calinski–Harabasz index attains one of its highest values at k = 5, indicating good separation relative to compactness. Finally, although the Davies–Bouldin Index is lowest at lower k values, it remains comparatively low at k = 5 before increasing substantially at higher values (k = 8–11). Taken together, the metrics jointly support k = 5 as the most appropriate solution, achieving an effective balance between cluster cohesion, separation, and spatial interpretability.
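The four curves in Figure 3 can be computed with standard scikit-learn functions, as in the sketch below. It assumes a pixel-by-index matrix `X` (random here, purely for illustration) and k = 2 to 11 as in this section; the helper name `evaluate_k` is hypothetical. Because the Silhouette coefficient scales quadratically with the number of pixels, it is evaluated on a random subsample, which is an assumption that may differ from the procedure used for Figure 3.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import (calinski_harabasz_score, davies_bouldin_score,
                             silhouette_score)


def evaluate_k(X: np.ndarray, k_values, seed: int = 0, silhouette_sample: int = 5000):
    """Return one record of clustering-quality metrics per candidate k."""
    records = []
    for k in k_values:
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
        labels = km.labels_
        records.append({
            "k": k,
            "wcss": km.inertia_,                         # Elbow: within-cluster sum of squares
            "silhouette": silhouette_score(               # in [-1, 1], higher is better
                X, labels, sample_size=min(silhouette_sample, len(X)), random_state=seed),
            "ch": calinski_harabasz_score(X, labels),     # higher is better
            "dbi": davies_bouldin_score(X, labels),       # lower is better
        })
    return records


# Hypothetical pixel-by-index matrix (600 pixels x 4 indices).
X = np.random.default_rng(0).random((600, 4))
results = evaluate_k(X, range(2, 12))
for r in results:
    print(f"k={r['k']:2d}  WCSS={r['wcss']:.2f}  Sil={r['silhouette']:.3f}  "
          f"CH={r['ch']:.1f}  DBI={r['dbi']:.3f}")
```

Plotting each metric against k reproduces the layout of the four graphs; the choice of k then remains a judgment that weighs these curves against map interpretability.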
Next, the relationships among the five clusters are further examined using a dendrogram and a heatmap (Figure 4 and Figure 5), providing additional insight into the distances among them and supporting the validity of the final classification.
Figure 4 presents the dendrogram, which hierarchically visualizes the relative distances between clusters and the sequence in which they merge [40,41]. The structure of the dendrogram shows that clusters 1 and 4 exhibit the smallest distance, indicating the highest degree of similarity, whereas cluster 2 appears more distinct from the remaining clusters.
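A dendrogram of this kind can be built by applying hierarchical clustering to the k-means centroids, for example with SciPy as sketched below. The centroid values are invented for illustration (they are not the study's values), and both the average-linkage rule and the Euclidean metric are assumptions, since the linkage used for Figure 4 is not stated in this section.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

# Hypothetical centroids: rows are clusters 1-5, columns are four index values.
centroids = np.array([
    [1.0, 1.2, 1.2, 2.0],  # cluster 1
    [0.0, 0.0, 0.0, 0.0],  # cluster 2
    [1.2, 0.6, 0.9, 1.5],  # cluster 3
    [1.1, 1.3, 1.3, 2.0],  # cluster 4
    [0.5, 0.3, 0.2, 3.0],  # cluster 5
])

# Agglomerative merging of the centroids (average linkage, Euclidean distance).
Z = linkage(centroids, method="average", metric="euclidean")

# Each row of Z: [member_a, member_b, merge_distance, n_original_clusters].
# Indices 0-4 are clusters 1-5; indices >= 5 refer to groups formed by earlier merges.
for a, b, dist, size in Z:
    print(f"merge {int(a)} + {int(b)} at distance {dist:.3f} (size {int(size)})")

# no_plot=True returns the tree layout without drawing; drop it (with matplotlib) to plot.
tree = dendrogram(Z, labels=["C1", "C2", "C3", "C4", "C5"], no_plot=True)
print(tree["ivl"])  # leaf order along the horizontal axis
```

With these hypothetical values, clusters 1 and 4 merge first and cluster 2 joins last, which is the kind of pattern the dendrogram in Figure 4 is read for.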
Figure 5 shows the heatmap, which provides a complementary, quantitative representation of the pairwise distances between clusters [42]. Darker colors correspond to smaller distances (e.g., between clusters 1 and 4 and between clusters 4 and 5), while lighter shades indicate larger dissimilarities.
Overall, the combined evaluation of all clustering indices indicated that k = 5 constitutes the most suitable number of clusters, as it achieves an optimal balance between spatial detail, internal cohesion, and separation among groups. This conclusion is further supported by the spatial patterns derived from the cartographic interpretation (Figure 6).
3.3. Interpretation of the Results
The application of the k-means algorithm produced five clusters, each representing a different level of eutrophication within the aquatic environment of Almyros Stream. This classification is based exclusively on remote sensing variables related to chlorophyll concentration and water optical properties, providing a comprehensive spatial depiction of eutrophication.
As demonstrated in the preceding evaluation, k = 5 was identified as the most appropriate number of clusters. The following interpretation links the numerical and cartographic outputs to environmental characteristics associated with eutrophication, thereby assigning practical meaning to the clustering results and enhancing their usefulness as a monitoring tool for aquatic ecosystems.
3.3.1. Visual Representation
Cluster 5 represents a special case, as it does not correspond to an actual eutrophication level. Instead, it includes areas where the water mask failed to accurately isolate water pixels. The adjacency effect (the influence of surrounding land on water pixels, which leads to unreliable spectral responses) is a well-known limitation in remote sensing–based water quality studies [43,44]. As a result, this cluster contains several types of misclassified regions, including areas of high water clarity (where shallow depths allow the streambed to be visible and the spectral signal is dominated by bottom reflectance), riparian vegetation [45], and zones of strong turbulence [46]. Such conditions commonly occur beneath bridges, near rocky formations at the dam outflow, and in locations where high flow velocities disrupt the optical signal from the water column.
In some cases, non-water surfaces were also assigned to this cluster. For example, bridge structures were misclassified because their metal surfaces reflect light very differently from water [38]. Visual inspection indicates that these artefacts are particularly concentrated near the stream mouth, where shallow water depth and complex lighting conditions increase the likelihood of misclassification [46].
The classification of cluster 5 as an “artefact” category is therefore supported primarily by visual assessment. Features such as shallow water, vegetated banks, turbulent flow, and artificial surfaces (e.g., bridges) cannot be reliably characterized using remote sensing indices alone. The corresponding images (Figure 7, Figure 8, Figure 9, Figure 10 and Figure 11) illustrate these conditions and clearly demonstrate the distinct nature of this cluster.
Before proceeding with the detailed interpretation of the clusters generated by the algorithm, it is necessary to outline certain assumptions that provide a conceptual framework for interpreting the results. In areas with greater water depth and higher flow velocities, chlorophyll concentrations are generally expected to be lower, as continuous water renewal and increased turbulence inhibit phytoplankton growth. Similarly, lower chlorophyll values are anticipated in the estuarine zone, where mixing with seawater typically creates conditions that are less favorable for eutrophication [47,48,49].
Conversely, elevated chlorophyll concentrations are more likely to occur in shallow waters or in areas of low-velocity flow. Such stagnant or weakly flowing conditions restrict water renewal, creating an environment conducive to increased chlorophyll production and the development of eutrophication [50]. These assumptions provide an interpretative framework that can be meaningfully combined with the numerical and cartographic outputs of the clustering process, as demonstrated by the spatial correspondence between cluster distribution patterns, trophic scores, and the hydrological conditions observed in the results.
For clarity of interpretation, the five clusters were grouped into four eutrophication levels (1 = low, 4 = high), while Cluster 5 was designated as an artefact (Table 1).
3.3.2. Chlorophyll-Based Representation
To quantitatively depict eutrophication levels, a composite index, referred to as the Trophic Score (TS), was calculated for each cluster (Table 2). The TS was defined as the arithmetic mean of the normalized remote sensing indices, as expressed in Equation (6):
$$\mathrm{TS} = \frac{1}{n}\sum_{i=1}^{n} I_i \tag{6}$$
where $I_i$ represents the normalized value of the $i$-th remote-sensing index, and $n$ is the total number of indices used. This composite formulation follows a commonly used approach in remote-sensing–based water quality studies, where normalized indices are equally weighted to derive an integrated indicator [34,51]. This procedure consisted of several steps designed to ensure an objective and balanced comparison among clusters.
In the first step, the mean value of each remote-sensing index was computed for every cluster. These indices capture the spectral signatures associated with chlorophyll presence and collectively describe the optical profile of each cluster. The resulting average values provide a concise representation of the trophic characteristics of the water pixels assigned to each group. In the subsequent step, these mean values were normalized and aggregated into a single composite measure by calculating their mean. This process yielded the TS (Table 2), an integrated metric that expresses the trophic intensity of each cluster by equally incorporating information from all indices. This approach reduces dependence on any single index and provides a clearer, more robust assessment of eutrophication levels.
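Equation (6) and the steps above can be sketched with pandas. This is an illustration under stated assumptions: the normalization applied to the indices is not specified in this section, so min–max scaling of the cluster means across clusters is used as a stand-in (it will not reproduce the values in Table 2), and the column names, the toy pixel values (two indices for brevity), and the helper name `trophic_score` are hypothetical.

```python
import pandas as pd


def trophic_score(pixels: pd.DataFrame, index_cols, label_col="cluster", exclude=(5,)):
    """TS per cluster: mean of normalized per-cluster index means (Equation 6)."""
    valid = pixels[~pixels[label_col].isin(exclude)]  # drop the artefact cluster
    # Step 1: mean value of each remote-sensing index within each cluster.
    means = valid.groupby(label_col)[index_cols].mean()
    # Step 2 (assumed scheme): min-max normalize each index across clusters to [0, 1].
    normalized = (means - means.min()) / (means.max() - means.min())
    # Step 3: TS = arithmetic mean of the n normalized indices (equal weights).
    return normalized.mean(axis=1)


# Hypothetical pixels: two per cluster, cluster 5 being the artefact class.
pixels = pd.DataFrame({
    "cluster": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
    "MCI":     [1.0, 1.1, 0.0, 0.01, 0.8, 0.9, 1.2, 1.3, 3.0, 3.0],
    "NDCI":    [1.1, 1.2, 0.0, 0.02, 0.7, 0.8, 1.3, 1.4, 2.0, 2.0],
})
ts = trophic_score(pixels, ["MCI", "NDCI"])
print(ts.sort_values())
```

Equal weighting is what makes the TS insensitive to any single index; a different normalization would change the absolute TS values but, for indices that rise together, usually not the ranking of clusters.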
Based on the TS (Table 2), each cluster was assigned to one of four environmental categories, ranging from 1 (low eutrophication) to 4 (high eutrophication). Specifically, Cluster 2 exhibited near-zero index values and the lowest TS, placing it in Level 1. Cluster 1 showed moderately elevated index values, corresponding to Level 3, while Cluster 4 exhibited the highest index values and the highest TS and was therefore classified as Level 4 (Table 2).
3.3.3. Representation Based on Distance
The cluster distance analysis constituted the next step in understanding the relationships among clusters and in validating the proposed environmental classification. Distances (Figure 4 and Figure 5) were calculated using the cluster centroids, providing a measure of similarity and dissimilarity among the groups [40,41]. These distances were not used to redefine the classification but rather to confirm the distinction among the proposed categories and to document the relative positioning of each cluster within the overall trophic scheme. In this sense, they serve as a complementary tool to the TS, highlighting which clusters are closely related and which are clearly differentiated, thereby offering a more comprehensive understanding of the classification structure.
The results in Table 3 indicate that cluster 1 is relatively close to cluster 4, suggesting that both represent the higher end of the eutrophication gradient, with cluster 4 corresponding to the most intense conditions. Cluster 3 occupies an intermediate position, forming a bridge between cluster 1 and cluster 2, which exhibits the lowest trophic values. This pattern identifies cluster 3 as a transitional category, connecting low to moderate eutrophication levels. By contrast, the largest distance is observed between clusters 2 and 4, confirming the strong contrast between the least and most eutrophic conditions within the stream system. Accordingly, the numerical labels 1–4 are interpreted as environmental classes consistent with the previous classification scheme (1 = low, 4 = high).
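The pairwise centroid distances behind this kind of analysis can be computed as sketched below. The centroids are hypothetical placeholders and Euclidean distance is an assumption; the snippet builds the full distance matrix (the data a distance heatmap would display) and then, excluding the artefact cluster 5, reports the closest and most distant pairs of valid clusters. The hypothetical values were chosen so that the output mirrors the qualitative pattern described above; they are not measurements.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Hypothetical centroids: rows are clusters 1-5, columns are four index values.
centroids = np.array([
    [1.0, 1.2, 1.2, 2.0],  # cluster 1
    [0.0, 0.0, 0.0, 0.0],  # cluster 2
    [1.2, 0.6, 0.9, 1.5],  # cluster 3
    [1.1, 1.3, 1.3, 2.0],  # cluster 4
    [0.5, 0.3, 0.2, 3.0],  # cluster 5
])

# Full 5 x 5 Euclidean distance matrix between centroids.
D = squareform(pdist(centroids, metric="euclidean"))

# Restrict to the valid clusters 1-4 (cluster 5 is the artefact class).
valid = [0, 1, 2, 3]
sub = D[np.ix_(valid, valid)]
rows, cols = np.triu_indices_from(sub, k=1)  # each unordered pair once
pair_d = sub[rows, cols]
i, j = pair_d.argmin(), pair_d.argmax()
closest = (int(valid[rows[i]]) + 1, int(valid[cols[i]]) + 1)
farthest = (int(valid[rows[j]]) + 1, int(valid[cols[j]]) + 1)

print(np.round(D, 2))
print("closest pair:", closest, "most distant pair:", farthest)
```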
3.3.4. Environmental Characterization of Clusters
To convert the numerical output of the k-means algorithm into qualitative eutrophication categories, a two-step procedure was implemented: (a) an initial ranking based on the TS, and (b) confirmation or refinement of the classification using the distances among clusters.
(a) Initial ranking based on TS values
The TS was calculated for all clusters (excluding cluster 5, which was identified as an artefact). The four valid clusters were then ranked from lowest to highest trophic intensity according to their mean TS values:
TS2 = 0.0023
TS3 = 1.0990
TS1 = 1.3578
TS4 = 1.4235
This yields the following order: TS2 < TS3 < TS1 < TS4.
To derive the corresponding eutrophication classes, linear interpolation was applied to generate three classes (n = 3), corresponding to Low, Medium, and High eutrophication levels, as shown in Table 4. At this stage, the Medium class represents an intermediate trophic condition; it was subsequently subdivided into Medium and Medium–High levels based on the inter-cluster distance analysis. These class thresholds provided the initial boundaries for assigning each cluster to one of the four eutrophication levels.
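The ranking in step (a) can be checked directly from the reported TS values, as in the short snippet below. The dictionary simply restates the four values listed above, with cluster 5 left out as the artefact class.

```python
# TS values reported above for the four valid clusters (cluster 5 excluded as an artefact).
ts = {1: 1.3578, 2: 0.0023, 3: 1.0990, 4: 1.4235}

# Rank clusters from lowest to highest trophic intensity.
order = sorted(ts, key=ts.get)
print(" < ".join(f"TS{c}" for c in order))  # TS2 < TS3 < TS1 < TS4
```

Note that TS1 and TS4 differ by only about 0.07, whereas TS2 is separated from the rest by more than 1.0, which is why the distance analysis in step (b) is useful for separating the upper classes.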
(b) Confirmation using distances
The distance-based analysis provided an additional validation of the environmental classification derived from the TS (Table 5). The results show that Cluster 2 is clearly isolated, exhibiting both very low index values and large distances from the other clusters. This confirms its classification as Low eutrophication (Table 5). At the opposite end of the gradient, Cluster 4 displays the greatest dissimilarity from the remaining clusters and is positioned at the upper extreme of the distribution (Table 5). This supports its classification as High, representing the most eutrophic conditions. Cluster 1 occupies an intermediate position, showing moderate distances from both Cluster 3 and Cluster 4 (Table 5). When combined with its TS, this pattern indicates a medium eutrophication level, with a tendency toward higher trophic intensity. Cluster 3 lies between Cluster 2 and the group formed by Clusters 1 and 4 (Table 5). Although its TS places it numerically closer to lower levels, the distance analysis reveals stronger affinity with the medium-to-high portion of the gradient. This converging evidence supports its final classification as Medium, identifying it as a transitional cluster connecting lower and higher eutrophication conditions. Finally, Cluster 5, which was shown to consist primarily of artefacts (e.g., shallow water, riparian vegetation, turbulent flow, reflective surfaces), was confirmed as a non-valid category and excluded from the environmental classification (Table 5).
In this way, the evaluation process did more than simply confirm the presence of distinct classes; it also served a corrective role, particularly for Cluster 3, ensuring a more balanced and reliable environmental classification. The combined interpretation of the TS and the inter-cluster distance relationships allowed the clusters to be robustly characterized.
Table 6 summarizes the environmental classification derived from the integrated evaluation of cluster distances and the TS, consistent with the previous classification scheme.
The percentage distribution of the clusters produced by the k-means analysis is presented in Figure 12. The largest proportion of the stream area is represented by Cluster 1 (46.85%) and Cluster 4 (35.40%), corresponding to medium and high eutrophication levels, respectively. In contrast, Clusters 2 and 3, which reflect lower-intensity conditions, occupy substantially smaller areas, accounting for 5.26% and 5.22% of the stream surface, respectively. Finally, Cluster 5 (7.27%), classified as an artefact, corresponds to pixels that were not correctly identified as part of the water body (Figure 12).
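Area shares such as those in Figure 12 follow from counting pixels per cluster in the label raster; a pixel share equals an area share under the assumption that every pixel has the same ground footprint, as in a raster of uniform resolution. The sketch below uses a tiny hypothetical label map, with -1 as an assumed no-data value for pixels outside the water mask; the helper name is likewise hypothetical.

```python
import numpy as np


def cluster_area_percent(label_map: np.ndarray, nodata: int = -1) -> dict:
    """Percentage of water pixels assigned to each cluster."""
    labels = label_map[label_map != nodata]  # drop pixels outside the water mask
    ids, counts = np.unique(labels, return_counts=True)
    return {int(i): 100.0 * c / labels.size for i, c in zip(ids, counts)}


# Hypothetical 3 x 4 label map: clusters 1-5 inside the water body, -1 on land.
label_map = np.array([
    [1, 1, 4, -1],
    [1, 4, 2, -1],
    [3, 5, 1, -1],
])
shares = cluster_area_percent(label_map)
print({k: round(v, 2) for k, v in shares.items()})
```

Multiplying each pixel count by the squared ground sampling distance would give absolute areas instead of percentages.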
Figure 13 presents the spatial distribution of eutrophication classes obtained from k-means clustering. Water surfaces are assigned to four trophic levels, and Cluster 5 is excluded as an artefact class.
To characterize the clusters based on spectral indices, the mean and standard deviation (SD) of the four spectral indices (MCI, NDAI, NDCI and CI) were calculated for the four valid clusters (Table 7). Cluster 1 exhibits relatively high mean values across all indices (MCI = 1.03, NDAI = 1.23, NDCI = 1.17, CI = 1.99) with consistently low SDs, indicating homogeneous spectral behavior and the presence of algal biomass. Cluster 2 presents near-zero mean values across all indices with the smallest variability, indicating negligible chlorophyll. Cluster 3 demonstrates intermediate index values (e.g., MCI = 1.17) but relatively lower NDAI (0.60) and NDCI (0.89), accompanied by slightly higher SDs across all indices. This heterogeneous spectral composition may represent transitional or mixed zones such as shallow or turbid waters, partially vegetated surfaces, or regions with variable biomass density. Finally, Cluster 4 is characterized by the highest mean values of NDAI (1.33), NDCI (1.30), and CI (2.02) and low SDs, suggesting that it corresponds to highly productive and spectrally stable regions. The consistency across all indices suggests the presence of algal blooms, typically corresponding to areas of elevated trophic status and biological activity.
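A per-cluster summary of this kind can be produced with a pandas group-by over a table of water pixels, as sketched below. The column names and values are hypothetical (two indices for brevity), and pandas' `std` computes the sample standard deviation (`ddof=1`) by default, which may differ from the convention used for Table 7.

```python
import pandas as pd


def index_summary(pixels: pd.DataFrame, index_cols, label_col="cluster", exclude=(5,)):
    """Mean and standard deviation of each index per valid cluster."""
    valid = pixels[~pixels[label_col].isin(exclude)]  # drop the artefact cluster
    return valid.groupby(label_col)[index_cols].agg(["mean", "std"]).round(2)


# Hypothetical pixel table with two indices (four would be used in practice).
pixels = pd.DataFrame({
    "cluster": [1, 1, 2, 2, 5],
    "MCI":     [1.0, 1.2, 0.0, 0.0, 9.0],
    "NDCI":    [1.1, 1.3, 0.01, 0.03, 9.0],
})
summary = index_summary(pixels, ["MCI", "NDCI"])
print(summary)
```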
4. Discussion
Compared to traditional point-based measurements or single-index approaches, the multidimensional dataset employed here captures eutrophication as a spatially continuous and multivariate process, allowing subtle but environmentally meaningful patterns to emerge.
The clustering analysis for k = 5 revealed five distinct spectral patterns, each corresponding to different eutrophication levels and water-quality conditions in the Almyros Stream (Figure 1). The clusters displayed a coherent spatial organization, indicating a strong relationship between spectral variability and eutrophication. Cluster 2 was located primarily in the estuarine zone, where spectral indicator values were consistently low (Figure 6). This reflects areas with clearer water, reduced phytoplankton presence, and likely lower nutrient concentrations (Figure 12 and Figure 13). Cluster 3 exhibited moderate indicator values and represents a transitional zone in which intermediate spectral changes occur, likely driven by localized biological activity or inputs from nearby agricultural land (Figure 6). These areas likely correspond to the mixing of relatively clear water with slightly enriched or mildly affected water masses (Figure 12 and Figure 13). Clusters 1 and 4 presented elevated chlorophyll-related values, corresponding to medium and high eutrophication levels, respectively (Figure 6). Spatially, these clusters were associated with areas influenced by agricultural runoff, reinforcing the link between eutrophication intensity and anthropogenic activities (Figure 12 and Figure 13). The spatial distribution of these clusters provides essential insights into the hydrological and ecological functioning of the stream. In the eastern sector, near the dam where water depth is greater, Cluster 1 predominates, indicating medium eutrophication levels (Figure 12 and Figure 13). In the central part of the stream, Cluster 4 becomes more prevalent, highlighting high trophic values. This pattern is likely related to shallower, more stagnant conditions, which are known to promote phytoplankton growth (Figure 12 and Figure 13).
Along the stream banks, Cluster 4 appears frequently, whereas the central channel is mainly characterized by Cluster 1, suggesting the coexistence of different environmental regimes within the stream. These variations may be explained by local hydrodynamic differences and the influence of external nutrient sources. The cluster distribution confirms that areas with restricted flow are more vulnerable to eutrophication. In contrast, the estuary shows a sharp shift in pattern, dominated by Clusters 2 and 3, which correspond to low or moderate eutrophication levels. This is consistent with the mixing of freshwater with seawater and with salinization processes that typically inhibit chlorophyll accumulation. Overall, the predominance of Clusters 1 and 4 suggests that the stream exhibits medium to high levels of eutrophication. The widespread presence of medium values and the significant extent of high values indicate that, despite localized variations, the system maintains generally elevated trophic conditions. The limited spatial coverage of low-eutrophication areas (Clusters 2 and 3) is insufficient to influence the overall environmental characterization of the stream.
Beyond the qualitative interpretation of the spatial distribution of clusters, a quantitative synthesis was required to support their environmental classification. Within this framework, the TS proved to be a key interpretative tool, bridging unsupervised spectral clustering and environmental assessment. By integrating multiple chlorophyll-sensitive indices into a single composite metric, the TS reduced the influence of the variability of any individual index and enabled a robust ranking of clusters along a relative eutrophication gradient. This facilitated objective comparison between spatially distinct water areas and strengthened the environmental interpretation of the clustering results. The agreement between the TS-based classification, the cluster-distance relationships, and known hydrological and anthropogenic controls [15,16,18] further supports the validity of the TS as an effective proxy for relative trophic intensity in UAV-based stream and river monitoring.
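The exact TS definition is not restated in this section, so the sketch below only illustrates the general idea of collapsing several chlorophyll-sensitive indices into one score per cluster and ranking clusters by it. It assumes min-max rescaling of the cluster-mean value of each index and equal weights across indices, which may differ from the actual TS formulation; the function name `trophic_score` and all input values are hypothetical.

```python
import numpy as np


def trophic_score(index_stack, labels, cluster_ids):
    """Composite score per cluster from several chlorophyll-sensitive indices.

    index_stack : (n_indices, n_pixels) array of index values
    labels      : (n_pixels,) array of cluster labels
    Assumes every index increases with chlorophyll-a, so a higher score
    means a higher relative trophic level.
    """
    x = np.asarray(index_stack, dtype=float)
    labels = np.asarray(labels)
    # Mean of each index inside each cluster -> shape (n_indices, n_clusters)
    means = np.stack([x[:, labels == c].mean(axis=1) for c in cluster_ids], axis=1)
    # Min-max rescale every index across clusters so no single index dominates
    lo = means.min(axis=1, keepdims=True)
    span = means.max(axis=1, keepdims=True) - lo
    span[span == 0] = 1.0          # a constant index then contributes 0 everywhere
    scores = ((means - lo) / span).mean(axis=0)   # equal weights (assumption)
    return {int(c): float(s) for c, s in zip(cluster_ids, scores)}


# Two hypothetical indices sampled on eight pixels belonging to four clusters
labels = np.array([1, 1, 2, 2, 3, 3, 4, 4])
indices = np.array([[0.5, 0.5, 0.1, 0.1, 0.2, 0.2, 0.9, 0.9],
                    [0.4, 0.6, 0.0, 0.2, 0.3, 0.3, 0.8, 1.0]])
ts = trophic_score(indices, labels, cluster_ids=[1, 2, 3, 4])
ranking = sorted(ts, key=ts.get, reverse=True)   # most to least eutrophic
print(ts, ranking)
```

Rescaling before averaging is what lets indices with different numeric ranges contribute comparably, which is the "reduced influence of individual index variability" described above.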
The results are consistent with previous studies demonstrating that methods based on multispectral imagery from unmanned aerial vehicles (UAVs) are effective for high-resolution mapping of similar ecosystems [52], particularly for generating spatial distribution maps of chlorophyll-a (Chl-a) [53]. UAV-based approaches address the limitations of traditional in situ sampling, which often cannot capture complete spatial distributions, cover large areas within a single tidal or hydrological phase, or adequately represent long-term trends [54]. Other studies have shown that applying the k-means clustering algorithm to Chl-a time-series data is effective for detecting potential impacts of external drivers on chlorophyll dynamics [55].
Beyond aquatic ecosystems, k-means clustering applied to vegetation indices has also been widely and successfully used in agricultural research. For example, Shi et al. (2024) [56] combined UAV-derived NDVI and other vegetation indices with k-means clustering to delineate management zones in crop fields, enabling targeted fertilization and differentiated management practices. This approach has been validated for corn and soybean fields, where clustering based on NDVI and terrain attributes facilitated effective zoning and management. Moreover, Ferro et al. (2023) [57] applied k-means clustering to multispectral UAV images and vegetation indices (NDVI, NDRE, GNDVI, MSAVI) to identify zones of low, medium, and high vegetative vigor, supporting agronomic decision-making and yield prediction. Finally, Krklješ et al. (2025) [58] defined compact zones in blueberry orchards by applying k-means clustering to NDVI derived from UAV multispectral orthomosaics, thereby supporting optimized soil sampling and orchard management.
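The pipeline shared by these studies (compute vegetation indices per pixel, then partition the index space with k-means) can be sketched as follows. The index formulas are the standard NDVI, NDRE, GNDVI, and MSAVI (MSAVI2 closed form) definitions; the band reflectances, k = 2, and the deterministic farthest-first initialisation are assumptions chosen for a reproducible toy example, not the settings of [56], [57], [58] or of the present study. Libraries such as scikit-learn use randomized k-means++ initialisation by default instead.

```python
import numpy as np


def normalized_difference(a, b):
    """Generic (a - b) / (a + b) ratio used by NDVI, NDRE and GNDVI."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    total = a + b
    return np.divide(a - b, total, out=np.zeros_like(total), where=total != 0)


def msavi(nir, red):
    """Modified soil-adjusted vegetation index (MSAVI2 closed form)."""
    nir, red = np.asarray(nir, dtype=float), np.asarray(red, dtype=float)
    return (2 * nir + 1 - np.sqrt((2 * nir + 1) ** 2 - 8 * (nir - red))) / 2


def kmeans(x, k, n_iter=100):
    """Lloyd's k-means with a deterministic farthest-first initialisation."""
    x = np.asarray(x, dtype=float)
    centers = [x[0]]
    for _ in range(1, k):                 # add the sample farthest from all centres
        d = np.linalg.norm(x[:, None, :] - np.array(centers)[None], axis=2).min(axis=1)
        centers.append(x[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        d = np.linalg.norm(x[:, None, :] - centers[None], axis=2)
        labels = d.argmin(axis=1)         # assign each pixel to its nearest centre
        new = np.array([x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):     # centres stopped moving: converged
            break
        centers = new
    return labels, centers


# Hypothetical reflectances for six pixels (three vigorous, three sparse)
green = np.array([0.08, 0.09, 0.08, 0.10, 0.11, 0.10])
red = np.array([0.04, 0.05, 0.04, 0.12, 0.13, 0.12])
red_edge = np.array([0.25, 0.26, 0.24, 0.16, 0.17, 0.16])
nir = np.array([0.50, 0.52, 0.49, 0.22, 0.24, 0.21])

features = np.column_stack([
    normalized_difference(nir, red),        # NDVI
    normalized_difference(nir, red_edge),   # NDRE
    normalized_difference(nir, green),      # GNDVI
    msavi(nir, red),                        # MSAVI
])
labels, centers = kmeans(features, k=2)

# Order clusters by mean NDVI so that zone 0 = lowest vigour
vigour_rank = np.argsort(np.argsort(centers[:, 0]))
zones = vigour_rank[labels]
print(zones)
```

Ranking the cluster centres by mean NDVI, as in the last lines, is one simple way to turn arbitrary k-means cluster numbers into ordered low/high vigour zones.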
Despite its effectiveness, the methodology also has limitations. The presence of artefacts, represented by Cluster 5, may affect the classification, as these areas cannot be fully characterized by spectral indicators alone. Moreover, indicators related to chlorophyll-a and eutrophication were not measured in situ in the present study but were estimated indirectly from UAV-based multispectral data using established remote sensing indices. Although no dedicated in situ sampling campaign was carried out concurrently with the UAV survey, the plausibility of the k-means clustering results can be evaluated indirectly against the previously published work of Kokinou et al. (2023) [15], who performed a detailed spatiotemporal environmental assessment of the same Almyros karst system. In their study, monthly measurements of physicochemical parameters, nutrients, and photosynthetic pigments, complemented by geophysical (spectral induced polarization) analyses and GIS mapping, revealed clear spatial gradients in water quality and identified specific pressure hotspots associated with agricultural activity, industrial infrastructure (power plant, desalination plant), and mixed land uses within the wetland and along the stream. When the spatial distribution of the trophic clusters identified in the present work is compared qualitatively with the zones of degraded water quality and elevated pigment or nutrient levels reported in [15], a clear correspondence emerges: areas classified here as “Medium–High” and “High” eutrophication tend to coincide with sectors that their work characterizes as environmentally stressed, whereas the “Low” and “Medium” clusters dominate in reaches where their measurements indicate comparatively better water status or stronger dilution effects. This agreement between independent chemical and geophysical assessments and the unsupervised spectral partitioning applied here provides indirect validation of the clustering scheme and supports its interpretation as a meaningful representation of relative eutrophication levels along the Almyros Stream.
In conclusion, the gradation of clusters illustrates a spatial escalation of eutrophication from upstream to downstream, following the distribution of pollution sources and prevailing hydrological conditions. The internal cohesion and separation of the clusters indicate that the k-means partition is well defined and capable of revealing the main environmental patterns.
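Cohesion and separation are commonly quantified with the silhouette coefficient. This section does not state which internal metric underlies the statement above, so the following is only a sketch of one such measure, assuming feature vectors (for example, index values per pixel) and their k-means labels; the function name `mean_silhouette` and the toy data are hypothetical. The pairwise-distance matrix grows quadratically with the number of pixels, so a full orthomosaic would need a random pixel subsample; scikit-learn's `sklearn.metrics.silhouette_score` provides the same metric with a `sample_size` option.

```python
import numpy as np


def mean_silhouette(x, labels):
    """Mean silhouette coefficient s_i = (b_i - a_i) / max(a_i, b_i).

    a_i: mean distance from sample i to the other members of its cluster (cohesion)
    b_i: smallest mean distance from i to the members of another cluster (separation)
    Requires at least two clusters; singletons get s_i = 0 by convention.
    """
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)   # pairwise distances
    clusters = np.unique(labels)
    s = np.zeros(len(x))
    for i in range(len(x)):
        same = labels == labels[i]
        same[i] = False                      # exclude the sample itself
        if not same.any():
            continue                         # singleton cluster: s_i stays 0
        a = d[i, same].mean()
        b = min(d[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        s[i] = (b - a) / denom if denom > 0 else 0.0
    return float(s.mean())


# Hypothetical 2-D feature vectors (e.g. two index values per pixel)
x = np.array([[0.0, 0.0], [0.0, 0.1], [5.0, 5.0], [5.0, 5.1]])
good = mean_silhouette(x, [0, 0, 1, 1])   # compact, well-separated clusters
bad = mean_silhouette(x, [0, 1, 0, 1])    # clusters mixing distant samples
print(good, bad)
```

Values close to 1 indicate compact, well-separated clusters, while negative values indicate that many samples lie closer to another cluster than to their own.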
Future studies could:
Incorporate concurrent field sampling of chlorophyll-a, nutrients, and physicochemical parameters during UAV campaigns. Such data would enable direct quantitative validation of remotely sensed eutrophication classes and support the calibration of spectral indicators under varying hydrological conditions.
Extend the methodology to repeated UAV surveys across different seasons and hydrological states, allowing investigation of temporal dynamics in eutrophication patterns. This would improve understanding of seasonal drivers, episodic nutrient inputs, and the persistence or variability of identified trophic hotspots.
Compare the performance of k-means with other unsupervised, semi-supervised, or hybrid machine-learning methods to better capture complex spectral–environmental relationships.
Apply the proposed framework to rivers and streams with different geomorphological, climatic, and optical properties to assess its generalizability. Comparative studies across multiple sites could identify system-specific adaptations and support the development of standardized UAV-based eutrophication monitoring protocols.