1. Introduction
In Mediterranean river basins, many of which are already closed or rapidly approaching closure, the imbalance between water supply and demand is intensifying. This structural pressure is further exacerbated by growing competition among users, accelerating climate variability, and persistent governance shortcomings, all of which increase the potential for conflict over resource allocation. Within this context, demand-side management policies, especially water pricing, are increasingly viewed as central to achieving sustainable outcomes. Recognized by Article 9 of the European Water Framework Directive (WFD), water pricing aims to promote efficiency by reflecting the true economic and environmental value of water. Implementing effective pricing strategies in Mediterranean contexts is thus not only an economic imperative but also a critical pathway toward ecological resilience and institutional reform [1,2,3,4,5,6].
Irrigation water management is critical across Europe, where irrigated agriculture can account for over 80% of total water use in some regions [7,8,9,10]. In response to growing environmental and resource pressures, the European Union introduced the Water Framework Directive (2000/60/EC), which promotes river basin-level planning, full cost recovery, and environmental protection as guiding principles for sustainable water governance.
Despite transposing the Directive into national legislation, Greece continues to face significant challenges in implementing effective irrigation water pricing policies. Greek irrigation governance has historically relied on broad, horizontal measures, which have proven inadequate in addressing local complexities. One major challenge lies in the performance of Local Organizations for Land Improvement (LOLIs)—the entities responsible for managing much of the country’s irrigation infrastructure. Nearly 75% of the 416 LOLIs reportedly operate at a financial loss, failing to recover even basic operational costs. This persistent fiscal imbalance undermines infrastructure maintenance, long-term water availability, and compliance with EU directives.
Irrigation water pricing remains a contentious issue, sitting at the intersection of economic efficiency and equitable access. While numerous theoretical and empirical studies have explored pricing mechanisms, their real-world implementation is often constrained by political interests, institutional weakness, and administrative inertia. For example, Cornish et al. [3] and Johansson et al. [4] highlight the limited cost recovery and institutional bottlenecks that hinder reform efforts. Davidson et al. [5,6,7] further stress how subsidy distortions and governance failures inhibit progress, particularly in developing and decentralized systems.
More recent research advocates for context-specific, climate-adaptive pricing models. Cortignani et al. [8] and Gohar et al. [9] offer regional case studies, while Dagnino et al. [10] and Gómez-Ramos et al. [11] develop empirical frameworks for policy evaluation in semi-arid regions. Debele et al. [12] and Butinelli et al. [13] propose dynamic and integrated pricing approaches, and several studies [14,15,16] call for equity-adjusted volumetric pricing in smallholder systems.
Nevertheless, conventional econometric models still dominate the field, often assuming homogeneity across irrigation entities and overlooking significant performance differences. The use of unsupervised machine learning techniques, particularly clustering algorithms, remains scarce in irrigation pricing research [17,18]. Yet, methods such as K-means, Partitioning Around Medoids (PAM), and hierarchical clustering have the potential to uncover hidden patterns across financial, operational, and institutional variables [19,20,21,22,23]. These approaches have been increasingly applied in adaptive water governance worldwide [22,23,24,25,26,27], and their relevance in the European context is gaining recognition [28,29,30,31,32,33,34,35,36,37,38,39,40].
In the Greek context, the existing literature has largely focused on national-level pricing models and contingent valuation methods [41,42,43,44,45,46,47,48,49,50]. Early work by Mallios and Latinopoulos [41] demonstrated that willingness to pay is shaped not only by economic factors (e.g., crop type) but also by social and demographic characteristics. Latinopoulos et al. [42] employed multi-criteria analysis to show that pricing above €0.10/m³ could significantly reduce farm income. Other studies, such as those by Kampas et al. [43] and Dercas et al. [44], have linked irrigation pricing to broader agricultural policy, highlighting unintended effects of EU subsidy reforms [37] and the importance of structural consolidation of smaller LOLIs. Scholars consistently report low cost recovery rates, often below 2%, and stress the urgency of governance reform [41,42,43,44,45,46,47,48,49,50].
Building on this foundation, recent efforts emphasize digital innovation and adaptive pricing. Vasilaki [45] advocates for two-part tariffs that balance fairness and efficiency, while Kalavrouziotis et al. [46] stress the importance of remote sensing and smart metering. Papadopoulou et al. [50] propose integrating water–energy–food nexus thinking into pricing design.
Despite the breadth of this research, a significant gap remains in the application of data-driven, exploratory methods that can account for the operational heterogeneity of local irrigation entities. No study to date has used unsupervised machine learning to classify Greek LOLIs based on performance, financial viability, and institutional characteristics. This gap is particularly notable given the fragmented and diverse nature of irrigation governance in Greece.
The primary goal of this study is to evaluate the organizational structure, financial sustainability, and spatial distribution of Local Organizations for Land Improvement (LOLIs) in Greece, with a particular emphasis on the pricing of irrigation water. The study aims to uncover systemic disparities and provide evidence-based insights to inform the development of more effective, equitable, and regionally responsive water pricing policies, in alignment with environmental conditions, institutional capacity, and the sustainability objectives of the European Union. To achieve this goal, the study pursues the following specific objectives:
To analyze the organizational structure of LOLIs across various regions, identifying institutional strengths, weaknesses, and variations in governance models.
To assess the financial sustainability of LOLIs, with a focus on cost recovery mechanisms, revenue generation practices, and long-term operational viability.
To examine the spatial distribution of LOLIs in relation to environmental conditions, land use patterns, and the capacity of local institutions.
To evaluate current irrigation water pricing practices, focusing on key indicators such as cost per unit of water, irrigable land area, and actual water consumption.
To apply advanced clustering techniques that integrate economic and spatial data in order to detect underlying patterns, regional disparities, and performance differences in service delivery and pricing.
This study addresses the above objectives by applying unsupervised machine learning techniques—specifically K-means, PAM, and hierarchical clustering—combined with geospatial data and Principal Component Analysis (PCA). This approach enables the identification of naturally occurring clusters, revealing latent structures and patterns in LOLI performance that would be obscured by traditional regression-based or confirmatory methods. Unlike econometric models that test predefined hypotheses, clustering allows for exploratory analysis and the discovery of previously hidden groupings.
By doing so, this study offers a novel contribution to both Greek and international water governance scholarship. It introduces unsupervised learning into a field where such methods are underutilized, offering a data-driven basis for targeted and adaptive policy interventions. These findings move beyond the limitations of generalized, top-down approaches that have long dominated Greek irrigation policy.
The resulting classification supports more nuanced and context-sensitive reforms, aligned with the specific operational, financial, and geographic realities of different LOLI clusters. Moreover, the methodological framework is highly transferable to other European countries with decentralized irrigation systems facing similar implementation and equity challenges.
Ultimately, by integrating advanced computational techniques with institutional complexity, this study establishes a new analytical framework for sustainable irrigation water pricing reform in the face of climate variability and growing resource pressures.
2. Materials and Methods
2.1. Overview of Water Pricing Strategies in Greece
To analyze performance and sustainability across diverse irrigation contexts, this study incorporates a classification of Local Organizations for Land Improvement (LOLIs) using unsupervised learning techniques. Specifically, K-means and Partitioning Around Medoids (PAM) clustering algorithms are employed to segment LOLIs based on key financial and operational indicators. These include cost recovery rates, water use efficiency, maintenance expenditures, and geographic attributes. Clustering is particularly appropriate in this context because it enables the discovery of hidden patterns without assuming linear relationships or homogeneity, offering a more granular understanding of performance disparities [51,52,53,54,55].
To contextualize the analysis, we first present a typology of water pricing strategies implemented across Greece (Table 1), classified by geographic region and accompanied by their respective advantages and disadvantages, offering a foundation for interpreting the study’s findings.
The table provides a clear yet comprehensive overview of irrigation water pricing strategies implemented across diverse local contexts in Greece. It highlights how different pricing models are adapted to regional needs and institutional capacities, balancing the often-competing goals of simplicity, equity, efficiency, and sustainability. By outlining the advantages and disadvantages of each approach, it enables informed comparison and underscores the trade-offs policymakers must navigate when designing context-sensitive water pricing frameworks.
Among these models, volumetric pricing, which charges users based on actual water consumption, is the most widely used in agricultural areas. This strategy supports environmental and economic sustainability by incentivizing efficient water use and aligning with principles of cost recovery. However, its successful implementation hinges on the availability of reliable metering infrastructure and administrative capacity. These requirements can pose significant challenges, especially for small-scale farmers, potentially leading to inequities and reduced agricultural output if not accompanied by financial support or targeted subsidies.
The diversity of pricing models and their uneven implementation reveal the complexity and heterogeneity of irrigation water governance in Greece. To address this, the present study applies unsupervised machine learning techniques, specifically K-means and Partitioning Around Medoids (PAM) clustering, to classify Local Organizations for Land Improvement (LOLIs) according to key financial and operational indicators. These methods allow for data-driven segmentation that uncovers hidden performance patterns (such as cost recovery efficiency and infrastructure readiness) that traditional econometric approaches may overlook [51,52].
Clustering has proven effective in similar water governance studies where granular, evidence-based insights are essential for tailored policy interventions [53,54]. By integrating clustering with spatial and institutional data, this study contributes a novel analytical framework for identifying region-specific water pricing strategies. It enables decision-makers to prioritize investments, target reforms, and tailor support mechanisms to the distinct operational realities of different LOLIs.
In sum, this research advances both the empirical understanding of irrigation water pricing in Greece and the methodological toolkit available for water governance in the EU. The approach is scalable and transferable, offering a replicable model for analyzing performance disparities and supporting more effective, data-driven water policy design in diverse agricultural contexts.
2.2. Dataset—Preprocessing
The dataset used in this study is derived from the Second Update of the River Basin Management Plans, published by the Greek Ministry of Environment and Energy in December 2023. Although the plans were released recently, the data reflects conditions from the year 2021, resulting in a two-year reporting lag. Each River Basin District (RBD) was assigned to private civil and environmental engineering firms responsible for compiling and validating the data. Despite these efforts, significant challenges were encountered during data collection. Only 134 out of the 416 active LOLIs across Greece submitted complete financial and operational data, representing just 32% of the total. Furthermore, as of May 2025, the RBD of Eastern Macedonia (GR11) had yet to submit any financial report, leading to an unavoidable gap in coverage. Even among the submitted data, the quality varied. Some organizations still lack basic infrastructure such as water meters, resulting in approximate estimations of irrigation consumption. More than two decades after the enforcement of the EU Water Framework Directive (2000/60/EC), the lack of standardized monitoring practices in some regions remains a major limitation.
The dataset combines primary and secondary data sources. Key financial and operational variables—including total water volume, capital and functional costs, administrative expenses, total cost, cost per volume, income per volume, and percentage cost recovery—were extracted from the official financial report files titled ELX_2REV_P4.8_Oikonomiki-Analisi, where X denotes the specific River Basin District. The variable representing the irrigable area managed by each LOLI was sourced from the environmental report files ELX_2REV_P4.1_Pieseis. In a few cases where data was missing, values were cross-referenced and supplemented using records from the Ministry of Rural Development and Food. Geographic and categorical data such as the LOLI’s location coordinates, whether the organization is based on an island, or whether it borders another country, were manually compiled by the author based on the location of each LOLI’s administrative office, which typically lies within or near its irrigation area.
A thorough data cleaning process was applied to ensure the validity and comparability of the observations. R software was used throughout all stages of the analysis to ensure consistency and reproducibility. First, non-representative entities were excluded. These included GOLIs, Municipalities, and Public Domains of Drinking Water and Drainage (PDDWDs), as these entities are not directly responsible for distributing irrigation water to end users. Their cost structures often combine irrigation and potable water management, making it difficult to isolate relevant variables for this study. Additionally, several outliers were removed from the dataset. Two LOLIs reported zero income from users, while five recorded cost recovery rates below 20%. Conversely, two organizations showed recovery rates exceeding 1000%, and one organization, LOLI Erythropotamou, reported an exceptionally high cost per cubic meter of water (€0.50/m³), a spike attributed to extraordinary repair costs following flood damage in 2021. These extreme values were considered unrepresentative of general conditions and thus excluded from analysis [51,52,53,54,55].
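The screening rules described above can be illustrated with a minimal Python sketch. It is not the authors' R workflow: the field names, the formula for percentage cost recovery (user income divided by total cost), and the toy records are assumptions made for illustration, and only the 20% and 1000% thresholds are taken from the text.

```python
def percent_cost_recovery(income_eur, total_cost_eur):
    """User income as a percentage of total cost (assumed definition)."""
    return 100.0 * income_eur / total_cost_eur

def keep_record(record, low=20.0, high=1000.0):
    """Return True if a record passes the screening thresholds."""
    if record["income_eur"] == 0:          # zero income from users: excluded
        return False
    recovery = percent_cost_recovery(record["income_eur"], record["total_cost_eur"])
    return low <= recovery <= high         # drop implausibly low or high recovery

# Hypothetical toy records, not values from the study's dataset.
records = [
    {"name": "A", "income_eur": 90_000.0, "total_cost_eur": 120_000.0},   # 75%: kept
    {"name": "B", "income_eur": 0.0, "total_cost_eur": 50_000.0},         # zero income
    {"name": "C", "income_eur": 5_000.0, "total_cost_eur": 100_000.0},    # 5%: too low
    {"name": "D", "income_eur": 600_000.0, "total_cost_eur": 40_000.0},   # 1500%: too high
]
kept = [r["name"] for r in records if keep_record(r)]
print(kept)
```

Exposing the thresholds as parameters makes it easy to check how sensitive the final sample is to the exclusion rule.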
The selection of the four core indicators—irrigable area, water volume, cost per volume, and percentage cost recovery—was driven by their relevance in capturing the operational scale, resource intensity, pricing efficiency, and financial sustainability of irrigation water management. Irrigable area and water volume reflect both the physical capacity and environmental burden of irrigation systems. Cost per volume serves as a proxy for how well actual costs are reflected in user tariffs, while percentage cost recovery indicates the financial viability of each Local Organization for Land Improvement (LOLI).
In addition, variables such as type of cultivation, geographic location (island vs. mainland), and proximity to national borders were included due to their structural and spatial significance. Different crop types impose varying water demands, with water-intensive crops more common on the mainland and drought-resistant crops dominating islands. Islands also face inherent water scarcity and higher energy costs, leading to elevated irrigation prices. Border regions, especially in northern Greece, contend with transboundary water dynamics, where upstream actions by neighboring countries affect downstream availability and infrastructure costs. These contextual variables enrich the analysis by capturing localized disparities that influence irrigation performance and pricing outcomes.
After cleaning, the final dataset includes 122 LOLIs from 12 River Basin Districts and 35 different prefectures, covering both mainland and island regions. It contains 20 standardized variables, rendering it statistically robust for the application of machine learning algorithms and exploratory data analysis. To prepare the dataset for clustering, several preprocessing steps were carried out. The variables selected for the clustering procedure were Cost per Volume, Percentage Cost Recovery, Water Volume, Irrigable Area, Island status, and whether the LOLI borders another country. Categorical variables—namely, Island and Bordering Country—were binary encoded (0 for No, 1 for Yes) to facilitate numerical processing.
All numerical variables were standardized using z-score transformation to ensure comparability. Standardization is crucial in clustering analysis, as differences in scale can cause variables with higher numeric ranges to dominate the distance metrics, such as Euclidean distance, which are commonly used in clustering algorithms. By rescaling each variable to zero mean and unit variance, the algorithm gives each variable comparable weight in the distance computation; the transformation does not change the shape of a variable's distribution. This preprocessing step follows established guidelines in the clustering literature, particularly the methodological frameworks outlined by Kaufman and Rousseeuw [51] and Kassambara [52]. Notably, no missing values remained in the final dataset, which ensured that the clustering analysis could proceed without the need for imputation or exclusion of additional observations.
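As a minimal illustration of these two preprocessing steps, the following Python sketch standardizes one numeric column and binary-encodes one categorical column. The study itself used R; the toy values, the function names, and the choice of the sample standard deviation (which mirrors the default of R's `scale()`) are assumptions for illustration.

```python
from statistics import mean, stdev

def zscore(values):
    """Rescale numbers to zero mean and unit (sample) standard deviation."""
    mu = mean(values)
    sigma = stdev(values)  # sample SD; an assumption, not stated in the text
    if sigma == 0:
        raise ValueError("a constant column cannot be standardized")
    return [(v - mu) / sigma for v in values]

def encode_yes_no(values):
    """Map 'No'/'Yes' labels to 0/1, as described for Island and Bordering Country."""
    mapping = {"No": 0, "Yes": 1}
    return [mapping[v] for v in values]

# Hypothetical toy columns, not values from the study's dataset.
cost_per_volume = [0.02, 0.05, 0.08, 0.11]
island = ["No", "No", "Yes", "No"]

z_cost = zscore(cost_per_volume)
island_flag = encode_yes_no(island)
print(z_cost, island_flag)
```

After this step every numeric column has the same scale, so no single variable dominates a Euclidean distance merely because of its units.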
In summary, the dataset represents a diverse and representative cross-section of irrigation management entities across Greece. The database has been thoroughly cleaned and standardized to ensure consistency and suitability for robust statistical and machine learning analyses. The data preparation process addressed both structural inconsistencies in the raw inputs and the specific technical requirements of the analytical methods employed in the study.
3. Results
3.1. K-Means Clustering
To determine the number of clusters that best represents the diversity among Local Organizations for Land Improvement (LOLIs) in Greece, we applied the K-means clustering algorithm supported by multiple internal validity indices. These indices, such as the Silhouette score, the Calinski-Harabasz criterion, and the Davies-Bouldin index, provide a systematic basis for evaluating clustering quality and identifying the most meaningful segmentation structure. The number of votes each candidate solution received reflects how often it was favored across the different indices. The results are summarized in Table 2.
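The procedure described above combines two ideas: partitioning with K-means and a majority vote across validity indices. The Python sketch below illustrates both under stated assumptions. It implements the basic Lloyd iteration with a seeded random start rather than the R routine used in the study, and the index recommendations passed to `majority_vote` are hypothetical.

```python
import random
from collections import Counter

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, n_iter=100, seed=0):
    """Basic Lloyd iteration: assign to nearest center, then update centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        if new_labels == labels:   # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:            # keep the old center if a cluster empties
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers

def majority_vote(recommendations):
    """Return the cluster count proposed by the largest number of indices."""
    return Counter(recommendations.values()).most_common(1)[0][0]

# Hypothetical standardized observations forming two obvious groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = kmeans(points, k=2)

# Hypothetical index recommendations, not the values reported in Table 2.
best_k = majority_vote({"silhouette": 4, "calinski_harabasz": 4, "davies_bouldin": 3})
print(labels, best_k)
```

Because K-means depends on its starting centers, practical analyses usually rerun it from several random starts and keep the solution with the lowest within-cluster sum of squares.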
The results imply that the existence of four distinct clusters among Greek LOLIs is strongly supported by multiple clustering validity indices, reinforcing the idea that these organizations face fundamentally different conditions and challenges. This provides a solid basis for designing region-specific water management strategies rather than applying uniform national policies. The strong consensus around four clusters also justifies targeted policy interventions, such as differentiated pricing schemes, tailored infrastructure investments, and context-specific governance models. Moreover, the spatial pattern suggested by the clustering aligns with environmental and geographic realities, implying that policy effectiveness can be significantly improved by considering local climate, water availability, and organizational performance. These findings support the shift toward data-driven, localized decision-making and emphasize the need for adaptive management strategies in agricultural water governance.
The silhouette plot in Figure 1 displays the quality of the four-cluster K-means solution. The average silhouette width is 0.32, indicating moderate clustering quality. Most clusters (1, 2, and 3) show positive silhouette values, suggesting reasonable separation and cohesion. However, Cluster 4 contains several negative silhouette values, meaning some points may be misclassified or poorly matched to their assigned cluster.
These results imply that while the four-cluster model is generally supported, one cluster (Cluster 4) may contain ambiguous or overlapping observations, warranting further examination. This suggests the need for refinement of variables, the possible addition of contextual or geographic data, or the use of alternative clustering algorithms for improved separation. Nevertheless, the model remains practically useful, supporting differentiated strategies for LOLIs, but policymakers should be cautious about drawing overly rigid distinctions—particularly for clusters with borderline classification.
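To make the interpretation of positive and negative silhouette values concrete, the following Python sketch computes the silhouette width s(i) = (b − a) / max(a, b) for each observation, where a is the mean distance to the other members of its own cluster and b is the smallest mean distance to any other cluster. It is an illustrative implementation with Euclidean distance on hypothetical points, assumes at least two clusters, and is not the R routine used in the study.

```python
from math import dist

def silhouette_widths(points, labels):
    """Silhouette width of every point (requires at least two clusters)."""
    widths = []
    clusters = set(labels)
    for i, (p, li) in enumerate(zip(points, labels)):
        own = [dist(p, q) for j, (q, lj) in enumerate(zip(points, labels))
               if lj == li and j != i]
        if not own:                # singleton cluster: width 0 by convention
            widths.append(0.0)
            continue
        a = sum(own) / len(own)    # cohesion: mean distance within own cluster
        b = min(                   # separation: nearest other cluster
            sum(dist(p, q) for q, lj in zip(points, labels) if lj == c) / labels.count(c)
            for c in clusters if c != li
        )
        widths.append((b - a) / max(a, b))
    return widths

# Hypothetical points: two tight, well-separated pairs.
points = [(0, 0), (0, 1), (5, 5), (5, 6)]
good = silhouette_widths(points, [0, 0, 1, 1])
bad = silhouette_widths(points, [0, 1, 1, 1])   # second point deliberately misassigned
print(sum(good) / len(good), bad[1])
```

In the second labeling the misassigned point lies closer to the other cluster than to its own, so its width is negative, which is the pattern the text describes for some members of Cluster 4.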
The cluster plot generated through principal component analysis (Figure 2) illustrates the distribution of the four LOLI clusters in a two-dimensional space, with Dimensions 1 and 2 accounting for 62% of the total variance. Clusters 1 (red), 2 (green), and 4 (purple) form clear groupings, while cluster 3 (blue) appears more dispersed and overlaps partially with other groups, most noticeably with cluster 2. One observation in cluster 4 (point 50) is an extreme outlier. The clear separation among most clusters supports the validity of the K-means solution and suggests that LOLIs differ significantly in their attributes. However, the overlap in cluster 3 and the presence of outliers imply that some LOLIs share similar characteristics and may not fit neatly into a single category. This underscores the need for cautious interpretation when applying cluster-based policies and highlights the potential value of hybrid or flexible management strategies, particularly in ambiguous or transitional cases.
3.2. PAM Clustering
To complement the K-means analysis and further validate the clustering structure of LOLI entities, the PAM algorithm was also applied. This robust clustering technique is particularly effective for datasets with non-spherical distributions or outliers, making it well-suited for capturing the nuanced operational differences among irrigation organizations. The number of clusters was again evaluated using a range of internal validation metrics, and the voting outcomes for each candidate solution under PAM clustering are presented in Table 3.
The analysis of the optimal number of clusters using the NbClust package for the partitioning around medoids (PAM) algorithm indicates that four clusters are most consistently supported by internal validation indices. This suggests a natural segmentation in the dataset that aligns best with four distinct groupings. From a policy standpoint, these insights can inform more strategic resource allocation, targeted interventions, and long-term planning. When the data represent specific regions, user groups, or administrative entities, tailored policies can be designed to address the distinct needs and characteristics of each identified cluster. This approach improves both the efficiency and effectiveness of policy implementation by moving beyond uniform solutions and embracing data-driven differentiation. Additionally, the observation that alternative cluster counts—such as three or five—also show statistical validity suggests that policy frameworks should remain adaptive. Policymakers should be open to revisiting or refining cluster definitions if further contextual or qualitative information supports alternative groupings. Ultimately, integrating cluster analysis into the policymaking process enhances decision-making, improves service delivery, and ensures that strategies are better aligned with the complex structure of the underlying data.
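The idea behind PAM, namely that each cluster is represented by an actual observation (a medoid) chosen to minimize total distance, can be sketched in Python as follows. This is a simplified illustration: it starts from the first k points instead of PAM's BUILD step and accepts the first improving swap rather than the best one, so it is not the implementation in R's `cluster` package. The points are hypothetical.

```python
from math import dist

def total_cost(points, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)

def pam(points, k):
    """Simplified PAM: swap medoids with non-medoids while the cost decreases."""
    medoids = list(range(k))       # naive start (assumption; PAM proper uses BUILD)
    best = total_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for cand in range(len(points)):
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                cost = total_cost(points, trial)
                if cost < best - 1e-12:   # accept any strictly improving swap
                    medoids, best, improved = trial, cost, True
    labels = [min(range(k), key=lambda c: dist(p, points[medoids[c]])) for p in points]
    return medoids, labels

# Hypothetical standardized observations forming two groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
medoids, labels = pam(points, k=2)
print(medoids, labels)
```

Because medoids are real observations and the cost uses plain rather than squared distances, a single extreme point pulls the cluster representative less than it would pull a K-means centroid.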
The silhouette plot in Figure 3, corresponding to the four-cluster solution, shows an average silhouette width of 0.29, indicating a relatively weak clustering structure. Silhouette values range from −1 to 1, with higher values reflecting more well-defined and cohesive clusters. In this plot, clusters 1 and 2 have generally higher silhouette values, indicating that their members are relatively well grouped and distinct from other clusters. In contrast, clusters 3 and 4 contain a noticeable number of observations with low or even negative silhouette values, pointing to potential overlap or misclassification. From a policy perspective, these results imply that while the four-cluster model provides some differentiation within the data, the distinctions between certain groups may not be strong. Therefore, any policy decisions or targeted interventions based on this clustering should be applied cautiously, especially for clusters with weak internal cohesion. It may be necessary to conduct further analysis, refine the clustering model, or consider supplementary data to ensure that policy actions are based on meaningful and reliable groupings. This cautious approach helps avoid misallocating resources or designing strategies that fail to address the nuanced differences among LOLIs.
The cluster plot (Figure 4) visually represents the distribution of four clusters in a two-dimensional space based on principal components that capture 62% of the data variance (Dim1: 39%, Dim2: 23%). The clusters show reasonably distinct groupings, with clusters 2 and 3 being more compact, while clusters 1 and 4 are more dispersed. A few outliers, such as observation 50 in cluster 4 and observation 106 in cluster 3, are distant from the core of their respective groups, indicating potential atypical cases that may require special attention. From a policy perspective, the clear separation of some clusters supports the viability of targeted strategies or interventions tailored to each group’s characteristics. However, the dispersion and outliers suggest that not all LOLIs within a cluster share homogeneous features. Therefore, while the four-cluster solution can guide differentiated policy development, it is also important to consider variability within clusters and to supplement clustering with assessments of individual LOLIs when precision is critical. This balance ensures that interventions are both broad enough to be practical and nuanced enough to be effective.
3.3. Hierarchical Clustering Algorithm
The last method used to assess the optimal grouping structure of Local Organizations for Land Improvement (LOLIs) was hierarchical clustering, applied as a complement to K-means and PAM. This approach does not require predefining the number of clusters and builds a nested hierarchy of partitions, making it especially useful for exploring multi-level patterns within the data. Table 4 summarizes the suggested number of clusters based on various validity indices applied to the hierarchical algorithm, offering insights into the most consistent groupings observed across the clustering process.
The bar chart showing the number of clusters suggested by NbClust for hierarchical clustering indicates that the majority of internal validation indices (eight of them) support a four-cluster solution. This strong consensus suggests that the data naturally separate into four meaningful groups under a hierarchical clustering framework. The next most supported options, three and five clusters, receive moderate endorsement, while other cluster counts receive minimal support. From a policy standpoint, this reinforces the robustness of using four clusters as a basis for structuring decisions, designing interventions, or segmenting LOLIs. It allows policymakers to craft group-specific strategies that align with underlying patterns in the data. However, the presence of some support for alternative cluster numbers, especially three and five, suggests that further contextual or qualitative input could help refine decisions where granularity matters. Overall, the results validate the four-cluster model as a sound and data-backed foundation for differentiated policy action.
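The bottom-up construction of a nested hierarchy, followed by a cut at a chosen number of clusters, can be sketched in Python as below. The linkage criterion is not stated in the text, so average linkage is used here purely as an assumption; the points are hypothetical and the sketch is not the R routine used in the study.

```python
from itertools import combinations
from math import dist

def agglomerative(points, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]   # start: every point alone

    def linkage(a, b):
        # Average linkage (assumed): mean pairwise distance between two clusters.
        return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: linkage(clusters[ab[0]], clusters[ab[1]]))
        clusters[a] = clusters[a] + clusters[b]    # a < b, so deleting b is safe
        del clusters[b]

    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels

# Hypothetical standardized observations forming two groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = agglomerative(points, n_clusters=2)
print(labels)
```

Stopping the merges at a given count is equivalent to cutting the dendrogram at the height where that many branches remain, which is why the number of clusters need not be fixed before the hierarchy is built.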
The silhouette plot (Figure 5) for the hierarchical clustering solution with four clusters shows an average silhouette width of 0.29, indicating a moderate and somewhat uncertain clustering structure. While clusters 2 and 3 exhibit reasonably high silhouette values for most of their members, clusters 1 and 4 include several observations with low or even negative silhouette values, suggesting poor cohesion within these clusters or overlap with other groups. These results imply that although a four-cluster solution is statistically supported, the actual separation among some clusters is weak. For policy design, this means that while a differentiated approach based on four segments is justifiable, it should be implemented with flexibility. Policies should allow for adaptive measures within clusters and consider supplementary diagnostics or data layers to improve targeting accuracy. Especially for LOLIs in clusters with lower silhouette scores, further validation may be necessary before committing to specific interventions, ensuring resources are directed efficiently and equitably.
The cluster plot generated by principal component analysis (Figure 6) for the hierarchical clustering solution displays four groups in a two-dimensional space defined by the first two principal components, which together explain 62% of the total variance. Clusters 1, 2, and 4 are relatively compact, suggesting strong internal cohesion and meaningful differentiation. Cluster 3, however, appears more centrally located and dispersed, overlapping partially with other groups, which aligns with the earlier silhouette analysis indicating weaker structure for this cluster. A few outliers, most notably observation 50 in cluster 1, are located far from their cluster’s core, raising potential concerns about their representativeness or classification. From a policy perspective, these results affirm the value of using four clusters to inform group-specific strategies, but they also call for caution. Specifically, interventions for cluster 3 may need additional nuance or support, and outliers like those in clusters 1 and 4 may warrant individual-level consideration. The overall structure supports differentiated policy design while emphasizing the importance of flexibility and further validation to ensure that group-based actions are both equitable and effective. As with the previous algorithms, the two-dimensional projection does not fully resolve the differences among observations, and clusters 2 and 3 appear to overlap.
The hierarchical clustering dendrogram (
Figure 7) reveals the structure of relationships among the observations (the LOLIs), based on similarity. In the dendrogram, Cluster 1 is depicted in red, Cluster 2 in green, Cluster 3 in blue, and Cluster 4 in purple, visually distinguishing the groupings identified by the hierarchical algorithm. The dendrogram has been cut to form four clusters, consistent with earlier analyses. The clusters are visually distinct, with relatively balanced sizes and a logical merging pattern as one moves up the tree. This suggests that the four-cluster solution captures meaningful groupings in the data, reinforcing the outcome supported by both NbClust and silhouette analysis. From a policy standpoint, these clusters can guide regional planning, resource distribution, or targeted interventions by grouping organizations with shared characteristics. Importantly, the dendrogram also allows for transparency and interpretability in how clusters were formed, which is valuable for justifying policy decisions. Policymakers can use this structure to prioritize cooperation within clusters and tailor policies to common needs, while also identifying units that merge at greater heights (indicating greater dissimilarity), which may require more customized attention. This approach enhances the strategic relevance and fairness of policy design by aligning action with data-driven insights.
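The idea of cutting a dendrogram into a fixed number of clusters can be sketched as follows. Average linkage and Euclidean distance are assumptions made for illustration, since this section does not state the linkage used, and the seven points are hypothetical.

```python
from math import dist

def agglomerate(points, n_clusters):
    """Merge the two closest clusters (average linkage) until n_clusters remain.

    Returns the final clusters (lists of point indices) and the merge heights.
    """
    clusters = [[i] for i in range(len(points))]
    heights = []

    def linkage(c1, c2):
        # Average pairwise distance between the members of two clusters.
        total = sum(dist(points[i], points[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        pairs = [(linkage(clusters[p], clusters[q]), p, q)
                 for p in range(len(clusters))
                 for q in range(p + 1, len(clusters))]
        height, p, q = min(pairs)  # closest pair; ties broken by position
        heights.append(height)
        merged = clusters[p] + clusters[q]
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)] + [merged]
    return [sorted(c) for c in clusters], heights

# Hypothetical units: three tight pairs and one isolated observation.
points = [(0, 0), (0, 1), (4, 4), (4, 5), (9, 0), (9, 1), (0, 9)]
clusters, heights = agglomerate(points, 4)
```

The recorded merge heights correspond to the vertical positions at which branches join in a dendrogram, so a unit that only joins at a large height is the kind of dissimilar case the paragraph above singles out for customized attention.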
3.4. Model Performance Evaluations
The comparison of clustering algorithms using both the average silhouette score and the adjusted Rand index (ARI) provides key insights into the internal cohesion and consistency of the clustering results. As shown in
Table 5, the k-means algorithm achieves the highest average silhouette score (0.32), indicating slightly better compactness and separation of clusters than PAM and hierarchical clustering, both of which score 0.29. This suggests that k-means offers somewhat better-defined groupings among LOLI entities, supporting its use for segmenting water governance units with distinct operational characteristics.
Table 6, presenting the ARI, measures the agreement between different clustering solutions. As expected, each algorithm perfectly agrees with itself (ARI = 1). The k-means and PAM solutions show the highest mutual agreement (ARI = 0.7976), while hierarchical clustering has the lowest agreement with the others, especially with PAM (ARI = 0.6730). This indicates that k-means and PAM produce more similar cluster structures, reinforcing their relative consistency.
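For readers unfamiliar with the ARI, the sketch below computes it from two label vectors using the standard pair-counting formula. The label vectors are hypothetical and serve only to show that the index ignores how clusters are numbered.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Pair-counting agreement between two partitions, corrected for chance."""
    n = len(labels_a)
    contingency = Counter(zip(labels_a, labels_b))
    sum_cells = sum(comb(c, 2) for c in contingency.values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    maximum = (sum_rows + sum_cols) / 2
    if maximum == expected:  # degenerate case, e.g. every unit in one cluster
        return 1.0
    return (sum_cells - expected) / (maximum - expected)

# Hypothetical assignments for six units.
first = [1, 1, 2, 2, 3, 3]
relabelled = [2, 2, 3, 3, 1, 1]   # same partition, different cluster numbers
coarser = [1, 1, 1, 2, 2, 2]      # a genuinely different partition

same_partition = adjusted_rand_index(first, relabelled)
different_partition = adjusted_rand_index(first, coarser)
```

An ARI of 1 therefore means identical partitions regardless of labelling, while values such as 0.80 or 0.67 indicate that a substantial share of unit pairs is grouped differently by the two algorithms.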
From a policy perspective on water pricing, these findings suggest that, of the three methods, k-means clustering provides the best-separated and most interpretable segmentation of the LOLIs on the available data, although an average silhouette width of 0.32 still indicates only moderate structure. As water pricing policies increasingly aim for equity, efficiency, and sustainability, a clustering model like k-means can help identify distinct groups (for example, high, moderate, and low water users) based on behavior, geography, or infrastructure. These clusters can then form the basis of differentiated pricing strategies that promote conservation without disproportionately affecting vulnerable groups.
Given the relatively high agreement between k-means and PAM, either method could be used with confidence, though the slightly better silhouette performance of k-means makes it the preferred choice. Hierarchical clustering, despite its interpretability through dendrograms, shows weaker agreement and should be used primarily for exploratory analysis or to supplement decisions rather than to drive policy directly.
In practice, these clustering results can support tiered pricing models, targeted subsidies, or infrastructure investment prioritization by clearly delineating user groups with distinct water consumption patterns or service needs. Ensuring that pricing structures align with data-driven clusters enhances fairness and promotes more effective water resource management.
The PCA plots illustrate how the three clustering algorithms—k-means, PAM, and hierarchical—organize data into four clusters along the first two principal components. The k-means clustering result shows the most distinct and compact clusters, indicating clear separation between groups. PAM clustering produces a similar structure, though with slightly more overlap between clusters. Hierarchical clustering results in more dispersed and overlapping clusters, suggesting less precise grouping. These visual patterns confirm the earlier quantitative findings, where k-means achieved the highest silhouette score and strongest agreement with other methods. For water pricing policy, this analysis supports the use of k-means as the most reliable method for segmenting users or regions. Clusters derived from k-means can guide differentiated pricing strategies, such as charging higher rates for high-use groups while protecting low-consumption or vulnerable populations through targeted subsidies. Clearer clusters also support more focused infrastructure planning and resource allocation. In contrast, the less distinct structure from hierarchical clustering suggests it may be less suitable for direct pricing decisions but still useful for exploratory analysis. Overall, the clustering results provide a solid foundation for designing equitable, efficient, and data-driven water pricing policies.
According to the k-means algorithm, the LOLIs can be grouped into four different classes as illustrated in
Table 7.
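As an illustration of how such a grouping is obtained, the following sketch runs Lloyd's k-means algorithm on z-scored indicators. The eight rows are hypothetical, and the column order (irrigable area, water volume, cost per volume, cost recovery) and the deterministic farthest-point initialization are assumptions made for reproducibility, not a description of the procedure used in this study.

```python
from math import dist
from statistics import mean, pstdev

def standardize(rows):
    """Z-score each column so that no single indicator dominates the distance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [tuple((v - m) / s for v, m, s in zip(row, mus, sds)) for row in rows]

def kmeans(points, k, max_iter=100):
    # Farthest-point initialization: start from the first point, then keep
    # adding the point that is farthest from all centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        if new_labels == labels:  # converged: assignments no longer change
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(mean(col) for col in zip(*members))
    return labels, centroids

# Hypothetical rows: (irrigable area, water volume, cost per volume, cost recovery).
rows = [
    (200, 0.5, 0.30, 0.90), (250, 0.6, 0.28, 0.85),
    (1500, 4.0, 0.05, 1.00), (1600, 4.2, 0.06, 0.98),
    (9000, 30.0, 0.04, 0.30), (9500, 32.0, 0.05, 0.25),
    (4000, 25.0, 0.10, 0.40), (4200, 26.0, 0.11, 0.45),
]
labels, centroids = kmeans(standardize(rows), k=4)
```

Standardizing first matters because irrigable area and cost per volume are measured on very different scales, and without it the largest-valued indicator would drive the cluster assignment almost on its own.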
To facilitate a more accurate interpretation of the clusters, descriptive statistics for the Greek LOLIs are provided in
Table 8.
Based on the results presented in
Table 8, several patterns emerge. Cluster 1 comprises small, island-based organizations with limited irrigable areas and low water volumes, yet relatively high water costs and strong cost recovery. Cluster 2 includes medium-sized systems with moderate water use, low cost per volume, and the highest cost recovery, suggesting efficient financial management. Cluster 3 reflects large-scale organizations with extensive irrigable areas and high water volumes, but low cost recovery, possibly due to underpricing or inefficiencies. Cluster 4 consists of intermediate systems with moderate scale and performance, characterized by low cost recovery and greater variability across key metrics. Overall, the findings reveal significant heterogeneity among LOLIs in terms of size, operational efficiency, and financial sustainability.
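Cluster profiles of this kind are essentially group-wise averages. The sketch below shows one way to derive them. The records are hypothetical, and defining cost recovery as revenue divided by operating cost is an assumption made for illustration, since the indicator's exact definition is not restated in this section.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records:
# (cluster, irrigable_area_ha, cost_per_m3_eur, revenue_eur, operating_cost_eur)
records = [
    (1, 200, 0.30, 45_000, 50_000),
    (1, 250, 0.28, 51_000, 60_000),
    (2, 1_500, 0.05, 80_000, 80_000),
    (3, 9_000, 0.04, 90_000, 300_000),
    (3, 9_500, 0.05, 100_000, 400_000),
]

def cluster_profiles(rows):
    """Average the indicators of the organizations assigned to each cluster."""
    grouped = defaultdict(list)
    for cluster, area, unit_cost, revenue, cost in rows:
        # Assumed definition: cost recovery = revenue / operating cost.
        grouped[cluster].append((area, unit_cost, revenue / cost))
    return {
        cluster: {
            "n": len(vals),
            "mean_area_ha": mean(v[0] for v in vals),
            "mean_cost_per_m3": mean(v[1] for v in vals),
            "mean_cost_recovery": mean(v[2] for v in vals),
        }
        for cluster, vals in sorted(grouped.items())
    }

profiles = cluster_profiles(records)
```

A mean cost recovery of 1.0 corresponds to full recovery of operating costs, while values well below 1.0 mark the kind of structural deficit described for the large-scale clusters.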
The next step in our analysis focused on presenting the characteristics of each cluster. Subsequently, a map of Greece was constructed (
Figure 8), serving as a valuable tool for visualizing the spatial distribution of LOLIs across the country based on the k-means clustering results.
Cluster 1 is concentrated in island regions like Crete and the Dodecanese, where water scarcity and high supply costs are common due to desalination and transport; despite small irrigable areas and high costs, moderate cost recovery suggests some pricing adaptation, warranting targeted infrastructure investment and localized policies. Cluster 2, scattered across northern and central mainland Greece, shows full cost recovery, low water costs, and efficient operations—likely linked to high-value crops—making these LOLIs strong candidates for replication and policy benchmarking. Cluster 3, found broadly across central and western Greece, manages large irrigable areas and low water costs but suffers from very poor cost recovery, pointing to governance and pricing weaknesses that require reforms in billing, tariffs, and subsidy dependence. Cluster 4, scattered near agriculturally intensive zones, uses large water volumes with low cost recovery, likely due to water-intensive farming; this calls for precision agriculture, tariff restructuring, and performance-linked subsidies to ensure long-term sustainability.
The map provided in
Figure 8 reveals several noteworthy spatial patterns. Crete, Greece’s largest island, is predominantly occupied by LOLIs classified under cluster 1, reflecting the region’s well-documented water scarcity and high supply costs. Western Greece is almost exclusively home to cluster 2 LOLIs, which aligns with its relatively high rainfall and efficient water management practices. In contrast, central and northern Greece are primarily associated with cluster 4, although some LOLIs from cluster 2 also appear in these areas, suggesting local variation in performance. Cluster 3 shows no clear geographic concentration, appearing sporadically across the country, indicating that its performance characteristics may stem more from institutional or operational differences than regional environmental factors.
The spatial visualization of clustering results reinforces the interpretation of performance patterns among irrigation organizations in Greece, revealing how environmental, institutional, and operational variables interact to shape outcomes. Cluster 1 is concentrated in the southern islands, particularly Crete, where persistent water scarcity and high supply costs—due to desalination, pumping, or transport—pose significant constraints on agricultural productivity. In these areas, high input costs can limit crop diversity and profitability, making it essential to promote drought-resistant or high-value crops and invest in localized, water-efficient technologies to maintain farm income. Cluster 2, found largely in water-abundant regions of western and central mainland Greece, benefits from natural rainfall, low irrigation costs, and efficient management. This creates favorable conditions for diverse, high-yield agriculture and stable farm incomes, highlighting the potential of these organizations to serve as models for good governance and best practices. Cluster 3 spans central and northern regions and is marked by large irrigable areas with low water prices but poor cost recovery, indicating misaligned pricing, weak revenue collection, and operational inefficiencies. These weaknesses jeopardize infrastructure sustainability and long-term agricultural viability, calling for reforms in cost-recovery mechanisms, performance-based subsidies, and technical support to strengthen institutional capacity. Cluster 4, more scattered and often aligned with zones of intensive agriculture and moderate climatic stress, is characterized by high water consumption and low financial returns. This pattern suggests a mismatch between water use and pricing, which risks overexploitation and diminishing returns in farming. Here, precision agriculture, tariff restructuring, and crop diversification strategies are critical to reversing inefficiencies and stabilizing income. 
Overall, while geography shapes access to water, it is organizational effectiveness and adaptive management that ultimately determine agricultural productivity and resilience. Water pricing policy in Greece must therefore evolve from uniform national models to regionally differentiated, data-driven frameworks that align with environmental conditions, support efficient water use, and protect farmer livelihoods. Integrating clustering insights into policy design will be essential for fostering sustainable irrigation systems and ensuring equitable agricultural development across diverse regions.
The geographic zoning helps confirm that regional water pricing strategies must be tailored to both local conditions and organizational performance. Uniform pricing policies would overlook the stark differences in water cost structures, usage patterns, and financial viability observed across regions. Therefore, water pricing reform in Greece should be cluster-informed, data-driven, and flexible enough to adapt to regional agricultural demands, infrastructure limitations, and environmental pressures.
Based on the above, water pricing policy in Greece must transition from standardized, national approaches to a differentiated, cluster-informed framework. Policies should reward organizational efficiency, correct structural imbalances, and ensure sustainable access to irrigation for farmers across diverse ecological and institutional settings. Integrating clustering analysis with environmental and spatial data enhances the capacity to design smart, adaptive water governance strategies that support both agricultural productivity and long-term resource sustainability.
4. Discussion
According to the k-means clustering results, four distinct groups of LOLIs in Greece exhibit clear spatial and operational patterns that have direct implications for irrigation water pricing. The clustering analysis, supported by spatial visualization, shows that water pricing is notably higher in island regions—especially Crete—classified under cluster 1. These areas face chronic water scarcity, low rainfall, and high exposure to drought conditions. Similar to the situation in Cyprus [
26], where limited water resources demand costly solutions such as desalination, Greek island LOLIs incur high operational costs. Despite these challenges, Cluster 1 achieves relatively high cost recovery, suggesting effective pricing mechanisms or financial balancing through subsidies. This underscores the importance of designing policies that reflect geographic and environmental realities, rather than applying one-size-fits-all pricing frameworks.
In contrast, clusters 3 and 4 consist of LOLIs that oversee large irrigable areas or manage significant volumes of irrigation water but have very low cost recovery, even though they benefit from lower per-unit water costs. This low cost recovery is counterintuitive when measured against the classical principle of economies of scale, under which larger systems achieve lower average costs through increased operational efficiency and should therefore find those costs easier to recover. The divergence from this theory suggests structural inefficiencies in water governance, possible underpricing, weak billing and collection mechanisms, or inadequate enforcement of tariffs. These findings are especially striking when compared to large-scale irrigation projects such as Alqueva in Portugal, where integrated infrastructure development has resulted in cost-effective water delivery and transformative impacts on agricultural productivity [
27]. The Greek context, by contrast, points to a pressing need for modernization and policy intervention in order to replicate similar success stories.
Cluster 2, comprising financially viable LOLIs located primarily in Western Greece, offers a model for sustainable irrigation governance. These regions benefit from the highest levels of natural rainfall, particularly due to the influence of the Pindus mountain range, which contributes to the abundance of freshwater resources. The combination of natural water availability and efficient management allows these LOLIs to maintain low water costs while achieving full or near-full cost recovery. This success reflects a convergence of environmental suitability and institutional capability. It also provides a clear example of how natural capital—when paired with good governance—can yield economically and environmentally sustainable outcomes. The practices observed in cluster 2 could serve as a benchmark for performance improvements in other regions.
Although machine learning tools like k-means clustering offer significant analytical power, this study reveals that spatial context remains essential in understanding and interpreting the results. The clusters align strongly with geographic patterns, reaffirming Waldo Tobler’s First Law of Geography: “Everything is related to everything else, but near things are more related than distant things.” [
28] In this case, proximity to water sources, climate zones, and environmental stressors proves more predictive of LOLI performance than clustering alone. These spatial factors must therefore be embedded into any policy framework aiming to reform irrigation water pricing.
The clustering analysis of local organizations for land improvement offers critical insights into the operational diversity and financial sustainability of irrigation governance in Greece. By categorizing organizations according to variables such as cost per volume, irrigable area, water consumption, and cost recovery, the analysis reveals four distinct performance profiles, each with unique challenges and implications for agricultural water management and policy design.
Cluster 1, comprising primarily island-based organizations—most notably in Crete and the southern Aegean—is characterized by high water costs and small irrigable areas. These conditions reflect the acute geographical constraints faced in these regions, including limited freshwater availability, dependence on desalination, or costly transport infrastructure. Despite elevated costs, these organizations maintain moderate cost recovery rates, likely through targeted subsidies or adjusted pricing. For agriculture, this means that farmers operate under high input costs, potentially limiting crop choices to high-value, low-water crops or reducing profit margins. Policy interventions should focus on improving the efficiency of financial support mechanisms, investing in decentralized water-saving technologies, and exploring alternative sources, such as treated wastewater, to lower long-term costs and increase resilience.
Cluster 2 represents the most financially sustainable group, composed of organizations with low cost per volume, small irrigable areas, and full cost recovery. These entities are primarily found in northern and central mainland Greece—regions with higher rainfall and more stable water availability. Their success likely stems from effective governance, modernized infrastructure, or alignment with profitable, water-efficient crops. These LOLIs serve as benchmarks for best practices in irrigation management and agricultural sustainability. For the farming sector, these conditions support lower production costs and potentially more competitive market positioning. Policies should aim to preserve their financial autonomy, encourage cross-regional knowledge transfer, and introduce innovative incentives to maintain their strong performance.
Cluster 3 includes organizations with extensive irrigable areas and low water costs but critically low cost recovery. This group is geographically widespread, with a noticeable concentration in central and western Greece. While these organizations benefit from scale and natural water abundance, their financial instability suggests underpricing, poor enforcement of payment, or heavy reliance on subsidies. The heterogeneity within and across these groups, reflected in the modest silhouette scores, may point to fragmented governance or uneven policy application. For agriculture, this raises risks related to the long-term availability and reliability of irrigation services, particularly if infrastructure degrades due to underinvestment. Recommended policies include rationalizing tariff structures to better reflect usage costs, improving billing and collection systems, and linking subsidies to performance metrics and efficiency improvements.
Cluster 4 includes organizations with high water consumption, average cost per volume, and low cost recovery. These entities often manage large-scale irrigation systems or support water-intensive crops yet fail to achieve financial sustainability. Geographically, they are scattered across northeastern and coastal regions with intermediate rainfall and periodic droughts. For agricultural operations, this situation can lead to inefficiencies, water overuse, and eventual constraints on productivity if financial deficits impede infrastructure maintenance. Policy should target the adoption of precision irrigation technologies, promote crop diversification toward water-efficient alternatives, and implement progressive pricing models that incentivize conservation.
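One common form of progressive pricing is an increasing block tariff, in which each additional block of consumption is charged at a higher unit price. The sketch below shows the arithmetic. The block limits and prices are purely hypothetical and are not proposed tariffs for any LOLI.

```python
def block_tariff_charge(volume_m3, blocks):
    """Total charge for a volume under an increasing block tariff.

    blocks: list of (upper_limit_m3, price_per_m3) in ascending order; the
    last upper limit should be None so that the final block is unbounded.
    """
    charge, lower = 0.0, 0.0
    for upper, price in blocks:
        if upper is None or volume_m3 < upper:
            # The remaining volume falls inside this block.
            charge += (volume_m3 - lower) * price
            break
        # The whole block is used up; move on to the next, pricier block.
        charge += (upper - lower) * price
        lower = upper
    return charge

# Hypothetical schedule: EUR per cubic metre rises with consumption.
blocks = [(1000, 0.05), (3000, 0.08), (None, 0.12)]
low_use = block_tariff_charge(500, blocks)     # 500 * 0.05
high_use = block_tariff_charge(4000, blocks)   # 1000*0.05 + 2000*0.08 + 1000*0.12
```

Under such a schedule the average price paid per cubic metre rises with consumption, which is the conservation incentive referred to above, while low-volume users remain in the cheapest block.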
The analysis indicates that proximity to national borders is not a significant determinant of the performance of local organizations for land improvement, highlighting the greater importance of economic structures, operational efficiency, and institutional capacity over simple geographic location. However, spatial visualization of the clusters reveals regional concentrations that align with patterns of rainfall, drought exposure, and agricultural activity. This suggests that while geography alone may not drive outcomes, environmental conditions in conjunction with operational realities shape distinct challenges and opportunities for irrigation governance. To deepen understanding and refine policy responses, future analyses should incorporate contextual agricultural variables such as dominant crop types, irrigation technology use, local income levels, and regional policy implementation practices.
The agricultural implications of these findings are substantial. In regions where water is expensive and infrastructure is underdeveloped—such as Cluster 1—farmers face higher production costs, which can erode income margins, limit crop choices, and increase vulnerability to climate-related risks. Without targeted support, producers in these areas may be unable to compete with counterparts in water-abundant regions, leading to regional disparities in agricultural viability. In contrast, areas classified under Cluster 2 benefit from efficient water management and low input costs, enhancing both farm profitability and economic resilience. Meanwhile, Clusters 3 and 4, which suffer from low cost recovery and operational inefficiencies despite managing large irrigated areas, present a risk of long-term financial unsustainability. If left unaddressed, this could result in the deterioration of irrigation services, reduced agricultural output, and ultimately, lower farmer incomes.
These disparities must be addressed through differentiated, regionally adaptive water pricing models that reflect not only environmental realities but also the socio-economic conditions of farming communities. Policymakers should avoid uniform national tariffs and instead implement pricing structures that are sensitive to local cost structures and crop value chains. In high-cost, water-scarce areas, for example, subsidies should be performance-based—targeted toward efficiency-enhancing technologies and conservation practices rather than across-the-board financial aid. In regions with low cost recovery, reforms should focus on improving billing systems, incentivizing timely payments, and linking financial support to governance improvements. Additionally, farm-level technical assistance should be expanded to promote crop diversification and the adoption of precision irrigation practices that reduce water waste and improve profitability.
Beyond pricing, strategic public investments are needed to modernize irrigation infrastructure, particularly in clusters with structural inefficiencies. This includes expanding metering systems, rehabilitating distribution networks, and supporting the deployment of digital monitoring tools. These efforts would not only increase the financial sustainability of water organizations but also enhance transparency, reduce losses, and improve service reliability, benefiting farmers directly through lower risks and higher yields.
In summary, water pricing policy in Greece must evolve from a generalized model to a more nuanced, data-informed strategy. By leveraging clustering insights and integrating agricultural and environmental variables, policymakers can design targeted interventions that ensure both the financial viability of irrigation organizations and the economic sustainability of the agricultural sector. A balanced mix of regionally adjusted tariffs, performance-based subsidies, infrastructure investment, and agronomic support will be key to strengthening rural livelihoods, securing water resources, and building a more resilient and equitable agricultural system across Greece.
The Water Framework Directive 2000/60/EC obliges EU member states to ensure adequate cost recovery in water services. The evidence from this study demonstrates that fulfilling this directive requires not just legal compliance but the development of policies that are responsive to local realities. The Greek government must transition from broad, national-level pricing strategies to targeted, region-specific policies that account for climate variability, water availability, and agricultural practices. Infrastructure investments modeled after the Alqueva project could reduce inefficiencies in clusters 3 and 4 by modernizing outdated systems and introducing smart technologies. Meanwhile, the cost recovery mechanisms seen in clusters 1 and 2 should be studied and adapted to less successful areas.
Achieving the Sustainable Development Goals (SDGs), particularly SDG 6 (Clean Water and Sanitation), SDG 12 (Responsible Consumption and Production), and SDG 13 (Climate Action), depends on the implementation of water policies that are both environmentally sustainable and socially equitable. A reformed water pricing system, grounded in data-driven cluster analysis and spatial intelligence, can promote efficient water use (contributing to SDG 12), ensure that water services are financially sustainable (supporting SDG 6), and build resilience to climate variability (advancing SDG 13). Additionally, integrating digital governance and transparent monitoring systems aligns with SDG 16 (Peace, Justice, and Strong Institutions) and SDG 9 (Industry, Innovation, and Infrastructure), enhancing accountability and innovation within water governance frameworks.
This study also highlights the importance of recognizing its own limitations. The clustering approach was based on selected financial and operational indicators, and regional disparities in data quality may affect the accuracy of conclusions. Future research should incorporate broader socio-economic, environmental, and institutional variables—such as land use, crop types, labor inputs, governance structures, and digital adoption rates—to create a more comprehensive understanding of LOLI performance. Moreover, the findings, while rooted in the Greek context, raise questions of broader European relevance. Comparative studies involving other EU countries with diverse water governance models could yield valuable insights and foster cross-border learning on best practices for sustainable irrigation management.
Ultimately, irrigation water pricing reform in Greece must move beyond uniform policies toward a differentiated and adaptive approach. It should integrate spatial and environmental data, reward efficiency, ensure cost recovery, and maintain affordability. Policymakers must embrace the interplay between technology, geography, and governance, recognizing that achieving sustainability, equity, and resilience in water management is not just a policy goal but a national imperative aligned with both European directives and global development targets.