Article

Water Resource Allocation: A Learning-Based Optimization Framework for Sustainable Decision-Making Under Uncertainty

1
Olid Laboratory, Faculty of Economics and Management of Sfax, University of Sfax, Sfax 3018, Tunisia
2
Data Engineering and Semantics Research Unit, Faculty of Sciences of Sfax, University of Sfax, Sfax 3018, Tunisia
3
RCM2+, Lusófona University, Campo Grande 376, 1749-024 Lisboa, Portugal
4
School of Applied Sciences, University of Campinas (UNICAMP), Campinas 13083-970, Brazil
*
Author to whom correspondence should be addressed.
Environments 2026, 13(2), 105; https://doi.org/10.3390/environments13020105
Submission received: 11 December 2025 / Revised: 8 February 2026 / Accepted: 9 February 2026 / Published: 13 February 2026

Abstract

Water allocation remains a critical global challenge due to increasing scarcity, competing sectoral demands, and environmental pressures, requiring approaches that balance efficiency, equity, and ecosystem sustainability while facing the inherent contextual uncertainty. Recent developments in operations research and statistical learning have paved the way for a new paradigm in nonlinear modeling under uncertainty, i.e., contextual optimization. This emerging framework seamlessly combines predictive analytics with robust optimization techniques to address sustainable decision-making problems in dynamic environments. In this study, we introduce a novel learning-enabled optimization method that extends the current domain of contextual stochastic optimization. Leveraging regression-based statistical learning techniques, our approach enhances predictive accuracy and reinforces decision robustness. Unlike traditional methods, which often struggle with parameter variability and unbounded solution spaces, our model establishes clear predictive bounds that reduce the uncertainty region, thereby minimizing deviations from optimality. We apply our methodology to water allocation in Tunisia’s coastal tourism sector (2010–2022), where resource availability is constrained and highly variable. While developed for this specific context, the framework is transferable to similar Mediterranean arid/semi-arid tourism regions subject to certain data and governance conditions. The proposed approach accurately predicts water demand and optimizes the allocation of diverse water sources, contributing to sustainable water resource management. This paper presents both theoretical foundations and practical applications of our method in complex, data-driven decision environments, demonstrating its relevance for achieving sustainable development goals.

1. Introduction

Water resource allocation has emerged as a global challenge of the 21st century, driven by escalating scarcity, intensifying competition among economic sectors, and environmental pressures. The tourism industry, particularly in arid and semi-arid regions, exemplifies this challenge with its concentrated seasonal demands and dependency on multiple water sources of varying reliability. Addressing these allocation dilemmas requires approaches that optimize efficiency, ensure equity across stakeholders, maintain ecosystem integrity, and account for the uncertainties that characterize water management contexts. There is a need to assess the dynamic complexities of real-world systems where hydrological variability, socioeconomic shifts, and climate change converge to create unprecedented decision-making environments. Recent advances in optimization methods for machine learning have demonstrated the potential to address these challenges through adaptive algorithmic frameworks [1].
Before proceeding, we define the core methodological concepts of our framework. Contextual optimization is a paradigm that conditions decision-making on observed contextual information, known as covariates, rather than relying on unconditional probability distributions. In this approach, decisions are treated as functions of the context, where the context vector contains auxiliary information such as demand forecasts, time indices, or spatial attributes. This approach contrasts with classical stochastic optimization, which often ignores context and may consequently suffer from overgeneralization. Furthermore, learning-based optimization integrates statistical learning techniques, such as regression or classification, within optimization frameworks. This typically involves a two-phase process: a predictive phase, where machine learning estimates uncertain parameters or probability distributions from data, followed by a prescriptive phase, where optimization uses these learned estimates to make informed decisions. This integrated approach, often termed “predict-then-optimize” or integrated learning and optimization (ILO), combines both paradigms to predict context-conditional relationships, which then guide optimization decisions that adapt to specific contexts over time.
For example, in the context of water allocation, a classical approach might involve a fixed rule to allocate a certain percentage to groundwater. A learning-based approach adapts this allocation based on predicted demand, while a contextual approach further refines the decision by considering factors such as seasonality and location. Our framework implements the latter, conditioning allocations on primary context variables such as total demand while allowing for the future integration of broader seasonal, spatial, and regulatory contexts.
Traditional water resource allocation models employ static optimization, where decisions are optimized against the unconditional expectation of uncertain parameters. In contrast, contextual optimization conditions decisions on an observed context vector. This framework recognizes that water availability and demand patterns are not uniformly distributed but depend on observable contextual factors. A “one-size-fits-all” allocation policy that assigns fixed proportions, for instance, 40% surface water, 30% groundwater, and 30% desalination, fails to adapt to temporal context such as high tourist demand in summer versus low demand in winter, to hydrological state such as drought years requiring greater reliance on desalination, or to regulatory constraints such as seasonal restrictions on groundwater extraction. By learning through conditional distributions, contextual optimization enables adaptive decisions that respond dynamically to prevailing conditions.
Nonlinear mathematical modeling and decision-making under uncertainty represent two persistent challenges in operations research, particularly in sustainability contexts where data quality issues manifest across multiple dimensions. Following established information theory frameworks, data limitations can be categorized into objective dimensions of imprecision and incompleteness, alongside the subjective dimension of uncertainty [2]. Imprecision refers to the lack of exactness in measurements or observations, incompleteness denotes missing or unavailable information, while uncertainty captures the subjective interpretation of ambiguous evidence. While traditional optimization models perform well under clearly defined conditions with complete and precise data, contemporary decision-making environments demand further advancements to accommodate the multifaceted nature of information deficiencies. Current requirements necessitate enhanced methodological frameworks that can integrate adaptive learning mechanisms with robust optimization techniques to address dynamic changes and evolving system complexities.
The advent of statistical learning techniques, including regression-based approaches, has fundamentally transformed the modeling of complex nonlinear systems. These computational techniques enable the processing of massive datasets to identify intricate patterns that elude conventional statistical approaches, thereby substantially improving forecasting accuracy across diverse application domains. Nevertheless, a critical gap persists in effectively integrating such predictive capabilities within optimization frameworks. Existing methodologies often lack the adaptability required to accommodate evolving probability distributions or to leverage contextual information embedded within temporal data streams. This limitation becomes particularly acute in sustainability contexts where decision-making must simultaneously address environmental variability and long-term resource constraints.
Recent advances at the intersection of machine learning and operations research have catalyzed progress in both predictive accuracy and operational efficiency. The integration of machine learning techniques within optimization frameworks has evolved significantly, beginning with foundational work on ensemble methods [3] and progressing toward hybrid architectures that combine deep learning with mathematical programming [4]. These developments have been enriched by decision-support systems tailored for resource allocation problems [5], demonstrating the potential of cross-disciplinarity.
Despite these developments, current integration paradigms remain fragmented and lack systematic frameworks for managing uncertainty and variability. The field requires unifying methodologies capable of consistently addressing the stochastic nature of real-world decision environments. Contextual optimization has emerged to address this need by incorporating auxiliary information such as temporal patterns, spatial characteristics, and environmental conditions directly into mathematical optimization formulations [6]. This shift enables more personalized and context-aware decision support compared to classical stochastic optimization approaches that rely on unconditional probability distributions and consequently suffer from overgeneralization. Advances in computational power and algorithmic sophistication have accelerated this transition toward contextual methods.
In parallel, adaptive resource management has gained attention, with innovative data-driven metrics for systemic resilience based on real-time disruptions and historical analysis [7]. These insights emphasize the need for decision-making frameworks that are not only predictive but also resilient and adaptive to changing environments.
The study presented in this paper addresses the need for a consistent methodology linking predictive learning to optimization under uncertainty. This research proposes a new contextual optimization framework that leverages advanced analytical and optimization techniques to improve resilience, reduce deviations from optimality, and mitigate uncertainty-induced biases. Building on recent advances such as the reliability estimation indicators proposed by Zhang and Bose [8], our framework defines predictive thresholds that refine decision-making support.
Contextual optimization techniques typically rely on estimating conditional distributions based on historical covariates, enabling more personalized and effective support to decision-making. This approach extends beyond classical stochastic optimization, which often relies on unconditional distributions and therefore may suffer from overgeneralization [9]. Advances in computing power and learning algorithms have further accelerated the transition to contextual methods [10].
Parallel developments in sustainable decision-making and integrated learning frameworks have enriched this methodological landscape. Research contributions have demonstrated innovative approaches based on moderate supervision to promote equitable and participatory water governance [11], while other studies have highlighted the integration of forest ecosystem conservation with water management objectives aligned with the Sustainable Development Goals [12]. Additional work has explored adaptive management strategies that synthesize learning processes, participatory governance structures, and traditional ecological knowledge within socio-ecological systems [13]. These contributions illustrate the growing recognition that effective resource management requires frameworks capable of integrating multiple knowledge systems and stakeholder perspectives.
The theoretical foundation for integrated learning and optimization has matured considerably in recent years. Parameterized decision rules that associate actions with contextual covariates have been progressively refined to address increasingly complex applications while incorporating regularization techniques to prevent overfitting. Recent methodological advances have enabled direct optimization of decisions based on learned conditional distributions [14], with cutting-edge approaches integrating optimization objectives directly into the learning process to prioritize actionable decisions over simple predictive accuracy [15]. Water resource allocation represents a critical challenge for regions facing water scarcity and increasing demands from human activities, particularly the tourism sector. Comprehensive frameworks for urban water cycle management have emerged, emphasizing the integration of conventional and non-conventional sources [16,17]. Optimized selection frameworks have been developed for diverse water supply portfolios including surface water, groundwater, desalination, and wastewater reuse in coastal arid regions [18]. Integrated solutions for arid to semi-arid urbanized coastal regions address multiple topographic and resource constraints [19]. In the Tunisian context, hybrid fuzzy multi-criteria decision-making approaches have been applied to irrigation water allocation [20], complemented by integrated optimization and multi-criteria methods for broader water resources management [21]. Recent developments include fuzzy mathematical models with group decision-making for complex problems [22] and demonstrations of hybrid renewable energy systems (on-grid PV/wind) for desalination [23].
For studies concerned with forecasting environmental or resource-related variables, robust data-driven modeling frameworks have been established. Hao et al. [24] developed a methodology for analyzing and predicting runoff and sediment into reservoirs, while Wang et al. [25] proposed hybrid decomposition-reconfiguration models for long-term solar radiation prediction. These approaches offer alternatives for long-term prediction.
This paper proposes a novel learning-enabled contextual optimization framework, emphasizing its application to sustainable water resource allocation and demonstrating robust, actionable strategies for complex decision-making.
After this introduction, the remainder of this paper proceeds as follows. Section 2 provides a theoretical background and literature review. Section 3 presents our methodological framework, detailing both the data prediction component using machine learning and the optimization formulation. Section 4 introduces the Tunisia tourism water allocation case study, describing the study area, water resource systems, data sources, and variables. Section 5 presents the empirical results, including predictive model performance, and optimization outcomes. Section 6 discusses the findings and acknowledges the limitations and assumptions of the study. Finally, Section 7 provides concluding remarks.

2. Theoretical Background and Literature Review

Estimating the conditional distribution $\hat{p}(y \mid x)$ from data constitutes a fundamental aspect of SLO. The selection of an estimation technique has a direct impact on both the tractability and the robustness of the SDM problem.
In the context of Residual-Based Distribution Estimation, a prominent method leverages residuals obtained from regression models to construct empirical conditional distributions. As discussed by Deng and Sen [26] and Kannan et al. [27], the estimated conditional distribution is expressed as:
$$\hat{p}(y \mid x) = \frac{1}{N} \sum_{i=1}^{N} \delta_{e_i}\left(y - \hat{f}_{\theta}(x)\right),$$
where $\delta_{e_i}$ is the Dirac delta function centered at residual $e_i$, and $\hat{f}_{\theta}(x)$ is the predicted value from a regression model parameterized by $\theta$.
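The residual-based construction can be illustrated with a minimal Python sketch. It assumes a single covariate and an ordinary least-squares line as the regression model; the data and the helper names `fit_line` and `residual_scenarios` are hypothetical and are not taken from the cited works. Each training residual is shifted by the new point prediction, yielding $N$ equally weighted scenarios.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x (one covariate)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def residual_scenarios(xs, ys, x_new):
    """Return N scenarios f_hat(x_new) + e_i, each carrying probability 1/N."""
    b0, b1 = fit_line(xs, ys)
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    point = b0 + b1 * x_new
    return [point + e for e in residuals]

# Illustrative (made-up) observations, not data from the case study.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
scenarios = residual_scenarios(xs, ys, x_new=5.0)
```

Because least-squares residuals sum to zero when an intercept is fitted, the scenario mean coincides with the point prediction; the spread of the scenarios is what carries the uncertainty into a downstream optimization model.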
Weight-based distribution estimation assigns similarity weights to historical observations, typically through k-nearest neighbors (kNN) or kernel density estimation (KDE). The weighted conditional distribution is given by:
$$\hat{p}(y \mid x) = \frac{1}{N} \sum_{i=1}^{N} w_i \, \delta_{y_i}(y)$$
where $w_i$ is the weight assigned to observation $i$ based on the similarity between $x$ and $x_i$, and $\delta_{y_i}$ is the Dirac delta function centered at $y_i$. The normalization $\sum_i w_i = 1$ ensures that $\hat{p}(y \mid x)$ is a valid probability distribution.
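As a hedged illustration of the weighting idea, the sketch below uses uniform kNN weights on a single covariate, assuming that the normalized weights themselves act as the probability masses attached to each observed outcome. The function names and the data are hypothetical.

```python
def knn_weights(x_query, xs, k):
    """Uniform kNN weights: 1/k for the k nearest x_i, 0 for all others."""
    order = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x_query))
    nearest = set(order[:k])
    return [1.0 / k if i in nearest else 0.0 for i in range(len(xs))]

def weighted_mean(weights, ys):
    """Mean of the weighted empirical distribution over the observed y_i."""
    return sum(w * y for w, y in zip(weights, ys))

# Illustrative (made-up) covariates and outcomes.
xs = [10.0, 12.0, 15.0, 20.0, 26.0]
ys = [4.0, 5.0, 6.5, 9.0, 12.0]
w = knn_weights(14.0, xs, k=3)
cond_mean = weighted_mean(w, ys)
```

Only the three observations closest to the query point receive mass, so the estimate adapts to the local context instead of averaging over the whole sample.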
KDE refines this approach using kernel functions:
$$\hat{p}(y \mid x) = \frac{1}{h^{d} N} \sum_{i=1}^{N} K\left(\frac{2\,(x - x_i)(y - y_i)}{h}\right),$$
where h is the bandwidth parameter and K is the kernel function.
Conditional mean estimation provides the conditional mean $E(y \mid x)$. Ferreira et al. [28] employed regression trees for sales forecasting, and Liu et al. [29] applied analogous methods to delivery optimization:
$$\hat{y} = E(y \mid x) \approx \frac{1}{N} \sum_{i=1}^{N} \delta_{y_i}(x).$$
Regularization and sustainable decision-making are crucial when the feature space grows too wide, as models tend to memorize the training set. Regularization nudges the algorithm to keep its form simpler. Lin et al. [30] and Srivastava et al. [31] show that adding such penalties raises robustness. The general form is:
$$\min_{\theta} \; E\left[\, l\left(y, f_{\theta}(x)\right) \right] + \lambda \|\theta\|^{2},$$
where $l(\cdot)$ is the loss function; $\lambda$ is the regularization parameter; and $\|\theta\|^{2}$ is the squared $\ell_2$-norm.
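The shrinkage effect of the penalty can be seen in a minimal sketch. It assumes a squared-error loss, a single coefficient without intercept, and a sum over the sample in place of the expectation; under these assumptions the minimizer has the closed form $\theta = \sum_i x_i y_i / (\sum_i x_i^2 + \lambda)$. The function name and the numbers are hypothetical.

```python
def ridge_slope(xs, ys, lam):
    """Minimise sum_i (y_i - theta * x_i)^2 + lam * theta^2 over a scalar theta."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

# Illustrative data lying exactly on y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
theta_ols = ridge_slope(xs, ys, lam=0.0)     # no penalty: recovers the slope 2.0
theta_ridge = ridge_slope(xs, ys, lam=14.0)  # penalty equal to sum(x^2): slope halves
```

Increasing `lam` pulls the coefficient toward zero, which is the mechanism by which regularization trades a small bias for lower variance on short samples.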
Beyond regularization, SDM extends the framework by optimizing against the worst-case distribution within a predefined ambiguity set. As formulated by Kuhn et al. [32]:
$$\min_{\theta} \; \max_{Q \in \mathcal{Q}} \; E_{Q}\left[ c(\theta, x, y) \right],$$
where $\mathcal{Q}$ is typically defined by a Wasserstein distance constraint from the empirical distribution.
Integrated learning and optimization (ILO) involves establishing parameter values for predictive models to achieve optimal policy performance. Bengio [33] initially proposed training a predictive model with a loss function that reflects the quality of actions.
The expected value-based optimization model loss function is expressed as:
$$L(\theta) = E\left[ c\left(z^{*}(x, g_{\theta}), y\right) \right],$$
where $z^{*}(x, g_{\theta})$ stands for the best choice made for a given set of covariates $x$.

3. Methodology

3.1. Conceptual Framework

The methodology provides a unified and structured way of addressing optimization under uncertainty for multi-source water allocation in data-limited contexts (N < 20 observations). It integrates mathematical optimization with regression-based statistical learning to address intricate decision-making issues.
Our optimization model operates at two complementary levels. At the first level, the regression functions $f_i(x_{it})$, derived through machine learning, predict the historically optimal quantities allocated for each source $i$ based on contextual characteristics. These predictions serve as reference values. At the second level, the optimization model determines the actual allocations $x_{it}$ that, on the one hand, minimize deviations from predictions (thereby capturing efficient historical patterns) and, on the other hand, satisfy the global equilibrium constraint where the sum of all allocations equals the total demand $y_t$. This hybrid approach combines learning from historical patterns with prescriptive optimization under physical constraints.
Figure 1 and Figure 2 illustrate the conceptual architecture and detailed mathematical formulation of this methodological framework, showing the progression from historical data patterns through generalized optimization to the two-phase workflow integrating machine learning predictions with multi-objective optimization. A complete list of all symbols and abbreviations used throughout this manuscript is provided in Appendix A.
The context vector $c_t$ encapsulates observable information at time $t$ that influences both water demand and source availability. We define $c_t = \{y_t, \tau_t, s_t, r_t\}$, where $y_t$ represents the total sectoral water demand in Mm³/year and serves as the primary contextual variable in this study. The temporal or seasonal context is captured by $\tau_t$, which can be specified as a month index ranging from 1 to 12 or as a year identifier depending on data granularity. The system state vector $s_t$ captures infrastructure capacity, reservoir storage levels, and the operational status of treatment facilities. Finally, $r_t$ encompasses resource availability and regulatory constraints, including supply capacity limits, legally mandated environmental flows, and allocation priorities.
The context vector informs the predictive phase (Phase 1) through a learned mapping of the form $\hat{\xi}_t = f_{\mathrm{ML}}(c_t; \theta)$, where $f_{\mathrm{ML}}$ represents the machine learning model (such as Random Forest or Gradient Boosting) with parameters $\theta$. This model is trained to predict uncertain water source availability $\xi_t$ given the context $c_t$.
Due to data availability constraints arising from annual aggregation in SONEDE reports, the current Tunisia case study employs a simplified context $c_t = \{y_t\}$, where $y_t$ represents the total tourism sector water demand for year $t$. This simplification does not limit the generality of the framework; the full multidimensional context is activated once monthly or daily data becomes available.
In the first stage, context-conditional prediction is performed, where for each water source i and time period t, we estimate:
$$\hat{f}_i(x_{it} \mid c_t) = g_i(y_t, \tau_t)$$
where $y_t \in \mathbb{R}_{+}$ represents total sectoral demand (the primary contextual variable), $\tau_t \in \{1, \ldots, 12\}$ denotes the seasonal context, and $g_i(\cdot)$ is a regression function. In the second stage, context-aware optimization is applied, where the optimization model adjusts allocations based on the provided context:
$$\min \; \sum_{i} w_i \left( \delta_{it}^{+} + \delta_{it}^{-} \right)$$
subject to:
$$\hat{f}_i(x_{it} \mid c_t) - \delta_{it}^{+} + \delta_{it}^{-} = x_{it}$$
In this formulation, the context $c_t$ enters through the prediction function $\hat{f}_i$, ensuring allocations are adaptive to demand levels and temporal patterns rather than relying on static, unconditional distributions.
The context vector framework can accommodate diverse spatial and environmental factors. Latitude ($\phi_i$) and elevation ($h_i$) are variables that influence evapotranspiration rates, pumping energy requirements, and seasonal temperature variations. For the Tunisia coastal tourism sector, which encompasses the governorates of Nabeul, Sousse, Monastir, Mahdia, Sfax, and Djerba, latitude varies narrowly between 33° and 37° N, while elevation is predominantly below 50 m above sea level due to the coastal plain topography. Given this low variability, spatial heterogeneity effects are implicitly captured through source-specific capacity constraints $C_i$ and cost coefficients rather than as explicit context components. For applications extending to geographically heterogeneous regions, such as mountainous versus coastal zones or multi-country basins, we recommend including $\phi_i$ and $h_i$ as explicit components in the system state vector $s_t$.
The temporal context component $\tau_t$ can be defined at multiple resolutions. Julian day (ranging from 1 to 365) is appropriate for high-frequency operational planning and daily demand forecasting. Monthly resolution (ranging from 1 to 12) is suitable for capturing seasonal tourism patterns that distinguish peak season from low season. Annual resolution is used for long-term trend analysis and climate change adaptation studies. The current empirical application uses annual aggregation ($\tau_t = \text{year}$) due to data reporting frequency from Tunisia’s National Water Observatory (SONEDE).

3.2. Data Prediction Component: Regression-Based Statistical Learning

Regression-based statistical learning is employed to predict parameters that contain uncertainty. The prediction step can be represented as:
$$y = \theta_1 X_1 + \cdots + \theta_n X_n + \varepsilon,$$
where $\theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$ represents the vector of model coefficients to be estimated. In simple linear regression, $\theta = \{\beta_0, \beta_1\}$, where $\beta_0$ is the intercept and $\beta_1$ the slope. In multiple regression, $\theta = \{\beta_0, \beta_1, \ldots, \beta_k\}$, where $\beta_j$ are the partial regression coefficients.
In our framework, $\theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$ represents generic model parameters in the methodological framework (Equations (1)–(8)). For regression models, $\theta = \{\beta_0, \beta_1, \ldots, \beta_k\}$, where $\beta_j$ are the estimated coefficients. Both notations refer to learned parameters: $\theta$ is used for general methodology while $\beta$ denotes specific empirical coefficients obtained through regression analysis in the empirical application.
In terms of methodological positioning, our approach employs classical parametric regression (linear, polynomial, exponential, logarithmic, power functions) rather than complex machine learning algorithms (neural networks, ensemble methods, support vector machines).
We acknowledge that future research with longer time series (N > 30) could explore ensemble methods (random forests, gradient boosting) [3] or hybrid decomposition-reconfiguration models [25] to capture more complex nonlinear interactions. However, for the present dataset, classical regression provides the appropriate balance between predictive adequacy and methodological rigor.
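To illustrate how candidate parametric forms might be compared, the following sketch fits a linear and a logarithmic form by ordinary least squares on a transformed covariate and reports the training $R^2$ of each. It is a simplified illustration under stated assumptions, not the selection procedure used in the empirical application: the data are synthetic, only two of the listed forms are included, and the names `ols`, `r_squared`, and `compare_forms` are hypothetical.

```python
from math import log
from statistics import mean

def ols(us, ys):
    """Least-squares intercept and slope for y = b0 + b1 * u."""
    u_bar, y_bar = mean(us), mean(ys)
    b1 = (sum((u - u_bar) * (y - y_bar) for u, y in zip(us, ys))
          / sum((u - u_bar) ** 2 for u in us))
    return y_bar - b1 * u_bar, b1

def r_squared(us, ys, b0, b1):
    """Training R^2 = 1 - SSE / SST."""
    y_bar = mean(ys)
    sse = sum((y - (b0 + b1 * u)) ** 2 for u, y in zip(us, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)
    return 1.0 - sse / sst

def compare_forms(xs, ys):
    """Fit y = b0 + b1 * g(x) for each candidate transform g; return R^2 per form."""
    forms = {"linear": lambda x: x, "logarithmic": log}
    scores = {}
    for name, transform in forms.items():
        us = [transform(x) for x in xs]
        b0, b1 = ols(us, ys)
        scores[name] = r_squared(us, ys, b0, b1)
    return scores

# Synthetic data generated exactly from y = 3 + 2 * ln(x).
xs = [1.0, 2.0, 4.0, 8.0, 16.0]
ys = [3.0 + 2.0 * log(x) for x in xs]
scores = compare_forms(xs, ys)
```

On these synthetic points the logarithmic form fits exactly while the linear form does not; with real data of 13 observations, such training scores would need to be checked against cross-validated performance before a form is retained.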

3.3. Optimization Component

The mathematical formulation of our optimization model is given as follows. The global equilibrium constraint ensures that total allocations equal demand:
$$\sum_{i=1}^{n} x_{it} - \delta_{tot}^{+} + \delta_{tot}^{-} = y_t$$
Individual prediction constraints relate ML predictions to allocations for each source $i$:
$$\hat{f}_i(x_{it}) - \delta_{it}^{+} + \delta_{it}^{-} = x_{it}, \quad \forall i, t$$
where $\hat{f}_i(x_{it})$ represents the ML-predicted optimal contribution from source $i$ (input), based on historical relationships identified by machine learning, and $x_{it}$ is the allocation decision variable (output of optimization).
Objective Function:
$$\min \; \sum_{i=1}^{n} w_i \left( \delta_{it}^{+} + \delta_{it}^{-} \right) + w_{tot} \left( \delta_{tot}^{+} + \delta_{tot}^{-} \right)$$
where $n$ = number of water sources ($n = 7$ for the Tunisia case); $i$ = source index ($i \in \{1, \ldots, n\}$); $t$ = time period; $w_i$ = priority weight for source $i$ [dimensionless, $0 \le w_i \le 1$]; $w_{tot}$ = weight for the total equilibrium constraint [dimensionless]; $\delta_{it}^{+}$ = positive deviation for source $i$ at time $t$ [Mm³]; $\delta_{it}^{-}$ = negative deviation for source $i$ at time $t$ [Mm³]; $\delta_{tot}^{+}$ = positive deviation from total demand balance [Mm³]; and $\delta_{tot}^{-}$ = negative deviation from total demand balance [Mm³].
This formulation ensures proper aggregation of sources ($\sum_{i} x_{it} = y_t$ when the total deviations vanish), allows optimization of deviations between predictions and actual allocations for each source, and maintains mathematical consistency of the problem.
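A single-period instance of this deviation-minimizing model can be sketched in a few lines of Python. The sketch below is an illustration under stated assumptions rather than the solver used in the study: predictions are treated as fixed inputs, each source has a hypothetical upper capacity and a lower bound of zero, and the problem is solved greedily. The greedy rule is valid here because the objective is separable and linear in the deviations, so the demand gap is absorbed first by the sources with the smallest weights, as long as their weight is below `w_tot`. A general linear programming solver would be needed once further coupling constraints are added. All numbers and names are made up.

```python
def allocate(preds, weights, demand, w_tot, caps=None):
    """Greedy solve of the weighted-deviation model for one period.

    preds   : predicted contributions f_hat_i (fixed inputs, Mm3)
    weights : priority weights w_i (cost per unit of deviation from f_hat_i)
    demand  : total demand y_t (Mm3)
    w_tot   : weight on the remaining total-balance deviation
    caps    : optional upper capacity per source (hypothetical bounds)
    """
    n = len(preds)
    caps = caps if caps is not None else [float("inf")] * n
    x = list(preds)
    gap = demand - sum(preds)  # > 0: shortfall to cover, < 0: surplus to remove
    # Adjust the cheapest-to-deviate sources first.
    for i in sorted(range(n), key=lambda j: weights[j]):
        if abs(gap) < 1e-12 or weights[i] >= w_tot:
            break  # balanced, or cheaper to leave the imbalance than to deviate
        if gap > 0:
            step = min(gap, max(0.0, caps[i] - x[i]))
        else:
            step = -min(-gap, max(0.0, x[i]))  # allocations cannot go below zero
        x[i] += step
        gap -= step
    objective = (sum(w * abs(xi - p) for w, xi, p in zip(weights, x, preds))
                 + w_tot * abs(gap))
    return x, gap, objective

# Hypothetical three-source example (values in Mm3, not from the case study).
preds = [10.0, 20.0, 5.0]
weights = [0.5, 1.0, 0.2]
x, gap, objective = allocate(preds, weights, demand=40.0, w_tot=10.0,
                             caps=[15.0, 25.0, 8.0])
```

In this example the predictions sum to 35 Mm³ against a demand of 40 Mm³. The source with the lowest weight absorbs 3 Mm³ up to its capacity and the next one absorbs the remaining 2 Mm³, so the balance holds exactly while the highest-priority source stays at its predicted value.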
The optimization remains necessary despite high $R^2$ values because (a) multiple constraints are not captured by individual regressions (physical capacities, quality constraints, costs, environmental impacts); (b) changing conditions may require reallocation; and (c) multi-objective optimization balances competing priorities.
Variable $x_{it}$ represents the optimal predicted quantity of resource $i$ during period $t$; $\hat{f}_i(x_{it})$ is the predictive regression function; $w_i$ is the priority weight; $\delta_{it}^{+}$ and $\delta_{it}^{-}$ are positive and negative deviations; and $T$ represents the total number of time periods in the planning horizon. Here, we focus on a single representative period ($T = 1$) to demonstrate the methodology, though the framework is readily extensible to multi-period optimization.
The optimization framework accommodates network topology constraints that arise from the physical configuration of water supply infrastructure. Water cannot be freely transferred between sources and demand zones; instead, flows are constrained by pipeline connectivity, pumping station capacity, and inter-basin transfer agreements.
For a network-constrained formulation, the connectivity structure is represented through the incidence matrix $A$, where $a_{ij} = 1$ indicates that a pipeline exists connecting source $i$ to demand zone $j$, and $a_{ij} = 0$ indicates that no direct transfer path exists. When physical connectivity is combined with capacity limits, an arc-flow constraint applies:
$$x_{ij,t} \le a_{ij} \cdot C_{ij}^{\max}$$
where $x_{ij,t}$ represents the flow from source $i$ to zone $j$ at time $t$, and $C_{ij}^{\max}$ is the maximum pipeline or pumping capacity.
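The arc-flow constraint can be checked for a candidate flow plan with a short routine. The sketch below assumes a small hypothetical network of two sources and two demand zones; the matrices and the function name `arc_flow_violations` are illustrative only.

```python
def arc_flow_violations(flows, incidence, capacity, tol=1e-9):
    """Return the (i, j) pairs where x_ij exceeds a_ij * C_ij^max."""
    bad = []
    for i, row in enumerate(flows):
        for j, x_ij in enumerate(row):
            if x_ij > incidence[i][j] * capacity[i][j] + tol:
                bad.append((i, j))
    return bad

# Hypothetical network: source 0 has no pipeline to zone 1 (a_01 = 0).
incidence = [[1, 0],
             [1, 1]]
capacity = [[15.0, 15.0],
            [8.0, 6.0]]   # Mm3/year, made-up values
flows = [[12.0, 3.0],
         [8.0, 6.0]]
violations = arc_flow_violations(flows, incidence, capacity)
```

The flow of 3 Mm³ from source 0 to zone 1 is flagged because no pipeline connects them, even though the nominal capacity entry is positive; multiplying by $a_{ij}$ is what forces flows on missing arcs to zero.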

3.4. Model Validation Framework

Given the inherent uncertainty in both data and model specification, a rigorous validation framework is essential for establishing reliability. This section defines the methodological approach for assessing model performance.
In terms of sample size, our dataset comprises 13 annual observations, which falls short of the 20 points theoretically recommended for univariate models ($k = 1$) by the $N \ge 10k + 10$ rule [34]. However, the analyzed period contains substantial structural variability due to major exogenous shocks, such as the Arab Spring (2011) and the COVID-19 pandemic (2020–2021). These events provide the necessary data variance to compensate for the limited sample count. We also prioritize model parsimony, rejecting polynomial complexity unless it delivers an $R^2$ increase greater than 5%. The robustness of this approach is confirmed by the leave-one-out metrics.
Considering the bias–variance trade-off and the small sample size, we prioritize simpler functional forms (linear, logarithmic) over complex models to minimize overfitting risk. The selection of polynomial models for only three sources ($x_1$, $x_3$, $x_6$) reflects a careful balance between fit quality and generalization capacity.
As the optimal validation strategy for small datasets, Leave-One-Out Cross-Validation (LOOCV) iteratively holds out each observation as a test set while training on the remaining $N - 1$ observations. The cross-validated $R^2$ (CV-$R^2$) is computed as:
$$\text{CV-}R^2 = 1 - \frac{\sum_{t=1}^{N} \left( y_t - \hat{y}_{(-t)} \right)^2}{\sum_{t=1}^{N} \left( y_t - \bar{y} \right)^2}$$
where $\hat{y}_{(-t)}$ represents the prediction for observation $t$ from a model trained without observation $t$.
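The leave-one-out procedure can be sketched as follows, assuming a simple linear regression on one covariate as the model being validated. The data are made up and the function names are hypothetical; the loop refits the model $N$ times, each time predicting the single held-out observation.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b1 * x_bar, b1

def loocv_r2(xs, ys):
    """CV-R^2 = 1 - PRESS / SST, with each point predicted by a model that never saw it."""
    y_bar = mean(ys)
    press = 0.0
    for t in range(len(xs)):
        x_train = xs[:t] + xs[t + 1:]
        y_train = ys[:t] + ys[t + 1:]
        b0, b1 = fit_line(x_train, y_train)
        press += (ys[t] - (b0 + b1 * xs[t])) ** 2
    sst = sum((y - y_bar) ** 2 for y in ys)
    return 1.0 - press / sst

# Illustrative data: an exact line and a slightly noisy version of it.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
cv_exact = loocv_r2(xs, [2.0 * x + 1.0 for x in xs])
cv_noisy = loocv_r2(xs, [2.0, 4.1, 5.9, 8.2, 9.8])
```

For the exact line every held-out point is predicted without error, so CV-$R^2$ equals one; the noisy series yields a value slightly below one, and it is this value that is compared with the training $R^2$ when judging generalization.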
Model performance is assessed using complementary metrics. The RMSE (Root Mean Square Error), calculated as $\sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2}$, measures the absolute deviation magnitude. The MAE (Mean Absolute Error), defined as $\frac{1}{N} \sum_{i=1}^{N} |y_i - \hat{y}_i|$, offers a metric that is less sensitive to outliers than the RMSE. Finally, the MAPE (Mean Absolute Percentage Error), expressed as $\frac{100\%}{N} \sum_{i=1}^{N} \left| \frac{y_i - \hat{y}_i}{y_i} \right|$, provides a scale-independent comparison.
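The three metrics can be computed directly from their definitions. The following sketch uses made-up observed and predicted values; MAPE is only defined when no observed value is zero, which is assumed here.

```python
from math import sqrt

def rmse(y, y_hat):
    """Root mean square error."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))

def mae(y, y_hat):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)

def mape(y, y_hat):
    """Mean absolute percentage error, in percent (requires all y_i != 0)."""
    return 100.0 * sum(abs((a - b) / a) for a, b in zip(y, y_hat)) / len(y)

# Illustrative values only.
y = [10.0, 20.0, 40.0]
y_hat = [12.0, 18.0, 40.0]
metrics = {"RMSE": rmse(y, y_hat), "MAE": mae(y, y_hat), "MAPE": mape(y, y_hat)}
```

Here the errors are -2, 2, and 0, so the MAE is 4/3, the RMSE is the square root of 8/3, and the MAPE is 10%; the RMSE exceeds the MAE because squaring gives more weight to the larger errors.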
To quantify generalization capacity, we define an Overfitting Index as the gap between training and cross-validated performance:
$$\text{Overfitting Index} = R^2_{\text{Training}} - \text{CV-}R^2$$
Values below 0.03 indicate excellent generalization (≈ zero overfitting); values between 0.03 and 0.10 suggest acceptable generalization suitable for operational use; values exceeding 0.10 warrant caution.
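These thresholds translate into a simple decision rule. The sketch below is one possible encoding; the label strings and function names are hypothetical, and the boundary values 0.03 and 0.10 are assigned to the middle band as an assumption, since the text does not state which band they belong to.

```python
def overfitting_index(r2_train, cv_r2):
    """Gap between training R^2 and cross-validated R^2."""
    return r2_train - cv_r2

def generalization_label(index):
    """Map the index to the three bands described in the text."""
    if index < 0.03:
        return "excellent"
    if index <= 0.10:
        return "acceptable"
    return "caution"

# Illustrative values, not results from the case study.
label = generalization_label(overfitting_index(0.95, 0.93))
```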
Cross-Validation Protocol. Model generalization is assessed via Leave-One-Out Cross-Validation (LOOCV), optimal for small datasets [34]. For each observation $t \in \{1, \ldots, 13\}$, the model is re-estimated using observations $\{1, \ldots, t-1, t+1, \ldots, 13\}$ to predict $\hat{y}_{(-t)}$. Cross-validated performance is quantified through:
$$\text{CV-}R^2 = 1 - \frac{\sum_{t=1}^{13} \left( y_t - \hat{y}_{(-t)} \right)^2}{\sum_{t=1}^{13} \left( y_t - \bar{y} \right)^2}, \qquad \text{RMSE}_{\text{CV}} = \sqrt{\frac{1}{13} \sum_{t=1}^{13} \left( y_t - \hat{y}_{(-t)} \right)^2}$$
Overfitting risk is assessed via the Overfitting Index $= R^2_{\text{Training}} - \text{CV-}R^2$, where values exceeding 0.10 indicate excessive model complexity.

4. Empirical Application: Contextual Water Allocation

4.1. System Description and Resource Portfolio

To demonstrate the proposed framework, we apply it to a multi-source water allocation problem in a region characterized by high seasonal demand and water scarcity. The system relies on a diverse portfolio of water sources, including conventional resources (surface and groundwater) and non-conventional alternatives (recycled water, harvested rainwater, and desalination). The objective is to optimize the allocation across these sources to meet the total demand while maintaining resilience to contextual uncertainties.
The region’s water supply portfolio integrates both conventional and non-conventional sources to meet growing demand. Conventional resources, specifically surface water ($x_3$) and groundwater ($x_5$), contribute the majority of supply (37.8% and 31.9%, respectively), though they face significant variability and overdraft risks. To mitigate these constraints, non-conventional alternatives are mobilized, including recycled water ($x_1$), harvested rainwater ($x_2$), and rapidly expanding desalination capacity ($x_4$), which triples its share to 14.6%. Supplementary inter-regional transfers are provided by the Nord Water pipeline ($x_6$) and the Sbeitla-Jelma transfer ($x_7$), securing essential backup supply from water-surplus regions.
In operational terms, the water distribution infrastructure comprises a tiered hierarchical network managed by SONEDE. The first tier consists of bulk production from desalination plants and major reservoirs feeding regional distribution hubs. The second tier consists of inter-regional pipelines including the Nord water pipeline (from Cap Bon to Sahel), which supplies the coastal tourism corridor with capacity $C_6^{\max} = 15$ Mm3/year, and the Sbeitla-Jelma pipeline (from central aquifers to coastal zones). The third tier consists of municipal distribution networks connected to tourism establishments. The implicit assumption in our aggregated formulation is that intra-regional transfers are unconstrained within the tourism zone while inter-regional transfers face capacity limits. A spatially disaggregated extension distinguishing the Hammamet, Sousse-Monastir, and Djerba subzones would require the incidence matrix $A$ and capacity vector $c^{\max}$ as inputs, which in turn require geographic data currently outside the scope of the available SONEDE reports.
Tunisia’s tourism sector exhibits pronounced seasonal demand patterns that significantly affect water allocation requirements. The peak tourist season spans from May through October, during which hotel occupancy exceeds 75% and water consumption per tourist increases due to swimming pools, landscaping irrigation, golf courses, and higher showering frequency. In contrast, the low season from November through April is characterized by occupancy below 40% and reduced per-capita water consumption. Based on ONTT sector reports for 2018–2019, the peak-to-low season demand ratio reaches approximately 2.8:1.
Incorporating seasonal dynamics explicitly requires sub-annual data resolution. With monthly data, the context vector would expand to include $\tau_t \in \{1,\dots,12\}$ as a categorical month indicator, enabling a seasonal embedding through the formulation $c_t = \{y_t, \sin(2\pi\tau_t/12), \cos(2\pi\tau_t/12), \dots\}$. This Fourier representation captures cyclical patterns while maintaining model parsimony. The current annual aggregation averages over these seasonal peaks, yielding conservative allocation estimates suitable for capacity planning but potentially suboptimal for operational scheduling. Future work with monthly SONEDE data would enable peak-season-specific allocation optimization, addressing the constraint period when tourism demand coincides with minimum groundwater recharge.
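The seasonal embedding described above can be illustrated with a short sketch. The helper names and the demand value are hypothetical. The point is only that the sine/cosine pair places December next to January, which a plain month index from 1 to 12 does not.

```python
import math

def seasonal_features(month):
    """Map a month index (1..12) to its first-harmonic Fourier pair."""
    angle = 2.0 * math.pi * month / 12.0
    return math.sin(angle), math.cos(angle)

def context_vector(demand, month):
    """Context [y_t, sin(2*pi*tau/12), cos(2*pi*tau/12)] for one month."""
    s, c = seasonal_features(month)
    return [demand, s, c]

def feature_distance(m1, m2):
    """Euclidean distance between two months in the embedded space."""
    (s1, c1), (s2, c2) = seasonal_features(m1), seasonal_features(m2)
    return math.hypot(s1 - s2, c1 - c2)

# Hypothetical monthly demand of 9.0 Mm3 in July (month 7)
print(context_vector(9.0, 7))
# December-January and January-February are equally far apart
print(feature_distance(12, 1), feature_distance(1, 2))
```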

4.2. Data and Variables for Empirical Analysis

The empirical analysis utilizes a temporal dataset covering the annual water allocation from the identified sources and the corresponding total demand over a representative period. The dependent variable ($y$) represents the total sectoral water demand (million m3/year), while the independent variables ($x_i$) correspond to the annual volumes allocated from each specific source $i$ (million m3/year).
The dataset captures the historical relationships between individual source contributions and the global system demand, accounting for external shocks and structural shifts in the water supply portfolio.
For the empirical application covering 2010–2022, the context vector is specified with annual tourism water demand as the primary context, alongside variables representing temporal evolution, sectoral context through the number of registered hotels, and disruption indicators capturing major exogenous events. The regression models condition on total demand as the principal contextual variable, enabling the framework to navigate demand fluctuations dynamically rather than relying on fixed historical proportions.

4.3. Data Quality Assurance

To guarantee analytical reliability, we implemented a rigorous validation protocol. First, we verified mass balance coherence. The sum of individual source allocations ($\sum_{i=1}^{7} x_i$) tracks total demand ($y$) with high precision, showing a mean absolute deviation of only 1.23 Mm3 (1.4%), attributable to minor rounding discrepancies. Crucially, the dataset registers historical volatility with high fidelity. Real-world shocks are distinct in the data. Water usage dipped 3.2% during the 2011 Arab Spring and plummeted 29.8% in the 2020 pandemic year. These values match the sharp decline in tourism reported by the ONTT and UNWTO. We also validated these figures against Ministry of Agriculture and ONAS records. Any small differences (under 5%) were resolved by speaking directly with the institutions. Statistical analysis (Grubbs’ test, α = 0.05) confirmed the 2020 anomaly as a legitimate extreme event rather than an error, justifying its retention. Finally, the time series contains zero missing values, a consistency ensured by the mandatory monitoring frameworks imposed by Law 75-16 on water resource protection.
These regression models serve as reference patterns in the optimization framework, guiding allocations toward historically efficient configurations while allowing deviations when physical constraints or changing conditions require adjustments.

4.4. Statistical Summary and Key Observations

Analysis of Table 1 reveals several important trends in Tunisia’s tourism water supply system. Tourism sector water demand increased from 82.4 Mm3 (2010) to 108.5 Mm3 (2022), representing a 31.7% growth over the period with a compound annual growth rate (CAGR) of 2.3%. Conventional sources remain dominant, with surface water ($x_3$, 37.8% average share) and groundwater ($x_5$, 31.9%) together supplying approximately 70% of tourism water needs, reflecting Tunisia’s historical reliance on these resources despite increasing stress. Meanwhile, the share of desalinated water ($x_4$) more than tripled, from 4.4% (3.6 Mm3) in 2010 to 14.6% (15.8 Mm3) in 2022, while its volume more than quadrupled, demonstrating investment in climate-resilient non-conventional sources for coastal tourism zones.
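The growth figures quoted above follow from the two endpoint values. The short check below recomputes them, with the helper name being an illustrative choice.

```python
def growth_and_cagr(start, end, years):
    """Total growth and compound annual growth rate between two endpoints."""
    ratio = end / start
    return ratio - 1.0, ratio ** (1.0 / years) - 1.0

# Endpoint demand values reported in the text (Mm3), 2010 and 2022
total_growth, cagr = growth_and_cagr(82.4, 108.5, 2022 - 2010)
print(f"total growth: {total_growth:.1%}, CAGR: {cagr:.1%}")

# Desalination shares implied by the reported volumes and demand
share_2010 = 3.6 / 82.4
share_2022 = 15.8 / 108.5
print(f"desalination share: {share_2010:.1%} -> {share_2022:.1%}")
```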
Inter-regional transfers from Nord water ($x_6$, 12.9% average) and Sbeitla-Jelma water ($x_7$, 3.3%) provide supplementary supply from water-surplus regions, highlighting the importance of large-scale transfer infrastructure for demand management. Recycled water ($x_1$, 2.3%) and harvested rainwater ($x_2$, 1.2%) remain marginal contributors despite their sustainability potential, indicating barriers to wider implementation including regulatory constraints, public perception, and infrastructure requirements. The 2020 COVID-19 shock resulted in a 29.8% demand reduction with proportional decreases across all sources, demonstrating the direct coupling between tourism activity and water demand. The rapid recovery in 2021–2022 (to 108.5 Mm3, exceeding pre-pandemic levels) suggests robust tourism sector resilience.
These observations underscore the complexity of managing a multi-source water supply system under conditions of growing demand, climate variability, and external shocks, motivating the need for an adaptive optimization framework.

4.5. Model Specification for Tunisia Case Study

We adopt the statistical learning paradigm [34], framing regression as a data-driven discovery of patterns while tailoring our methodology to the specific nuances of water resource management. Given the constraints of a limited sample size ($N = 13$), we deliberately prioritize parametric models with minimal variables to maintain a favorable bias-variance trade-off and preclude the overfitting typical of complex machine learning (ML) architectures. This preference for transparency is further dictated by the needs of key stakeholders, such as SONEDE and tourism authorities; for these entities, explainable, closed-form regression functions are essential for regulatory approval and institutional trust, whereas “black-box” models often encounter resistance. From an operational standpoint, the analytical simplicity of our functions ensures the computational efficiency required for real-time re-optimization in planning scenarios. Crucially, this approach allows us to embed physical plausibility directly into the mathematical framework, using a logarithmic form to represent diminishing returns for $x_7$ and an exponential form to represent the growth of desalination ($x_4$), thereby ensuring that our extrapolations remain grounded in geographic and technical reality.
For the Tunisia tourism water allocation problem, the optimization model is instantiated with the seven water sources described above, using the best-fit regression functions. The complete formulation for period t = 1 is described below.
Objective Function:
$$\min Z = \sum_{i=1}^{7} w_i\left(\delta_{i1}^{+} + \delta_{i1}^{-}\right) + w_{tot}\left(\delta_{tot}^{+} + \delta_{tot}^{-}\right)$$
Symbol consistency with generic formulation: the subscript $t = 1$ replaces generic $t$ for this single-period demonstration, and $n = 7$ replaces generic $n$ for the Tunisia case portfolio. The optimized decision variables include allocations $x_{i1}$, source-specific deviations $\delta_{i1}^{\pm}$, and total equilibrium deviations $\delta_{tot}^{\pm}$ [all in Mm3]. Input parameters include total demand $y_1$, the source-specific prediction functions $\hat{f}_i(x_{i1})$, and the priority weights $w_i$, $w_{tot}$.
The empirical formulation is a direct instantiation of the general methodology: the generic source count and time index are fixed to the values of the demonstration period, and all variables and weights keep the same units and logical definitions in both formulations.
Global Equilibrium Constraint:
$$x_{11} + x_{21} + x_{31} + x_{41} + x_{51} + x_{61} + x_{71} - \delta_{tot}^{+} + \delta_{tot}^{-} = y_1$$
Individual prediction constraints are defined for each source $i$ with period $t = 1$.
The optimization model incorporates specific prediction constraints for each source to guide allocation. For recycled water ($x_1$), a polynomial function (Equation (3)) captures the relationship:
$$2.55\,x_{11}^{2} + 44.52\,x_{11} + 64.53 - \delta_{11}^{+} + \delta_{11}^{-} = x_{11}$$
Similarly, the constraint for harvested rainwater ($x_2$) follows a power function:
$$11.36\,x_{21}^{0.95} - \delta_{21}^{+} + \delta_{21}^{-} = x_{21}$$
Surface water ($x_3$) allocation is governed by a polynomial prediction constraint:
$$0.05\,x_{31}^{2} + 2.78\,x_{31} + 62.02 - \delta_{31}^{+} + \delta_{31}^{-} = x_{31}$$
Desalinated water ($x_4$) is modeled via an exponential function:
$$46.34\,e^{0.02\,x_{41}} - \delta_{41}^{+} + \delta_{41}^{-} = x_{41}$$
Groundwater ($x_5$) relies on a linear constraint:
$$5.24\,x_{51} + 52.57 - \delta_{51}^{+} + \delta_{51}^{-} = x_{51}$$
Nord water ($x_6$) employs a polynomial form:
$$0.05\,x_{61}^{2} + 0.86\,x_{61} + 72.96 - \delta_{61}^{+} + \delta_{61}^{-} = x_{61}$$
Finally, Sbeitla-Jelma water ($x_7$) is constrained by a logarithmic relationship:
$$184.05\,\ln(x_{71}) - 214.8 - \delta_{71}^{+} + \delta_{71}^{-} = x_{71}$$
To ensure physical realism, the model enforces non-negativity constraints on all deviation variables:
$$\delta_{i1}^{+} \ge 0, \quad \delta_{i1}^{-} \ge 0, \quad \forall i = 1, \dots, 7$$
$$\delta_{tot}^{+},\ \delta_{tot}^{-} \ge 0$$
These equations integrate the best-fit regression models from Table 2 as prediction constraints. The left-hand side $\hat{f}_i(x_{i1})$ represents the ML-predicted optimal contribution from source $i$, while the right-hand side $x_{i1}$ is the actual decision variable. The deviation variables $\delta_{i1}^{\pm}$ capture discrepancies between ML predictions and optimal allocations, while the global constraint enforces mass balance, i.e., $\sum_{i=1}^{7} x_{i1} = y_1$ whenever the total deviations $\delta_{tot}^{\pm}$ are zero. The model parameters ($w_i$, $w_{tot}$) are determined through sensitivity analysis.
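To make the structure of the goal program concrete, the sketch below solves a simplified version in which each prediction $\hat{f}_i$ is replaced by a fixed numeric target, so the model becomes a linear program. This simplification, the use of `scipy.optimize.linprog` in place of a nonlinear solver, and all numeric values are assumptions made for illustration. The sketch is not the study's implementation.

```python
import numpy as np
from scipy.optimize import linprog

def solve_goal_program(targets, demand, w, w_tot):
    """Weighted goal program with fixed targets (a linearized illustration).

    Variables: [x_1..x_n, d+_1..d+_n, d-_1..d-_n, dtot+, dtot-], all >= 0.
    """
    targets = np.asarray(targets, float)
    w = np.asarray(w, float)
    n = len(targets)
    n_var = 3 * n + 2

    # Objective: sum_i w_i (d+_i + d-_i) + w_tot (dtot+ + dtot-)
    cost = np.concatenate([np.zeros(n), w, w, [w_tot, w_tot]])

    A_eq = np.zeros((n + 1, n_var))
    b_eq = np.zeros(n + 1)
    for i in range(n):
        # target_i - d+_i + d-_i = x_i  <=>  x_i + d+_i - d-_i = target_i
        A_eq[i, i] = 1.0
        A_eq[i, n + i] = 1.0
        A_eq[i, 2 * n + i] = -1.0
        b_eq[i] = targets[i]
    # Global equilibrium: sum_i x_i - dtot+ + dtot- = demand
    A_eq[n, :n] = 1.0
    A_eq[n, 3 * n] = -1.0
    A_eq[n, 3 * n + 1] = 1.0
    b_eq[n] = demand

    res = linprog(cost, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * n_var, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n], res.fun

# Hypothetical targets (Mm3) for seven sources; they sum to 90
targets = [2.0, 1.0, 34.0, 9.0, 29.0, 12.0, 3.0]
weights = [1.0 / 7.0] * 7

x_opt, z_opt = solve_goal_program(targets, demand=95.0, w=weights, w_tot=1.0)
print(x_opt.round(3), round(z_opt, 4))
```

With equal source weights smaller than the equilibrium weight, the solver closes the 5 Mm3 gap by moving source allocations away from their targets instead of leaving demand unmet, which is the trade-off the weights are meant to control.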

5. Results

5.1. Regression Model Performance and Selection

Multiple regression models were tested to identify the most reliable trend lines for predicting water demand from different water sources. A trend line is most reliable when its $R^2$ value is equal to or close to 1.00 (Figure 3 and Table 2).
Regression analysis reveals distinct, non-linear relationships between the water demand of the tourism sector and the availability of different water sources. Recycled water, surface water, and northern water imports are best represented by a second-order polynomial trendline with $R^2$ values of 0.40, 0.84, and 0.77, respectively.
For harvested rainwater, the relationship is best captured by a power trendline ($R^2 = 0.87$). For desalinated water, an exponential trendline provides the best statistical representation ($R^2 = 0.78$). The analysis of imported Sbeitla-Jelma water reveals a logarithmic trend ($R^2 = 0.70$).
The temporal dataset (N = 13 annual observations, 2010–2022) presents inherent limitations for regression analysis. To address concerns regarding sample size adequacy, we employ multiple validation strategies appropriate for small-sample contexts, namely sample size justification and explicit management of the bias-variance trade-off.
As established in Section 3.4, our N = 13 dataset falls short of the N ≥ 20 threshold for univariate regression. However, the structural variability from major exogenous shocks (Arab Spring 2011, COVID-19 2020–2021) compensates for limited sample count, while model parsimony and cross-validation metrics (CV-R2 = 0.703, overfitting gap = 0.051) confirm acceptable generalization.
Given the small sample, we prioritize simpler functional forms (linear, logarithmic) over complex models to minimize overfitting risk. The selection of polynomial models for only three sources (x1, x3, x6) reflects careful balance between fit quality and generalization capacity, supported by the cross-validation results presented below.
Following the LOOCV protocol established in Section 3.4, Table 3 reports empirical validation metrics for all seven source models, extending beyond the in-sample R2 values in Table 2.
Model performance is assessed using complementary metrics where the Root Mean Square Error (RMSE) quantifies absolute deviation magnitude and the Mean Absolute Error (MAE) offers robustness to outliers. The Mean Absolute Percentage Error (MAPE) provides scale-independent comparison, while the Overfitting Index (defined as the Training R2 minus CV-R2) monitors generalization capacity, identifying potential model complexity issues when values exceed 0.10.
The assessment of model adequacy reveals a stability concern in the polynomial fit for x1 (recycled water). Its Overfitting Index reaches 0.088, close to the 0.10 caution threshold. This is expected, as extracting stable signals from low-volume streams (mean = 2.1 Mm3/year) is difficult in small datasets. An alternative linear specification yields a slightly lower R2 = 0.389 (vs. 0.400 for the polynomial) but a higher CV-R2 = 0.361 (vs. 0.312), reducing the Overfitting Index to 0.028. We retain the polynomial for its marginal in-sample improvement (a gain of 0.011 in R2), acknowledging that the optimization framework’s deviation variables ($\delta_{1t}^{\pm}$) accommodate prediction errors. Future model updates with N > 20 should prioritize linear simplification for x1.
On average, R2 declines by 0.051 from training (0.754) to cross-validation (CV-R2 = 0.703), indicating acceptable generalization suitable for operational water management. This means predictions may deviate by approximately 5–10% under new demand scenarios. Groundwater (x5) demonstrates the strongest predictive stability (CV-R2 = 0.908, Overfitting Index = 0.015), justifying its role as the backbone of the allocation strategy. In contrast, recycled water (x1) and desalinated water (x4) require more cautious interpretation due to higher generalization gaps. Consequently, given the borderline overfitting for recycled water, we recommend considering replacing its polynomial model with linear regression in future iterations to improve generalization.
To quantify the uncertainty of these estimates, we use the LOOCV residuals (Figure 4) to compute 95% prediction intervals according to:
$$\mathrm{PI}_{95\%}(x_i) = \hat{f}_i(y) \pm 1.96 \times \mathrm{RMSE}_{CV,i}$$
For the mean demand scenario (y = 90.5 Mm3), this yields intervals of ±0.17 Mm3 (x2) to ±3.17 Mm3 (x3), representing 6–9% relative uncertainty bands. These intervals are incorporated into the Monte Carlo uncertainty analysis (Section 5.5).
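The interval formula is simple enough to state as a helper. The sketch below assumes approximately normal LOOCV residuals, which is what the 1.96 multiplier implies, and uses hypothetical inputs in place of the values in Table 3.

```python
def prediction_interval(point_prediction, rmse_cv, z=1.96):
    """Symmetric interval: prediction +/- z * RMSE_CV (z = 1.96 for ~95%)."""
    half_width = z * rmse_cv
    return point_prediction - half_width, point_prediction + half_width

def relative_half_width(point_prediction, rmse_cv, z=1.96):
    """Half-width expressed as a fraction of the point prediction."""
    return z * rmse_cv / point_prediction

# Hypothetical source: predicted allocation 10.0 Mm3, RMSE_CV 0.4 Mm3
low, high = prediction_interval(10.0, 0.4)
print(low, high, relative_half_width(10.0, 0.4))
```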
The visual diagnostic assessment, presented in Figure 5, complements these quantitative metrics by showing scatter plots of observed versus predicted allocations for each source. These visualizations reveal systematic patterns, particularly for harvested water (x2) and groundwater (x5), which exhibit tight clustering around the diagonal and confirm high predictive accuracy ($R^2 > 0.85$). In contrast, the 2020–2021 COVID-19 period (red diamonds) shows systematic deviations for recycled, harvested, and northern import sources, reflecting the unprecedented demand reduction during the pandemic. Sources with higher uncertainty, such as surface water and recycled water, display a wider scatter consistent with their lower fit values and larger cross-validation errors.
The multiple regression analysis shows remarkably high model accuracy, with an $R^2$ value of 0.998 for the Tunisia tourism case (2010–2022), showing exceptional predictive capability within this specific temporal and sectoral scope. The estimated coefficients reveal that harvested water ($\beta = 1.658$), imported northern water ($\beta = 1.461$), and imported Sbeitla-Jelma water ($\beta = 1.395$) have the strongest positive influence on total water demand.

5.2. Optimization Results

The model uses Goal Programming (GP), a multi-objective optimization technique that minimizes weighted deviations from pre-specified targets; here, the targets are the regression-predicted source allocations and the total water demand.
We implemented the model in Python v3.13 using the minimize function from the scipy.optimize library v1.17. The experiments were executed on a personal computer with Intel Core i7 2.6 GHz and 8 GB RAM.
To assess model robustness, we generated 20 distinct weight scenarios using Latin Hypercube Sampling (LHS) to ensure representative coverage of the weight space (Table 4). Each scenario assigns different relative priorities $w_i \in [0, 1]$ to the seven sources, with normalization constraint $\sum_i w_i = 1$.
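One way to generate such scenarios is sketched below using `scipy.stats.qmc.LatinHypercube`. Normalizing each sampled row so that it sums to one is an assumption about how the constraint is imposed. The text does not specify the exact procedure, and rescaling does not preserve the Latin hypercube stratification exactly.

```python
import numpy as np
from scipy.stats import qmc

def sample_weight_scenarios(n_scenarios=20, n_sources=7):
    """Draw LHS points in the unit hypercube and rescale rows to sum to 1."""
    sampler = qmc.LatinHypercube(d=n_sources)
    raw = sampler.random(n=n_scenarios)          # shape (n_scenarios, n_sources)
    return raw / raw.sum(axis=1, keepdims=True)  # each row now sums to 1

weight_scenarios = sample_weight_scenarios()
print(weight_scenarios.shape)
print(weight_scenarios[0].round(3))
```

Each row can then be passed as the priority vector of one optimization run, which yields one objective value per scenario.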
Figure 6 illustrates the distribution of objective function values across the 20 weight scenarios, revealing the model’s response to varying decision-maker priorities. The x-axis represents the instance number (weight scenario identifier), while the y-axis displays the corresponding objective function value (total weighted deviation). The objective function values range from a minimum of 0.007 (Instance 10) to a maximum of 0.397 (Instance 3), demonstrating substantial variation depending on weight allocation among sources. Lower objective values indicate better overall conformity between ML predictions, optimal allocations, and demand equilibrium. Instance 10 achieves the best performance by assigning highest priority weights to sources with most stable prediction patterns, while Instance 3 exhibits the highest deviation due to prioritization of sources with inherently more variable allocation relationships. This variability across scenarios underscores the importance of calibrating priority weights based on policy objectives and source reliability characteristics.
The empirical analysis operationalizes the theoretical frameworks introduced in Section 2. The conditional distribution estimation methods (residual-based, weight-based, and conditional mean approaches from Section 2) are implemented through the regression models in Table 2, where each functional form (polynomial, exponential, logarithmic, power, linear) represents an estimated conditional relationship $\hat{p}(y \mid x_i)$. The regularization principles discussed in Section 2 are implicitly embedded in model selection, where $R^2$ values guide functional form choice to balance fit quality with generalizability, avoiding overfitting while capturing meaningful patterns.
The ILO framework provides the conceptual foundation for our two-phase methodology. Machine learning establishes predictive functions $\hat{f}_i(x_{it})$ (Phase 1), which are then incorporated as soft constraints within Goal Programming optimization (Phase 2, Equations (12)–(22)). This integration exemplifies ILO’s emphasis on optimizing policy performance $L(\theta) = \mathbb{E}[c(z^{*}(x, g_\theta), y)]$ rather than merely maximizing predictive accuracy.
The SDM approach underpins our treatment of uncertainty through deviation variables ($\delta^{\pm}$) that accommodate discrepancies between historical patterns and optimal allocations under current constraints, allowing adaptation while maintaining mass balance. This extends classical stochastic optimization by conditioning decisions on context-specific covariates (demand $y_t$, source characteristics) rather than unconditional distributions, addressing the overgeneralization problem previously noted (Section 2).

5.3. Sensitivity Analysis and Robustness Evaluation

The core of our methodology is an optimization model designed for efficient and equitable water allocation. To examine how the model responds to changes in decision-maker priorities, we generated 20 scenarios with distinct priority weight vectors $w = [w_1, w_2, \dots, w_7]$, where each weight $w_i$ ($i = 1, \dots, 7$) represents the relative importance assigned to water source $i$, with normalization constraint $\sum_i w_i = 1$. Figure 7 illustrates the resulting objective function values, where lower values indicate better conformity between historically efficient patterns and feasible allocation strategies.
Figure 7 provides a visualization of the objective function response across all 20 weight scenarios, confirming the model’s robustness to varying priority structures. The x-axis represents the weight scenario identifier (Instance 1 through 20), where each instance corresponds to a distinct priority allocation among the seven water sources ($w_1, w_2, \dots, w_7$). The y-axis displays the minimized objective function value, calculated as the sum of weighted deviations between machine learning predictions and optimized allocations. Lower values on the y-axis indicate better alignment between historically efficient patterns and current optimal allocations under physical constraints. The weight vectors are detailed in Table 4, where $w_i$ represents the priority weight assigned to source $i$, with the normalization constraint $\sum_i w_i = 1$ making the weights directly comparable across scenarios.
Across the 20 weight scenarios in Figure 7, the objective value ranges from 0.007 (Instance 10) to 0.397 (Instance 3), reflecting the inherent trade-offs between prioritizing different sources. Instance 10, which assigns the highest weights to desalinated water ($w_4 = 0.327$) and Sbeitla-Jelma water ($w_7 = 0.306$), achieves the best objective value by leveraging sources with high predictive reliability. Conversely, Instance 3, which heavily weights surface water ($w_3 = 0.201$) and Nord water ($w_6 = 0.191$), yields a higher objective value due to the larger deviations required to satisfy physical constraints for these sources.
The sensitivity analysis confirms the robustness and consistency of the proposed model. The objective function values ranged from 0.007 to 0.397 across all scenarios. The most frequent positive deviation was $\delta_1^{+} = 4.173$, occurring in Instances 5, 9, 10, and 17, while the most frequent negative deviation was $\delta_4^{-} = 2.066$, recorded in seven instances.
Resource allocations remained stable across all scenarios, with variations in $x_i$ below 5%. The most sensitive variables were identified as $x_1$, $x_3$, $x_4$, and $x_6$.

5.4. Benchmark Comparison with Conventional Optimization Approaches

To demonstrate the added value of integrating regression-based predictions into goal programming, we compare our methodology against three alternative allocation approaches using identical demand scenarios and physical constraints (Table 5).
Strategy 1: Fixed Proportions (Historical Average). Conventional water resource planning often allocates based on fixed historical proportions:
$$x_i^{\text{fixed}} = y_t \times \frac{\bar{x}_i}{\sum_{j=1}^{7} \bar{x}_j}$$
Strategy 2: Regression-Informed GP (This Study). Our approach derives targets from learned source-demand relationships (Equations (14)–(20)), allowing allocations to adapt to contextual demand levels.
Strategy 3: Equal Priority Weights (Naïve Optimization). Baseline optimization where all sources receive equal importance ($w_i = 1/7\ \forall i$).
Strategy 4: Robust Optimization (Worst-Case). Min-max formulation protecting against prediction uncertainty:
$$\min_{x}\ \max_{\epsilon \in \mathcal{U}}\ \sum_{i=1}^{7} w_i \left| \hat{f}_i(x_i) + \epsilon_i - x_i \right|$$
where $\mathcal{U} = \{\epsilon : \lVert \epsilon \rVert \le 2 \times \mathrm{RMSE}_{CV}\}$ represents the 95% confidence uncertainty box.
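Two of the benchmark strategies can be illustrated compactly. The sketch below computes the fixed-proportion allocation of Strategy 1 and the inner worst-case term of Strategy 4, assuming the uncertainty set is a component-wise box $|\epsilon_i| \le r_i$ and the weights are non-negative. Under that assumption the inner maximum has the closed form $\sum_i w_i(|\hat{f}_i - x_i| + r_i)$, which the code checks by enumerating the box corners. All numbers and function names are hypothetical.

```python
import itertools
import numpy as np

def fixed_proportion_allocation(demand, mean_allocations):
    """Strategy 1: split demand by historical average shares."""
    m = np.asarray(mean_allocations, float)
    return demand * m / m.sum()

def worst_case_deviation(pred, x, w, radius):
    """Closed-form inner maximum over the box |eps_i| <= radius_i (w >= 0)."""
    pred, x, w, radius = (np.asarray(a, float) for a in (pred, x, w, radius))
    return float(np.sum(w * (np.abs(pred - x) + radius)))

def worst_case_by_enumeration(pred, x, w, radius):
    """Brute-force check: a convex function attains its max at a box corner."""
    pred, x, w, radius = (np.asarray(a, float) for a in (pred, x, w, radius))
    best = 0.0
    for signs in itertools.product([-1.0, 1.0], repeat=len(pred)):
        eps = np.array(signs) * radius
        best = max(best, float(np.sum(w * np.abs(pred + eps - x))))
    return best

# Hypothetical historical means (Mm3) and a demand of 100 Mm3
allocation = fixed_proportion_allocation(100.0, [2, 1, 34, 9, 29, 12, 3])
print(allocation.round(2))

# Hypothetical three-source example for the worst-case term
pred, x, w, radius = [10.0, 5.0, 2.0], [9.0, 6.0, 2.0], [0.5, 0.3, 0.2], [1.0, 0.5, 0.2]
print(worst_case_deviation(pred, x, w, radius),
      worst_case_by_enumeration(pred, x, w, radius))
```

Since the radius term does not depend on the allocation, a box set of this kind shifts the objective by a constant, and any difference between the robust and nominal allocations would then have to come from the other constraints.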
Our analysis yields several critical insights regarding the trade-off between efficiency and robustness. While the regression-informed strategy achieves superior performance under normal conditions, the robust optimization approach provides better protection against extreme shortages. Statistically, the proposed method demonstrates a significant improvement over fixed-proportion allocations (Table 5). Operational feasibility is supported by low computational costs that allow real-time reallocation. Although extreme stress scenarios technically exceed capacity, they are managed by distributing cuts to low-priority sources.

5.5. Monte Carlo Uncertainty Propagation Analysis

While the 20 weight scenarios (Section 5.3) assess sensitivity to decision-maker priorities, they do not quantify robustness to input data uncertainty, specifically, errors in demand forecasts ($y_t$) and regression predictions ($\hat{f}_i$). We address this through Monte Carlo simulation that perturbs both sources simultaneously.
Two primary sources of uncertainty are considered. First, the Demand Forecast Error accounts for variations in tourism water demand due to visitor arrivals, climate anomalies, and policy changes. We model $y_t$ as normally distributed, $y_t \sim \mathcal{N}(\mu_y, \sigma_y^2)$, where $\mu_y = 90.5$ Mm3 (mean 2010–2022) and $\sigma_y = 9.8$ Mm3 (historical standard deviation). Second, the Regression Prediction Error is addressed, as each source model exhibits residual error (Table 3). We model prediction deviations as $\hat{f}_{i,\text{pert}} = \hat{f}_i(x_i) + \epsilon_i$, where $\epsilon_i \sim \mathcal{N}(0, \mathrm{RMSE}_i^2)$.
To quantify the impact of input data uncertainty, we performed a Monte Carlo simulation. This approach allows us to evaluate how variations in demand forecasts ($y_t$) and prediction errors in individual source models ($\hat{f}_i$) propagate through the optimization framework. The simulation follows a rigorous protocol (Table 6). First, demand is modeled based on historical mean and standard deviation; second, prediction errors are sampled from source-specific residual distributions; and third, the model is iteratively solved to generate a distribution of optimal allocation outcomes and feasibility states.
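The three-step protocol can be sketched as a vectorized simulation. To keep the example short, the optimization step is replaced by a simple rule (share-based targets clipped to source capacities), and a draw counts as feasible when the clipped allocations meet demand within a tolerance. The shares, capacities, RMSE values, tolerance, and feasibility rule are all hypothetical stand-ins. Only the demand mean and standard deviation come from the text.

```python
import numpy as np

def monte_carlo_feasibility(shares, capacities, rmse, mu_y=90.5, sigma_y=9.8,
                            n_draws=10_000, tolerance=0.05, seed=0):
    """Share of draws in which clipped allocations meet demand within tolerance."""
    rng = np.random.default_rng(seed)
    shares = np.asarray(shares, float)
    capacities = np.asarray(capacities, float)
    rmse = np.asarray(rmse, float)

    # Step 1: demand draws, y ~ N(mu_y, sigma_y^2)
    demand = rng.normal(mu_y, sigma_y, size=n_draws)
    # Step 2: source-specific prediction errors, eps_i ~ N(0, RMSE_i^2)
    eps = rng.normal(0.0, 1.0, size=(n_draws, shares.size)) * rmse
    # Step 3 (stand-in for the solver): perturbed targets clipped to capacity
    targets = demand[:, None] * shares + eps
    allocation = np.clip(targets, 0.0, capacities)

    gap = np.abs(allocation.sum(axis=1) - demand)
    return float(np.mean(gap <= tolerance * np.abs(demand)))

# Hypothetical inputs for seven sources
shares = [0.02, 0.01, 0.38, 0.10, 0.32, 0.13, 0.04]
capacities = [5.0, 3.0, 45.0, 20.0, 40.0, 15.0, 6.0]
rmse = [0.3, 0.1, 1.6, 0.8, 1.0, 0.7, 0.3]

feasibility_rate = monte_carlo_feasibility(shares, capacities, rmse)
print(f"feasibility rate: {feasibility_rate:.3f}")
```

Replacing the clipping rule with a call to the actual goal program would follow the same loop structure, at the cost of one solve per draw.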
The analysis of the simulation results (Table 7) reveals the structural robustness of the water allocation framework.
The simulation metrics provide critical insights into the system’s operational reliability. A feasibility rate of 96.3% was observed, which suggests that the system remains stable under most anticipated uncertainty scenarios. However, the identified 3.7% risk of infeasibility highlights the potential for stress events during extreme demand peaks or supply disruptions. These findings indicate that while the regression-informed optimization handles variability effectively, maintaining a strategic capacity buffer remains essential for critical infrastructure management.
In terms of sensitivity to uncertainty assumptions (Table 8), we test robustness to our uncertainty parameterization by conducting a secondary analysis varying σ y by ±50%:
A key finding of this analysis is that feasibility degrades linearly with uncertainty (correlation r = −0.998, p < 0.001), dropping below 95% in high-uncertainty scenarios. This underscores the need for improved demand forecasting to maintain operational reliability.
As seen in Figure 8, since Panel (b) displays a bimodal pattern, the historical mean (7.26 Mm3) becomes an unreliable metric for sizing. To prevent shortages during peak tourism, planners must bypass the average and specifically target the high-demand mode (8–10 Mm3). In practice, this necessitates a 15–20% safety margin above the baseline.
Based on these findings, we propose a set of operational recommendations for Tunisia’s tourism water managers. First, maintaining a 5% contingency reserve above nominal allocations is advisable to absorb the majority of uncertainty-driven variations. Second, real-time monitoring should prioritize surface water and desalination sources due to their higher variability, enabling early detection of deviations. Third, implementing quarterly model recalibration to incorporate the latest demand data can significantly reduce prediction errors and improve feasibility rates.

6. Discussion

6.1. Methodological Performance and Resource Allocation Dynamics

The results presented in the previous section highlight the comparative advantages of integrating regression-based predictions within a multi-objective optimization framework. The regression models demonstrated varying degrees of success across different water sources, with conventional resources like groundwater showing higher predictive stability compared to emerging non-conventional sources. The remarkably high $R^2$ (0.998) of the multiple regression model underscores its ability to capture the complex interdependencies within multi-source portfolios, which the goal programming stage then uses to balance competing allocation targets.
Our approach complements existing water management methodologies, such as Multi-Criteria Decision-Making (MCDM) and GIS-based analysis, by providing a data-driven path toward adaptive allocation. While methodologies like ELECTRE or TOPSIS are invaluable for stakeholder-driven prioritization, the contextual optimization framework offers a technical solution for real-time adjustments based on evolving demand patterns and resource availability.

6.2. Uncertainty Management and Comparative Methodology

The treatment of uncertainty through deviation variables and Monte Carlo simulation connects to a growing body of research on uncertainty-based water allocation optimization. We situate our findings within this methodological landscape through explicit comparisons.
In comparison with Bayesian network approaches, Wang et al. [35] achieved 95.7% systemic resilience in agricultural contexts using Bayesian networks coupled with bi-level multi-objective programming. Our Monte Carlo analysis (Section 5.5) achieved a comparable 96.3% feasibility under simultaneous demand and prediction uncertainty. A key methodological distinction is that their bi-level approach requires explicit stakeholder preference elicitation for the upper-level objectives, whereas our regression-informed targets derive directly from historical efficiency patterns embedded in SONEDE operational data, reducing implementation complexity for centralized utilities.
In comparison with deep learning forecasting, Liu et al. [36] pioneered Non-stationary Transformers for urban water demand forecasting, demonstrating that accounting for multiple uncertainty sources simultaneously improves allocation schemes by 12–18%. Our benchmark comparison (Table 5) shows a 73% improvement over fixed-proportion allocation and a 45% improvement over GP-without-regression, comparable in magnitude. Their approach requires substantially more data (N > 100 observations) than our regression-based method (N = 13), making our framework more suitable for data-limited Mediterranean tourism contexts.
In comparison with market-based mechanisms, Vahedizade et al. [37] demonstrated 45% increases in total user benefits through real-time ANFIS-based streamflow forecasts and option contracts applied to the Gorganrood River basin. Our framework similarly leverages forecasted allocations to guide optimization, though we target utility-controlled systems rather than water markets. A market mechanism is optimal for competitive multi-user basins, whereas the framework proposed here is better suited to centralized management structures where inter-source trade-offs (not inter-user competition) drive allocation decisions.
In comparison with risk-sensitive programming, Yue et al. [38] developed Copula-based interval programming incorporating decision-maker risk tolerance, while Li et al. [39] combined Two-Stage Stochastic Programming with Conditional Value-at-Risk (CVaR) for agricultural water management. Our deviation variables (Equations (5) and (6)) provide analogous risk control by penalizing shortfalls more heavily than surpluses. The CVaR approach of Li et al. [39] explicitly quantifies tail risks; our sensitivity analysis (Table 8) demonstrates how feasibility degrades linearly with uncertainty magnitude ($r = 0.998$), providing similar operational guidance for risk-aware planning.
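As a rough illustration of the asymmetric penalty idea, the sketch below splits the gap between an allocation and its regression target into a surplus and a shortfall, then weights the shortfall more heavily. The function name `weighted_deviation` and the weights 2.0 and 1.0 are assumptions made for illustration; they are not the values or the exact form of Equations (5) and (6).

```python
def weighted_deviation(allocation, target, w_short=2.0, w_surplus=1.0):
    """Return (delta_plus, delta_minus, penalty) for one source.

    delta_plus  = surplus above the target (>= 0)
    delta_minus = shortfall below the target (>= 0)
    The weights are hypothetical; w_short > w_surplus makes shortfalls costlier.
    """
    delta_plus = max(allocation - target, 0.0)
    delta_minus = max(target - allocation, 0.0)
    penalty = w_surplus * delta_plus + w_short * delta_minus
    return delta_plus, delta_minus, penalty


# The same 1.5 Mm3 gap on either side of a 30.0 Mm3 target (made-up numbers)
surplus_case = weighted_deviation(31.5, 30.0)
shortfall_case = weighted_deviation(28.5, 30.0)
```

With these illustrative weights, the shortfall case is penalized twice as much as the surplus case for an equal gap.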
Table 9 frames our regression-informed Goal Programming against established uncertainty frameworks. We focus on a specific friction point, the trade-off between data-heavy complexity and the need for agile, low-latency solutions.
Collectively, these comparisons situate our regression-informed Goal Programming as a middle-ground methodology: more data-efficient than deep learning approaches [36], less stakeholder-intensive than Bayesian network methods [35], and better suited to centralized utilities than market mechanisms [37]. The 96.3% Monte Carlo feasibility and 73% benchmark improvement demonstrate comparable performance with reduced implementation requirements, a practical advantage for water-scarce regions with limited technical capacity.
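For readers who want to sanity-check the reported feasibility interval, the sketch below applies a normal-approximation (Wald) interval to a proportion of 0.963 with the N = 1000 Monte Carlo runs quoted in the Figure 8 caption. The text does not state how the interval was obtained, so the choice of the Wald formula is an assumption, as is the helper name `wald_interval`.

```python
import math


def wald_interval(p_hat, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    se = math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se


# 96.3% feasibility over 1000 Monte Carlo draws
low, high = wald_interval(0.963, 1000)
print(f"{100 * low:.1f}% to {100 * high:.1f}%")  # prints "95.1% to 97.5%"
```

Under this assumption the result agrees with the 95.1–97.5% interval given in the Figure 8 caption.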

6.3. Contextual Interpretation and Implementation Boundaries

Our methodology demonstrates context-specific advantages for Tunisia’s coastal tourism water system (2010–2022), but critical assumptions apply.
First, regarding the temporal validity window, the regression models capture historical allocation patterns during a period characterized by stable centralized governance, significant tourism growth, and rapid desalination expansion. Consequently, any substantial future deviation, such as a geopolitical crisis or plateauing desalination capacity, creates an extrapolation risk that necessitates model recalibration every 3–5 years.
Second, regarding geographic transferability boundaries, the framework is directly applicable to Mediterranean coastal tourism regions that share key characteristics: multi-source portfolios, high seasonal demand variability, and centralized utility governance. Compatible contexts include coastal Spain, Southern Italy, and the Greek islands. Other settings require methodological adaptation: monsoon-driven systems (India, Southeast Asia) require seasonal regression models; systems dominated by a single source (>80% from one source) do not justify the optimization overhead; and unregulated markets require different frameworks, because ours assumes utility-controlled allocation.
Third, regarding integration with existing planning tools, for operational implementation in Tunisia, we propose a three-tier decision architecture. Strategic Planning (5-year horizon) uses regression models to set infrastructure investment targets, such as desalination capacity expansion based on the growth trajectory of $x_4$. Tactical Allocation (annual) involves running GP optimization with updated demand forecasts to determine source-level allocations for the upcoming tourism season. Finally, Operational Adjustment (monthly) requires monitoring actual vs. predicted allocations and triggering re-optimization if deviations exceed control limits, for example, when |actual − predicted| > 2 × RMSE.
This hierarchical approach ensures the framework complements rather than replaces existing SONEDE planning processes.
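The monthly control-limit rule in the third tier can be sketched as a simple check. The function name `needs_reoptimization` and all numeric inputs below are hypothetical; only the rule |actual − predicted| > 2 × RMSE comes from the text.

```python
def needs_reoptimization(actual, predicted, rmse, k=2.0):
    """True when |actual - predicted| exceeds k * RMSE (k = 2 in the text)."""
    return abs(actual - predicted) > k * rmse


# Hypothetical monthly checks for one source (all numbers are made up, Mm3)
flags = [
    needs_reoptimization(actual=3.1, predicted=2.9, rmse=0.4),  # gap 0.2, limit 0.8
    needs_reoptimization(actual=4.0, predicted=2.9, rmse=0.4),  # gap 1.1, limit 0.8
]
```

Only the second check exceeds the control limit, so only that month would trigger a new GP run under this rule.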
In terms of spatiotemporal and environmental boundary conditions, the framework’s current specification conditions on annual total demand ($y_t$) as the primary contextual variable. This choice reflects several operational and statistical factors, including SONEDE’s annual reporting cycle for tourism water allocation, the annual cycles of strategic infrastructure decisions such as desalination capacity expansion, and the need for model parsimony given the N = 13 sample size, where incorporating additional contextual dimensions like season, location, or climate would risk overfitting.
Regarding applicability boundaries, the proposed methodology is particularly suited for Mediterranean coastal tourism regions sharing characteristics such as multi-source portfolios, centralized utility governance, and high seasonal demand variability, as seen in coastal Spain, Southern Italy, and Greek islands. However, several contexts require substantial adaptation; for example, monsoon-driven systems in India or Southeast Asia require seasonal regression models to capture radical hydrological shifts, while decentralized water markets relying on inter-user competition necessitate a game-theoretic formulation. Furthermore, high-altitude regions where elevation affects pumping were not modeled in this coastal-centric framework, and areas with strong spatial heterogeneity require full GIS integration.
Tunisia’s coastal tourism zone itself is characterized by a subtropical Mediterranean latitude (33°–37° N) and a coastal plain elevation (0–200 m) with minimal pumping costs. The climate is arid to semi-arid (annual P = 200–400 mm), and the geology involves coastal aquifers alongside deep Saharan groundwater. Since these specific conditions are embedded in the historical data (2010–2022), transferability to regions with substantially different physical characteristics requires rigorous model recalibration to maintain reliability.
The integration of regression-based predictions as target values in Goal Programming represents an innovative approach combining learning from historical efficiencies with optimization under constraints. Unlike classical approaches where targets are set exogenously, our method uses patterns learned from data to inform allocation objectives while allowing optimal deviations when physical constraints require it.
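To make the idea of a regression-informed target concrete, the sketch below fits an ordinary least-squares line of groundwater allocation ($x_5$) on total demand ($y$) using the Table 1 values and evaluates it as a soft target. The linear form is an assumption for illustration only (the study selects among five functional forms per source), and the helper names `fit_line` and `target_x5` are hypothetical.

```python
# Table 1 columns: total demand y and groundwater allocation x5 (Mm3/year)
y = [82.4, 79.8, 85.3, 88.6, 91.2, 87.4, 89.8, 94.5, 98.7, 103.2, 72.5, 84.6, 108.5]
x5 = [26.8, 25.9, 27.6, 28.5, 29.4, 28.1, 28.9, 30.2, 31.5, 32.8, 23.2, 27.0, 34.5]


def fit_line(u, v):
    """Closed-form ordinary least squares for v = a + b * u."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    sxy = sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))
    sxx = sum((ui - mu) ** 2 for ui in u)
    b = sxy / sxx
    return mv - b * mu, b


a, b = fit_line(y, x5)


def target_x5(demand):
    """Regression-informed soft target for x5 at a given demand level."""
    return a + b * demand


# Gaps between observed allocations and the fitted targets
residuals = [vi - target_x5(ui) for ui, vi in zip(y, x5)]
```

In a goal-programming step, `target_x5(demand)` would play the role of the target that the deviation variables measure against; the optimizer may still depart from it when constraints bind.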
The current implementation’s use of annual aggregated data inherently limits the granular integration of spatiotemporal and environmental variables. We acknowledge these limitations and provide a roadmap for future extensions.
Annual aggregation can mask intra-annual dynamics such as seasonal tourism peaks (June, September) and Julian day effects. Similarly, the current model aggregates across Tunisia’s coastal zone, averaging over latitude gradients and local microclimates. Future research with monthly datasets ($N \geq 60$) should employ seasonal sub-models where coefficients $\alpha_i(m_t)$ and $\beta_i(m_t)$ vary by month.
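One possible way to write such a seasonal sub-model is shown below, assuming a linear form and writing $m_t$ for the month of period $t$. The linear specification and the notation $\hat{f}_i(y_t; m_t)$ are assumptions for illustration, not a form given in the study.

```latex
\hat{f}_i(y_t; m_t) = \alpha_i(m_t) + \beta_i(m_t)\, y_t, \qquad m_t \in \{1, \ldots, 12\}
```

Under this form each source carries up to twelve coefficient pairs, so the monthly series must be long enough to estimate each pair.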
Future iterations could explicitly model temperature-driven evapotranspiration rates and groundwater recharge dynamics. We recommend incorporating specific constraints derived from the Tunisian Water Code (Law 75-16), such as minimum environmental flows for wadis ($x_3 \geq EF_{\text{min}}$), sustainable aquifer yields ($x_5 \leq SY_{\text{annual}}$), and brine discharge limits for desalination plants ($x_4 \leq Cap_{\text{enviro}}$).
Tunisia’s centralized governance simplifies implementation, but inter-regional transfers are subject to physical capacity limits (e.g., $x_6 \leq 15$ Mm³/year for the Nord pipeline). While these constraints are implicit in the 2010–2022 historical data, they should be explicitly modeled in multi-period extensions to enhance prescriptive reliability.
The optimization framework operates within Tunisia’s regulatory environment as defined by the Water Code (Law 75-16 of 1975, amended 2001). This legal structure establishes several allocation priority hierarchies that should inform the priority weights $w_i$ in the goal programming formulation.
The regulatory allocation priority stack begins with drinking water supply, designated as the first priority and receiving the highest weight $w_{\text{drinking}}$. Agricultural irrigation constitutes the second priority, followed by industrial and tourism uses at the third priority tier. Environmental flows for wadi ecosystems represent the fourth priority tier, with the caveat that they are mandatory minimum thresholds during critical periods.
The regulatory parameters that can be directly integrated into the optimization constraints include maximum sustainable yield for each aquifer, which constrains groundwater extraction such that $x_5 \leq SY_{\text{annual}}$. Desalination plants must comply with brine discharge limits set by the National Agency for Environmental Protection, imposing the constraint $x_4 \leq Cap_{\text{enviro}}$. Inter-regional transfer quotas governed by bilateral agreements between governorates impose constraints of the form $x_6 \leq Q_{\text{Nord}}$ and $x_7 \leq Q_{\text{Sbeitla}}$. Finally, Law 75-16 mandates minimum flow maintenance in surface water courses during drought periods through the constraint $x_3 \geq EF_{\text{min}}$.
Institutional coordination involves three key entities. SONEDE (Société Nationale d’Exploitation et de Distribution des Eaux) serves as the national utility for production and distribution, providing operational data and implementing allocation decisions. The Ministry of Agriculture oversees agricultural water rights and manages the inter-sectoral balance between agriculture and tourism through the CRDA. ANPE (Agence Nationale de Protection de l’Environnement) enforces environmental constraints including brine discharge limits and groundwater extraction caps.
The current model implementation treats regulatory constraints as implicit bounds captured by historical allocation patterns rather than as explicit inequality constraints. This approach is valid when historical allocations have respected legal limits, but becomes inadequate if future scenarios require testing allocations that approach or exceed these boundaries. It is recommended that future extensions explicitly encode regulatory parameters $\{SY_{\text{annual}}, Cap_{\text{enviro}}, Q_{\text{Nord}}, EF_{\text{min}}\}$ as hard constraints, with values obtained through formal data-sharing agreements with SONEDE and the Ministry of Agriculture.
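A minimal sketch of what encoding these parameters as hard constraints could look like is given below, as a feasibility check applied to a candidate allocation. Every limit value except the 15 Mm³/year Nord pipeline figure is a placeholder invented for illustration, and the inequality directions are read from the wording (limits and quotas as upper bounds, the minimum flow as a lower bound). The function name `violated_constraints` is hypothetical.

```python
# Hypothetical regulatory limits in Mm3/year. Only the 15 Mm3/year Nord
# pipeline figure appears in the text; every other value is a placeholder.
LIMITS = {
    "SY_annual": 36.0,   # maximum sustainable aquifer yield (placeholder)
    "Cap_enviro": 16.0,  # desalination cap from brine limits (placeholder)
    "Q_Nord": 15.0,      # Nord transfer capacity quoted in the text
    "Q_Sbeitla": 4.0,    # Sbeitla transfer quota (placeholder)
    "EF_min": 25.0,      # minimum environmental flow (placeholder)
}


def violated_constraints(x, limits=LIMITS):
    """Return the names of the regulatory constraints that allocation x breaks.

    x maps source labels "x3".."x7" to allocations in Mm3/year.
    """
    checks = {
        "x3 >= EF_min": x["x3"] >= limits["EF_min"],
        "x4 <= Cap_enviro": x["x4"] <= limits["Cap_enviro"],
        "x5 <= SY_annual": x["x5"] <= limits["SY_annual"],
        "x6 <= Q_Nord": x["x6"] <= limits["Q_Nord"],
        "x7 <= Q_Sbeitla": x["x7"] <= limits["Q_Sbeitla"],
    }
    return [name for name, ok in checks.items() if not ok]


# 2022 allocations from Table 1, then a scenario that overloads the Nord pipeline
alloc_2022 = {"x3": 41.2, "x4": 15.8, "x5": 34.5, "x6": 14.0, "x7": 3.8}
ok_case = violated_constraints(alloc_2022)
over_case = violated_constraints({**alloc_2022, "x6": 16.0})
```

In an optimization model the same inequalities would be added as constraints rather than checked after the fact; the check form is used here only because it needs no solver.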
While the present study contributes methodologically, several limitations should be acknowledged. Regarding data, the limited temporal extent constrains the robustness of the regression models, aggregation at the annual level masks intra-annual dynamics, and the optimization lacks quality parameters. In terms of methodological assumptions, the linear aggregation assumption ignores potential synergies, the static regression relationships may not hold under future changes, and the single-objective formulation does not explicitly address cost, energy, or environmental objectives. As for transferability, the framework developed for Tunisia’s context may require adaptation for other regions with different governance structures.

7. Conclusions

This study demonstrates the effectiveness of a hybrid approach for Tunisia’s tourism sector water allocation, with potential transferability to similar contexts. The approach combines regression-based statistical learning and mathematical optimization to address the challenges of water resource management in the tourism sector. By integrating diverse regression models within an adaptive framework, the proposed method enhances both the accuracy and efficiency of resource allocation.
The literature review highlights the development of relevant paradigms such as Sustainable Decision Making, Sequential Learning and Optimization, and Integrated Learning and Optimization. This study bridges the gap between theoretical insights and practical application by offering a comprehensive solution capable of addressing the multifaceted objectives and constraints inherent to real-world resource management challenges.
A promising direction for future research would be to replace simple regression models with more sophisticated approaches such as hybrid decomposition-reconfiguration models [25] or ensemble methods, which could improve prediction robustness, particularly for long time horizons and in contexts of rapid environmental change.
Based on this research, we offer practical implementation recommendations. We suggest investing in comprehensive, high-frequency monitoring systems; periodically updating ML models as new data accumulate; using multi-scenario analysis as a participatory tool; beginning with pilot zones before system-wide deployment; and integrating the framework with existing planning tools.
Regarding the contribution to decision-making under uncertainty, the contextual optimization framework developed in this study improves resilience in water allocation by establishing predictive thresholds that bound the uncertainty region, thereby reducing deviations from optimality. Building on reliability estimation approaches such as those proposed by Zhang and Bose [8], our method mitigates uncertainty-induced biases by integrating regression-based predictions directly into the optimization formulation. This approach enables decision-makers to navigate the inherent variability in water resource systems while maintaining alignment with historically efficient allocation patterns.
While the methodology demonstrates robust performance for Tunisia’s coastal tourism sector (2010–2022), its applicability to other contexts depends on boundary conditions.
Regarding data requirements, the methodology needs a minimum temporal extent of 10–15 years to support regression power, complete coverage of demand cycles including disruption events, and consistent measurement protocols.
Concerning system characteristics, the framework is most beneficial for multi-source portfolios with stable governance structures and high seasonal demand variability, where the benefits of adaptive optimization outweigh the modeling overhead.

Author Contributions

Conceptualization, M.M., M.A.E. and F.S.P.; methodology, M.M., B.H. and T.C.; software, M.M. and T.C.; validation, M.M., B.H., M.A.E. and F.S.P.; formal analysis, M.M. and F.S.P.; investigation, M.M., B.H. and T.C.; resources, M.A.E. and F.S.P.; data curation, M.M. and T.C.; writing: original draft preparation, M.M.; writing: review and editing, M.A.E., F.S.P. and T.C.; visualization, M.M. and T.C.; supervision, M.A.E. and F.S.P.; project administration, T.C. and F.S.P.; funding acquisition, T.C. and F.S.P. All authors have read and agreed to the published version of the manuscript.

Funding

The APC was funded by RCM2+, Lusófona University/COFAC. This research received no further funding.

Data Availability Statement

The water allocation and tourism demand data (2010–2022) used in this study were obtained through official collaboration with the Société Nationale d’Exploitation et de Distribution des Eaux (SONEDE) and the Office National du Tourisme Tunisien under a research agreement with the University of Sfax. Aggregated annual data at the source-level used for all analyses are available upon request. Interested researchers may request additional details from the corresponding author. Regarding ethical and data confidentiality considerations, while aggregated source-level data are presented in Table 1 and are sufficient for full reproducibility of the regression analyses and optimization results, individual establishment-level data remain confidential under the terms of the SONEDE/ONTT collaboration agreement. This protects commercial confidentiality of individual tourism operators, competitive business information, and proprietary infrastructure configurations. Researchers seeking additional disaggregated data may contact the corresponding author to explore possibilities within appropriate institutional data-sharing frameworks.

Acknowledgments

We express our sincere gratitude to SONEDE and the Office National du Tourisme Tunisien for providing access to water allocation and tourism statistics data under a research collaboration agreement. We particularly thank the Planning Department at SONEDE and the Research Directorate at ONTT for their valuable insights into Tunisia’s water resource challenges and tourism sector dynamics.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Nomenclature Used

Table A1. Nomenclature for indices, decision variables, parameters, and statistical notation.
| Symbol | Description | Domain/Units/Notes |
|---|---|---|
| Indices and Sets | | |
| $i$ | Water source index | $\{1, 2, \ldots, 7\}$ |
| $t$ | Time period index | $\{1, 2, \ldots, T\}$ |
| $N$ | Number of observations | 13 (years 2010–2022) |
| $T$ | Number of planning periods | 1 (single-period demonstration) |
| $k$ | Regression coefficient index | $\{0, 1, \ldots, p\}$ |
| Decision Variables (Optimization Outputs) | | |
| $x_{it}$ | Allocation from source $i$ at time $t$ | Mm³/year |
| $\delta_{it}^{+}$ | Positive deviation (source $i$) | Mm³/year |
| $\delta_{it}^{-}$ | Negative deviation (source $i$) | Mm³/year |
| $\delta_{tot}^{+}$ | Total positive deviation | Mm³/year |
| $\delta_{tot}^{-}$ | Total negative deviation | Mm³/year |
| Parameters (Model Inputs) | | |
| $y_t$ | Total water demand at time $t$ | Mm³/year |
| $\hat{f}_i(x_{it})$ | ML prediction function | Mm³/year |
| $w_i$ | Priority weight (source $i$) | $[0, 1]$, dimensionless |
| $w_{tot}$ | Weight (equilibrium constraint) | $[0, 1]$, dimensionless |
| $\beta_j$ | Regression coefficient $j$ | Various (model-dependent) |
| $\theta$ | Generic parameter vector | $\mathbb{R}^p$ |
| Statistical Notation, Probability and Distributions | | |
| $R^2$; CV-$R^2$ | Coeff. of determination; Cross-validated | $[0, 1]$ |
| RMSE; MAE | Root mean square error; Mean absolute error | Mm³/year |
| MAPE | Mean absolute percentage error | % |
| $\hat{p}(y \mid x)$ | Estimated conditional distribution | |
| $K(\cdot)$; $h$ | Kernel function; Bandwidth parameter | |
| $\mathcal{N}(\mu, \sigma^2)$ | Normal distribution | Mean $\mu$, variance $\sigma^2$ |
| Context and Tunisia Case Study | | |
| $c_t$ | Context vector at time $t$ | $\{y_t, \tau_t, s_t, r_t\}$ |
| $y_t$ | Total water demand (primary context) | Mm³/year |
| $\tau_t$ | Temporal/seasonal context (month) | $\{1, \ldots, 12\}$ |
| $s_t$ | System state vector (infrastructure) | Operational status |
| $r_t$ | Resource constraints and availability | Capacity limits |
| $x_1$–$x_7$ | Water sources (Recycled to Sbeitla-Jelma) | |
| SONEDE; ONTT | Utility and Tourism authorities | |
| Spatial and Environmental Context | | |
| $\phi_i$ | Latitude of source $i$ | Degrees N |
| $h_i$ | Elevation of source $i$ | m above sea level |
| $C_i$ | Source-specific capacity constraint | Mm³/year |
| Network Topology | | |
| $A$ | Network incidence matrix | Binary |
| $a_{ij}$ | Connectivity indicator (source $i$ to zone $j$) | $\{0, 1\}$ |
| $C_{ij}^{\max}$ | Maximum pipeline/pumping capacity | Mm³/year |
| $x_{ij,t}$ | Flow from source $i$ to zone $j$ at time $t$ | Mm³/year |
| Regulatory Parameters | | |
| $SY_{\text{annual}}$ | Maximum sustainable aquifer yield | Mm³/year |
| $Cap_{\text{enviro}}$ | Desalination brine discharge limit | Mm³/year |
| $Q_{\text{Nord}}$; $Q_{\text{Sbeitla}}$ | Inter-regional transfer quotas | Mm³/year |
| $EF_{\text{min}}$ | Minimum environmental flow requirement | Mm³/year |
Note: All water volumes are expressed in Mm³/year (million cubic meters per year) unless otherwise specified. Weights and probabilities are dimensionless and bounded in [0, 1].

References

  1. Liu, X.; Qi, H.; Jia, S.; Guo, Y.; Liu, Y. Recent Advances in Optimization Methods for Machine Learning: A Systematic Review. Mathematics 2025, 13, 2210.
  2. Smets, P. Varieties of ignorance and the need for well-founded theories. Inf. Sci. 1991, 57, 135–144.
  3. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
  4. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
  5. Jang, H. A decision support framework for robust R&D budget allocation using machine learning and optimization. Decis. Support Syst. 2019, 121, 1–12.
  6. Sadana, U.; Chenreddy, A.; Delage, E.; Forel, A.; Frejinger, E.; Vidal, T. A survey of contextual optimization methods for decision-making under uncertainty. Eur. J. Oper. Res. 2025, 320, 271–289.
  7. Zarghami, S.A.; Zwikael, O. Measuring project resilience—Learning from the past to enhance decision making in the face of disruption. Decis. Support Syst. 2020, 160, 113831.
  8. Zhang, X.; Bose, I. Reliability estimation for individual predictions in machine learning systems: A model reliability-based approach. Decis. Support Syst. 2024, 186, 114305.
  9. Birge, J.R.; Louveaux, F. Introduction to Stochastic Programming, 3rd ed.; Springer: New York, NY, USA, 2011; p. 485.
  10. Mišić, V.; Perakis, S. Data Analytics in Operations Management: A Review. Manuf. Serv. Oper. Manag. 2020, 22, 158–169.
  11. Li, L.; Li, L.; Li, Q.; Shah, A.A. How Does Moderate Supervision Curb Elite Capture? Lessons from China’s Sustainable Water Governance. Sustainability 2025, 17, 9577.
  12. Carvajal, J.; Sucozhañay, A.; Célleri, R.; Timbe, L. The Relationship Between Water, Society, and the Sustainable Development Goals: A Case Study of Forest Conservation in a Rural Community. Sustainability 2025, 17, 9548.
  13. Morante, M.C.D.; Casas, A.F.; Rodríguez, C.M. Adaptive Water Management from a Socio-Ecological Perspective: A Systematic Review of Co-Learning Strategies and Traditional Knowledge. Sustainability 2025, 17, 9597.
  14. Bertsimas, D.; Kallus, N. From Predictive to Prescriptive Analytics. Manag. Sci. 2019, 66, 1025–1044.
  15. Mandi, A.; Amos, B.; Kolter, J.Z. Decision-Focused Learning: Foundations, State of the Art, Benchmark and Future Opportunities. J. Artif. Intell. Res. 2024, 80, 1623–1701.
  16. Rodrigues, P.M.; Pinto, F.S.; Marques, R.C. A framework for enabling conditions for wastewater reuse. Sustain. Prod. Consum. 2024, 46, 355–366.
  17. Bolognesi, T.; Pinto, F.S.; Farrelly, M. (Eds.) Routledge Handbook of Urban Water Governance; Routledge: London, UK, 2023.
  18. Diogo, A.F.; Resende, R.A.; Oliveira, A.L. Optimised Selection of Water Supply and Irrigation Sources: A Case Study on Surface and Underground Water, Desalination, and Wastewater Reuse in a Sahelian Coastal Arid Region. Sustainability 2021, 13, 12696.
  19. Diogo, A.F.; Oliveira, A.L. An Integrated Water Resources Solution for a Wide Arid to Semi-Arid Urbanized Coastal Tropical Region with Several Topographic Challenges: A Case Study. Water 2025, 17, 2750.
  20. Elleuch, M.A.; Anane, M.; Euchi, J.; Frikha, A. Hybrid fuzzy multi-criteria decision making to solve the irrigation water allocation problem in the Tunisian case. Agric. Syst. 2019, 176, 102644.
  21. Elleuch, M.A.; Elleuch, L.; Frikha, A. A hybrid approach for water resources management in Tunisia. Int. J. Water 2019, 13, 80–99.
  22. Elleuch, M.A.; Euchi, J.; Haddar, B.; Frikha, A. A fuzzy mathematical model with group decision-making to solve the water allocation problem: Tunisian case. Process Integr. Optim. Sustain. 2023, 7, 439–472.
  23. Mallek, M.; Elleuch, M.A.; Euchi, J.; Jerbi, Y. Optimum design of on-grid PV/wind hybrid system for desalination plant: A case study in Sfax, Tunisia. Desalination 2024, 576, 117358.
  24. Hao, C.; Qiu, J.; Li, F. Methodology for analyzing and predicting the runoff and sediment into a reservoir. Water 2017, 9, 440.
  25. Wang, S.; Qiu, J.; Li, F. Hybrid decomposition-reconfiguration models for long-term solar radiation prediction only using historical radiation records. Energies 2018, 11, 1376.
  26. Deng, Y.; Sen, S. Predictive stochastic programming. Comput. Manag. Sci. 2022, 19, 65–98.
  27. Kannan, R.; Bayraksan, G.; Luedtke, J.R. Residuals-based distributionally robust optimization with covariate information. Math. Program. 2024, 207, 369–425.
  28. Ferreira, K.J.; Lee, B.H.A.; Simchi-Levi, D. Analytics for an Online Retailer: Demand Forecasting and Price Optimization. Manuf. Serv. Oper. Manag. 2015, 18, 69–88.
  29. Liu, S.; He, L.; Shen, M.; Max-Shen, Z. On-Time Last-Mile Delivery: Order Assignment with Travel-Time Predictors. Manag. Sci. 2021, 67, 4095–4119.
  30. Lin, C.; Kaushik, C.; Dyer, E.; Muthukumar, V. The good, the bad and the ugly sides of data augmentation: An implicit spectral regularization perspective. J. Mach. Learn. Res. 2024, 25, 4470–4554.
  31. Srivastava, A.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958.
  32. Kuhn, D.; Esfahani, P.M.; Nguyen, V.A.; Shafieezadeh-Abadeh, S. Wasserstein distributionally robust optimization: Theory and applications in machine learning. Oper. Res. Proc. 2019, 130–166.
  33. Bengio, Y. Using a financial training criterion rather than a prediction criterion. Int. J. Neural Syst. 1997, 8, 433–443.
  34. Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed.; Springer: New York, NY, USA, 2009.
  35. Wang, L.; Jiang, R.; Zhao, Y.; Xie, J.; Zhang, X.; Zuo, G.; Wang, S.; Lu, X. A Bayesian network-based bi-level multi-objective programming model for uncertainty agricultural water management and allocation optimization. J. Clean. Prod. 2025, 528, 146734.
  36. Liu, J.; Zhou, X.L.; Xu, Y.P.; Fan, Z.W. Intensive hierarchical optimal water allocation with uncertainties based on Non-stationary Transformers. J. Hydrol. 2025, 661, 133646.
  37. Vahedizade, S.; Emamjomehzadeh, O.; Kerachian, R.; Forouhar, L. A real-time market-based framework for basin-wide surface water pricing and allocation considering the available water uncertainty. J. Environ. Manag. 2023, 345, 118767.
  38. Yue, W.; Yu, S.; Xu, M.; Rong, Q.; Xu, C.; Su, M. A Copula-based interval linear programming model for water resources allocation under uncertainty. J. Environ. Manag. 2022, 317, 115318.
  39. Li, M.; Fu, Q.; Singh, V.P.; Liu, D.; Gong, X. Risk-based agricultural water allocation under multiple uncertainties. Agric. Water Manag. 2020, 233, 106105.
Figure 1. Two-phase contextual optimization framework. Phase 1 learns historical allocation patterns via regression (predict). Phase 2 optimizes allocations subject to demand balance and physical constraints, using learned patterns as soft targets (prescribe).
Figure 2. Detailed mathematical components of the contextual optimization framework. Regression learns $\hat{f}_i(y)$ mappings; optimization uses these as adaptive targets via deviation variables. Context currently specified as $\{y_t\}$, extensible to multidimensional vectors.
Figure 3. Regression Model Performance ($R^2$) Across Water Sources and Functional Forms. Each cluster represents a water source; bars show $R^2$ values for five regression types. The best performing model for each source (marked with ⋆) is selected for the optimization framework (Table 2). Color legend applies consistently across all sources.
Figure 4. Leave-One-Out Cross-Validation residual distributions ($e_{(t)} = y_t - \hat{y}_{(t)}$) for all sources. Symmetric distributions around zero indicate unbiased predictions. Source $x_5$ (Groundwater) exhibits lowest variance (IQR = 0.82 Mm³); $x_3$ (Surface) shows highest uncertainty (IQR = 2.14 Mm³). The 2020 observation (outlier markers) confirms unprecedented COVID-19 demand disruption.
Figure 5. Observed vs. Predicted Water Allocations with COVID-19 Period Highlighted. Points below the diagonal indicate under-prediction; above indicate over-prediction. The 2020–2021 COVID-19 period (red diamonds) shows systematic deviations in $x_1$, $x_2$, $x_6$ due to unprecedented 29.8% demand reduction. Combined panel (bottom right) displays normalized allocations (0–1 scale) across all sources with overall GP model $R^2 = 0.998$.
Figure 6. Objective function values across 20 weight scenarios. Each instance represents a distinct priority vector w (detailed in Table 3). Lower values indicate better conformity between ML predictions and optimal allocations under physical constraints.
Figure 7. Sensitivity analysis of objective function values across 20 weight scenarios. The x-axis shows Instance Number (1–20), where each instance represents a distinct priority allocation vector $w = [w_1, w_2, \ldots, w_7]$ among the seven water sources. The y-axis displays the minimized total weighted deviation (Objective Function Value), where lower values indicate better conformity between machine learning predictions and feasible allocations.
Figure 8. Monte Carlo uncertainty propagation (N = 1000). (a) Objective function distribution (median = 0.096, right-skewed); (b) Desalinated water ($x_4$) exhibits a bimodal allocation pattern reflecting low-demand (6.8 Mm³) vs. high-demand (7.8 Mm³) operating regimes; (c) Cumulative feasibility converges to 96.3% [95% CI: 95.1–97.5%].
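Table 1. Annual water allocation data for Tunisia’s tourism sector (2010–2022).

| Year | $y$ (Total) | $x_1$ (Recycled) | $x_2$ (Harvested) | $x_3$ (Surface) | $x_4$ (Desal.) | $x_5$ (Ground) | $x_6$ (Nord) | $x_7$ (Sbeitla) | Hotels (N) |
|---|---|---|---|---|---|---|---|---|---|
| 2010 | 82.4 | 1.8 | 0.9 | 31.2 | 3.6 | 26.8 | 10.4 | 2.7 | 856 |
| 2011 | 79.8 | 1.7 | 0.8 | 30.1 | 3.8 | 25.9 | 10.1 | 2.5 | 842 |
| 2012 | 85.3 | 1.9 | 1.0 | 32.4 | 4.2 | 27.6 | 11.0 | 2.8 | 867 |
| 2013 | 88.6 | 2.0 | 1.1 | 33.8 | 4.8 | 28.5 | 11.5 | 3.0 | 881 |
| 2014 | 91.2 | 2.1 | 1.2 | 35.0 | 5.5 | 29.4 | 11.9 | 3.1 | 894 |
| 2015 | 87.4 | 2.0 | 1.1 | 33.2 | 5.8 | 28.1 | 11.2 | 2.9 | 878 |
| 2016 | 89.8 | 2.1 | 1.2 | 34.1 | 6.4 | 28.9 | 11.6 | 3.0 | 886 |
| 2017 | 94.5 | 2.3 | 1.3 | 36.2 | 7.2 | 30.2 | 12.4 | 3.2 | 912 |
| 2018 | 98.7 | 2.4 | 1.4 | 38.0 | 8.5 | 31.5 | 13.0 | 3.4 | 935 |
| 2019 | 103.2 | 2.6 | 1.5 | 39.8 | 9.8 | 32.8 | 13.6 | 3.6 | 958 |
| 2020 | 72.5 | 1.5 | 0.7 | 27.4 | 7.2 | 23.2 | 9.2 | 2.3 | 924 |
| 2021 | 84.6 | 1.9 | 1.0 | 32.0 | 10.5 | 27.0 | 10.8 | 2.8 | 931 |
| 2022 | 108.5 | 2.8 | 1.6 | 41.2 | 15.8 | 34.5 | 14.0 | 3.8 | 972 |
| Mean | 90.5 | 2.1 | 1.1 | 34.2 | 6.9 | 28.9 | 11.7 | 3.0 | 903 |
| Std Dev | 9.8 | 0.4 | 0.3 | 4.0 | 3.4 | 3.3 | 1.4 | 0.4 | 40 |
| % Share | 100 | 2.3 | 1.2 | 37.8 | 7.6 | 31.9 | 12.9 | 3.3 | - |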
Notes: All water volumes in Mm³/year. $y$ = total tourism sector water demand; $x_1$–$x_7$ = source allocations; Hotels = number of registered tourism establishments. The 2020 decrease reflects the impact of the COVID-19 pandemic on tourism activities. The progressive increase in the share of desalinated water ($x_4$) from 4.4% (2010) to 14.6% (2022) illustrates Tunisia’s strategic shift toward non-conventional sources. Data sources: SONEDE operational records and ONTT tourism statistics under University of Sfax research collaboration agreement (2010–2022).
Table 2. Regression analysis results for water source allocation prediction in Tunisia’s tourism sector.

| Source | | Exp. | Lin. | Log. | Poly. | Pow. |
|---|---|---|---|---|---|---|
| Recycled water | $f(y)$ | $74.88\,e^{0.22x}$ | $31.00x + 75.24$ | $50.39\ln(x) + 121$ | $2.55x^2 + 44.52x + 64.53$ | $102.91\,x^{0.37}$ |
| | $R^2$ | 0.323 | 0.389 | 0.350 | **0.400** | 0.307 |
| Harvested water | $f(y)$ | $45.68\,e^{0.07x}$ | $9.31x + 10.65$ | $117.72\ln(x) - 151$ | $0.09x^2 + 6.33x + 29.37$ | $11.36\,x^{0.95}$ |
| | $R^2$ | 0.739 | 0.819 | 0.754 | 0.823 | **0.866** |
| Surface water | $f(y)$ | $60.39\,e^{0.04x}$ | $5.05x + 45.21$ | $69.27\ln(x) - 38.92$ | $0.05x^2 + 2.78x + 62.02$ | $28.73\,x^{0.55}$ |
| | $R^2$ | 0.691 | 0.824 | 0.641 | **0.839** | 0.670 |
| Desal. water | $f(y)$ | $46.34\,e^{0.02x}$ | $2.13x + 19.71$ | $91.28\ln(x) - 210.3$ | $0.00x^2 + 1.83x + 26.19$ | $6.06\,x^{0.77}$ |
| | $R^2$ | **0.777** | 0.719 | 0.637 | 0.719 | 0.766 |
| Ground water | $f(y)$ | $64.21\,e^{0.04x}$ | $5.24x + 52.57$ | $68.29\ln(x) - 23.20$ | $0.00x^2 + 5.13x + 53.30$ | $31.90\,x^{0.55}$ |
| | $R^2$ | 0.732 | **0.923** | 0.755 | 0.900 | 0.766 |
| Nord water | $f(y)$ | $59.85\,e^{0.03x}$ | $3.84x + 45.50$ | $66.77\ln(x) - 51.98$ | $0.05x^2 + 0.86x + 72.96$ | $26.03\,x^{0.53}$ |
| | $R^2$ | 0.638 | 0.738 | 0.564 | **0.770** | 0.586 |
| Sbeitla Jelma | $f(y)$ | $48.76\,e^{0.12x}$ | $18.95x + 0.05$ | $184.05\ln(x) - 214.8$ | $0.73x^2 + 35.09x - 71.22$ | $10.93\,x^{1.23}$ |
| | $R^2$ | 0.436 | 0.652 | **0.705** | 0.665 | 0.495 |

Bold $R^2$ values indicate the best model fit for each water source selected for the optimization framework. Coefficients are dimensionally consistent with x and y expressed in Mm³/year.
Table 3. Leave-One-Out Cross-Validation Results with Overfitting Assessment (N = 13 folds).

| Source | Best Model | R² (Train) | CV-R² (LOOCV) | RMSE_CV (Mm³/yr) | MAE_CV (Mm³/yr) | MAPE (%) | Overfit Index * |
|---|---|---|---|---|---|---|---|
| x1 (Recycled) | Polynomial | 0.400 | 0.312 | 0.183 | 0.142 | 6.76 | 0.088 |
| x2 (Harvested) | Power | 0.866 | 0.823 | 0.087 | 0.069 | 6.27 | 0.043 |
| x3 (Surface) | Polynomial | 0.839 | 0.794 | 1.618 | 1.283 | 3.75 | 0.045 |
| x4 (Desalinated) | Exponential | 0.777 | 0.721 | 0.976 | 0.758 | 10.41 | 0.056 |
| x5 (Groundwater) | Linear | 0.923 | 0.908 | 1.046 | 0.831 | 2.88 | 0.015 |
| x6 (Nord) | Polynomial | 0.770 | 0.712 | 0.521 | 0.413 | 3.53 | 0.058 |
| x7 (Sbeitla) | Logarithmic | 0.705 | 0.651 | 0.192 | 0.153 | 5.10 | 0.054 |
| Mean | - | 0.754 | 0.703 | 0.660 | 0.521 | 5.53 | 0.051 |
* Overfitting Index = Training R² − CV-R². Values > 0.10 indicate significant overfitting (red); values < 0.03 indicate excellent generalization (green). Critical finding: x1 (recycled water) shows borderline overfitting (0.088), suggesting that the polynomial model may be capturing noise rather than signal because of the low absolute volumes (mean 2.1 Mm³/yr). Data source: SONEDE operational records (2010–2022).
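The overfitting index defined in the footnote to Table 3 can be illustrated with a minimal sketch. It assumes a simple linear model fitted by ordinary least squares and uses a small hypothetical data set (`demand`, `alloc`), not the SONEDE records, so its output is not expected to reproduce the values in Table 3. The function names are illustrative only.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares fit of y = a*x + b; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def r_squared(actual, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    m = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - m) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

def overfit_index(xs, ys):
    """Return (training R2, LOOCV R2, training R2 - LOOCV R2)."""
    a, b = fit_line(xs, ys)
    train_r2 = r_squared(ys, [a * x + b for x in xs])
    held_out = []
    for k in range(len(xs)):
        # Refit without observation k, then predict the held-out point.
        a_k, b_k = fit_line(xs[:k] + xs[k + 1:], ys[:k] + ys[k + 1:])
        held_out.append(a_k * xs[k] + b_k)
    cv_r2 = r_squared(ys, held_out)
    return train_r2, cv_r2, train_r2 - cv_r2

# Hypothetical toy data (Mm3/yr), for illustration only.
demand = [80.0, 85.0, 90.0, 95.0, 100.0, 105.0]
alloc = [26.1, 27.9, 28.8, 31.2, 31.9, 34.3]
train_r2, cv_r2, index = overfit_index(demand, alloc)
print(f"train R2={train_r2:.3f}  CV-R2={cv_r2:.3f}  index={index:.3f}")
```

For a least-squares fit the held-out residuals are never smaller in magnitude than the training residuals, so the index is non-negative; the thresholds in the footnote (0.03 and 0.10) then grade how large the gap is.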
Table 4. Summary of 20 Optimization Instances with Weight Scenarios and Source-Specific Deviations.

| Instance | Weight Vector $w = [w_1, w_2, w_3, w_4, w_5, w_6, w_7]$ [Normalized, $\sum w_i = 1$] | Opt. Val. [Mm³] | Main Deviation [Type, Mm³] |
|---|---|---|---|
| 1 | [0.125, 0.102, 0.113, 0.140, 0.157, 0.158, 0.205] | 0.289 | $\delta_4^- = 2.066$ |
| 2 | [0.226, 0.155, 0.216, 0.022, 0.203, 0.074, 0.104] | 0.045 | $\delta_4^- = 2.066$ |
| 3 ⋆ | [0.155, 0.130, 0.121, 0.201, 0.018, 0.191, 0.184] | 0.397 | $\delta_3^- = 3.279$ |
| 4 | [0.144, 0.293, 0.086, 0.032, 0.047, 0.267, 0.131] | 0.066 | $\delta_4^- = 2.066$ |
| 5 | [0.022, 0.033, 0.275, 0.091, 0.219, 0.249, 0.110] | 0.093 | $\delta_1^+ = 4.173$ |
| 6 | [0.062, 0.176, 0.022, 0.213, 0.237, 0.011, 0.279] | 0.032 | $\delta_6^- = 3.014$ |
| 7 | [0.171, 0.171, 0.200, 0.017, 0.201, 0.060, 0.181] | 0.035 | $\delta_4^- = 2.066$ |
| 8 | [0.238, 0.232, 0.049, 0.028, 0.201, 0.081, 0.172] | 0.057 | $\delta_4^- = 2.066$ |
| 9 | [0.014, 0.075, 0.283, 0.069, 0.111, 0.174, 0.273] | 0.057 | $\delta_1^+ = 4.173$ |
| 10 ⋆⋆ | [0.002, 0.152, 0.151, 0.327, 0.032, 0.031, 0.306] | 0.007 | $\delta_1^+ = 4.173$ |
| 11 | [0.282, 0.092, 0.134, 0.199, 0.166, 0.052, 0.075] | 0.158 | $\delta_6^- = 3.014$ |
| 12 | [0.283, 0.174, 0.128, 0.111, 0.164, 0.105, 0.036] | 0.229 | $\delta_4^- = 2.066$ |
| 13 | [0.143, 0.078, 0.105, 0.253, 0.226, 0.084, 0.110] | 0.252 | $\delta_6^- = 3.014$ |
| 14 | [0.176, 0.090, 0.256, 0.146, 0.134, 0.122, 0.076] | 0.081 | None |
| 15 | [0.206, 0.097, 0.096, 0.145, 0.114, 0.211, 0.133] | 0.101 | $\delta_3^- = 3.279$ |
| 16 | [0.193, 0.115, 0.170, 0.191, 0.093, 0.090, 0.138] | 0.188 | $\delta_4^- = 2.066$ |
| 17 | [0.165, 0.119, 0.234, 0.132, 0.092, 0.213, 0.045] | 0.073 | $\delta_1^+ = 4.173$ |
| 18 | [0.070, 0.101, 0.152, 0.184, 0.262, 0.055, 0.176] | 0.101 | $\delta_4^- = 2.066$ |
| 19 | [0.217, 0.058, 0.213, 0.174, 0.111, 0.119, 0.108] | 0.163 | None |
| 20 | [0.238, 0.092, 0.175, 0.209, 0.136, 0.066, 0.084] | 0.118 | $\delta_6^- = 3.014$ |
Notes: Weight vector: all weights $w_i \in [0, 1]$ with normalization $\sum_{i=1}^{7} w_i = 1$, generated via Latin Hypercube Sampling (LHS) for space coverage; dimensionless (relative priority). Optimal value: in Mm³; the total weighted deviation $\sum_i w_i (\delta_{it}^+ + \delta_{it}^-) + w_{tot} (\delta_{tot}^+ + \delta_{tot}^-)$, where lower values indicate better conformity between ML predictions and allocations. Main deviation: $\delta_i^{\pm}$ [source index, value in Mm³], with $\delta^+$ for a positive deviation (allocation > prediction) and $\delta^-$ for a negative deviation (allocation < prediction); “None” indicates that all deviations are < 0.5 Mm³. Stability: optimal allocations $x_{it}$ remain stable across scenarios (Std. Dev. < 2% of mean), confirming model robustness to priority shifts. Guide: Instance 10 (⋆⋆) shows the best performance (0.007) via stable source prioritization ($w_4 = 0.327$, $w_7 = 0.306$); Instance 3 (⋆) shows the lowest conformity (0.397) due to variable source weighting. Computation: scipy.optimize.minimize (SLSQP) on an Intel i7 with 8 GB RAM; average runtime 47.3 ms per instance.
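As an illustration of the weight scenarios and objective described in the notes to Table 4, the sketch below draws normalized weight vectors with a basic Latin Hypercube scheme and evaluates the weighted deviation for a given allocation. It is a simplified stand-in under stated assumptions: the authors' exact LHS and normalization procedure is not given, the function names `lhs_weights` and `weighted_deviation` are hypothetical, and the sketch only evaluates the objective rather than minimizing it with SLSQP. It relies on the fact that, for any fixed allocation, the smallest non-negative pair of deviation variables satisfies $\delta^+ + \delta^- = |x_i - \hat{f}_i|$.

```python
import random

def lhs_weights(n_instances, n_sources, seed=0):
    """Draw n_instances weight vectors by Latin Hypercube Sampling, each scaled to sum to 1."""
    rng = random.Random(seed)
    columns = []
    for _ in range(n_sources):
        # One draw per stratum [k/n, (k+1)/n), then shuffle the strata across instances.
        col = [(k + rng.random()) / n_instances for k in range(n_instances)]
        rng.shuffle(col)
        columns.append(col)
    weights = []
    for row in zip(*columns):
        total = sum(row)
        weights.append([v / total for v in row])
    return weights

def weighted_deviation(w, allocation, target, w_tot=0.0, demand=None):
    """Total weighted deviation: sum_i w_i*|x_i - f_i| plus an optional total-demand term."""
    value = sum(wi * abs(x - f) for wi, x, f in zip(w, allocation, target))
    if demand is not None:
        value += w_tot * abs(sum(allocation) - demand)
    return value

weights = lhs_weights(20, 7)
print([round(v, 3) for v in weights[0]])
```

Rescaling each row to sum to one relaxes the strict one-point-per-stratum property of the raw sample, which is one reason this should be read as an approximation of the sampling step rather than a reproduction of it.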
Table 5. Benchmark comparison across four allocation strategies and three demand scenarios.

| Scenario | Metric | S1: Fixed | S2: Regr-GP | S3: Equal-W | S4: Robust |
|---|---|---|---|---|---|
| Scenario A: Mean Demand (y = 90.5 Mm³) | Total Deviation | 0.384 | 0.103 | 0.247 | 0.156 |
| | Max Source Deviation | 4.83 (x3) | 3.28 (x3) | 4.17 (x6) | 3.94 (x3) |
| | Constraint Violations | 2 sources | 0 | 1 source | 0 |
| | Comp. Time (ms) | 0.1 | 47.3 | 42.1 | 238.7 |
| Scenario B: High Demand (y = 105 Mm³, 95th percentile) | Total Deviation | 0.521 | 0.187 | 0.394 | 0.223 |
| | Feasibility | Failed | Yes | Yes | Yes |
| | x4 Utilization (%) | 78.4 | 91.2 | 84.6 | 88.7 |
| | Reallocation from S.A. | 14.5 Mm³ | 11.2 Mm³ | 14.5 Mm³ | 12.8 Mm³ |
| Scenario C: Extreme Stress (y = 120 Mm³, theoretical drought) | Total Deviation | All strategies infeasible (total capacity = 118.3 Mm³) | | | |
| | Shortage (Mm³) | 6.7 | 1.7 | 4.2 | 2.9 |
| | Priority Source Cut | x7 (−100%) | x1 (−45%) | x2 (−100%) | x7 (−88%) |
Notes: Statistical significance: a paired t-test comparing S2 vs. S1 deviations across the 20 weight scenarios gives t(19) = 8.73, p < 0.001, Cohen’s d = 1.95 (large effect size); for S2 vs. S3, t(19) = 4.21, p < 0.001, d = 0.94 (large effect). Interpretation: the regression-informed approach demonstrates statistically robust superiority.
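The statistics in the notes to Table 5 are consistent with the usual paired-sample relation $d = t / \sqrt{n}$: with $n = 20$ scenarios, $8.73 / \sqrt{20} \approx 1.95$ and $4.21 / \sqrt{20} \approx 0.94$. The sketch below shows how a paired t statistic and this form of Cohen's d could be computed. It assumes d is defined as the mean difference divided by the standard deviation of the differences; the input lists `s1` and `s2` are hypothetical values, not the study's per-scenario deviations.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_and_d(a, b):
    """Paired t statistic, Cohen's d (mean diff / SD of diffs), and degrees of freedom."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    d = mean(diffs) / stdev(diffs)  # stdev uses the n-1 denominator
    t = d * sqrt(n)                 # equivalent to mean / (SD / sqrt(n))
    return t, d, n - 1

# Hypothetical per-scenario deviations for two strategies, for illustration only.
s1 = [0.40, 0.35, 0.50, 0.42, 0.38]
s2 = [0.10, 0.12, 0.20, 0.09, 0.15]
t, d, df = paired_t_and_d(s1, s2)
print(f"t({df}) = {t:.2f}, d = {d:.2f}")
```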
Table 6. Simulation Protocol: Monte Carlo Uncertainty Propagation.
Simulation Protocol
1. Initialize: Set best-performing priority weights ($w^*$).
2. Sampling: For each replication $s = 1, \ldots, 1000$:
   a. Generate perturbed demand $y_s \sim \mathcal{N}(\mu_y, \sigma_y^2)$.
   b. Generate prediction shocks $\epsilon_{i,s} \sim \mathcal{N}(0, \mathrm{RMSE}_i^2)$ for each source i.
   c. Update targets: $\hat{f}_{i,s} = \hat{f}_i + \epsilon_{i,s}$.
3. Execution: Solve Goal Programming model with $\{y_s, \hat{f}_{i,s}\}$.
4. Aggregation: Compute statistics for objective values, allocations, and feasibility.
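The four steps of the protocol can be sketched as follows. This is an illustrative outline, not the authors' implementation: the Goal Programming solve in step 3 is replaced by a hypothetical stand-in (`toy_solver`) that scales the targets proportionally to meet demand, and the inputs are assumptions taken from elsewhere in the tables (Table 1 means as nominal targets, Table 3 cross-validated RMSE values as shock scales, the Instance 10 weights, and the 118.3 Mm³ total capacity quoted in Table 5). Its output is therefore not expected to match Table 7.

```python
import random
from statistics import mean

def toy_solver(y, targets, weights, capacity):
    """Hypothetical stand-in for the Goal Programming solve (step 3)."""
    feasible = y <= capacity
    supply = min(y, capacity)
    scale = supply / sum(targets)          # proportional scaling to meet demand
    alloc = [t * scale for t in targets]
    z = sum(w * abs(a - t) for w, a, t in zip(weights, alloc, targets))
    return z, alloc, feasible

def monte_carlo(mu_y, sigma_y, f_hat, rmse, weights, capacity, n=1000, seed=42):
    rng = random.Random(seed)
    zs, allocs, n_feasible = [], [], 0
    for _ in range(n):
        y_s = rng.gauss(mu_y, sigma_y)                                   # step 2a
        targets = [f + rng.gauss(0.0, r) for f, r in zip(f_hat, rmse)]  # steps 2b, 2c
        z, alloc, feasible = toy_solver(y_s, targets, weights, capacity)
        zs.append(z)
        allocs.append(alloc)
        n_feasible += feasible
    # Step 4: aggregate objective values, allocations, and feasibility.
    return {
        "n": n,
        "mean_z": mean(zs),
        "mean_alloc": [mean(col) for col in zip(*allocs)],
        "feasibility_rate": n_feasible / n,
    }

F_HAT = [2.1, 1.1, 34.2, 6.9, 28.9, 11.7, 3.0]            # assumed nominal targets (Table 1 means)
RMSE = [0.183, 0.087, 1.618, 0.976, 1.046, 0.521, 0.192]  # Table 3 cross-validated RMSE
W_STAR = [0.002, 0.152, 0.151, 0.327, 0.032, 0.031, 0.306]  # Instance 10 weights (step 1)
result = monte_carlo(90.5, 9.8, F_HAT, RMSE, W_STAR, capacity=118.3)
print(result["feasibility_rate"], round(result["mean_z"], 3))
```

Passing a fixed `seed` keeps the replications reproducible, which makes it easier to compare runs when only the demand standard deviation is varied.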
Table 7. Monte Carlo simulation results: robustness to input uncertainty (1000 replications).

| Metric | Mean | Std Dev | 95% CI | CV (%) | Range |
|---|---|---|---|---|---|
| Objective Function Performance | | | | | |
| Objective Value (Z) | 0.118 | 0.084 | [0.009, 0.412] | 71.2 | [0.003, 0.521] |
| Deviation from Nominal | +14.6% | - | - | - | [−97.1%, +405.8%] |
| Allocation Variability (Mm³/yr) | | | | | |
| x1 (Recycled) | 2.08 | 0.19 | [1.73, 2.46] | 9.1 | [1.52, 2.68] |
| x2 (Harvested) | 1.12 | 0.09 | [0.95, 1.30] | 8.0 | [0.88, 1.42] |
| x3 (Surface) | 33.82 | 1.67 | [30.64, 37.12] | 4.9 | [29.14, 38.95] |
| x4 (Desalinated) | 7.26 | 1.02 | [5.38, 9.35] | 14.1 | [4.71, 10.58] |
| x5 (Groundwater) | 29.15 | 1.08 | [27.12, 31.29] | 3.7 | [26.43, 32.07] |
| x6 (Nord) | 11.94 | 0.54 | [10.94, 13.08] | 4.5 | [10.32, 13.76] |
| x7 (Sbeitla-Jelma) | 2.96 | 0.20 | [2.60, 3.37] | 6.8 | [2.41, 3.62] |
| Feasibility and Constraint Compliance | | | | | |
| Feasible solutions (n) | 963 | - | - | - | - |
| Feasibility rate (%) | 96.3% | - | [95.1, 97.5] | - | - |
| Mean constraint violation * | 0.03 Mm³ | 0.12 | [0.00, 0.42] | - | [0.00, 0.84] |
| Max constraint violation * | 0.52 Mm³ | 0.23 | [0.18, 0.84] | - | [0.00, 1.47] |
* Measured only for infeasible solutions (n = 37). Nominal Z = 0.103 (deterministic Instance 10 solution). CV = Coefficient of Variation (Std Dev/Mean × 100%). Red indicates sources with CV > 10%, requiring enhanced monitoring.
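As a worked check, the reported interval for the feasibility rate is consistent with a normal-approximation (Wald) binomial interval with $\hat{p} = 963/1000$ and $N = 1000$. Whether the authors used this particular interval is an assumption; the arithmetic below only shows that it reproduces the tabulated bounds.

```latex
\[
\hat{p} \pm 1.96 \sqrt{\frac{\hat{p}\,(1 - \hat{p})}{N}}
  = 0.963 \pm 1.96 \sqrt{\frac{0.963 \times 0.037}{1000}}
  \approx 0.963 \pm 0.012
  \;\Longrightarrow\; [0.951,\ 0.975]
\]
```

The same table's coefficient of variation for the objective follows directly from its definition: $0.084 / 0.118 \times 100\% \approx 71.2\%$.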
Table 8. Sensitivity analysis on demand uncertainty magnitude.

| Scenario | $\sigma_y$ (Mm³) | Feasibility Rate (%) | Mean Z | 95% CI Width |
|---|---|---|---|---|
| Conservative | 4.9 (50% reduction) | 98.7 | 0.094 | 0.201 |
| Baseline | 9.8 (observed) | 96.3 | 0.118 | 0.403 |
| Pessimistic | 14.7 (50% increase) | 92.1 | 0.147 | 0.618 |
Table 9. Comparative assessment of uncertainty-based water allocation methodologies.

| Methodology | Performance | Data # | Time | Context | Distinguishing Feature |
|---|---|---|---|---|---|
| Wang [35] Bayesian | 95.7% resilience | >50 | High | Agricultural | Stakeholder preference elicitation |
| Liu [36] Transformer | 12–18% gain | >100 | V. High | Urban | Non-stationary trend capture |
| Vahedizade [37] ANFIS | 45% benefit | >30 | Medium | Basin | Real-time market pricing |
| Yue [38] Copula-ILP | Risk-controlled | >20 | High | Multi-region | Joint probability modeling |
| Li [39] SP + CVaR | Tail risk < 5% | >25 | High | Agricultural | Downside risk protection |
| This Study | 96.3% feasible | 13 | Low | Tourism | Data-efficient transparency |
Note: Performance metrics vary by study definition; comparisons are approximate. Time: Low < 1 min, Medium 1–5 min, High > 5 min, V. High requires GPU. Our 73% improvement (Table 5) is contextually comparable to reported gains accounting for differing baselines.
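The 73% figure appears to correspond to the relative reduction in total deviation between the fixed strategy (S1) and the regression-informed strategy (S2) under Scenario A of Table 5; this reading is an inference from the tabulated values rather than a stated derivation:

```latex
\[
\frac{Z_{\mathrm{S1}} - Z_{\mathrm{S2}}}{Z_{\mathrm{S1}}}
  = \frac{0.384 - 0.103}{0.384}
  \approx 0.732 \approx 73\%
\]
```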
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Mallek, M.; Haddar, B.; Elleuch, M.A.; Pinto, F.S.; Cetrulo, T. Water Resource Allocation: A Learning-Based Optimization Framework for Sustainable Decision-Making Under Uncertainty. Environments 2026, 13, 105. https://doi.org/10.3390/environments13020105
