1. Introduction
Reference evapotranspiration (ET0) is a critical variable for estimating crop water demand, designing efficient irrigation systems, and managing water resources sustainably. Its importance is especially pronounced in arid and semi-arid regions, where water scarcity poses a substantial constraint on agricultural productivity. The FAO-56 Penman–Monteith equation is widely recognized as the most robust and physically consistent method for ET0 estimation [1]. However, the practical application of this method is often hindered in developing countries due to the unavailability or low quality of required meteorological data, such as air temperature, humidity, solar radiation, wind speed, and sunshine duration [2,3]. In response to these limitations, a range of low-input alternatives has been proposed. Empirical models such as Hargreaves–Samani or Priestley–Taylor require fewer inputs but often underperform across diverse climatic conditions due to their limited generalizability [4,5,6].
To address the trade-off between input simplicity and prediction accuracy, machine learning (ML) techniques have emerged as promising data-driven approaches. In recent years, ML methods including artificial neural networks (ANNs), k-nearest neighbors (KNN), and adaptive neuro-fuzzy inference systems (ANFISs) have shown great potential for estimating ET0 using a limited number of readily available input variables [7,8,9]. These approaches offer notable flexibility in the selection of inputs, making them particularly suitable for regions with sparse or incomplete meteorological records. Among these ML techniques, ensemble methods have attracted particular attention due to their superior predictive performance and robustness. By aggregating the outputs of multiple base learners, ensemble models such as random forest (RF) and extreme gradient boosting (XGB) can mitigate overfitting and enhance generalizability across varying environmental conditions. This is especially advantageous when modeling heterogeneous data collected from diverse climatic zones [10].
Building on this advantage, researchers have increasingly explored the potential of spatial and temporal features as sole predictors in ET0 estimation. Several studies [11,12] have demonstrated the effectiveness of ML models using spatial–temporal inputs (latitude, longitude, altitude, and month number) for ET0 estimation, even in the absence of meteorological variables.
This is because geographical variables such as latitude and altitude indirectly capture climatological influences by acting as proxies for solar radiation, temperature gradients, and atmospheric pressure, while the month number serves as a proxy for seasonal variations [13]. Given the increasing climate variability and data scarcity in many parts of the world, such data-light approaches could significantly inform policy decisions related to sustainable agriculture and water allocation.
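The proxy argument above can be made concrete with two standard FAO-56 relations: mean atmospheric pressure as a function of elevation, and daily extraterrestrial radiation as a function of latitude and day of year. The following is a minimal sketch for illustration only and is not part of the modeling framework of this study. The function names and the example site are hypothetical.

```python
import math


def atmospheric_pressure_kpa(altitude_m):
    """Mean atmospheric pressure (kPa) from elevation, FAO-56 simplified relation."""
    return 101.3 * ((293.0 - 0.0065 * altitude_m) / 293.0) ** 5.26


def extraterrestrial_radiation(lat_deg, day_of_year):
    """Daily extraterrestrial radiation Ra (MJ m-2 day-1), FAO-56 formulation."""
    phi = math.radians(lat_deg)  # latitude in radians, negative in the Southern Hemisphere
    angle = 2.0 * math.pi * day_of_year / 365.0
    dr = 1.0 + 0.033 * math.cos(angle)        # inverse relative Earth-Sun distance
    delta = 0.409 * math.sin(angle - 1.39)    # solar declination (rad)
    omega_s = math.acos(-math.tan(phi) * math.tan(delta))  # sunset hour angle (rad)
    gsc = 0.0820                              # solar constant (MJ m-2 min-1)
    return (24.0 * 60.0 / math.pi) * gsc * dr * (
        omega_s * math.sin(phi) * math.sin(delta)
        + math.cos(phi) * math.cos(delta) * math.sin(omega_s)
    )


# Hypothetical site at 25 degrees South: mid-January versus mid-July.
RA_JANUARY = extraterrestrial_radiation(-25.0, 15)
RA_JULY = extraterrestrial_radiation(-25.0, 196)
```

For a fixed latitude, the radiation term changes only with the day of year, which is why a month index can stand in for much of the seasonal signal, while altitude shifts pressure and hence the psychrometric term.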
Despite their promising potential, the application of such low-input models in geographically underrepresented and data-poor regions, such as Paraguay, remains limited. This study aims to evaluate the capability of modern ML algorithms to estimate ET0 solely from geographic and temporal predictors, namely latitude, longitude, altitude, and month number. In this way, it addresses a critical gap in the literature concerning the robustness, accuracy, and spatial transferability of cost-effective, data-efficient ET0 estimation frameworks within the South American context.
To fulfill this aim, a set of ML models was developed and evaluated for monthly ET0 prediction using only four easily obtainable predictors: latitude, longitude, altitude, and month number. Five ML algorithms (ANN, KNN, ANFIS, RF, and XGB) were applied within a five-stage modeling framework: (1) ET0 calculation, (2) baseline model evaluation, (3) ensemble modeling, (4) leave-one-station-out validation, and (5) spatial interpolation using the inverse distance weighting (IDW) method.
To further enhance the practical value of the developed models, spatially continuous ET0 maps were generated using the IDW interpolation method. IDW is a simple yet effective deterministic interpolation approach that estimates values at unsampled locations as a weighted average of nearby observations, with weights inversely proportional to distance. In the context of ET0 estimation, IDW allows the conversion of point-based predictions into continuous surface maps, which are particularly useful for spatial decision-making in irrigation planning and water resource allocation. By visualizing ET0 patterns across the landscape, stakeholders can identify high-demand areas, optimize crop selection, and implement region-specific water-saving strategies. This is especially important in data-scarce regions, where the lack of dense meteorological networks often limits spatially explicit water management practices.
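The IDW estimate described above can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the configuration used in this study: the power parameter of 2, the planar distance computed directly on coordinates, and the function name are all assumptions, and the sample points are hypothetical.

```python
import math


def idw_interpolate(samples, target, power=2.0):
    """Inverse distance weighted estimate at `target`.

    samples: list of (x, y, value) tuples for the known points.
    target:  (x, y) location to estimate.
    power:   distance exponent (assumed default; larger values localize the estimate).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in samples:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            # The target coincides with a sample point: return its value directly.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical ET0 values (mm day-1) at two points, estimated halfway between them.
MIDPOINT_ET0 = idw_interpolate([(0.0, 0.0, 4.0), (2.0, 0.0, 6.0)], (1.0, 0.0))
```

Evaluating this function on a regular grid of target points yields the continuous surface; over a national extent, a projected coordinate system or great-circle distance would be a more defensible choice than raw degrees.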
The novelty of this research lies in its demonstration that robust and spatially continuous ET0 estimates can be achieved in the absence of conventional meteorological data. This framework provides a scalable and economically viable alternative for ET0 estimation in data-scarce environments. Given its cost-effectiveness and ease of implementation, the proposed approach holds strong potential to support irrigation scheduling, agricultural planning, and water management in developing regions.
4. Discussion
The comparative evaluation of ML and neuro-fuzzy inference systems in this study revealed substantial differences in predictive capability for estimating ET0 using only geo-temporal predictors (latitude, longitude, altitude, and month number). While all models were able to capture the general seasonal dynamics of ET0, their precision and robustness varied notably, reflecting differences in their ability to model the nonlinear and spatially heterogeneous nature of atmospheric water demand in Paraguay. These findings are broadly consistent with previous efforts to estimate ET0 from limited or purely geo-temporal predictors [12], but the present study advances this line of research by providing a systematic, Paraguay-wide evaluation across 19 stations with monthly mean data.
The ANFIS configured with a Gaussian membership function consistently outperformed all other models, achieving the lowest RMSE (0.289 mm day−1) and highest R2 (0.950) during the testing phase. This superior performance is attributable to ANFIS’s hybrid architecture, which combines the adaptive learning capabilities of neural networks with the interpretability and flexibility of fuzzy logic. Previous studies have also highlighted the effectiveness of Gaussian-shaped membership functions for hydrometeorological modeling [24,38,39], and our results extend this evidence by demonstrating that such functions remain highly effective even under minimal predictor inputs restricted to latitude, longitude, altitude, and month.
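For readers unfamiliar with Gaussian membership functions, the sketch below shows how a crisp input such as month number is mapped to degrees of membership in the fuzzification layer of an ANFIS. It is illustrative only: the three fuzzy sets, their centers, and their widths are hypothetical, whereas in a trained ANFIS these parameters are learned from the data.

```python
import math


def gaussian_membership(x, center, sigma):
    """Degree of membership of x in a Gaussian fuzzy set (1.0 at the center)."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)


# Hypothetical fuzzy sets over month number: name -> (center, sigma).
MONTH_SETS = {
    "early_year": (1.0, 2.0),
    "mid_year": (6.5, 2.0),
    "late_year": (12.0, 2.0),
}


def fuzzify_month(month):
    """Return the membership degree of `month` in each hypothetical fuzzy set."""
    return {
        name: gaussian_membership(month, center, sigma)
        for name, (center, sigma) in MONTH_SETS.items()
    }


JANUARY_MEMBERSHIPS = fuzzify_month(1)
```

Because the Gaussian is smooth and nonzero everywhere, neighboring sets overlap gradually, which is one commonly cited reason for its stable behavior when only a few inputs are available.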
Tree-based ensemble methods, particularly RF and XGB, provided highly competitive results (R2 ≈ 0.910–0.915; RMSE ≈ 0.380 mm day−1), with superior stability across stations compared to ANFIS. The strong generalization capacity of RF and XGB can be attributed to their ability to capture complex nonlinear interactions while controlling overfitting through ensemble averaging and regularization, respectively. Their robustness in the leave-one-station-out validation indicates that such methods are more resilient to the spatial variability and data sparsity typical of meteorological networks, a finding consistent with previous work on climate variable interpolation and prediction [40,41]. Our contribution includes the use of leave-one-station-out validation, allowing spatial generalizability to be explicitly assessed in a data-scarce national context, which adds value beyond the approaches typically used in earlier studies.
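The leave-one-station-out scheme can be sketched as a simple splitting routine: each station is held out in turn, a model is fitted on the remaining stations, and it is scored only on the held-out station. The sketch below shows the splitting logic alone. The record layout and the station labels are hypothetical, and the model fitting step is left to the caller.

```python
def leave_one_station_out(records):
    """Yield (held_out_station, train_records, test_records) for every station.

    records: list of dicts, each with at least a "station" key.
    """
    stations = sorted({record["station"] for record in records})
    for held_out in stations:
        train = [r for r in records if r["station"] != held_out]
        test = [r for r in records if r["station"] == held_out]
        yield held_out, train, test


# Hypothetical monthly records for three stations (values are placeholders).
DEMO_RECORDS = [
    {"station": name, "month": month, "et0": 0.0}
    for name in ("A", "B", "C")
    for month in (1, 7)
]
DEMO_FOLDS = list(leave_one_station_out(DEMO_RECORDS))
```

Since no record from the held-out station ever enters training, the resulting error reflects performance at an unseen location rather than at an unseen month of a known location.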
The KNN model achieved commendable accuracy given its algorithmic simplicity (R2 = 0.906; RMSE = 0.394 mm day−1) and produced exceptionally high performance at certain stations (e.g., R2 > 0.96 at Pedro Juan Caballero). Nonetheless, its station-level variability in leave-one-station-out tests underscores its reliance on local data density and its sensitivity to spatial clustering effects. By contrast, the ANN, constrained by a shallow architecture, exhibited the weakest predictive performance overall (R2 = 0.882), reaffirming that insufficient network depth and complexity limit the model’s capacity to approximate the highly nonlinear processes driving ET0 variation. The sensitivity of KNN to local data density and the underperformance of a shallow ANN are consistent with earlier reports on the limitations of distance-based learners and under-parameterized neural networks in hydrological modeling tasks [42,43]. This comparison underlines the importance of selecting models with sufficient complexity and addressing spatial clustering effects when applying ML to ET0 prediction.
The ensemble modeling framework further enhanced predictive accuracy, with the inverse RMSE-based weighting scheme yielding slightly better performance than the R2-based scheme (R2 = 0.925 vs. 0.923 in testing). The increased weight assigned to ANFIS under the inverse RMSE criterion likely contributed to this improvement, as the scheme prioritized models with lower prediction errors. This finding is consistent with ensemble learning theory, which emphasizes the benefits of weighting base learners according to performance-related criteria [34,35,36]. Although the gains were modest, the consistently high R2 values (>0.92) across weighting schemes demonstrate the robustness of the parallel hybrid framework.
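The two weighting schemes can be illustrated with a short sketch. The exact formulas are not restated in this section, so the normalizations below (weights proportional to 1/RMSE or to R2, each scaled to sum to one) are assumptions, as is the definition of R2 as one minus the ratio of residual to total sum of squares. All function names are hypothetical. The example uses only the ANFIS and KNN test metrics quoted above, purely for illustration.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination, assumed here as 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def normalize(scores):
    """Scale a dict of non-negative scores so that the values sum to one."""
    total = sum(scores.values())
    return {name: score / total for name, score in scores.items()}


def inverse_rmse_weights(rmse_by_model):
    """Weights proportional to 1 / RMSE (assumed form of the scheme)."""
    return normalize({name: 1.0 / value for name, value in rmse_by_model.items()})


def r2_weights(r2_by_model):
    """Weights proportional to R2 (assumed form of the scheme)."""
    return normalize(dict(r2_by_model))


def ensemble_predict(predictions_by_model, weights):
    """Weighted average of the base-model predictions, element by element."""
    n = len(next(iter(predictions_by_model.values())))
    return [
        sum(weights[name] * preds[i] for name, preds in predictions_by_model.items())
        for i in range(n)
    ]


# Two-model illustration using the test metrics reported above.
W_RMSE = inverse_rmse_weights({"ANFIS": 0.289, "KNN": 0.394})
W_R2 = r2_weights({"ANFIS": 0.950, "KNN": 0.906})
```

In this two-model illustration, the inverse RMSE scheme assigns ANFIS a larger share than the R2 scheme does, because relative differences in RMSE between models are larger than relative differences in R2.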
From a climatological perspective, the spatial distribution maps of ET0 revealed a clear and recurrent annual cycle, with maxima in summer (January–March) and minima in winter (May–July), modulated by Paraguay’s latitudinal extent, topographic variation, and seasonal shifts in temperature and solar radiation. Persistent hotspots in the northeast and southeast highlight regions of elevated atmospheric water demand, which may correspond to agricultural zones with high evapotranspiration losses. These patterns are in agreement with FAO-56 Penman–Monteith-based climatologies for similar subtropical regions in South America [44,45]. By explicitly mapping these dynamics with ML-derived estimates, our study extends earlier climatological assessments and provides a spatially detailed reference for irrigation planning in Paraguay.
It should be acknowledged that the models developed in this study were calibrated and validated using ET0 values derived from the FAO-56 Penman–Monteith equation at 19 meteorological stations in Paraguay. Therefore, the ML models do not estimate ET0 from physical principles directly, but rather approximate the Penman–Monteith outputs through geo-temporal predictors. While this approach is contingent on the spatial patterns of pre-calculated ET0, it offers a data-efficient surrogate framework that can be particularly useful in regions where the full set of meteorological inputs required for Penman–Monteith is unavailable. In this sense, our methodology complements rather than replaces physically based methods, and its primary value lies in extending ET0 estimation to data-scarce environments where conventional computation is impractical.
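To make the surrogate relationship explicit, the sketch below evaluates the daily FAO-56 Penman–Monteith equation, which defines the target values that the ML models approximate. It is a minimal illustration of the standard equation and is not the processing code of this study. The function and argument names are hypothetical, and the derivation of each input from raw station observations is omitted.

```python
def fao56_penman_monteith(delta, rn, g, gamma, t_mean, u2, es, ea):
    """Daily reference evapotranspiration ET0 (mm day-1), FAO-56 Penman-Monteith.

    delta:  slope of the saturation vapour pressure curve (kPa per deg C)
    rn:     net radiation at the crop surface (MJ m-2 day-1)
    g:      soil heat flux density (MJ m-2 day-1)
    gamma:  psychrometric constant (kPa per deg C)
    t_mean: mean daily air temperature at 2 m (deg C)
    u2:     wind speed at 2 m (m s-1)
    es, ea: saturation and actual vapour pressure (kPa)
    """
    radiation_term = 0.408 * delta * (rn - g)
    aerodynamic_term = gamma * (900.0 / (t_mean + 273.0)) * u2 * (es - ea)
    return (radiation_term + aerodynamic_term) / (delta + gamma * (1.0 + 0.34 * u2))
```

Every argument requires station measurements, which is the data burden the geo-temporal surrogate avoids once it has been trained on Penman–Monteith outputs.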
Regarding transferability, direct application of these models to other regions without recalibration may lead to biased or unreliable estimates, as also reported in previous ET0 modeling studies that emphasize the need for local adjustment [24,46]. To ensure robust performance in different environments, the framework should be recalibrated using regionally derived Penman–Monteith (or equivalent) ET0 values. Nevertheless, the conceptual framework is highly adaptable and can be extended to diverse climatic zones.
Finally, we note that the present models are intended for estimating ET0 (mm day−1) under existing climatic conditions. Prediction of future ET0 would require the inclusion of time-lagged predictors and potentially dynamic climate projections, as highlighted in previous studies exploring ET0 forecasting under climate change scenarios [47,48]. Incorporating such approaches represents a promising direction for future research and could further extend the applicability of data-driven ET0 models in agricultural and water resource management.
5. Conclusions
This study demonstrated the effectiveness of ML algorithms and neuro-fuzzy inference systems for estimating ET0 using only geo-temporal predictors such as latitude, longitude, altitude, and month. Among the evaluated approaches, ANFIS with Gaussian membership functions consistently achieved the highest accuracy, while ensemble-based models such as RF and XGB exhibited strong robustness and stability across stations. Furthermore, ensemble integration enhanced predictive skill, with inverse RMSE-based weighting performing slightly better than R2-based weighting. Spatial ET0 distribution maps generated through IDW interpolation revealed pronounced seasonal variability, with higher evaporative demand in the Chaco region and lower values in eastern Paraguay, particularly during winter months. These findings confirm that data-efficient modeling strategies can serve as practical and cost-effective alternatives in data-scarce environments, supporting irrigation scheduling and agricultural planning.
Looking ahead, several promising research directions could further advance the applicability of the proposed framework. Incorporating additional climatic variables (e.g., temperature, solar radiation, humidity, and wind speed) would improve model sensitivity and capture short-term fluctuations. The adoption of advanced deep learning architectures such as Long Short-Term Memory (LSTM) networks and Convolutional Neural Networks (CNNs) may also enhance the ability to learn temporal and spatial dependencies, especially for forecasting tasks. In addition, integrating remotely sensed products (e.g., MODIS, Landsat, or ERA5 reanalysis) could enable scalable, spatially explicit ET0 estimation in regions with sparse meteorological coverage. Finally, extending the methodology to other climatic zones and linking ET0 predictions with crop growth models and irrigation management systems will strengthen its relevance for sustainable agriculture and water resource management under diverse and changing environmental conditions.