Article

Data-Driven Estimation of Reference Evapotranspiration in Paraguay from Geographical and Temporal Predictors

by Bilal Cemek 1,*, Erdem Küçüktopçu 1, Maria Gabriela Fleitas Ortellado 1 and Halis Simsek 2
1 Department of Agricultural Structures and Irrigation, Ondokuz Mayıs University, 55139 Samsun, Türkiye
2 Department of Agricultural and Biological Engineering, Purdue University, West Lafayette, IN 47907, USA
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(21), 11429; https://doi.org/10.3390/app152111429 (registering DOI)
Submission received: 9 September 2025 / Revised: 2 October 2025 / Accepted: 22 October 2025 / Published: 25 October 2025

Abstract

Reference evapotranspiration (ET0) is a fundamental variable for irrigation scheduling and water management. Conventional estimation methods, such as the FAO-56 Penman–Monteith equation, are of limited use in developing regions where meteorological data are scarce. This study evaluates the potential of machine learning (ML) approaches to estimate ET0 in Paraguay, using only geographical and temporal predictors—latitude, longitude, altitude, and month. Five algorithms were tested: artificial neural networks (ANNs), k-nearest neighbors (KNN), random forest (RF), extreme gradient boosting (XGB), and adaptive neuro-fuzzy inference systems (ANFISs). The framework consisted of ET0 calculation, baseline model testing (ML techniques), ensemble modeling, leave-one-station-out validation, and spatial interpolation by inverse distance weighting. ANFIS achieved the highest prediction accuracy (R2 = 0.950, RMSE = 0.289 mm day−1, MAE = 0.202 mm day−1), while RF and XGB showed stable and reliable performance across all stations. Spatial maps highlighted strong seasonal variability, with higher ET0 values in the Chaco region in summer and lower values in winter. These results confirm that ML algorithms can generate robust ET0 estimates under data-constrained conditions, and provide scalable and cost-effective solutions for irrigation management and agricultural planning in Paraguay.

1. Introduction

Reference evapotranspiration (ET0) is a critical variable for estimating crop water demand, designing efficient irrigation systems, and managing water resources sustainably. Its importance is especially pronounced in arid and semi-arid regions, where water scarcity poses a substantial constraint on agricultural productivity. The FAO-56 Penman–Monteith equation is widely recognized as the most robust and physically consistent method for ET0 estimation [1]. However, the practical application of this method is often hindered in developing countries due to the unavailability or low quality of required meteorological data, such as air temperature, humidity, solar radiation, wind speed, and sunshine duration [2,3]. In response to these limitations, a range of low-input alternatives has been proposed. Empirical models such as Hargreaves–Samani or Priestley–Taylor require fewer inputs but often underperform across diverse climatic conditions due to their limited generalizability [4,5,6].
To address the trade-off between input simplicity and prediction accuracy, machine learning (ML) techniques have emerged as promising data-driven approaches. In recent years, ML methods including artificial neural networks (ANNs), k-nearest neighbors (KNN), and adaptive neuro-fuzzy inference systems (ANFISs) have shown great potential for estimating ET0 using a limited number of readily available input variables [7,8,9]. These approaches offer notable flexibility in the selection of inputs, making them particularly suitable for regions with sparse or incomplete meteorological records. Among these ML techniques, ensemble methods have attracted particular attention due to their superior predictive performance and robustness. By aggregating the outputs of multiple base learners, ensemble models such as random forest (RF) and extreme gradient boosting (XGB) can mitigate overfitting and enhance generalizability across varying environmental conditions. This is especially advantageous when modeling heterogeneous data collected from diverse climatic zones [10].
Building on this advantage, researchers have increasingly explored the potential of spatial and temporal features as sole predictors in ET0 estimation. Several studies [11,12] have demonstrated the effectiveness of ML models using spatial–temporal inputs—such as latitude, longitude, altitude, and month number—for ET0 estimation, even in the absence of meteorological variables.
This is because geographical variables such as latitude and altitude indirectly capture climatological influences, acting as proxies for solar radiation, temperature gradients, and atmospheric pressure, while the month number serves as a proxy for seasonal variations [13]. Given the increasing climate variability and data scarcity in many parts of the world, such data-light approaches could significantly inform policy decisions related to sustainable agriculture and water allocation.
Despite their promising potential, the application of such low-input models in geographically underrepresented and data-poor regions, such as Paraguay, is still limited. This study aims to evaluate the capability of modern ML algorithms to estimate ET0 solely from geographic and temporal predictors, namely latitude, longitude, altitude, and month number. In this way, it addresses a critical gap in the literature concerning the robustness, accuracy, and spatial transferability of cost-effective, data-efficient ET0 estimation frameworks within the South American context.
To fulfill this aim, a set of ML models was developed and evaluated for monthly ET0 prediction using only four easily obtainable predictors: latitude, longitude, altitude, and month number. Five ML algorithms (ANN, KNN, ANFIS, RF, and XGB) were applied within a five-stage modeling framework: (1) ET0 calculation, (2) baseline model evaluation, (3) ensemble modeling, (4) leave-one-station-out validation, and (5) spatial interpolation using the inverse distance weighting (IDW) method.
To further enhance the practical value of the developed models, spatially continuous ET0 maps were generated using the IDW interpolation method. IDW is a simple yet effective geostatistical approach that estimates values at unsampled locations by weighting nearby observations based on their inverse distance. In the context of ET0 estimation, IDW allows the conversion of point-based predictions into continuous surface maps, which are particularly useful for spatial decision-making in irrigation planning and water resource allocation. By visualizing ET0 patterns across the landscape, stakeholders can identify high-demand areas, optimize crop selection, and implement region-specific water-saving strategies. This is especially important in data-scarce regions, where the lack of dense meteorological networks often limits spatially explicit water management practices.
The novelty of this research lies in its demonstration that robust and spatially continuous ET0 estimates can be achieved in the absence of conventional meteorological data. This framework provides a scalable and economically viable alternative for ET0 estimation in data-scarce environments. Given its cost-effectiveness and ease of implementation, the proposed approach holds strong potential to support irrigation scheduling, agricultural planning, and water management in developing regions.

2. Materials and Methods

2.1. Study Area

Paraguay lies between 19° and 28° south latitude and 54° and 63° west longitude and is located in a continental desert climate zone. Meteorological data from 19 stations of the Paraguayan Meteorological Service, covering 5 years (2018–2022), were used for this study. The locations of these stations in Paraguay are shown in Figure 1. The geographical coordinates of the stations can be found in Table 1.

2.2. Methods

The methodology employed in this study was structured into a five-stage framework (Figure 2) comprising: (1) ET0 calculation, (2) baseline model evaluation (ML algorithms), (3) ensemble learning implementation, (4) leave-one-station-out cross-validation, and (5) spatial interpolation using the IDW technique. In the first stage (Section 2.2.1), ET0 was calculated for each of the 19 climatic stations and for each month of the study period using the traditional FAO-56 Penman–Monteith method based on monthly mean values of the required meteorological parameters. These computed ET0 values then served as the output variable for training and testing the ML models. In the second stage (Section 2.2.2), baseline models were evaluated to assess the individual performance of the various ML algorithms using standard performance metrics. The third stage (Section 2.2.3) focused on an ensemble averaging technique that aims to enhance predictive accuracy by integrating the strengths of multiple algorithms. In the fourth stage (Section 2.2.4), leave-one-station-out cross-validation was applied to examine the spatial generalizability and robustness of the models across different geographic locations. Finally, the fifth stage (Section 2.2.5) involved spatial interpolation with the IDW method to generate continuous ET0 distribution maps from discrete station-level outputs.

2.2.1. ET0 Calculation

In this study, ET0 was calculated from the FAO-56 Penman–Monteith equation [1] using monthly averaged values along with the other necessary parameters in order to obtain stable long-term conditions for model development.
$$ET_0 = \frac{\Delta (R_n - G) + \rho_a c_p \, (e_s - e_a)/r_a}{\left[\Delta + \gamma \left(1 + \frac{r_s}{r_a}\right)\right] \rho_w \lambda} \tag{1}$$
where ET0 represents the reference evapotranspiration (mm day−1). The variable Rn denotes the net radiation flux density at the surface (MJ m−2 day−1), while G is the soil heat flux density (MJ m−2 day−1). The parameters es and ea correspond to the saturation vapor pressure and actual vapor pressure of the air (kPa), respectively. The slope of the saturation vapor pressure–temperature curve is expressed as Δ (kPa °C−1), and γ is the psychrometric constant (kPa °C−1). The aerodynamic resistance to turbulent transfer of heat and water vapor from the surface to the reference height is defined as ra (s m−1), whereas the bulk surface resistance (rs) accounts for the resistance to vapor flow from within the leaf, canopy, or soil to the atmosphere (s m−1). Additional parameters include the air density (ρa, kg m−3), the specific heat of moist air at constant pressure (cp, MJ kg−1 °C−1), the density of liquid water (ρw, kg m−3), and the latent heat of vaporization (λ, MJ kg−1).
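For the hypothetical grass reference surface (crop height 0.12 m, surface resistance 70 s m−1, albedo 0.23), FAO-56 [1] reduces the general form above to an expression that needs only the mean air temperature, the wind speed at 2 m, and the radiation and vapor pressure terms. The sketch below implements that reduced form as an illustration. It is not necessarily the exact computation carried out by the authors, and the function name, argument names, and input values are assumptions made for the example.

```python
def fao56_reference_et(delta, rn, g, gamma, t_mean, u2, es, ea):
    """Reference evapotranspiration (mm/day), reduced FAO-56 Penman-Monteith form.

    delta  : slope of the saturation vapor pressure curve (kPa/degC)
    rn     : net radiation at the surface (MJ/m2/day)
    g      : soil heat flux density (MJ/m2/day)
    gamma  : psychrometric constant (kPa/degC)
    t_mean : mean air temperature at 2 m (degC)
    u2     : wind speed at 2 m (m/s)
    es, ea : saturation and actual vapor pressure (kPa)
    """
    # Radiation term; 0.408 converts MJ/m2/day to mm/day of evaporated water.
    radiation_term = 0.408 * delta * (rn - g)
    # Aerodynamic term with the fixed grass-reference resistances folded in.
    aerodynamic_term = gamma * 900.0 / (t_mean + 273.0) * u2 * (es - ea)
    return (radiation_term + aerodynamic_term) / (delta + gamma * (1.0 + 0.34 * u2))


# Illustrative inputs only (assumed values, not data from the study's stations).
example_et0 = fao56_reference_et(
    delta=0.122, rn=13.28, g=0.0, gamma=0.0666,
    t_mean=16.9, u2=2.078, es=1.997, ea=1.409,
)
print(f"ET0 = {example_et0:.2f} mm/day")
```

With monthly data, the same expression would be evaluated on the monthly mean of each input.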

2.2.2. Machine Learning (ML) Algorithms

In this study, several ML algorithms were implemented to estimate ET0 based solely on geographic and temporal predictors: latitude, longitude, altitude, and month number. These algorithms were selected due to their proven success in nonlinear regression tasks and their frequent use in hydrological and ecological modeling. The models were designed to estimate retrospective or contemporaneous monthly ET0; forecasting future ET0 values would require time-lagged predictors and was beyond the scope of this study. The ML models used in this study include:
Artificial neural networks (ANN): In the field of artificial intelligence, ANNs are recognized as computational paradigms inspired by the structural and functional characteristics of the neural networks of the human brain [14]. Due to their high flexibility and learning capability, they are widely used to solve complex nonlinear problems [15,16]. In this study, a feedforward multilayer perceptron (MLP) architecture was employed for ET0 modeling. The MLP architecture is structured into an input layer, one or more hidden layers, and an output layer. The neurons in the hidden and output layers compute a weighted aggregation of their inputs and then apply a nonlinear activation function to enable complex mappings [17]. To optimize network performance, different activation functions (namely logsig, tansig, ReLU, and purelin) were employed and compared. Training was conducted using the Levenberg–Marquardt backpropagation algorithm, selected for its superior convergence properties and effectiveness in handling nonlinear regression tasks [18,19]. To prevent overfitting and enhance generalization, early stopping and regularization techniques were applied. Hyperparameters were optimized via grid search; the tuned parameters included the number of hidden layers, number of hidden neurons, activation functions, and number of training epochs.
K-Nearest Neighbors (KNN): As a non-parametric and instance-based method, the KNN algorithm has been extensively applied in classification and regression tasks due to its simplicity and effectiveness [20,21]. In the context of regression, KNN predicts the target value of a query instance by identifying the k most similar instances (neighbors) in the training dataset and averaging their corresponding output values. The similarity between data instances is usually measured using distance metrics such as Euclidean, Manhattan or Minkowski distance [22]. In this study, Euclidean distance was used because it is widely adopted and well suited to continuous-valued features. The optimal value of k was determined using a grid search approach by minimizing the prediction error in the validation dataset.
Adaptive Neuro-Fuzzy Inference System (ANFIS): The ANFIS is a hybrid intelligent system that integrates the learning capabilities of ANNs with the reasoning mechanism of fuzzy logic [23]. This combination enables ANFIS to model complex, nonlinear relationships while maintaining interpretability through fuzzy if-then rules. The framework is particularly effective in capturing the uncertainty and imprecision inherent in many environmental and agricultural datasets [24,25,26]. ANFIS is typically based on the first-order Sugeno fuzzy inference system [27,28]. The architecture comprises five layers: (i) the fuzzification layer, which transforms crisp inputs into fuzzy sets using membership functions; (ii) the rule layer, where fuzzy if-then rules are applied; (iii) the normalization layer, which computes normalized firing strengths; (iv) the defuzzification layer, where output functions (usually linear or constant) are calculated; and (v) the output layer, which aggregates the final model output. The system learns by updating both the premise parameters (defining membership functions) and the consequent parameters (in the rule outputs) through a hybrid optimization approach. This typically involves a combination of least-squares estimation (for the consequent parameters) and backpropagation (for the premise parameters).
Random Forest (RF): RF is an ensemble learning algorithm that constructs multiple decision trees during training and, in regression tasks, produces the final prediction by averaging the outputs of the individual trees [29]. As a non-parametric, data-driven approach, RF is capable of modeling complex and nonlinear relationships between input features and the target variable, making it particularly suitable for environmental and agricultural applications [30,31]. In the RF algorithm, each decision tree is trained on a bootstrap sample drawn from the original training dataset. Furthermore, at each node split, a random subset of predictor variables is selected to determine the best split. This randomization introduces diversity among the trees and helps reduce the risk of overfitting. Several hyperparameters affect the performance of the RF model, including the number of trees in the forest (n_estimators), the maximum depth of each tree (max_depth), the number of features considered at each split (max_features), the minimum number of samples required to split a node (min_samples_split), and the minimum number of samples required to be at a leaf node (min_samples_leaf). In this study, these parameters were optimized using grid search and cross-validation techniques to minimize prediction error and enhance model generalization.
Extreme Gradient Boosting (XGB): The XGB is an advanced implementation of gradient boosted decision trees designed for speed and performance [32]. It has gained widespread popularity due to its superior predictive accuracy, regularization capabilities, and computational efficiency. In regression tasks, XGB builds an ensemble of weak prediction models, typically decision trees, in a sequential manner where each new tree attempts to minimize the residuals of the previous ensemble. The XGB algorithm optimizes a regularized objective function that includes both a loss function and a penalty term for model complexity. This regularization mechanism helps prevent overfitting, a common issue in boosting-based algorithms. Additionally, XGB supports parallel processing, missing value handling, and sparse-aware learning, making it well-suited for large and complex datasets. Several hyperparameters influence the model’s performance, including the number of boosting rounds (n_estimators), maximum tree depth (max_depth), learning rate, and regularization parameters. In this study, these parameters were optimized through grid search and k-fold cross-validation based on validation set performance.
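Of the algorithms listed above, KNN regression is compact enough to sketch in a few lines. The example below is a minimal standard-library illustration of the averaging rule with Euclidean distance and uniform weights. The function name, the toy data, and the use of unscaled features are assumptions made for the example; with real predictors, altitude in meters would dominate an unscaled distance, so some form of feature scaling would normally be considered.

```python
from math import dist


def knn_predict(train_x, train_y, query, k=3):
    """Predict a target value as the mean of the k nearest training targets."""
    # Rank training samples by Euclidean distance to the query point.
    ranked = sorted(zip(train_x, train_y), key=lambda pair: dist(pair[0], query))
    nearest = ranked[:k]
    # Uniform weighting: every selected neighbor contributes equally.
    return sum(target for _, target in nearest) / len(nearest)


# Toy two-feature data (hypothetical values, for illustration only).
toy_x = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
toy_y = [1.0, 2.0, 3.0, 10.0]
print(knn_predict(toy_x, toy_y, query=(0.0, 0.0), k=3))
```

In practice k would be chosen by the grid search described above rather than fixed in advance.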

2.2.3. Parallel Hybrid Model

In this study, a parallel hybrid modeling framework was developed to enhance the accuracy of ET0 predictions by integrating outputs from five ML algorithms: ANN, KNN, RF, XGB, and ANFIS. The individual predictions were combined using a weighted averaging ensemble technique, with weights systematically assigned based on each model’s performance metrics [33]. To optimize the final aggregated estimate, various weighting strategies were employed, including inverse error weighting (RMSE-based) and R2-based weighting [34,35,36]. Both methods stand out for their simplicity and low computational cost, and have been successfully applied in various environmental and hydrological prediction studies in the literature.
Inverse Error Weighting: In the inverse error weighting method, each model’s performance error (e.g., RMSE) is inverted and used as its weight. Models with lower errors receive higher weights. The normalized weights are then used to compute the final ensemble prediction according to Equation (2):
$$ET_{0,pre}(t) = \sum_{i=1}^{m} w_i \, ET_{0,i}(t); \qquad w_i = \frac{1/RMSE_i}{\sum_{j=1}^{m} (1/RMSE_j)} \tag{2}$$
R2-Based Weighting: In the second method, the coefficient of determination (R2) of each model is directly used for weighting. As the model’s performance increases, its contribution rate also rises. The weights are normalized and expressed as Equation (3):
$$ET_{0,pre}(t) = \sum_{i=1}^{m} w_i \, ET_{0,i}(t); \qquad w_i = \frac{R_i^2}{\sum_{j=1}^{m} R_j^2} \tag{3}$$
where ET0,pre(t) is the ensemble-predicted reference evapotranspiration (mm day−1) for observation t; ET0,i(t) is the reference evapotranspiration predicted by model i for observation t (mm day−1); m is the number of base models in the ensemble; wi is the normalized weight assigned to model i; i = 1, …, m indexes the base models; and t = 1, …, n indexes the observations.
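The two weighting rules in Equations (2) and (3) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are assumptions, the R2 inputs are the testing-phase values quoted in Section 3.2, and the RMSE inputs and per-model predictions are hypothetical. The resulting numbers are not meant to reproduce Table 6.

```python
def inverse_rmse_weights(rmse_by_model):
    """Normalized weights proportional to 1/RMSE (Equation (2))."""
    inverse = {name: 1.0 / value for name, value in rmse_by_model.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


def r2_weights(r2_by_model):
    """Normalized weights proportional to R2 (Equation (3))."""
    total = sum(r2_by_model.values())
    return {name: value / total for name, value in r2_by_model.items()}


def ensemble_prediction(prediction_by_model, weights):
    """Weighted average of the base-model predictions for one observation."""
    return sum(weights[name] * prediction_by_model[name] for name in weights)


# Testing-phase R2 values as quoted in the text.
test_r2 = {"ANN": 0.882, "KNN": 0.906, "RF": 0.913, "XGB": 0.910, "ANFIS": 0.950}
weights_r2 = r2_weights(test_r2)

# Hypothetical RMSE values (mm/day) for two models, for illustration only.
weights_rmse = inverse_rmse_weights({"model_a": 0.25, "model_b": 0.50})

# Hypothetical base-model predictions (mm/day) for a single observation.
predictions = {"ANN": 4.1, "KNN": 4.3, "RF": 4.2, "XGB": 4.2, "ANFIS": 4.4}
print(round(ensemble_prediction(predictions, weights_r2), 3))
```

Because 1/RMSE varies more strongly between models than R2 does when all R2 values are close to 1, the inverse-error rule tends to spread the weights further apart.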

2.2.4. Leave-One-Station-Out Validation

In the leave-one-station-out validation (cross-validation), meteorological data collected from 19 different stations are partitioned into 19 subsamples, each corresponding to a specific station (Figure 3). All data within a given subsample originate exclusively from the same station. In each iteration, one subsample is held out as the validation set, while the model is trained on the remaining 18 subsamples. This procedure is repeated 19 times, ensuring that each station serves once as the validation set. In this way, every sample is tested independently, and the model’s spatial generalizability is effectively evaluated.
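The partitioning described above amounts to grouping samples by station and holding out one group per iteration. A minimal sketch follows; the function name and the station labels are hypothetical, and model training itself is omitted.

```python
def leave_one_station_out(station_ids):
    """Yield (held_out_station, train_indices, test_indices) for each station."""
    for held_out in sorted(set(station_ids)):
        # All samples from the held-out station form the validation set.
        test_idx = [i for i, s in enumerate(station_ids) if s == held_out]
        # Samples from every other station form the training set.
        train_idx = [i for i, s in enumerate(station_ids) if s != held_out]
        yield held_out, train_idx, test_idx


# Hypothetical station label for each sample in a small dataset.
sample_stations = ["A", "A", "B", "C", "B"]
folds = list(leave_one_station_out(sample_stations))
for station, train_idx, test_idx in folds:
    print(station, train_idx, test_idx)
```

With 19 stations the generator yields 19 folds, and no station ever contributes samples to both the training and validation sets of the same fold.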

2.2.5. Spatial Maps of ET0

The IDW method was employed to create monthly spatial maps of ET0. IDW is a deterministic spatial interpolation method that estimates values at unsampled locations based on the values of nearby measured points, with the assumption that points closer to the target location have a greater influence on the estimated value than those further away. The weight assigned to each known point is inversely proportional to its distance from the location of the estimate and is usually raised to a power parameter [37]. In this study, the monthly ET0 values calculated at 19 meteorological stations were spatially interpolated using the IDW method to produce continuous monthly maps for the entire study area. This approach enables the visualization of the spatial variability of ET0 in regions lacking direct measurements.
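The weighting rule can be sketched for a single target location as follows. The power parameter of 2, the planar Euclidean distance on raw coordinates, and the sample values are assumptions made for the example, since the text does not state the settings used for the maps.

```python
from math import dist


def idw_interpolate(points, values, target, power=2.0):
    """Inverse distance weighted estimate at `target` from known points."""
    weighted_sum = 0.0
    weight_total = 0.0
    for point, value in zip(points, values):
        distance = dist(point, target)
        if distance == 0.0:
            # The target coincides with a known point: return its value directly.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical station coordinates and ET0 values (mm/day), illustration only.
station_xy = [(0.0, 0.0), (2.0, 0.0)]
station_et0 = [2.0, 4.0]
print(idw_interpolate(station_xy, station_et0, target=(1.0, 0.0)))
```

Evaluating this function over a regular grid of target locations produces the continuous surface. A larger power makes the surface follow the nearest station more closely.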

2.2.6. Model Performance Criteria

The performance of the model was quantitatively assessed by statistical measures, namely the coefficient of determination (R2), the mean absolute error (MAE) and the root mean square error (RMSE), which together provide a complementary insight into the accuracy, precision and overall predictive ability.
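$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left(y_{cal,i} - y_{pre,i}\right)^2}{\sum_{i=1}^{n} \left(y_{cal,i} - y_{mean}\right)^2} \tag{4}$$
$$RMSE = \sqrt{\frac{\sum_{i=1}^{n} \left(y_{cal,i} - y_{pre,i}\right)^2}{n}} \tag{5}$$
$$MAE = \frac{\sum_{i=1}^{n} \left|y_{cal,i} - y_{pre,i}\right|}{n} \tag{6}$$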
where ycal,i is the calculated value, ypre,i is the predicted value, ymean is the mean of calculated values, and n is the number of data.
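The three criteria translate directly into code. The sketch below uses only the standard library; the function names and the sample values are hypothetical.

```python
from math import sqrt


def r2_score(y_cal, y_pre):
    """Coefficient of determination between calculated and predicted values."""
    mean_cal = sum(y_cal) / len(y_cal)
    ss_res = sum((c - p) ** 2 for c, p in zip(y_cal, y_pre))
    ss_tot = sum((c - mean_cal) ** 2 for c in y_cal)
    return 1.0 - ss_res / ss_tot


def rmse(y_cal, y_pre):
    """Root mean square error, in the units of the target variable."""
    return sqrt(sum((c - p) ** 2 for c, p in zip(y_cal, y_pre)) / len(y_cal))


def mae(y_cal, y_pre):
    """Mean absolute error, in the units of the target variable."""
    return sum(abs(c - p) for c, p in zip(y_cal, y_pre)) / len(y_cal)


# Hypothetical calculated and predicted ET0 values (mm/day).
calculated = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 6.0]
print(r2_score(calculated, predicted), rmse(calculated, predicted), mae(calculated, predicted))
```

RMSE penalizes the single large error in this sample more heavily than MAE does, which is why the two are reported together.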

3. Results

3.1. Evaluation of FAO-56 Penman–Monteith Method

The FAO-56 Penman–Monteith analysis revealed clear seasonal and spatial variability in daily ET0 across Paraguay (Table 2). ET0 values were highest during the austral summer months (December–February), often exceeding 5.0 mm day−1 at stations such as Pozo Colorado (5.37 mm day−1 in January) and General Bruguez (5.08 mm day−1 in January), and lowest during winter (June–July), dropping below 2.0 mm day−1 at stations like Encarnación (1.33 mm day−1 in June) and Villarrica (1.50 mm day−1 in June). Stations in the Chaco region of western Paraguay, including Pozo Colorado, Mariscal Estigarribia, and General Bruguez, consistently exhibited higher ET0 throughout the year, reflecting the region’s semi-arid climate with high temperatures, low humidity, and reduced cloud cover. In contrast, stations in the eastern and southern regions, such as Encarnación, Villarrica, and Caazapá, showed more pronounced seasonal fluctuations, with lower ET0 during winter months due to cooler temperatures and higher atmospheric moisture. Transitional months (April–May and September–October) displayed intermediate ET0 values across most stations, corresponding to the seasonal shift between summer and winter. These results indicate that ET0 in Paraguay is primarily controlled by seasonal climatic dynamics, with regional differences, particularly the elevated evaporative demand in the Chaco, strongly influencing its spatial distribution.

3.2. Evaluation of ML Algorithms

Data pre-processing constitutes an essential preliminary stage in the development of robust and reliable ML architectures. In this phase, the raw dataset undergoes systematic refinement to mitigate potential biases due to stochastic noise, incomplete observations, and structural inconsistencies. The implemented workflow included a number of well-established but methodologically critical procedures, including comprehensive data cleaning, variable transformation, and stratified partitioning to ensure a balanced representation of the subsets. For model calibration, 90% of the training data was used and hyperparameter tuning was performed on a randomly selected 90% subset of this training pool. This process was embedded in a ten-fold cross-validation scheme, ensuring statistical rigor, minimizing the risk of overfitting, and improving the external validity of the model’s performance metrics. The resulting optimal hyperparameter configurations for each algorithm are listed in Table 3.
The performance metrics for the evaluated ML algorithms are summarized in Table 4 for comparative assessment. At this stage of analysis, only geo-temporal predictors (latitude, longitude, altitude, and month number) were incorporated into the models. During the model development process, a range of ANN architectures was explored, including both single- and double-hidden-layer configurations with different neuron counts. The architecture that achieved the most favorable trade-off between training accuracy and generalization capacity consisted of a single hidden layer with five neurons. This network was trained for 300 epochs using the Levenberg–Marquardt (LM) optimization algorithm, applying the tansig activation function in the hidden layer and the purelin function in the output layer. While this design produced satisfactory results during training, its predictive accuracy in the testing phase was the lowest among all evaluated models (R2 = 0.882), likely due to its limited representational capacity for capturing complex, nonlinear relationships. The KNN model, configured with three neighbors, uniform weighting, and the Euclidean distance metric, demonstrated strong performance despite its simplicity, achieving R2 = 0.906 and RMSE = 0.394 mm day−1, values comparable to those obtained by the RF and XGB models. The RF model employed 173 trees with a maximum depth of 7, a sqrt setting for the maximum number of features, and a minimum leaf size of 2. This configuration ranked among the top performers in the testing phase, delivering R2 = 0.913 and RMSE = 0.379 mm day−1. The XGB model, designed with a relatively low learning rate (0.159) and 135 trees, facilitated a more gradual and balanced learning process. Restricting the maximum depth to 3 helped mitigate overfitting, resulting in high predictive accuracy (R2 = 0.910) and a low error rate (RMSE = 0.387 mm day−1).
The influence of input features on model predictions was evaluated using SHapley Additive exPlanations (SHAP), a game-theoretic approach that provides consistent and locally accurate attribution values for each predictor. This method enables a detailed interpretation of feature contributions across different ML models, thereby improving the transparency and explainability of the predictive framework adopted in this study. Figure 4 presents the mean absolute SHAP values for four input variables (latitude, longitude, altitude, and month) across the RF, XGB, ANN, and KNN models. The results reveal that the “month” variable exerts the greatest influence in all models (≈1.0), underscoring the dominant role of seasonal variability in explaining ET0. The variables (“latitude” and “longitude”) also display appreciable importance, particularly in the ANN model, where their contributions (0.18 and 0.16, respectively) are markedly higher compared with the other algorithms. This finding suggests that spatial positioning has a more pronounced impact on ANN-based predictions. Conversely, “altitude” consistently emerges as the least influential factor, with SHAP values between 0.04 and 0.12. Overall, these results indicate that temporal variability serves as the primary driver of model performance, while spatial factors—especially “latitude” and “longitude”—provide complementary explanatory power, and “altitude” contributes only marginally.
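The attribution idea behind SHAP can be illustrated with exact Shapley values computed against a single baseline input. This is a simplified sketch: the SHAP library averages over a background dataset and uses model-specific approximations, whereas the code below enumerates every feature subset, which is only practical for a handful of features such as the four used here. The toy predictor, its coefficients, and the instance and baseline values are all hypothetical.

```python
from itertools import combinations
from math import cos, factorial, pi


def shapley_values(predict, instance, baseline):
    """Exact Shapley attributions of predict(instance) relative to predict(baseline)."""
    n = len(instance)

    def value(subset):
        # Features in `subset` take the instance value; the rest stay at the baseline.
        mixed = [instance[j] if j in subset else baseline[j] for j in range(n)]
        return predict(mixed)

    attributions = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                total += weight * (value(members | {i}) - value(members))
        attributions.append(total)
    return attributions


def toy_model(features):
    """Hypothetical ET0 predictor; longitude is deliberately ignored."""
    lat, lon, alt, month = features
    seasonal = 1.5 * cos(2.0 * pi * (month - 1) / 12.0)
    return 3.5 + seasonal - 0.05 * (lat + 23.0) - 0.0005 * alt


instance = [-22.0, -60.0, 150.0, 1]   # latitude, longitude, altitude (m), month
baseline = [-25.0, -56.0, 100.0, 7]
phi = shapley_values(toy_model, instance, baseline)
print([round(p, 3) for p in phi])
```

The attributions sum to the difference between the prediction for the instance and the prediction for the baseline. Averaging their absolute values over many instances gives a mean absolute importance of the kind plotted in Figure 4.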
ANFIS models were developed and evaluated using various types of membership functions, including triangular (trimf), trapezoidal (trapmf), and Gaussian (gaussmf). As presented in Table 5, the ANFIS configuration employing the gaussmf membership function achieved superior performance, yielding the lowest error (RMSE = 0.289 mm day−1) and the highest coefficient of determination (R2 = 0.950) across both training and testing datasets. The trimf function also demonstrated strong predictive capability, although slightly below that of gaussmf, whereas the trapmf function exhibited comparatively lower accuracy. These findings indicate that the gaussmf is particularly effective at capturing the nonlinear dependencies inherent in the data, highlighting the suitability of ANFIS for ET0 prediction. The principal advantage of ANFIS lies in its capacity for adaptive learning and its interpretable structure grounded in fuzzy logic, which enables both flexibility in modeling complex relationships and transparency in the decision-making process.
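The three membership function shapes compared above can be written compactly. The sketch below follows the common definitions associated with the names trimf, trapmf, and gaussmf; the parameter order and the sample parameter values are assumptions made for the example, and the breakpoints are assumed to be strictly increasing.

```python
from math import exp


def gaussmf(x, sigma, c):
    """Gaussian membership: 1 at the center c, decaying smoothly with width sigma."""
    return exp(-((x - c) ** 2) / (2.0 * sigma ** 2))


def trimf(x, a, b, c):
    """Triangular membership with feet at a and c and peak at b (a < b < c)."""
    return max(min((x - a) / (b - a), (c - x) / (c - b)), 0.0)


def trapmf(x, a, b, c, d):
    """Trapezoidal membership with feet at a and d and plateau on [b, c]."""
    return max(min((x - a) / (b - a), 1.0, (d - x) / (d - c)), 0.0)


# Degree to which month 3 belongs to a hypothetical "summer" fuzzy set
# centered on January (month 1) under each shape.
print(gaussmf(3, sigma=2.0, c=1))
print(trimf(3, a=-3, b=1, c=5))
print(trapmf(3, a=-3, b=0, c=2, d=5))
```

Unlike the piecewise-linear shapes, the Gaussian function is smooth and nonzero everywhere, which is one plausible reason it suits gradient-based tuning of the premise parameters.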
The comparative scatterplots of the calculated versus predicted ET0 for each model are illustrated in Figure 5, providing a visual representation of the prediction accuracy across models. It is evident from both the scatterplots and the performance metrics that all models captured the general trends of ET0 reasonably well, yet distinct differences in predictive precision are observed. ANFIS demonstrated the best predictive performance among all evaluated models, achieving the lowest RMSE (0.289 mm day−1) and the highest R2 (0.950) on the testing dataset. This indicates a strong capability to capture complex nonlinear relationships in ET0 dynamics. While the RF and XGB models also provided strong accuracy (R2 ≈ 0.910–0.915, RMSE ≈ 0.380–0.390 mm day−1), their performance was slightly lower than that of ANFIS. KNN offered competitive results given its simplicity, whereas the shallow MLP model underperformed, reflecting its limited capacity to model intricate patterns in the data. Overall, these results highlight the effectiveness of combining fuzzy inference with adaptive learning in ANFIS for high-precision ET0 prediction, while also emphasizing the trade-offs between interpretability, computational efficiency, and predictive accuracy among different ML approaches.

3.3. Enhancing Model Performance with Ensemble Methods

In this stage, a parallel hybrid modeling framework was implemented to enhance the predictive accuracy of the model outputs. Ensemble models were constructed using weighting strategies based on the R2 and the inverse of the RMSE. The corresponding weight coefficients assigned to each model are presented in Table 6. Under the R2-based weighting scheme, the weights were relatively balanced across models, ranging between 0.193 and 0.208 in the testing phase. Notably, ANFIS received the highest weight (0.208), followed closely by KNN (0.203), RF (0.200), XGB (0.199), and ANN (0.193) indicating that all models contributed almost equally to the ensemble. This balanced distribution suggests that each algorithm demonstrated comparable predictive capacity when assessed using R2 as the weighting criterion. In contrast, the inverse RMSE-based scheme resulted in a more pronounced variation in weight allocation. ANFIS achieved the highest testing-phase weight (0.257), substantially exceeding its allocation in the R2-based scheme. It was followed by RF (0.196), XGB (0.192), and KNN (0.188), while ANN consistently received the lowest weights (0.167), reflecting its weaker predictive performance. The markedly higher share of ANFIS under the inverse RMSE criterion highlights its superior capacity to capture underlying patterns in the data, thereby making a dominant contribution to the ensemble. These differences underscore that the weighting strategy plays a pivotal role in determining the relative contributions of individual models and can substantially influence the ensemble’s sensitivity to specific data characteristics.
The inverse RMSE-based weighting scheme achieved marginally better performance than the R2-based weighting in both training and testing phases. Although the improvements are relatively small, they indicate that assigning higher weights to models with lower prediction errors (as per inverse RMSE) slightly enhances the ensemble’s generalization ability. The consistently high R2 values (>0.92) for both schemes confirm the robustness of the ensemble modeling framework (Figure 6). Notably, ANFIS emerged as the dominant contributor under the inverse RMSE-based weighting scheme, receiving a substantially larger weight allocation compared to other algorithms. This suggests that ANFIS was more effective in capturing complex, nonlinear patterns in the data, which likely played a key role in the improved accuracy of the ensemble. By contrast, ANN consistently received the lowest weights in both schemes, reflecting its comparatively weaker predictive performance.

3.4. Evaluation of Leave-One-Station-Out Validation

The leave-one-station-out validation showed clear differences in prediction performance between the evaluated models (Table A1). The tree-based ensemble approaches (RF and XGB), together with ANFIS, performed better than ANN at most stations. At the Mariscal Estigarribia station, for example, RF and XGB achieved R2 values of about 0.95 with a lower RMSE (~0.29–0.30 mm day−1), while ANN yielded weaker results (R2 = 0.913). The performance patterns varied slightly between stations. At the Puerto Casado station, ANN showed relatively strong generalization (R2 = 0.953 during testing), but ANFIS maintained more stable accuracy during both training and testing. At the Pedro Juan Caballero and Pozo Colorado stations, KNN and RF showed excellent predictive power, with KNN at the Pedro Juan Caballero station achieving an R2 of 0.965 during testing. ANFIS often provided the highest accuracy under nonlinear conditions, as observed at the Concepción and Encarnación stations, where R2 was at or above 0.97. ANN generally had the lowest predictive skill, with higher errors observed especially at the Pilar station (R2 = 0.621 during testing). In contrast, XGB often provided robust and stable predictions, especially at the San Estanislao (R2 = 0.950) and Paraguarí (R2 = 0.963) stations. RF was also among the best models in several cases, such as at the General Bruguéz (R2 = 0.965) and Caazapá (R2 = 0.960) stations. Overall, the results show that ensemble methods (RF and XGB) and neuro-fuzzy models (ANFIS) have superior predictive ability compared to ANN in leave-one-station-out validation, confirming their suitability for capturing the complex and nonlinear relationships in station-based climate data.
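The validation loop itself can be sketched in a few lines. The example below uses the monthly ET0 values of three stations from Table 2 and, purely as a stand-in, predicts the held-out station from the month-wise mean of the remaining stations; this baseline is hypothetical and is not one of the five models evaluated in the study. R2 is computed here as 1 − SSE/SST, which is also an assumption about the metric definition.

```python
import math

# Monthly mean ET0 (mm/day), Jan-Dec, copied from Table 2 for three stations.
ET0 = {
    "Concepción": [4.07, 3.85, 3.38, 2.79, 2.00, 1.62,
                   1.66, 2.07, 2.69, 3.41, 3.79, 3.96],
    "Mariscal Estigarribia": [5.21, 4.77, 3.82, 3.09, 1.88, 1.50,
                              1.67, 2.24, 2.99, 3.93, 4.59, 4.86],
    "Encarnación": [4.73, 4.68, 3.10, 2.93, 1.81, 1.33,
                    1.37, 2.01, 2.70, 3.18, 3.55, 5.09],
}


def leave_one_station_out(data):
    """Yield (held-out station, training data, held-out observations)."""
    for held_out in data:
        train = {name: series for name, series in data.items() if name != held_out}
        yield held_out, train, data[held_out]


def monthly_mean_baseline(train):
    """Hypothetical placeholder model: month-wise mean over training stations."""
    n = len(train)
    return [sum(series[m] for series in train.values()) / n for m in range(12)]


def scores(observed, predicted):
    """Return (RMSE, MAE, R2) for one held-out station."""
    n = len(observed)
    errors = [o - p for o, p in zip(observed, predicted)]
    sse = sum(e * e for e in errors)
    rmse = math.sqrt(sse / n)
    mae = sum(abs(e) for e in errors) / n
    mean_obs = sum(observed) / n
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return rmse, mae, 1.0 - sse / sst


results = {}
for station, train, observed in leave_one_station_out(ET0):
    results[station] = scores(observed, monthly_mean_baseline(train))
```

Each station is scored only by a model that never saw its data, so the per-station metrics measure spatial generalization rather than in-sample fit.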

3.5. Generation of Spatial Distribution Maps

Monthly spatial distribution maps of ET0 in Paraguay were generated using the IDW interpolation method, based on quality-controlled and validated station observations (Figure 7). These maps provide a detailed assessment of the geographic and seasonal variability of ET0 across the country. Results revealed a distinct annual cycle, with pronounced spatial heterogeneity influenced by topography, latitude, and seasonal climatic conditions.
The spatial and temporal evolution of ET0 across Paraguay reveals a pronounced seasonal cycle. During the summer months (January–March), ET0 reached its highest levels, with values typically exceeding 4.5 mm day−1 and persistent hotspots in the northeastern and southeastern regions. From April to June, a steady decline was observed, culminating in the annual minimum in June when ET0 ranged from 1.5 to 2 mm day−1, particularly in the western Chaco. Beginning in July, ET0 gradually increased, with averages rising from around 2.5 mm day−1 in July to nearly 4 mm day−1 by September, reflecting the transition into spring. The late spring and summer months (October–December) were characterized by a rapid recovery, with widespread values again surpassing 4.5 mm day−1 and clear maxima in the northeastern and southeastern zones. Throughout the year, spatial patterns remained consistent, with the Chaco region generally exhibiting lower values and the eastern lowlands sustaining the highest atmospheric water demand. These findings emphasize the strong climatic control on ET0 seasonality and the importance of accounting for regional variability in water resource management and agricultural planning.
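The IDW step behind these maps can be sketched as follows. The example interpolates January ET0 from three stations (coordinates from Table 1, values from Table 2). The distance exponent of 2 and the use of plain Euclidean distance in decimal degrees are simplifying assumptions made for illustration; the actual maps were produced in QGIS, and its settings are not reproduced here.

```python
import math


def dm_to_decimal(degrees, minutes):
    """Convert degrees/minutes south or west to negative decimal degrees."""
    return -(degrees + minutes / 60.0)


# (longitude, latitude, January ET0 in mm/day), from Tables 1 and 2.
STATIONS = [
    (dm_to_decimal(57, 18), dm_to_decimal(23, 25), 4.07),  # Concepción
    (dm_to_decimal(60, 37), dm_to_decimal(22, 2), 5.21),   # Mariscal Estigarribia
    (dm_to_decimal(55, 50), dm_to_decimal(27, 20), 4.73),  # Encarnación
]


def idw(lon, lat, stations, power=2.0):
    """Inverse distance weighted estimate at (lon, lat).

    `power` is an assumed exponent; distances are in decimal degrees.
    """
    numerator = 0.0
    denominator = 0.0
    for s_lon, s_lat, value in stations:
        distance = math.hypot(lon - s_lon, lat - s_lat)
        if distance == 0.0:
            return value  # the query point coincides with a station
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


# Estimate at a hypothetical grid node (58.0° W, 24.0° S).
grid_value = idw(-58.0, -24.0, STATIONS)
```

An IDW estimate is a convex combination of the station values, so it always lies between the smallest and largest observed value; hotspots on the maps therefore coincide with stations that report high ET0.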

4. Discussion

The comparative evaluation of ML and neuro-fuzzy inference systems in this study revealed substantial differences in predictive capability for estimating ET0 using only geo-temporal predictors (latitude, longitude, altitude, and month number). While all models were able to capture the general seasonal dynamics of ET0, their precision and robustness varied notably, reflecting differences in their ability to model the nonlinear and spatially heterogeneous nature of atmospheric water demand in Paraguay. These findings are broadly consistent with previous efforts to estimate ET0 from limited or purely geo-temporal predictors [12], but the present study advances this line of research by providing a systematic, Paraguay-wide evaluation across 19 stations with monthly mean data.
The ANFIS configured with a Gaussian membership function consistently outperformed all other models, achieving the lowest RMSE (0.289 mm day−1) and highest R2 (0.950) during the testing phase. This superior performance is attributable to ANFIS’s hybrid architecture, which combines the adaptive learning capabilities of neural networks with the interpretability and flexibility of fuzzy logic. Previous studies have also highlighted the effectiveness of Gaussian-shaped membership functions for hydrometeorological modeling [24,38,39], and our results extend this evidence by demonstrating that such functions remain highly effective even under minimal predictor inputs restricted to latitude, longitude, altitude, and month.
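For readers unfamiliar with the three membership-function shapes compared in Table 5 (trimf, trapmf, and gaussmf), a minimal sketch is given below. The parameter names follow common fuzzy-logic toolbox conventions and the numeric values in the usage lines are hypothetical; the parameters fitted in this study are not reported here.

```python
import math


def gaussmf(x, center, sigma):
    """Gaussian membership: smooth, equals 1 at the center."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)


def trimf(x, a, b, c):
    """Triangular membership with feet at a and c and peak at b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def trapmf(x, a, b, c, d):
    """Trapezoidal membership with feet at a, d and plateau [b, c] (a < b <= c < d)."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


# Hypothetical example: degree of membership of month 3 in a fuzzy set
# centered on month 1 with a spread of 2 months.
mu_gauss = gaussmf(3, center=1, sigma=2)
mu_tri = trimf(3, a=-3, b=1, c=5)
mu_trap = trapmf(3, a=-3, b=0, c=2, d=5)
```

The Gaussian shape is differentiable everywhere and never drops exactly to zero, which is one commonly cited reason why it tends to train smoothly in gradient-based ANFIS fitting.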
Tree-based ensemble methods, particularly RF and XGB, provided highly competitive results (R2 ≈ 0.910–0.915; RMSE ≈ 0.380 mm day−1), with superior stability across stations compared to ANFIS. The strong generalization capacity of RF and XGB can be attributed to their ability to capture complex nonlinear interactions while controlling overfitting through ensemble averaging and regularization, respectively. Their robustness in the leave-one-station-out validation indicates that such methods are more resilient to the spatial variability and data sparsity typical of meteorological networks, a finding consistent with previous work on climate variable interpolation and prediction [40,41]. Our contribution includes the use of leave-one-station-out validation, allowing spatial generalizability to be explicitly assessed in a data-scarce national context, which adds value beyond the approaches typically used in earlier studies.
The KNN model achieved commendable accuracy given its algorithmic simplicity (R2 = 0.906; RMSE = 0.394 mm day−1) and produced exceptionally high performance at certain stations (e.g., R2 > 0.96 at Pedro Juan Caballero). Nonetheless, its station-level variability in leave-one-station-out tests underscores its reliance on local data density and its sensitivity to spatial clustering effects. By contrast, the ANN, constrained by a shallow architecture, exhibited the weakest predictive performance overall (R2 = 0.882), reaffirming that insufficient network depth and complexity limit the model’s capacity to approximate the highly nonlinear processes driving ET0 variation. The sensitivity of KNN to local data density and the underperformance of a shallow ANN are consistent with earlier reports on the limitations of distance-based learners and under-parameterized neural networks in hydrological modeling tasks [42,43]. This comparison underlines the importance of selecting models with sufficient complexity and addressing spatial clustering effects when applying ML to ET0 prediction.
The ensemble modeling framework further enhanced predictive accuracy, with the inverse RMSE-based weighting scheme yielding slightly better performance than the R2-based scheme (R2 = 0.925 vs. 0.923 in testing). The increased weight assigned to ANFIS under the inverse RMSE criterion likely contributed to this improvement, as the scheme prioritized models with lower prediction errors. This finding is consistent with ensemble learning theory, which emphasizes the benefits of weighting base learners according to performance-related criteria [34,35,36]. Although the gains were modest, the consistently high R2 values (>0.92) across weighting schemes demonstrate the robustness of the parallel hybrid framework.
From a climatological perspective, the spatial distribution maps of ET0 revealed a clear and recurrent annual cycle, with maxima in summer (January–March) and minima in winter (May–July), modulated by Paraguay's latitudinal extent, topographic variation, and seasonal shifts in temperature and solar radiation. Persistent hotspots in the northeast and southeast highlight regions of elevated atmospheric water demand, which may correspond to agricultural zones with high evapotranspiration losses. These patterns are in agreement with FAO-56 Penman–Monteith-based climatologies for similar subtropical regions in South America [44,45]. By explicitly mapping these dynamics with ML-derived estimates, our study extends earlier climatological assessments and provides a spatially detailed reference for irrigation planning in Paraguay.
It should be acknowledged that the models developed in this study were calibrated and validated using ET0 values derived from the FAO-56 Penman–Monteith equation at 19 meteorological stations in Paraguay. Therefore, the ML models do not estimate ET0 from physical principles directly, but rather approximate the Penman–Monteith outputs through geo-temporal predictors. While this approach is contingent on the spatial patterns of pre-calculated ET0, it offers a data-efficient surrogate framework that can be particularly useful in regions where the full set of meteorological inputs required for Penman–Monteith is unavailable. In this sense, our methodology complements rather than replaces physically based methods, and its primary value lies in extending ET0 estimation to data-scarce environments where conventional computation is impractical.
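To make concrete what the ML models are approximating, the sketch below evaluates the widely used daily grass-reference form of the FAO-56 Penman–Monteith equation. This form is shown as an assumption: the abbreviation list of this manuscript suggests the resistance-based formulation, and the exact variant used to compute the target values is not restated here. The symbols `t_mean` (mean air temperature, °C) and `u2` (wind speed at 2 m, m s−1) and all input values are illustrative only.

```python
def fao56_et0(delta, rn, g, gamma, t_mean, u2, es, ea):
    """Daily reference evapotranspiration (mm/day), FAO-56 grass-reference form.

    delta  : slope of the saturation vapor pressure curve (kPa/°C)
    rn, g  : net radiation and soil heat flux (MJ m-2 day-1)
    gamma  : psychrometric constant (kPa/°C)
    t_mean : mean daily air temperature (°C)
    u2     : wind speed at 2 m height (m/s)
    es, ea : saturation and actual vapor pressure (kPa)
    """
    radiation_term = 0.408 * delta * (rn - g)
    aerodynamic_term = gamma * (900.0 / (t_mean + 273.0)) * u2 * (es - ea)
    return (radiation_term + aerodynamic_term) / (delta + gamma * (1.0 + 0.34 * u2))


# Illustrative inputs for a single mild summer day (not taken from this study).
et0 = fao56_et0(delta=0.122, rn=13.28, g=0.0, gamma=0.0666,
                t_mean=16.9, u2=2.078, es=1.997, ea=1.409)
```

Every argument of this function requires a measured or derived meteorological quantity, which is exactly the data burden that the geo-temporal surrogate avoids at prediction time.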
Regarding transferability, direct application of these models to other regions without recalibration may lead to biased or unreliable estimates, as also reported in previous ET0 modeling studies that emphasize the need for local adjustment [24,46]. To ensure robust performance in different environments, the framework should be recalibrated using regionally derived Penman–Monteith (or equivalent) ET0 values. Nevertheless, the conceptual framework is highly adaptable and can be extended to diverse climatic zones.
Finally, we note that the present models are intended for estimating ET0 (mm day−1) under existing climatic conditions. Prediction of future ET0 would require the inclusion of time-lagged predictors and potentially dynamic climate projections, as highlighted in previous studies exploring ET0 forecasting under climate change scenarios [47,48]. Incorporating such approaches represents a promising direction for future research and could further extend the applicability of data-driven ET0 models in agricultural and water resource management.
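As a rough illustration of what time-lagged predictors would look like, the sketch below turns a monthly ET0 series into (previous values, next value) pairs that a forecasting model could be trained on. The helper name and the choice of three lags are hypothetical; the series is the Concepción row of Table 2, used only as sample data.

```python
def make_lagged_samples(series, n_lags):
    """Pair each value with the n_lags values that precede it."""
    samples = []
    for t in range(n_lags, len(series)):
        samples.append((series[t - n_lags:t], series[t]))
    return samples


# Monthly mean ET0 (mm/day), Jan-Dec, for Concepción (Table 2).
concepcion = [4.07, 3.85, 3.38, 2.79, 2.00, 1.62,
              1.66, 2.07, 2.69, 3.41, 3.79, 3.96]

samples = make_lagged_samples(concepcion, n_lags=3)
first_inputs, first_target = samples[0]
```

With a single climatological year this yields only nine samples, so a real forecasting study would need multi-year series in addition to the lag construction.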

5. Conclusions

This study demonstrated the effectiveness of ML algorithms and neuro-fuzzy inference systems for estimating ET0 using only geo-temporal predictors such as latitude, longitude, altitude, and month. Among the evaluated approaches, ANFIS with Gaussian membership functions consistently achieved the highest accuracy, while ensemble-based models such as RF and XGB exhibited strong robustness and stability across stations. Furthermore, ensemble integration enhanced predictive skill, with inverse RMSE-based weighting performing slightly better than R2-based weighting. Spatial ET0 distribution maps generated through IDW interpolation revealed pronounced seasonal variability, with higher evaporative demand in the Chaco region and lower values in eastern Paraguay, particularly during winter months. These findings confirm that data-efficient modeling strategies can serve as practical and cost-effective alternatives in data-scarce environments, supporting irrigation scheduling and agricultural planning.
Looking ahead, several promising research directions could further advance the applicability of the proposed framework. Incorporating additional climatic variables (e.g., temperature, solar radiation, humidity, and wind speed) would improve model sensitivity and capture short-term fluctuations. The adoption of advanced deep learning architectures such as Long Short-Term Memory (LSTM) networks and Convolutional Neural Networks (CNNs) may also enhance the ability to learn temporal and spatial dependencies, especially for forecasting tasks. In addition, integrating remotely sensed products (e.g., MODIS, Landsat, or ERA5 reanalysis) could enable scalable, spatially explicit ET0 estimation in regions with sparse meteorological coverage. Finally, extending the methodology to other climatic zones and linking ET0 predictions with crop growth models and irrigation management systems will strengthen its relevance for sustainable agriculture and water resource management under diverse and changing environmental conditions.

Author Contributions

Conceptualization, B.C. and M.G.F.O.; methodology, B.C. and E.K.; software, B.C. and E.K.; validation, E.K. and H.S.; formal analysis, B.C. and E.K.; investigation, B.C. and M.G.F.O.; resources, M.G.F.O.; data curation, B.C. and E.K.; writing—original draft preparation, B.C. and M.G.F.O.; writing—review and editing, E.K. and H.S.; visualization, E.K. and H.S.; supervision, B.C. and H.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Acknowledgments

The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ET0: Reference evapotranspiration (mm day−1)
ANN: Artificial Neural Network
KNN: k-Nearest Neighbors
RF: Random Forest
XGB: Extreme Gradient Boosting
ANFIS: Adaptive Neuro-Fuzzy Inference System
IDW: Inverse Distance Weighting
ML: Machine Learning
MAE: Mean Absolute Error
RMSE: Root Mean Square Error
R2: Coefficient of determination
Physical Variables
Rn: Net radiation flux density at the surface (MJ m−2 day−1)
G: Soil heat flux density (MJ m−2 day−1)
es: Saturation vapor pressure (kPa)
ea: Actual vapor pressure (kPa)
Δ: Slope of saturation vapor pressure–temperature curve (kPa °C−1)
γ: Psychrometric constant (kPa °C−1)
ra: Aerodynamic resistance (s m−1)
rs: Surface resistance (s m−1)
ρa: Air density (kg m−3)
cp: Specific heat of moist air at constant pressure (MJ kg−1 °C−1)
ρw: Density of liquid water (kg m−3)
λ: Latent heat of vaporization (MJ kg−1)

Appendix A

Table A1. Leave-one-out station test results.
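Station Name | Model | Training RMSE | Training MAE | Training R2 | Testing RMSE | Testing MAE | Testing R2
General Bruguéz | RF | 0.270 | 0.193 | 0.952 | 0.247 | 0.177 | 0.965
General Bruguéz | KNN | 0.321 | 0.210 | 0.932 | 0.334 | 0.289 | 0.935
General Bruguéz | ANN | 0.460 | 0.314 | 0.861 | 0.308 | 0.231 | 0.954
General Bruguéz | XGB | 0.270 | 0.193 | 0.952 | 0.236 | 0.167 | 0.968
General Bruguéz | ANFIS (trapmf) | 0.222 | 0.278 | 0.928 | 0.196 | 0.242 | 0.922
San Estanislao | RF | 0.386 | 0.292 | 0.902 | 0.201 | 0.262 | 0.930
San Estanislao | KNN | 0.372 | 0.283 | 0.909 | 0.244 | 0.309 | 0.892
San Estanislao | ANN | 0.430 | 0.300 | 0.879 | 0.304 | 0.420 | 0.886
San Estanislao | XGB | 0.298 | 0.228 | 0.941 | 0.145 | 0.190 | 0.950
San Estanislao | ANFIS (gauss) | 0.231 | 0.290 | 0.925 | 0.164 | 0.203 | 0.934
Salto del Guairá | RF | 0.369 | 0.255 | 0.871 | 0.282 | 0.172 | 0.863
Salto del Guairá | KNN | 0.370 | 0.256 | 0.872 | 0.303 | 0.245 | 0.803
Salto del Guairá | ANN | 0.526 | 0.400 | 0.754 | 0.403 | 0.314 | 0.703
Salto del Guairá | XGB | 0.370 | 0.255 | 0.872 | 0.265 | 0.173 | 0.855
Salto del Guairá | ANFIS (gauss) | 0.382 | 0.286 | 0.864 | 0.281 | 0.223 | 0.830
Aeropuerto Silvio Pettirossi | RF | 0.265 | 0.188 | 0.953 | 0.248 | 0.192 | 0.965
Aeropuerto Silvio Pettirossi | KNN | 0.265 | 0.188 | 0.954 | 0.300 | 0.242 | 0.948
Aeropuerto Silvio Pettirossi | ANN | 0.458 | 0.321 | 0.861 | 0.314 | 0.251 | 0.943
Aeropuerto Silvio Pettirossi | XGB | 0.265 | 0.188 | 0.954 | 0.231 | 0.175 | 0.969
Aeropuerto Silvio Pettirossi | ANFIS (gauss) | 0.296 | 0.218 | 0.942 | 0.254 | 0.200 | 0.965
Paraguarí | RF | 0.278 | 0.201 | 0.949 | 0.249 | 0.196 | 0.963
Paraguarí | KNN | 0.264 | 0.188 | 0.954 | 0.315 | 0.246 | 0.941
Paraguarí | ANN | 0.483 | 0.336 | 0.846 | 0.284 | 0.239 | 0.952
Paraguarí | XGB | 0.264 | 0.188 | 0.954 | 0.248 | 0.196 | 0.963
Paraguarí | ANFIS (gauss) | 0.295 | 0.217 | 0.942 | 0.279 | 0.219 | 0.959
Capitán Meza | RF | 0.265 | 0.189 | 0.954 | 0.228 | 0.177 | 0.968
Capitán Meza | KNN | 0.275 | 0.191 | 0.950 | 0.298 | 0.243 | 0.946
Capitán Meza | ANN | 0.438 | 0.312 | 0.873 | 0.534 | 0.446 | 0.827
Capitán Meza | XGB | 0.265 | 0.188 | 0.954 | 0.227 | 0.175 | 0.969
Capitán Meza | ANFIS (trimf) | 0.227 | 0.284 | 0.925 | 0.153 | 0.262 | 0.936
Concepción | RF | 0.487 | 0.373 | 0.851 | 0.491 | 0.411 | 0.915
Concepción | KNN | 0.475 | 0.374 | 0.921 | 0.389 | 0.362 | 0.975
Concepción | ANN | 0.385 | 0.274 | 0.906 | 0.404 | 0.347 | 0.864
Concepción | XGB | 0.400 | 0.328 | 0.951 | 0.401 | 0.382 | 0.982
Concepción | ANFIS (gauss) | 0.300 | 0.223 | 0.942 | 0.183 | 0.154 | 0.981
Mariscal Estigarribia | RF | 0.305 | 0.224 | 0.939 | 0.290 | 0.228 | 0.951
Mariscal Estigarribia | KNN | 0.273 | 0.188 | 0.951 | 0.294 | 0.234 | 0.950
Mariscal Estigarribia | ANN | 0.524 | 0.387 | 0.818 | 0.387 | 0.315 | 0.913
Mariscal Estigarribia | XGB | 0.274 | 0.198 | 0.950 | 0.296 | 0.229 | 0.949
Mariscal Estigarribia | ANFIS (gauss) | 0.195 | 0.238 | 0.944 | 0.258 | 0.327 | 0.938
Encarnación | RF | 0.266 | 0.190 | 0.953 | 0.205 | 0.146 | 0.976
Encarnación | KNN | 0.276 | 0.193 | 0.950 | 0.332 | 0.284 | 0.938
Encarnación | ANN | 0.471 | 0.328 | 0.853 | 0.306 | 0.257 | 0.947
Encarnación | XGB | 0.266 | 0.190 | 0.953 | 0.192 | 0.139 | 0.979
Encarnación | ANFIS (gauss) | 0.297 | 0.219 | 0.942 | 0.321 | 0.275 | 0.970
Coronel Oviedo | RF | 0.278 | 0.201 | 0.949 | 0.269 | 0.190 | 0.956
Coronel Oviedo | KNN | 0.264 | 0.188 | 0.954 | 0.238 | 0.174 | 0.966
Coronel Oviedo | ANN | 0.472 | 0.330 | 0.853 | 0.329 | 0.271 | 0.935
Coronel Oviedo | XGB | 0.264 | 0.188 | 0.954 | 0.255 | 0.180 | 0.961
Coronel Oviedo | ANFIS (trimf) | 0.320 | 0.243 | 0.932 | 0.332 | 0.266 | 0.949
Pilar | RF | 0.266 | 0.189 | 0.953 | 0.304 | 0.239 | 0.941
Pilar | KNN | 0.302 | 0.195 | 0.940 | 0.450 | 0.349 | 0.870
Pilar | ANN | 0.577 | 0.424 | 0.781 | 0.651 | 0.583 | 0.621
Pilar | XGB | 0.280 | 0.203 | 0.948 | 0.217 | 0.171 | 0.970
Pilar | ANFIS (trapmf) | 0.382 | 0.306 | 0.904 | 0.356 | 0.293 | 0.926
San Juan Bautista | RF | 0.262 | 0.186 | 0.955 | 0.331 | 0.236 | 0.929
San Juan Bautista | KNN | 0.296 | 0.221 | 0.942 | 0.316 | 0.240 | 0.936
San Juan Bautista | ANN | 0.413 | 0.293 | 0.888 | 0.327 | 0.247 | 0.931
San Juan Bautista | XGB | 0.261 | 0.186 | 0.955 | 0.323 | 0.222 | 0.933
San Juan Bautista | ANFIS (gauss) | 0.294 | 0.216 | 0.943 | 0.339 | 0.249 | 0.945
Caazapá | RF | 0.269 | 0.190 | 0.952 | 0.255 | 0.204 | 0.960
Caazapá | KNN | 0.274 | 0.189 | 0.950 | 0.271 | 0.222 | 0.955
Caazapá | ANN | 0.467 | 0.325 | 0.856 | 0.278 | 0.227 | 0.953
Caazapá | XGB | 0.264 | 0.187 | 0.954 | 0.261 | 0.205 | 0.959
Caazapá | ANFIS (trimf) | 0.320 | 0.243 | 0.932 | 0.325 | 0.242 | 0.948
Aeropuerto Guaraní | RF | 0.265 | 0.188 | 0.954 | 0.276 | 0.207 | 0.955
Aeropuerto Guaraní | KNN | 0.264 | 0.188 | 0.954 | 0.276 | 0.206 | 0.955
Aeropuerto Guaraní | ANN | 0.463 | 0.343 | 0.858 | 0.546 | 0.435 | 0.823
Aeropuerto Guaraní | XGB | 0.264 | 0.188 | 0.954 | 0.276 | 0.206 | 0.955
Aeropuerto Guaraní | ANFIS (trimf) | 0.320 | 0.243 | 0.932 | 0.348 | 0.273 | 0.928
Villarrica | RF | 0.278 | 0.201 | 0.949 | 0.262 | 0.201 | 0.957
Villarrica | KNN | 0.263 | 0.187 | 0.954 | 0.253 | 0.194 | 0.960
Villarrica | ANN | 0.464 | 0.319 | 0.858 | 0.307 | 0.240 | 0.941
Villarrica | XGB | 0.265 | 0.189 | 0.954 | 0.277 | 0.209 | 0.952
Villarrica | ANFIS (gauss) | 0.295 | 0.217 | 0.943 | 0.640 | 0.560 | 0.952
San Pedro | RF | 0.404 | 0.304 | 0.893 | 0.289 | 0.237 | 0.942
San Pedro | KNN | 0.332 | 0.252 | 0.928 | 0.250 | 0.196 | 0.957
San Pedro | ANN | 0.430 | 0.296 | 0.881 | 0.251 | 0.207 | 0.963
San Pedro | XGB | 0.300 | 0.215 | 0.941 | 0.268 | 0.216 | 0.950
San Pedro | ANFIS (trimf) | 0.320 | 0.242 | 0.933 | 0.375 | 0.306 | 0.939
Pozo Colorado | RF | 0.267 | 0.190 | 0.953 | 0.296 | 0.229 | 0.949
Pozo Colorado | KNN | 0.266 | 0.190 | 0.953 | 0.246 | 0.193 | 0.965
Pozo Colorado | ANN | 0.479 | 0.331 | 0.848 | 0.395 | 0.312 | 0.910
Pozo Colorado | XGB | 0.266 | 0.190 | 0.953 | 0.290 | 0.228 | 0.951
Pozo Colorado | ANFIS (gauss) | 0.089 | 0.469 | 0.941 | 1.037 | 0.959 | 0.977
Pedro Juan Caballero | RF | 0.495 | 0.386 | 0.841 | 0.447 | 0.363 | 0.829
Pedro Juan Caballero | KNN | 0.534 | 0.425 | 0.815 | 0.309 | 0.454 | 0.773
Pedro Juan Caballero | ANN | 0.528 | 0.395 | 0.819 | 0.356 | 0.496 | 0.717
Pedro Juan Caballero | XGB | 0.322 | 0.234 | 0.933 | 0.276 | 0.210 | 0.935
Pedro Juan Caballero | ANFIS (gauss) | 0.187 | 0.230 | 0.949 | 0.219 | 0.277 | 0.933
Puerto Casado | RF | 0.403 | 0.306 | 0.893 | 0.376 | 0.308 | 0.909
Puerto Casado | KNN | 0.529 | 0.421 | 0.816 | 0.509 | 0.392 | 0.833
Puerto Casado | ANN | 0.466 | 0.326 | 0.857 | 0.270 | 0.208 | 0.953
Puerto Casado | XGB | 0.362 | 0.277 | 0.914 | 0.382 | 0.302 | 0.906
Puerto Casado | ANFIS (gauss) | 0.087 | 0.467 | 0.942 | 0.333 | 0.697 | 0.958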

References

  1. Allen, R.G.; Pereira, L.S.; Raes, D.; Smith, M. Crop Evapotranspiration: Guidelines for Computing Crop Water Requirements; FAO Irrigation and Drainage Paper No. 56; FAO: Rome, Italy, 1998.
  2. Droogers, P.; Allen, R.G. Estimating reference evapotranspiration under inaccurate data conditions. Irrig. Drain. Syst. 2002, 16, 33–45.
  3. Rahimikhoob, A. Estimation of evapotranspiration based on only air temperature data using artificial neural networks for a subtropical climate in Iran. Theor. Appl. Climatol. 2010, 101, 83–91.
  4. Valiantzas, J.D. Simplified forms for the standardized FAO-56 Penman–Monteith reference evapotranspiration using limited weather data. J. Hydrol. 2013, 505, 13–23.
  5. Almorox, J.; Senatore, A.; Quej, V.H.; Mendicino, G. Worldwide assessment of the Penman–Monteith temperature approach for the estimation of monthly reference evapotranspiration. Theor. Appl. Climatol. 2018, 131, 693–703.
  6. Todorovic, M.; Karic, B.; Pereira, L.S. Reference evapotranspiration estimate with limited weather data across a range of Mediterranean climates. J. Hydrol. 2013, 481, 166–176.
  7. Pandey, P.; Nyori, T.; Pandey, V. Estimation of reference evapotranspiration using data driven techniques under limited data conditions. Model. Earth Syst. Environ. 2017, 3, 1449–1461.
  8. Torres, A.F.; Walker, W.R.; McKee, M. Forecasting daily potential evapotranspiration using machine learning and limited climatic data. Agric. Water Manag. 2011, 98, 553–562.
  9. Ferreira, L.B.; da Cunha, F.F.; de Oliveira, R.A.; Fernandes Filho, E.I. Estimation of reference evapotranspiration in Brazil with limited meteorological data using ANN and SVM–A new approach. J. Hydrol. 2019, 572, 556–570.
  10. Amiri, M.; Sharafi, S.; Ghaleni, M.M. Enhancing daily reference evapotranspiration (ETref) prediction across diverse climatic zones: A pattern mining approach with DIRECTORS model. J. Hydrol. 2025, 657, 133045.
  11. Kisi, O.; Sanikhani, H.; Zounemat-Kermani, M.; Niazi, F. Long-term monthly evapotranspiration modeling by several data-driven methods without climatic data. Comput. Electron. Agric. 2015, 115, 66–77.
  12. Shirin Manesh, S.; Ahani, H.; Rezaeian-Zadeh, M. ANN-based mapping of monthly reference crop evapotranspiration by using altitude, latitude and longitude data in Fars province, Iran. Environ. Dev. Sustain. 2014, 16, 103–122.
  13. Trajkovic, S. Temperature-based approaches for estimating reference evapotranspiration. J. Irrig. Drain. Eng. 2005, 131, 316–323.
  14. Haykin, S. Neural Networks and Learning Machines; Pearson Education: Delhi, India, 2009.
  15. Abiodun, O.I.; Jantan, A.; Omolara, A.E.; Dada, K.V.; Mohamed, N.A.; Arshad, H. State-of-the-art in artificial neural network applications: A survey. Heliyon 2018, 4, e00938.
  16. Abdolrasol, M.G.; Hussain, S.S.; Ustun, T.S.; Sarker, M.R.; Hannan, M.A.; Mohamed, R.; Ali, J.A.; Mekhilef, S.; Milad, A. Artificial neural networks based optimization techniques: A review. Electronics 2021, 10, 2689.
  17. Pal, S.K.; Mitra, S. Multilayer perceptron, fuzzy sets, and classification. IEEE Trans. Neural Netw. 1992, 3, 683–697.
  18. Kartal, V. Prediction of monthly evapotranspiration by artificial neural network model development with Levenberg–Marquardt method in Elazig, Turkey. Environ. Sci. Pollut. Res. 2024, 31, 20953–20969.
  19. Okkan, U. Application of Levenberg-Marquardt optimization algorithm based multilayer neural networks for hydrological time series modeling. Int. J. Optim. Control. Theor. Appl. (IJOCTA) 2011, 1, 53–63.
  20. Zhang, S.; Li, X.; Zong, M.; Zhu, X.; Wang, R. Efficient kNN classification with different numbers of nearest neighbors. IEEE Trans. Neural Netw. Learn. Syst. 2017, 29, 1774–1785.
  21. Song, Y.; Liang, J.; Lu, J.; Zhao, X. An efficient instance selection algorithm for k nearest neighbor regression. Neurocomputing 2017, 251, 26–34.
  22. Thant, A.A.; Aye, S.M.; Mandalay, M. Euclidean, manhattan and minkowski distance methods for clustering algorithms. Int. J. Sci. Res. Sci. Eng. Technol. 2020, 7, 553–559.
  23. Jang, J.-S. ANFIS: Adaptive-network-based fuzzy inference system. IEEE Trans. Syst. Man Cybern. 1993, 23, 665–685.
  24. Tabari, H.; Kisi, O.; Ezani, A.; Talaee, P.H. SVM, ANFIS, regression and climate based models for reference evapotranspiration modeling using limited climatic data in a semi-arid highland environment. J. Hydrol. 2012, 444, 78–89.
  25. Cobaner, M. Evapotranspiration estimation by two different neuro-fuzzy inference systems. J. Hydrol. 2011, 398, 292–302.
  26. Terzi, Ö.; Erol Keskin, M.; Dilek Taylan, E. Estimating evaporation using ANFIS. J. Irrig. Drain. Eng. 2006, 132, 503–507.
  27. Svalina, I.; Galzina, V.; Lujić, R.; Šimunović, G. An adaptive network-based fuzzy inference system (ANFIS) for the forecasting: The case of close price indices. Expert Syst. Appl. 2013, 40, 6055–6063.
  28. Talei, A.; Chua, L.H.C.; Wong, T.S. Evaluation of rainfall and discharge inputs used by Adaptive Network-based Fuzzy Inference Systems (ANFIS) in rainfall–runoff modeling. J. Hydrol. 2010, 391, 248–262.
  29. Fawagreh, K.; Gaber, M.M.; Elyan, E. Random forests: From early developments to recent advancements. Syst. Sci. Control. Eng. Open Access J. 2014, 2, 602–609.
  30. da Silva Júnior, J.C.; Medeiros, V.; Garrozi, C.; Montenegro, A.; Gonçalves, G.E. Random forest techniques for spatial interpolation of evapotranspiration data from Brazilian’s Northeast. Comput. Electron. Agric. 2019, 166, 105017.
  31. Granata, F. Evapotranspiration evaluation models based on machine learning algorithms—A comparative study. Agric. Water Manag. 2019, 217, 303–315.
  32. Chen, T.; He, T.; Benesty, M.; Khotilovich, V.; Tang, Y.; Cho, H.; Chen, K.; Mitchell, R.; Cano, I.; Zhou, T. Xgboost: Extreme Gradient Boosting, R Package Version 0.4-2; CRAN: Vienna, Austria, 2015; Volume 1, pp. 1–4.
  33. Fathi, M.; Azadi, M.; Kamali, G.; Meshkatee, A.H. Improving precipitation forecasts over Iran using a weighted average ensemble technique. J. Earth Syst. Sci. 2019, 128, 133.
  34. Lorenz, R.; Herger, N.; Sedláček, J.; Eyring, V.; Fischer, E.M.; Knutti, R. Prospects and caveats of weighting climate models for summer maximum temperature projections over North America. J. Geophys. Res. Atmos. 2018, 123, 4509–4526.
  35. Merrifield, A.L.; Brunner, L.; Lorenz, R.; Medhaug, I.; Knutti, R. An investigation of weighting schemes suitable for incorporating large ensembles into multi-model ensembles. Earth Syst. Dyn. 2020, 11, 807–834.
  36. Acar, E.; Rais-Rohani, M. Ensemble of metamodels with optimized weight factors. Struct. Multidiscip. Optim. 2009, 37, 279–294.
  37. Küçüktopcu, E.; Cemek, B. A comparison of deterministic and stochastic models for predicting air and litter properties in a broiler building. Int. J. Environ. Sci. Technol. 2022, 19, 12369–12384.
  38. Firat, M.; Güngör, M. Hydrological time-series modelling using an adaptive neuro-fuzzy inference system. Hydrol. Process. Int. J. 2008, 22, 2122–2132.
  39. Kişi, Ö.; Öztürk, Ö. Adaptive neurofuzzy computing technique for evapotranspiration estimation. J. Irrig. Drain. Eng. 2007, 133, 368–379.
  40. Hengl, T.; Nussbaum, M.; Wright, M.N.; Heuvelink, G.B.; Gräler, B. Random forest as a generic framework for predictive modeling of spatial and spatio-temporal variables. PeerJ 2018, 6, e5518.
  41. Tyralis, H.; Papacharalampous, G.; Langousis, A. A brief review of random forests for water scientists and practitioners and their recent history in water resources. Water 2019, 11, 910.
  42. Oyebode, O.; Stretch, D. Neural network modeling of hydrological systems: A review of implementation techniques. Nat. Resour. Model. 2019, 32, e12189.
  43. Taheri, M.; Bigdeli, M.; Imanian, H.; Mohammadian, A. An Overview of Evapotranspiration Estimation Models Utilizing Artificial Intelligence. Water 2025, 17, 1384.
  44. D’Andrea, M.F.; Rousseau, A.N.; Bigah, Y.; Gattinoni, N.N.; Brodeur, J.C. Trends in reference evapotranspiration and associated climate variables over the last 30 years (1984–2014) in the Pampa region of Argentina. Theor. Appl. Climatol. 2019, 136, 1371–1386.
  45. De la Casa, A.; Ovando, G. Variation of reference evapotranspiration in the central region of Argentina between 1941 and 2010. J. Hydrol. Reg. Stud. 2016, 5, 66–79.
  46. Tunca, E. Evaluating the performance of the TSEB model for sorghum evapotranspiration estimation using time series UAV imagery. Irrig. Sci. 2024, 42, 977–994.
  47. Roy, D.K.; Sarkar, T.K.; Kamar, S.S.A.; Goswami, T.; Muktadir, M.A.; Al-Ghobari, H.M.; Alataway, A.; Dewidar, A.Z.; El-Shafei, A.A.; Mattar, M.A. Daily prediction and multi-step forward forecasting of reference evapotranspiration using LSTM and Bi-LSTM models. Agronomy 2022, 12, 594.
  48. Kadkhodazadeh, M.; Valikhan Anaraki, M.; Morshed-Bozorgdel, A.; Farzin, S. A new methodology for reference evapotranspiration prediction and uncertainty analysis under climate change conditions based on machine learning, multi criteria decision making and Monte Carlo methods. Sustainability 2022, 14, 2601.
Figure 1. Geographic location of Paraguay and the 19 meteorological stations used in this study.
Figure 2. Flowchart of the research stages for ET0 estimation in Paraguay.
Figure 3. Leave-one-station-out validation method for ET0 estimation in Paraguay.
Figure 4. Feature importance, evaluated using the mean SHAP values by using the ANN, KNN, RF, and XGB algorithms.
Figure 5. The scatterplots of measured and predicted ET0 values by using the different algorithms.
Figure 6. The scatterplots of measured and predicted ET0 values by using weighting strategies.
Figure 7. Monthly spatial distribution maps of ET0 in Paraguay (The maps were generated in QGIS at the default spatial resolution of approximately 1 km × 1 km).
Table 1. Geographical characteristics of the meteorological stations in Paraguay.
Station Name | Region | Latitude (°S) | Longitude (°W) | Altitude (m)
General Bruguéz | Presidente Hayes | 24°45′ | 58°50′ | 89
San Estanislao | San Pedro | 24°40′ | 56°26′ | 183
Salto del Guairá | Canindeyú | 24°03′ | 54°19′ | 297
Aeropuerto Silvio Pettirossi | Central | 25°14′ | 57°31′ | 89
Paraguarí | Paraguarí | 25°46′ | 57°15′ | 116
Capitán Meza | Itapúa | 26°56′ | 55°12′ | 263
Concepción | Concepción | 23°25′ | 57°18′ | 75
Mariscal Estigarribia | Boquerón | 22°02′ | 60°37′ | 167
Encarnación | Itapúa | 27°20′ | 55°50′ | 90
Coronel Oviedo | Caaguazú | 25°28′ | 56°24′ | 159
Pilar | Ñeembucú | 26°51′ | 58°19′ | 58
San Juan Bautista | Misiones | 26°40′ | 57°09′ | 131
Caazapá | Caazapá | 26°11′ | 56°22′ | 142
Aeropuerto Guaraní | Alto Paraná | 25°21′ | 54°27′ | 247
Villarrica | Guairá | 25°46′ | 56°26′ | 163
San Pedro | San Pedro | 24°04′ | 57°06′ | 81
Pozo Colorado | Presidente Hayes | 23°27′ | 58°52′ | 98
Pedro Juan Caballero | Amambay | 23°35′ | 55°44′ | 563
Puerto Casado | Alto Paraguay | 22°17′ | 57°56′ | 78
Table 2. Monthly averages of daily ET0 (mm day−1).
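Station | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec
General Bruguéz | 5.08 | 4.88 | 4.07 | 3.40 | 3.22 | 3.54 | 3.30 | 2.81 | 3.33 | 3.67 | 3.94 | 5.06
San Estanislao | 4.54 | 4.74 | 3.89 | 3.58 | 3.17 | 3.02 | 3.33 | 2.99 | 3.60 | 4.29 | 4.34 | 4.84
Salto del Guairá | 4.50 | 4.39 | 4.33 | 3.93 | 4.35 | 3.90 | 3.13 | 3.32 | 4.29 | 4.35 | 3.94 | 4.47
Aeropuerto Silvio Pettirossi | 4.98 | 4.84 | 4.08 | 3.77 | 3.67 | 3.49 | 3.09 | 3.54 | 3.38 | 3.88 | 4.75 | 5.15
Paraguarí | 4.75 | 4.60 | 3.86 | 3.60 | 3.84 | 3.38 | 3.43 | 3.54 | 3.96 | 4.10 | 4.51 | 5.09
Capitán Meza | 4.64 | 4.54 | 3.67 | 3.85 | 3.61 | 2.22 | 2.24 | 2.87 | 3.18 | 3.51 | 4.40 | 5.04
Concepción | 4.07 | 3.85 | 3.38 | 2.79 | 2.00 | 1.62 | 1.66 | 2.07 | 2.69 | 3.41 | 3.79 | 3.96
Mariscal Estigarribia | 5.21 | 4.77 | 3.82 | 3.09 | 1.88 | 1.50 | 1.67 | 2.24 | 2.99 | 3.93 | 4.59 | 4.86
Encarnación | 4.73 | 4.68 | 3.10 | 2.93 | 1.81 | 1.33 | 1.37 | 2.01 | 2.70 | 3.18 | 3.55 | 5.09
Coronel Oviedo | 4.88 | 4.78 | 4.05 | 3.78 | 3.68 | 3.51 | 3.65 | 3.25 | 3.71 | 3.99 | 4.53 | 5.04
Pilar | 4.62 | 4.52 | 3.81 | 3.75 | 3.62 | 2.54 | 2.98 | 3.65 | 3.70 | 3.73 | 4.36 | 4.79
San Juan Bautista | 4.60 | 4.68 | 4.09 | 3.86 | 3.78 | 2.71 | 3.39 | 3.69 | 3.35 | 3.94 | 4.33 | 5.15
Caazapá | 4.67 | 4.48 | 3.99 | 3.66 | 3.88 | 3.44 | 2.91 | 3.77 | 3.44 | 3.84 | 4.17 | 5.02
Aeropuerto Guaraní | 4.69 | 4.81 | 3.81 | 3.15 | 1.98 | 1.53 | 1.69 | 2.27 | 2.92 | 3.70 | 4.74 | 5.21
Villarrica | 4.76 | 4.81 | 3.85 | 3.21 | 2.00 | 1.50 | 1.68 | 2.35 | 2.87 | 3.70 | 4.51 | 4.97
San Pedro | 4.73 | 4.55 | 3.79 | 3.53 | 2.45 | 1.60 | 1.57 | 2.12 | 2.69 | 3.80 | 4.60 | 4.68
Pozo Colorado | 5.37 | 4.85 | 4.16 | 3.53 | 2.35 | 1.62 | 2.23 | 2.58 | 3.17 | 4.07 | 4.82 | 5.22
Pedro Juan Caballero | 4.42 | 4.20 | 3.67 | 3.33 | 2.16 | 1.85 | 1.87 | 2.54 | 3.11 | 3.68 | 4.27 | 4.58
Puerto Casado | 4.87 | 4.54 | 3.92 | 3.83 | 2.55 | 1.53 | 2.46 | 2.37 | 3.55 | 3.93 | 4.63 | 5.01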
Table 3. Hyperparameter values used in machine learning algorithms.
| Model | Hyperparameter | Tuned value |
| --- | --- | --- |
| ANN | Number of hidden layers | 1 |
| ANN | Number of hidden neurons | 5 |
| ANN | Activation function in hidden layer | Tansig |
| ANN | Activation function in output layer | Purelin |
| ANN | Number of epochs | 300 |
| ANN | Network structure | 4–5–1 |
| KNN | Number of neighbors (k) | 3 |
| KNN | Weights | Uniform |
| KNN | Distance function | Euclidean |
| RF | Number of estimators | 173 |
| RF | Maximum depth | 7 |
| RF | max_features | Sqrt |
| RF | min_samples_leaf | 2 |
| RF | min_samples_split | 5 |
| XGB | Number of estimators | 135 |
| XGB | Learning rate | 0.159 |
| XGB | Maximum depth | 3 |
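As an illustration of how the KNN, RF, and XGB settings in Table 3 could be passed to common Python libraries, the sketch below collects them as keyword dictionaries. The keyword names follow scikit-learn's `KNeighborsRegressor` and `RandomForestRegressor` and xgboost's `XGBRegressor`; that these libraries were the ones used in the study is an assumption, since the table does not name the software. The helper `build_models` is hypothetical and only imports the libraries when it is called.

```python
# Hyperparameter values copied from Table 3; keyword names are the
# scikit-learn / xgboost spellings (library choice is an assumption).
knn_params = {"n_neighbors": 3, "weights": "uniform", "metric": "euclidean"}

rf_params = {
    "n_estimators": 173,
    "max_depth": 7,
    "max_features": "sqrt",
    "min_samples_leaf": 2,
    "min_samples_split": 5,
}

xgb_params = {"n_estimators": 135, "learning_rate": 0.159, "max_depth": 3}


def build_models():
    """Instantiate the three regressors (requires scikit-learn and xgboost)."""
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.neighbors import KNeighborsRegressor
    from xgboost import XGBRegressor

    return {
        "KNN": KNeighborsRegressor(**knn_params),
        "RF": RandomForestRegressor(**rf_params),
        "XGB": XGBRegressor(**xgb_params),
    }
```

The ANN settings (Tansig and Purelin transfer functions) use MATLAB naming and are therefore left out of this sketch.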
Table 4. MAE, RMSE, and R2 statistics of the ANN, KNN, RF, and XGB algorithms in the training and testing phases.
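| Model | Training MAE (mm day−1) | Training RMSE (mm day−1) | Training R2 | Testing MAE (mm day−1) | Testing RMSE (mm day−1) | Testing R2 |
| --- | --- | --- | --- | --- | --- | --- |
| ANN | 0.246 | 0.312 | 0.932 | 0.312 | 0.443 | 0.882 |
| KNN | 0.159 | 0.211 | 0.969 | 0.248 | 0.394 | 0.906 |
| RF | 0.178 | 0.228 | 0.964 | 0.244 | 0.379 | 0.913 |
| XGB | 0.168 | 0.220 | 0.966 | 0.243 | 0.387 | 0.910 |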
Table 5. MAE, RMSE, and R2 statistics of the ANFIS models in the training and testing phases.
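| Model | Training MAE (mm day−1) | Training RMSE (mm day−1) | Training R2 | Testing MAE (mm day−1) | Testing RMSE (mm day−1) | Testing R2 |
| --- | --- | --- | --- | --- | --- | --- |
| ANFIS-Trimf | 0.211 | 0.274 | 0.948 | 0.220 | 0.300 | 0.948 |
| ANFIS-Trapmf | 0.257 | 0.322 | 0.928 | 0.273 | 0.354 | 0.926 |
| ANFIS-Gaussmf | 0.200 | 0.266 | 0.951 | 0.202 | 0.289 | 0.950 |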
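Tables 4 and 5 summarise model skill with MAE, RMSE, and R2. The sketch below shows how these three statistics can be computed from paired observed and predicted ET0 values. It assumes R2 is the coefficient of determination (1 minus the ratio of residual to total sum of squares); the article may instead use the squared correlation coefficient. The observed values are the January to April entries for Concepción in Table 2, while the predicted values are invented for illustration and are not results from the study.

```python
import math


def regression_metrics(observed, predicted):
    """Return (MAE, RMSE, R2) for two equal-length sequences."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                   # assumed R2 definition
    return mae, rmse, r2


# Observed: Concepción, Jan-Apr (Table 2). Predicted: hypothetical values.
observed = [4.07, 3.85, 3.38, 2.79]
predicted = [4.17, 3.75, 3.48, 2.69]

mae, rmse, r2 = regression_metrics(observed, predicted)
print(f"MAE={mae:.3f} mm/day, RMSE={rmse:.3f} mm/day, R2={r2:.3f}")
```

Because every hypothetical error has the same magnitude (0.10 mm day−1), MAE and RMSE coincide here; with errors of mixed size, RMSE exceeds MAE, as it does throughout Tables 4 and 5.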
Table 6. Weight coefficients (ωi) of individual models under different weighting schemes in the training and testing phases.
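| Weighting Scheme | Model | Training | Testing |
| --- | --- | --- | --- |
| R2-based weighting | ANFIS | 0.199 | 0.208 |
| R2-based weighting | RF | 0.202 | 0.200 |
| R2-based weighting | XGB | 0.202 | 0.199 |
| R2-based weighting | KNN | 0.203 | 0.199 |
| R2-based weighting | ANN | 0.195 | 0.193 |
| Inverse RMSE-based weighting | ANFIS | 0.183 | 0.257 |
| Inverse RMSE-based weighting | RF | 0.212 | 0.196 |
| Inverse RMSE-based weighting | XGB | 0.220 | 0.192 |
| Inverse RMSE-based weighting | KNN | 0.230 | 0.188 |
| Inverse RMSE-based weighting | ANN | 0.155 | 0.167 |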
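The weights in Table 6 are consistent with a simple normalisation of each model's score by the sum over all five models, using R2 in one scheme and 1/RMSE in the other. The sketch below applies that assumed formula to the testing-phase statistics from Tables 4 and 5, taking the ANFIS row to be ANFIS-Gaussmf. It reproduces the tabulated testing weights to within about 0.001, with the small differences presumably due to rounding of the published statistics. The function names are illustrative and do not come from the article.

```python
# Testing-phase statistics from Tables 4 and 5 (ANFIS = ANFIS-Gaussmf, assumed).
test_r2 = {"ANFIS": 0.950, "RF": 0.913, "XGB": 0.910, "KNN": 0.906, "ANN": 0.882}
test_rmse = {"ANFIS": 0.289, "RF": 0.379, "XGB": 0.387, "KNN": 0.394, "ANN": 0.443}


def normalise(scores):
    """Scale positive scores so that they sum to one."""
    total = sum(scores.values())
    return {name: value / total for name, value in scores.items()}


def r2_weights(r2):
    """Weights proportional to R2 (assumed form of the R2-based scheme)."""
    return normalise(r2)


def inverse_rmse_weights(rmse):
    """Weights proportional to 1/RMSE (assumed form of the inverse RMSE scheme)."""
    return normalise({name: 1.0 / value for name, value in rmse.items()})


def weighted_ensemble(predictions, weights):
    """Weighted sum of per-model predictions for a single sample."""
    return sum(weights[name] * predictions[name] for name in weights)


w_r2 = r2_weights(test_r2)
w_rmse = inverse_rmse_weights(test_rmse)

for name in test_r2:
    print(f"{name}: R2-based={w_r2[name]:.3f}, inverse-RMSE={w_rmse[name]:.3f}")
```

Under this formula the inverse RMSE scheme spreads the weights further apart than the R2 scheme, because the RMSE values differ proportionally more across models than the R2 values do.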

Share and Cite

MDPI and ACS Style

Cemek, B.; Küçüktopçu, E.; Fleitas Ortellado, M.G.; Simsek, H. Data-Driven Estimation of Reference Evapotranspiration in Paraguay from Geographical and Temporal Predictors. Appl. Sci. 2025, 15, 11429. https://doi.org/10.3390/app152111429