1. Introduction
Evapotranspiration (ET) refers to the combined process of evaporation from soil and water surfaces and transpiration from the vegetation canopy [1]. Accurate estimation of ET is critical for effective water resource planning, irrigation management, energy production, and flood control [2,3,4]. Meteorological factors play a crucial role in agricultural productivity and the efficiency of water management systems. In particular, consistent and reliable meteorological data are essential for optimal irrigation scheduling, which directly affects crop yield and overall production performance [4].
Crop evapotranspiration (ETc) is a fundamental component of the hydrological cycle, representing the transfer of water from soil to the atmosphere [5]. Typically, ETc is estimated by multiplying reference evapotranspiration (ETo) by the crop coefficient (Kc). Consequently, ETo serves as a key parameter in estimating irrigation water requirements [6], designing irrigation systems [7], and assessing drought risks [8,9]. Accurate estimation of ETo is particularly crucial for analyzing the effects of water stress on crop yield in arid and semi-arid regions, where water resources are scarce and climatic variability is high [10]. Several empirical and physically based methods have been developed to estimate ETo accurately, including the Hargreaves–Samani [11], Thornthwaite [12], Blaney–Criddle [13], Priestley–Taylor [14], and Penman–Monteith (PM) [15] approaches. Among these, the PM method has been designated as the gold standard by the Food and Agriculture Organization (FAO) and is recommended in the FAO-56 guidelines for regions where complete meteorological data are available [16]. Due to its high accuracy, the FAO-56 PM method has been validated in various countries, including China [17], Brazil [18], Egypt [19], Iran [20], the Czech Republic [21], and Turkey [22]. Although the method has demonstrated high accuracy, its major limitation is the dependence on in situ meteorological measurements, such as air temperature (T), solar radiation (Rs), wind speed (U2), and relative humidity (RH) [16]. This dependency poses significant challenges in regions where meteorological data are sparse or unavailable [23]. Additionally, retrieved data can be affected by sensor characteristics, instrument drift, or a low temporal sampling frequency [24]. Furthermore, the sparse distribution of meteorological stations often leads to data gaps and limited accessibility to the data, as stations are managed by different administrative entities [24,25].
Two alternative approaches are commonly employed when meteorological data are missing or of insufficient quality. The first approach involves empirical methods that require fewer input parameters. Among these, the Hargreaves–Samani model is the most widely used, requiring only maximum (Tmax) and minimum air temperature (Tmin), as well as extra-terrestrial radiation [26,27]. However, empirical methods exhibit lower accuracy when applied across different climatic regions due to their limited generalization capability [28,29,30]. The second approach utilizes global climate reanalysis datasets to compensate for missing or low-quality ground meteorological observations. Reanalysis datasets are produced by assimilating various satellite and ground-based observational data into weather prediction models [24,31]. Although such datasets provide a promising alternative in regions with sparse meteorological networks, they contain uncertainties due to differences in the model physics used and the data assimilation method [32]. Therefore, reanalysis products are often evaluated against ground-based meteorological observations to ensure their reliability and accuracy. For these purposes, several gridded datasets have been employed to assess model performance, including CLDAS in China [33], CERES in South America [34], NASA POWER in Mexico [35], AgERA5 in West Africa [36], and CMIP6 in East Asia [37]. To overcome performance limitations associated with empirical or reanalysis-based methods, machine learning (ML) models have been used in recent years due to their superiority in modeling ETo. Commonly applied algorithms include Extreme Gradient Boosting (XGBoost) [38], Random Forest (RF) [39], Artificial Neural Networks [40], Convolutional Neural Networks [41], and Ensemble Methods [4]. Nevertheless, most of these studies have relied on ground-based meteorological data for ETo estimation. Furthermore, the success of ML models is highly dependent on the quality and quantity of the input data. Despite the increasing availability of global reanalysis products, a systematic evaluation is still required to bridge the gap between coarse-resolution satellite data and local-scale ETo requirements in complex terrains like Turkey. While several studies have utilized ERA5-Land data for climatic analysis [42,43], its direct integration with advanced ML models for regional ETo estimation remains relatively unexplored. This study addresses this need by defining the physical limitations of reanalysis datasets in capturing local microclimates and proposing an ML-based correction framework.
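As a point of reference for the temperature-only approach mentioned above, the Hargreaves–Samani model can be sketched in a few lines. This is a minimal illustration, not the implementation used in this study: the function names are hypothetical, the 0.0023 coefficient is the commonly published default (no local calibration is applied), and the extra-terrestrial radiation follows the standard FAO-56 formulation for daily periods.

```python
import math


def extraterrestrial_radiation(lat_deg: float, doy: int) -> float:
    """Daily extra-terrestrial radiation Ra (MJ m-2 day-1), FAO-56 daily formulation.

    Assumes a latitude where the sun rises and sets on the given day
    (the sunset hour angle is undefined during polar day or night).
    """
    gsc = 0.0820  # solar constant, MJ m-2 min-1
    phi = math.radians(lat_deg)
    dr = 1.0 + 0.033 * math.cos(2.0 * math.pi * doy / 365.0)       # inverse relative Earth-Sun distance
    decl = 0.409 * math.sin(2.0 * math.pi * doy / 365.0 - 1.39)    # solar declination (rad)
    ws = math.acos(-math.tan(phi) * math.tan(decl))                # sunset hour angle (rad)
    return (24.0 * 60.0 / math.pi) * gsc * dr * (
        ws * math.sin(phi) * math.sin(decl)
        + math.cos(phi) * math.cos(decl) * math.sin(ws)
    )


def hargreaves_samani(tmax: float, tmin: float, ra: float) -> float:
    """Reference ET (mm day-1) from Tmax, Tmin (deg C) and Ra (MJ m-2 day-1).

    The factor 0.408 converts Ra from MJ m-2 day-1 to mm day-1 of evaporation.
    """
    tmean = (tmax + tmin) / 2.0
    trange = max(tmax - tmin, 0.0)  # guard against inverted inputs
    return 0.0023 * 0.408 * ra * (tmean + 17.8) * math.sqrt(trange)
```

Only latitude, day of year, and the two temperatures are needed, which is why this family of methods is attractive for stations that record temperature alone.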
The primary objective of this study is to assess the robustness, performance, and consistency of ML models in estimating ETo using either a full reanalysis dataset (ERA5-Land) or a simplified approach based solely on limited variables (Tmax and Tmin). To achieve this goal, the study focuses on three specific aims: (i) implementing a solar radiation (Rs) QA/QC calibration step prior to comparison; (ii) systematically evaluating full versus temperature-only ERA5-Land inputs across multiple climate types; and (iii) quantifying the trade-off between accuracy and computational cost among the selected ML models.
3. Results
3.1. Impact of Rs Calibration on ETo Estimation
In this study, a QA/QC analysis was employed to calibrate the Rs. The calibration process involved dividing the observation records into 60-day intervals. Within each period, the measured Rs was compared against the theoretical clear-sky solar radiation (Rso). Adjustment was triggered if the ratio of Rso to Rs (derived from the average of the top 10% of daily Rs values in a 60-day period) was less than 0.97 or greater than 1.037 [47]. The Rso values were calculated using the day of the year, atmospheric water vapor, and station elevation, following the methodology of Allen et al. [16].
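The windowed check described above can be sketched as follows. This is one possible reading of the procedure rather than the exact routine used here: the function name `calibrate_rs` is hypothetical, Rso is taken as a precomputed input, the clear-sky days are assumed to be those with the highest measured Rs in each window, and the correction is assumed to be a simple multiplication of the whole window by the Rso/Rs ratio.

```python
def calibrate_rs(rs, rso, window=60, top_frac=0.10, lo=0.97, hi=1.037):
    """Return a corrected copy of a daily Rs series.

    rs, rso : equal-length sequences of measured and clear-sky radiation.
    window  : length of each calibration interval in days.
    top_frac: fraction of days (highest Rs) treated as clear-sky days.
    lo, hi  : tolerance band for the Rso/Rs ratio; outside it, the window is rescaled.
    """
    corrected = list(rs)
    for start in range(0, len(rs), window):
        idx = range(start, min(start + window, len(rs)))
        k = max(1, round(top_frac * len(idx)))
        # Assumption: the k days with the highest measured Rs approximate clear-sky days.
        top = sorted(idx, key=lambda i: rs[i], reverse=True)[:k]
        mean_rs = sum(rs[i] for i in top) / k
        mean_rso = sum(rso[i] for i in top) / k
        if mean_rs <= 0:
            continue  # nothing to scale against in this window
        ratio = mean_rso / mean_rs
        if ratio < lo or ratio > hi:
            for i in idx:
                corrected[i] = rs[i] * ratio
    return corrected
```

A sensor that reads 20% low under clear skies would give a ratio of 1.25 and be rescaled, whereas a ratio of 1.01 falls inside the tolerance band and leaves the window untouched.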
Figure 2 illustrates a representative comparison of measured Rs versus the expected Rso for the Düzce weather station for the period 2005 to 2010. The Düzce weather station was chosen because it represents a transitional climate zone and possesses highly consistent, continuous data records.
The calibration of Rs resulted in significant improvements in ETo estimates across 33 meteorological stations.
Table 2 presents the comparison of ETo estimations using calibrated and uncalibrated Rs over the 30-year study period (1981–2010). The calibration procedure yielded a high correlation, with the R2 ranging from 0.96 to 0.99. The RMSE for all stations was 0.22 mm day−1. Results showed that performance metrics varied with climatic conditions. Coastal stations (e.g., Antalya, Mersin, Izmir, Trabzon, and Samsun) demonstrated the lowest errors (RMSE: 0.12–0.17 mm day−1; MAE: 0.06–0.09 mm day−1), whereas stations situated in transitional climates or continental interiors with complex topography (e.g., Gaziantep, Edirne, Isparta, and Diyarbakir) exhibited higher discrepancies (RMSE: 0.36–0.43 mm day−1 and MAE: 0.19–0.33 mm day−1). Despite these spatial variations, all stations maintained R2 values above 0.96, confirming the robustness of the calibration methodology across diverse environmental conditions.
3.2. Evaluation of ERA5-Land Gridded Data
In this study, ERA5-Land gridded weather data were evaluated with ground measurements from 1981 to 2010 across Turkey (33 stations). For this purpose, time series of Rs, Tmax, Tmin, U2, RH, and P at daily scales were used, as these are the required input parameters for daily ETo calculation using the FAO-56 PM method.
Figure 3 depicts the performance of ERA5-Land data against the in situ ground weather measurements. Generally, the Rs, Tmax, Tmin, and P variables exhibit high correlations with ground measurements and lower bias values (1.09, −1.13, 0.21, and 0.43 for Rs, Tmax, Tmin, and P, respectively). In contrast, U2 and RH have lower R2 values and higher RMSE and MAE values. The average bias for U2 is 0.867, indicating that ERA5-Land tends to overestimate U2. Likewise, the positive bias for RH (2.59) indicates that ERA5-Land overestimates RH.
There was a significant positive correlation between estimated and measured Rs. For all stations, the R2 value was calculated as 0.878, and the RMSE and MAE were calculated as 3.052 and 2.234 MJ m−2 day−1, respectively. Since the magnitude of the bias is relatively small compared to the average of Rs, the ERA5-Land Rs data can be considered as reliable.
The Tmax comparison plot in Figure 3 shows that the estimated and measured data closely follow each other, with an R2 of 0.959, while the RMSE and MAE are 2.510 °C and 1.953 °C, respectively. The mean bias of −1.139 °C indicates a slight tendency of ERA5-Land to underestimate Tmax, which becomes more evident during cold winter days but remains relatively small in magnitude, suggesting that ERA5-Land Tmax can be considered accurate for climatic studies.
For Tmin, the results are broadly similar in terms of overall performance, with R2 of 0.905 and RMSE and MAE of 2.851 °C and 2.097 °C, respectively. However, the mean bias differs, with a mean bias of 0.219 °C indicating that ERA5-Land tends to overestimate nighttime temperatures slightly. This bias in Tmin is more pronounced during colder winter times, highlighting some limitations of ERA5-Land under certain conditions.
For U2, the agreement between ERA5-Land and MGM observations is weaker than for Tmax, Tmin, and Rs, with an R2 value of 0.280 and RMSE and MAE of 1.469 m s−1 and 1.101 m s−1, respectively. The mean bias of about 0.867 m s−1 indicates that ERA5-Land overestimates near-surface wind speeds, suggesting that the estimated U2 by ERA5-Land should be used with bias-correction procedures rather than directly.
For RH, ERA5-Land and ground stations exhibit a moderate level of agreement, with a mean R2 of 0.646 and RMSE and MAE of roughly 10.770% and 8.457%, respectively. The mean bias of approximately 2.585% indicates that ERA5-Land tends to overestimate RH.
For P, ERA5-Land exhibits a high level of consistency with ground stations, with a mean R2 of 0.991 and RMSE and MAE on the order of 0.808 kPa and 0.629 kPa, respectively. The mean bias of 0.431 kPa indicates that ERA5-Land tends to slightly overestimate surface pressure, although the magnitude of the bias is relatively small, showing that ERA5-Land P can be used reliably for most climatic studies.
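The agreement statistics reported in this section can be computed per variable with a short helper. The sketch below is illustrative only: the function name is hypothetical, bias is assumed to be defined as the mean of (ERA5-Land minus station), and R2 is assumed to be the squared Pearson correlation, which may differ from the exact definition used for the figures.

```python
import math


def agreement_metrics(obs, est):
    """Return bias, MAE, RMSE, and R2 for paired observed/estimated series."""
    n = len(obs)
    errors = [e - o for o, e in zip(obs, est)]  # positive value = overestimation
    bias = sum(errors) / n
    mae = sum(abs(d) for d in errors) / n
    rmse = math.sqrt(sum(d * d for d in errors) / n)

    mean_o = sum(obs) / n
    mean_e = sum(est) / n
    sxy = sum((o - mean_o) * (e - mean_e) for o, e in zip(obs, est))
    sxx = sum((o - mean_o) ** 2 for o in obs)
    syy = sum((e - mean_e) ** 2 for e in est)
    r2 = (sxy * sxy) / (sxx * syy)  # squared Pearson correlation (assumed definition)

    return {"bias": bias, "mae": mae, "rmse": rmse, "r2": r2}
```

Under this definition a series with a constant offset still scores R2 = 1, which is why bias, RMSE, and MAE are reported alongside R2 for every variable.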
3.3. Performance of Machine Learning Models
Because ETo is physically non-negative, any negative ML predictions were set to zero. These negative values occurred only under near-zero ETo conditions and represented a negligible fraction of the dataset. For transparency, the frequency of negative raw predictions is reported in Section 4.3. The estimated ETo values ranged from 0.12 to 12.17 mm d−1 for RF, from 0.0 to 12.64 mm d−1 for XGBoost, and from 0.0 to 10.19 mm d−1 for ELM. As shown in Figure 4, the XGBoost model provided the highest accuracy in the testing phase, whereas RF performed best during training. These models yielded the highest R2 values of 0.95 (testing) and 0.99 (training), along with the lowest RMSE and MAE values of 0.43 and 0.30 mm d−1 (testing), and 0.19 and 0.13 mm d−1 (training), respectively. Although ELM exhibited slightly lower performance, it still produced reasonably accurate ETo estimates, with RMSE values of 0.43 mm d−1 for training and 0.45 mm d−1 for testing, MAE values of 0.30 mm d−1 for training and 0.32 mm d−1 for testing, and an R2 of 0.95 for both training and testing.
For the ML models trained with limited data (only Tmax and Tmin), the estimated minimum and maximum ETo values were 0.13 and 10.01 mm day−1 for RF, 0.20 and 9.02 mm day−1 for XGBoost, and 0.0 and 8.45 mm day−1 for ELM, respectively (Figure 5). RF performed best for training (R2 = 0.93, RMSE = 0.52 mm day−1, and MAE = 0.36 mm day−1), while ELM was the most accurate for testing (R2 = 0.90, RMSE = 0.43 mm day−1, and MAE = 0.21 mm day−1). Although ELM was the best model for testing, the other models achieved similar performance, with R2 = 0.90, RMSE = 0.44 mm day−1, and MAE = 0.21 mm day−1 for XGBoost, and R2 = 0.90, RMSE = 0.44 mm day−1, and MAE = 0.21 mm day−1 for RF.
The estimation accuracy of the ML models was further analyzed across the four primary climate types in Turkey. As summarized in Table 3, the models demonstrated varying levels of performance based on regional climatic characteristics. The Mediterranean climate zone exhibited the highest accuracy (R2 up to 0.96 for XGBoost), likely due to the consistent dominance of Rs as the primary driver of ETo. Similarly, the transitional climate showed very low error margins (RMSE: 0.41–0.42 mm day−1). In contrast, the Black Sea climate yielded slightly higher errors and lower correlations (R2 approx. 0.91), which can be attributed to more frequent cloud cover and higher humidity levels that introduce more complexity into the ETo process. Despite these regional variations, all models remained robust, with R2 values exceeding 0.90 across all climate types.
In addition to the estimation performance of the ML models used in this study, the computational cost was also evaluated. All experiments were conducted under identical hardware (MacBook M3 Pro laptop) and software conditions to ensure a fair comparison, and the results are presented in Table 4. The RF model became computationally expensive due to the large number of trees combined with significant depth (up to 25 levels) and complex splitting rules. Consequently, RF exhibited the longest training time among the evaluated models, with a total of 121.22 min. Even with a large number of trees and a broad hyperparameter search, XGBoost outperformed the other models in speed, completing its training in just 3.1 min, which was significantly faster than both RF and ELM. This speed demonstrates the efficiency of the gradient boosting implementation, which likely benefited from parallel processing and optimized tree construction. The ELM model took 27.42 min to train. While ELM is known for speed, the model was slowed down by large hidden layers (2000 neurons) and multiple activation functions. Consequently, ELM was significantly slower than XGBoost, though still considerably faster than RF. This underscores the critical trade-off between model complexity and training efficiency, which must be balanced against predictive performance in practical applications.
3.4. Evaluation of ETo Estimation with Reanalysis Data
To assess the reliability of ERA5-Land reanalysis data as a standalone estimator for ETo, performance metrics were calculated for three distinct temporal subsets: the full 30-year study period (1981–2010), the training period (1981–2000), and the testing period (2001–2010). These results are summarized in Table 5. For the full 30-year period, a strong linear relationship (R2 = 0.90) was observed between the ERA5-Land estimates and ground station observations. As shown in Figure 6, the slope of the regression line was 0.9989, indicating a robust estimation close to unity, while a slight positive intercept (0.3348) suggests a minor systematic overestimation by the model. In terms of error magnitude for the full dataset, the RMSE and MAE were calculated as 0.72 mm day−1 and 0.55 mm day−1, respectively. The positive bias of 0.31 mm day−1 confirms that ERA5-Land tends to yield slightly higher ETo values than observed. Descriptive statistics further reveal that ERA5-Land closely tracked the extremes; the maximum estimated ETo was 11.43 mm day−1, comparable to the observed maximum of 13.33 mm day−1, and minimum values were similarly consistent (0.14 vs. 0.04 mm day−1). To statistically evaluate the relationship between ground-based observations and ERA5-Land estimations, a paired Student’s t-test was performed. The analysis revealed a statistically significant difference between the two datasets (t = 275.41, df = 341,516, p < 0.001), with a mean difference of 0.31 mm day−1.
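For readers who want to reproduce the test statistic, a paired t-test reduces to a one-sample test on the daily differences. The sketch below is a minimal illustration with a hypothetical function name; it returns the t statistic, the degrees of freedom, and the mean difference, and leaves the p-value lookup to a statistics package.

```python
import math
import statistics


def paired_t(obs, est):
    """Paired t statistic for est versus obs.

    Returns (t, df, mean_difference), where differences are est - obs.
    """
    diffs = [e - o for o, e in zip(obs, est)]
    n = len(diffs)
    mean_d = statistics.fmean(diffs)
    sd_d = statistics.stdev(diffs)          # sample standard deviation (n - 1)
    t = mean_d / (sd_d / math.sqrt(n))      # standard error shrinks with sqrt(n)
    return t, n - 1, mean_d
```

Because the standard error in the denominator shrinks with the square root of the sample size, a fixed mean difference produces an ever larger t as more daily pairs are added. Working backwards from the reported values (t = 275.41, about 341,517 pairs, mean difference 0.31 mm day−1) implies a standard deviation of the daily differences of roughly 0.66 mm day−1.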
To better understand the ERA5-Land performance under different atmospheric conditions, the data were analyzed on both a monthly and a seasonal basis. Figure 7 presents the long-term mean daily ETo for both datasets across the year. The ERA5-Land product (red dashed line) closely follows the seasonal pattern of ETo, reaching its maximum and minimum values in July (5.80 mm day−1) and December (0.91 mm day−1), respectively. Despite the high correlation, there is a systematic difference in certain months. As illustrated in Figure 7, this bias is not uniform over the year. It is minimal in winter months, with differences under 0.20 mm day−1, but increases substantially during the late-summer to autumn transition.
The analysis of monthly bias (Figure 8) shows that the highest overestimation occurred in September (0.48 mm day−1), followed by August (0.45 mm day−1) and October (0.45 mm day−1). This pattern suggests that although ERA5-Land reproduces the peak summer radiation and temperature well, it tends to cool more slowly than the in situ station observations during the transition into autumn. As a result, the ERA5-Land-based ETo was higher than that derived from the ground measurements.
The seasonal performance metrics are presented in Table 6. The highest RMSE was observed in summer (0.93 mm day−1), which is consistent with the higher ETo during summer. In contrast, the greatest systematic bias was observed in autumn (0.41 mm day−1). Winter exhibited the lowest absolute error (RMSE: 0.46 mm day−1). The disparity between the low winter R2 of 0.25 and the high annual R2 of 0.90–0.96 does not indicate poor model performance but rather reflects the low amplitude of ETo values during winter. Because daily ETo varies little in this season, even minor absolute errors (RMSE 0.46 mm day−1) significantly reduce the value of R2. Nevertheless, these errors are practically insignificant for agricultural planning.
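The dependence of R2 on the spread of the observations can be made concrete with a short calculation. The sketch below assumes R2 is defined as one minus the ratio of the error sum of squares to the total sum of squares, which may not be the exact definition used in Table 6; the function name and the standard deviations in the comments are hypothetical values chosen only to illustrate the effect.

```python
def r2_from_rmse(rmse: float, sd_obs: float) -> float:
    """R2 = 1 - SSE/SST, rewritten as 1 - (RMSE / SD_obs)^2.

    sd_obs is the population standard deviation of the observed series.
    """
    return 1.0 - (rmse / sd_obs) ** 2


# Same absolute error, different spread in the observations (illustrative values):
wide = r2_from_rmse(0.46, 2.0)     # large seasonal amplitude -> R2 close to 1
narrow = r2_from_rmse(0.46, 0.53)  # low winter amplitude -> R2 drops sharply
```

With an identical RMSE, the score falls from about 0.95 to about 0.25 simply because the observed values span a narrower range.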
4. Discussion
4.1. Rs Calibration Using Rso Calculations
The calibration of Rs against theoretical Rso had a clear and measurable impact on the quality of ETo estimates across the 33 meteorological stations. By constraining measured Rs with physically based Rso values computed from the day of year, atmospheric water vapor, and station elevation following the study by Allen et al. [16], the QA/QC procedure substantially reduced biases in the radiation input and, in turn, in the derived ETo series. Several physical and observational factors likely contribute to these spatial differences. Inland and topographically complex stations are more prone to localized shading, orographic cloud formation, and frequent changes in aerosol loading and dust transport, all of which can alter the effective relationship between Rs and Rso even under clear-sky conditions [57,58]. Methodologically, the use of 60-day windows to perform the Rs calibration is a pragmatic choice, but it also introduces some important limitations. While a 60-day interval is long enough to smooth out short-lived weather anomalies and highlight sensor drift or persistent bias, it may not be optimal in all climates or seasons.
4.2. Performance of ERA5-Land Reanalysis Data
One of the main objectives of this study was to evaluate the capability of ERA5-Land reanalysis data to replace ground-based observations in Turkey. Results demonstrated a strong agreement for Tmax, Tmin, and P (R2 > 0.90) and for Rs (R2 = 0.88). These results align with several recent studies that have shown that ERA5-Land provides robust estimates for radiative and thermal parameters in Mediterranean climates [31,59,60]. In contrast, U2 showed poor agreement (R2 = 0.28) and a marked positive bias, and RH showed moderate agreement (R2 = 0.64); despite these aerodynamic inaccuracies, the agreement between the ETo estimated from the ERA5-Land inputs and the in situ calculations remained high (R2 = 0.90). Multiple empirical studies demonstrate that the FAO-PM method can be used to estimate ETo accurately when U2 or RH is missing or inaccurate in humid [61], semi-arid [62] and Mediterranean [63] environments, and these results are also in agreement with those of Sentelhas et al. [64], who found that the aerodynamic term in the PM equation has a lower sensitivity in certain inland or transitional climate zones compared to the radiative term. Finally, the results of Rosa et al. [35] indicated that ETo can be estimated accurately under Brazilian climate conditions even when specific reanalysis inputs exhibit lower accuracy. Consequently, ERA5-Land products remain a viable alternative for ETo estimation in similar climates. The significant difference (p < 0.001) obtained from the t-test requires careful interpretation. Faber and Fonseca [65] stated that in studies with very large sample sizes, such as the one presented here (N > 340,000), even minimal and practically insignificant deviations are often identified as statistically significant. This phenomenon, known as the ‘large sample size effect,’ suggests that the statistical significance may not necessarily imply a lack of model reliability. Furthermore, the mean difference of 0.31 mm day−1 is relatively small for regional ETo estimations. This minor deviation can be attributed to the 0.1° spatial resolution of ERA5-Land, which may not fully capture the complex topographic features and local micro-climates of Turkey. Additionally, the performance of the reanalysis data shows seasonal fluctuations, being more robust during high-radiation summer months compared to the more variable conditions of winter and transitional periods. Overall, despite the statistical difference, the high R2 values and low mean difference confirm that ERA5-Land is a viable alternative for ETo estimation in the region.
A deeper physical interpretation of the high correlation (R2 = 0.90) achieved despite biases in U2 and RH was obtained by decomposing monthly ETo into its radiative (ETrad) and aerodynamic (ETaero) components (Figure 9). The analysis revealed that the radiative term contributed an average of 74.1% to total ETo. This contribution varies seasonally, peaking at 80.7% in May and reaching a minimum of 65.3% in December. Because ERA5-Land reproduces solar radiation and temperature, the primary drivers of ETrad, with high accuracy, the impact of aerodynamic input errors is significantly attenuated. This dominance effectively buffers the cumulative ETo estimation against the larger uncertainties found in U2 and RH modeling. These results suggest that for ETo monitoring in Mediterranean and Continental climates, ensuring the accuracy of radiative parameters is more critical than high-precision U2 data.
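The split into radiative and aerodynamic terms follows directly from the two numerator terms of the FAO-56 PM equation. The sketch below is a minimal daily-step illustration, not the code used for Figure 9: the function names are hypothetical, net radiation is taken as a given input, soil heat flux defaults to zero, and actual vapour pressure is assumed to be derived from mean RH.

```python
import math


def sat_vp(t: float) -> float:
    """Saturation vapour pressure (kPa) at air temperature t (deg C)."""
    return 0.6108 * math.exp(17.27 * t / (t + 237.3))


def pm_components(tmax, tmin, rn, u2, rh_mean, pressure_kpa, g=0.0):
    """Return (ET_rad, ET_aero) in mm day-1 from the FAO-56 PM equation.

    rn, g        : net radiation and soil heat flux (MJ m-2 day-1)
    u2           : wind speed at 2 m (m s-1)
    rh_mean      : mean relative humidity (%), used to derive actual vapour pressure
    pressure_kpa : atmospheric pressure (kPa)
    """
    tmean = (tmax + tmin) / 2.0
    delta = 4098.0 * sat_vp(tmean) / (tmean + 237.3) ** 2  # slope of vapour pressure curve
    gamma = 0.000665 * pressure_kpa                        # psychrometric constant
    es = (sat_vp(tmax) + sat_vp(tmin)) / 2.0
    ea = rh_mean / 100.0 * es                              # assumption: mean-RH formulation
    denom = delta + gamma * (1.0 + 0.34 * u2)

    et_rad = 0.408 * delta * (rn - g) / denom
    et_aero = gamma * (900.0 / (tmean + 273.0)) * u2 * (es - ea) / denom
    return et_rad, et_aero
```

The radiative share is then `et_rad / (et_rad + et_aero)`. Errors in U2 and RH enter mainly through `et_aero` (U2 also appears in the shared denominator), so when the radiative share is large their effect on total ETo is correspondingly damped.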
4.3. Evaluation of ML Models
To assess the performance of the three ML models used in this study (RF, XGBoost, and ELM), two scenarios were considered. In the first scenario, full data predictors were used. In the second, only Tmax and Tmin were used to reflect the ground stations that record only temperature.
Under the full dataset scenario, the XGBoost model achieved the best generalization performance on testing and resulted in the lowest errors overall. The RF model showed the best fit during the training phase but degraded more in testing, suggesting a tendency toward overfitting. In contrast, XGBoost presented a more favorable balance between bias and variance. The ELM model performed slightly worse than XGBoost and RF but remained robust, indicating that it can still learn most ETo variations when a full set of predictors is available, which is consistent with the ability of ELM to approximate complex nonlinear relationships using a single hidden layer.
The superior performance of XGBoost can be explained by the strong capability of gradient boosting ensembles to model nonlinear interactions among predictors and handle noisy data, while its built-in regularization helps to reduce overfitting [66]. Similar results were reported by Ge et al. [67], who found that XGBoost outperformed other ML models for ET estimation. Kaissi et al. [68] in Morocco and Lin et al. [69] in Taiwan also reported that XGBoost produced the most accurate ET estimations. Moreover, XGBoost has shown strong performance in other agricultural applications, including yield estimation in wheat [70], cotton [71], maize [72], disease detection [73], and soil moisture mapping [74].
The ML models also differed slightly in the upper range of the estimated ETo values (e.g., XGBoost up to ~12.6 mm d−1; RF up to ~12.2 mm d−1; and ELM up to ~10.2 mm d−1). These differences suggest distinct extrapolation behaviors at the tails of the ETo distribution, indicating that each model has its own strengths and weaknesses in capturing the full range of ETo values [75].
In the second scenario (reduced input), model accuracy slightly decreased but remained high. In this case, the models had to infer the effects of radiation and humidity indirectly from diurnal temperature range and seasonal patterns. The RF model, again, showed the best fit during training, while the ELM model presented the best performance in testing and was closely followed by the XGBoost and RF. This result suggests that ELM can still capture the main signal in ETo despite the reduced input setting.
While the FAO-56 PM equation is physically constrained to produce non-negative values, ML models function as unconstrained statistical regressors [76]. Consequently, when the actual ETo is near zero, the regression function may fluctuate slightly below zero due to residual noise. In the raw (unconstrained) outputs, XGBoost and ELM produced minor negative values (minimums of −0.05 and −0.60 mm day−1, respectively); these rare cases were clipped to 0 in the final reported ETo series. An analysis of the frequency of these occurrences reveals that they are rare and physically negligible. Negative predictions constituted only 0.0008% of the total test dataset for XGBoost and 0.052% for ELM.
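The clipping step and the accompanying frequency check amount to a few lines of post-processing. The helper below is a hypothetical sketch of that step; the name `clip_nonnegative` and the choice to report the share as a percentage are assumptions made for illustration.

```python
def clip_nonnegative(preds):
    """Clip negative ETo predictions to zero.

    Returns (clipped_predictions, percent_negative), where percent_negative
    is the share of raw predictions that were below zero.
    """
    clipped = [max(0.0, p) for p in preds]
    n_negative = sum(1 for p in preds if p < 0)
    percent_negative = 100.0 * n_negative / len(preds) if preds else 0.0
    return clipped, percent_negative
```

Reporting the share of clipped values alongside the final series makes it easy to verify that the constraint affects only a negligible part of the data.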
The regional performance analysis shows that geographical location and climate type have a strong effect on the accuracy of ETo estimation. Higher accuracy in Mediterranean regions is consistent with previous studies and indicates that reanalysis-based ML models work well in clear-sky and radiation-dominated conditions [77,78]. In contrast, the lower performance observed in the Black Sea region suggests that complex land–atmosphere interactions and local humidity conditions are still difficult to capture with ~9–11 km resolution datasets such as ERA5-Land [59,79].
In terms of computational efficiency, RF was the most expensive, reflecting its use of many deep trees with complex splits. ELM had intermediate computational cost due to its large hidden layers and multiple activation functions. XGBoost was the fastest of the three. These findings are consistent with the use of highly optimized and parallelized gradient boosting implementations, which make XGBoost suitable for large datasets and environments [80,81].
4.4. Seasonal Bias and Land-Atmosphere Coupling
An analysis of seasonal performance revealed a systematic positive bias in the ERA5-Land-based ETo estimates during the transition from summer to autumn (September–October). During this period, ERA5-Land tended to overestimate ETo compared to ground observations. This discrepancy highlights the impact of inaccuracies in the ERA5-Land meteorological data. While the ETo definition assumes a standardized reference surface with a constant surface resistance (70 s m−1), the uncertainties in surface variable estimations (e.g., U2 and RH) lead to deviations in the calculated evaporative demand, rather than a contradiction of the reference surface parameters themselves. The overestimation is primarily driven by the biases identified in the aerodynamic and radiative variables within the ERA5-Land dataset. ERA5-Land exhibits a systematic overestimation of U2 and a bias in RH during these months. In transitional seasons, reanalysis models often struggle to resolve local atmospheric stability and cloud dynamics, such as the increasing frequency of stratiform clouds or fog typical of the autumn season in Turkey [31,59]. Consequently, the model likely computes higher Rs and U2 than observed on the ground, leading to higher ETo values. This suggests that while ERA5-Land captures the general annual cycle, the local calibration of aerodynamic variables (U2 and RH) is crucial for accurate autumn irrigation planning. However, the integration of XGBoost as an ML layer effectively addresses these localized challenges. The superior performance of the XGBoost model is attributed to its ability to learn the complex, non-linear relationships between aerodynamic inputs and the high-quality radiative data of ERA5-Land. By assigning more weight to the dominant radiative components, the XGBoost model effectively filters out the noise from biased U2 and RH inputs. This ensures that the framework remains robust and physically grounded even during the transitional seasons where standalone reanalysis products exhibit their highest discrepancies, thereby providing a high-quality regional estimation tool.
4.5. Limitations and Future Directions
Several alternative ETo estimation approaches have been tested in the literature. Empirical models like Hargreaves–Samani or Priestley–Taylor are widely used because they require fewer input variables. However, they often need local calibration to provide accurate results in different climates. Another common approach is using remotely sensed data (e.g., MODIS or Landsat). While satellite-based methods provide excellent spatial coverage, they are frequently limited by cloud cover and lower temporal frequency compared to reanalysis products. Our approach, combining ERA5-Land with ML models, bridges these gaps by providing continuous, high-temporal-resolution data that can be adapted to data-scarce regions. Despite the promising results, this study has certain limitations. First, the 0.1° (~9–11 km) spatial resolution of ERA5-Land might still be too coarse to capture microclimatic effects in regions with extremely complex topography or sharp elevation changes. Second, the performance of the ML models is directly linked to the quality of the ERA5-Land inputs. As observed in this study’s results, variables like U2 and RH showed lower correlation with ground data, which can introduce uncertainties in ETo estimation. Future studies could explore the use of downscaling techniques or the integration of deep learning architectures like Long Short-Term Memory (LSTM) networks to better handle the temporal dynamics of meteorological variables.
5. Conclusions
This study evaluates the ML models for estimating ETo accurately across diverse climatic regions in Turkey, employing a comparative analysis between ground-based observations and ERA5-Land reanalysis data. A critical finding of this study is the necessity of rigorous data quality control, specifically the calibration of measured Rs against Rso curves, which significantly improved the reliability of ground observations, yielding high consistency across all stations with R2 ranging from 0.96 to 0.99. This step was essential for establishing a reliable starting point for the evaluation of the reanalysis data and the subsequent assessment of the ML models. The evaluation of the ERA5-Land dataset demonstrated a strong agreement with ground observations, which achieved R2 values exceeding 0.90. However, seasonal bias was identified during the transition from summer to autumn (September–October), where the reanalysis model tended to overestimate evaporative demand. In the comparative analysis of ML algorithms, XGBoost emerged as the superior model, providing the optimal balance between estimation accuracy, generalization capability, and computational efficiency. Despite the superior training performance of the RF model, it showed indications of overfitting and was computationally intensive, requiring longer training durations as a result of its complex ensemble of trees. Conversely, the ELM offered a robust alternative, particularly in scenarios limited to minimal meteorological inputs (Tmax and Tmin), where it slightly outperformed other models in testing accuracy. Ultimately, this research confirms that the integration of ERA5-Land reanalysis data with the XGBoost ML model provides a robust and high-precision framework for regional ETo estimation across diverse climates. Unlike standalone reanalysis products, this integrated approach successfully bridges the gap between coarse-resolution global data and local meteorological observations. 
Specifically, the XGBoost model acts as a non-linear error-correction tool that mitigates the systematic biases identified in ERA5-Land’s U2 and RH variables. By effectively prioritizing the highly accurate radiative signals from the reanalysis product, the XGBoost framework achieved superior performance (R2 = 0.95) and physical consistency across all evaluated climate zones. This integration proves that even in topographically complex regions, reanalysis–ML hybrid models provide a reliable alternative to ground-based observations for long-term hydrological and agricultural water management. Future research should focus on developing specific bias correction techniques for the aerodynamic variables in reanalysis data to mitigate seasonal overestimations and further refine water resource management strategies in topographically complex terrains.