Agriculture
  • Article
  • Open Access

10 December 2025

Assessing Reference Evapotranspiration Estimation Considering Irrigation Scheduling Intervals

1 Departament d’Enginyeria Industrial i Construcció, Àrea d’Enginyeria Agroforestal, Universitat de les Illes Balears, Carretera de Valldemossa km 7.5, 07122 Palma, Spain
2 Departament d’Enginyeria Rural i Agroalimentària, Universitat Politècnica de València, Camí de Vera s/n, 46022 València, Spain
3 Servicio de Regadíos, Consejería de Gestión Forestal y Mundo Rural, Junta de Extremadura, Avda. Luis Ramallo s/n, 06800 Mérida, Spain
* Author to whom correspondence should be addressed.
This article belongs to the Section Agricultural Water Management

Abstract

Accurate estimation of reference evapotranspiration (ETo) is essential for irrigation planning in semi-arid Mediterranean regions. This study evaluated temperature-based and neural network models for estimating daily ETo and its accumulated values over multiple timescales, using data from two lysimeter stations in Albacete and Badajoz, Spain. Model performance was assessed against the FAO56 PM equation and against lysimeter measurements to quantify the joint effect of benchmark choice and temporal aggregation. Under FAO56 PM benchmarking, RRMSE for temperature-based models in Albacete decreased from about 0.18 for daily ETo to around 0.08 for monthly accumulated ETo, while complex models achieved daily RRMSE near 0.06–0.07, and all models exhibited RRMSE below 0.08 at monthly and longer scales. When lysimeter ETo was used as the benchmark, errors increased and became more variable in winter, with daily RRMSE often exceeding 0.22, indicating reduced lysimeter reliability under cold, calm conditions. Overall, extending the estimation interval from daily to multi-day periods markedly reduced errors and narrowed differences among models. These results show that, in the semi-arid Mediterranean environments studied, temperature-based models can provide operationally reliable estimates at irrigation-relevant timescales, while FAO56 PM offers a more robust primary benchmark than lysimeter measurements for winter irrigation planning.

1. Introduction

Accurate estimation of crop water requirements is crucial for the design and management of irrigation systems. When considering soil water balance, crop water requirements primarily depend on estimating crop evapotranspiration (ETc). A widely adopted and straightforward method for estimating ETc, initially proposed by [1], consists of multiplying a crop-specific coefficient (Kc), which accounts for the specific physiological and morphological traits, by the reference evapotranspiration (ETo) for a grass surface [2]. ETo represents the atmospheric demand for water by integrating soil evaporation and plant transpiration from a standardised reference surface. The FAO Penman–Monteith method (FAO56 PM) defines this reference surface as a hypothetical grass crop with specific characteristics, including a height of 0.12 m, a surface albedo of 0.23, and a constant surface resistance of 70 s m−1 under standard conditions [2].
The FAO56 PM equation, regarded as the standard method for estimating ETo, requires multiple meteorological inputs, including maximum and minimum air temperatures (Tmax and Tmin), solar radiation (Rs), air humidity, and wind speed (u2). However, due to the frequent scarcity or unreliability of weather data in some regions, different simplified methods with reduced input requirements have been proposed. One of these variants is the temperature-based model proposed by the FAO, hereafter referred to as PMT, which relies primarily on temperature data, often the most commonly recorded and, in many datasets, the only available variable [3]. Another widely used temperature-based model is the Hargreaves–Samani (HS) equation [4]. This method has become popular because of its simplicity and minimal data requirements. The development of temperature-based methods for estimating ETo is supported by several reasons. First, temperature, along with Rs, accounts for most of the variability in ETo [5], and the daily temperature range can indirectly reflect other climatic factors such as humidity, cloud cover, and advection-related processes [6,7,8]. Second, temperature is the most widely monitored and easily measured meteorological variable, often recorded even in areas lacking complete weather datasets [9]. As a result, numerous simplified models using only temperature inputs have been proposed across diverse climates [10]. While many of these models aim to reproduce the results of the FAO56 PM method, they often lack its physical foundations [11]. For this reason, FAO recommends estimating any missing inputs and preserving the original formulation to maintain the integrity of the physical relationships between variables, an approach that is sometimes overlooked in practice [11].
Several studies have shown that both PMT and HS equations can provide accurate ETo estimates, with relatively low errors when compared to the FAO56 PM reference method, e.g., [3,10,12,13]. Although these alternatives improve applicability in data-limited contexts, they often require local calibration or methodological adjustments to achieve acceptable accuracy [14,15]. Calibration procedures often rely on FAO56 PM estimates, typically involving the adjustment of a single average calibration coefficient per site. One of the main limitations of these calibrated models is their lack of transferability, as the calibration remains site-specific and cannot be applied to locations without prior local parameterisation. In most cases, the HS parameters are calibrated using the entire dataset from a given station, or even from a group of stations, resulting in a single set of factors per location. In contrast, only a limited number of studies have explored the monthly calibration of HS parameters, generally focusing on deriving monthly coefficients rather than examining the monthly performance of both calibrated and non-calibrated equations to determine whether seasonal trends justify such calibration [16]. Using a single correction factor for an entire station may not adequately address systematic biases occurring at shorter temporal scales. Therefore, applying a month-specific coefficient enables the model to capture seasonal variability more effectively, improving the accuracy of ETo estimates and, consequently, crop irrigation requirements.
Alongside these empirical methods, data-driven techniques, particularly machine learning (ML) models, have become valuable alternatives for ETo estimation when input data are limited. Among these, artificial neural networks (ANN) [17], support vector machines (SVM) [18,19], gene expression programming (GEP) [20,21], and random forest (RF) [22,23], among others, have been recognised for their efficiency in ETo prediction using minimal input variables. Despite their strong predictive abilities, these methods do not inherently incorporate the physical principles underlying the FAO56 PM framework [11]. These models are trained with site-specific datasets, and their estimation accuracy typically decreases outside the training context, although it remains higher than that of temperature-based models. To simulate actual prediction conditions with limited or even absent climatic records, some studies assessed the generalisation ability of ML models using spatial k-fold validation, i.e., reserving the complete dataset of a different independent station in each iteration for testing the models, e.g., [20]. Moreover, prior to model training, preliminary techniques such as clustering stations with similar climatic characteristics may help to enhance the transferability of models across regions [24].
Another experimental method to determine ET is the use of weighing lysimeters, which estimate water loss based on the components of the soil water balance within a controlled cultivation system [25]. In cases where the crop grown on the lysimeter surface meets the standard reference conditions, the recorded evapotranspiration can be regarded as ETo [26]. However, despite their potential to provide highly accurate short-term ETo measurements, both lysimeters and other advanced experimental tools, such as eddy covariance flux towers, present several limitations. These include high costs, complex installation and maintenance, the need for sufficient fetch, and intensive data processing requirements [27]. Moreover, the limited spatial extent of typical meteorological station plots may hinder these instruments from capturing representative surface conditions [28]. As a result, direct ETo measurements are rarely available and/or fully reliable. In this context, some studies have assessed the calibration and/or validation of empirical and/or ML models against lysimeter-derived ETo values instead of the FAO56 PM reference method, e.g., [16,29]. However, it is important to note that if the lysimeter system does not maintain the FAO standard reference crop conditions, this may lead to biased conclusions regarding model performance.
As evidenced in the literature, the calibration and validation of ETo methods, such as empirical equations and ML models, have been extensively studied. However, most studies focus on achieving high accuracy in daily ETo estimates, based on the premise that precise daily irrigation doses are necessary. Nonetheless, considering that the soil can act as a buffer for water availability, it may be more realistic to require accurate estimations only for accumulated ET values over extended periods, such as a week, a common timeframe used for scheduling and adjusting irrigation practices. Some studies have assessed the reliability of ETo estimates over extended periods. For instance, although the HS equation has shown reasonable accuracy at the daily scale, Hargreaves et al. [30] noted that its performance improves when applied to 5-day or longer intervals. This is because short-term estimates tend to exhibit greater variability due to factors such as shifting weather fronts and fluctuations in wind speed and cloud cover. However, these results referred to averaged ETo values, not to accumulated ones.
This paper has two main objectives. First, it evaluates the implications of using grass-reference lysimeter observations as benchmarks for assessing ETo models, compared with the recommended practice of using FAO56 PM as the target. Second, it analyses the practical implications of daily ETo model accuracy for irrigation scheduling by comparing the performance of different ETo modelling approaches when evaluated on both daily and accumulated ETo values at trial, weekly, fortnightly, monthly, and annual intervals. Daily and accumulated ETo estimates were calculated for all models and timescales, using FAO56 PM estimates and lysimeter measurements as target values, to assess how discrepancies in daily ETo propagate over time and may affect irrigation decision making at different intervals.

2. Materials and Methods

2.1. Data Set

This study used climate records collected from two lysimetric facilities in Spain: Las Tiesas, near Albacete in the Castilla-La Mancha region (39°03′ N, 2°05′ W; 695 m above sea level), and La Orden, near Badajoz in the Extremadura region of southwestern Spain (38°51′ N, 6°40′ W; 198 m above sea level) (Figure 1). For Las Tiesas, the data cover the period 2007 to 2015, excluding 2010, while for La Orden, they span from January 2007 to December 2016, excluding 2013. Variables recorded include Tmax, mean air temperature (Tmean), Tmin, u2, maximum (RHmax) and minimum (RHmin) relative humidity, Rs, and ETo. The study period was characterised by typical climatic conditions, without abrupt deviations or anomalies. Las Tiesas is located on the Eastern Mancha plateau and is subject to a semi-arid continental climate, characterised by pronounced seasonal variation, with mean temperatures ranging from 4.6 °C in January to 24.1 °C in July and an average annual precipitation of 314 mm. Conversely, La Orden experiences a Mediterranean climate with Atlantic influence, with milder winters and warmer summers, average temperatures of 9 °C in January and 26 °C in July, and an average annual rainfall of approximately 525 mm. According to the Köppen climate classification, Las Tiesas is cold semi-arid (BSk), whereas La Orden has a hot-summer Mediterranean climate (Csa). Thus, although both stations are located in Mediterranean-type semi-arid environments, they exhibit contrasting climatic regimes in terms of thermal and pluviometric conditions. Both stations meet the criteria for reference sites. Data quality control procedures were used to identify and remove outliers. Additional climatic information is available in [29].
Figure 1. Geographical location of the meteorological stations considered in the study.

2.1.1. ‘Las Tiesas’ Agricultural Research Facility

‘Las Tiesas’ research facility is representative of the 110,000 ha of irrigated farmland that characterises the Eastern Mancha area. Climate data were collected by an agrometeorological station installed within a 1.6 ha plot of tall fescue (Festuca arundinacea Schreb., cv. “Galatea”). Air temperature at 2 m was measured with a platinum resistance thermometer (sensor Pt 100 Ω, model MP100, Campbell Scientific, Logan, UT, USA), and relative humidity at 2 m with a Rotronic hygrometer (model C-80, Rotronic AG, Bassersdorf, Switzerland). Wind speed at 2 m was recorded with a cup anemometer (Switching Anemometer, model A100R, Vector Instruments Ltd., Rhyl, North Wales, UK). Net radiation (Rn) was obtained as the difference between net shortwave and net longwave radiation. Incoming and reflected shortwave radiation were measured with two pyranometers (model CM14, Kipp & Zonen B.V., Delft, The Netherlands), and longwave radiation with a pyrgeometer (model CG2, Kipp & Zonen B.V., Delft, The Netherlands). All sensors were connected to a CR10X datalogger (Campbell Scientific, Logan, UT, USA), which sampled at intervals shorter than 10 s. These high-frequency readings were subsequently averaged to derive hourly and daily climatic variables. Further details regarding the sensors and their specifications are available in [29].
A large precision weighing lysimeter was installed within the same grass reference plot, approximately 4 m from the agrometeorological station. The grass canopy was actively growing, uniformly maintained at 0.10–0.15 m, and ensured complete ground coverage. To preserve conditions close to the FAO reference standard [2,31], the plot was regularly irrigated using a fully underground 15 × 12.5 m sprinkler system that operated overnight to avoid peak evaporative demand. Frequent mowing and consistent irrigation ensured that the soil remained near field capacity and that the vegetation was kept in optimal condition. The lysimeter tank, with dimensions of 2.3 × 2.7 × 1.7 m (6.21 m2 surface area) and weighing approximately 14.5 t, was embedded within this reference surface and managed under the same agronomic conditions as the surrounding plot to ensure representativeness. The weighing system was based on a set of beams and counterweights designed to balance the static load of the soil-filled tank, reducing the force applied to the weighing beam at a ratio of 1000:1. This force was transmitted to a steel load cell connected to a datalogger. The system was regularly calibrated with known weights, and the combined resolution of the load cell and logger allowed for the detection of mass variations as small as 0.250 kg, equivalent to 0.04 mm of water. The system recorded weight data every 15 min, based on measurements taken every second. Further technical details on the lysimeter design and operation are provided in [26,29]. Variations in lysimeter mass were used to determine ETo. Irrigation periods were scheduled exclusively during nighttime hours and were excluded from the hourly ETo calculations. Daily verification of lysimeter data was performed to detect and correct any potential anomalies. Data gaps were associated with rainfall events, instrument calibration and maintenance procedures, or soil disturbance within the lysimeter tank. 
Any records affected by such incidents were discarded and excluded from the analysis. Furthermore, quality assurance and quality control (QA/QC) protocols for lysimetric ETo measurements were implemented following the guidelines proposed by [27,32].

2.1.2. ‘La Orden’ Agricultural Research Facility

‘La Orden’ agricultural research site is situated within the central area of the irrigated agrarian zone known as Vegas Bajas del Guadiana, which spans approximately 35,000 ha across 15 km of the lower Guadiana River basin. Data were collected from a 1.3 ha plot uniformly covered with grass (Festuca arundinacea Moench) and surrounded by other irrigated crops. Throughout the measurement period, the grass was regularly irrigated and mowed to preserve conditions similar to those of the standard reference crop. As a result, the soil was maintained close to field capacity, and the grass canopy was kept at a height between 0.10 and 0.15 m. Climate variables were monitored using an automated weather station positioned above the grass surface. Air temperature and relative humidity at 1.40 m were measured with a combined probe (sensor model HMP60, Campbell Scientific, Logan, UT, USA), and wind speed at 2 m with an anemometer (model 05103, Young, Traverse City, MI, USA). Rs at 1.70 m was measured directly with a CMP3 pyranometer (Kipp & Zonen, B.V., Delft, The Netherlands) [30]. High-frequency measurements were recorded by a CR10X datalogger (Campbell Scientific, Logan, UT, USA), and hourly mean values were obtained by averaging these records. Rn was estimated from measured Rs and meteorological variables using the FAO-56 radiation equations [2]. Further technical details regarding the sensors and instrumentation are provided in [29].
The weighing lysimeter, installed 10 m from the weather station, had tank dimensions of 2.67 × 2.25 × 1.5 m (surface area of 6 m2), as described by [33]. The tank was supported by a balance mechanism equipped with a counterweight system designed to compensate for its weight. This assembly was connected to a load cell with a nominal capacity of 10 kg and a sensitivity of 2 mV V−1. Regular calibration using known weights was performed to ensure measurement accuracy. The combined sensitivity of the load cell and datalogger enabled detection of mass variations as small as 0.20 kg, equivalent to 0.033 mm of water. The sampling interval was 0.05 s, and average weight values were recorded every 5 min. Hourly ETo rates were computed from successive weight changes in the lysimeter, and daily values were obtained by summing the hourly estimates. As with the ‘Las Tiesas’ facility, irrigation was performed exclusively at night, and any measurements corresponding to irrigation periods were excluded from the analysis. Only days without disturbances, such as irrigation, rainfall, mowing, or equipment malfunction, were included in the final dataset. QA/QC procedures for lysimetric ETo measurements were also implemented.
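The resolutions quoted for both lysimeters follow from the fact that 1 kg of water spread over 1 m2 corresponds to a depth of 1 mm. The short sketch below illustrates that conversion; it assumes a water density of 1000 kg m−3, and the function name is hypothetical rather than part of the original processing chain.

```python
def mass_change_to_depth_mm(delta_mass_kg, surface_area_m2):
    """Convert a lysimeter mass change (kg) into an equivalent water depth (mm).

    Assumption: water density of 1000 kg m^-3, so 1 kg over 1 m^2 equals 1 mm.
    """
    if surface_area_m2 <= 0:
        raise ValueError("surface area must be positive")
    return delta_mass_kg / surface_area_m2


# Smallest detectable mass changes and surface areas reported in the text
las_tiesas_mm = mass_change_to_depth_mm(0.250, 6.21)  # roughly 0.04 mm
la_orden_mm = mass_change_to_depth_mm(0.20, 6.0)      # roughly 0.033 mm
```

The same conversion applies to any mass difference between two successive readings, which is how hourly ETo would be obtained from the weight record.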

2.2. Models Assessed and Input Combinations

Two input combinations with relevant differences in their estimation accuracy of daily ETo values were assessed to evaluate possible changes in their estimation accuracy for accumulated ETo values over different intervals. First, two widely used methods for estimating ETo that rely on few input variables were evaluated: the HS equation [4] and the PMT version applied with missing data [2]. Second, two feedforward neural networks with backpropagation were implemented, each using a different combination of climatic variables, namely: (i) Tmax, Tmin, and Ra (ANN3), i.e., relying on the same inputs as the HS and PMT equations; and (ii) Tmax, Tmin, RHmax, RHmin, u2, and Rs (ANN6), i.e., relying on the complete set of inputs required by the FAO56 PM reference method. Ra was calculated according to [2] based on the site’s latitude. All models were applied at a daily time step. Daily climatic variables were derived from the subhourly observations recorded at each station.
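As an illustration of how Ra can be obtained from latitude and day of year, the following sketch implements the daily extraterrestrial radiation formulation of FAO-56 [2]. The function and variable names are illustrative assumptions, and the sketch is only valid for latitudes where the sun rises and sets every day of the year.

```python
import math


def extraterrestrial_radiation(latitude_deg, day_of_year):
    """Daily extraterrestrial radiation Ra (MJ m^-2 day^-1), FAO-56 daily form."""
    gsc = 0.0820  # solar constant, MJ m^-2 min^-1
    phi = math.radians(latitude_deg)
    angle = 2.0 * math.pi * day_of_year / 365.0
    dr = 1.0 + 0.033 * math.cos(angle)        # inverse relative Earth-Sun distance
    delta = 0.409 * math.sin(angle - 1.39)    # solar declination, rad
    omega_s = math.acos(-math.tan(phi) * math.tan(delta))  # sunset hour angle, rad
    return (24.0 * 60.0 / math.pi) * gsc * dr * (
        omega_s * math.sin(phi) * math.sin(delta)
        + math.cos(phi) * math.cos(delta) * math.sin(omega_s)
    )


# Mid-July (day 196) at the latitude of Las Tiesas (39 deg 03 min N = 39.05 deg)
ra_las_tiesas = extraterrestrial_radiation(39.05, 196)
```

Because Ra depends only on latitude and date, it adds no measurement requirement to the temperature-based models.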

2.2.1. Hargreaves-Samani Equation

As described by Hargreaves et al. [4], ETo can be estimated using the following equation:
$$ ET_{o}^{HS} = AHC \cdot R_a \, (T_{mean} + 17.8) \, \sqrt{\Delta T} \tag{1} $$
where EToHS represents the estimated reference evapotranspiration (mm day−1); Ra is the extraterrestrial radiation (MJ m−2 day−1); Tmean is the mean daily air temperature (°C); ΔT is the daily temperature range (°C); and AHC is the adjusted Hargreaves coefficient. The latter is calculated as 0.0135 kRs/λ, where 0.0135 is a factor for conversion of units from the American to the International System; kRs is an empirical radiation adjustment coefficient (°C−0.5); λ is the latent heat of vaporisation (2.45 MJ kg−1). Equation (1) was developed based on the estimation of the Rs, rather than on its direct measurement, using the following approach:
$$ R_s = k_{Rs} \, R_a \, \sqrt{\Delta T} \tag{2} $$
An AHC equal to 0.0023 was proposed as a general reference value [2,4]. Further details on the development of the HS equation can be found in [30].
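A minimal sketch of the HS equation as defined above is given below. It assumes that Tmean is taken as the average of Tmax and Tmin and that kRs = 0.17, so that 0.0135 kRs is approximately 0.0023; both choices, as well as the function name, are assumptions made for illustration.

```python
import math

LAMBDA = 2.45  # latent heat of vaporisation, MJ kg^-1


def eto_hargreaves_samani(tmax, tmin, ra, k_rs=0.17):
    """Daily ETo (mm day^-1) from the HS equation.

    tmax, tmin: daily maximum and minimum air temperature (deg C)
    ra: extraterrestrial radiation (MJ m^-2 day^-1)
    k_rs: empirical radiation adjustment coefficient (assumed default 0.17)
    """
    delta_t = tmax - tmin
    if delta_t < 0:
        raise ValueError("tmax must not be lower than tmin")
    tmean = (tmax + tmin) / 2.0          # assumption: mean of the extremes
    ahc = 0.0135 * k_rs / LAMBDA         # adjusted Hargreaves coefficient
    return ahc * ra * (tmean + 17.8) * math.sqrt(delta_t)


# Example: a warm day with a 16 deg C temperature range
eto_example = eto_hargreaves_samani(tmax=28.0, tmin=12.0, ra=30.0)
```

Keeping kRs as an explicit parameter makes it clear where a locally adjusted coefficient would enter the calculation.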

2.2.2. Temperature-Based FAO56 PM Equation for Estimating Missing Data (PMT)

Allen et al. [2] developed a methodology that enables the application of the FAO56 PM equation even when relative humidity, psychrometric observations and solar radiation inputs are unavailable. This approach estimates the dew point temperature (Tdew) from either Tmin or Tmean and derives Rs based on ΔT. Vapour pressure deficit (VPD) is calculated by subtracting the actual vapour pressure (ea) from the saturation vapour pressure (es), which is typically estimated as the average of the saturation pressures at Tmax and Tmin. The value of ea can be estimated based on the mean relative humidity (RHmean) using the following expression:
$$ e_a = e_s \, \frac{RH_{mean}}{100} \tag{3} $$
In the absence of RHmean data, ea can be estimated by approximating Tdew to Tmin, as recommended in [2], using the following expression:
$$ e_a = e^{o}(T_{min}) = 0.611 \exp\left( \frac{17.27 \, T_{min}}{T_{min} + 237.3} \right) \tag{4} $$
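The vapour pressure terms used by PMT when humidity data are missing can be sketched as follows. The code applies the saturation vapour pressure function at Tmax and Tmin and approximates Tdew by Tmin, as described above; the function names are hypothetical.

```python
import math


def saturation_vapour_pressure(temp_c):
    """Saturation vapour pressure e0(T) in kPa for an air temperature in deg C."""
    return 0.611 * math.exp(17.27 * temp_c / (temp_c + 237.3))


def vapour_pressure_terms_pmt(tmax, tmin):
    """Return (es, ea, vpd) in kPa using only temperature data.

    es is the mean of e0 at Tmax and Tmin; ea assumes Tdew is close to Tmin.
    """
    es = (saturation_vapour_pressure(tmax) + saturation_vapour_pressure(tmin)) / 2.0
    ea = saturation_vapour_pressure(tmin)
    return es, ea, es - ea


es, ea, vpd = vapour_pressure_terms_pmt(tmax=30.0, tmin=15.0)
```

In arid conditions Tmin may stay above Tdew, in which case this approximation would overestimate ea and underestimate VPD.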

2.2.3. ANN Models

This study considers feedforward neural networks trained by backpropagation, a widely applied configuration, commonly known as multilayer perceptrons. Neurons were configured according to [34]. Architectures with a single hidden layer and a variable number of neurons (1–25) were considered. The hyperbolic tangent sigmoid activation function was applied in the hidden layer, and a linear transfer function was used in the output layer. The networks were implemented in MATLAB (2025 release; MathWorks, Natick, MA, USA) and trained using the Levenberg–Marquardt algorithm with early stopping based on a validation subset comprising 30% of the training data. Mean square error (MSE) was used as the performance function. The main training parameters were kept constant for all ANN configurations. They were set as follows: maximum number of epochs 1000, performance goal 0, maximum validation failures 10, minimum performance gradient 1 × 10^−6, initial μ (mu) 0.001, μ decrease factor 0.1, μ increase factor 10, maximum μ 1 × 10^10, and no time limit on training.
The application of ANN models requires defining a data partition, i.e., the training, cross-validation, and test sets. Rather than adopting a conventional hold-out strategy, which may lead to partial conclusions due to the limited representativeness of a single train-test configuration, e.g., [20], an annual k-fold validation approach was implemented for ANN3 and ANN6. This methodology involved successive data partitioning, reserving each time a different year for testing, while the remaining years were used for training. This resulted in an 8-fold validation for Las Tiesas (Albacete) and a 9-fold validation for La Orden (Badajoz), requiring 8 and 9 train–test cycles, respectively, to systematically evaluate model performance across all available yearly data patterns. Such a temporally structured validation approach ensures a robust model assessment. Although more complex methods exist, such as bootstrap or leave-one-out cross-validation, the annual-fold validation used here provides a computationally efficient and temporally consistent alternative [29]. To further ensure robustness, each network configuration was trained 50 times with different random initialisations of weights, thereby mitigating the influence of random factors and enabling a reliable assessment of model performance variability.
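The annual k-fold partition described above can be sketched as a generator that reserves one calendar year per fold. This is only an illustration of the splitting logic, not the MATLAB code used in the study; the function name is hypothetical, and the early-stopping validation subset would be drawn from the training indices afterwards.

```python
def annual_folds(years):
    """Yield (test_year, train_indices, test_indices) for a leave-one-year-out split.

    years: sequence with the calendar year of each daily record.
    """
    for test_year in sorted(set(years)):
        test_idx = [i for i, y in enumerate(years) if y == test_year]
        train_idx = [i for i, y in enumerate(years) if y != test_year]
        yield test_year, train_idx, test_idx


# Las Tiesas: 2007 to 2015 excluding 2010 (two dummy records per year)
las_tiesas_years = [y for y in range(2007, 2016) if y != 2010 for _ in range(2)]
folds = list(annual_folds(las_tiesas_years))
```

With the years available at each station, this yields 8 folds for Las Tiesas and 9 for La Orden, matching the number of train–test cycles stated above.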

2.3. Calibration Strategies

2.3.1. HS and PMT Equations

The application of the HS equation should be limited to the climatic conditions for which it has been locally calibrated, as recommended in [35]. In this context, the performance of both the original and calibrated versions of the HS equation has been extensively evaluated across a variety of climatic settings. However, providing a comprehensive review of those studies lies beyond the scope of this work. In contrast, the calibration of the PMT equation is less frequently performed, particularly when certain input variables are estimated rather than measured [16]. Nonetheless, in this study, PMT outputs were also calibrated to enable a more balanced comparison with the calibrated HS estimates.
In this study, calibration was performed by applying a linear correction that adjusted only the slope coefficient. For each station, monthly calibration was conducted by comparing the original ETo estimates with the benchmark ETo, as described in the following section. Specifically, the daily ratio of the estimated to benchmark ETo values was calculated to determine the slope adjustment for each month. The monthly average of these ratios was then computed to obtain a single calibration coefficient for each month and station. These monthly coefficients were subsequently applied to the original ETo estimates, resulting in calibrated values that more accurately reflected the seasonal variability. These calibrated versions are hereafter referred to as HS_cal and PMT_cal, respectively. Unlike the more common approach in the literature, which applies a single calibration factor per station, the use of monthly coefficients in this work allows for a more accurate correction of seasonal biases in ETo estimation [16]. Notably, only the ETo equations themselves were calibrated in this study, while no calibration was applied to any of their input variables, such as Rs or RHmean. This choice was based on several considerations. First, when certain inputs are unavailable and must be estimated using auxiliary empirical equations, the lack of observed data makes it unfeasible to calibrate parameters such as kRs. Second, consistent with this, the term non-calibrated, as used for the HS and PMT equations in this study, refers to cases in which neither the ETo equations nor the equations used to estimate missing inputs were adjusted. Third, calibrating input variables before calibrating the ETo equation itself could affect the resulting parameter values of the ETo models, potentially introducing inconsistencies. 
To ensure a fair comparison between calibrated and non-calibrated ETo estimates, including the ANN models, as well as to assess their impact on irrigation water requirements, all calibrations in this study were strictly based on ETo reference values, without any prior modification of input variables.
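The monthly slope calibration can be sketched as follows. The coefficient for each month is the mean of the daily estimated-to-benchmark ratios, as described above. How the coefficient is applied is not spelled out in the text, so dividing the original estimate by the coefficient is an assumption here, as are the function names and the exclusion of days with a non-positive benchmark.

```python
from collections import defaultdict


def monthly_coefficients(months, estimated, benchmark):
    """Mean daily ratio estimated/benchmark for each calendar month (1-12)."""
    ratios = defaultdict(list)
    for month, est, ref in zip(months, estimated, benchmark):
        if ref > 0:  # assumption: skip days where the ratio is undefined
            ratios[month].append(est / ref)
    return {month: sum(vals) / len(vals) for month, vals in ratios.items()}


def apply_calibration(months, estimated, coefficients):
    """Scale each daily estimate by the coefficient of its month (assumed division)."""
    return [est / coefficients[month] for month, est in zip(months, estimated)]


months = [1, 1, 7, 7]
estimated = [2.0, 4.0, 3.0, 6.0]
benchmark = [1.0, 2.0, 3.0, 6.0]
coefficients = monthly_coefficients(months, estimated, benchmark)
calibrated = apply_calibration(months, estimated, coefficients)
```

In this toy example the January estimates are twice the benchmark, so the January coefficient is 2 and the calibrated values match the benchmark, while July is left unchanged.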
The entire dataset was used for both calibration and evaluation of the HS and PMT equations. Because the development of calibration constants with generalisation capability lies beyond the scope of this study, it was not deemed necessary to divide the data into separate calibration and validation subsets for these equations. Independent test datasets are particularly relevant when calibrating parametric models, where the resulting equations for the calibration constants should ideally be evaluated on data not involved in their estimation, and when training more complex approaches such as ANNs. Furthermore, previous studies (e.g., [36]) have noted that although local k-fold validation may provide a more robust evaluation of calibrated estimates, the performance differences are generally minimal when a sufficiently long time series is available. This effect becomes even less significant when the evaluation period is short (e.g., on a monthly basis).

2.3.2. Monthly ANN3 Models

In parallel with the calibration procedure applied to the HS and PMT equations, the ANN3 models were also assessed on a monthly basis, thus allowing a consistent comparison across approaches. For each station, the available dataset was split by month and year, and a specific ANN3 model was trained and tested independently for each monthly subset using a k-fold validation approach in which each year within a given month served once as the test set. This resulted in 96 ANN3 models (12 × 8) in Albacete and 108 ANN3 models (12 × 9) in Badajoz, in line with the 8 and 9 years of data available at each station. The training process followed the same structure described in previous sections, with a single hidden layer, a variable number of neurons (1–25), and 50 repetitions per architecture to select the best-performing model based on the lowest RRMSE. This monthly training approach enabled the identification of optimal ANN configurations for each month and year, thereby enhancing the models’ capacity to capture intra-annual ETo variability. The resulting predictions from the best ANN3 models per month and station (hereafter referred to as ANN3_month) were then used to evaluate their performance against reference values, following the same criteria as for the HS and PMT equations.

2.4. Accumulated ETo Estimation and Assessment

In order to evaluate the practical implications of ETo estimation bearing in mind usual irrigation scheduling intervals, the accumulated values of ETo were calculated for each model, both calibrated and non-calibrated, as well as for lysimeter-based measurements and the reference FAO56 PM equation. The accumulated ETo was computed over five different time intervals: year, month, fortnight, week, and trial. These aggregation levels were selected to represent the potential decision-making periods used in irrigation planning and scheduling, ranging from broad estimations of total crop water consumption (annual) [37], and seasonal evaluations (monthly) [38], to more frequent operational intervals (fortnightly, weekly) [39], and even site-specific intensive intervals (trial-based) [40].
By integrating ETo values over these periods, it was possible to examine the extent to which model calibration, input selection, and estimation method influence cumulative water demand estimations. This step is particularly relevant for quantifying the effect of model choice on irrigation scheduling accuracy, ensuring that comparisons between approaches are not limited to daily performance but extended to the time scales most relevant to agricultural water management. The accumulated ETo for a given time interval and model was calculated as the sum of the daily ETo values within that period. To ensure consistency and comparability across models, the same number of days with available data was used for all models within each time interval. This procedure was uniformly applied across all methods and time scales, enabling a fair assessment of cumulative estimates.
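The accumulation step can be sketched as a sum of daily values per period, keeping only the days on which every series has data so that all models are accumulated over the same days. The sketch covers weekly (ISO week), monthly, and annual intervals only; fortnightly and trial-based intervals would need a definition not given here. All names are illustrative assumptions.

```python
from collections import defaultdict
from datetime import date


def period_key(day, interval):
    """Map a date to the identifier of the period it belongs to."""
    if interval == "week":
        iso = day.isocalendar()
        return (iso[0], iso[1])  # ISO year and ISO week number
    if interval == "month":
        return (day.year, day.month)
    if interval == "year":
        return (day.year,)
    raise ValueError("unsupported interval: " + interval)


def accumulate(days, series, interval):
    """Sum daily ETo per period for each series, using common valid days only.

    series: dict mapping a model name to a list of daily values (None = missing).
    """
    totals = {name: defaultdict(float) for name in series}
    for i, day in enumerate(days):
        if any(values[i] is None for values in series.values()):
            continue  # drop the day for all series to keep them comparable
        key = period_key(day, interval)
        for name, values in series.items():
            totals[name][key] += values[i]
    return {name: dict(per_period) for name, per_period in totals.items()}


days = [date(2014, 7, 1), date(2014, 7, 2), date(2014, 7, 3)]
series = {"PM": [5.0, None, 6.0], "HS": [5.5, 6.0, 6.5]}
monthly = accumulate(days, series, "month")
```

In this example the second day is dropped for both series because the PM value is missing, so the monthly totals are built from the same two days.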

2.5. Benchmarking

The FAO56 PM equation was adopted as the reference method for model calibration and evaluation, given its widespread recommendation as the standard approach for estimating ETo and validating empirical methods in the absence of direct measurements [2]. This equation is an adaptation of the original Penman-Monteith formulation for a standardised reference crop, clipped grass, 0.12 m in height, under fixed values of surface resistance, aerodynamic resistance, and albedo, as well as constant air density and latent heat of vaporisation [41]. The daily ETo, expressed in mm day−1, is computed as follows:
$$ET_{o,\mathrm{PM}} = \frac{0.408\,\Delta\,(R_n - G) + \gamma\,\dfrac{900}{T_{mean} + 273}\,u_2\,(e_s - e_a)}{\Delta + \gamma\,(1 + 0.34\,u_2)}$$
where Rn is the net radiation at the crop surface (MJ m−2 day−1); G is the soil heat flux density (MJ m−2 day−1); Tmean is the mean daily air temperature at 2 m height (°C); γ is the psychrometric constant (kPa °C−1); Δ is the slope of vapor pressure curve (kPa °C−1); es is the saturation vapor pressure (kPa); ea is the actual vapor pressure (kPa); and u2 is the wind speed at 2 m height (m s−1). For the daily time step considered in this study, soil heat flux was assumed to be negligible and therefore set to zero (G ≈ 0) in all FAO56 PM calculations, and all required parameters were computed using the equations proposed in FAO-56 guidelines [2].
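As an illustration of the daily calculation, the following sketch evaluates the equation with G = 0. It is not the authors' implementation: net radiation is taken as a given input rather than derived from sunshine or solar radiation data, actual vapour pressure is derived from maximum and minimum relative humidity (one of several options in the FAO-56 guidelines), and the function names and input values are hypothetical.

```python
import math


def sat_vapour_pressure(t_c):
    """Saturation vapour pressure (kPa) at air temperature t_c (degrees C)."""
    return 0.6108 * math.exp(17.27 * t_c / (t_c + 237.3))


def eto_fao56_pm(t_max, t_min, rh_max, rh_min, u2, rn, elevation_m, g=0.0):
    """Daily FAO56 PM reference evapotranspiration (mm day-1).

    rn and g are in MJ m-2 day-1, u2 in m s-1, temperatures in degrees C,
    relative humidity in percent. g defaults to zero for daily time steps.
    """
    t_mean = (t_max + t_min) / 2.0
    # Slope of the saturation vapour pressure curve (kPa per degree C).
    delta = 4098.0 * sat_vapour_pressure(t_mean) / (t_mean + 237.3) ** 2
    # Atmospheric pressure (kPa) from elevation, then psychrometric constant.
    pressure = 101.3 * ((293.0 - 0.0065 * elevation_m) / 293.0) ** 5.26
    gamma = 0.665e-3 * pressure
    # Saturation and actual vapour pressure (kPa).
    e_s = (sat_vapour_pressure(t_max) + sat_vapour_pressure(t_min)) / 2.0
    e_a = (sat_vapour_pressure(t_min) * rh_max
           + sat_vapour_pressure(t_max) * rh_min) / 200.0
    numerator = (0.408 * delta * (rn - g)
                 + gamma * 900.0 / (t_mean + 273.0) * u2 * (e_s - e_a))
    denominator = delta + gamma * (1.0 + 0.34 * u2)
    return numerator / denominator


# Hypothetical inputs for a single day.
eto = eto_fao56_pm(t_max=21.5, t_min=12.3, rh_max=84.0, rh_min=63.0,
                   u2=2.078, rn=13.28, elevation_m=100.0)
```

Keeping the radiation term as an input separates the combination equation itself from the choice of radiation sub-model, which depends on the sensors available at each station.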
To assess the effect of the benchmarks, all model results were compared with both lysimeter data and the reference FAO56 PM estimations. The FAO56 PM, widely recognised as the standard method for estimating ETo, served as the primary reference for calibration and validation, according to [2], while lysimeter-based ETo values were evaluated as an alternative baseline for model assessment and irrigation water requirement estimation. Thus, both FAO56 PM ETo values and lysimeter measurements were used as targets for model training and testing.
Finally, a seasonal breakdown of the results was conducted to identify potential patterns across different seasons, given that ETo values exhibit substantial fluctuations between winter and summer. This analysis aimed to determine whether model performance and calibration accuracy varied across seasons.

2.6. Performance Assessment

Several error metrics were computed to evaluate the predictive accuracy of the methods under study, following the approach described in [42]. Specifically, the mean absolute error (MAE), mean bias error (MBE), and relative root mean squared error (RRMSE) were calculated as expressed in Equations (6)–(8). In these expressions, xi denotes the observed ETo value, x̂i its predicted counterpart, and x̄ the mean of the observed values, while n represents the total number of observations within the ETo dataset. The RRMSE is a dimensionless indicator, whereas MAE and MBE are expressed in units of mm per unit of time interval (e.g., mm day−1).
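$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|x_i - \hat{x}_i\right| \tag{6}$$

$$\mathrm{MBE} = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{x}_i - x_i\right) \tag{7}$$

$$\mathrm{RRMSE} = \frac{1}{\bar{x}}\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \hat{x}_i\right)^2} \tag{8}$$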
Additionally, the coefficient of determination (R2) was computed to evaluate the degree of correlation between observed and predicted ETo values, as the square of their covariance divided by the product of their respective standard deviations.
$$R^2 = \left(\frac{\mathrm{cov}\left(x_i, \hat{x}_i\right)}{\sigma_{x_i}\,\sigma_{\hat{x}_i}}\right)^2$$
The error metrics described were computed for both Albacete and Badajoz stations, considering daily and accumulated periods of ETo estimates derived from the HS and PMT equations and the ANN models, in both calibrated and non-calibrated forms. In all cases, the reference values used for comparison corresponded to the benchmarks defined in the previous section.
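The four indicators can be written compactly as below. This is a minimal sketch of the expressions given in this section, with hypothetical function names and invented sample values; population (1/n) covariance and standard deviations are assumed for R2, which does not affect the result because the normalisation cancels.

```python
import math


def mae(obs, pred):
    """Mean absolute error, in the units of the inputs."""
    return sum(abs(x - xh) for x, xh in zip(obs, pred)) / len(obs)


def mbe(obs, pred):
    """Mean bias error; positive values indicate overestimation."""
    return sum(xh - x for x, xh in zip(obs, pred)) / len(obs)


def rrmse(obs, pred):
    """Root mean squared error divided by the mean observed value."""
    n = len(obs)
    rmse = math.sqrt(sum((x - xh) ** 2 for x, xh in zip(obs, pred)) / n)
    return rmse / (sum(obs) / n)


def r2(obs, pred):
    """Squared correlation between observed and predicted values."""
    n = len(obs)
    mean_x, mean_p = sum(obs) / n, sum(pred) / n
    cov = sum((x - mean_x) * (p - mean_p) for x, p in zip(obs, pred)) / n
    var_x = sum((x - mean_x) ** 2 for x in obs) / n
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    return cov ** 2 / (var_x * var_p)


# Invented example: benchmark and estimated ETo (mm day-1) for three days.
benchmark = [2.0, 4.0, 6.0]
estimate = [2.5, 3.5, 6.5]
scores = {
    "MAE": mae(benchmark, estimate),
    "MBE": mbe(benchmark, estimate),
    "RRMSE": rrmse(benchmark, estimate),
    "R2": r2(benchmark, estimate),
}
```

The same functions apply unchanged to accumulated values: passing weekly or monthly totals instead of daily values gives the indicators at that timescale, with MAE and MBE then expressed in mm per interval.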

3. Results

3.1. Model Performance Assessment Against FAO56 PM Targets

Table 1 presents the global performance indicators of the considered models, divided by timescale, at Albacete and Badajoz, using FAO56 PM estimations as benchmarks. In Albacete, all models showed a gradual decrease in RRMSE as the accumulation interval increased, indicating, as expected, that model accuracy improved when ETo values were aggregated over longer timescales. For the HS model, RRMSE values ranged from 0.179 for daily ETo to 0.025 for annual estimates, with 0.140 for trial, 0.116 for weekly, 0.095 for fortnightly, and 0.076 for monthly. The reduction between daily and weekly scales was 6.3%, followed by 4.0% from weekly to monthly, and 5.1% from monthly to annual, resulting in a total absolute decrease of approximately 15.4% across the entire range. The corresponding MAE values ranged from 0.57 mm day−1 to 18.45 mm year−1, with intermediate values of 1.07 mm (trial), 1.77 mm (week), 2.96 mm (fortnight), and 4.81 mm (month). Similarly, the PMT model displayed a very comparable pattern, with RRMSE values of 0.183 (daily), 0.134 (trial), 0.118 (week), 0.099 (fortnight), 0.084 (month), and 0.032 (annual). The decrease was more gradual, amounting to 6.5% from daily to weekly intervals, 3.5% from weekly to monthly intervals, and 5.2% from monthly to annual intervals, for a total reduction of roughly 15.1%. MAE values ranged from 0.59 mm day−1 to 25.68 mm year−1, with intermediate figures of 1.03 mm (trial), 1.87 mm (week), 3.19 mm (fortnight), and 5.30 mm (month). Conversely, the ANN6 model mainly differed in the magnitude of errors, which were substantially smaller across all timescales. RRMSE decreased from 0.070 at the daily scale to 0.063 (trial), 0.057 (week), 0.052 (fortnight), 0.048 (month), and 0.028 (annual). The reduction was therefore 1.3% from daily to weekly, 0.9% from weekly to monthly, and 2.0% from monthly to annual, leading to a total absolute decrease of 4.2%. 
The corresponding MAE values ranged from 0.20 mm day−1 to 22.65 mm year−1, with intermediate values of 0.43 mm (trial), 0.80 mm (week), 1.48 mm (fortnight), and 2.69 mm (month). Overall, all models showed a consistent pattern of decreasing error as the accumulation interval increased, indicating that daily discrepancies tend to diminish when ETo is aggregated over multi-day or seasonal periods. Thus, even for the temperature-based models, the relative discrepancy with FAO56 PM becomes markedly smaller at weekly and longer timescales, while ANN6 gains less from aggregation because its daily agreement with FAO56 PM is already high.
Table 1. Average performance indicators (MAE, RRMSE, MBE, R2) of calibrated and non-calibrated HS, PMT, and ANN models for estimating daily and accumulated reference evapotranspiration at different timescales in Albacete and Badajoz, using FAO56 PM targets.
When comparing the models, differences were substantial at the daily scale (RRMSE = 0.179 and 0.183 for HS and PMT, respectively, versus 0.070 for ANN6), indicating that ANN6 achieved higher accuracy on this scale. However, at weekly and fortnightly intervals, the gap between temperature-based and data-driven models narrowed considerably (RRMSE approximately 0.10–0.12 for HS/PMT versus 0.05–0.06 for ANN6). At monthly to yearly scales, the differences became minimal, with all models showing RRMSE ≤ 0.08 and converging to values near 0.03–0.04 at the most extended periods. The calibrated versions of HS and PMT (HS_cal and PMT_cal) displayed the same decreasing trend, with RRMSE values ranging from 0.174 to 0.023 and from 0.176 to 0.024, respectively, across the daily-to-yearly spectrum. ANN3 and ANN3_month produced intermediate results, with RRMSE values ranging from 0.164 to 0.023 and from 0.166 to 0.022, respectively, consistent with the general pattern seen in the other models. Slight differences between calibrated and non-calibrated configurations suggest that variations due to calibration or ANN structure were minor compared to the effect of the accumulation interval. Furthermore, the highest R2 values were observed for monthly accumulated ETo across all models considered. These patterns reveal that model choice has a greater impact when daily ETo is assessed, whereas differences between models diminish as ETo is aggregated over weekly or more extended periods. The limited effect of calibration also implies that, under FAO56 PM benchmarking, the primary factor driving performance improvements is the timescale of ETo aggregation rather than detailed adjustments of model parameters.
In Badajoz, similar results were observed. All models exhibited a steady decline in RRMSE with increasing accumulation intervals, following the same general trend observed in Albacete. For HS, RRMSE values were 0.155 for daily ETo, 0.132 for trial, 0.113 for weekly, 0.101 for fortnightly, 0.093 for monthly, and 0.070 for annual estimations. The decrease was 4.2% from daily to weekly intervals, 2.0% from weekly to monthly, and 2.3% from monthly to annual, amounting to an overall reduction of 8.5%. Corresponding MAE values ranged from 0.61 mm day−1 to 65.25 mm year−1. PMT followed a very similar pattern, with RRMSE values of 0.168 (daily), 0.133 (trial), 0.126 (weekly), 0.112 (fortnight), 0.103 (month), and 0.066 (annual), with an overall absolute reduction of 10.2%. MAE ranged from 0.67 mm day−1 to 63.64 mm year−1. Conversely, ANN6 maintained significantly lower errors, with RRMSE of 0.057 (daily), 0.052 (trial), 0.047 (weekly), 0.044 (fortnight), 0.042 (month), and 0.029 (annual). The reductions were 1.0% from daily to weekly, 0.5% from weekly to monthly, and 1.3% from monthly to annual, totalling 2.8% across the day-to-year spectrum. MAE values ranged from 0.19 mm day−1 to 25.53 mm year−1, with intermediate values of 0.43 mm (trial) and 0.86 mm (week).
Across models, differences remained evident at the daily scale (RRMSE = 0.155 and 0.168 for HS and PMT, respectively, versus 0.057 for ANN6), but, as in Albacete, these differences significantly diminished with increasing accumulation intervals. At weekly and fortnightly scales, RRMSE values were 0.11–0.13 for HS and PMT, compared to 0.05–0.06 for ANN6, while over monthly to annual intervals, all models converged to values below 0.08, with differences less than 0.03. HS_cal and PMT_cal showed nearly identical patterns, with RRMSE values ranging from 0.149 to 0.057 and from 0.159 to 0.059, respectively, across daily to annual scales. Similarly, ANN3 and ANN3_month produced intermediate results (0.126–0.050 and 0.124–0.046), displaying performance trends similar to those observed in Albacete. The fact that these patterns closely mirror those observed in Albacete suggests that the decrease in error with increasing accumulation interval is consistent across the two semi-arid Mediterranean locations analysed.

3.2. Performance of Models Against Lysimeter Targets

Table 2 presents the global performance indicators of the considered models, split by timescale, for Albacete and Badajoz, using lysimeter measurements as the benchmark. In Albacete, all models exhibited higher RRMSE values when evaluated against lysimeter data than against FAO56 PM, yet the gradual improvement with more extended accumulation periods remained evident. For HS, RRMSE values dropped from 0.221 at the daily scale to 0.179 (trial), 0.155 (weekly), 0.132 (fortnightly), 0.115 (monthly), and 0.054 (annual). The reduction between daily and weekly periods was 6.6%, followed by 4.0% from weekly to monthly and 6.1% from monthly to annual, giving an overall decrease of 16.6%. MAE values ranged from 0.71 mm day−1 to 44.78 mm year−1. The PMT model followed a comparable progression, with RRMSE values of 0.224 (daily), 0.167 (trial), 0.154 (weekly), 0.133 (fortnight), 0.118 (month), and 0.050 (annual). The total reduction reached 17.4%, and MAE varied between 0.73 mm day−1 and 40.69 mm year−1. Conversely, ANN6 produced considerably lower errors at all timescales, with RRMSE decreasing from 0.144 (daily) to 0.115 (trial), 0.100 (week), 0.084 (fortnight), 0.076 (month), and 0.038 (annual). The corresponding decline was 10.6%, and MAE ranged from 0.46 mm day−1 to 30.68 mm year−1. Thus, although all models exhibited larger errors against lysimeter ETo than against FAO56 PM, the progressive reduction in RRMSE with increasing accumulation interval remained evident, indicating that daily discrepancies were also attenuated when ETo was aggregated over multi-day or seasonal periods under lysimeter benchmarking.
Table 2. Average performance indicators (MAE, RRMSE, MBE, R2) of calibrated and non-calibrated HS, PMT, and ANN models for estimating daily and accumulated reference evapotranspiration at different timescales in Albacete and Badajoz, using lysimeter measurements as targets.
Regarding differences among models, the most considerable contrasts occurred at the daily scale (RRMSE of 0.221 and 0.224 for HS and PMT vs. 0.144 for ANN6), but decreased notably as the period lengthened. At weekly and fortnightly resolutions, RRMSE values ranged from 0.13 to 0.15 for HS and PMT and approximately 0.10 for ANN6, while at monthly and annual levels, all models presented values below 0.08. HS_cal and PMT_cal exhibited a very similar evolution (RRMSE of 0.213–0.045 and 0.215–0.047, respectively), whereas ANN3 and ANN3_month showed intermediate results (0.204–0.045 and 0.207–0.046).
In the case of Badajoz, the same overall pattern was observed. All models yielded higher RRMSE values under lysimeter benchmarking, although the decrease with increasing aggregation remained consistent. HS showed RRMSE values of 0.255 (daily), 0.207 (trial), 0.174 (weekly), 0.154 (fortnight), 0.131 (month), and 0.070 (annual). The reduction from daily to weekly was 8.1%, followed by 4.3% from weekly to monthly and 6.1% from monthly to annual, resulting in an overall decrease of 18.5%. MAE values varied between 0.86 mm day−1 and 58.43 mm year−1. PMT behaved in parallel, with RRMSE values of 0.266 (daily), 0.190 (trial), 0.189 (weekly), 0.168 (fortnight), 0.144 (month), and 0.078 (annual), corresponding to a total reduction of 18.8%. MAE ranged from 0.92 mm day−1 to 66.10 mm year−1. Unlike under FAO56 PM benchmarking, ANN6 showed errors of a similar magnitude to those of HS and PMT, with RRMSE values of 0.258 (daily), 0.194 (trial), 0.164 (weekly), 0.151 (fortnight), 0.130 (month), and 0.086 (annual). The total decrease was 17.2%, and MAE ranged from 0.90 mm day−1 to 74.74 mm year−1.
Between models, the most pronounced differences were again found at the daily and weekly levels (RRMSE ≈ 0.22–0.27 for HS/PMT vs. 0.26 for ANN6), whereas from fortnightly to annual periods, all models performed more similarly, with RRMSE ≤ 0.15 and converging towards 0.07–0.09 at the most extended intervals. HS_cal and PMT_cal maintained the same decreasing tendency (RRMSE = 0.243–0.048 and 0.248–0.050, respectively), while ANN3 and ANN3_month remained intermediate (0.240–0.047 and 0.237–0.046). Overall, slight differences were observed between calibrated and non-calibrated versions, and RRMSE variability among models diminished progressively with the length of the accumulation period. Compared with the FAO56 PM benchmark (Table 1), these daily and weekly RRMSE values show that ANN6 no longer exhibits a clear advantage over HS and PMT when the lysimeter is used as the reference, particularly in Badajoz. In addition, R2 values were slightly higher in Albacete (up to 0.982) than in Badajoz (up to 0.940), confirming a more stable agreement between estimated and benchmark ETo in the former site and suggesting that lysimeter-based targets were more affected by local variability at Badajoz.

3.3. Seasonal Variation of Model Performance Under FAO56 PM and Lysimeter Benchmarks

Figure 2 and Figure 3 show the weekly evolution of RRMSE for daily ETo and weekly accumulated ETo (ETo-AcW) in Albacete and Badajoz, respectively, using FAO56 PM estimations as the benchmark.
Figure 2. Weekly variation of RRMSE values for daily ETo and weekly accumulated ETo (ETo-AcW) estimated by HS, PMT, ANN3_month, and ANN6 models in Las Tiesas (Albacete), using FAO56 PM targets.
Figure 3. Weekly variation of RRMSE values for daily ETo and weekly accumulated ETo (ETo-AcW) estimated by HS, PMT, ANN3_month, and ANN6 models in La Orden (Badajoz), using FAO56 PM targets.
As observed, model accuracy varied throughout the year, with lower performance during the cooler weeks (November–January) and higher accuracy during the warmer period (June–August). In Albacete, daily ETo RRMSE values for ANN6 ranged from 0.035 (week 26) to 0.180 (week 52), while in Badajoz, they ranged from 0.024 (week 30) to 0.174 (week 5). For ETo-AcW, the corresponding ranges were 0.025 (week 26)–0.159 (week 52) in Albacete and 0.021 (week 30)–0.146 (week 49) in Badajoz. Among the temperature-based models, RRMSE values for ETo-AcW ranged from 0.040 (week 34) to 0.483 (week 48) for HS, from 0.041 (week 25) to 0.635 (week 50) for PMT, and from 0.043 (week 25) to 0.594 (week 51) for ANN3_month in Albacete. In Badajoz, the ranges were 0.066 (week 26)–0.402 (week 49) for HS, 0.071 (week 35)–0.613 (week 48) for PMT, and 0.031 (week 33)–0.594 (week 49) for ANN3_month. The smallest difference between daily and weekly RRMSE occurred around mid-summer (weeks 25–30), while the largest occurred in late autumn (weeks 48–52). Overall, the seasonal pattern was more pronounced for daily ETo than for ETo-AcW, indicating that model performance is strongly modulated by the annual cycle of atmospheric demand, with more stable behaviour during the peak evaporative season and enhanced sensitivity to synoptic variability during the colder weeks.
Figure 4 and Figure 5 show the same analysis using lysimeter measurements as benchmarks. In Albacete, daily ETo RRMSE values presented a minimum of approximately 0.15 for both HS (week 24) and PMT (week 22) models, either calibrated or not. Maximum errors were 0.63 (week 4) for HS, 0.83 (week 50) for PMT, 0.59 (week 49) for ANN3_month, and 0.60 (week 4) for ANN6. The corresponding RRMSE ranges were 0.13 (week 28)–0.59 (week 49) for ANN3_month and 0.09 (week 36)–0.60 (week 4) for ANN6. For ETo-AcW, the RRMSE ranges were 0.04 (week 25)–0.99 (week 50) for HS_cal, 0.04 (week 6)–0.64 (week 4) for PMT, 0.06 (week 24)–0.48 (week 7) for ANN3_month, and 0.04 (week 32)–0.54 (week 4) for ANN6. In Badajoz, daily ETo errors ranged from 0.16 (week 32) to 0.77 (week 49) for HS, 0.16 (week 32) to 0.73 (week 47) for PMT, 0.15 (week 32) to 0.68 (week 48) for ANN3_month, and 0.14 (week 32) to 0.58 (week 48) for ANN6. For ETo-AcW, they ranged from 0.06 (week 29) to 0.55 (week 12) for HS, 0.06 (week 29) to 0.97 (week 49) for PMT, 0.07 (week 18) to 0.43 (week 6) for ANN3_month, and 0.09 (week 28) to 0.53 (week 6) for ANN6. As in the FAO56 PM-based analysis, weekly accumulated estimates (ETo-AcW) were consistently more accurate than daily estimates, except for certain winter weeks, when missing or sparse data led to occasional deviations. In some winter weeks, RRMSE peaks exceeded 1.00 for ETo-AcW, reaching 1.09 in Albacete (HS_cal) and 1.34 in Badajoz (ANN3_month). Occasional peaks in daily ETo were also observed, with maximum values of 2.94 for ANN3_month in Badajoz. These peaks occurred during periods of very low reference ETo and reduced data availability, when small absolute discrepancies translate into large relative errors.
Overall, seasonal RRMSE patterns were consistent across sites and benchmarks, with higher variability and larger errors observed under the lysimeter reference than under FAO56 PM, particularly during the coldest part of the year, while differences among models diminished during summer.
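The sensitivity of RRMSE to low winter ETo can be illustrated with a small numerical example. The values below are invented for illustration and are not taken from the study: the same absolute overestimation of 0.3 mm day−1 is applied to a hypothetical summer week and a hypothetical winter week.

```python
import math


def rrmse(obs, pred):
    """Root mean squared error divided by the mean observed value."""
    n = len(obs)
    rmse = math.sqrt(sum((x - xh) ** 2 for x, xh in zip(obs, pred)) / n)
    return rmse / (sum(obs) / n)


# Hypothetical benchmark ETo (mm day-1) for one summer and one winter week.
summer_obs = [6.0] * 7
winter_obs = [1.0] * 7

# Identical absolute error of 0.3 mm day-1 in both seasons.
summer_pred = [x + 0.3 for x in summer_obs]
winter_pred = [x + 0.3 for x in winter_obs]

summer_rrmse = rrmse(summer_obs, summer_pred)  # about 0.05
winter_rrmse = rrmse(winter_obs, winter_pred)  # about 0.30
```

The sixfold difference arises only from the denominator, which is why weekly RRMSE values in the coldest part of the year should be read together with absolute indicators such as MAE.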
Figure 4. Seasonal variation of RRMSE values for daily ETo and weekly accumulated ETo (ETo-AcW) estimated by HS, PMT, ANN3_month, and ANN6 models in Las Tiesas, considering lysimeter measurements as targets.
Figure 5. Seasonal variation of RRMSE values for daily ETo and weekly accumulated ETo (ETo-AcW) estimated by HS, PMT, ANN3_month, and ANN6 models in La Orden, considering lysimeter measurements as targets.
In summary, across both sites and benchmarks, all models exhibited a consistent reduction in RRMSE as the accumulation interval increased from daily to trial, weekly, fortnightly, monthly, and annual scales (Table 1 and Table 2). Under FAO56 PM benchmarking, daily RRMSE ranged approximately between 0.06 and 0.18, whereas at monthly and annual scales, it was below 0.08 for all models at both sites, with ANN6 showing the lowest errors throughout. When lysimeter measurements were used as benchmarks, daily RRMSE values increased to approximately 0.14–0.27, while weekly values remained below about 0.18, and monthly and annual RRMSE rarely exceeded 0.15. The seasonal analysis (Figure 2, Figure 3, Figure 4 and Figure 5) further showed that errors were largest and most variable during the coldest weeks of the year and smallest during summer, and that ETo-AcW generally presented lower RRMSE than daily ETo. Differences between temperature-based and ANN models were pronounced at the daily scale but became progressively smaller at weekly and longer intervals, so that at monthly and annual scales, the performance of all models converged within relatively narrow RRMSE ranges under both benchmarks.

4. Discussion

Our results show that model accuracy and dispersion depend strongly on the chosen benchmark. Errors were systematically higher and more variable when lysimeter data were used, particularly during the coldest weeks of the year, whereas FAO56 PM produced lower and more stable values. This divergence suggests that the use of lysimeter measurements as a benchmark for assessing ETo may reflect the influence of local environmental and operational factors. During winter, lysimeters can be affected by mechanical and microclimatic disturbances, such as frost, condensation, or the oasis and clothesline effects, which may alter the energy balance and induce anomalous fluxes. Their accuracy may also be influenced by restricted fetch and drainage dynamics, which are negligible under summer conditions but become significant in cold, calm periods [43,44]. Consequently, apparent model errors tend to increase when the lysimeter is used as the target, not necessarily because models perform worse, but because the benchmark itself becomes less representative of the standardised reference surface described in FAO56. These findings agree with [45]. From a benchmarking perspective, our results indicate that, under the semi-arid Mediterranean conditions analysed, FAO56 PM behaves as a physically consistent and operationally robust reference. In contrast, lysimeter-based ETo should be interpreted as an experimental benchmark whose representativeness is more sensitive to local conditions. Nevertheless, lysimeters remain a valuable tool for directly measuring evapotranspiration under real conditions, especially during well-controlled periods or for model calibration purposes. During the winter season, when water-demanding crops such as barley, wheat, rapeseed, or leafy vegetables are commonly cultivated, special care should be taken when interpreting lysimeter data for irrigation planning.
When lysimeter data were used, the period of increased model error extended roughly from weeks 1–12 and 40–52, in line with previous findings [16]. This broader high-error window indicates that non-calibrated models appeared to perform worse over a larger portion of the year when lysimeter data were used as the benchmark. However, this apparent degradation mainly reflects the greater sensitivity of lysimeter measurements under cold conditions, where limited reference availability and minor measurement errors disproportionately increase the overall error, particularly in Badajoz, where such conditions are more frequent. Thus, discrepancies between benchmarks are largely seasonal. FAO56 PM remains physically consistent throughout the year, while lysimeter measurements introduce additional variability in winter due to both environmental and operational factors. As a result, lysimeter observations can accentuate models’ tendency to under- or overestimate irrigation requirements, as reflected in higher MBE values. By contrast, FAO56 PM remains the widely recommended reference for ETo estimation, since it is based on well-established physical principles and can be applied reliably whenever the required climatic data are available, unlike the constraints associated with lysimeter measurements, as also noted in [46]. Therefore, in the context of semi-arid Mediterranean climates similar to those of Albacete and Badajoz, FAO56 PM appears to be a more suitable primary benchmark for model evaluation and ETo estimation. At the same time, lysimeter observations provide complementary experimental evidence, particularly useful for detecting departures between physically based estimates and actual evapotranspiration under specific site conditions.
Regarding the effect of the irrigation interval on ETo accuracy, model errors decreased consistently with longer accumulation periods across all benchmarks and stations, confirming that temporal aggregation smooths daily discrepancies. Extending the estimation period from daily to multi-day or weekly irrigation intervals resulted in substantial reductions in RRMSE across all models, particularly for HS and PMT, which exhibited the largest daily errors. This behaviour reflects a compensation effect among day-to-day deviations, which tend to cancel each other out when aggregated, provided that positive and negative errors are approximately balanced over time. Therefore, differences among models progressively diminished at longer timescales, and by monthly or annual intervals, their performance became nearly equivalent. Similar findings on the influence of timescales were reported by [16], although their indicators were derived from daily estimates over a given period rather than from accumulated ETo values. It is important to note that this compensation mainly affects the non-systematic component of the error. Systematic biases are only partially reduced and may still be reflected in MAE and MBE at longer timescales. Nevertheless, the systematic decrease in RRMSE and the relatively small MBE values observed at trial, weekly and fortnightly scales suggest that, under the semi-arid Mediterranean conditions considered here, a substantial fraction of daily errors tends to compensate in cumulative ETo. This mechanism is likely to operate in other environments where daily deviations fluctuate around zero and irrigation is scheduled over multi-day intervals. However, its magnitude should always be verified with local data. Conversely, in sites where simpler methods exhibit a persistent bias of the same sign, the scope for error compensation would be much more limited, and a marked reduction in error differences with aggregation should not be expected.
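The compensation mechanism can be illustrated with a synthetic experiment. The sketch below is not based on the study data: it assumes a constant benchmark of 4 mm day−1, independent zero-mean Gaussian daily errors with a standard deviation of 0.6 mm in one case, and a constant bias of +0.6 mm day−1 in the other. All names and values are hypothetical.

```python
import math
import random


def rrmse(obs, pred):
    """Root mean squared error divided by the mean observed value."""
    n = len(obs)
    rmse = math.sqrt(sum((x - xh) ** 2 for x, xh in zip(obs, pred)) / n)
    return rmse / (sum(obs) / n)


def weekly_sums(values):
    """Accumulate consecutive 7-day blocks, dropping any incomplete block."""
    usable = len(values) - len(values) % 7
    return [sum(values[i:i + 7]) for i in range(0, usable, 7)]


random.seed(0)  # fixed seed so the synthetic series is reproducible
n_days = 364
benchmark = [4.0] * n_days

# Case 1: random daily errors fluctuating around zero.
noisy = [x + random.gauss(0.0, 0.6) for x in benchmark]
# Case 2: persistent overestimation of the same sign every day.
biased = [x + 0.6 for x in benchmark]

daily_noisy = rrmse(benchmark, noisy)
weekly_noisy = rrmse(weekly_sums(benchmark), weekly_sums(noisy))
daily_biased = rrmse(benchmark, biased)
weekly_biased = rrmse(weekly_sums(benchmark), weekly_sums(biased))
```

With independent zero-mean errors, the error of a 7-day sum grows roughly with the square root of 7 while the accumulated ETo grows by a factor of 7, so the weekly RRMSE is well below the daily one. With a constant bias, both grow by the same factor and the RRMSE is unchanged by aggregation, which corresponds to the limited compensation expected where a method over- or underestimates persistently.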
The practical implication is that trial, weekly, or fortnightly accumulated ETo estimations provide a more realistic assessment framework for irrigation scheduling than daily values. Farmers usually plan irrigation in multi-day intervals, and the soil water buffer further mitigates daily fluctuations in evapotranspiration [11]. Therefore, evaluating model performance exclusively at the daily scale can exaggerate differences that are not operationally significant, especially when irrigation decisions are based on cumulative water requirements. In this context, temperature-based models applied in this study might provide sufficiently accurate results when data availability is limited, while the more complex ANN formulations, such as ANN6, confirm that, under complete datasets, physically based estimates can be closely reproduced without substantially improving at longer intervals. In this sense, ANN6 was not proposed as an operational model but instead used as a complex reference to contextualise the performance of temperature-based approaches. Since it relies on the same meteorological inputs as FAO56 PM, its practical relevance is limited, as the reference equation can be directly computed when full datasets are available. Overall, these findings confirm that benchmark selection and temporal aggregation are crucial sources of variability in ETo model performance. At the same time, differences among modelling approaches are significantly reduced at operational timescales relevant to irrigation management, so that the principal added value of this study lies in quantifying how benchmark choice and aggregation interval affect the apparent adequacy of well-established empirical and ANN-based methods under semi-arid Mediterranean conditions.
The ability of temperature-based methods and reduced-input ANN configurations to provide reasonable accuracy in the study area can be partly explained by the climatic setting. In semi-arid Mediterranean climates, key drivers of ETo, such as net radiation, vapour pressure deficit, and aerodynamic demand, exhibit strong seasonal and synoptic co-variability with air temperature, so that temperature and solar radiation alone can account for a significant fraction of ETo variability [16]. As a result, Tmax and Tmin carry substantial information about the broader energy and moisture regime, which helps to explain the good performance of HS, PMT and the reduced-input ANN3 in our experiments, in line with previous work showing that reduced-input data-driven and empirical models can reach accuracies comparable to full-input formulations under similar Mediterranean conditions [29]. Furthermore, the inputs used in ANN3 (Tmax, Tmin and Ra) are not independent: Ra and the seasonal cycle of air temperature are tightly coupled in these environments, indirectly capturing much of the variability in Rs and Rn. While this reliance on correlation structures enhances model performance in the present semi-arid Mediterranean conditions, it also implies that the empirical relationships learned by these models may not be directly transferable to markedly different climatic regimes without retraining and additional validation.
Several limitations should be considered when interpreting these results. First, the analysis is based on long-term records from only two grass-reference lysimetric stations located in semi-arid Mediterranean environments: a cold semi-arid climate in Albacete and a hot-summer Mediterranean climate with Atlantic influence in Badajoz. Consequently, the quantitative indicators reported here are conditioned by these specific climatic and management conditions and by the quality and length of the available datasets. Second, the study does not include independent validation using reanalysis products or additional observational networks, so the robustness of the conclusions at regional scales cannot be assessed. These factors imply that the findings should be viewed as site-specific and primarily applicable to semi-arid Mediterranean environments with similar data availability. Nevertheless, the qualitative patterns identified are supported by physical reasoning and are expected to have a more general scope. In particular, the greater robustness of FAO56 PM as a benchmark compared with lysimeter targets in winter, the marked reduction in apparent model differences with increasing aggregation intervals, and the partial compensation of daily errors in cumulative ETo reflect mechanisms that can, in principle, operate at other sites. Their expression will depend on the local bias pattern of the simpler methods: where daily deviations fluctuate around zero, a substantial reduction in error differences with aggregation is likely, whereas in locations with a persistent over- or underestimation pattern, the scope for error compensation will be more limited. Thus, while the magnitude of the error reduction remains site-specific and should be verified with local data, the conceptual framework and the role of daily bias patterns in shaping cumulative ETo errors can be qualitatively extended to other networks, including those based solely on FAO56 PM reference estimates.

5. Conclusions

This study evaluated the performance of temperature-based (HS and PMT) and neural network (ANN3 and ANN6) models for estimating daily ETo and its accumulated values over different time intervals at two contrasting Mediterranean sites in southeastern and southwestern Spain. The analysis was conducted using both the FAO56 PM equation and lysimeter measurements as benchmarks, and across different temporal aggregation intervals relevant to irrigation and water planning management, ranging from trial to weekly, fortnightly, monthly, and annual periods, under the semi-arid Mediterranean conditions represented by these sites. As a result, increasing the estimation interval from daily to weekly or fortnightly periods markedly reduced model errors and narrowed differences among approaches, indicating that temporal aggregation compensates for daily fluctuations and yields a more realistic assessment for irrigation scheduling. Since irrigation is usually not scheduled daily but rather at intervals that may account for soil water buffering and operational constraints, evaluating models at these timescales provides a more realistic measure of their accuracy. This reinforces the need to assess models beyond daily estimates, as otherwise they may lead to misleading conclusions for irrigation planning and on-farm decision-making. Most of the practically relevant error reduction was already achieved at weekly and fortnightly scales, while systematic components of the error remained visible in MAE and MBE at longer timescales.
The results also showed that benchmark selection and temporal aggregation are key factors governing the variability of model performance. Errors were systematically higher and more dispersed when lysimeter data were used, particularly during winter, likely reflecting the influence of environmental and operational factors under cold and calm conditions. By contrast, FAO56 PM provided more stable and consistent reference values throughout the year, supporting its use as the primary benchmark for evaluating ETo models under semi-arid Mediterranean conditions. Lysimeter-based ETo should instead be regarded as an experimental reference that is extremely valuable for validating and refining physically based formulations, but whose representativeness is more sensitive to local site characteristics and to the strict fulfilment of FAO reference crop requirements.
Finally, with complete climatic datasets, ANN6 achieved the highest accuracy when compared with FAO56 PM. Its relevance is nevertheless primarily methodological, as it relies on the same inputs as the reference equation and is therefore not intended as an operational alternative. Temperature-based models such as HS and PMT exhibited stable behaviour across sites and benchmarks, and their accuracy at multi-day scales was comparable to that of more complex neural network formulations. These findings indicate that simple models remain robust and operationally useful tools in data-scarce or heterogeneous conditions, particularly when irrigation is scheduled over multi-day intervals and a substantial part of the daily errors tends to compensate in cumulative ETo. At the same time, FAO56 PM is likely the most reliable option for reference evapotranspiration assessment when sufficient input data are available for its application. Lysimeter measurements, although subject to environmental variability, remain an essential component of experimental validation and contribute to refining physically based estimation approaches. The empirical relationships highlighted in this study are therefore best understood as site-specific and mainly relevant to semi-arid Mediterranean environments with similar climatic and management conditions, and they are conditioned by the length and quality of the datasets available at the two experimental sites. The systematic decrease in model error observed at operational timescales indicates that, under these conditions, partial compensation of daily biases in cumulative ETo is likely to occur, provided that daily deviations fluctuate approximately around zero. This behaviour should nevertheless be confirmed with local data before extrapolating to markedly different climatic regimes or to locations where simpler methods exhibit a persistent over- or underestimation pattern.

Author Contributions

Conceptualization, A.R., P.G.-A., F.T., M.L. and P.M.; methodology, A.R., P.G.-A., F.T., M.L. and P.M.; validation, A.R., P.G.-A., F.T., M.L. and P.M.; formal analysis, A.R.; data curation, L.A.M.; writing—original draft preparation, A.R., P.G.-A., F.T., M.L. and P.M.; writing—review and editing, A.R., P.G.-A., F.T., M.L. and P.M.; visualization, A.R. and P.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data from La Orden (Badajoz) will be made available on request.

Acknowledgments

The authors are grateful to Ramón López-Urrea for providing the dataset of the Las Tiesas (Albacete) station.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AHC: Adjusted Hargreaves coefficient
ANN: Artificial neural networks
ANN3: Artificial neural network with three inputs
ANN3_month: Artificial neural network with three inputs with a monthly training approach
ANN6: Artificial neural network with six inputs
ea: Actual vapour pressure
es: Saturation vapour pressure
ET: Evapotranspiration
ETc: Crop evapotranspiration
ETo: Reference evapotranspiration
ETo-AcW: Weekly accumulated reference evapotranspiration
FAO56 PM: FAO56 Penman–Monteith
G: Soil heat flux density
GEP: Gene expression programming
HS: Hargreaves-Samani
HS_cal: Hargreaves-Samani monthly calibrated
Kc: Crop coefficient
kRs: Empirical radiation adjustment coefficient
MAE: Mean absolute error
MBE: Mean bias error
ML: Machine learning
MSE: Mean square error
PMT: Temperature-based version of Penman–Monteith
PMT_cal: Temperature-based version of Penman–Monteith monthly calibrated
R²: Coefficient of determination
Ra: Extraterrestrial radiation
RF: Random forest
RHmax: Maximum relative humidity
RHmin: Minimum relative humidity
Rn: Net radiation
RRMSE: Relative root mean squared error
Rs: Solar radiation
SVM: Support vector machines
Tdew: Dew point temperature
Tmax: Maximum air temperature
Tmean: Mean air temperature
Tmin: Minimum air temperature
u2: Wind speed at two meters high
VPD: Vapour pressure deficit
γ: Psychrometric constant
Δ: Slope of vapor pressure curve
ΔT: Daily temperature range
λ: Latent heat of vaporisation

References

  1. Doorenbos, J.; Pruitt, W.O. Guidelines for Predicting Crop Water Requirements; Food and Agriculture Organization: Rome, Italy, 1977; p. 144. [Google Scholar]
  2. Allen, R.G.; Pereira, L.S.; Raes, D.; Smith, M. Crop Evapotranspiration—Guidelines for Computing Crop Water Requirements. In FAO Irrigation and Drainage Paper 56; Food and Agriculture Organization: Rome, Italy, 1998. [Google Scholar]
  3. Paredes, P.; Pereira, L.S.; Almorox, J.; Darouich, H. Reference Grass Evapotranspiration with Reduced Data Sets: Parameterization of the FAO Penman-Monteith Temperature Approach and the Hargeaves-Samani Equation Using Local Climatic Variables. Agric. Water Manag. 2020, 240, 106210. [Google Scholar] [CrossRef]
  4. Hargreaves, G.H.; Samani, Z.A. Reference Crop Evapotranspiration from Temperature. Appl. Eng. Agric. 1985, 1, 96–99. [Google Scholar] [CrossRef]
  5. Samani, Z. Estimating Solar Radiation and Evapotranspiration Using Minimum Climatological Data. J. Irrig. Drain. Eng. 2000, 126, 265–267. [Google Scholar] [CrossRef]
  6. Geerts, B. Empirical Estimation of the Monthly-Mean Daily Temperature Range. Theor. Appl. Clim. 2003, 74, 145–165. [Google Scholar] [CrossRef]
  7. Betts, A.K.; Desjardins, R.; Worth, D.; Beckage, B. Climate Coupling between Temperature, Humidity, Precipitation, and Cloud Cover over the Canadian Prairies. J. Geophys. Res. Atmos. 2014, 119, 13305–13326. [Google Scholar] [CrossRef]
  8. Frazer, M.E.; Ming, Y. Understanding Controlling Factors of Extratropical Humidity and Clouds with an Idealized General Circulation Model. J. Clim. 2022, 35, 5321–5337. [Google Scholar] [CrossRef]
  9. Jones, P.D.; Moberg, A. Hemispheric and Large-Scale Surface Air Temperature Variations: An Extensive Revision and an Update to 2001. J. Clim. 2003, 16, 206–223. [Google Scholar] [CrossRef]
  10. Raziei, T.; Pereira, L.S. Estimation of ETo with Hargreaves-Samani and FAO-PM Temperature Methods for a Wide Range of Climates in Iran. Agric. Water Manag. 2013, 121, 1–18. [Google Scholar] [CrossRef]
  11. Pereira, L.S.; Allen, R.G.; Smith, M.; Raes, D. Crop Evapotranspiration Estimation with FAO56: Past and Future. Agric. Water Manag. 2015, 147, 4–20. [Google Scholar] [CrossRef]
  12. Tomas-Burguera, M.; Vicente-Serrano, S.M.; Grimalt, M.; Beguería, S. Accuracy of Reference Evapotranspiration (ETo) Estimates under Data Scarcity Scenarios in the Iberian Peninsula. Agric. Water Manag. 2017, 182, 103–116. [Google Scholar] [CrossRef]
  13. Rodrigues, G.C.; Braga, R.P. Estimation of Reference Evapotranspiration during the Irrigation Season Using Nine Temperature-Based Methods in a Hot-Summer Mediterranean Climate. Agriculture 2021, 11, 124. [Google Scholar] [CrossRef]
  14. Pandey, P.K.; Pandey, V. Parametric Calibration of Hargreaves-Samani (HS) Reference Evapotranspiration Equation with Different Coefficient Combinations under the Humid Environment. HydroResearch 2023, 6, 147–155. [Google Scholar] [CrossRef]
  15. Berti, A.; Tardivo, G.; Chiaudani, A.; Rech, F.; Borin, M. Assessing Reference Evapotranspiration by the Hargreaves Method in North-Eastern Italy. Agric. Water Manag. 2014, 140, 20–25. [Google Scholar] [CrossRef]
  16. Martí, P.; López-Urrea, R.; Mancha, L.A.; González-Altozano, P.; Román, A. Seasonal Assessment of the Grass Reference Evapotranspiration Estimation from Limited Inputs Using Different Calibrating Time Windows and Lysimeter Benchmarks. Agric. Water Manag. 2024, 300, 108903. [Google Scholar] [CrossRef]
  17. Kumar, M.; Raghuwanshi, N.S.; Singh, R. Artificial Neural Networks Approach in Evapotranspiration Modeling: A Review. Irrig. Sci. 2011, 29, 11–25. [Google Scholar] [CrossRef]
  18. Kushwaha, N.L.; Rajput, J.; Sena, D.R.; Elbeltagi, A.; Singh, D.K.; Mani, I. Evaluation of Data-Driven Hybrid Machine Learning Algorithms for Modelling Daily Reference Evapotranspiration. Atmosphere-Ocean 2022, 60, 519–540. [Google Scholar] [CrossRef]
  19. Shiri, J.; Nazemi, A.H.; Sadraddini, A.A.; Landeras, G.; Kisi, O.; Fakheri Fard, A.; Marti, P. Comparison of Heuristic and Empirical Approaches for Estimating Reference Evapotranspiration from Limited Inputs in Iran. Comput. Electron. Agric. 2014, 108, 230–241. [Google Scholar] [CrossRef]
  20. Shiri, J.; Marti, P.; Singh, V.P. Evaluation of Gene Expression Programming Approaches for Estimating Daily Evaporation through Spatial and Temporal Data Scanning. Hydrol. Process. 2014, 28, 1215–1225. [Google Scholar] [CrossRef]
  21. Shiri, J.; Sadraddini, A.A.; Nazemi, A.H.; Kisi, O.; Landeras, G.; Fakheri Fard, A.; Marti, P. Generalizability of Gene Expression Programming-Based Approaches for Estimating Daily Reference Evapotranspiration in Coastal Stations of Iran. J. Hydrol. 2014, 508, 1–11. [Google Scholar] [CrossRef]
  22. Abed, M.; Imteaz, M.A.; Ahmed, A.N.; Huang, Y.F. Modelling Monthly Pan Evaporation Utilising Random Forest and Deep Learning Algorithms. Sci. Rep. 2022, 12, 13132. [Google Scholar] [CrossRef]
  23. Karimi, S.; Shiri, J.; Marti, P. Supplanting Missing Climatic Inputs in Classical and Random Forest Models for Estimating Reference Evapotranspiration in Humid Coastal Areas of Iran. Comput. Electron. Agric. 2020, 176, 105633. [Google Scholar] [CrossRef]
  24. Ferreira, L.B.; Da Cunha, F.F.; Da Silva, G.H.; Campos, F.B.; Dias, S.H.B.; Santos, J.E.O. Generalizability of Machine Learning Models and Empirical Equations for the Estimation of Reference Evapotranspiration from Temperature in a Semiarid Region. Acad. Bras. Cienc. 2021, 93, e20200304. [Google Scholar] [CrossRef]
  25. Gavilán, P.; Lorite, I.J.; Tornero, S.; Berengena, J. Regional Calibration of Hargreaves Equation for Estimating Reference ET in a Semiarid Environment. Agric. Water Manag. 2006, 81, 257–281. [Google Scholar] [CrossRef]
  26. López-Urrea, R.; Martín de Santa Olalla, F.; Fabeiro, C.; Moratalla, A. Testing Evapotranspiration Equations Using Lysimeter Observations in a Semiarid Climate. Agric. Water Manag. 2006, 85, 15–26. [Google Scholar] [CrossRef]
  27. Allen, R.G.; Pereira, L.S.; Howell, T.A.; Jensen, M.E. Evapotranspiration Information Reporting: I. Factors Governing Measurement Accuracy. Agric. Water Manag. 2011, 98, 899–920. [Google Scholar] [CrossRef]
  28. Sentelhas, P.C.; Gillespie, T.J.; Santos, E.A. Evaluation of FAO Penman–Monteith and Alternative Methods for Estimating Reference Evapotranspiration with Missing Data in Southern Ontario, Canada. Agric. Water Manag. 2010, 97, 635–644. [Google Scholar] [CrossRef]
  29. Martí, P.; González-Altozano, P.; López-Urrea, R.; Mancha, L.A.; Shiri, J. Modeling Reference Evapotranspiration with Calculated Targets. Assessment and Implications. Agric. Water Manag. 2015, 149, 81–90. [Google Scholar] [CrossRef]
  30. Hargreaves, G.H.; Allen, R.G. History and Evaluation of Hargreaves Evapotranspiration Equation. J. Irrig. Drain. Eng. 2003, 129, 53–63. [Google Scholar] [CrossRef]
  31. Doorenbos, J.; Pruitt, W.O. Guidelines for Predicting Crop Water Requirements; FAO: Rome, Italy, 1981. [Google Scholar]
  32. Allen, R.G.; Pereira, L.S.; Howell, T.A.; Jensen, M.E. Evapotranspiration Information Reporting: II. Recommended Documentation. Agric. Water Manag. 2011, 98, 921–929. [Google Scholar] [CrossRef]
  33. Yrissarry, J.J.B.; Naveso, F.S. Use of Weighing Lysimeter and Bowen-Ratio Energy-Balance for Reference and Actual Crop Evapotranspiration Measurements. Acta Hortic. 2000, 537, 143–150. [Google Scholar] [CrossRef]
  34. Haykin, S. Neural Networks: A Comprehensive Foundation, 2nd ed.; Prentice Hall PTR: Hoboken, NJ, USA, 1998; ISBN 0132733501. [Google Scholar]
  35. Samani, Z. Discussion of “History and Evaluation of Hargreaves Evapotranspiration Equation” by George H. Hargreaves and Richard G. Allen. J. Irrig. Drain. Eng. 2004, 130, 447–448. [Google Scholar] [CrossRef]
  36. Shiri, J.; Sadraddini, A.A.; Nazemi, A.H.; Marti, P.; Fakheri Fard, A.; Kisi, O.; Landeras, G. Independent Testing for Assessing the Calibration of the Hargreaves–Samani Equation: New Heuristic Alternatives for Iran. Comput. Electron. Agric. 2015, 117, 70–80. [Google Scholar] [CrossRef]
  37. Mialyk, O.; Schyns, J.F.; Booij, M.J.; Su, H.; Hogeboom, R.J.; Berger, M. Water Footprints and Crop Water Use of 175 Individual Crops for 1990–2019 Simulated with a Global Crop Model. Sci. Data 2024, 11, 206. [Google Scholar] [CrossRef]
  38. Karamouz, M.; Kerachian, R.; Zahraie, B. Monthly Water Resources and Irrigation Planning: Case Study of Conjunctive Use of Surface and Groundwater Resources. J. Irrig. Drain. Eng. 2004, 130, 391–402. [Google Scholar] [CrossRef]
  39. Roy, A.; Sahai, A.K.; Ghosh, S. District to Subdistrict Scale Optimum Irrigation Water Management Planning at Multi-Week Lead Time. J. Earth Syst. Sci. 2025, 134, 102. [Google Scholar] [CrossRef]
  40. Okayama, A.; Yamamoto, A.; Kimura, M.; Matsuno, Y. Utilizing Convolutional Neural Network (CNN) for Orchard Irrigation Decision-Making. Environ. Monit. Assess. 2025, 197, 168. [Google Scholar] [CrossRef]
  41. Mendicino, G.; Senatore, A. Regionalization of the Hargreaves Coefficient for the Assessment of Distributed Reference Evapotranspiration in Southern Italy. J. Irrig. Drain. Eng. 2013, 139, 349–362. [Google Scholar] [CrossRef]
  42. Willmott, C.J. Some Comments on the Evaluation of Model Performance. Bull. Am. Meteorol. Soc. 1982, 63, 1309–1313. [Google Scholar] [CrossRef]
  43. Lyles, B.F.; Sion, B.D.; Page, D.; Crews, J.B.; McDonald, E.V.; Hausner, M.B. Closing the Water Balance with a Precision Small-Scale Field Lysimeter. Sensors 2024, 24, 2039. [Google Scholar] [CrossRef] [PubMed]
  44. Gebler, S.; Hendricks Franssen, H.-J.; Pütz, T.; Post, H.; Schmidt, M.; Vereecken, H. Actual Evapotranspiration and Precipitation Measured by Lysimeters: A Comparison with Eddy Covariance and Tipping Bucket. Hydrol. Earth Syst. Sci. 2015, 19, 2145–2161. [Google Scholar] [CrossRef]
  45. Pereira, L.S.; Allen, R.G.; Paredes, P.; López-Urrea, R.; Raes, D.; Smith, M.; Kilic, A.; Salman, M. Crop Evapotranspiration—Guidelines for Computing Crop Water Requirements, 2nd ed.; FAO Irrigation and Drainage Paper, No.56 Rev.1.; FAO: Rome, Italy, 2025; ISBN 978-92-5-140060-9. [Google Scholar]
  46. de Carvalho, L.G.; Evangelista, A.W.P.; Oliveira, K.M.G.; Silva, B.M.; Alves, M.d.C.; Júnior, A.d.S.; Miranda, W.L. FAO Penman-Monteith Equation for Reference Evapotranspiration from Missing Data. Idesia 2013, 31, 39–47. [Google Scholar] [CrossRef][Green Version]