Prediction of Potential and Actual Evapotranspiration Fluxes Using Six Meteorological Data-Based Approaches for a Range of Climate and Land Cover Types

Evapotranspiration is the major component of the water cycle, so a correct estimate of this variable is fundamental. The purpose of the present research is to assess the monthly scale accuracy of six meteorological data-based models in the prediction of evapotranspiration (ET) losses by comparing the modelled fluxes with the observed ones from eight sites equipped with eddy covariance stations which differ in terms of vegetation and climate type. Three potential ET methods (Penman-Monteith, Priestley-Taylor, and Blaney-Criddle models) and three actual ET models (the Advection-Aridity, the Granger and Gray, and the Antecedent Precipitation Index method) have been proposed. The findings show that the models performances differ from site to site and they depend on the vegetation and climate characteristics. Indeed, they show a wide range of error values ranging from 0.18 to 2.78. It has been not possible to identify a single model able to outperform the others in each biome, but in general, the Advection-Aridity approach seems to be the most accurate, especially when the model calibration in not carried out. It returns very low error values close to 0.38. When the calibration procedure is performed, the most accurate model is the Granger and Gray approach with minimum error of 0.13 but, at the same time, it is the most impacted by this process, and therefore, in a context of data scarcity, it results the less recommended for ET prediction. The performances of the investigated ET approaches have been furthermore tested in case of lack of measured data of soil heat fluxes and net radiation considering using empirical relationships based on meteorological data to derive these variables. Results show that, the use of empirical formulas to derive ET estimates increases the errors up to 200% with the consequent loss of model accuracy.


Introduction
Evapotranspiration (ET) is the major component of the water cycle [1]. It is fundamental in many different applied hydrology studies [2,3], hence its incorrect assessments impact the hydrological soil-water balance [4]. Eddy covariance (EC) stations are considered among the most reliable systems for determining the ET losses but there are limitations in their use. The limitations come from the high costs of installation and maintenance, in addition, long-term ET measurements are complex to obtain and even if observational data exist, algorithms for data pre-processing are time-consuming [5]. Furthermore, flux towers coverage at the global scale is evidently quite inhomogeneous and quite scarce in some areas of the world. For all these reasons, in time, many scholars have proposed different models for the prediction of actual (AET) and potential (PET) ET. Among a large number of models, the conventional meteorological data-based approaches, though could appear old fashioned, are still nowadays preferred. The reason is the simplicity of computation and in the large availability of meteorological data required for the simulation [6,7]. This aspect is essential especially in rural water basin contexts where the lack of data represents the major challenge in evapotranspiration water losses assessment [8,9].

The Experimental Sites
Observed ET data from eight eddy covariance (EC) towers have been used to compare the accuracy of the six selected models in predicting the evapotranspiration fluxes. The EC stations belong to three global networks namely Fluxnet (http://fluxnet.fluxdata.org/, accessed on 20 January 2021), TERENO (http://teodoor.icg.kfa-juelich.de/overview-en, accessed on 20 January 2021), and AmeriFlux (http://ameriflux.lbl.gov/, accessed on 20 January 2021) platforms. They are featured by different biomes which combine four different vegetation types (grassland, cropland, forest and wetland) and two climate conditions (Mediterranean, Oceanic).
For what concern the selection of the vegetation type, grasslands and forests have been considered since they are among the largest ecosystems in the world covering respectively about the 40% and 30% of the earth's land area excluding Greenland and Antarctica [44,45]. Cropland has expanded rapidly in order to feed the world's growing population [46] while wetland, although represents a small portion of the landscape, is found throughout the world [47]. Grassland ecosystems (GRA) are areas covered by grass-dominated vegetation with little or no trees and a variable intensity of meadows, steppes, and grasslands grazed. Cropland (CRO) is farmland with agricultural and/or horticultural products. The forest cover (FOR) includes terrestrial habitats dominated by trees and other woody plants. The wetland (WET) consists of permanent mosaic of water and herbaceous or woody vegetation. With respect to the climate, sites located in Mediterranean and Oceanic areas have been considered since they present very different temperature and precipitation annual patterns, with a marked seasonality in the case of the Mediterranean climate over the temperate Oceanic. Indeed, according to the Koppen classification, the Mediterranean climate (Cs) has mild, wet winters and warmto-hot, dry summers with the precipitation mainly occurring during winter and autumn. The subtypes "Csa" and "Csb" represent a Mediterranean climate with average temperature respectively above and below 22 • during the warmest month. The oceanic climate (Cf) is characterized by cool winters and warm summers, and the rainfall events are evenly dispersed during the year. Cf has the two subtypes "Cfa" and "Cfb" too. Eight areas with the considered biomes have been selected for the present study including "us-twt" (FLUXNET2015 DOI: 10.18140/FLX/1440106) located on Twitchell island, Sacra-mento Country, California (USA) [48,49], "us-tw1" located on Twitchell island, Sacramento Country, California (USA) (FLUXNET2015 DOI: 10.18140/FLX/1440108) [50][51][52], "us-arm" located near Billings, in northwest Noble County, Oklahoma (USA) (FLUXNET2015 DOI: 10.18140/FLX/1440066) [53,54], "us-fwf" located near Flagstaff, Coconino County in Arizona (USA) (AMERIFLUX DOI:10.17190/AMF/1246052) [55,56], "de-rur" located in Simmerath, north Rhine-Westphalia (Germany) (https://deims.org/356417de-5a3c-42 9d-82c1-08a4e924ab3b, accessed on 20 January 2021) [57,58], "de-hai" located within the Hainich National Park in central Germany close to the city of Eisenach, western Thuringia (FLUXNET2015 DOI:10.18140/FLX/1440148) [59][60][61], "de-sfn" is located near Seeshaupt, Weilheim-Schongau district, in Bavaria, Germany (FLUXNET2015 DOI:10.18140/FLX/ 1440219) [62,63], "us-me3" located near Sisters, a city in Deschutes County, Oregon, United State (FLUXNET2015 DOI:10.18140/FLX/1440080) [64][65][66].
In details, the EC measurements inevitably include gaps within the time series recorded every 30 min. According to [67,68], the days with rates of missing data than 80% have been excluded by the analysis. The days with a percentage of gap than the above-mentioned threshold, have been subjected to a standard gap-filling dure, suggested by [69]. This approach replaces the missing values with the averag under similar meteorological conditions within a time-window of ±7 days. A summ the percentages of missing data for the whole periods of observation and each provided in the Table 1 and in Table S1. These sites have been preferred among all available locations with the required characteristics of climate and vegetation since they offer a better data quality standard, with longer periods of observation and low frequency in the occurrence of missing data that means a limited lack of measurements of latent heat flux (LE) ( Table 1). In details, the EC measurements inevitably include gaps within the time series of data recorded every 30 min. According to [67,68], the days with rates of missing data higher than 80% have been excluded by the analysis. The days with a percentage of gaps lower than the above-mentioned threshold, have been subjected to a standard gap-filling procedure, suggested by [69]. This approach replaces the missing values with the average value under similar meteorological conditions within a time-window of ±7 days. A summary of the percentages of missing data for the whole periods of observation and each site are provided in the Table 1 and in Table S1. 2.1.1. Climate Characterization of the Sites Figure 2 illustrates the climate characterization for each of the selected sites, with the temporal patterns of the monthly average precipitation and air temperature computed over the periods of observation using data from the EC towers. Figure 2 shows that us-twt station is characterized by minimum temperature of 6 • C occurring during December and maximum temperature of about 21 • C during July. The wettest month is December with 81 mm of rain fall while the driest month is July with no rain. The site has a typical Mediterranean climate (Csa) characterized by annual mean air temperature of 14.6 • C and mean annual precipitation of 344 mm.
The site "us-arm" is located in a humid subtropical (Cfa) area with mean annual temperature of 15.3 • C and mean annual precipitation of 646 mm. The mean monthly temperature and the precipitation range respectively from 2 • C (January) to 29 • C (July) and from 18 mm (January) to 105 mm (June).
The site "us-fwf" has a cool-summer Mediterranean climate (Csb) with an annual rainfall amount of 539 mm and a mean annual temperature of 8.4 • C. The monthly temperature reaches peaks of 19 • C during July and a minimum value of −2 • C in December while the monthly precipitation moves from 13 mm in March to 135 mm in July.
The test site of "de-rur" is located in an area with Oceanic (cfb) climate where the annual mean precipitation and air temperature are of approximately 895 mm and 8.4 • C. The monthly temperature varies from about 2 • C in February to 17 • C in July while the monthly precipitation from 48 mm to 110 mm during the same months.
The climate for us-me3 is temperate Mediterranean with a mean annual air temperature and precipitation of 8.2 • C and 384 mm respectively. The monthly temperature and precipitation reach minimum values of −1 • C (November) and 3 mm (July) and maximum values of 20 • C (July) and 82 mm (November).
The "de-hai" tower is located in a maritime temperate climate (Cfb) area with total annual depth of precipitation of about 806 mm and mean annual air temperature of about 8.3 • C. The lowest monthly temperature is of −0.5 • C and it is reached during January while the highest is 17 • C in August. The monthly precipitation ranges between 46 mm in October and 89 mm in May.
"Us-tw1" is featured by a Mediterranean climate (Csa) with mean annual temperature of 15 • C and a total annual precipitation volume of 399 mm. During July no rain falls while December is the wettest month with 140 mm of precipitation. The temperature reaches maximum and minimum monthly values respectively of 8 • C (January) and 21 • C (June). The "de-sfn" station is located in an area with temperate climate (Cfb). The mean annual air temperature and precipitation are 8.3 • C and 914 mm respectively. The hottest month is July with average temperature of 18 • C while the coldest is February with 0.1 • C. The rain falls most during August (140 mm) while it reaches its lowest value during February (34 mm).  The moisture index (I M ) proposed by Thornthwaite [16] has been calculated for each investigated site so as to provide a more detailed climate characterization. It can be estimated as: where: P and PET respectively represent the average annual precipitation and potential evapotranspiration which is given by the sum of monthly PET i values computed as: where T i is the average air temperature during the i-th month of the year (in degree Celsius) while I is the annual heat index got from the summation of monthly heat indices i: The coefficient a is evaluated as: In Table 2 for each climate type, the corresponding range of I M values has been defined.  The site us-twt is located on a cultivated crop consisting of rice planted at a density of 14-17 g m −2 . The tower "us-arm" lies on a land primarily covered by winter wheat and grass-land/pasture. The EC station us-fwf is above an area covered by forest converted to grassland by intense burning in 1996 so currently, the trees density is of 0 trees ha −1 . The vegetation after the fire, consisted of sparse grasses, forbs, and shrubs. De-rur is a grassland site. The composition of higher plant species at the site is typical for traditionally managed grasslands and consists of Ranunculus repens-Alopecurus pratensis plant community, where a majority of species are identified as meadow foxtail, perennial rye grass, rough meadow grass, and common sorrel. The land biome of us-me3 is almost exclusively composed of young ponderosa pine trees with a density lower than normally found in similar ecosystem. De-hai is a forest area which is primarily composed by beech with smaller percentage of ash, maple, and other deciduous and coniferous species. The forest has a tree density of about 330 trees ha −1 . Us-tw1 eddy covariance station is located in an area with vegetation including hydrophytic plants with emergent marsh species like Schoenoplectus acutus and Typha species. A relatively greater proportion of vegetation consists of S. acutus stems 1-3 m tall. The site of de-sfn is a natural bog-pine forest where the vegetation is quite heterogeneous and includes peat mosses (Sphagnum spp.) in addition to heather (Calluna vulgaris L.), bog bilberry scrubs (Vaccinium uliginosum L.s.l.) ISPRS Int. J. Geo-Inf. 2021, 10, 192 8 of 29 and several species of the sedge-family (Cyperaceae, mainly Eriophorum vaginatum L., hare's-tail cottongrass).

Instrumentation and Dataset
Meteorological daily data used to run the selected ET prediction models are net solar radiation, air temperature, relative humidity, wind speed, latent heat flux, precipitation, and soil heat-flux density. Actual evapotranspiration has been calculated from the latent heat flux using the procedure suggested by FAO [41]: where ET EC represents the actual evapotranspiration flux (mm −1 ), LE is the latent heat flux (Mj m −2 d −1 ), λ is the latent heat of vaporization (MJ kg −1 ). The above-said measurements have been collected by the monitoring systems at the different sites at hourly and sub-hourly timescale and then aggregated at daily scale which is the scale of analysis used in the present analysis.
The instrumentation of each site is briefly described in the following. In the site us-twt, the fluxes of carbon dioxide, methane, latent heat, and sensible heat have been measured using the EC method while a sonic anemometer allows to measure wind speed and temperature at the measurement height of 3.05 m. The station us-arm is equipped with a sonic anemometer (Gill-Solent WindMaster Pro), an open-path infrared gas analyzer (IRGA LiCorLI-7500), and a set of meteorological and soil sensors which allow to monitor radiation, wind speed and direction, air temperature and humidity, precipitation, soil heat flux. The measurement height is 4 m. In the site us-fwf, the measurements have been performed using a 3D sonic anemometer (CSAT3, Campbell Scientific, Logan, UT, USA), a closed path CO 2 /H 2 O analyzer (Li-7000, Li-Cor, Lincoln, NE, USA) at the height of 60 m.a.s.l. Additionally air and soil meteorological measurements have been recorded. The test-site de-rur includes a 2.6 m-high eddy covariance tower, installed since 2011 with a sonic anemometer and an infrared gas analyzer. Eddy-covariance measurements at us-me3 station have been collected using a three-dimensional sonic anemometer and an open-path infrared gas analyzer with additional measurements including atmospheric temperature, relative humidity, and precipitation. The eddy-covariance data were collected at the height of 12 m. At de-hai site, using the eddy covariance technique, water vapor, heat. and momentum fluxes have been continuously measured at a height of about 43 m.a.s.l. The equipment consisted of a triaxial sonic anemometer (Gill Solent R3, Gill Instruments, Lymington, UK) and a CO 2 and water infrared gas analyzer (LiCor 6262-3, LiCor Inc., Lincoln, NE, USA). In addition, the tower measured wind direction and velocity, air humidity and air temperature, radiation and precipitation. The tower of us-tw1 is equipped to analyze energy, H 2 O, CO 2 , and CH 4 fluxes. Fluxes are measured in detail using an open-path infrared gas analyzer (LI-7500 or LI-7500 A for CO 2 and H 2 O, LI-7700 for CH4, LiCOR Inc., Lin-coln, NE, USA). In addition, the air temperature and threedimensional wind speeds are measured with a sonic anemometer (WindMaster Pro 1352 or 1590, Gill Instruments Ltd., Lymington, Hampshire, England). The eddy-covariance data at de-sfn test site are collected at the height of 6 m. The flux measurements are collected using an open-path infrared gas analyzer (IRGA, LI7500, Li-Cor, Inc., Lincoln, NE, USA) in addition, a 3-D sonic anemometer (CSAT-3, Campbell Scientific, Inc., Logan, UT, USA), a heated tipping bucket rain gauge 52202 (Campbell Scientific), and a heated and ventilated CNR4 (Kipp and Zonen, Delft, The Netherlands) are currently in use.
The monitoring period differs from site to site (Table 3), the shortest one refers to sites us-tw1 and de-sfn where two years of measurements (from 2012 to 2014) are available while the longest period of observation ranging from 2003 to 2012, is related to the site us-arm. In the following table a summary of the characteristics of the test-sites can be found.

Description of the Selected AET and PET Models and Their Evaluation
ET predictions from six models have been compared with the measured data. Three PET models and three AET methods have been selected. The description of these models is available in the following.

PET Approaches
The combination-type model suggested by Penman-Monteith (PM), the radiationbased approach of Priestley-Taylor (PT), and the temperature-based formula proposed by Blaney-Criddle (BC) have been used in the present study to model PET. PM is probably the most commonly used method in the scientific literature to derive potential evapotranspiration [70]. The PM method assumes that all the energy for evaporation can be used by the plants and that water first has to pass through stomata openings, total leaf area, and soil surface against surface resistance and then it diffuses into the atmosphere against the aerodynamic resistance [71]. Since PM approach takes into account the heat and water vapor mass transfer mechanisms, it is considered a combination equation. The PM equation is: where ∆ is the slope of the saturation vapor pressure-temperature curve (kPa • C −1 ) expressed as: T mean represents the average temperature between maximum and minimum values during the day ( • C), λ is the latent heat of vaporization (MJ kg −1 ), γ is the psychrometric constant (kPa • C −1 ), and E A is the drying power of the air given by the following formula: where u is the wind speed (ms −1 ), 2.6(1 + 0.54·u) represents a function of the wind f(u), e s and e a are respectively the saturation vapor pressure (kPa) and the vapor pressure (kPa), Rn is the net radiation (MJ m −2 d −1 ) and Gsoil is the soil heat-flux density at the soil surface (MJ m −2 d −1 ) which can be considered to be negligible at daily time scale [41] as in the case of the sites ID5 and ID7 where measured values are not available.
Priestley-Taylor (PT) model was developed for the prediction of potential evaporation fluxes from a wet vegetated surface under conditions of minimum advection [72][73][74]. It simplifies the PM equation removing the aerodynamic component, when the surrounding environment is wet or humid conditions prevail. Priestley-Taylor [12] equation is given by: where α is the advection correction coefficient and it is set at the value of 1.26.
The Blaney-Criddle (BC) model is a simpler alternative to the PM and PT equations since it requires only mean temperature for monthly ET assessment. The usual form of the BC equation is: where p is the mean percentage of total daytime hours (at daily or monthly scale depending on the considered period) and it is unitless.

AET Models
In the present study, AET fluxes have been estimated using three meteorological data-based approaches including the Advection Aridity (AA) model, the Granger and Gray (GG) model, and the Antecedent Precipitation Index (API) model. The AA approach [26] and the GG method [29] belong to the category of complementary relationship (CR) models and are among the most used within this class. The AA formulation can be written as: The present equation combines an aerodynamic term with an energy component based on net incoming radiation.
According to Granger and Gray [29], AET can be calculated using the following equation: It describes ET from non-saturated surfaces. In order to account for the departure from saturated conditions, Equation (12) applies the concept of relative evaporation introduced by the use of G, the relative evaporation parameter expressed as: where D is the relative drying power: AET has been also estimated using the API model. Mawdsley and Ali [21] modified the Priestley-Taylor model for the estimation of non-potential evapotranspiration. They found that the parameter α was affected by water shortages, which can be described as a function of soil moisture [75]. Because surface soil moisture is not routinely available, they used the API index [76] to mimic soil moisture conditions: where i is the considered number of days which precede the day t, k is an empirical decay factor, and P t is the rainfall for the day t.
The non-potential evapotranspiration fluxes can be calculated using the following formulation: In the previous equation, α API is a dimensionless coefficient and it is a function of API. In particular, when API is lower than or equal to 20 mm [21], α API can be expressed as: When API is higher than 20 mm: (18) assuming that for an over-saturated system (i.e., API > 20 mm), AET rates are no longer dependent on the soil water content, but they are a constant percentage of PET.

Calibration of the Models for the Prediction of AET
The need for a calibration procedure of the AET approaches which involves the models parameters and which is able to increase the accuracy of the ET predictions, has been confirmed in time by several scholars [77,78]. In general, the calibration procedure of the AA model involves the wind function f(u) and the α-coefficient, for GG method the calibration parameter is the relative evaporation G, while for API method, α-coefficient is subjected to the calibration process. The scientific literature suggests a moderate range of variability for these parameters. With regard to the α-coefficient, it has been found to vary between 1.05 and 2.20 [79,80]. Concerning the wind function, many researches proposed different values of aw and bw such as aw = 0.37 and bw = 0.22 [81], aw = 1.313 and bw = 1.381 [6], or aw = 0.1954 and bw = 0.4703 [82]. With respect to the parameter G, several authors indirectly derived it in time [83,84].
In the present research, a one-step calibration procedure has been performed for API and the GG models, where the best fitting between monthly observed ET (ET EC ) and modelled ET has been reached by minimizing the model errors. The calibration procedure concerning the AA approach is a two-step process [82,85] because the formulation of the model has a two-term structure.
The term linked to the drying power of advected air has been calibrated discarding the data of the dry days identifiable as those with a soil water content lower than the fixed threshold of 20% [86]. The wind function has been iteratively calibrated until reaching the best fitting between the observed and modelled ET during the well-watered days. Subsequently, the α-coefficient has been calibrated considering both the wet and dry days and so, the whole period of observation.

The Indirect Estimates of Net Radiation and Soil Heat-Flux
According to Allen et al. [41], the variables Rn and Gsoil, if not available from in situ measurements, can be derived by the following empirical relationships: where Rns is net shortwave radiation and Rnl is the outgoing net longwave radiation and can be respectively expressed as: where Rs is the incoming solar radiation, β is the canopy reflection coefficient, σ SB is the Stefan-Boltzmann constant, T max,K and T min,K are respectively the maximum and minimum absolute temperature during the 24 h period, Rso is the clear-sky solar radiation. Rs and Rso can be calculated as follows: (60) π G sc d r [(ω 2 − ω 1 ) sin ϕ sin δ + cos ϕ cos δ(sin ω 2 − sin ω 1 )] In the previous equations, as and bs are regression constants, expressing the fraction of extraterrestrial radiation reaching the earth on overcast days, n represents the actual duration of sunshine, N is the symbol for the maximum possible duration of sunshine or daylight hours, Ra is the extraterrestrial radiation in the hour, G sc indicates the solar constant, d r expresses the inverse relative distance Earth-Sun, ω 1 and ω 2 respectively are the solar time angle at beginning and at end of period, ϕ is the latitude, δ solar declination, z is the station elevation above sea level.
The equation of soil heat-flux density at the soil surface is: where c s indicates the soil heat capacity, T i and T i−1 respectively are the air temperature at time i and i − 1, ∆t is the selected time interval, ∆y is the effective soil depth.

Models Evaluation
The evapotranspiration losses computed at monthly scale using the selected models have been compared to the observed values of ET from the eddy-covariance towers so as to estimate the accuracy of each approach. The comparison allows to test the performances of the models in simulating the actual ET fluxes. The goodness-of-fit indices used for the comparison are the normalized root mean square error (RMSEd) which measures the error intensity, the index of agreement (d) which measures the patterns agreement, the correlation coefficient (r) which estimates the correlation between measured and simulated variables. The corresponding equations are: where n is the length of the monthly sample, ET mod,i is the monthly modelled ET value while ET EC,i is the observed one from EC measurements, and σ the ET standard deviation.

Results of the Climate Investigation
The values of the moisture index (I M ) ( Table 4) calculated for each test site, suggest that four sites can be classified as arid or semi-arid including us-twt, us-arm, us-me3, us-tw1 while us-fwf, de-rur, de-hai, and de-sfn can be listed among the humid and sub-humid climatic types. In details, the most arid site is us-twt with a value of I M of −65 while the most humid is de-rur with I M of 68.  Figure 2, it can be deduced that the site with the highest annual rainfall is de-sfn with 914 mm followed by de-rur with 895 mm. The area with the lowest annual precipitation is us-twt with 344 mm of rain fallen but similar values have been reached by us-me3 and us-tw1 respectively with 384 and 399 mm of cumulative rainfall.
The site with the highest mean annual temperature is us-arm with 15.3 • C while the lowest temperature (8.2 • C) occurs in the test site of us-me3.
In addition, from Figure 2 it emerges clearly that the Mediterranean areas present a seasonal variation of rainfall patterns indeed, most of the precipitation occurs in winter months followed by a dry period during summer. On the other side, the Oceanic areas show precipitation with an even distribution throughout the year. In both climates, the temperature reaches peak during the summer months from July to August while the coldest months are November, December, January, and February.

Results of the Montlhy Predictions Using the Selected ET Models
The evapotranspiration fluxes estimated by the six models and using the measured input parameters show similar temporal patterns (Figure 3) even if the observed ET values are significantly overestimated. This occurs for all sites except for the de-rur.
The values of the goodness-of-fit indices used for the quantitative assessment of models accuracy in the prediction of ET fluxes at monthly scale are shown in Table 5, both in case of measured/observed and empirically-derived Rn and Gsoil. In addition, in Table  5, the most accurate model is indicated for each site. In general, the AET models are better able to reproduce the ET measurements than PET approaches with maximum values of RMSEd at most of 1.97 against 3.34 of PET models. In particular, in almost all sites, the best performing approach appears to be the AA model with average values of RMSEd, d, and r respectively of 0.8, 0.6, and 0.9. A reduction of the prediction accuracy can be observed when the indirect evaluation of Rn and Gsoil has been performed. In this case, the best performing methods exhibit values of RMSEd close to 2 and agreements lower than 0.5 although with high values of r.
A summary of the goodness of fit indices referring to the non-calibrated best performing approach for each site and with reference to the specific land cover type and climate condition has been reported in Figure 4. The highest prediction errors are reached for forest ecosystems and arid/semiarid conditions where the values of RMSEd are close to 1.2 while the grassland environments and the humid/sub-humid climates present the better accuracy with RMSEd around 0.2.
In case of lack of measurements of Rn and Gsoil, these variables are estimated using empirical formulas. The errors between observed and modelled Rn and Gsoil are shown in Table 6. The error ranges from a minimum value of 0.07 to a maximum value of 1.31 for the net radiation while it moves between −46.87 and 3.68 for the soil heat-flux density. So, the error made in estimating Gsoil is persistently higher than the one related to Rn. The values of the goodness-of-fit indices used for the quantitative assessment of models accuracy in the prediction of ET fluxes at monthly scale are shown in Table 5, both in case of measured/observed and empirically-derived Rn and Gsoil. In addition, in Table 5, the most accurate model is indicated for each site. In general, the AET models are better able to reproduce the ET measurements than PET approaches with maximum values of RMSEd at most of 1.97 against 3.34 of PET models. In particular, in almost all sites, the best performing approach appears to be the AA model with average values of RMSEd, d,   A summary of the goodness of fit indices referring to the non-calibrated best performing approach for each site and with reference to the specific land cover type and climate condition has been reported in Figure 4. The highest prediction errors are reached for forest ecosystems and arid/semiarid conditions where the values of RMSEd are close to 1.2 while the grassland environments and the humid/sub-humid climates present the better accuracy with RMSEd around 0.2. Reference is made to (a) land cover (CRO, GRA, FOR, WET that respectively stand for cropland, grassland, forest, and wetland) and to (b) the moisture index IM (arid/semi-arid versus sub-humid/humid, ID numbers are sorted for increasing IM).

Models RMSE d (-) d (-) r (-) RMSE d (-) d (-) r (-)
In case of lack of measurements of Rn and Gsoil, these variables are estimated using empirical formulas. The errors between observed and modelled Rn and Gsoil are shown in Table 6. The error ranges from a minimum value of 0.07 to a maximum value of 1.31 for the net radiation while it moves between −46.87 and 3.68 for the soil heat-flux density. So, the error made in estimating Gsoil is persistently higher than the one related to Rn. Besides the quantitative aspect, the temporal patterns of monthly observed Rn and Gsoil plotted against the modelled ones are shown in Figure 5. The figure reveals, at first sight, that they differ both in terms of temporal dynamics and magnitude.  Besides the quantitative aspect, the temporal patterns of monthly observed Rn and Gsoil plotted against the modelled ones are shown in Figure 5. The figure reveals, at first sight, that they differ both in terms of temporal dynamics and magnitude.

Results of the Calibration Procedure
A site-specific calibration procedure has been performed in order to reduce the prediction errors of the selected models. The model calibration has involved a scenario where Rn and Gsoil were directly measured and not provided by the use of empirical formulas in order to avoid even larger models distortion. Since the AET models have returned better performances than PET approaches, the calibration procedure has been performed only for AA, API, and GG models (AA cal , API cal , and GG cal ). The values of α-coefficient and the formulations of the wind function and of the relative evaporation parameter for each site after the calibration procedure (respectively α CAL , f(u) CAL and G CAL ) are shown in Table 7. For the sites ID1 and ID4, the values of α CAL resulting from the calibration of API approach are close to 1.26 where for the other sites, they are significantly different from this standard coefficient. With regard to the values of α CAL resulting from the calibration of AA approach, they are more likely similar to 1.26. For the experimental sites ID4 and ID6, the formulations of G, after the calibration process, approach the original one while this does not happen for the other sites. Concerning the parameter b w of the wind function, the calibration procedure returns for all sites, a value ten times lower than the original one of 0.54.

Results of the Calibration Procedure
A site-specific calibration procedure has been performed in order to reduce the prediction errors of the selected models. The model calibration has involved a scenario where Rn and Gsoil were directly measured and not provided by the use of empirical formulas in order to avoid even larger models distortion. Since the AET models have returned better performances than PET approaches, the calibration procedure has been performed only for AA, API, and GG models (AAcal, APIcal, and GGcal). The values of α-coefficient and the formulations of the wind function and of the relative evaporation parameter for  Table 7. Goodness of fit indices for the eight experimental sites. The most accurate models in case of calibrated and non-calibrated approach are compared (Table 5). The predictions of AET fluxes resulting from the calibrated models (API CAL , AA CAL , GG CAL ) and from the most accurate non-calibrated models have been compared using the selected goodness of fit indices (Table 7).

Site Models RMSEd (-) d (-) r (-) Calibrated Parameters
In general, the calibrated models present lower error than the most accurate models in case of non-calibrated approach. In Figure 6, the monthly patterns of the ET losses modelled using the calibrated approaches are illustrated along with the observed ET values and the ones from the most accurate non-calibrated method for each site and, at a fist visual inspection, it is possible to discern an improvement in ET estimates related to the calibration procedure.
The relative difference in RMSE, r, and d before and after the calibration procedure as shown in Figure 7, allows to observe that the error decreases due to this process indeed, when the difference in RMSE is positive, the model performances improve and vice versa for d and r.
In Figure 8, it is possible to observe in which kind of climatic condition and land cover, the calibrated models exhibit the worst performances in terms of RMSE, d, and r. The forest systems and the arid/semi-arid climates are confirmed to have the lowest performances also in case of calibrated approach. ISPRS Int. J. Geo-Inf. 2021, 10,192 19 of 29 visual inspection, it is possible to discern an improvement in ET estimates related to the calibration procedure. The relative difference in RMSE, r, and d before and after the calibration procedure as shown in Figure 7, allows to observe that the error decreases due to this process indeed, when the difference in RMSE is positive, the model performances improve and vice versa for d and r.   In Figure 8, it is possible to observe in which kind of climatic condition and land cover, the calibrated models exhibit the worst performances in terms of RMSE, d, and r. The forest systems and the arid/semi-arid climates are confirmed to have the lowest performances also in case of calibrated approach.

Discussion
The findings illustrated in the previous chapter highlight various aspects which can be summarized as follows.
PET approaches are less accurate than AET models since the ET estimates computed using these methods significantly overestimate the observed ET with high RMSE even if r is quite high too. Actually, among PET models, Priestley-Taylor method exhibits the best performances, so it represents an exception to the rule. It shows an accuracy comparable to the API model used for the prediction of AET and in addition, for the site us-twt (ID1), PT model is the most accurate.
Among the AET models, the AA approach presents the highest performances for most of the studied sites (ID 2, 3,5,6,7,8) and comparable in the others (ID 1,4) while API model is the most accurate for the prediction of monthly ET for the site of de-rur (ID4). Anyway, it should be noted that, for de-rur, all the AET models present similar performances (Table 4).
With regard to the land cover type, the highest values of RMSE and the lowest values of r and d occur for the forest cover type systems (ID5 and ID6) (Figure 4 left panel).
This result has been confirmed by previous investigations which documented a lower accuracy of the ET models in predicting the evapotranspiration fluxes for forest sites [37,87]. The reason could lie in the fact that the proposed models do not have correction terms which take into account the roughness sub-layer (RSL) effects typical of tall canopylike forest ecosystems. The roughness sub-layer (RSL) is the lowest atmospheric layer immediately adjoining an area covered with roughness elements like trees. It extends from the forest vegetation height up to about twice that height. The EC towers allows measurements inside this layer but for these measurements, the classical method of the flux gradient relationship based on the surface-layer theory, fails and modifications to the methods are required in order to consider the RSL effect [88,89].
Another reason could be linked to the strong heterogeneity of the forest environments where the reference source area for the input data could be different [37].
On the other side, for what concerns the climate conditions, the humid and sub-humid sites ID3, ID4, ID6, and ID8 present the lowest prediction errors (lowest RMSE and largest r and d) as illustrated in Figure 4 right panel.
The reason of these results could be found in the models' structure which not specifically consider the soil moisture with the exception of API model which however is only

Discussion
The findings illustrated in the previous chapter highlight various aspects which can be summarized as follows.
PET approaches are less accurate than AET models since the ET estimates computed using these methods significantly overestimate the observed ET with high RMSE even if r is quite high too. Actually, among PET models, Priestley-Taylor method exhibits the best performances, so it represents an exception to the rule. It shows an accuracy comparable to the API model used for the prediction of AET and in addition, for the site us-twt (ID1), PT model is the most accurate.
Among the AET models, the AA approach presents the highest performances for most of the studied sites (ID 2, 3,5,6,7,8) and comparable in the others (ID 1,4) while API model is the most accurate for the prediction of monthly ET for the site of de-rur (ID4). Anyway, it should be noted that, for de-rur, all the AET models present similar performances (Table 4).
With regard to the land cover type, the highest values of RMSE and the lowest values of r and d occur for the forest cover type systems (ID5 and ID6) (Figure 4 left panel).
This result has been confirmed by previous investigations which documented a lower accuracy of the ET models in predicting the evapotranspiration fluxes for forest sites [37,87]. The reason could lie in the fact that the proposed models do not have correction terms which take into account the roughness sub-layer (RSL) effects typical of tall canopy-like forest ecosystems. The roughness sub-layer (RSL) is the lowest atmospheric layer immediately adjoining an area covered with roughness elements like trees. It extends from the forest vegetation height up to about twice that height. The EC towers allows measurements inside this layer but for these measurements, the classical method of the flux gradient relationship based on the surface-layer theory, fails and modifications to the methods are required in order to consider the RSL effect [88,89].
Another reason could be linked to the strong heterogeneity of the forest environments where the reference source area for the input data could be different [37].
On the other side, for what concerns the climate conditions, the humid and sub-humid sites ID3, ID4, ID6, and ID8 present the lowest prediction errors (lowest RMSE and largest r and d) as illustrated in Figure 4 right panel.
The reason of these results could be found in the models' structure which not specifically consider the soil moisture with the exception of API model which however is only based on the use of Antecedent Precipitation Index designed to provide a proxy of surface volumetric water content [21][22][23][24]. Indeed, as suggested by other studies [90,91], the ET models have better performances in humid regions where the climate variables are a controlling factor for ET and lower accuracy in arid regions where the moisture availability is the dominating factor. Another reason is the background conditions in which the models were originally proposed that mainly refer to humid environments. AA method was proposed based on experimental data of Hupsel Creek in the eastern part of The Netherland in Oceanic climate [26]. The BC equation was based on experimental data from several locations in America including Montrose, Colorado in continental climate, Salt river valley, Arizona in arid climate, High valley areas, Colorado in continental climates, Altus, Oklahoma and Kearney, Nebraska in humid climates. GG was validated using field data monitored at two stations in western Canada in the province of Saskatchewan with a continental climate [17] while for PT, data from Aspendale, Victoria (Australia) in tropical climate, Madison, Wisconsin (America), in continental condition, Gurley in northern New South Wales (Australia) in humid climate and Hay in southern New South Wales with semi-arid climate have been used [12]. The data used for the formulation of API come from Trumpington, Cambridge, U.K. in humid climate, Cardington, Bedford, U.K. in oceanic climate [21][22][23][24]. The original PM equation was developed at the Rothamsted Experimental Station, Harpenden, UK with oceanic climate [19].
When the lack of data requires the indirect estimates of meteorological input variables to run the ET models, it is possible to come to interesting results. Models estimates using Rn and Gsoil from empirical formulas return higher average errors than the corresponding estimates obtained by using observed flux variables (Table 5). Indeed, in some cases the errors are more than twice as large as the ones derived from measured Rn and Gsoil (ID2, ID3, ID4, ID8). In particular, the error increases more than 200% for the site ID4, while it is less extreme for ID7 where only a variation of 7% occurs. Results from previous studies confirm that, in general, the prediction performances of the ET models decrease with decreasing data availability. For instance, [38] showed that the highest accuracy occurred when radiation data are available. Indeed, the radiation methods returned very high determination coefficient, index of agreement, and slope of regression which are close to 1.
In the same way, [40] came to the conclusion that when input data collection is difficult, the ET estimates are less accurate, and the application of temperature-based models are most desirable for the predictions of ET fluxes. The same reasoning applies to [43] which claimed that when limited weather data are available, a strong overestimation of the ET fluxes can occur.
The use of modelled fluxes also impacts on the choice of the most performing method. For the site ID4 the most effective approach changes from API model to AA model with the use of empirically derived variables while for the site ID8, the AA model gives way to GG model ( Table 5). The empirical formula used in case of lack of measured data to compute the net radiation overestimates the observed Rn values reaching RMSE at most of 1.31 (ID3) while the equation used for the computation of Gsoil strongly underestimates the measurements with values of RMSE even lower than −45 as it can be deduced from Table 6.
In confirmation of what stated above, Figure 5 reveals that even if the measured and simulated values of net radiation have similar temporal dynamics, they differ in terms of magnitude. On the other side, the predicted soil heat-flux density values strongly differ from the observed ones both in term of size and temporal development. Indeed, the observed Gsoil follows a seasonal pattern with growing phases from March to October and peak values approximately on July, the decreasing phases occur during Autumn and Winter with lowest values during October and November. The modelled soil heat flux follows a temporal pattern where there is no chance to identify a seasonality not even identifiable peaks but the curve appears almost flattened.
Since all models considerably overestimate the observed ET, a calibration procedure appears essential to reduce the prediction errors.
Ref. [38] came to the same conclusion, indeed, it considered the calibration an essential procedure to adapt the application of ET models to the regional conditions.
In the same way, ref. [42] found large biases when ET models are applied in inappropriate regimes in which the considered approach has been not developed.
In the present study, when the monthly patterns of ET losses modelled using the calibrated approaches are compared with the ET resulting from the best performing mod-els before the calibration process, the former results more consistent with the ET fluxes provided by the eddy-covariance towers ( Figure 6) This is confirmed by the values of the errors (Table 7) which also highlight that the site-specific calibration allows to increase the accuracy in ET predictions of the API, AA, and GG approaches. In details, GG cal model seems to be the most accurate model in case of calibration, while AA cal is the worst performing approach probably due to the complex calibration process in two-step.
In Figure 7, the improvement related to the calibration procedure can be detected with greater clarity indeed, the variations in the values of RMSE, d, and r before and after the calibration process are shown. The relative difference in RMSE before and after the calibration process is overall positive which implies that the error decreases due to this procedure and consequently, the model performance increases for each site and approach. In the same way, the variations of r and d are overall negative, so the calibration process allows to improve these indices which results in an increasing model performance for each site and approach. The relative differences in the values of RMSE and d are considerably higher than the differences in r, with peaks for FOR vegetation type. Among the considered models, the GG approach resulting the best performing calibrated method, is the most impacted by the calibration procedure, indeed, it returns variation of RMSE up to 83% and reduction of d up to 106%.
For what concerns the performances of the calibrated models with regard to the climate conditions and the vegetation type, it can be said that the sites with sub-humid/humid climates (Cfb) as in the case of non-calibrated approaches, present the best fitting (Figure 8 right panel) while the systems with forest land cover are associated with the largest prediction errors (Figure 8 left panel).

Conclusions
The performances of six meteorological data-based models in the prediction of ET have been investigated using high-quality datasets from eight eddy covariance towers belonging to TERENO, FLUXNET, and AMERIFLUX networks. The eight sites are characterized by different climates and vegetation types. In each of the considered sites, the AET losses have been predicted at monthly scale, using the GG, the AA, and the API approaches while the PET fluxes have been modelled using BC, PT, and PM methods belonging respectively to the categories of temperature-based, radiation-based, and combination-based approaches. It is difficult to detect a general accuracy of the approaches which depends on the characteristics of the site, and to identify a single model able to outperform the others for a considered biome but some general tendencies appear. The AET models are better able to predict the ET fluxes than the PET approaches. Before the calibration process, the AA method is the best performing in almost each system. The sites characterized by arid/semi-arid climates and forest vegetation type present the largest average model errors with values around 64% for RMSE, 68% for d, and 87% for r. The poor performances of the selected approaches in arid region could be related to the evapotranspiration controlling factor that is the soil moisture rather than the climate variables and to the predominantly humid background conditions in which the models were developed. The low accuracy linked to the forest systems is presumably due to the roughness sub-layer effect which could influence eddy covariance measurements and to the heterogeneous land surfaces which could produce errors in the input data.
The low predictive power of the ET models in some biomes confirms the need for a site-specific calibration procedure which allows to obtain better model accuracy. For the AA model, the parameters involved in the calibration process are the wind function f(u) and the α-coefficient, the parameters subject to calibration are the relative evaporation G and the α-coefficient respectively for GG model and API method. The calibration process of AA model consists of two phases, so it is particularly complex and the parameters resulting from this procedure significantly differ from their original values.
The calibration process allows to improve the models accuracy resulting in average values of RMSE, d, and r respectively close to 32%, 80%, and 88%. Even after the calibration, the sites with arid/semi-arid climates and forest vegetation type are associated to the lowest model accuracy. The most accurate model becomes GG method after the calibration procedure, at the same time, it results the most affected by this process while the least impacted is the AA models. In addition, the model errors in case of use of empirical formulas to derive Rn and Gsoil have been calculated. Indeed, if the measured data of these variables are not available at the experimental site, they can be indirectly estimated from climate data. The results suggest that the errors values increase with a consequent reduction of the model accuracy. Despite the worsening of the performances, in some cases of lack of measurements, the use of empirical formulas and so of an approach fully based on meteorological data is the only way forward. The model calibration has been only performed assuming that the variables Rn and Gsoil were available from in situ measurements. Indeed, the use of empirical formulas to derive net radiation and soil heat-flux would have caused a larger models distortion during the calibration phase and consequently higher estimation errors.
When, in an experimental site, the only measured variables are precipitation and temperature as it happens in most of the local weather stations, the best choice to model ET fluxes is the use of API model. Indeed, this approach requires as input parameters only precipitation, Rn and Gsoil which can be indirectly derived from the temperature. The other models proposed in the present work also require for their implementation, the wind speed and air humidity which, contrary to Rn and Gsoil, cannot be empirically estimated from temperature. Finally, the results of the present comparative study in contrasting environment have provided suggestions and recommendations for the selection of the best suited methodology to be used for ET predictions both in case of calibration/ non-calibrated procedures and in case of lack of measured data.