Actual Evapotranspiration from UAV Images: A Multi-Sensor Data Fusion Approach

: Multispectral imaging using Unmanned Aerial Vehicles (UAVs) has changed the pace of precision agriculture. Actual evapotranspiration (ET a ) from the very high spatial resolution of UAV images over agricultural ﬁelds can help farmers increase their production at the lowest possible cost. ET a estimation using UAVs requires a full package of sensors capturing the visible/infrared and thermal portions of the spectrum. Therefore, this study focused on a multi-sensor data fusion approach for ET a estimation (MSDF-ET) independent of thermal sensors. The method was based on sharpening the Landsat 8 pixels to UAV spatial resolution by considering the relationship between reference ET a fraction (ETrf) and a Vegetation Index (VI). Four Landsat 8 images were processed to calculate ET a of three UAV images over three almond ﬁelds. Two ﬂights coincided with the overpasses and one was in between two consecutive Landsat 8 images. ETrf was chosen instead of ET a to interpolate the Landsat 8-derived ETrf images to obtain an ETrf image on the UAV ﬂight. ETrf was deﬁned as the ratio of ET a to grass reference evapotranspiration (ET r ), and the VIs tested in this study included the Normalized Difference Vegetation Index (NDVI), Soil Adjusted Vegetation Index (SAVI), Enhanced Vegetation Index (EVI), Normalized Difference Water Index (NDWI), and Land Surface Water Index (LSWI). NDVI performed better under the study conditions. The MSDF-ET-derived ET a showed strong correlations against measured ET a , UAV- and Landsat 8-based METRIC ET a . Also, visual comparison of the MSDF-ET ET a maps was indicative of a promising performance of the method. In sum, the resulting ET a had a higher spatial resolution compared with thermal-based ET a without the need for the Albedo and hot/cold pixels selection procedure. However, wet soils were poorly detected, and in cases of continuous cloudy Landsat pixels the long interval between the images may cause biases in ET a estimation from the MSDF-ET method. Generally, the MSDF-ET method reduces the need for very high resolution thermal information from the ground, and the calculations can be conducted on a moderate-performance computer system because the main image processing is applied on Landsat images with coarser spatial resolutions.


Introduction
The world population is projected to reach 9.7 billion by 2050 [1]. Satisfying the food demand of such a population is a big challenge. FAO estimated that food production needs to double by 2050, which would require a massive amount of water. The challenge is that water is already scarce and agriculture alone accounts for more than 70% of total freshwater withdrawals [2]. Therefore, there is an urgent need to produce more food with less water, which can only be achieved by improving water use efficiency in the largest water consuming sector. Current remote sensing technologies provide an opportunity to monitor water consumption over large areas in a cost-effective way. They offer an important decision support tool with lots of potential for growers and stakeholders.
(NDVI) [35] is the most commonly used VI, and has been further improved as the Soil Adjusted Vegetation Index (SAVI) [36] and Enhanced Vegetation Index [37] to remove the effects of soil background and atmosphere, respectively. Also, since the Shortwave Infrared (SWIR) portion of the spectrum can relate surface moisture to optical reflectance [38], two water indices, namely, the Normalized Difference Water Index (NDWI) and Land Surface Water Index (LSWI) [39,40], have been used for detecting moisture variability of surface. It is speculated that these indices may be beneficial for monitoring surface ET a rates.
The objective of this study was to exclude thermal camera requirements for UAVbased ET a mapping using the modified TsHARP algorithm specifically designed for ET a sharpening, herein referred to as the multi-sensor data fusion approach for ET estimation (MSDF-ET). To achieve this goal, a UAV equipped with multispectral and thermal cameras and Landsat 8 images were used in consideration of the following steps: (1) perform Landsat 8-based ET a calculation using the METRIC algorithm and subsequently the reference ET fraction (ETrf); (2) find a linear regression equation based on the relationship between different VIs and ETrf at the Landsat 8 resolution; (3) build a Bias image for each Landsat 8 overpass; (4) apply the algorithm to UAV images and calculate the ETrf at a very high spatial resolution; (5) visually and statistically evaluate the results against ground observed, UAV-, and Landsat 8-based METRIC ET a .

Study Area
The study was conducted in Corning, Northern California, U.S.A. (Figure 1). The climate in Northern California is Mediterranean with long hot summers and mild winters. The long-term (10 years) variations of grass reference evapotranspiration (ET r ) in the area (from GRIDMET [41], available on Google Earth Engine) showed that ETr changes between 0.9 and 9.7 mm·day -1 ( Figure 2). In 2018, three ET flux towers were installed in three adjacent young almond orchards of different ages (1st (Field 1), 2nd (Field 2), and 3rd leaf (Field 3)) irrigated with micro-sprinklers systems. . Auxiliary data include air temperature and relative humidity, soil water content, and soil thermocouples at 5 cm depth, and a tipping rain gauge. Each orchard had enough fetch to accurately estimate ET using each flux tower. Sensible heat flux (H) was quantified using the eddy covariance method from data collected using a 3D sonic anemometer and fine-wire thermocouples at a sampling frequency of 10 Hz. Soil heat flux (G) was measured using three soil heat flux plates and soil thermocouples, and adjusted for moisture content using three capacitance sensors (ECH2O EC5) on the top 5 cm depth. ET a was calculated as the residual of the energy balance equation (λET = R n − G − H) and averaged over 30 min.

Datasets
The flux footprint of the towers were predicted based on Kljun et al. [42] using the online data processing (http://footprint.kljun.net/, accessed on 5 June 2021). Accordingly, pixels within the footprint area were used to optimally compare the EC data against remotely sensed ET a.   Earth Engine (Table 1). Cloud cover and water bodies were removed using the quality 160 band provided for the surface reflectance product. Landsat 8 contains 11 bands, 7 of which 161 are commonly used for agricultural purposes (Table 2) and cover visible (VIS), near infra-162 red (NIR), and SWIR portions. An area was clipped from the images so that it mostly 163 represented agricultural fields and was big enough to contain more than 60,000 pixels for 164 extracting the LST-VI relationship, while also being small enough not to capture heights 165 where temperature drops drastically. Image processing was conducted using MATLAB.

Remotely Sensed Data
Satellite: Four Landsat 8 surface reflectance products were pre-processed in Google Earth Engine (Table 1). Cloud cover and water bodies were removed using the quality band provided for the surface reflectance product. Landsat 8 contains 11 bands, 7 of which are commonly used for agricultural purposes (Table 2) and cover visible (VIS), near infrared (NIR), and SWIR portions. An area was clipped from the images so that it mostly represented agricultural fields and was big enough to contain more than 60,000 pixels for extracting the LST-VI relationship, while also being small enough not to capture heights where temperature drops drastically. Image processing was conducted using MATLAB. UAV: Three flights (Table 1) were conducted over the three almond fields using a quadcopter (DJI Matrice 210) equipped with a high resolution multispectral and thermal sensor (Micasense Altum) and an RGB camera (Zenmus X4S). Raw images were preprocessed in Agisoft Metashape software. The images were ortho-rectified and layerstacked to achieve single TIFF files containing Blue, Green, Red, Red-edge, Infra-Red and thermal bands (Table 2). Subsequently, further processes were done in ENVI 5.3.
Comparison: Landsat 8 and UAV images covered almost the same portions over VIS/NIR, with small differences in bandwidths. Their spatial resolution differed significantly, with 30 m fixed for Landsat 8 compared to 2.65 and 3.23 cm for UAV, depending on flight altitude (Table 3). To achieve an orthomosaic of good quality, images were captured with 90% overlap. A calibrated reflectance panel was used before and after each flight to convert raw pixel values into reflectance. UAV images were acquired between 11:00 AM and 1:30 P.M (Table 3) to reduce the shading effect and to match as much as possible LandSat passage (around 11:38 am in local time).

Meteorological Data
The California Department of Water Resources (DWR) manages over 145 automated weather station in California under the California Irrigation Management Information System (CIMIS) program. CIMIS is a joint program between DWR and the University of California, Davis, assisting researchers utilize meteorological data to precisely manage California's water resources and crop water use.
The meteorological data used in the algorithms were obtained from the Gerber South station (Station id: 222) situated at 40 • 1 48 N and 122 • 9 36 N, comprising solar radiation (R s ), vapor pressure, air temperature (T a ), relative humidity, dew point, and wind speed. The hourly data were obtained specifically for the date of Landsat 8 overpasses and UAV flights and 10 years of daily data for ET 0 calculation.
The five-year annual precipitation was 416.1 mm; however, 2019, where two of our flights were conducted, was a wet year with 650.9 mm precipitation. The 2019-2020 time series of daily precipitation is shown in Figure 3.  and UAV flights and 10 years of daily data for ET0 calculation.

203
The METRIC model was developed by Allen et al. [8] and validated in Allen et al. 204 [9]. METRIC has proven to demonstrate promising accuracy calculating Landsat [43][44][45][46] 205 and UAV [10,46] images. For this reason ETa from remotely sensed data were chosen to 206 be calculated using this model in this study. 207 The ETa calculation process was almost the same for both Landsat 8 and UAV images, 208 with the slight differences pointed out during the model description in the current section. 209 Both sets of data required hot and cold pixel identification in order to establish a relation-

Mapping EvapoTranspiration at High Resolution with Internalized Calibration (METRIC)
The METRIC model was developed by Allen et al. [8] and validated in Allen et al. [9]. METRIC has proven to demonstrate promising accuracy calculating Landsat [43][44][45][46][47] and UAV [10,46] images. For this reason ET a from remotely sensed data were chosen to be calculated using this model in this study.
The ET a calculation process was almost the same for both Landsat 8 and UAV images, with the slight differences pointed out during the model description in the current section. Both sets of data required hot and cold pixel identification in order to establish a relationship between LST and the difference between T a and LST (dT0 for iteratively finding the corresponding coefficients, "a" and "b" (dT = aLST + b) by considering the stability conditions of the atmosphere. The energy required for ET a (λET) was obtained as the residual of the energy balance equation: where R n was calculated by determining the shortwave and longwave radiation intercepting the ground and reflected and emitted from the ground: where R s is the incoming shortwave radiation assuming clear sky conditions; R L ↓ and R L ↑ are the incoming and outgoing longwave radiations, respectively, as the function of air temperature and LST to the fourth power (the Stefan-Boltzman law); ε 0 is the surface emissivity, described in terms of vegetation cover; and α is the surface albedo, which had different calculation procedures for satellite (Equation (3) [48]) and UAV (Equation (4) [49]) images: where ω b is the weight of each band in the Landsat 8 spectral range; b is the reflectance of each band; VIS is the reflectance corresponding to the visible bands, and NIR is the reflectance of the Near Infrared band of the UAV. An NDVI greater equal than 0.25 was considered vegetation and lower than 0.25 bare soil. G was calculated based on the empirical formulation presented in [50]: H was iteratively identified using the stability conditions of the pixels under both Landsat 8 and UAV conditions. The heat gradient was determined by the heat transfer formula of Newton's law divided by the aerodynamic resistance (r ah ).
where a is the air density; Cp is the specific heat of vaporization; r ah s the aerodynamic resistance; K is the Von Karman's constant (0.41); u * is the friction velocity; z is the reference height (2 m above the ground); z oh is the roughness length for heat flux (0.1 m); ψ m and ψ h are the stability corrections for momentum and heat transport, respectively; u 200 is the wind speed at a height where the surface does not affect the wind function (200 m above the ground); and z 0m is the roughness length for momentum flux. The Equation (1) output was instantaneous λET (W·m −2 ) converted to ET a,inst (mm·h −1 ). Hence, in order to achieve daily ET a (ET a,daily (mm·day −1 )), the following approach was taken: ET a,daily = ETrf × ET r,daily (10) where ET r,inst and ET r,daily are the hourly and daily reference evapotranspiration, respectively, calculated from the FAO Penman-Monteith equation [51], and ETrf is the reference ET fraction.

Modified TsHARP Method: The Multi-Sensor Data Fusion Model for Actual Evapotranspiration Estimation (MSDF-ET)
The TsHARP method was originally developed to sharpen the MODIS LST products to the Landsat resolution based on the relationship between LST and VIs [21]. This has proven to be a reliable method for downscaling thermal information [52], focusing on sharpening the instantaneous LST; however, [53] pioneered the use of daytime and nighttime LST of the MODIS onboard both Terra and Aqua for downscaling the 24-h LST to the Landsat resolution in order to calculate potential ET. In this study, the same concept was adapted to directly enhance the resolution of Landsat 8-based ET a to acquire very high resolution maps of crop water use.
One of the prominent factors affecting the magnitude of ET a is, undeniably, the density of vegetation covering the ground. Therefore, it is presumed that a strong linear relationship exists between ET a and VIs calculated from the VIS-IR bands of Landsat 8. However, considering that it would not always be possible to acquire UAV images coinciding with the Landsat 8 overpass, and ET a is not transferable from one day to another, ETrf was employed for linear interpolation between two consecutive Landsat 8 overpasses to acquire an image coinciding with the UAV flight [54]. Since ETrf gives specific information of each pixel, and relative changes in weather are identified from ET r (and ET a changes proportionally to ET r [54]), this approach was taken into account for ET a interpolation in order to achieve its monthly variations [43].
The MSDF-ET method consisted of the following steps ( Figure 4): proportionally to ETr [54]), this approach was taken into account for ETa interpolation in 261 order to achieve its monthly variations [43].   1. Different VIs (Table 4) were calculated from different bands of Landsat 8 including NDVI, SAVI, EVI, NDWI, and LSWI. The reason for selecting these VIs was mainly due to the emanations from the different bands used in their formulation (Blue, Red, NIR, SWIR1, and SWIR2), as well as the soil correction coefficient used in SAVI, because the study area was an orchard exposing a high soil surface area, and atmospheric correction in EVI.

2.
To find the most suitable VI for the method, the ETrf-VI relationships were investigated before proceeding with the MSDF-ET application on ETrf estimation.

3.
The most suitable VI was selected and a linear relationship was established for each Landsat 8 overpass to find the slope (a) and intercept of the equation (b): ETrf LC08 = a × VI LC08 +b (11) where ETrfLC08 is the ETrf calculated from the METRIC algorithm using the Landsat 8 multispectral/thermal bands, and VILC08 is the most suitable VI calculated from Landsat 8 multispectral bands.

4.
Once "a" and "b" were found, a Bias image for each overpass was constructed to suppress the effects of non-vegetation phenomena on the ETrf variations: where ETrf LC08,unc is the uncorrected ETrf calculated from "a" and "b", which were obtained from step 3 (Equation (11)) and Bias is the image containing the effects of non-vegetation objects covering the ground.

5.
Having calculated the Bias image, an uncorrected ETrf was calculated using the multispectral bands of the UAV images (ETrf UAV,unc ): 6.
In the end, the Bias image was resampled and applied to ETrf UAV,unc to achieve the true ETrf of the UAV images without the help of UAV-based thermal bands: ETrf UAV , MSDF-ET = Bias + ETrf UAV,unc

Model Evaluation
The model was evaluated in four steps: 1.
Directly evaluating the results with the ET a measured using the three EC towers installed in the almond fields.

2.
A pixel-by-pixel evaluation of the model against ET a calculated from the METRIC model using the UAV images.

3.
A pixel-by-pixel evaluation of the model against ET a calculated from the METRIC model using the Landsat 8 images.

4.
Visual interpretation of the ET a maps.

Statistical Analysis
The statistical parameters consisted of the coefficient of determination (R 2 ) showing the correlation between parameters and the Root Mean Squared Error (RMSE) indicative of the dispersion of the points from the 1:1 line.
where RSS is the sum of squares of residuals, TSS is the total sum of squares, P j is the predicted value, O j is the observed value and n is the number of data.

Against the Measured Data
METRIC was applied to both satellite and UAV (multispectral/thermal) images. The results were evaluated against EC data ( Figure 5). In this section, we made an effort to discuss the reasons behind the errors that occurred in both UAV and Landsat 8 results. The differences in R 2 values were not significant (R 2 = 0.74 for Landsat 8 and R 2 = 0.77 for UAV images); however, the dispersion of the points from the 1:1 line in UAV images were greater than the Landsat 8 results (RMSE = 1.23 mm/day for UAV against RMSE = 0.81 mm/day for Landsat 8). Mean error (ME) for the Landsat 8 and UAV ETrf were −0.36 and −0.97, respectively, which is indicative of a more severe underestimation in the UAV results. Generally, the underestimation in UAV-based ET a METRIC may have mainly resulted from the lack of a well-watered cold pixel in the UAV's field of view. By choosing a cold pixel being under conditions, we were in fact considering the maximum ET a at a rate lower than its true value; therefore, ET a , which is linearly changing between 0 and maximum ET a , was subsequently underestimated, which was the reason for the higher RMSE value in the UAV results. Hence, Landsat 8 results showed a lower error compared with the UAV. The Landsat 8 data point distribution in the scatter plot was rooted in several sources such as: bigger coverage of the Landsat 8 pixels in terms of the EC tower coverage, atmospheric correction not being 100 percent precise, a small error in the geometric correction of the images, among others. Since the UAV images were more accurate regarding the capture of surface reflectance and spatial accuracy, the points were more in correlation with the trend line. However, two points (highlighted by a red rectangle in Figure 5) diverged from the other data. These points were those in Field 3 (3 rd leaf), with the highest ground cover fraction, where shadows covered the ground considerably. Since the UAV-based ET a METRIC was calculated after removing the shadows over bare soil, the ground cover fraction in this field increased; therefore, it caused ET a to rise more than expected. The shadow effects were only significant on 24 September 2019, and 9 August 2020, which is why ET a on the 1 July 2019 was not influenced by the shadow removal procedure.  Before jumping straight to ETa calculation, the relationship between ETrf and each VI 339 were comprehensively discussed. Four Landsat 8 overpass dates used in this study (Table   340 1) were considered in the evaluation process ( Figure 6). The water bodies were removed 341 from the images, hence the oversaturated points (red elliptical figures) were those pixels 342 that captured irrigated areas, as well as a few ponding surfaces such as pools and swamps.

Correlation with the VIs
Before jumping straight to ET a calculation, the relationship between ETrf and each VI were comprehensively discussed. Four Landsat 8 overpass dates used in this study (Table 1) were considered in the evaluation process ( Figure 6). The water bodies were removed from the images, hence the oversaturated points (red elliptical figures) were those pixels that captured irrigated areas, as well as a few ponding surfaces such as pools and swamps. Precipitation in the first half of 2019 was considerably higher than the five-year average (569.5 mm (Figure 3) against 257 mm), which was logically accounted for by the greater amount of moisture content stored in the soil; therefore, this may have been one of the reasons for an increase in ETrf in 2019 and in particular on the 27 June 2019.
Outliers caused by oversaturated bare soil pixels (red figures), which accounted for evaporation rather than transpiration, were observed in the VIs computed from the Red and NIR bands, including NDVI, SAVI, and EVI. Such indices have shown to be better correlated with crop transpiration compared with evaporation from the soil, but needless to say that separating transpiration from evaporation from ETrf would not be feasible over a regional scale. Also, the negative values of ETrf commonly occurred under waterstressed bare soil conditions where H was overestimated. These pixels were converted to 0, indicating very low evaporation from the soil. On the other hand, in optical remote sensing, the shortwave infrared (SWIR) portion of the spectrum is the most sensitive to moisture changes in the surface [55], which is why LSWI and NDWI showed more promising correlation against ETrf variability. While oversaturated pixels possessed ETrf > 0.7, they had low values of NDVI, SAVI, and EVI; however, NDWI and LSWI remained high in values over such pixels. Therefore, R 2 in the ETrf-NDWI and ETrf-LSWI relationship was frequently higher than ETrf-NDVI, ETrf-SAVI, and ETrf-EVI in all the Landsat 8 overpasses.
Nevertheless, sensors onboard UAVs commonly lack the ability to capture the SWIR portion of the spectrum, and Micasense Altum was not exceptional. Hence, it would not be possible to benefit from the ETrf-NDWI and ETrf-LSWI relationship under the circumstances of this study; however, the next best VI was NDVI, showing a linear relationship with the average R 2 of 0.72. Cihlar et al. [31] found an R 2 of about 0.6 between ET a and NDVI. In this study, due to the better correlation, NDVI was selected for further calculations. All in all, UAVs containing SWIR bands are recommended for determining water indices such as NDWI and LSWI.

Against Measured ET a
ET a calculated using the MSDF-ET method against EC data was investigated and the results were promising (Figure 7). R 2 was not higher than the METRIC method (0.68 against 0.77); however, the ET a variability was strongly aligned with the measured data. RMSE was slightly lower than the METRIC-based ET a , representative of the lesser dispersion of the points. Generally, the results showed that the method can be relied upon.

Against UAV METRIC ET a
The MSDF-ET method was also evaluated against the METRIC algorithm obtained using UAV thermal/multispectral images. ETrf of 9 August 2020 are illustrated as sample correlations over the three fields of almond ( Figure 8). First, ETrf from the MSDF-ET was upscaled, then each pixel was evaluated against its corresponding pixel in the METRICbased ETrf image. The correlations were relatively strong (R 2 of 0.62, 0.65, and 0.67 for Field 1, 2, and 3, respectively) with promising RMSE values (0.14, 0.13, and 0.15 for Field 1, 2, and 3, respectively).

UAV-and Landsat 8-Based NDVI Comparison
NDVI calculated from UAV and Landsat 8 were both averaged over each field. First, the noData pixels on the edges were removed and an inside area of interest was chosen to clip the images. Then, a simple aerial average of the NDVI was computed for both UAV and Landsat 8 and d against each other (Figure 9). UAV and Landsat had very similar NDVIs except for the 24th of September 2019 at Field 2 and Field 3. This discrepancy resulted from the fact that the irrigation system was running on both fields, making the wetting patterns clearly visible, especially on young orchards where most of the ground is exposed. Another reason is related to the extended shadowed area on both orchards created by the slight difference between UAV flights and Landsat overpass time (11:40 PST) ( Table 3). In the ET a evaluation from MSDF-ET (Figure 7), the differences were suppressed using the Bias image (Step 4 equation 13). Generally, further studies need to be conducted to investigate the precise sources of discrepancies in UAV and Landsat 8 vegetation indices, especially on partly covered fields.   12 An increase in pixel size limited to less than twice the smallest row spacing over an 13 orchard does not affect ETa accuracy [3]; however, it may impede its use for precision 14 agriculture, such as detecting diseases on a single tree, irrigation system malfunctioning, 15 over-irrigation or water stress at a microscale, ponding locations, and so on. Figure 6   16 shows the difference in Landsat 8 and UAV spatial resolution on the 1st of July 2019 cor- 17 responding to the Field 1, where shadow was at its lowest. In the UAV image (Figure 10a), 18 each tree could be precisely visualized, and the well-grown trees were easily distinguish- 19 able. On the other hand, the Landsat 8 orchard illustration (Figure 10b) where all the pixels 20 were mixed, containing different ground covers, was only limited to an estimation of well-21 grown and -watered areas, with the size of 30 × 30 m 2 . Also, the range of ETa was lower 22 over the field in the Landsat 8 image compared with the UAV due to the mixture of objects 23 (soil and vegetation) in each pixel.

Differences in Spatial Resolution (UAV vs. Landsat 8)
An increase in pixel size limited to less than twice the smallest row spacing over an orchard does not affect ET a accuracy [3]; however, it may impede its use for precision agriculture, such as detecting diseases on a single tree, irrigation system malfunctioning, over-irrigation or water stress at a microscale, ponding locations, and so on. Figure 6 shows the difference in Landsat 8 and UAV spatial resolution on the 1st of July 2019 corresponding to the Field 1, where shadow was at its lowest. In the UAV image (Figure 10a), each tree could be precisely visualized, and the well-grown trees were easily distinguishable. On the other hand, the Landsat 8 orchard illustration (Figure 10b) where all the pixels were mixed, containing different ground covers, was only limited to an estimation of well-grown and -watered areas, with the size of 30 × 30 m 2 . Also, the range of ET a was lower over the field in the Landsat 8 image compared with the UAV due to the mixture of objects (soil and vegetation) in each pixel.

Visual Interpretation (MSDF-ET vs. METRIC UAV)
By visually analyzing the output ET a from both METRIC and MSDF-ET methods (Figure 11), it was recognized that the model could properly distinguish the spatial differences in ET a and was suitable for ET a mapping over the almond orchard. Due to higher spatial resolution of the MSDF-ET method, the field was mapped with more details. The canopy covers and the spaces between rows and trees were more distinguishable. Even the ET a over a single tree showed more details.

Soil Moisture Detection (MSDF-ET vs. METRIC UAV)
The performance of the MSDF-ET and METRIC methods were visually investigated for detecting soil moisture and subsequently irrigated areas over soil at the UAV scale ( Figure 12). The UAV flight over Field 2 on 1 July 2019 was captured during an irrigation event. The northern part of the field received excessive water that caused runoff causing extra water discharged to the road between the fields (red rectangle in Figure 12a). Irrigated water can be clearly distinguished in the true color image, and the METRIC method also presented significantly higher values of ETrf over the saturated soil (light blue pixels in the METRIC image, Figure 12b). However, the MSDF-ET method was not able to properly detect the wetted soil (Figure 12c), which was a disadvantage to this method. Also, both approaches were successful in distinguishing the weeds between rows, but with more details in the MSDF-ET image due to higher spatial resolution.

Shadow Effects (MSDF-ET vs. METRIC UAV)
Shadows may cause bias in ET a obtained from high spatial resolution imagery [56]. The UAV flight on 24 September 2019 was captured in the afternoon, which caused several shadowy pixels over the three fields. The Field 2 true color (Figure 13a), NDVI (Figure 13b), ET a from MSDF-ET (Figure 13c), LST (Figure 13d), and ET a from METRIC (Figure 13e) images are shown as samples of which shadows were visually distinguished. The pixels superimposed over shadows resulted in lesser values of LST. The lower the LST, the weaker the longwave radiation emitted from the ground based on the Stefan-Boltzman law. Therefore, the R n and subsequently ET a of the shadowy pixels (blue pixels in Figure 13e) rose above the ET a compared with the pixels over vegetation, which was unusual. In these cases, we were obliged to remove the shadows in order to calculate ET a with better reliability, which still inevitably forced biases, however small, in the ET a maps. However, NDVI was only slightly affected by shadow, so subsequent MSDF-ET-based ET a was calculated even over shadowy pixels. Zhang et al. [57] tested the effects of shadow on different vegetation indices calculated using a hyperspectral spectroradiometer and found that NDVI was the least affected index. Therefore, the use of NDVI in the MSDF-ET method was beneficial for attenuating the effects of shadow on ET a calculation.
with better reliability, which still inevitably forced biases, however small, in the ETa maps. 464 However, NDVI was only slightly affected by shadow, so subsequent MSDF-ET-based 465 ETa was calculated even over shadowy pixels. Zhang et al. [57] tested the effects of shadow 466 on different vegetation indices calculated using a hyperspectral spectroradiometer and 467 found that NDVI was the least affected index. Therefore, the use of NDVI in the MSDF-468 ET method was beneficial for attenuating the effects of shadow on ETa calculation.

Advantages and Disadvantages of the MSDF-ET Method
Advantages: 1.
The foremost advantage was the higher spatial resolution of the resulting ET a maps.

2.
The thermal image was excluded, which may result in a less expensive device for ET a mapping.

3.
Albedo calculation procedure using UAV images was removed, as these images usually suffer from the lack of bands covering the shortwave infrared portion of the spectrum, inevitably causing errors in ET a maps.

4.
Finding a well-watered vegetation or a fully dry surface in the limited UAV's field of view has been always challenging, and the MSDF-ET method removed the need for hot and cold pixel selection.

5.
Due to METRIC-based ET a calculations applied to Landsat 8 images, the process was not heavy and could be executed on a low-performance laptop. 6.
Instead of sharpening LST to the UAV resolution, which was not transferable from one date to another, ETrf was used to omit this limitation and apply the MSDF-ET method to UAV images not captured in the Landsat 8 overpasses. Disadvantages: 1.
The wet soils could not be clearly distinguished compared with the METRIC method.

2.
In cases of cloudy Landsat images, the interval between two consecutive Landsat overpasses is increased, and subsequently the accuracy decreases (however the launch of Landsat 9 would immensely reduce the risk).

Conclusions
This study focused on a new method for calculating ET a from UAV multispectral images without the use of thermal information from the surface. The method was a modified version of the TsHARP algorithm, which sharpens the MODIS LST product to Landsat resolution; however, Landsat 8-based ETrf was sharpened instead of LST in order to make the method feasible for achieving ET a maps of flights captured in between Landsat overpasses. The method in this study was referred to as MSDF-ET. The statistical and visual evaluation of the method showed promising results. The correlation between MSDF-ET ET a against EC data collected over three almond fields was reliable, with an R 2 of 0.68 and RMSE of 1.19 mm/day. Further evaluations against UAV-and Landsat 8-derived ET a showed slightly lower correlations, with the error partly resulting from upscaling the pixels. The visual interpretation was indicative of the proper performance of the method. In the MSDF-ET method, ET a over trees was shown in detail and shadows had negligible effects on ET a values, while on the other hand ET a using a UAV derived from the METRIC algorithm was significantly affected by the shadows. The lower sensitivity of the MSDF-ET method to shadows resulted from the slight effects of shadow on the NDVI. By using this method, the need for a thermal camera onboard a UAV was reduced. UAV images do not usually capture well-watered vegetation or fully dried surfaces due to the limited area for image capturing; however, the MSDF-ET method eliminated the hot/cold pixel selection procedure of UAV images. Also, UAVs commonly suffer from a lack of shortwave infrared cameras for Albedo calculation, which was also removed in the MSDF-ET method. Needless to say, the resulting ET a maps from multispectral images possessed higher spatial resolutions compared to the ET a maps from LST images (METRIC algorithm). In general, the methods such the one presented in this study focusing on ET a mapping using only multispectral images may significantly reduce operational and investment costs due to removing the need for thermal cameras.