Actual Evapotranspiration Estimation Using Sentinel-1 SAR and Sentinel-3 SLSTR Data Combined with a Gradient Boosting Machine Model in Busia County, Western Kenya

Abstract: Kenya is dominated by a rainfed agricultural economy. Recurrent droughts influence food security. Remotely sensed data can provide high-resolution results when coupled with a suitable machine learning algorithm. Sentinel-1 SAR and Sentinel-3 SLSTR sensors can provide the fundamental characteristics for actual evapotranspiration (AET) estimation. This study aimed to estimate the actual monthly evapotranspiration in Busia County in Western Kenya using Sentinel-1 SAR and Sentinel-3 SLSTR data with the application of the gradient boosting machine (GBM) model. The descriptive analysis provided by the model showed that the estimated mean, minimum, and maximum AET values were 116, 70, and 151 mm/month, respectively. The model performance was assessed using the correlation coefficient (r) and root mean square error (RMSE). The results revealed a correlation coefficient of 0.81 and an RMSE of 10.7 mm for the training dataset (80%), and a correlation coefficient of 0.47 and an RMSE of 14.1 mm for the testing data (20%). The results are of great importance scientifically, as they are a conduit for exploring alternative methodologies in areas with scarce meteorological data. The study proves the efficiency of high-resolution data retrieved from Sentinel sensors coupled with machine learning algorithms, focusing on GBM as an alternative to accurately estimate AET. However, the optimal solution would be to obtain direct evapotranspiration measurements.


Introduction
Water fluxes are fundamental to many theoretical, practical, and applied disciplines of climatology, hydrometeorology, and agriculture. Thus, the estimation and quantification of actual evapotranspiration (AET) are indispensable for realizing, at the global, regional, and country levels, the 17 sustainable development goals (SDGs) of the UN 2030 Agenda. The relevant goals include SDG 1 (no poverty), SDG 2 (zero hunger), SDG 6 (clean water and sanitation), and SDG 13 (climate action) [1]. Moreover, the Food and Agriculture Organization (FAO) SDG indicators 6.4.1, the change in water use efficiency over time [2], and 6.4.2, the level of water stress [3], are essential for several sectors related to the global economy, such as agriculture, industry, mining, and power production. In this context, agriculture represents the most significant share of water consumption [4], with a higher percentage in the semiarid and Mediterranean climatic regions [5]. In this regard, AET estimation is crucial for adequately implementing sustainable agricultural systems and achieving food security in developing countries.
The importance of actual evapotranspiration cannot be overstated, particularly in light of current global challenges such as food insecurity and climate change. Along with precipitation, it is a major component of the hydrological cycle [6] and a primary constituent of the surface energy budget [7]. Moreover, it is a crucial component of regional and global environmental phenomena associated with meteorological, agricultural, and hydrological applications [8][9][10]. Not only do its changes influence precipitation, streamflow, and surface temperature, among other hydro-climatological variables [11,12], but it also plays a vital role in the climate system, coupled with water, carbon, and energy cycles [13]. Despite global initiatives that aim to measure AET directly, such as the FLUXNET project [14], field records remain short and inconsistent. In view of this, machine learning has increasingly been applied in various environment-related fields. A study by Malik et al. [15] demonstrated the potential of the gradient boosting machine (GBM) model for pan-evaporation process prediction in Iran and India. In addition, Shrivastav and Jha [16] successfully used the GBM to explore the effects of temperature and humidity on COVID transmission. Furthermore, Frey [17] demonstrated that the GBM has a significant predictive performance in natural resource management aiming to enhance ecological sustainability. Hailstorm prediction and severe weather forecasting have also been performed efficiently with the GBM and other machine learning models, demonstrating their suitability [18,19]. Nevertheless, an accurate estimation of AET remains a challenging scientific problem [20][21][22][23] due to the prediction uncertainty associated with quantifying the actual evapotranspiration [14,24].
Although the gradient boosting machine has been used extensively in these related fields, limited research has been conducted to investigate its efficiency for AET estimation, which is the aim of the present study.
It is also significant that studies have extensively explored these differences, revealing significant uncertainties in AET estimations using various modeling techniques and approaches [25][26][27][28]. The authors of [29] provided insight regarding the vast disparities in evapotranspiration assessment using multiple theoretical methods compared to other global variable uncertainties, many of which are retrieved using satellite remote sensing systems [30][31][32]. Nevertheless, [33] found that AET products show the lowest uncertainties in the case of the LSM (land surface model) and moderate uncertainties when using a moderate resolution imaging spectroradiometer (MODIS), with the highest uncertainties observed when using the water budget approach. However, in recent years, remote sensing has made significant progress in estimating and assessing AET variation over time and space. Studies have employed remote-sensing-based models in AET-related studies. Using the surface energy balance index (SEBI), two-source model (TSM), surface energy balance algorithm for land (SEBAL), surface energy balance system (SEBS), Eta mapping algorithm (ETMA), and atmosphere-land exchange inverse model (ALEXI), many researchers have successfully assessed and predicted AET spatiotemporal variation in different regions across the globe [34][35][36][37][38][39][40]. In the same context, [41] suggested a more sophisticated analysis aiming to reduce the range of uncertainty in observation-based AET estimations based on a combination of the remote sensing and machine learning tools discussed in the present study.
In practice, combining remote sensing data and machine learning models can efficiently improve water flux balance modeling and management strategies to create a more sustainable future. This can be explained by the spatiotemporal variability in water fluxes, which is highly influenced by the heterogeneity of the land surface, topography, lithology, climate, meteorological conditions, soil moisture content characteristics, and vegetation vigor and density [42,43]. As evapotranspiration represents a vital component of the hydrosphere and atmosphere, this induces complex land-atmosphere feedback processes and drivers. Recent developments achieved by the European Space Agency (ESA), such as the Sentinel-1 SAR (synthetic aperture radar) and Sentinel-3 sea and land surface temperature radiometer (SLSTR), are revolutionary in terms of their provision of free data access to the public. It is worth noting that SLSTR involves the acquisition of TIR (thermal infrared) data [44] and its comparison with the NDVI (normalized difference vegetation index) temporal variation [45], which is an objective of this present research study. Remotely sensed data retrieved from Sentinel sensors are the prerequisites for the potential application of spectral and spatial-temporal characteristics [46] in agriculture [45,47].
Atmosphere 2022, 13, 1927
Studies on AET estimation and quantification have gained momentum in recent years. For instance, [48] found that Sentinel-2 and Sentinel-3 data offer the most essential and suitable spectral information required for AET estimation, despite the remarkable differences in the spatial resolution from 10 m to 1 km. Furthermore, various projects, such as the Sen-ET project (https://www.esa-sen4et.org/, accessed on 5 July 2022), have demonstrated that the high-spatial-resolution remotely sensed data (10-60 m) retrieved from Sentinel-2 and medium-spatial-resolution (1 km) thermal data captured by Sentinel-3 produce reliable AET estimates with great accuracy. Similarly, [10] successfully estimated AET using multispectral data, proving the potential of remote sensing for drought monitoring studies. Although remotely sensed data have been widely used for earth observation, limited research in Kenya has combined machine learning with remotely sensed data and investigated the efficiency of this combination in AET estimation and prediction. In this context, to ensure the continuous monitoring of water resource consumption and balance, a combination of variables, especially VH, VV, VV−VH, and VH/VV, is of interest since these variables have rarely been used in previous studies [45]. Combined with Sentinel-3 data such as the NDVI and land surface temperature (LST), they can provide valuable spectral information associated with AET, unlike conventional methods, such as pan-evaporation, lysimetric, and eddy covariance measurements, which require enormous datasets acquired from field campaigns [49]. The scarcity of meteorological data has led to an urgent need to explore alternative approaches for estimating AET in data-scarce regions.
Therefore, this study is of critical scientific benefit because it provides the basis for utilizing up-to-date alternative approaches and methodologies and integrating them with a machine learning model, the GBM, for AET prediction in local and regional areas with insufficient meteorological data. Furthermore, the study demonstrates the potential of remote sensing to estimate actual evapotranspiration. The proposed approach provides a foundation for future local and regional research applications. This is because evapotranspiration estimates are fundamental parameters for water balance modeling, an essential aspect of Kenya's predominantly rainfed agriculture, which is its economic mainstay [50,51].
In Kenya, remote sensing for the estimation of AET is critical for drought monitoring [52] since the high evapotranspiration potential leads to hydric stress and, consequently, lower crop yields [50]. Our study establishes a basis for research in other regions in Kenya, because around 89% of its total landmass (29 out of 47 counties) is influenced by arid and semiarid climates [53]. These areas are prone to water scarcity, a global challenge recognized by the UN that has led to calls for action to manage water [54,55], as well as food insecurity due to low agricultural production, the low adaptive capacity of households, and high vulnerability to climate extremes, which have strong negative socio-economic impacts [56]. For instance, Sorre [57] indicated that the increased frequency and magnitude of temperature and precipitation anomalies have led to recurring droughts in Busia County. Deficiencies in precipitation and fluctuations in evapotranspiration influence water availability [58]. In this regard, our study aims to investigate the efficiency of the machine learning model, i.e., the GBM, combined with remotely sensed data retrieved from Sentinel-1 SAR and Sentinel-3 SLSTR sensors in AET estimation and to determine the main variables influencing its spatial distribution over Busia County in Western Kenya.

Study Area
Kenya lies between the latitudes of 4.5° N and 4.5° S and longitudes 34° E and 42° E in Eastern Africa, covering 582,646 km² of the land surface, with a population of 47,564,296 [59]. Kenya is administratively divided into forty-seven counties. It has a diverse climate and is a prosperous country with geographical features such as the famous Great Rift Valley, the iconic Mount Kenya, with a height of 5199 m above sea level, and Lake Victoria. Busia County, presented in Figure 1, is in the west and divided into seven administrative sub-counties: Funyula, Budalangi, Butula, Matayos, Nambale, Teso North, and Teso South, lying on latitudes of 0°27' to 38.7684" north and longitudes of 34°6' to 41.2632" east. It borders Bungoma to the north, Kakamega to the east, and Siaya to the southwest. The study area has a tropical climate with an average temperature of 22 °C and an average rainfall of 1691 mm annually [60]. It has an annual mean maximum temperature range of 26 °C to 30 °C [61] and a mean minimum temperature range of 14 °C to 22 °C [57,62]. Busia County experiences a bimodal rainfall distribution with an extended rainy season in April-May and a short rainy season in October [63]. It is also prone to flooding, specifically in the Budalangi Constituency, Teso North Sub-County, situated in the low-lying swampy zone [64,65]. The altitude varies from 1130 m on the shores of Lake Victoria to approximately 1500 m in Funyula and the North Teso Hills. Overall, the study area has a complex terrain along the Samia Hills, with the Kavirondo Rocks, granitic hills in Amukura, and Chelelemuk representing a conspicuous topographic stretch. Busia is characterized by sandy loam soils with dark clay domination in the northern and central parts, making it agriculturally prosperous [66], with diverse food and cash crops, including tobacco, cotton, maize, robusta coffee, sugarcane, and various horticultural crops [62].


Used Data and Processing
Sentinel-1 SAR ground-range-detected (GRD) data, acquired on 24 September 2021, and Sentinel-3 SLSTR Level 2 data, acquired on 29 September 2021, were downloaded from the Copernicus Open Access Hub Portal (https://scihub.copernicus.eu/dhus/#/home, accessed on 8 September 2022). Detailed information about Sentinel products is available online in the user guides [67]. A 30 m raster map of the actual evapotranspiration (mm) in Busia County in September 2021 was retrieved from the WaPOR (the FAO portal to monitor Water Productivity through Open access of Remotely sensed derived data) (https://wapor.apps.fao.org/, accessed on 8 September 2022). More details about the reference data can be found in the metadata file available on the WaPOR 2.1 official website. SLSTR, the sea and land surface temperature radiometer, is a dual-scan temperature radiometer selected for the ESA Sentinel-3 mission in low Earth orbit as a part of the Copernicus Programme [67]. It supports a full range of applications related to earth observation, the most prominent of which are sea surface temperature (SST) assessment and land monitoring [68,69]. SLSTR products offer highly accurate global and regional sea and land surface temperatures (SST and LST) for climatological and meteorological applications. The Sentinel-3 mission provides images of a high frequency and resolution [70]. Interpreting the data is complex because the signal depends on many factors, such as moisture content, surface heterogeneity, and vegetation cover, among others. SAR comprises high-resolution returns of radar frequency energy from terrain illuminated by a sensor-generated directed beam of pulses. It monitors both geophysical and biophysical components [71]. The physical characteristics of the surface features include surface roughness, geometric structure, and digital elevation models [72].
Sentinel-1 ground-range-detected (GRD) data were co-registered, radiometrically calibrated, geometrically corrected using range Doppler terrain correction, and speckle-filtered using a three-by-three Lee filter [73]. Once the data were converted to a decibel scale, the ratio (VH/VV), the difference (VH-VV), and the radar vegetation index (RVI) were derived using the band math tool on the Sentinel Application Platform (SNAP). Once the Sentinel-3 SLSTR Level 2 data were co-registered, they were geometrically corrected. Then, the land surface temperature (LST), total column water vapor (TCWV), NDVI, and fractional vegetation cover (FVC) features were extracted. After the LST values were converted from Kelvin to degrees Celsius (°C), the products were resampled to 30 m, stacked, and clipped using ArcMap 10.3. The derived covariates are presented in Table 1. For the reference map, random sampling was conducted using the "create random points" function in ArcMap 10.3 (ESRI, Redlands, CA, USA). As a result, 250 sampling points were created (Figure 1). Then, we extracted the corresponding values of the Sentinel-1 and Sentinel-3 derivative variables at these points. Once the database had been created in ArcMap 10.3, it was imported to RStudio to train, calibrate, and test the gradient boosting model (GBM) [74]. Of the data, 80% were used for training, while the remaining 20% were used for testing. Figure 2 describes the flow chart of the study, the variables from the satellite image, and the gradient boosting machine estimation model.
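The covariate derivation and the 80/20 split described above can be illustrated with a minimal Python sketch. It is not the authors' SNAP, ArcMap, or R workflow. The function names are hypothetical, the sign convention of the difference is an assumption, and the RVI expression 4·VH/(VV + VH) on linear backscatter is a commonly used dual-polarization form that the text does not confirm.

```python
import math
import random


def to_db(sigma0_linear):
    """Convert linear sigma naught backscatter to decibels (dB)."""
    return 10.0 * math.log10(sigma0_linear)


def kelvin_to_celsius(kelvin):
    """Convert a land surface temperature from Kelvin to degrees Celsius."""
    return kelvin - 273.15


def sar_covariates(vv_linear, vh_linear):
    """Derive Sentinel-1 covariates for one pixel (illustrative only)."""
    vv_db, vh_db = to_db(vv_linear), to_db(vh_linear)
    return {
        "VV": vv_db,
        "VH": vh_db,
        # Assumption: difference taken as VH minus VV on the dB scale.
        "diff": vh_db - vv_db,
        # Ratio of dB values, as the text describes; it becomes extreme
        # when the VV value in dB is close to zero.
        "ratio": vh_db / vv_db if vv_db != 0 else float("nan"),
        # Assumption: common dual-pol RVI form on linear backscatter.
        "RVI": 4.0 * vh_linear / (vv_linear + vh_linear),
    }


def split_train_test(rows, train_fraction=0.8, seed=42):
    """Shuffle the sample points and split them into training and testing sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# 250 sampling points, represented here only by their indices.
points = list(range(250))
train, test = split_train_test(points)
```

With 250 points, the split yields 200 training and 50 testing points. Because the ratio is computed on dB values, pixels whose VV backscatter is near 0 dB produce very large magnitudes, which is worth keeping in mind when reading ratio ranges.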

Variable    Description
VH          Sigma naught (σ°) backscatter intensity in decibels (dB)
VV          Sigma naught (σ°) backscatter intensity in decibels (dB)
Diff        The difference between VH and VV (dB)
ratio       The ratio between VH and VV (dB)

Gradient Boosting Machine (GBM)
The 'gbm' R package was used to train and calibrate the model (RDocumentation). The gbm R package implements extensions to the AdaBoost algorithm [76] and Friedman's gradient boosting machine [77]. The AdaBoost algorithm first trains a decision tree in which each observation is assigned an equal weight. After that, the weights of the difficult-to-classify observations are increased to improve the prediction, so that the final ensemble represents the weighted sum of the previous tree models. GBM is a predictive modeling algorithm that builds decision trees sequentially, each one fitted to reduce the residual errors from the previous iteration [78]. Hence, it is highly competitive with random forest algorithms. In addition, boosting improves the trees' accuracy [79]. This machine learning algorithm was used because of its robust characteristics, which produce better predictions than simpler algorithms [80,81]. Several studies have used this model for applications such as sentiment classification [82], where GBM performed better than random forest. GBM has been used for predictive functions [83] in related clinical research and produces better results in cases of complex relationships. Khoi et al. [84] also demonstrated its good performance in predicting the water quality index. The model has further shown potential to predict the impact of air quality on urban areas and an impressive performance in predicting pollution caused by human activities [85].
Furthermore, GBM is an ensemble-based model that can be used for regression and classification purposes for decision making [86]; it builds successive weak trees, each improving on the previous one. It is reliable when fitting new models to produce accurate estimates [87]. GBM portrays superior results when combined with other techniques to minimize the prediction error. Its basic concept is presented as follows:
Inputs:
The input data are $(x_i, y_i)_{i=1}^{N}$, where $N$ is the number of sampled observations, $x = (x_1, \ldots, x_d)$ refers to the input variables, and $y$ refers to the response variable.
The number of iterations $M$.
The choice of loss function $\Psi(y, f)$, where $y$ is the response variable and $f$ is the function to be estimated, expressed as follows:
$$\hat{f}(x) = \arg\min_{f(x)} \Psi(y, f(x)) \tag{1}$$
where $f(x)$ is the functional dependence, $\hat{f}(x)$ is the function estimate (predictive learning), and $\Psi(y, f)$ is the loss function.
The choice of base learner model $h(x, \theta)$, a custom base learner function, which implies a node regression tree induced in a best-first manner.
Algorithm:
Initialize $\hat{f}_0$ with a constant, the initial constant value prediction.
For $t = 1$ to $M$, do:
Compute the negative gradient $g_t(x)$ of the loss function associated with the whole ensemble:
$$g_t(x) = -\left[\frac{\partial \Psi(y, f(x))}{\partial f(x)}\right]_{f(x) = \hat{f}_{t-1}(x)}$$
Fit a new base-learner function $h(x, \theta_t)$, which is a simple parameterized function of the input variables $x$; here, $h$ is a regression tree.
Find the best gradient descent step size $\rho_t$:
$$\rho_t = \arg\min_{\rho} \sum_{i=1}^{N} \Psi\left[y_i, \hat{f}_{t-1}(x_i) + \rho\, h(x_i, \theta_t)\right]$$
Update the function estimate. For the function estimate at the $t$-th iteration, the optimization rule is, therefore, defined as:
$$\hat{f}_t(x) = \hat{f}_{t-1}(x) + \rho_t\, h(x, \theta_t)$$
Overfitting of the updated model is restrained by limiting the number of gradient boosting iterations $M$.
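As a rough illustration of the boosting loop outlined above, the following Python sketch implements gradient boosting for a single input variable with a squared-error loss and one-split regression trees (stumps) as base learners. It is a simplified stand-in, not the 'gbm' R package used in the study. The function names are hypothetical, and the shrinkage factor is an added assumption that scales each step.

```python
def fit_stump(x, residuals):
    """Fit a one-split regression tree to the residuals by least squares."""
    values = sorted(set(x))
    best = None
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2.0
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # all x values identical: predict the mean residual
        mean = sum(residuals) / len(residuals)
        return lambda v: mean
    _, threshold, left_mean, right_mean = best
    return lambda v: left_mean if v <= threshold else right_mean


def gradient_boost(x, y, n_iter=50, shrinkage=0.1):
    """Boost stumps under the loss 0.5 * (y - f)^2; returns a predictor."""
    f0 = sum(y) / len(y)                      # initial constant prediction
    pred = [f0] * len(y)
    learners = []
    for _ in range(n_iter):
        # Negative gradient of the squared-error loss is the residual.
        g = [yi - pi for yi, pi in zip(y, pred)]
        h = fit_stump(x, g)                   # new base learner
        hx = [h(xi) for xi in x]
        denom = sum(v * v for v in hx)
        # Closed-form line search for the step size under squared error.
        rho = sum(gi * vi for gi, vi in zip(g, hx)) / denom if denom else 0.0
        step = shrinkage * rho
        pred = [p + step * v for p, v in zip(pred, hx)]
        learners.append((step, h))

    def predict(v):
        return f0 + sum(step * h(v) for step, h in learners)

    return predict


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return (sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n) ** 0.5
```

Calling `gradient_boost(x, y)` on paired lists returns a function that predicts the response for a new input value. Increasing `n_iter` lowers the training error but raises the risk of overfitting, which is why the number of iterations is treated as a tuning parameter.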

Limitations of the Applied Datasets and Methodology
Data derived from different sensors with different properties, e.g., radiometric, spatial, and spectral resolutions, might lead to uncertainties during modeling, mainly because some preprocessing steps might lead to the loss of spectral information in the pixels. In addition, the machine learning model applied is predictive; therefore, it cannot generalize and outline the exact relationship between AET and the spectral information retrieved from Sentinel-1 SAR and Sentinel-3 SLSTR Level 2 data. These issues can only be resolved when a larger database is available and additional variables are integrated into the model. Moreover, the machine learning model was semiautomatically calibrated. Future research will focus on optimization algorithms for the model hyperparameters, which are expected to reduce the estimation errors and further improve accuracy. However, the present work satisfactorily demonstrates the applicability of the methodology used and elucidates the importance of machine learning in modeling hydrological and environmental processes using remotely sensed data.

Descriptive Statistics
A large difference between the minimum value of 69.8 mm and the maximum value of 150.9 mm was revealed, indicating spatial variability in the actual evapotranspiration distribution in the study area, as shown in Table 2 and Figure 3. Moreover, the normality test demonstrated that the AET estimates were approximately normally distributed, with a slight negative skewness of −0.24, which is consistent with the mean of 115.8 mm/month being slightly lower than the median of 117 mm/month.
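The link between a negative skewness and a mean that falls below the median can be checked with a short Python sketch. The sample values are invented for illustration and are not the study's AET estimates. The skewness is computed as the moment coefficient (third central moment divided by the 1.5 power of the second), which is an assumption about the definition used.

```python
import statistics


def skewness(values):
    """Moment coefficient of skewness: m3 / m2 ** 1.5."""
    n = len(values)
    mean = statistics.mean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical monthly AET values (mm/month) with a longer lower tail.
sample = [70.0, 100.0, 115.0, 118.0, 120.0, 125.0, 130.0]

summary = {
    "min": min(sample),
    "max": max(sample),
    "mean": statistics.mean(sample),
    "median": statistics.median(sample),
    "skewness": skewness(sample),
}
```

For this sample, the longer lower tail pulls the mean below the median and gives a negative skewness, the same qualitative pattern reported for the AET estimates.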
The data derived from Sentinel-1 and Sentinel-3 are shown in Figure 4. The NDVI varied from 0.26 to 0.70, while the FVC varied from 0.45 to 0.72. Overall, these variables showed a non-homogenous distribution pattern across Busia County, which agrees with [88], who revealed the remarkable influences of the NDVI and FVC on AET, in addition to topographical characteristics. This can be explained by the significant contribution of vegetation to the increase in the actual evapotranspiration due to the increased available energy absorbed by the canopy, as identified by Zhao et al. [89]. To further support this finding, Klisch and Atzberger [90] demonstrated the applicability of the NDVI derived from MODIS data for drought monitoring, since low estimated NDVI values indicate stressed vegetation, which is an indicator of drought occurrence in most scenarios [91]. Therefore, NDVI assessment constitutes the basis for early drought warnings [92]. In addition, the land surface temperature (LST) values ranged from 29.7 °C to 38.9 °C, while the TCWV estimates ranged from 33.6 kg/m² to 37.2 kg/m². The LST and TCWV values were relatively high, directly contributing to high energy availability, indicating higher AET estimates, as suggested by [93]. Lower values indicate that the AET and LST negatively relate to air surface temperature changes [94]. The AET was found to be proportionally increased in regions with a high net solar radiation and air surface temperature, greatly influenced by the increase in evapotranspiration intensity, coupled with increased atmospheric evaporative demands, thus further increasing the frequency of droughts [58]. Nonetheless, compared to the tropics, the AET and LST have a positive relationship in high-altitude regions [95]. Although LST variation can be fundamental in selecting the wettest and driest pixels, as stated by Wang et al. [96], this may also introduce uncertainties and increase prediction errors. The authors of [97] found that Sentinel-1 and Sentinel-3 sensors can provide estimates of the NDVI and LST to ascertain spatiotemporal vegetation dynamics, droughts, and water availability in water stress conditions, respectively, which are essential factors influencing, and significant driving forces of, the AET distribution. This is consistent with Arast et al. [39], who demonstrated that the NDVI, net solar radiation, and other meteorological parameters influence AET. The data from the two sensors are sufficient to derive the variables under investigation and explore their associations with AET.
The radar variables, i.e., the VH, VV, the ratio (VH/VV), the difference (VV−VH), and RVI, derived from the Sentinel-1 SAR data, are presented in Figure 5. Since an electromagnetic signal received by radar sensors is highly influenced by the surface [98], the backscatter intensity of VH polarization ranged from −26.2 dB to 15.8 dB, while the backscatter intensity of VV polarization was slightly stronger, ranging from −21.2 dB to 18.6 dB. Thus, the VH backscatter shows a more substantial variation compared to the VV backscatter values; the stronger the co-polarization (HH or VV) reflection is, the brighter the SAR image will be [99]. As former studies indicate, a decrease in the backscatter intensity is chiefly attributed to vegetation growth, causing volume scattering [45,100]. The difference (VV−VH) values ranged from −27.9 dB to 7.7 dB, while the estimated values for the ratio (VH/VV) ranged from −27,107 to 9666. Moreover, the ratio (VH/VV) decreased with a change in the land cover to scarce vegetation and non-vegetated rocky terrains, e.g., the Kavirondo Rock series in Busia. According to many studies, the value increases during the vegetation growth season [70,99], leading to the more significant influence of vegetation biomass [45]. Further analysis showed that the RVI estimated values ranged from −265.38 to 163.0 dB. In general, the reflected energy drastically varied in Busia County according to the vegetation vigor and density, which are proportionally associated with the soil moisture content and canopy physiology in various growth stages [70]. Based on Figure 5, the bright features can probably be attributed to riparian vegetation [99] along the water bodies, such as the River Mososkoto and River Sio, and swampy areas, such as the Yala swamp, one of Kenya's most extensive freshwater wetlands.

Model Training Using a Random Search
Hyperparameter optimization supports decisions about which hyperparameters matter most and over which ranges they should be tuned. We therefore used the random grid search method to calibrate the model and optimize its hyperparameters, defining the search space as a bounded domain of hyperparameter values and randomly sampling points in that domain [101]. The authors of [102] indicated that a random search can substantially improve model accuracy by probing a larger configuration space. Although the grid search is one of the most extensively used hyperparameter optimization algorithms [102], Larochelle et al. [103] reported that a random search over the same domain effectively identifies accurate models with minimal processing. The authors of [104] also noted that hyperparameters must be set before the training process starts.
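The idea of sampling random points from a bounded hyperparameter domain can be sketched as follows. This is a minimal illustration in Python rather than the authors' R workflow; the bounds, the function names, and the placeholder objective are all assumptions made for the example. In practice, the objective would train a model with the sampled values and return a validation error.

```python
import random


def random_search(space, score_fn, n_iter=20, seed=0):
    """Sample n_iter points from a bounded search space and keep the best.

    space: dict of name -> (low, high, kind), where kind is "int" or "float".
    score_fn: callable taking a dict of hyperparameters and returning an
              error to minimize (e.g., a validation RMSE).
    """
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_iter):
        params = {}
        for name, (low, high, kind) in space.items():
            if kind == "int":
                params[name] = rng.randint(low, high)   # inclusive bounds
            else:
                params[name] = rng.uniform(low, high)
        score = score_fn(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical bounded domain for three GBM hyperparameters.
space = {
    "n_trees": (100, 1000, "int"),
    "learning_rate": (0.001, 0.1, "float"),
    "depth": (1, 8, "int"),
}


def toy_score(params):
    """Placeholder objective standing in for 'train a model, return its error'."""
    return abs(params["learning_rate"] - 0.01) + abs(params["depth"] - 3)


best_params, best_score = random_search(space, toy_score, n_iter=50, seed=42)
print(best_params, best_score)
```

Unlike an exhaustive grid, the number of evaluations here is fixed by `n_iter` regardless of how many hyperparameters the domain contains.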

For GBM calibration, three hyperparameters need to be optimized: the number of trees, the learning rate, and the depth of each tree. The number of trees is the total number of trees in the sequence or ensemble. In bagging and random forests, the trees are grown separately and averaged, which makes it very difficult to overfit by adding too many trees. GBMs work differently: each tree is built in sequence to correct the errors of the previous trees, so an excessive number of trees can lead to overfitting. The learning rate determines the extent to which each tree contributes to the final output and affects how rapidly the algorithm proceeds down the gradient. For the depth of each tree, typical values vary from 3 to 8, yet a tree depth of 1 is not uncommon [78]. A detailed explanation of the method used to calibrate a GBM is presented in [105]. In this study, the best hyperparameters determined using a random search were a number of trees (ntree) of 800, a shrinkage (learning rate) of 0.01, and an interaction depth (depth of each tree) of 3.
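To make the roles of the number of trees and the learning rate concrete, the sketch below boosts depth-1 trees (stumps) on a single feature with a squared-error loss. It is a simplified illustration in Python, not the R implementation used in the study: the function names and toy data are hypothetical, the tree depth is fixed at 1 rather than the depth of 3 selected above, and the input is assumed to contain at least two distinct feature values.

```python
def fit_stump(x, residuals):
    """Least-squares depth-1 tree (stump) on a single feature."""
    best = None
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2.0  # candidate split between adjacent values
        left = [r for xi, r in zip(x, residuals) if xi <= thr]
        right = [r for xi, r in zip(x, residuals) if xi > thr]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, thr, left_mean, right_mean)
    return best[1:]


def fit_gbm(x, y, n_trees, learning_rate):
    """Fit stumps in sequence, each on the residuals of the current ensemble."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        thr, left_mean, right_mean = fit_stump(x, residuals)
        stumps.append((thr, left_mean, right_mean))
        # Each tree contributes only a fraction (the learning rate) of its fit.
        preds = [pi + learning_rate * (left_mean if xi <= thr else right_mean)
                 for xi, pi in zip(x, preds)]
    return {"base": base, "stumps": stumps, "learning_rate": learning_rate}


def predict(model, xi):
    """Sum the base value and the shrunken contribution of every stump."""
    out = model["base"]
    for thr, left_mean, right_mean in model["stumps"]:
        out += model["learning_rate"] * (left_mean if xi <= thr else right_mean)
    return out


def train_mse(model, x, y):
    return sum((predict(model, xi) - yi) ** 2 for xi, yi in zip(x, y)) / len(y)


# Toy data (hypothetical): a single step between x = 3 and x = 4.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
few = fit_gbm(x, y, n_trees=50, learning_rate=0.01)
many = fit_gbm(x, y, n_trees=800, learning_rate=0.01)
print(train_mse(few, x, y), train_mse(many, x, y))
```

With a small learning rate, many trees are needed before the training error becomes small, which is why the two hyperparameters are usually tuned together.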

Relative Influences of the Variables on the Model
The calibrated model was statistically significant in estimating the AET over the study area based on the existing reference's physical background and AET values estimated by the FAO in Busia County. A split criterion was applied to better understand and visualize each explanatory variable's influence on the model prediction of the AET. Figure 6 ranks the variables from the most to the least influential in the AET prediction model. The stronger the influence of an explanatory variable on the response is, the larger its value is. Figure 6b illustrates that the FVC is the most influential explanatory variable, with the most significant impact on the modeling and estimation of the AET in Busia. This variable is consistent with the drought patterns in the same area since low values indicate insufficient amounts of precipitation. The total column water vapor and land surface temperature influenced the AET on various scales, including daily and monthly, according to Rocha et al. [96] and Wu et al. [106], revealing their great potential for application in many agronomy-based studies [96,107,108]. These variables are imperative for estimating regional AET, provided that soil moisture content and related variables are available in the remote sensing area [109].
Similarly, Probst et al. [94] found that LST variation is strongly associated with vegetation evapotranspiration and energy balance in the case of wet soil and plants. Furthermore, the LST enhances evapotranspiration in cold air and unlimited soil water in inadequate precipitation conditions. In addition, droughts, high temperatures, and stronger radiative forcing lead to the drying propensity of the surface due to high evapotranspiration rates and low soil moisture, inducing an increase in the heat flux and high temperatures [58].
Furthermore, Yang et al. [69] demonstrated that NDVI patterns are usually consistent with the AET spatial distribution, while [45] demonstrated that the SAR backscatter and NDVI can be used in various physical environmental conditions because they have suitable optical plant properties. The least influential variables were the radar vegetation index (RVI), which measures the randomness of the scattering [110], and the ratio VH/VV, both associated with vegetation conditions. A study by Szigarski et al. [111] indicated that the correlation between the RVI and other indices depends on the other indices' independence from the surface roughness and soil moisture.
Furthermore, as stated by Rosenqvist et al. [99], surface roughness and hilly terrain may cause strong reflection, and VH polarization demonstrates multiple scattering and, hence, a low influence on AET. From Figure 6a, the correlation matrix demonstrates variations between the AET and remotely sensed data, indicating estimated positive correlations between the AET and FVC, NDVI, VH, and VH-VV. The findings agree with Yan et al. [112], who used cloud-free MODIS images from 2000 to 2014 with the ETWatch system and found that the NDVI positively correlated with the AET. Ma et al. [113] also demonstrated the significance of the FVC as a driving parameter that affected the AET and influenced its variation. In addition, there was an estimated weak negative correlation between the AET, VH/VV, and RVI, while there appeared to be no correlation between the AET and VV, LST, and TCWV. In agreement with our finding, Yan et al. [112] found that AET and LST are negatively correlated in water-scarce areas at various spatiotemporal extents.
After identifying the most to least influential variables (Figure 6b), partial dependence plots (PDPs) were created to visualize and understand how the response variable changes, as presented in Figure 7. The PDPs show the change in the average predicted AET (y) values as each variable varies. They indicate that the estimated AET values increased as the majority of the variables used in the model increased. For instance, the predicted AET values increased with the increase in the NDVI (Figure 7a), which agrees with [95,114–116], who found that this variable has a significant correlation with the AET and is closely linked to green-leaf-area- and vegetation-based indices. A varying trend in the AET occurred with the NDVI cover varying between 55% and 62.5%, which does not necessarily demonstrate water availability but rather the greenness of the vegetation [116]. The results further concur with Arast et al. [39], who found that larger NDVI values usually indicate increased AET estimates over different spatial and temporal scales. The NDVI pattern did not entirely match the FVC response (Figure 7b). For instance, the AET estimates were the highest for a vegetation density of 55% to 60%. In addition, the LST (Figure 7c) demonstrated varying trends in its response, with the highest AET estimates recorded for temperatures ranging between 32 °C and 33 °C, whereas the highest response change in the TCWV variable corresponded to the highest predicted AET estimates (Figure 7d). For the radar variables, the smaller the VH−VV index (Figure 7e) was, the higher the predicted AET estimates were. Overall, the PDPs satisfactorily demonstrated the change in the predicted average AET estimates as the variables varied across their distributions.
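The computation behind a partial dependence curve can be sketched as follows: one variable is forced to each value on a grid for every sample, and the model predictions are averaged. This Python sketch is an illustration only; `toy_model`, its coefficients, and the sample rows are hypothetical stand-ins for the fitted GBM and the study data.

```python
def partial_dependence(predict_fn, rows, feature_index, grid):
    """Average prediction when one feature is forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)
            modified[feature_index] = value   # override the feature of interest
            total += predict_fn(modified)
        curve.append(total / len(rows))
    return curve


def toy_model(row):
    """Hypothetical stand-in for a fitted model; columns are (NDVI, LST)."""
    ndvi, lst = row
    return 100.0 * ndvi + 0.5 * lst


# Hypothetical samples, used only to exercise the function.
rows = [(0.4, 30.0), (0.6, 32.0), (0.8, 34.0)]
ndvi_grid = [0.4, 0.6, 0.8]
pd_curve = partial_dependence(toy_model, rows, feature_index=0, grid=ndvi_grid)
print(pd_curve)
```

Because the other variables keep their observed values, the curve reflects the average effect of the chosen variable over the data distribution rather than its effect at a single point.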

Accuracy Assessment Using Correlation Coefficient (R) and Root-Mean-Squared Error (RMSE)
The spatial distribution of the AET can be estimated using the final GBM model, which contributes to the improvement and development of the proper utilization of water systems [117]. The correlation coefficient (R) and root-mean-squared error (RMSE) were used to evaluate the model's statistical performance. The two assessment metrics can be determined using Equations (1) and (2):

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2} \tag{1}$$

where $\hat{y}_i$ is the predicted value for the $i$th observation, $y_i$ is the observed value, and $n$ is the total number of observations:

$$R = \frac{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)\left(y_i - \bar{y}\right)}{\sqrt{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2 \sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}} \tag{2}$$

where $x_i$ is the x-variable in a sample, $\bar{x}$ is the mean of the x-variable values, $y_i$ is the y-variable in a sample, and $\bar{y}$ is the mean of the y-variable values.

The model established a relationship between the remotely sensed data retrieved from Sentinel-1 and Sentinel-3 and the AET reference data. For the training dataset, the calibrated model yielded a correlation coefficient of 0.81 and an RMSE value of 10.7 mm, indicating its efficiency in predicting the AET. Figure 8a illustrates the relationship between the measured and estimated AET values (in mm) using the calibrated GBM model based on only two satellite images. The model was statistically significant, yet an overestimation occurred in a few cases because the model was semi-automatically calibrated. Nevertheless, the result was satisfactory, and future work will entail the application of more processed sensor data for further investigations regarding climatological and ecological dynamics, as evidenced in a study by the authors of [118]. Based on Figure 8b, the GBM testing results showed a moderate association between the measured and predicted AET values (in mm), with a correlation coefficient R of 0.47 and an RMSE of 14.1 mm. This revealed the applied approach's potential statistical significance, which can serve as a basis for future studies aiming to model and map the AET spatial distribution in similar environmental and climatic conditions.
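The two assessment metrics described in this section can be computed with a few lines of code. The sketch below uses the standard definitions of the RMSE and the Pearson correlation coefficient in Python; the function names are illustrative, and the sample values are hypothetical numbers chosen for the example, not data from the study.

```python
import math


def rmse(observed, predicted):
    """Root-mean-squared error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical monthly AET values in mm, for illustration only.
observed = [100.0, 110.0, 120.0, 130.0]
predicted = [102.0, 108.0, 123.0, 127.0]
print(rmse(observed, predicted), pearson_r(observed, predicted))
```

The RMSE is expressed in the units of the AET (mm), whereas the correlation coefficient is dimensionless and is undefined when either sequence is constant.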


Conclusions
This research aimed to estimate the monthly mean actual evapotranspiration using Sentinel-1 SAR ground-range-detected (GRD) and Sentinel-3 SLSTR Level 2 data in a typical tropical climate in Busia County. The gradient boosting machine was trained and tested using reference data acquired from the WaPOR. The model showed a strong correlation (r = 0.81) between the observed and estimated AET data for the training and a moderate correlation (r = 0.47) for the testing, revealing the superiority of the applied method. The FVC was a highly influential explanatory variable, with the most significant impact on the prediction model for AET estimation, while the ratio VH/VV was the least influential variable in the AET estimation model. Although the remotely sensed data and GBM application undoubtedly yielded promising results, further examination using other machine learning algorithms is highly recommended to optimize the approach's efficiency and explore the nature of the statistical relationships between the AET and the applied variables. This could enable the timely and consistent monitoring of actual evapotranspiration, water deficiencies, and agricultural sustainability, as well as help ensure food security. This research further enhances our understanding of AET assessment and the potential of using Sentinel-1 and Sentinel-3 data for regional drought monitoring and natural resources management. With a relatively successful estimation of the AET, drought events can be predicted more easily in future studies since the AET is one of the primary factors of drought magnitude and occurrence. Further research will be carried out on larger scales and in different climatic regions to validate the applicability of the proposed methodology.
Author Contributions: Conceptualization, P.K.M. and G.S.; data processing and code writing, G.S.; writing-original draft preparation, P.K.M.; writing-review and editing, P.K.M. and G.S.; data interpretation and manuscript revision, G.T., T.W., and B.S. All authors have read and agreed to the published version of the manuscript.


Data Availability Statement:
The data that supported this research can be found on the Copernicus Open Access Hub (https://scihub.copernicus.eu/dhus/#/home) and the FAO WaPOR (https://wapor.apps.fao.org/). The R code used in this study can be shared upon request.