A Comparison of Different Machine Learning Methods to Reconstruct Daily Evapotranspiration Time Series Estimated by Thermal–Infrared Remote Sensing

Abstract: Remote sensing-based models usually have difficulty in generating spatio-temporally continuous terrestrial evapotranspiration (ET) due to cloud cover and model failures. To overcome this problem, machine learning methods have been widely used to reconstruct ET. Therefore, studies comparing and evaluating the accuracy and effectiveness of reconstruction among different machine learning methods at the basin scale are necessary. In this study, four popular machine learning methods, namely deep forest (DF), deep neural network (DNN), random forest (RF) and extreme gradient boosting (XGB), were used to reconstruct the ET product, addressing gaps resulting from cloud cover and model failure. The ET reconstructed by the four methods was evaluated and compared for the Heihe River Basin. The results showed that the four methods performed well for the Heihe River Basin, but the RF method was particularly robust. It not only performed well compared with ground measurements (R = 0.73) but also fully reconstructed the gaps generated by the TSEB model across the entire basin. Validation based on ground measurements showed that the DNN and XGB models performed well (R > 0.70). However, some gaps still existed in the desert after reconstruction using the DNN and XGB models, especially for the XGB model. The DF model filled these gaps throughout the basin, but this model had lower consistency with ground measurements (R = 0.66) and yielded many low values. The results of this study suggest that machine learning methods have considerable potential in the reconstruction of ET at the basin scale.


Introduction
Terrestrial evapotranspiration (ET) is a crucial component of land-atmosphere hydrology, energy and material cycles [1,2]. The accurate and reliable estimation of regional ET is important for basin hydrology, agricultural water management and drought monitoring [3]. Currently, ground measurement systems including the eddy covariance (EC) system and the large-aperture scintillometer (LAS) are commonly used to measure ET under different vegetation types [4,5]. These measurement techniques, however, can only provide valid measurements at scales of several meters to ~100 m and have difficulty obtaining valid measurements at larger scales [4]. Therefore, ground-based observations are usually used to validate ET products based on remote sensing. By contrast, remote sensing techniques can readily monitor large-scale geographical information using satellites and have thus become a commonly used way of detecting ET.
However, remote sensing techniques can only detect surface parameters related to ET, rather than directly observing ET. To acquire reliable ET over larger scales, many remote sensing-based ET simulation models have been proposed [6–13]. Among these models, thermal-infrared-based models, which rely on thermal-infrared land surface temperature (LST), are widely used to estimate regional ET [14–16]. The two-source energy balance (TSEB) model is one of the most widely applied and has a more reasonable physical mechanism than single-source models [6]. It has been shown that the TSEB model can more accurately simulate energy exchanges between the atmosphere, soil and vegetation and is more adaptable to different vegetation types and climatic regions [17–19]. The input parameters of the TSEB model include surface boundary parameters based on remote sensing and meteorological parameters [6]. Meteorological reanalysis data overcome the spatial limitations of the observed meteorological data recorded by traditional weather stations and can be employed to drive the TSEB model at large scales [20,21]. However, the TSEB model also relies on thermal-infrared-based surface temperature as a boundary constraint. This often leads to model invalidation in regions where surface temperatures are influenced by solid clouds, thus limiting the practical applications of this model [12,22–24]. Moreover, due to the mechanism of the TSEB model, it may still produce gaps in areas shrouded by solid clouds with low radiation, even when land surface temperatures are available [25].
Hence, exploring reliable methods for the spatio-temporal reconstruction of TSEB-estimated ET is significant for agricultural water management and hydrological applications [22,25]. In response to these challenges, various machine learning methods, such as random forest (RF) [26], deep forest (DF) [27], deep neural networks (DNNs) [25] and extreme gradient boosting (XGBoost) [28], have provided viable solutions for the reconstruction of ET. These methods have been applied to estimate or reconstruct surface parameters from remote sensing data in previous studies [29–33]. The conventional approach usually entails initially training a model at the site scale and then extending the model to a larger regional scale using remote sensing and other data [34–36]. However, although such well-trained models typically perform well at the site scale, unevenly distributed and limited sites cannot adequately represent heterogeneous surfaces [35]. Hence, some studies have used the valid target parameters obtained from a physical model as training samples for machine learning methods, which are then used to fill the gaps in the model estimation by combining spatio-temporally continuous impact factors [25,33]. This approach allows machine learning methods to fill the gaps while preserving the accuracy and reliability of the physical model. However, few studies have reconstructed ET in this way. Whether different machine learning methods perform differently when combined with physical models also needs to be investigated.
The objective of this paper was to generate spatio-temporally continuous daily ET, overcoming the spatial limitations of traditional ground measurement systems and the temporal constraints associated with remote sensing models. To achieve this, four machine learning methods were employed to reconstruct the gaps generated by the TSEB model for the Heihe River Basin. In the following sections, we describe the methodology of combining the TSEB model with machine learning for ET estimation and reconstruction and comprehensively compare the accuracy and effectiveness of the different machine learning methods coupled with the TSEB model at different spatial scales.

Study Area and EC Sites
The study area was the Heihe River Basin, located in the middle of the Hexi Corridor. It is the second largest inland basin in northwest China, covering approximately 1,432,000 km² [37,38]. According to its hydrological characteristics, the basin can be divided into upstream, midstream and downstream sections. The Heihe River Basin is characterized by widespread desert, sporadic grassland and cropland, with riparian forest in the downstream regions and widespread grassland, riparian ecosystems, wetland and cropland (planted with crops such as maize, wheat and vegetables) in the upstream and midstream regions (Figure 1) [37,38]. The area lies in arid and semi-arid regions and has a typical temperate continental climate, with a mean annual temperature of 6.0~8.0 °C, mean annual precipitation of 100~250 mm, and mean annual evapotranspiration of 1200~1800 mm. The Heihe Watershed Allied Telemetry Experimental Research (HiWATER) has been conducted in this area to better understand hydrological, ecological and other land surface processes, accumulating numerous surface observation data for this purpose [38]. Six EC stations with relatively homogeneous surfaces, with data from 2011 to 2016, were selected to validate the accuracy of the estimated and reconstructed daily ET in this study (Table 1). These sites include one wetland EC station (Dashalong) [39], one grassland EC station (Arou) [39], two cropland EC stations (Daman and Linze) [5,39,40] and two forest EC stations (Huyanglin and Hunhelin) (Figure 1) [39].
Original EC measurement data were stored as the average latent heat flux per 30 min (48 records per day). In this study, the daily ET measurements were aggregated from 8:00 to 19:00, and only for days on which fewer than 25% of the observations were missing. All ground measurement data can be acquired from the National Tibetan Plateau Data Center (TPDC) at https://data.tpdc.ac.cn (accessed on 10 November 2023).
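The daily aggregation described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `daily_et_from_le`, the constant latent heat of vaporization (2.45 MJ kg⁻¹), and the choice to represent missing half-hours by the mean of the valid ones are not taken from the study.

```python
# Hypothetical sketch: aggregate 48 half-hourly latent heat fluxes (W m-2)
# into a daily ET value (mm) over the 08:00-19:00 window.
LATENT_HEAT = 2.45e6  # J kg-1, assumed constant latent heat of vaporization


def daily_et_from_le(le_half_hourly, start_hour=8, end_hour=19, max_missing=0.25):
    """Return daily ET in mm, or None if too many observations are missing.

    le_half_hourly: list of 48 values; slot k covers [k*0.5 h, (k+1)*0.5 h).
    Missing observations are given as None.
    """
    window = le_half_hourly[start_hour * 2:end_hour * 2]
    valid = [v for v in window if v is not None]
    missing_fraction = 1.0 - len(valid) / len(window)
    if missing_fraction >= max_missing:
        return None  # day rejected: 25% or more of the window is missing
    mean_le = sum(valid) / len(valid)  # assumed stand-in for missing slots
    seconds = len(window) * 1800  # 30 min per slot
    # W m-2 * s = J m-2; divided by J kg-1 gives kg m-2, i.e. mm of water
    return mean_le * seconds / LATENT_HEAT
```

A day with 5 of the 22 half-hours missing (about 23%) is kept, whereas a day with 6 missing (about 27%) is rejected.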

Multisource Data
In this study, surface boundary parameters that constrain surface heat fluxes, including LST, leaf area index (LAI) and land cover type (LC), needed to be input into the TSEB model. These parameters can be acquired through remote sensing techniques. The LST dataset was a fusion product that combines the Global Land Data Assimilation System (GLDAS) and Terra MODIS LST [41]. This fusion product is based on a time series decomposition model of LST that reconstructs the gaps in MODIS LST, with a spatial resolution of 1 km and a daily temporal resolution [41]. The LAI dataset was collected from the Global Land Surface Satellite (GLASS) LAI dataset, with a spatial resolution of 500 m and an 8-day temporal resolution [42]. To ensure temporal consistency with the other data, the LAI was linearly smoothed to a daily scale. The Albedo dataset, which was used for the reconstruction of daily ET, was also collected from GLASS and processed in the same way. The land cover type map, based on the International Geosphere-Biosphere Programme (IGBP) classification system, was acquired from the MCD12Q1 Version 6.1 data product [43]. Considering the influence of topography on ET, a digital elevation model (DEM) was also collected for the reconstruction of ET in this study.
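The conversion of 8-day composites to a daily scale can be illustrated with a short sketch. It assumes plain linear interpolation between consecutive composite dates, which may differ from the exact smoothing used in the study; the function name `interpolate_daily` and the sample values are hypothetical.

```python
# Hypothetical sketch: linear interpolation of 8-day composites (e.g. LAI)
# to daily values between the first and last composite dates.
def interpolate_daily(days, values):
    """days: sorted day-of-year integers of the composites.
    values: composite values on those days.
    Returns a dict mapping every day in [days[0], days[-1]] to a value."""
    daily = {}
    for k in range(len(days) - 1):
        d0, d1 = days[k], days[k + 1]
        v0, v1 = values[k], values[k + 1]
        for day in range(d0, d1):
            weight = (day - d0) / (d1 - d0)  # 0 at d0, approaching 1 at d1
            daily[day] = v0 + weight * (v1 - v0)
    daily[days[-1]] = values[-1]
    return daily


lai_daily = interpolate_daily([1, 9, 17], [1.0, 3.0, 2.0])
```

Interpolation of this kind cannot reproduce an abrupt change that happens between two composite dates, such as a harvest.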
For the TSEB model and the reconstruction, eight meteorological variables in ERA5-land were selected, including air temperature (TA), u-component of wind (UW), v-component of wind (VW), surface pressure (SP), dewpoint temperature (DT), surface solar radiation downward (SSRD) and surface thermal radiation downward (STRD) [21]. Each meteorological parameter was processed as an instantaneous value at 14:00 local time (determined by longitude) to drive the TSEB model and as a daily average value for reconstruction. The TSEB model required true wind speed (WS) and relative humidity (RH) as inputs. ERA5-land does not provide WS and RH directly, but they can be calculated from the above parameters. The WS can be obtained by combining the two wind components (UW and VW) through vector addition, and the RH can be calculated from TA and DT.
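As an illustration of these two derived inputs, the sketch below computes WS from the wind components and RH from air and dewpoint temperature. The Magnus-type saturation vapor pressure approximation and its coefficients are an assumption here, since the study does not state which formula was used; temperatures are assumed to be in kelvin, and the function names are hypothetical.

```python
import math


def wind_speed(uw, vw):
    """Wind speed (m s-1) as the magnitude of the (UW, VW) vector."""
    return math.hypot(uw, vw)


def saturation_vapor_pressure(t_celsius):
    """Saturation vapor pressure (hPa), Magnus-type approximation (assumed)."""
    return 6.112 * math.exp(17.62 * t_celsius / (243.12 + t_celsius))


def relative_humidity(ta_kelvin, dt_kelvin):
    """Relative humidity (%) from air temperature and dewpoint temperature."""
    ta_c = ta_kelvin - 273.15
    dt_c = dt_kelvin - 273.15
    # Actual vapor pressure equals the saturation pressure at the dewpoint
    rh = 100.0 * saturation_vapor_pressure(dt_c) / saturation_vapor_pressure(ta_c)
    return min(rh, 100.0)
```

When the dewpoint equals the air temperature the air is saturated and the function returns 100%.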
Because the datasets come from different sources, their spatial resolutions vary considerably. Therefore, the spatial resolutions of all datasets were unified to 0.01° by bilinear interpolation. Details of the datasets used in this study are shown in Table 2.

Methods
The flowchart for generating spatio-temporally continuous daily ET with the TSEB model and machine learning methods is shown in Figure 2. After the pre-processing of the remote sensing and meteorological data, spatio-temporally discontinuous daily ET was first generated by the TSEB model using remote sensing and instantaneous meteorological data. Four machine learning methods (RF, DNN, DF, XGBoost) were then trained and employed to reconstruct the gaps in the TSEB simulation. Finally, the daily ET time series reconstructed by the different machine learning methods were obtained.

Description of the TSEB Model
The TSEB model, proposed by Norman et al. in 1995 [6], is a physically based two-source energy balance model used in remote sensing and hydrological studies. The TSEB model can be used to estimate surface energy fluxes at different scales and considers two separate energy components: the soil and the vegetation. It can be applied to accurately estimate the radiative and turbulent energy exchange between the canopy, soil and atmosphere for different vegetation types and climatic areas and has demonstrated robust performance [44,45]. Moreover, the TSEB model is easy to combine with remote sensing, enabling the estimation of evapotranspiration with high spatio-temporal resolution [46]. In this study, the TSEB model was first used to estimate the latent heat fluxes from the canopy and soil at 14:00, which were then temporally upscaled to a daily scale by the evaporative fraction constant (ConEF) method. Details of the TSEB model and ConEF method can be found in relevant articles [23,47,48] and in the Supplementary Materials.
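The idea behind the ConEF upscaling can be sketched as follows. This is a simplified illustration, not the implementation used in the study: it assumes the evaporative fraction is defined as the instantaneous latent heat flux divided by the instantaneous available energy, that this fraction is constant over the day, and that the latent heat of vaporization is a constant 2.45 MJ kg⁻¹. All names are hypothetical.

```python
LATENT_HEAT = 2.45e6  # J kg-1, assumed constant latent heat of vaporization


def daily_et_conef(le_inst, available_energy_inst, available_energy_daily_mj):
    """Upscale an instantaneous latent heat flux to daily ET (mm).

    le_inst: instantaneous latent heat flux (W m-2), e.g. at 14:00.
    available_energy_inst: instantaneous available energy (W m-2).
    available_energy_daily_mj: daily total available energy (MJ m-2 day-1).
    Returns None when the available energy is not positive, because the
    evaporative fraction is then undefined.
    """
    if available_energy_inst <= 0 or available_energy_daily_mj <= 0:
        return None
    evaporative_fraction = le_inst / available_energy_inst
    le_daily_joules = evaporative_fraction * available_energy_daily_mj * 1e6
    return le_daily_joules / LATENT_HEAT  # J m-2 / J kg-1 = kg m-2 = mm
```

For example, an instantaneous flux of 200 W m⁻² with 400 W m⁻² of available energy gives an evaporative fraction of 0.5, and a daily available energy of 9.8 MJ m⁻² then corresponds to about 2 mm day⁻¹.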

Machine Learning Methods for Filling the Gaps
The TSEB model generated gaps in 45.2% of the Heihe River Basin due to cloud cover and the mechanism of TSEB [25]. Machine learning methods can be used to explore and establish complex nonlinear relationships between multiple variables [25,32]. In this study, four machine learning methods (RF, DF, DNN, XGBoost) were employed to reconstruct the gaps after TSEB estimation for the Heihe River Basin. Considering the influence of various factors on ET, surface parameters including LAI, Albedo and LC and meteorological variables including TA, RH, SP, SSRD, STRD and WS were used to train the machine learning methods. DEM and latitude (LAT) were also employed to further constrain and train the machine learning methods in order to depict the influence of terrain and latitudinal zonation on ET [25].
The trained models, combined with the spatio-temporally continuous parameters, were subsequently applied to reconstruct the gaps. The relationship between ET and the impact factors can be expressed as follows:

$$(E, T) = f_{\mathrm{RF,DF,DNN,XGB}}(\mathrm{Albedo}, \mathrm{LAI}, \mathrm{DEM}, \mathrm{LC}, \mathrm{LAT}, \mathrm{RH}, \mathrm{SP}, \mathrm{SSRD}, \mathrm{STRD}, \mathrm{TA}, \mathrm{WS}) \tag{1}$$

where $f$ represents the nonlinear relationship between the ET components ($E$ and $T$) and the impact factors, and the subscript denotes the machine learning method. It should be noted that, in order to improve stability, accelerate convergence and avoid vanishing or exploding gradients, the training inputs were first normalized.
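The overall workflow (normalize the inputs, train on pixels where TSEB produced valid ET, predict where it did not) can be sketched in a few lines. Two points are assumptions for illustration only: min-max scaling stands in for the unspecified normalization, and a one-nearest-neighbor lookup stands in for the RF, DF, DNN and XGB regressors so that the example needs no external library. All function names are hypothetical.

```python
# Hypothetical sketch of gap filling with TSEB outputs as training labels.
def fit_minmax(rows):
    """Per-feature (min, max) computed from the training rows only."""
    return [(min(col), max(col)) for col in zip(*rows)]


def apply_minmax(rows, params):
    """Scale each feature with the training (min, max); constant features map to 0."""
    return [[(x - lo) / (hi - lo) if hi > lo else 0.0
             for x, (lo, hi) in zip(row, params)] for row in rows]


def nearest_neighbor_predict(train_x, train_y, query):
    """Stand-in regressor: return the label of the closest training sample."""
    dists = [sum((a - b) ** 2 for a, b in zip(x, query)) for x in train_x]
    return train_y[dists.index(min(dists))]


def reconstruct(features, tseb_et):
    """features: one row of impact factors per pixel-day.
    tseb_et: TSEB-estimated ET per pixel-day, None where TSEB left a gap."""
    valid = [i for i, et in enumerate(tseb_et) if et is not None]
    params = fit_minmax([features[i] for i in valid])
    scaled = apply_minmax(features, params)
    train_x = [scaled[i] for i in valid]
    train_y = [tseb_et[i] for i in valid]
    # Keep valid TSEB values; predict only where TSEB produced no estimate
    return [et if et is not None
            else nearest_neighbor_predict(train_x, train_y, scaled[i])
            for i, et in enumerate(tseb_et)]
```

The scaling parameters are derived from the valid samples and reused for the gap samples, so that training and prediction inputs share one scale.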

SHAP Explanation
Machine learning methods are often described as "black boxes" because the way such models arrive at their predictions is difficult to understand. The SHAP method can indirectly explain the contribution of features to model predictions using the Shapley value. Lundberg and Lee extended the concept of the Shapley value and used it to quantify the contribution of each feature to the model output [49]. SHAP values are calculated as weighted averages of the differences between predictions obtained with a given feature included and with that feature removed. A larger absolute SHAP value means that the variable has a greater impact on the retrieval results. In this study, the SHAP value was used to quantify the comprehensive contribution of each parameter to ET.
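The weighted averaging behind the Shapley value can be made concrete with a small exact computation. The sketch below is illustrative only: it enumerates all feature subsets (feasible only for a handful of features), and it represents a "removed" feature by a fixed baseline value, which is an assumption and not necessarily how SHAP was configured in the study. The toy model and its coefficients are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of each feature for one sample x.

    predict: function taking a list of feature values.
    baseline: values used in place of 'removed' features (assumption).
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = [x[j] if (j in subset or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j]
                             for j in range(n)]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_model(f):
    """Hypothetical ET model of two inputs (LAI, SSRD) with an interaction."""
    return 0.5 * f[0] + 0.01 * f[1] + 0.002 * f[0] * f[1]


phi = shapley_values(toy_model, [2.0, 200.0], [1.0, 100.0])
```

A useful check is that the values sum to the difference between the prediction for the sample and the prediction for the baseline.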

Site-Scale Validation
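Based on the EC flux data, the daily ET generated by each machine learning method was compared with ground measurements to validate its accuracy. In this study, the correlation coefficient (R), bias (unit: mm day⁻¹) and root mean square error (RMSE, unit: mm day⁻¹) were selected as quantitative indicators to evaluate the accuracy of the generated ET. They are expressed as follows:

$$R = \frac{\sum_{i=1}^{n}\left(ET_{E,i}-\overline{ET_{E}}\right)\left(ET_{OB,i}-\overline{ET_{OB}}\right)}{\sqrt{\sum_{i=1}^{n}\left(ET_{E,i}-\overline{ET_{E}}\right)^{2}\sum_{i=1}^{n}\left(ET_{OB,i}-\overline{ET_{OB}}\right)^{2}}} \tag{2}$$

$$\mathrm{bias} = \frac{1}{n}\sum_{i=1}^{n}\left(ET_{E,i}-ET_{OB,i}\right) \tag{3}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(ET_{E,i}-ET_{OB,i}\right)^{2}} \tag{4}$$

where $ET_{E}$ and $ET_{OB}$ represent the generated and observed daily ET, respectively, the subscript $i$ denotes the $i$th sample, $\overline{ET_{E}}$ and $\overline{ET_{OB}}$ denote the means of the generated and observed daily ET, and $n$ represents the sample size. A larger R and smaller RMSE and absolute bias indicate better performance; furthermore, the sign of the bias reflects overall overestimation or underestimation.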
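The three indicators can be computed with a few lines of code. The sketch below assumes the standard definitions of the Pearson correlation coefficient, mean bias (generated minus observed) and RMSE; the function name `evaluate` is hypothetical.

```python
import math


def evaluate(estimated, observed):
    """Return (R, bias, RMSE) for paired daily ET values in mm day-1."""
    n = len(estimated)
    mean_e = sum(estimated) / n
    mean_o = sum(observed) / n
    cov = sum((e - mean_e) * (o - mean_o) for e, o in zip(estimated, observed))
    ss_e = sum((e - mean_e) ** 2 for e in estimated)
    ss_o = sum((o - mean_o) ** 2 for o in observed)
    r = cov / math.sqrt(ss_e * ss_o)
    # Positive bias means the generated ET overestimates the observations
    bias = sum(e - o for e, o in zip(estimated, observed)) / n
    rmse = math.sqrt(sum((e - o) ** 2 for e, o in zip(estimated, observed)) / n)
    return r, bias, rmse
```

Samples where either the generated or the observed value is missing should be dropped before calling the function.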

Uncertainty Evaluation at the Regional Scale
Since site-scale validation is not representative of accuracy for the whole basin, the three-cornered hat (TCH) method was employed for cross-validation between the daily ET reconstructed by different machine learning methods. The generalized TCH method can be employed to estimate the relative uncertainty of the ET time series from different reconstruction methods without any ground measurement [50]. The details of the generalized TCH method are described below.
The time series of daily ET can be decomposed into two parts, the true value and the error:

$$X_{i} = X_{t} + \varepsilon_{i}, \quad i = 1, 2, \ldots, N \tag{5}$$

where all variables are time series, $X_{i}$ represents the $i$th time series of reconstructed daily ET, $X_{t}$ is the true value series, $\varepsilon_{i}$ represents the error term of the $i$th time series and $N$ is the number of datasets involved in the calculation. In this study, $N$ was 4. To calculate the relative uncertainty of each reconstructed ET result, the true value series ($X_{t}$) would need to be known, but most of the true values are difficult to observe. Therefore, the TCH method uses the differences between each series and a reference series ($X_{N}$):

$$Y_{i} = X_{i} - X_{N} = \varepsilon_{i} - \varepsilon_{N}, \quad i = 1, 2, \ldots, N-1 \tag{6}$$

where $Y$ is a matrix with $N-1$ time series. Since the TCH method is theoretically insensitive to the choice of $X_{N}$, the reference can be selected arbitrarily. The DNN-reconstructed daily ET was selected as $X_{N}$ in this study. The covariance matrix of $Y$ can be obtained using $S = \mathrm{cov}(Y)$. The unknown $N \times N$ covariance matrix of the individual noise, $R$, is related to $S$:

$$S = \begin{bmatrix} Z & -a \end{bmatrix} R \begin{bmatrix} Z & -a \end{bmatrix}^{T} \tag{7}$$

where $Z$ is an $(N-1) \times (N-1)$ identity matrix and $a$ is the $(N-1)$-element column vector $[1\ 1\ \cdots\ 1]^{T}$. Because the number of unknown elements is larger than the number of equations, the above equation cannot be solved directly. To solve it, Galindo and Palacio [51] proposed a constrained minimization problem based on the Kuhn-Tucker theorem. Finally, the matrix $R$ was obtained by minimizing the objective function. The uncertainty of each time series ($X_{i}$) was the square root of the corresponding diagonal element of the $R$ matrix, and the relative uncertainty was defined as the ratio of the uncertainty to the mean value of each uncertainty.
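The principle of estimating errors from pairwise differences can be illustrated with the classical three-series form of the method. This is a deliberately simplified stand-in, not the generalized TCH used in the study: it handles exactly three series, assumes their errors are mutually uncorrelated, and has a closed-form solution, so no constrained minimization is needed. The function name is hypothetical.

```python
import math
from statistics import pvariance


def three_cornered_hat(x1, x2, x3):
    """Classical TCH: error standard deviation of each of three series.

    Assumes uncorrelated errors, so that var(Xi - Xj) = var_i + var_j.
    The unknown true signal cancels in every pairwise difference.
    """
    def diff_var(a, b):
        return pvariance([ai - bi for ai, bi in zip(a, b)])

    v12, v13, v23 = diff_var(x1, x2), diff_var(x1, x3), diff_var(x2, x3)
    variances = [
        (v12 + v13 - v23) / 2.0,
        (v12 + v23 - v13) / 2.0,
        (v13 + v23 - v12) / 2.0,
    ]
    # Sampling noise can push an estimate slightly below zero; clamp it
    return [math.sqrt(max(v, 0.0)) for v in variances]
```

With more than three series or correlated errors, the system becomes underdetermined, which is why the generalized method relies on a constrained minimization.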

Determination of Key Input Parameters
In this study, eleven spatio-temporally continuous parameters were employed to train the machine learning models and reconstruct daily ET. Correlation coefficient matrix analysis can be used to capture the linear relationships between individual variables and daily ET [52]. The result revealed that LAI had a strong positive correlation with T (R = 0.76) and a weak negative correlation with E (R = −0.14) (Figure 3). This highlighted the crucial role of vegetation in the partitioning of ET, as high LAI may impede energy reaching the soil surface. Additionally, the radiation terms SSRD and STRD, as primary energy sources, showed moderate correlations with ET. All other variables showed different degrees of linear correlation with ET. However, ET is influenced by multiple factors, and interactions among these factors exist. The correlation coefficients may not adequately represent the effects on ET in the actual environment. Therefore, SHAP analysis was introduced in this study to analyze the comprehensive effect of different variables on ET (Figure 4). The SHAP analysis indicated that LAI and SSRD had the greatest effect on ET, with SHAP values of 0.26 and 0.21, respectively. Other parameters exhibited comparable average impacts, which highlights their influence on the water exchange between the surface and the atmosphere. It is noteworthy that Albedo and RH showed the smallest impact on ET (SHAP values ≈ 0.01), contrary to the results of the correlation analysis (absolute value of R > 0.3). This discrepancy may arise because the SHAP analysis considers interaction effects between features, whereas correlation coefficients only capture the linear relationships of individual features. The effects of Albedo and relative humidity on ET may be attenuated by other variables. Overall, all parameters showed different levels of importance and were involved in the model training and reconstruction.

Validation of Reconstructed Daily ET
Figures 5–8 show the daily ET reconstructed by different machine learning methods compared to ground measurements at the six EC sites. Overall, the generated daily ET (including TSEB-estimated and reconstructed daily ET) performed well and similarly across methods, with an average R of 0.74, bias between 0.08 and 0.11 mm day⁻¹, and RMSE between 1.11 and 1.15 mm day⁻¹. However, when only the reconstructed daily ET was considered, the discrepancies between the machine learning methods became apparent (Figure 9). Most points were clustered in the range where ET was less than 2 mm day⁻¹. This can be attributed to the fact that lower ET is usually accompanied by lower solar radiation. Under these conditions, LST may not be available and the TSEB model is more likely to fail. As ET increases, the distribution of points tends to disperse. Despite these discrepancies, the ET reconstructed by the different machine learning methods generally showed reasonable accuracy. Among them, the daily ET reconstructed by the XGB model had the highest performance, with an R, bias and RMSE of 0.76, 0.06 mm day⁻¹ and 0.52 mm day⁻¹, respectively, followed by the DNN and RF models. By contrast, the DF model showed slightly worse performance, with an R, bias and RMSE of 0.66, 0.04 mm day⁻¹ and 0.55 mm day⁻¹, respectively, and its reconstruction had greater scatter in the lower value range.

Relative Uncertainty at the Basin Scale
Direct validation at the site scale does not adequately represent spatial performance. Due to the difficulty of obtaining direct observations at large scales, the TCH method was employed to calculate the relative uncertainty of the different machine learning methods in this study. The spatial distributions of the relative uncertainty of daily ET reconstructed by the different models are shown in Figure 10. Overall, the reconstruction results of all four methods had low relative uncertainty for the whole basin, with average relative uncertainties of 5.36%, 9.35%, 5.95% and 6.44% for DF, DNN, RF and XGB, respectively. However, the DNN model had the highest overall relative uncertainty, especially for the deserts at the junction between the midstream and upstream regions (>20%). The relative uncertainty of XGB-reconstructed daily ET showed a patchy distribution in the Heihe River region. This may be related to the gaps that remained in these regions after XGB model reconstruction. The DF and RF models, on the other hand, had similar distributions of uncertainty across the basin, without notably high values. This could mean that DF and RF are more robust at the whole-basin scale.

Spatial Distribution of Reconstructed ET
Figure 11 shows the cumulative distribution frequency curves versus the effective coverage percentage of the TSEB-estimated and reconstructed daily ET. The areas of the curve on the X-axis in the figure represent the missing amounts. They indicated that RF and DF completely reconstructed the gaps after TSEB estimation, but some gaps remained when ET was reconstructed by DNN and XGB. To further understand the effectiveness of the different reconstruction methods, the spatial patterns of the effective coverage rate of daily ET (the ratio of the number of days with valid ET to the total number of days) estimated by these methods and the original TSEB model are shown in Figure 11. The coverage of TSEB model-estimated ET for deserts was lower than for other regions, regardless of the region of the basin. The average effective coverage rate of daily ET improved from 54.8% for the original TSEB model to 100%, 94.8%, 100% and 94.5% after reconstruction with DF, DNN, RF and XGB, respectively (Figures 11 and 12). The DNN model exhibited a low coverage rate for the downstream desert regions, while the low coverage rate after XGB-model reconstruction was sporadically distributed throughout the basin.
Figure 13 shows the spatial patterns of the TSEB model-estimated and reconstructed daily ET for different seasons. In terms of spatial effectiveness, the TSEB model-estimated ET showed significant gaps in all seasons, particularly in autumn and winter. Additionally, gaps were more prevalent for deserts where daily ET was low (<2 mm day⁻¹). Combined with Figure 10, the reconstruction results show that the DF and RF methods completely reconstructed the daily ET in all seasons, but the DNN and XGB methods showed some localized gaps in winter and autumn. Most of these gaps were more apparent for the desert of the downstream region. Furthermore, the XGB model showed more patchy gaps in the upstream and midstream regions. This suggested that the DNN and XGB models may not be suitable for desert regions.
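The effective coverage rate used in this section, defined as the ratio of the number of days with valid ET to the total number of days, can be computed per pixel as in the sketch below. Representing invalid days as `None` or NaN is an assumption for illustration, and the function name is hypothetical.

```python
import math


def effective_coverage_rate(daily_et):
    """Fraction of days with a valid ET value in one pixel's time series.

    Invalid days are assumed to be encoded as None or NaN.
    """
    valid_days = sum(
        1 for value in daily_et
        if value is not None and not math.isnan(value)
    )
    return valid_days / len(daily_et)


pixel_series = [1.2, None, 0.8, float("nan")]
rate = effective_coverage_rate(pixel_series)
```

Averaging this rate over all pixels gives a basin-wide coverage figure comparable to the percentages reported above.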

Coupling of the TSEB Model and Machine Learning Methods
In this study, the gaps in the TSEB model-estimated daily ET maps were reconstructed using different machine learning methods and multi-source remote sensing data. The results showed that although most of the reconstructed daily ET values were concentrated in the lower range, these low values of daily ET still had an important influence on the hydrological effects of the basin [53]. In addition, the high values of ET in this study also showed reasonable consistency with the ground measurements. In previous studies, machine learning methods were mainly used to upscale daily ET using ground measurements [36,54]. While this approach may be able to generate spatio-temporally continuous ET at a regional scale, it lacks a reasonable physical explanation [32]. Subsequent studies have employed machine learning methods for reconstruction at a regional scale where LST information is invalid [22,25,55]. In such studies, researchers have trained machine learning methods using the valid outputs of physical models as labels. The models trained in this way not only have the support of physical theory but can also provide accurate ET estimates without LST information [22,56]. However, previous studies found that the TSEB model and machine learning methods may not yield valid results under low solar shortwave radiation, even when the LST information is available [25]. The reasons for this phenomenon may be the limitations of the TSEB model itself under low available energy conditions and extreme meteorological conditions [14]. Therefore, this study also explored whether different machine learning methods performed reasonably well in such regions.

Importance of Input Parameters
The selection of appropriate input variables is crucial before deep learning model training [52]. In the theory of TSEB, ET is constrained not only by surface parameters but also by various meteorological driving factors [14]. Based on this, in this study, Albedo and LC were chosen to represent the effect of land surface character; LAI represented the effect of vegetation; and WS, TA, RH, SP, SSRD and STRD represented the effect of the atmosphere. Also, considering the latitudinal zonation of ET and topography, DEM and LAT were chosen as key input parameters [25]. However, this study was conducted at a spatial resolution of 0.01°. Given that multiple variables were required for the TSEB model and machine learning methods, we combined input parameters from multiple sources and unified their spatial and temporal resolutions to 0.01° and daily by bilinear interpolation. This approach to data processing can raise some issues. For instance, the meteorological factors provided by the ERA5-land dataset originally had a spatial resolution of 0.1°, and we downscaled them to 0.01° by bilinear interpolation. However, extreme meteorological conditions and advection are prevalent in the Heihe River Basin, so the ERA5-land data with 0.1° resolution may have failed to accurately reflect the actual meteorological conditions due to smoothing effects during the downscaling [57]. The GLASS LAI dataset suffers from a comparable problem. The temporal resolution of GLASS LAI is 8-day, and in this study the LAI data were smoothed to daily values by a linear smoothing method. This may have little effect under natural vegetation conditions, but it may not accurately capture sharp changes in LAI caused by crop harvesting in croplands. Currently, the spatial representativeness of remote sensing remains a major challenge. It is expected that more reliable datasets can be developed in future studies to further advance the application of remote sensing in the estimation of surface parameters.

Comparison of Different Machine Learning Methods for Reconstruction
To reconstruct the TSEB model-estimated ET, four different machine learning methods were used in this study. Although we anticipated a complete reconstruction of the ET over the whole Heihe River Basin, the XGB and DNN models retained gaps after reconstruction. Despite the high reconstruction accuracy of the XGB and DNN models compared with ground measurements (R > 0.70), these gaps indicated that both models still have room for improvement in the reconstruction of regional ET. The gaps of the DNN model were uniformly distributed in the desert region downstream of Heihe River Basin, while the gaps of the XGB model showed a patchy distribution throughout the basin. This suggested that the DNN model may have been more robust than the XGB model over the whole basin, even though the accuracy of both models in comparison with ground measurements was comparable. The DF model is a development of the RF model and, in theory, has higher potential. Moreover, the DF model is insensitive to parameter settings [27,31]. However, we found that although the DF model completely reconstructed the daily ET for the whole basin, it had the lowest R value (R = 0.66) compared with ground measurements. The DF model also produced numerous low values of daily ET that did not match observations. These issues suggested that the DF model may have limitations in the reconstruction of surface parameters. In summary, among the four machine learning methods in this study, the RF model had the most robust performance. The RF model not only accomplished the reconstruction of daily ET for Heihe River Basin but also performed well in comparison with ground measurements. Moreover, the RF model had the highest efficiency among the four methods.
In addition, each machine learning method has many parameters that govern its operation, and the performance of these models may vary significantly with parameter changes [58,59]. However, the focus of this study was to compare their performances in the reconstruction of daily ET. Therefore, the original parameters were not intentionally adjusted here.
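The comparison above rests on two criteria: agreement with ground measurements (R) and the completeness of the reconstructed field. The following minimal sketch shows how both can be computed. The function names and all numbers are hypothetical illustrations, not the evaluation code or data of this study, and gaps are assumed to be stored as NaN.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def coverage_percent(et_map):
    """Percentage of grid cells holding a valid (non-NaN) ET value."""
    cells = [v for row in et_map for v in row]
    valid = [v for v in cells if not math.isnan(v)]
    return 100.0 * len(valid) / len(cells)


# Hypothetical daily ET (mm/day) at one EC site and from one reconstruction.
ec_et = [1.0, 2.0, 3.0, 4.0]
model_et = [1.2, 1.9, 3.4, 3.8]
r_value = pearson_r(ec_et, model_et)

# Hypothetical 2 x 2 reconstructed map with one remaining gap.
nan = float("nan")
reconstructed = [[1.5, nan],
                 [2.0, 3.1]]
coverage = coverage_percent(reconstructed)

print(round(r_value, 2), coverage)
```

A model can score well on one criterion and poorly on the other, which is why both are reported: R is computed only where a prediction exists, so residual gaps do not lower it.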

Conclusions
This study first ran the TSEB model to estimate ET for Heihe River Basin. Subsequently, daily ET was reconstructed using four different machine learning methods (DF, DNN, RF and XGB). Finally, the reconstructed ET from the four machine learning methods was evaluated and compared at site and basin scales. The results showed that all four methods performed well for Heihe River Basin. The RF model not only demonstrated high prediction accuracy (R = 0.73) but also effectively reconstructed regional ET across all vegetation types, making it the most robust model overall. This highlights its suitability as a reliable model for ET reconstruction for Heihe River Basin. The DNN and XGB models achieved high accuracy compared with ground measurements (R > 0.70). However, their reconstructed daily ET retained gaps in the desert region, especially for the XGB model, whose gaps were patchily distributed. The DF model successfully reconstructed daily ET across the whole basin, but it showed the lowest agreement with ground measurements (R = 0.66) and produced many unreasonably low values. This study may provide a useful reference for researchers estimating or reconstructing ET.
Future research may enhance the generalizability of these findings by expanding the spatial scale to cover a wider geographic area. Additionally, the integration of other machine learning and physical models presents an opportunity. Combining the strengths of different models may improve the overall accuracy and reliability of ET estimation and reconstruction. These efforts will contribute to advancing the field of remote sensing-based ET estimation, enabling more robust and versatile applications across diverse environmental contexts.

Figure 1. Study area and vegetation type map for Heihe River Basin and the location of EC sites in the upstream, midstream and downstream regions, along with the landscape around the EC sites.

Figure 2. Flowchart of the estimation and reconstruction of the daily T and E based on TSEB and four machine learning methods. Note that the EC measurement data were used only for accuracy verification.

Remote Sens. 2024, 16, x FOR PEER REVIEW 9 of 22
features. The effects of Albedo and relative humidity on ET may be attenuated by other variables. Overall, all parameters showed different levels of importance and were involved in the model training and reconstruction.

Figure 3. Pearson correlation coefficient matrix of all parameters. The sample size for the calculation of correlation coefficients was 2,189,858. All p-values for the correlation coefficients (two-tailed) were less than 0.01.

Figure 4. Average impact values of input parameters calculated by the SHAP method.

Figure 5. Validation of generated daily ET (including TSEB-estimated and reconstructed daily ET) using DF at six EC sites. The dashed line is a 1:1 line.

Figure 6. Validation of generated daily ET (including TSEB-estimated and reconstructed daily ET) using DNN at six EC sites.

Figure 7. Validation of generated daily ET (including TSEB-estimated and reconstructed daily ET) using RF at six EC sites.

Figure 8. Validation of generated daily ET (including TSEB-estimated and reconstructed daily ET) using XGB at six EC sites.

Figure 9. Validation of reconstructed daily ET by (a) DF, (b) DNN, (c) RF and (d) XGB at EC sites. Only daily ET values reconstructed by machine learning methods are considered.

Figure 10. Spatial distribution of relative uncertainties of daily ET reconstructed by four machine learning methods over Heihe River Basin.

Figure 11. Plot of cumulative distribution frequency curves vs. effective coverage percentage of the daily ET. The area of the curve on the X-axis in the figure represents the missing amounts. The blue line in the figure indicates that the coverage of daily ET reconstructed by RF or DF was always 100%.

Figure 12. Temporal coverage of ET estimated from (a-d) different machine learning methods and (e) original TSEB model.

Figure 13. Spatial patterns of (a) TSEB-estimated daily ET and daily ET reconstructed by (b) DF, (c) DNN, (d) XGB and (e) RF in different seasons. The white areas represent gaps.

Table 2. Datasets used in this study.