Integrating Latent Heat Flux Products from MODIS and Landsat Data Using Multi-Resolution Kalman Filter Method in the Midstream of Heihe River Basin of Northwest China

An accurate and spatially continuous estimation of terrestrial latent heat flux (LE) is crucial to the management and planning of water resources for arid and semi-arid areas, for which LE estimations from different satellite sensors unfortunately often contain data gaps and are inconsistent. Many integration approaches have been implemented to overcome these limitations; however, most suffer from either the persistent bias of relying on datasets at only one resolution or the spatiotemporal inconsistency of LE products. In this study, we exhibit an integration case in the midstream of the Heihe River Basin of northwest China by using a multi-resolution Kalman filter (MKF) method to develop continuous and consistent LE maps from satellite LE datasets across different resolutions. The Moderate Resolution Imaging Spectroradiometer (MODIS) LE product (MOD16), the Landsat-based LE product derived from the Landsat 7 Enhanced Thematic Mapper Plus (ETM+) sensor, and ground observations of eddy covariance flux tower from June to September 2012 are used. The integrated results illustrate that data gaps of MOD16 dropped to less than 0.4% from the original 27–52%, and the root-mean-square error (RMSE) between the LE products decreased by 50.7% on average. Our findings indicate that the MKF method has excellent capacity to fill data gaps, reduce uncertainty, and improve the consistency of multiple LE datasets at different resolutions.


Introduction
Terrestrial latent heat flux (LE) describes the heat flux of transpiration and evaporation from the land surface to the atmosphere and plays an indispensable role in understanding the global energy balance and water cycle [1,2]. In arid and semi-arid areas, such as large portions of China, LE represents more than 70% of the annual water balance [3]. The reliable acquisition of LE is conducive to advancing a wide variety of scientific explorations and societal fields, including, but not limited to, climate forecasting [4], sustainable agriculture development [5,6], and the management of basin water resources [7,8], especially for those arid and semi-arid areas.
LE is difficult to measure and predict, however, particularly over regional-to-global scales [9,10]. With the rapid development of new technologies during the past several decades, estimating LE via satellite remote sensing provides an unprecedented opportunity to acquire temporally and spatially continuous LE data at regional-to-global scales [11]. Although there are a large amount of remotely sensed LE products available with different resolutions, uncertainties still remain among the individual LE products due to the error of algorithm inputs, the differences of model mechanisms, physical parameterization, and scaling effects [12,13]. For instance, the Moderate Resolution Imaging Spectroradiometer (MODIS) LE product (MOD16) has been substantially applied because it can cover the global land surface with moderate spatial resolution (1 km/500 m) and high temporal resolution (8 days) [14]. In relatively dry environments, however, MOD16 exhibits large uncertainty due to inadequate description of the soil moisture and the inconsistency between the meteorological inputs (e.g., average and minimum air temperature, incident photosynthetically active radiation, specific humidity) and the MODIS images [15]. In addition, some remotely sensed LE products fail to deal appropriately with complicated surface heterogeneity characteristics and the diversity of landscape types, and hence, there are data gaps in current LE products. For instance, arid and semi-arid regions with large barren or sparsely vegetated areas are often excluded from MOD16. Xiong et al. (2015) reported 70% missing data in the MOD16 of the Heihe River Basin (HRB) [16], which is situated in one of the most hydrologically vulnerable areas in the arid and semi-arid regions of northwest China [17,18]. Currently, the water scarcity conflicts with user needs, and the issue has become increasingly intense due to the increasing use of irrigation water for the midstream in the HRB [19]. To resolve the intense conflict, the first step is to better quantify water budgets at the field scale through accurate and high spatiotemporal resolution estimates of land surface LE [20]. Unfortunately, the data gaps stated above limit the comprehensive understanding of regional water budgets over these areas.
To overcome these limitations, many data fusion methods have been proposed to improve accuracy and spatiotemporal consistencies of LE data by integrating multiple products. Two major types of data integration are being used. The first integrates the input parameters and then estimates LE. Ma et al. [18] merged the input parameters (e.g., normalized difference vegetation index, leaf rea index, fractional vegetation cover, emissivity, land surface temperature) for estimating LE using the enhanced spatial and temporal adaptive reflectance fusion model (ESTARFM) and then estimated daily LE for the middle reaches of the HRB. Yet, great complexity and potential uncertainties of the inputs hinder such an integration type and its universal applicability. The other type directly integrates LE products building on various methods. For instance, Yao et al. [21] merged five LE products using the Bayesian model averaging (BMA) method and improved the accuracy of global LE estimates comparing with ground observations from over 300 flux sites. Feng et al. [22] integrated the MOD16 and the Priestley-Taylor Jet Propulsion Laboratory (PT-JPL) model [23] using an empirical orthogonal function (EOF) method. The integrated LE product showed a lowest bias of 3.67 W/m 2 and highest R 2 of 0.84 through the validation of 22 eddy covariance (EC) sites, compared with original datasets. However, these methods are generally used at the same spatial resolutions, and the scaling issues across different resolutions are rarely considered.
Recently, a multi-resolution Kalman filter (MKF) method has been explored to take advantage of multi-resolution data and to fill data gaps, by assuming a linear tree structure that is autoregressive in levels of resolution [24]. One of the MKF method's strengths lies in the integration of multiple variables across different resolutions. For instance, He et al. [25] integrated three Albedo products with multi-resolution, and the results showed that the data gaps were significantly filled based on the supporting data from other scales. Wang et al. [26] compared the MKF method with the optimal interpolation (OI) method when integrating leaf area index (LAI) products and found that the computation efficiency of the MKF method is also high, and significantly meets the requirements of large satellite-based datasets. Although the MKF method has been substantially carried out over high-level satellite products, the advantages of this method have not been applied to resolving LE data gaps, which requires further exploration, especially in arid and semi-arid areas.
Herein, given the immense data gaps of the MOD16 in arid and semi-arid areas, we integrated two satellite-based LE products (the MOD16 and Landsat-based LE products) using the MKF method, and generated spatially and temporally consistent LE products across different resolutions. The objectives of this study are: (1) to integrate the MOD16 and Landsat-based LE products using the MKF method; and (2) to assess the performance of the MKF method, including the magnitude of remaining data gaps and the accuracy of the two LE products after the integration.
The study area is situated in the midstream of the HRB (Figure 1), which is characterized by annual air temperature of 7.3 • C (1971-2000) and annual precipitation of 130.4 mm [28]. The land cover types follow the classification from Internation Geosphere-Biosphere Program (IGBP) and consist of barren and sparsely vegetated, cropland, grassland, urban and built-up, deciduous broadleaf forest, evergreen needleleaf forest, and water body. The main crops, including maize, wheat, and vegetables, are distributed along the irrigated region in the midstream. to resolving LE data gaps, which requires further exploration, especially in arid and semi-arid areas.
Herein, given the immense data gaps of the MOD16 in arid and semi-arid areas, we integrated two satellite-based LE products (the MOD16 and Landsat-based LE products) using the MKF method, and generated spatially and temporally consistent LE products across different resolutions. The objectives of this study are: (1) to integrate the MOD16 and Landsat-based LE products using the MKF method; and (2) to assess the performance of the MKF method, including the magnitude of remaining data gaps and the accuracy of the two LE products after the integration.

Study Area and Ground Observations
The Heihe River Basin (HRB), located at 97°24′-102°10′E and 37°41′-42°42′N, covers approximately 143,600 km 2 in the arid region of northwestern China [19,27]. The study area is situated in the midstream of the HRB (Figure 1), which is characterized by annual air temperature of 7.3 °C (1971-2000) and annual precipitation of 130.4 mm [28]. The land cover types follow the classification from Internation Geosphere-Biosphere Program (IGBP) and consist of barren and sparsely vegetated, cropland, grassland, urban and built-up, deciduous broadleaf forest, evergreen needleleaf forest, and water body. The main crops, including maize, wheat, and vegetables, are distributed along the irrigated region in the midstream.
To better understand how LE interacts under the heterogeneity of land surfaces across multiple spatial resolutions [29], the "Multi-Scale Observation Experiment on Evapotranspiration over heterogeneous land surfaces 2012" (HiWATER-MUSOEXE) thematic experiment was conducted in the middle reaches of the HRB from May to September 2012, involving a flux observation matrix that was composed of two nested matrices [30,31]. One was a 30 km × 30 km large experimental area (LEA) in an oasis-desert area, and the other was a 5 km × 5 km kernel experimental area (KEA) inside the oasis (Figure 3, left). As shown in Table 1, the LEA included one superstation (Daman) and four regular stations (Zhangye wetland, Shenshawo sandy desert, Huazhaizi desert steppe and Bajitan Gobi). The KEA, which was within the LEA, contained 17 stations. To better understand how LE interacts under the heterogeneity of land surfaces across multiple spatial resolutions [29], the "Multi-Scale Observation Experiment on Evapotranspiration over heterogeneous land surfaces 2012" (HiWATER-MUSOEXE) thematic experiment was conducted in the middle reaches of the HRB from May to September 2012, involving a flux observation matrix that was composed of two nested matrices [30,31]. One was a 30 km × 30 km large experimental area (LEA) in an oasis-desert area, and the other was a 5 km × 5 km kernel experimental area (KEA) inside the oasis (Figure 3, left). As shown in Table 1, the LEA included one superstation (Daman) and four regular stations (Zhangye wetland, Shenshawo sandy desert, Huazhaizi desert steppe and Bajitan Gobi). The KEA, which was within the LEA, contained 17 stations. To obtain the daily LE, half-hourly LE measured by the EC instruments was calculated, and the energy non-closure problem was fixed using the Bowen ratio closure method developed by Twine et al. (2000) [32]. Technical details of the data processing procedure can be found in Liu et al. (2013Liu et al. ( , 2018 [33,34].

MODIS LE Product
The MODIS LE product (MOD16) was modified and updated by Mu et al. [35,36] from the Penman-Monteith equation [37] adapted by Cleugh et al. [38]. The MOD16 algorithm logic separates daytime and nighttime LE, divides canopy and soil into wet and dry components, and modifies the vegetation cover with the fraction of absorbed photosynthetically active radiation. The latest version 6 of MOD16 was released in 2017 and is an eight-day composite product at 500 m pixel resolution [39]. Since MOD16 uses a sinusoidal tiling system, a reprojection was carried out onto the Universal Transverse Mercator (UTM) projection to match with the Landsat scenes using the MODIS Reprojection Tool. MODIS tile h25v05 for 2012 covering the study area was selected as the input.

Landsat-Based LE Product
The Landsat L1T data from the Landsat7 Enhanced Thematic Mapper Plus (ETM+) sensor are geometrically and radiometrically calibrated and projected under the UTM coordinate system, with a pixel size of 30 m × 30 m taken over a period of 16 days. We acquired these data from the United States Geological Survey for the period of June to September 2012 with p133r33 scenes. The Landsat ecosystem disturbance adaptive processing system (LEDAPS) tool [40] was employed to remove the radiation error. LEDAPS outputs may include some possible residual cloud and shadow pixels; hence, to match with the ground observations and reduce the risk of undetected pixel contamination, only Landsat scenes with less than 30% cloud coverage were exploited in this study.
We produced the Landsat-based LE product using the modified satellite-based Priestley-Taylor (MS-PT) algorithm [41]. This was developed on the basis of the Priestley-Taylor model and exhibits great agreement with the field-measured LE from 16 EC flux towers in China. Constrained by the normalized difference vegetation index (NDVI) and apparent thermal inertia (ATI), it avoids the Remote Sens. 2019, 11, 1787 5 of 19 computational complexities of aerodynamic resistance parameters. The inputs of the MS-PT algorithm only require NDVI, the air temperature (Ta), the diurnal air temperature range (DT) for ATI, and the net radiation (Rn). We calculated NDVI from the Landsat data and obtained Ta, DT, and Rn from the China Meteorological Forcing Dataset [42,43], which integrates multiple surface meteorological and environmental data sources with a spatiotemporal resolution of 0.1 • and 3 h. To take full use of this meteorological reanalysis dataset, we applied the daily gridded data in 2012 based on the previous work of Zhang et al. [44]. Appendix B provides detailed descriptions of the MS-PT algorithm.

The MKF Method
The data at different spatial resolutions were organized in an autoregressive tree structure ( Figure 2). All the grids covered the same area but each level of the tree served the a different resolution [45]. Each node s had a "parent" pa(s) and "children" ch(s). The scale that only consisted of one pixel was the "root" node. There are two models required in the MKF method: (1) state conversion model and (2) measurement model.
Firstly, the state conversion model represents how the multiscale stochastic process converts from coarse scale to fine scale and can be expressed using Ref. [46] where y s is the estimated value at the scale s, and y pa(s) is the variable at the parent node. A s estimates the state transition from the parent to children. W s controls the scale variability and follows a normal distribution N (0, Q(s)). Accordingly, there is an equation describing the variable at node s from its children node ch(s) [47]. Secondly, the measurement model links the satellite measurement to the "true" data: where z s is the measurement variable at node s with white noise ε s which has a normal distribution N (0, R(s)). C s is a measurement matrix and is set to the identity matrix because both satellite measurement and the variable are LE and cover the same area in this study. The state conversion model of Equation (1) and the measurement model of Equation (2) form the general framework of the MKF method. Chou et al. [48] extended a tree-like data structure to describe this algorithm. The multiresolution variables are computed from a fine-to-coarse sweep first, in which estimates are recursively propagated from the fine scale to the next coarser scale, and then a coarse-to-fine sweep, which implements from the root scale to the finest scale. The fine-to-coarse sweep is ascending to fill in the data gaps at each scale from the finer variables. The descending coarse-to-fine sweep integrates both fine and coarse variables and updates information at each scale. A technical description of the MKF method is given in Appendix A.
Our study requires several steps to implement the integration using the MKF method ( Figure 3). First, the LE products are compared against the LE measured by the eddy covariance (EC) method to evaluate the uncertainties. Second, a basic assumption of the MKF method is zero-mean in the spatial process of each variable, and thereby it is required to extract the trend surface of the LE products. Building on the detrended LE data, we obtain the observation error ε s at the finest scale using the standard deviation of the relative difference between the finest data and the ground observations derived from the HiWATER-MUSOEXE thematic experiment. For the observation error at other scales, we estimate the relative difference between the current nodes and the aggregated data from their children nodes. The errors were postulated to be invariant to time and location in the study area to simplify the problem. Third, starting from the finest scale, the fine-to-coarse sweep is implemented to fill all data gaps at each scale. When proceeding to the root node, the descending coarse-to-fine sweep is applied to update the variables into the optimal estimation at each scale. Landsat-based LE product serves at the finest scale, and MOD16 is in the middle of the tree structure in this study. Finally, we add the trend surface back to the optimal estimation, and then two LE products can be smoothed and consistent at different resolutions.
Remote Sens. 2019, 11, x FOR PEER REVIEW 6 of 19 estimation at each scale. Landsat-based LE product serves at the finest scale, and MOD16 is in the middle of the tree structure in this study. Finally, we add the trend surface back to the optimal estimation, and then two LE products can be smoothed and consistent at different resolutions.

Assessment Metrics
The performance of the MKF method is assessed with three criteria for quantification in this study: Bias, the root-mean-square error (RMSE), and the relative RMSE in percentage terms (RMSE %). The bias describes the mean difference between the observed values and the estimated values. The RMSE reflects how reliable the model is; the larger the RMSE, the lower the model performance. The RMSE % provides the relative error of the model and is sensitive to outliers. These criteria are given by estimation at each scale. Landsat-based LE product serves at the finest scale, and MOD16 is in the middle of the tree structure in this study. Finally, we add the trend surface back to the optimal estimation, and then two LE products can be smoothed and consistent at different resolutions.

Assessment Metrics
The performance of the MKF method is assessed with three criteria for quantification in this study: Bias, the root-mean-square error (RMSE), and the relative RMSE in percentage terms (RMSE %). The bias describes the mean difference between the observed values and the estimated values. The RMSE reflects how reliable the model is; the larger the RMSE, the lower the model performance. The RMSE % provides the relative error of the model and is sensitive to outliers. These criteria are given by

Assessment Metrics
The performance of the MKF method is assessed with three criteria for quantification in this study: Bias, the root-mean-square error (RMSE), and the relative RMSE in percentage terms (RMSE %). The bias describes the mean difference between the observed values and the estimated values. The RMSE reflects how reliable the model is; the larger the RMSE, the lower the model performance. The RMSE % provides the relative error of the model and is sensitive to outliers. These criteria are given by Remote Sens. 2019, 11, 1787 where X obs and X est describe the observed values and estimated values, respectively.

Integration of Satellite-Derived LE Products
The MKF method is designed to address variables having an expectation of zero mean in the spatial process at each scale. However, the LE products generally cannot satisfy the requirement of the MKF method, and hence, it is necessary to select a proper detrending method to extract the spatial trend surface from the original data. Previous research has presented several ways to do so, such as spline fitting and lognormal space [49,50]. These methods are quite time-consuming [51], and thus, Shi et al. [52] found a simple average way that is suitable for the MKF method after comparing two detrending methods. We applied this method to extract the trend surface and added the updated spatial residual back to the trend surface to generate the integrated LE data.
The fine-to-coarse sweep and the coarse-to-fine sweep of the MKF method were implemented on the Landsat-based LE product and the MOD16 product. Building on the availability of the clear scenes with less than 30% contaminated pixels, the original LE products and the corresponding integrated results were compared for time series from DOY (day of year) 177 to DOY193 and DOY 217 to DOY 249 for 2012 in the midstream of the HRB (Figure 4). Eight integration cases in total were exhibited in this time range. Cloud/shadow pixels and data gaps were masked with dark blue color (0 value) in two LE products. One may note that the same Landsat scenes might be shared by the adjacent DOY cases.
where Xobs and Xest describe the observed values and estimated values, respectively.

Integration of Satellite-Derived LE Products
The MKF method is designed to address variables having an expectation of zero mean in the spatial process at each scale. However, the LE products generally cannot satisfy the requirement of the MKF method, and hence, it is necessary to select a proper detrending method to extract the spatial trend surface from the original data. Previous research has presented several ways to do so, such as spline fitting and lognormal space [49,50]. These methods are quite time-consuming [51], and thus, Shi et al. [52] found a simple average way that is suitable for the MKF method after comparing two detrending methods. We applied this method to extract the trend surface and added the updated spatial residual back to the trend surface to generate the integrated LE data.
The fine-to-coarse sweep and the coarse-to-fine sweep of the MKF method were implemented on the Landsat-based LE product and the MOD16 product. Building on the availability of the clear scenes with less than 30% contaminated pixels, the original LE products and the corresponding integrated results were compared for time series from DOY (day of year) 177 to DOY193 and DOY 217 to DOY 249 for 2012 in the midstream of the HRB (Figure 4). Eight integration cases in total were exhibited in this time range. Cloud/shadow pixels and data gaps were masked with dark blue color (0 value) in two LE products. One may note that the same Landsat scenes might be shared by the adjacent DOY cases.   For MOD16, there were approximately 27-52% obvious data gaps before the integration, mostly in barren or sparsely vegetated areas. After employing the MKF method, this decreased to less than 0.4% on average and MOD16 became consistent and smooth compared to the original. Moreover, the spatial distribution of MOD16 is more in line with that of the Landsat, which means two LE products remain consistent across different resolutions. Generally, data at finer resolution provide more detailed information of the spatial random processes, whereas coarser data represent average information. Landsat is capable of capturing details of the LE spatial distribution and fills the gaps for MOD16. However, there are still few missing values in MOD16 and we noticed that those abrupt changes mainly came from the margin of the original data gaps, because all of the children nodes were calculated and transferred to their sharing parent nodes, rather than just a For MOD16, there were approximately 27-52% obvious data gaps before the integration, mostly in barren or sparsely vegetated areas. After employing the MKF method, this decreased to less than 0.4% on average and MOD16 became consistent and smooth compared to the original. Moreover, the spatial distribution of MOD16 is more in line with that of the Landsat, which means two LE products remain consistent across different resolutions. Generally, data at finer resolution provide more detailed information of the spatial random processes, whereas coarser data represent average information. Landsat is capable of capturing details of the LE spatial distribution and fills the gaps for MOD16. However, there are still few missing values in MOD16 and we noticed that those abrupt changes mainly came from the margin of the original data gaps, because all of the children nodes were calculated and transferred to their sharing parent nodes, rather than just a single child node. Owing to the scale effect and the spatial heterogeneity, the original data gaps could have a negative impact on the integration performance.
For the Landsat-based LE product, MKF integration lowers Landsat data in the DOY 233, DOY 241, and DOY 249 cases due to the large difference between Landsat and MOD16. This hints that data at coarser resolutions could also bring information to finer data. The MKF method could adjust data at all scales to maintain consistency and smoothness. However, there are smaller changes than for MOD16, indicating that the fine-to-coarse sweep, which aims to fill the gaps, plays a major role, and the coarse-to-fine sweep did not show significant effects within the implementation of the MKF-based integration. This is due to the fact that the original data gaps of MOD16 were filled from the Landsat in the ascending process, MOD16 did not adjust the variables much for the finer scale, and Landsat retained the values of the ascending process when descending.

Spatial Assessment of the MKF Integration Performance
Integration performance was compared by aggregating Landsat into a MODIS-like scale to unify the matrix size. We drew the histogram of the difference between the aggregated Landsat and MOD16 before and after the MKF method to present the integration performance. The difference was calculated as MOD16 minus aggregated Landsat. Owing to a number of the data gaps, the missing pixels from MOD16 were all removed to avoid potential errors and only valid pixels were concerned when calculating the difference between the aggregated Landsat and MOD16. The filled pixels of MOD16 after MKF integration were verified individually.
The distribution of the difference between the two LE products is indicated by gray bars (Figure 5), which were relatively scattered showing large inconsistency before the integration. After the MKF method, the red bars were significantly concentrated, and the difference was restricted to ±10 W/m 2 in most cases. The outliers with absolute difference larger than 20 W/m 2 were more than approximately 14% of the total, whereas the numbers of those outliers dropped to less than 1% after the integration. This hints that the two LE products were made more consistent across different resolutions through the MKF method.
Remote Sens. 2019, 11, x FOR PEER REVIEW 9 of 19 single child node. Owing to the scale effect and the spatial heterogeneity, the original data gaps could have a negative impact on the integration performance. For the Landsat-based LE product, MKF integration lowers Landsat data in the DOY 233, DOY 241, and DOY 249 cases due to the large difference between Landsat and MOD16. This hints that data at coarser resolutions could also bring information to finer data. The MKF method could adjust data at all scales to maintain consistency and smoothness. However, there are smaller changes than for MOD16, indicating that the fine-to-coarse sweep, which aims to fill the gaps, plays a major role, and the coarse-to-fine sweep did not show significant effects within the implementation of the MKF-based integration. This is due to the fact that the original data gaps of MOD16 were filled from the Landsat in the ascending process, MOD16 did not adjust the variables much for the finer scale, and Landsat retained the values of the ascending process when descending.

Spatial Assessment of the MKF Integration Performance
Integration performance was compared by aggregating Landsat into a MODIS-like scale to unify the matrix size. We drew the histogram of the difference between the aggregated Landsat and MOD16 before and after the MKF method to present the integration performance. The difference was calculated as MOD16 minus aggregated Landsat. Owing to a number of the data gaps, the missing pixels from MOD16 were all removed to avoid potential errors and only valid pixels were concerned when calculating the difference between the aggregated Landsat and MOD16. The filled pixels of MOD16 after MKF integration were verified individually.
The distribution of the difference between the two LE products is indicated by gray bars (Figure 5), which were relatively scattered showing large inconsistency before the integration. After the MKF method, the red bars were significantly concentrated, and the difference was restricted to ±10 W/m 2 in most cases. The outliers with absolute difference larger than 20 W/m 2 were more than approximately 14% of the total, whereas the numbers of those outliers dropped to less than 1% after the integration. This hints that the two LE products were made more consistent across different resolutions through the MKF method. To further quantify how well the MKF method performs, several assessment criteria were used ( Table 2). Compared to the original difference between the aggregated Landsat and MOD16, the difference after MKF integration was decreased and the RMSE greatly reduced, by 50.7% on average. The lower RMSE % also implied fewer outliers. Thus, it is safe to conclude that the MKF method is able to smooth data and reduce the inconsistency of different LE products across different resolutions. However, the absolute bias increased after MKF integration, which is due to the fact that the differences of the two original LE products were substantially scattered and positive values offset negative values. After integration, the bias was more concentrated and persistent relying on the physical structure of the algorithm. Hence, we contend that the integration performance would be limited if the inputs largely deviate from each other. The unit for bias and RMSE is W/m 2 . To further quantify how well the MKF method performs, several assessment criteria were used ( Table 2). Compared to the original difference between the aggregated Landsat and MOD16, the difference after MKF integration was decreased and the RMSE greatly reduced, by 50.7% on average. The lower RMSE % also implied fewer outliers. Thus, it is safe to conclude that the MKF method is able to smooth data and reduce the inconsistency of different LE products across different resolutions. However, the absolute bias increased after MKF integration, which is due to the fact that the differences of the two original LE products were substantially scattered and positive values offset negative values. After integration, the bias was more concentrated and persistent relying on the physical structure of the algorithm. Hence, we contend that the integration performance would be limited if the inputs largely deviate from each other.

Evaluation of the MKF Performance Using Ground Observations
Two groups of data validations were plotted in scatterplots showing the accuracy of the LE products before and after MKF ( Figure 6). The original missing pixels from MOD16 were still excluded to avoid potential error.

Evaluation of the MKF Performance Using Ground Observations
Two groups of data validations were plotted in scatterplots showing the accuracy of the LE products before and after MKF ( Figure 6). The original missing pixels from MOD16 were still excluded to avoid potential error. After the MKF-based integration, MOD16 was highly improved with a decrease in RMSE% from 36.87% to 27.44%, a decrease in RMSE from 26.59 W/m 2 to 21.22 W/m 2 , a fall in bias from 9.33 W/m 2 to 9.27 W/m 2 , and an increase in R 2 from 0.46 to 0.58. Likewise, the RMSE% of the integrated Landsat data decreased from 42.98% to 40.42%, the RMSE fell from 34.15 W/m 2 to 32.5 W/m 2 , the bias decreased from 4.33 W/m 2 to 4.3 W/m 2 , and the R 2 improved from 0.63 to 0.67. The slight improvement illustrates that the coarse-to-fine sweep of the MKF method offered few updates for Landsat, and the uncertainty of the original LE products might be further propagated to the integrated LE data.
The original missing pixels, after being filled, were compared against ground observations (Figure 7). These missing pixels were mainly distributed in three sites: Shenshawo sandy desert, site 5, and site 11, covering barren land and cropland. After MKF-based integration, the filled pixels of MOD16 exhibited that the RMSE% was 38.14%, the RMSE was 30.04 W/m 2 , the bias was 7.82 W/m 2 , and the R 2 was 0.57. Note that there is obvious underestimation of MOD16, which mainly comes from cropland, and was actually introduced by Landsat data that underestimated cropland during the fine-to-coarse sweep of the MKF method. After the MKF-based integration, MOD16 was highly improved with a decrease in RMSE% from 36.87% to 27.44%, a decrease in RMSE from 26.59 W/m 2 to 21.22 W/m 2 , a fall in bias from 9.33 W/m 2 to 9.27 W/m 2 , and an increase in R 2 from 0.46 to 0.58. Likewise, the RMSE% of the integrated Landsat data decreased from 42.98% to 40.42%, the RMSE fell from 34.15 W/m 2 to 32.5 W/m 2 , the bias decreased from 4.33 W/m 2 to 4.3 W/m 2 , and the R 2 improved from 0.63 to 0.67. The slight improvement illustrates that the coarse-to-fine sweep of the MKF method offered few updates for Landsat, and the uncertainty of the original LE products might be further propagated to the integrated LE data.
The original missing pixels, after being filled, were compared against ground observations (Figure 7). These missing pixels were mainly distributed in three sites: Shenshawo sandy desert, site 5, and site 11, covering barren land and cropland. After MKF-based integration, the filled pixels of MOD16 exhibited that the RMSE% was 38.14%, the RMSE was 30.04 W/m 2 , the bias was 7.82 W/m 2 , and the R 2 was 0.57. Note that there is obvious underestimation of MOD16, which mainly comes from cropland, and was actually introduced by Landsat data that underestimated cropland during the fine-to-coarse sweep of the MKF method. Other than filling gaps and yielding consistent LE data across different resolutions, the MKF method therefore exhibited a capacity for improving the accuracy for LE products through the validations above. Given that the overestimation showed better agreement with EC ground observations than the underestimation in MKF-integrated MOD16, and there was similar and less-changed underestimation in Landsat, it implied that the data uncertainty also propagated during the MKF implementation. For maximum accuracy of the integrated results, we propose that the inputs for MKF integration cannot have great differences among each other.

Uncertainty Analysis
Three problems are associated with the integrated result validations: The mechanism of MKF integration, scaling effects, and the errors from inputs, including of the original LE products and the EC ground observations. First, integrating and adjusting pixel values in the MKF method suggests a high dependence on the weighted values derived from the uncertainty at each scale, which implies that the smaller the uncertainty of an estimate, the larger the influence of that estimate in the integrating process. In this study, the uncertainties that came from satellite characteristics and the estimated algorithms were invariant in temporal and spatial variations to simplify the problem and hence, the matrix of the weighted values might be imperfect.
Second, the mismatch in scale among different datasets might lead to uncertainty of the MKF-integrated result evaluation. The ground measurements represent a point-based value, and the footprint of the flux tower site is approximately several hundred meters, while the spatial resolution of the Landsat-based LE product is only 30 m [53]. The direct evaluation with the satellite-based images may be unreasonable. It is possible that the spatial uncertainty is further reinforced in heterogeneous areas such as the midstream of the HRB. The abrupt changes in integrated MOD16 indicated the original data gaps were also considered as the children nodes when transferring to the next scale under scaling effects.
Finally, the input errors are an important issue. Landsat-based LE products exhibited the discrepancies of different land cover types and underestimated in the growing season, especially for cropland and wetland (e.g., 1-17 sites and Zhangye wetland) situated in the irrigated artificial oasis.   [54] evaluated the ability of the MS-PT algorithm using the ground observations from 40 global flux towers; results showed that the R 2 varied from 0.4 to 0.7 and the RMSE varied from approximately 37.5 W/m 2 to 44.4 W/m 2 for cropland and wetland. This is consistent with our validation results (i.e., R 2 is 0.67 and RMSE is 32.5 W/m 2 ). Neglecting the differences in parameters from different biome types and complex terrain may cause the errors [55], and such findings are in accordance with previous studies over northeast China [44]. The errors could be reduced by 5-25% if calibrating the Priestley-Taylor coefficients controlled by the ground observations towards different biome types and climatic zones [56]. For MOD16, underestimation also exists because the soil moisture constraint in MOD16 applied relatively humidity (RH) and Other than filling gaps and yielding consistent LE data across different resolutions, the MKF method therefore exhibited a capacity for improving the accuracy for LE products through the validations above. Given that the overestimation showed better agreement with EC ground observations than the underestimation in MKF-integrated MOD16, and there was similar and less-changed underestimation in Landsat, it implied that the data uncertainty also propagated during the MKF implementation. For maximum accuracy of the integrated results, we propose that the inputs for MKF integration cannot have great differences among each other.

Uncertainty Analysis
Three problems are associated with the integrated result validations: The mechanism of MKF integration, scaling effects, and the errors from inputs, including of the original LE products and the EC ground observations. First, integrating and adjusting pixel values in the MKF method suggests a high dependence on the weighted values derived from the uncertainty at each scale, which implies that the smaller the uncertainty of an estimate, the larger the influence of that estimate in the integrating process. In this study, the uncertainties that came from satellite characteristics and the estimated algorithms were invariant in temporal and spatial variations to simplify the problem and hence, the matrix of the weighted values might be imperfect.
Second, the mismatch in scale among different datasets might lead to uncertainty of the MKF-integrated result evaluation. The ground measurements represent a point-based value, and the footprint of the flux tower site is approximately several hundred meters, while the spatial resolution of the Landsat-based LE product is only 30 m [53]. The direct evaluation with the satellite-based images may be unreasonable. It is possible that the spatial uncertainty is further reinforced in heterogeneous areas such as the midstream of the HRB. The abrupt changes in integrated MOD16 indicated the original data gaps were also considered as the children nodes when transferring to the next scale under scaling effects.
Finally, the input errors are an important issue. Landsat-based LE products exhibited the discrepancies of different land cover types and underestimated in the growing season, especially for cropland and wetland (e.g., 1-17 sites and Zhangye wetland) situated in the irrigated artificial oasis.   [54] evaluated the ability of the MS-PT algorithm using the ground observations from 40 global flux towers; results showed that the R 2 varied from 0.4 to 0.7 and the RMSE varied from approximately 37.5 W/m 2 to 44.4 W/m 2 for cropland and wetland. This is consistent with our validation results (i.e., R 2 is 0.67 and RMSE is 32.5 W/m 2 ). Neglecting the differences in parameters from different biome types and complex terrain may cause the errors [55], and such findings are in accordance with previous studies over northeast China [44]. The errors could be reduced by 5-25% if calibrating the Priestley-Taylor coefficients controlled by the ground observations towards different biome types and climatic zones [56]. For MOD16, underestimation also exists because the soil moisture constraint in MOD16 applied relatively humidity (RH) and vapor pressure deficit (VPD) as indicators of water stress [44]. During the irrigation period, the soil water content in the surface and root layers is generally high and the overestimation of water stress, parameterized by RH and atmospheric VPD, brought about the underestimation of the soil evaporation in the arid and semi-arid midstream of the HRB [57]. Researchers drew similar conclusions in China [58], the United States [59], eastern Asia [60], and Brazil [61] when evaluating MOD16. Large errors and uncertainty may be introduced by these approximations and assumptions. Moreover, the integration evaluation greatly relies on the accuracy of EC ground observations, which were considered as true LE values in this study. However, Wang et al. (2015) [62] reported 16% uncertainty in LE observations derived from the HiWATER-MUSOEXE flux matrix. Even though we corrected the energy imbalance, errors that were caused by complexity in wind variation, footprint representation, and sampling variability are still unclear [23,63]. These corrections still cause large errors of EC ground observations in results [64].

Superiority and Recommendation for the MKF integration
We propose that the MKF method is scalable to other areas and applicable to serve other integrations of satellite LE products suffering from the issues mentioned in this study for two reasons, namely, high efficiency and comprehensive consideration of multiple datasets' advantages across different resolutions. The MKF method executes mere seconds to complete an integration case and is very time efficient, whereas optimal interpolation usually requires several hours [26]. Additionally, the great superiority of the MKF method is integrating multiple satellite datasets at different resolutions. Since data at different resolutions capture various information, we filled the data gaps and retained those data consistent with their own resolution in this study. Previous research showed that the MKF method has the capacity to fill Albedo data gaps derived from a multi-angle imaging spectroradiometer (MISR) and to maintain their consistency in spatial processes [25].
For integrating multiple satellite datasets at different resolutions and taking full use of the MKF method, we recommend that one original input should be relatively precise, at least due to mutual propagation because the performance of the MKF method highly depends on how accurate the uncertainty estimation of the datasets is and how largely those datasets deviate from each other. Moreover, we evaluated two LE products with ground observations and estimated the weighted values for each scale based on the uncertainty of each product. To mitigate the scaling effects and extend this approach to other areas, especially for heterogeneous land surfaces, the observation uncertainty matrix might be more accurate for the integration process when referring to a "true" LE map instead of point-based values. This forms the foundation of our ongoing research.

Conclusions
Accurate and spatiotemporally consistent satellite-driven LE products are essential to quantify and improve regional water budgets, especially for arid and semi-arid areas. Current individual LE products, however, exhibit spatiotemporal discontinuity and uncertainties due to sensor characteristics or the physical structure of the LE estimating algorithm. This study showcased our efforts to overcome the current limitations by integrating satellite-driven LE products at different resolutions.
We presented an integration study of the midstream of the HRB of northwest China using the MKF method, which aims to improve current LE products from multiple datasets at different resolutions and produce spatiotemporal continuous LE products. We integrated the MOD16 product and Landsat-based LE product and demonstrated the merits of this approach in integrating datasets across different resolutions.
Our comparison and validation results indicated that the data gaps of MOD16 were filled, the integrated LE products were more continuous in their spatial distribution, and the accuracies of LE products were improved. The MKF method thereby shows great potential to take full use of data at various resolutions and to generate consistent LE products in other areas that suffer from the same issues.
This study used ground observations to estimate the uncertainties of LE products and to further acquire weighted values as background values for scale conversion. However, scaling effects might lead to negative influences for MKF integration. Future work will focus on accurately evaluating the uncertainties of the LE products as well as estimating and generating more precise LE products for heterogeneous land surfaces.
Author Contributions: J.X. and Y.Y. designed the work and prepared the manuscript. K.T. revised the manuscript. Y.L. and S.L. corrected and improved the manuscript. K.S., K.J., and X.Z. provided the data and conducted the remotely sensed image processing. X.C. and X.B. contributed by providing additional analysis to the discussion. Every child node provides an estimate to their parent nodes, where the children of a node s are denoted by s∂ i , i = 1, . . . , q. According to the state conversion model,ŷ(s ∂ i |s ∂ i ) can be estimated from each child to predict y(s):ŷ (s|s ∂ i ) = A(s ∂ i )ŷ(s ∂ i |s ∂ i ) (A6) All the predicted nodes from the lower scale are integrated into the parent nodes and the integrated y(s|s+) takes the form:ŷ (s|s+) = P(s|s+) The optimal estimator consists of a weighted sum of the estimates from its children. It is implied from (A7) that the larger the uncertainty P(s) of an estimate, the smaller the adjustment influence of that estimate in the ascending sweep. Through (A1) to (A7), predicted values from the state conversion model and the observations from the measurement model are recursively integrated.