Reconstruction of High-Temporal- and High-Spatial-Resolution Reflectance Datasets Using Difference Construction and Bayesian Unmixing

Abstract: High-temporal- and high-spatial-resolution reflectance datasets play a vital role in monitoring dynamic changes at the Earth’s land surface. So far, many sensors have been designed with a trade-off between swath width and pixel size; thus, it is difficult to obtain reflectance data with both high spatial resolution and frequent coverage from a single sensor. In this study, we propose a new Reflectance Bayesian Spatiotemporal Fusion Model (Ref-BSFM) using Landsat and MODIS (Moderate Resolution Imaging Spectroradiometer) surface reflectance, which is then used to construct reflectance datasets with high spatiotemporal resolution and a long time series. By comparing this model with other popular reconstruction methods (the Flexible Spatiotemporal Data Fusion Model, the Spatial and Temporal Adaptive Reflectance Fusion Model, and the Enhanced Spatial and Temporal Adaptive Reflectance Fusion Model), we demonstrate that our approach has the following advantages: (1) higher prediction accuracy, (2) effective treatment of cloud coverage, (3) insensitivity to the time span of data acquisition, (4) capture of temporal change information, and (5) higher retention of spatial details and inconspicuous MODIS patches. Reflectance time-series datasets generated by Ref-BSFM can be used to calculate a variety of remote-sensing-based vegetation indices, providing an important data source for land surface dynamic monitoring.


Introduction
Dense time-series data from satellites are commonly used to study dynamics of the land surface at large spatial scales, such as monitoring phenological changes of land surface vegetation, evaluating occurrence of natural disasters, mapping distribution of land features, and estimating crop yields [1][2][3]. However, in heterogeneous regions with a large number of land-cover types, change-monitoring studies require a long time series of satellite data with higher spatial resolution to more accurately determine the timing and characteristics of the changes. In recent years, time-series-based research has become very popular, as more free satellite images have been made publicly available and increasingly effective algorithms have been developed [4], providing a reliable data source for research and application. For example, high-resolution Landsat data, Sentinel data, and the GF (GaoFen, meaning "high resolution" in Chinese) series satellite data are all readily available in China. There are a number of publicly available remote sensing data-processing platforms for general use. For example, the Google Earth Engine cloud computing platform, through a combination of user-uploaded algorithms and online data, can easily and quickly produce time-series data [5].
Given the limitations of current satellites, a trade-off must be made between a sensor's swath width and pixel size [6]. For instance, Landsat TM (Thematic Mapper), ETM+ (Enhanced Thematic Mapper Plus), and OLI (Operational Land Imager) provide 30 m spatial resolution data, which successfully capture the spatial details of surface features; however, the satellite's 16 day return period makes it difficult to ensure that acquired data are cloud-free and of high quality, limiting the application of these data to surface change monitoring [7,8]. Many studies have shown that the average cloud coverage of Landsat TM/ETM+ sensors is 35%, and, in some humid and cloudy areas, only 2-3 periods of cloud-free data can be obtained per year [9,10]. In contrast, the Moderate Resolution Imaging Spectroradiometer (MODIS) has a daily revisit cycle, and some algorithms produce data at daily or 8 day time steps. Its spatial resolution, however, is not fine enough to capture the spatial details required over heterogeneous regions. In the past decade, a number of spatiotemporal data fusion algorithms have been developed to combine satellite images such as these to generate synthetic data with both high spatial and high temporal resolution [11][12][13][14][15][16][17][18][19][20][21][22][23][24].
In previous studies, spatiotemporal data fusion algorithms have been divided into four groups on the basis of unmixing, weight functions, Bayesian theory, or dictionary-pair learning [4]. Unmixing methods regard a coarse-resolution pixel as a combination of several fine-resolution pixels: by unmixing the coarse pixels using linear spectral mixing, fine-resolution pixel values are obtained. The unmixing-based multisensor multiresolution image fusion method (MMT) demonstrates the concept with the following procedure: (1) determine endmember components by classifying the fine-resolution image, (2) calculate endmember component fractions for each coarse pixel, and (3) unmix the coarse pixels according to these endmember fractions [11]. However, this method ignores within-class variability. The Spatial Temporal Data Fusion Approach, an improved version of MMT, estimates reflectance change by unmixing the endmember reflectance in a sliding window of images from the input date and the predicted date; this estimated change is added to the fine-resolution image at the input date to obtain the reflectance prediction [12,13]. The Spatial and Temporal Reflectance Unmixing Model estimates the change in reflectance by unmixing the change for coarse pixels, applying Bayesian theory to constrain the estimated change of the endmembers [14].
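The linear spectral mixing idea behind these unmixing methods can be illustrated with a toy example (synthetic numbers, not data from the paper): given the class fractions inside each coarse pixel, a least-squares solve recovers one reflectance value per class.

```python
import numpy as np

# Hypothetical example: 6 coarse pixels, 3 land-cover classes.
# Rows of `fractions` are per-pixel class fractions (each sums to 1),
# as would be derived from a classified fine-resolution image.
fractions = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.5, 0.5, 0.0],
    [0.2, 0.3, 0.5],
    [0.6, 0.1, 0.3],
])
# True (unknown) per-class reflectances used to synthesize coarse pixels.
true_class_refl = np.array([0.05, 0.30, 0.15])
coarse_refl = fractions @ true_class_refl

# Unmix: least-squares estimate of one reflectance value per class.
est, *_ = np.linalg.lstsq(fractions, coarse_refl, rcond=None)
print(np.round(est, 2))  # ≈ [0.05, 0.30, 0.15]
```

In practice the system is solved per band within a sliding window, and within-class variability makes the fit inexact rather than perfect as in this toy case.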
The greatest number of algorithms has been developed for data fusion based on a weighting function [4]. These methods estimate the pixel value of the fine image by combining the information of all the input images through a weighting function. The Spatial and Temporal Adaptive Reflectance Fusion Model (STARFM) is the first model to provide a freely available source code, and many data fusion models have been developed to improve this method. It assumes that changes in reflectance in coarse-resolution and fine-resolution images are consistent and comparable. When a coarse pixel includes only one land-cover type, the change obtained from the coarse pixel can be directly added to the pixels in the fine-resolution image. When coarse-resolution pixels contain several different land-cover types, the algorithm uses a weighting function based on the information from adjacent fine pixels to assign a higher weight to pure coarse pixels, so that the value of fine pixels can be predicted [15]. However, the STARFM method does not perform well in heterogeneous regions. To solve this problem, the Enhanced Spatial and Temporal Adaptive Reflectance Fusion Model (ESTARFM) was developed to improve the ability of the STARFM algorithm to capture spatial heterogeneity. It does so by calculating different conversion coefficients for homogeneous and heterogeneous pixels in the prediction process, and it uses two fine-resolution images as inputs, thereby improving the accuracy in study areas where the spatial dimension changes greatly [16]. The Flexible Spatiotemporal Data Fusion Model (FSDAF) is actually a combination of unmixing-based methods and weighting-function-based methods. It uses the concept of spatial interpolation and also assumes that reflectance changes for the same land-cover type are the same. Then, it estimates the reflectance change by solving a linear mixed equation [17].
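The weighting-function idea can be sketched as follows. This is a deliberately simplified single-band illustration of the concept (no candidate similar-pixel filtering, no purity test), not the actual STARFM implementation; all inputs are hypothetical.

```python
import numpy as np

def starfm_like_pixel(f1_win, c1_win, c2_win, center):
    """Predict the central fine pixel at t2 from a moving window.

    f1_win: fine image window at t1; c1_win/c2_win: coarse images at t1/t2
    resampled to the fine grid. Neighbours that are spectrally similar to
    the coarse observation (a rough purity proxy) and spatially close to
    the centre receive higher weight.
    """
    ci, cj = center
    # Spectral difference between fine and coarse at t1.
    s = np.abs(f1_win - c1_win) + 1e-6
    # Spatial distance to the window centre.
    ii, jj = np.indices(f1_win.shape)
    d = 1.0 + np.hypot(ii - ci, jj - cj)
    w = 1.0 / (s * d)
    w /= w.sum()
    # Coarse-scale temporal change added to the fine t1 values.
    pred = f1_win + (c2_win - c1_win)
    return float((w * pred).sum())

# Toy check: with a uniform +0.05 coarse change and a flat fine window,
# the prediction is simply the old value plus the change.
f1 = np.full((5, 5), 0.2)
c1 = np.full((5, 5), 0.2)
pred = starfm_like_pixel(f1, c1, c1 + 0.05, center=(2, 2))
print(round(pred, 3))  # 0.25
```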
In the NDVI (Normalized Difference Vegetation Index) Bayesian Spatiotemporal Fusion Model (NDVI-BSFM), a similar weighting function is used, but the estimated value of the fine pixel is obtained using Bayesian mixed-pixel unmixing [18]. The Spatiotemporal Bayesian Data Fusion method (STBDF) incorporates temporal correlation information into the time series of images and transforms the fusion problem into a maximum a posteriori estimation problem [19]. The Improved Spatiotemporal Bayesian Data Fusion Methods I and II (ISTBDF-I and ISTBDF-II) incorporate an unmixing-based algorithm into the existing STBDF framework, which improves the performance of the STBDF method in heterogeneous regions [20].
Methods based on dictionary-pair learning determine the relationship between observed pairs of coarse- and fine-resolution images and then calculate the fine image at the predicted time. To date, many machine learning and deep learning algorithms have been used for spatiotemporal data fusion, such as random forests, deep convolutional neural networks, and artificial neural networks [21,22]. The Sparse-Representation-Based Spatiotemporal Reflectance Fusion Model introduces dictionary-pair learning into spatiotemporal data fusion modeling and establishes the correspondence of changes between coarse and fine image pairs [23]. The Wavelet-Artificial Intelligence Fusion approach combines the wavelet transform and an artificial neural network to handle the nonlinear characteristics of land surface temperature data and successfully fuses land surface temperature data from MODIS with those of Landsat [24].
In summary, a number of methods have been used to generate high-spatial-resolution reflectance datasets with a shorter time step, although improvements are still needed. Our objective in this study was to construct a high-spatiotemporal-resolution reflectance dataset with greatly improved accuracy and versatility by making improvements to the NDVI-BSFM data fusion model, hoping that this model can be expanded in the future from a single fusion NDVI dataset to a fusion of multiple-band reflectance datasets.

Study Area
In this study, two regions of different sizes were selected to examine the performance of the fusion results. We first applied the model to a small area to verify prediction accuracy for spatial details, that is, local performance, and then applied it to a larger area to verify the global performance.

Study Area of Huailai
To test our algorithm, we conducted the first experiment over a small area in Huailai County, Hebei Province, China, located around the experimental station of the Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences (40°21′ N, 115°47′ E; Figure 1). The remote sensing image covering this area was 608 × 512 Landsat pixels, and the study area mainly consisted of forests, grass, farmland, and shrubs (Figure 1).

Study Area in Hebei
To verify the global performance, experiments were conducted in the northwest of Hebei Province (Figure 2). We chose a Landsat tile (WRS-2 (Worldwide Reference System) path 124, row 31) as the larger area. The tile was 7582 × 7827 pixels at a 30 m scale, its typical elevation was approximately 480 m, and the area had a temperate continental climate. The study area was mainly made up of forests, farmland predominantly planted with corn, and grass.


Data Preparation
The 30 m spatial resolution Landsat 8/OLI land surface reflectance product was downloaded from the United States Geological Survey website. This product is atmospherically corrected by the L8SR program [25,26]. We collected six Landsat images from 2014 (DOY (Day of Year) = 94, 206, 238, 270, 286, and 318). Among them, DOY 94 and 238 formed one image pair used as model input, with DOY 206 used for verification of this pair. Similarly, DOY 270 and 318 formed another image pair, with DOY 286 used for verification (Figure 3).
The MCD43A4 NBAR (Nadir Bidirectional Reflectance Distribution Function (BRDF) Adjusted Reflectance) product is composed of land surface reflectance in the zenith direction, produced from MODIS data and adjusted using the BRDF. The solar zenith angle used by each NBAR product is the median solar zenith angle over the 16 day cumulative observation period. Since the reflectance of all pixels is adjusted to the observation in the zenith direction, the effect of viewing angle on reflectance is eliminated, making the product more stable and consistent. MCD43A4 provides 500 m resolution data for bands one to seven [27]. We collected all the MCD43A4 data from 2010 to 2014, with 46 images in each year.
In addition, we obtained land-cover data for these two areas for both Landsat and MODIS. The Landsat land-cover dataset was from the 1:100,000 land-use vector map in 2010 provided by the Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences, and we rasterized it into 30 m spatial resolution images [28]. We also used MCD12Q1 Land_Cover_Type_5 acquired in 2010 to identify the land-cover type for each MODIS pixel. All MODIS data were obtained from the Land Processes Distributed Active Archive Center (LP DAAC) [29]. To make the datasets consistent, they were all re-projected and geometrically registered. The MODIS MCD43A4 and MCD12Q1 products with 500 m spatial resolution were in the Integerized Sinusoidal projection; hence, we used MRT (MODIS Reprojection Tools) [30] to convert MODIS data to the same projection as Landsat data (UTM (Universal Transverse Mercator) WGS84). For convenience of calculation, all MODIS data were resampled to a spatial resolution of 480 m [17,18]. It should be noted that this process caused some data loss, which is inevitable in many current data fusion models. Generally speaking, this error has little effect on the fusion result [15,16]; thus, we ignored it.
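The 30 m to 480 m aggregation used here can be sketched as a simple block average, assuming the grids are already re-projected and co-registered (a 480 m coarse pixel then covers exactly 16 × 16 fine pixels). The array below is synthetic; real data would first pass through MRT or a similar reprojection tool.

```python
import numpy as np

def block_average(fine, factor=16):
    """Aggregate a fine-resolution array to coarse resolution by block
    averaging (30 m -> 480 m uses 16 x 16 blocks). Dimensions are assumed
    to be exact multiples of `factor`.
    """
    h, w = fine.shape
    return fine.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

fine = np.random.rand(608, 512)      # Huailai-sized synthetic scene
coarse = block_average(fine)
print(coarse.shape)  # (38, 32)
```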
(2) Spectral Standardization

Differences in the sensors' spectral response functions, transit times, and viewing angles mean that, even when observing the same object on the surface, the land surface reflectance data are not consistent. Therefore, we standardized the multi-source data so that the parameters for each sensor used the same reference. Because the sensors have different applications, their wavelength settings also differ (Figure 4). Figure 4a compares the spectral response functions of the two sensors for the red band, and Figure 4b compares those for the near-infrared band; the blue curves represent the MODIS sensor, and the red curves represent the Landsat OLI sensor. Because there are some differences between the spectral response functions of the two sensors in the same band, we normalized the spectral response function between sensors [16].

The 6S radiative transfer model was used to simulate land surface reflectance, and the spectral response functions of the sensors were normalized to each other. In the 6S model, the input parameters of the simulation were set as follows: the solar incidence angle and the observation zenith angle ranged from 0° to 50° with a step size of 5°, the atmospheric mode was mid-latitude summer, the aerosol mode was continental aerosol, the aerosol optical thickness at 550 nm ranged from 0 to 0.6 with a step size of 0.1, and the surface reflectance ranged from 0 to 0.6 with a step size of 0.05. We used these parameters in the 6S model to simulate reflectance curves for the two bands, each with a different spectral response function. The simulation results for the red and near-infrared bands of MODIS and Landsat OLI showed good agreement between the two sensors, especially for the red band (Figure 5).
There was a slight deviation from the 1:1 line between the two sensors in the near-infrared band, but they were still very consistent. According to the spectral response function of the MODIS data, the Landsat OLI reflectance was corrected using the following formula:

rMODIS = a × rOLI + b, (1)

where a and b are the slope and intercept fitted to the simulation results for the two sensors, rOLI is the land surface reflectance of the Landsat data, and rMODIS is the land surface reflectance of the corresponding MODIS data.

By using the 6S model to account for differences between the spectral response functions, we normalized the reflectance data obtained by each sensor so that they could be processed against the same benchmark. To verify the correction, we compared the same land-cover types for MODIS and Landsat OLI in the red and near-infrared bands: the two images were taken on the same day, and the reflectance of the same land-cover type at a given coordinate position was extracted and compared. Scatter diagrams of the comparison show that the observations of the two sensors were consistent with each other after calibration (Figure 6).

Figure 6. Calibration results of three land-cover types (farmland, forest, and grassland) in the red band (a,b) and near-infrared band (c,d).
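The cross-sensor adjustment described above amounts to fitting a linear relation on simulated reflectance pairs and then applying it. The sketch below uses synthetic stand-in values (the linear relation and its coefficients are assumptions for illustration) in place of the actual 6S simulation output.

```python
import numpy as np

# Hypothetical simulated reflectance pairs from the 6S runs: each entry is
# the band reflectance the two sensors would observe for the same surface
# and atmosphere. These values are synthetic stand-ins, not 6S output.
sim_oli = np.linspace(0.0, 0.6, 13)
sim_modis = 0.97 * sim_oli + 0.004          # assumed near-linear relation

# Slope and intercept of the cross-sensor fit (a and b in the text).
a, b = np.polyfit(sim_oli, sim_modis, deg=1)

def adjust_oli_to_modis(r_oli):
    """Map OLI reflectance onto the MODIS spectral-response reference."""
    return a * r_oli + b

print(round(float(adjust_oli_to_modis(0.3)), 3))  # 0.295
```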


Ref-BSFM Method
This research was divided into two parts: the construction of the difference in reflectance at adjacent times, and the use of Bayesian pixel unmixing to produce high-spatial- and high-temporal-resolution reflectance datasets. We first calculated the differences between the 46 MCD43A4 images at adjacent times to obtain images of the change in reflectance. Then, we used a linear regression equation and a residual distribution model to estimate changes in the Landsat image over the next 8 days. These two sets of data were used simultaneously as inputs to the data fusion model. Afterward, to obtain images with high spatial and temporal resolution, we used the MCD43A4 product from 2010 to 2014 to construct prior information about reflectance changes in the study area, and the Bayesian pixel unmixing model was used to downscale the MODIS reflectance data. Lastly, we produced Landsat-like reflectance datasets using a prediction model. Figure 7 shows a flowchart of the method, and the main implementation steps are described in detail in the following sections.


MODIS Data Temporal Change Detection
The construction of difference data includes two steps. First, we calculated the differences between MODIS data at adjacent time periods because, in a long time series, fine-resolution remote sensing data from Landsat are scarce, and much of the change information comes from the MODIS data. For example, we obtained 46 scenes in a single year for the MODIS MCD43A4 product and then determined the change (difference) between temporally adjacent scenes to generate 45 change-mask files, numbered C1-C45, as shown in Figure 8. Second, we needed to construct the same change difference between adjacent times for the Landsat data to provide the necessary spatial detail for the fusion results.
In reality, good-quality Landsat data available within a single year are very rare, and the 16 day Landsat revisit cycle differs from the 8 day temporal resolution of the MODIS data. Therefore, the key to the Ref-BSFM method is to use the change information contained in the MODIS change-mask files to estimate the change in the Landsat data over the next 8 days.
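Generating the 45 change-mask files from the 46 annual composites is a simple adjacent-scene difference; a minimal sketch with synthetic data (scene dimensions are illustrative):

```python
import numpy as np

# Synthetic stand-in for one year of MCD43A4 composites (one band):
# 46 scenes of 128 x 128 coarse pixels.
scenes = np.random.rand(46, 128, 128)

# Difference between temporally adjacent scenes -> 45 change masks
# (C1..C45 in the text): mask k holds scene[k+1] - scene[k].
change_masks = np.diff(scenes, axis=0)
print(change_masks.shape)  # (45, 128, 128)
```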


Building the Difference of Landsat Data
Suppose there is a pair consisting of a fine-resolution image and a coarse-resolution image with good quality during time period t1 and another coarse-resolution image for the following time period t2 (8 days later). These two adjacent coarse-resolution images can be used to estimate the change that would have occurred for the fine-resolution image at t2. We resampled all MODIS data to the same 30 m spatial resolution as the Landsat data. We used these data to build a linear regression relationship (Equation (2)) between MODIS data and Landsat data at t1. This linear model was applied to the adjacent time t2 to estimate the Landsat data at this time (Equation (3)). Note that this prediction result only retains change information from the MODIS data and does not contain the detailed spatial information from the Landsat image.
F1(xij, yij, λ) = p × C1(xij, yij, λ) + q, (2)

FS2(xij, yij, λ) = p × C2(xij, yij, λ) + q, (3)

where F1(xij, yij, λ) is the Landsat image at t1, C1(xij, yij, λ) is the MODIS image after resampling at t1, C2(xij, yij, λ) is the MODIS image after resampling at t2, FS2(xij, yij, λ) is the spatial prediction of the fine-resolution image at t2, and p and q are the fitting coefficients for band λ at t1, obtained from the linear regression equation.
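The regression-and-apply step of Equations (2) and (3) can be sketched as follows, using synthetic arrays in place of real Landsat/MODIS data (the linear relation between them is an assumption made so the result is easy to check):

```python
import numpy as np

def predict_fine_spatial(f1, c1, c2):
    """Equations (2)-(3): fit F1 = p*C1 + q per band at t1, then apply
    the same fit to C2 to obtain the spatial prediction FS2.
    f1 is the Landsat image at t1; c1 and c2 are the MODIS images
    resampled to the Landsat grid at t1 and t2.
    """
    p, q = np.polyfit(c1.ravel(), f1.ravel(), deg=1)
    return p * c2 + q

# Synthetic illustration: Landsat as a scaled, offset version of MODIS.
c1 = np.random.rand(64, 64)
f1 = 1.1 * c1 + 0.02
c2 = c1 + 0.05                       # uniform change over 8 days
fs2 = predict_fine_spatial(f1, c1, c2)
print(np.allclose(fs2, 1.1 * c2 + 0.02))  # True
```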
A high-precision land-cover map was used to obtain the fraction of each class (land-cover type) within a single coarse pixel. We calculated the class fractions for each coarse pixel by counting the number of fine pixels of each class using the following equation:

fc(xi, yi) = Nc(xi, yi)/m, (4)

where Nc(xi, yi) is the number of fine pixels belonging to class c within the coarse pixel at (xi, yi), and m is the total number of fine pixels within a coarse pixel.
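A minimal sketch of the class-fraction computation on a synthetic land-cover grid, assuming a 16 × 16 fine-to-coarse pixel ratio (480 m / 30 m):

```python
import numpy as np

def class_fractions(landcover, n_classes, factor=16):
    """Equation (4): fraction of each class inside every coarse pixel,
    f_c = N_c / m, where m = factor**2 fine pixels per coarse pixel.
    `landcover` holds integer class labels on the fine (30 m) grid.
    """
    h, w = landcover.shape
    blocks = landcover.reshape(h // factor, factor, w // factor, factor)
    blocks = blocks.transpose(0, 2, 1, 3).reshape(h // factor, w // factor, -1)
    m = factor * factor
    counts = np.stack([(blocks == c).sum(axis=-1) for c in range(n_classes)],
                      axis=-1)
    return counts / m

lc = np.zeros((32, 32), dtype=int)
lc[:, 16:] = 1                       # left half class 0, right half class 1
frac = class_fractions(lc, n_classes=2)
print(frac.shape)                    # (2, 2, 2)
print(frac[0, 0])                    # [1. 0.]
```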
We assumed that the land-cover type did not change over a short period of time and that the reflectance changes of pixels of the same type were consistent. For band λ, the temporal change of the coarse pixel at (xi, yi) was calculated as

∆C(xi, yi, λ) = C2(xi, yi, λ) − C1(xi, yi, λ). (5)

According to spectral linear mixing theory, the temporal change of a coarse pixel is the weighted sum of the temporal changes of all classes within it:

∆C(xi, yi, λ) = Σ(c=1 to l) fc(xi, yi) × ∆F(c, λ), (6)
where l is the number of classes, fc(xi, yi) is obtained from a land-cover map, and ∆F(c, λ) represents the change in each class, obtained by solving Equation (6). The fine-resolution image at t2 can then be estimated as

FT2(xij, yij, λ) = F1(xij, yij, λ) + ∆F(c, λ), (7)

where FT2(xij, yij, λ) represents the temporal prediction of the fine-resolution image at t2, and F1(xij, yij, λ) represents the fine-resolution image at t1. However, the information for the fine-resolution image obtained at time t2 comes only from the MODIS data; it still lacks detailed spatial information, which must be supplied by the Landsat image at t1.
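The unmixing of Equations (5)-(7) reduces to a least-squares solve of the mixing system over many coarse pixels, followed by adding the recovered class change to each fine pixel of the t1 image. The sketch below uses synthetic fractions, changes, and labels:

```python
import numpy as np

# Hypothetical setup: 100 coarse pixels, 3 classes. `fractions` plays the
# role of Equation (4) output; `delta_c` is the coarse change C2 - C1.
rng = np.random.default_rng(0)
fractions = rng.dirichlet(np.ones(3), size=100)   # rows sum to 1
true_delta_f = np.array([0.04, -0.02, 0.01])      # per-class change
delta_c = fractions @ true_delta_f

# Least-squares solution of the linear mixing system (Equation (6)):
# delta_c[i] = sum_c fractions[i, c] * delta_f[c].
delta_f, *_ = np.linalg.lstsq(fractions, delta_c, rcond=None)

# Equation (7): the temporal prediction adds the class change to each
# fine pixel of the t1 Landsat image according to its class label.
f1 = rng.random((32, 32))
labels = rng.integers(0, 3, size=(32, 32))
ft2 = f1 + delta_f[labels]
print(np.allclose(delta_f, true_delta_f))  # True
```

With real data the system is not exactly consistent, so the solve yields a best-fit class change rather than an exact recovery as in this synthetic case.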
Considering that the land-cover type might have changed, or that within-class changes may have occurred between t1 and t2, we introduced a residual correction. Both F2^S(x_ij, y_ij, λ) and F2^T(x_ij, y_ij, λ) are predictions of the fine-resolution image at t2, and we assumed that F2^S(x_ij, y_ij, λ) is closer to the real image at t2 than F2^T(x_ij, y_ij, λ), because F2^S(x_ij, y_ij, λ) contains both the information from the fine-resolution image at t1 and the change information obtained from the MODIS data [17]. The error of the prediction F2^T(x_ij, y_ij, λ) was then estimated from the difference between the two predictions. Equations (8) and (9) both represent errors in predicting the fine-resolution image at t2. We then introduced a weight distribution coefficient to guide the distribution of these two errors; this concept comes from the FSDAF data fusion model published by Zhu in 2016 [17].
where I_k = 1 when the k-th fine pixel within a moving window (whose size is one coarse pixel) has the same land-cover type as the central fine pixel (x_ij, y_ij); otherwise, I_k = 0. The weight for combining the two errors through I(x_ij, y_ij, λ) is given by Equation (11) and then normalized (Equation (12)). Lastly, when (x_ij, y_ij) belongs to class c, the prediction of the total change ΔF(x_ij, y_ij, λ) within a fine pixel between t1 and t2 is obtained (Equation (13)). In effect, we added F1(x_ij, y_ij, λ) and ΔF(x_ij, y_ij, λ) to directly predict the fine-resolution image at t2. This calculation is limited to adjacent time periods: the linear correlation model between images is accurate only when the images used in prediction are close in time. We showed through a large number of experiments that, when the time span is too long, continuing to use the same linear model leads to large errors in the results. Therefore, we chose to construct difference data for adjacent times, so that more images could be predicted in this manner.
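The same-class indicator I_k can be turned into a simple homogeneity index, as sketched below. Edge handling and the exact weight formulas of Equations (10)-(12) are not reproduced in this excerpt, so this is an assumption-laden sketch following the FSDAF convention.

```python
import numpy as np

def homogeneity_index(classes_fine, block=16):
    """Share of fine pixels, inside a coarse-pixel-sized moving window,
    that share the land-cover class of the central fine pixel (the I_k sum).

    classes_fine: 2-D integer class-label array on the fine grid.
    block: window size in fine pixels (one coarse pixel).
    """
    rows, cols = classes_fine.shape
    half = block // 2
    idx = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            r0, r1 = max(0, i - half), min(rows, i + half)
            c0, c1 = max(0, j - half), min(cols, j + half)
            win = classes_fine[r0:r1, c0:c1]
            idx[i, j] = np.mean(win == classes_fine[i, j])
    return idx
```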

Obtaining Prior Information
It is known that the trends in reflectance time-series data for a specific land-cover type are relatively stable over several years; thus, reflectance trend information summarized from many years of data can be used as prior information for each land-cover type. In this study, there were 46 reflectance observations for each land-cover type per year (Figure 9). We calculated the average reflectance for each land-cover type over a 5-year period and took the difference between adjacent time periods, creating 45 difference values for each land-cover type in a given year (Figure 10). This prior information was used as input data for the next stage.

We used a 30 m fine-resolution land-cover map and the 480 m coarse-resolution MCD12Q1 land-cover product to identify MODIS "pure" pixels. For example, we spatially overlapped the 30 m land-cover map and the MCD12Q1 product: if a coarse-resolution pixel was flagged with the land-cover type "grass" in MCD12Q1, and more than 90% of the 30 m pixels within it were also flagged as "grass", we regarded this MODIS pixel as "pure". The MCD43A4 reflectance dataset and its QA quality-control dataset from 2010 to 2014 were then used to calculate precise average reflectance for the study area, and a multi-year average reflectance time series was established for each land-cover type. Lastly, we used the Savitzky-Golay (S-G) filtering algorithm to smooth the time-series curve and calculated the difference between adjacent times to obtain the final prior information. At the beginning and end of each year, observed reflectance values might be abnormally high due to snow cover and other issues. We therefore used MODIS snow-cover products to identify these abnormal values and replaced them with data from the same time period in years without snow coverage, resulting in a more realistic reflectance curve. Note that, when we used the prior information for these corrected time periods, we assigned it greater uncertainty (i.e., greater variance).
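The prior-construction steps (multi-year averaging, S-G smoothing, adjacent-time differencing) can be sketched as follows. The S-G window length and polynomial order are illustrative choices; the paper does not state them in this excerpt.

```python
import numpy as np
from scipy.signal import savgol_filter

def build_prior(refl_stack):
    """Multi-year mean 8-day reflectance curve for one land-cover type,
    S-G smoothed, reduced to 45 adjacent-time differences.

    refl_stack: (n_years, 46) array of 8-day composite reflectance means
    over the pure pixels of one class (e.g. from MCD43A4, 2010-2014).
    """
    mean_curve = refl_stack.mean(axis=0)                         # multi-year average
    smooth = savgol_filter(mean_curve, window_length=7, polyorder=3)
    return np.diff(smooth)                                       # 45 difference values
```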

Bayesian Pixel Unmixing
The unmixing process used in this study was the Bayesian pixel unmixing process developed for the NDVI-BSFM data fusion method [18]. In the NDVI-BSFM model, the inputs are the original NDVI prior information and observation data, while, in the Ref-BSFM model, the inputs are the prior information of the reflectance difference between adjacent times, as described in Section 2.3.1.
Usually, we define a sliding window of [n × n] on the MODIS image, where each pixel corresponds to a difference in MODIS reflectance and contains the area fractions for the m types of land cover. Then, in the sliding window, an area component matrix A with n × n rows and m columns can be obtained.
We treated the filtered multi-year average reflectance difference time series of pure MODIS pixels for each land-cover type as prior information and assumed it obeyed a Gaussian distribution with mean value R_p and covariance matrix Σ_p. At the same time, we assumed that the MODIS reflectance difference observations (R_0) in the sliding window followed a Gaussian distribution with errors, with covariance matrix Σ_0. Under these assumptions, according to Bayesian theory and the conjugacy of the Gaussian distribution, the posterior distribution also obeys a Gaussian distribution, with expectation value µ_e and covariance matrix Σ_e.
where A represents the area fraction of each land-cover type in the sliding window, and µ_e, the expected value of the posterior probability, is the minimum-mean-square-error estimate for each endmember. Σ_p and Σ_0 are empirical values that differ between areas. For example, based on extensive data-processing experience, when data quality is relatively good, setting Σ_p to 20 and Σ_0 to 40 provides good results. When the MODIS observation quality is poor, the value of Σ_0 needs to be increased (we usually set it to 200), because the prior information then replaces the poor-quality MODIS observations. In fact, a sensitivity analysis of these parameters showed that the individual values of Σ_0 and Σ_p matter little; the ratio Σ_r = Σ_p/Σ_0 is the key index controlling the relative importance of the prior information and the observations. If the MODIS observations are of poor quality, reducing Σ_r places more weight on the prior information, thereby decreasing the influence of observation noise on the estimates, and vice versa. Therefore, better results can be obtained by optimizing Σ_r [18][19][20].
In this process, the sliding window was moved pixel by pixel, and a model was constructed for each pixel. Each MODIS pixel, thus, had a unique posterior probability expectation value for each land-cover type. Lastly, combined with land-use data, the MODIS data could be downscaled to 30 m.
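The posterior for one sliding window follows from standard Gaussian conjugacy; since the closed-form equations are not reproduced in this excerpt, the sketch below uses the textbook linear-Gaussian result with scalar variances Σ_p and Σ_0, which is an assumption on our part.

```python
import numpy as np

def bayes_unmix(A, R0, Rp, sigma_p, sigma_0):
    """Posterior mean and covariance of the per-class reflectance
    differences, with prior N(Rp, sigma_p*I) and likelihood
    R0 ~ N(A @ R, sigma_0*I).

    A:  (n_obs, n_classes) area fractions in the sliding window.
    R0: (n_obs,) observed coarse reflectance differences.
    Rp: (n_classes,) prior mean differences for each class.
    """
    n_obs, n_classes = A.shape
    Sp_inv = np.eye(n_classes) / sigma_p
    S0_inv = np.eye(n_obs) / sigma_0
    Sigma_e = np.linalg.inv(A.T @ S0_inv @ A + Sp_inv)      # posterior covariance
    mu_e = Sigma_e @ (A.T @ S0_inv @ R0 + Sp_inv @ Rp)      # posterior mean
    return mu_e, Sigma_e
```

Shrinking sigma_p (i.e., reducing Σ_r = Σ_p/Σ_0) pulls µ_e toward the prior mean R_p, which matches the behavior described above for poor-quality MODIS observations.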

Prediction of Reflectance Difference Datasets
The 30 m spatial resolution data were obtained through the unmixing process. Their spectral characteristics were derived from MODIS data, but the detailed spatial information of Landsat data was not yet available. To include that information, we established a regression relationship between the Landsat difference image and the unmixed image at t1 and tk to adjust the results of each subsequent prediction.
where L_t1 represents the Landsat difference image constructed at t1, M_t1 represents the 30 m MODIS difference image obtained using the Bayesian unmixing method, and p, q_t1, and q_tk are fitting coefficients. The coefficient p is determined by the characteristics of the two sensors and by atmospheric conditions; in this study, we assumed that atmospheric conditions were constant, so this coefficient did not change with time and was determined using the image pair at t1 (Equation (16)). The term q_t1 is the error term, which changes with the time period of the difference: the larger the value of M_t1, the larger the change in this coefficient, indicating a positive correlation between q_t1 and M_t1.
Using Equation (19), we predicted the difference images for the 45 adjacent time periods of the year. These results not only had a spatial resolution of 30 m but also contained the detailed spatial information from the Landsat data. At this point, the original Landsat data and these difference images were used to create predicted reflectance images at tk. If more than one Landsat observation was available in a year, a weighted combination of these data further improved the prediction accuracy.
where j ranges from 1 to n, L_tj and M_tj are obtained on date tj, and ω(t_k, t_j) is a weight that is a function of the time distance between t_k and t_j.
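The weighted combination over multiple base dates can be sketched as below. The paper's exact weight function ω(t_k, t_j) is not reproduced in this excerpt, so normalized inverse temporal distance is used here as a plausible stand-in, and both function names are assumptions.

```python
import numpy as np

def time_weights(tk, t_obs):
    """Normalized inverse-temporal-distance weights ω(t_k, t_j)."""
    d = np.abs(np.asarray(t_obs, dtype=float) - tk)
    w = 1.0 / np.maximum(d, 1e-6)     # guard the case t_j == t_k
    return w / w.sum()

def weighted_prediction(tk, t_obs, preds):
    """Combine per-base-date predictions of the image at tk.

    preds: sequence of equally shaped prediction arrays, one per base date.
    """
    w = time_weights(tk, t_obs)
    return np.tensordot(w, np.asarray(preds, dtype=float), axes=1)
```

For example, with base dates DOY 270 and 318 predicting DOY 286, the nearer base date (270) receives the larger weight.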
where L_1 is the original Landsat/OLI land surface reflectance image obtained at time t1, L_tn (n = 1, 2, . . . , 45) represents the difference image at time tn, and L_k represents the predicted fine-resolution image at time tk.

The Ref-BSFM fusion results were compared with several mainstream data fusion algorithms, namely STARFM, ESTARFM, and FSDAF. Despite extensive research on data fusion, no unified standard has yet been established for accuracy evaluation indices. In this study, we therefore adopted the most commonly used published indicators to verify the advantages of the Ref-BSFM data fusion model.
where, in Equations (23)-(26), R_p^i and R_0^i are the predicted and observed values for pixel i, respectively; in Equation (27), µ represents the mean of each image, σ² the variance of each image, and σ_xy the covariance of the two images; and, in Equation (28), MSE represents the mean square error between the two images, and MAX_0² represents the square of the maximum pixel value in the verification image.
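The evaluation indicators can be computed as sketched below. The SSIM here is the global (single-window) form with the usual stabilizing constants C1 and C2, which are illustrative choices; the paper's exact constants are not given in this excerpt.

```python
import numpy as np

def accuracy_metrics(pred, obs, max_val=1.0):
    """Fusion accuracy indices in the spirit of Eqs. (23)-(28):
    r, RMSE, AAD, AARD, SSIM (global form), and PSNR."""
    p, o = pred.ravel().astype(float), obs.ravel().astype(float)
    r = np.corrcoef(p, o)[0, 1]                    # correlation coefficient
    mse = np.mean((p - o) ** 2)
    rmse = np.sqrt(mse)
    aad = np.mean(np.abs(p - o))                   # average absolute difference
    aard = np.mean(np.abs(p - o) / np.abs(o))      # average absolute relative difference
    c1, c2 = (0.01 * max_val) ** 2, (0.03 * max_val) ** 2
    mu_p, mu_o = p.mean(), o.mean()
    cov = np.mean((p - mu_p) * (o - mu_o))
    ssim = ((2 * mu_p * mu_o + c1) * (2 * cov + c2)) / (
        (mu_p**2 + mu_o**2 + c1) * (p.var() + o.var() + c2))
    psnr = 10 * np.log10(max_val**2 / mse)         # MAX_0^2 / MSE in decibels
    return {"r": r, "RMSE": rmse, "AAD": aad, "AARD": aard,
            "SSIM": ssim, "PSNR": psnr}
```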

Performance Comparison over a Small Area in Huailai
The quality of the MODIS data strongly affects the final results; therefore, we analyzed the fusion results of the Ref-BSFM model separately for MODIS data of good quality and of poor quality, and compared the results with those of the other fusion models (STARFM, ESTARFM, and FSDAF). We used six Landsat 8 land surface reflectance images from 2014 (DOY = 94, 206, 238, 270, 286, and 318) as input or verification data for the fusion models. Due to differences in overpass time, an area that was cloudless in a Landsat image may have been cloud-covered in the corresponding MODIS image.
It should be noted that FSDAF and STARFM take one pair of coarse- and fine-resolution images as input for prediction, while ESTARFM uses two pairs (two coarse- and two fine-resolution images). Ref-BSFM supports the input of either one pair or multiple pairs of images. To eliminate the error caused by inputting different numbers of image pairs, we discuss the performance of the four models below using the corresponding numbers of inputs. At first glance, all three models generated similar results (Figure 11). Prediction results from FSDAF and STARFM deviated from the observed values in the OLI image, while the results from Ref-BSFM more closely matched the observations. Because a single pair of input images was used, model predictability was lower when the reflectance differed substantially between the input date and the predicted date (Figure 11a,b). That is, the prediction results were more similar to the values at the input date, and the color of the near-infrared-red-green composite was deeper.

Performance Based on Cloudless MODIS Data
The FSDAF method first classified the Landsat data at the input date; hence, the classification result greatly affected the fusion result (Figure 11a). For example, mountainous areas contain many shadows; if a shadow area cannot be identified correctly, misclassification leads to unrealistic outputs in the fusion result. The STARFM method produced good results, but its strategy of searching for similar pixels in the neighborhood led to the loss of some detailed spatial information, resulting in an image that looked too smooth (Figure 11b). Ref-BSFM, in contrast, used a high-precision land-cover dataset, which effectively avoided the shadow problems seen in FSDAF; in addition, because Ref-BSFM does not search for neighboring similar pixels, no spatial detail was lost. Although Ref-BSFM (Figure 11c) also used a single pair of input images, it incorporated many years of a priori information derived from MODIS data for the same time period. This additional information produced predictions closer to the observed OLI values (Figure 11d).

• Two pairs of images
We used two Landsat-MODIS image pairs (DOY = 270 and 318) to predict the reflectance for DOY = 286 and compared the output of Ref-BSFM with that of ESTARFM. The ESTARFM method adds a conversion coefficient, improving the ability of STARFM to capture spatial heterogeneity. As a result, it performed better than STARFM in terms of spatial detail and image quality, and its accuracy was similar to that of Ref-BSFM (Figure 12); both images closely matched the OLI image. However, it should be noted that ESTARFM is very time-consuming compared with Ref-BSFM.

Increasing the number of input image pairs had an effect similar to that of using a priori MODIS information: it compensated for the error caused by large reflectance differences and improved the accuracy of the prediction results (Figure 12).
We used the six accuracy evaluation indicators described in Section 2 to quantify the advantages of the Ref-BSFM method. The overall accuracy of Ref-BSFM was higher than that of the other three methods (Figure 13 and Table 1). All four methods generated good prediction results; however, when only one pair of images was used to predict reflectance changes, Ref-BSFM outperformed FSDAF and STARFM: its results were more concentrated around the 1:1 line, and its coefficients of determination were consistently the highest, ranging from 0.8535 to 0.9217. In addition, among the accuracy indicators in Table 1, Ref-BSFM showed higher correlation (r) and lower average error (AAD) and relative error (AARD) with respect to the OLI images, as well as higher structural similarity (SSIM) and peak signal-to-noise ratio (PSNR), indicating better image quality. To simplify the comparison, we show the three most commonly used bands (green, red, and near-infrared) as examples. The fusion accuracy of the red and green bands was significantly higher than that of the near-infrared band.
There were several sources of error in Ref-BSFM. The acquisition times of the MCD43A4 and Landsat data did not match exactly, which made reflectance observations inconsistent between fine- and coarse-resolution image pairs. The pixel quality of MCD43A4 and the accuracy of the background field for the various land-cover types also affected the fusion results. In addition, geolocation-matching errors between MCD43A4 and Landsat pixels, as well as land-cover changes, which were ignored, added further uncertainty to the fusion results.

Performance Based on MODIS Data with Clouds
When the MODIS data were cloud-covered at either the initial time or the prediction time, Ref-BSFM continued to outperform FSDAF, STARFM, and ESTARFM. We selected three image pairs from 2014 (DOY = 94, 206, and 238). First, we used one Landsat image (DOY = 94) to predict the Landsat-like reflectance image at DOY = 206; then, we used two image pairs (DOY = 94 and 238) to make the same prediction, similar to the process used in Section 3.1.1. The MODIS image at the prediction time (DOY = 206) was mostly covered by clouds. The FSDAF, STARFM, and ESTARFM methods could not handle cloudy pixels well, whereas the final fusion results of Ref-BSFM showed no clouds in the image (Figures 14 and 15).
• One pair of images

The results of FSDAF retained cloudy areas from the MODIS data. Although there were no obvious cloud patches in the results of STARFM, its strategy of searching through neighboring pixels smoothed the entire image, and outliers from some similar pixels were included in the calculation, leading to very low accuracy in each band. As mentioned previously, the Ref-BSFM method constrained poor-quality MODIS pixels using prior information, such that its fusion results were generally closer to actual values and no cloud patches appeared (Figure 14).

• Two pairs of images
When we increased the number of input image pairs, ESTARFM maintained good predictability in cloudless areas but did not accurately predict reflectance changes in cloudy areas. For the reasons given above, Ref-BSFM maintained good-quality fusion results (Figure 15). The scatter plots of the Ref-BSFM fusion results were more concentrated, with coefficients of determination (R²) ranging from 0.8762 to 0.9315, while its RMSE, AAD, and AARD values were lower than those of the other three methods, and its r, SSIM, and PSNR values were higher than those of FSDAF, STARFM, and ESTARFM (Table 2). The correlation coefficient between the Ref-BSFM fusion results and the observed Landsat OLI data ranged from 0.9360 to 0.9651, with a very small RMSE (approximately 0.02) in each band. The SSIM ranged from 0.8209 to 0.8751, and the PSNR from 45 to 56, both indicating that the image quality obtained by Ref-BSFM fusion was excellent (Figure 16 and Table 2).
This clearly demonstrates the advantages of the Ref-BSFM method for prediction over cloudy areas.


Performance Comparison over a Large Area in Hebei
We demonstrated that, over a small area, Ref-BSFM had advantages over the other data fusion methods; we therefore applied the Ref-BSFM model to a larger study area. To avoid errors caused by an unequal number of input image pairs, we used only one pair of fine- and coarse-resolution images for this analysis. Since the ESTARFM method typically requires two fine-resolution images, and the calculations for both ESTARFM and FSDAF are time-consuming over large areas, these two methods were not included in this analysis.
In the Hebei area, we used the image pair from DOY = 238 to predict the fine-resolution image at DOY = 286 and compared the performance of Ref-BSFM, FSDAF, and STARFM over a large area (Figure 17 and Table 3). Figure 17a,b,f,g show the fusion performance in the red and near-infrared bands, while the last two columns (Figure 17d,e,i,j) show the differences between the Landsat OLI images and the fusion images. The r and RMSE values of Ref-BSFM and STARFM were similar, although Ref-BSFM had a higher r and lower RMSE, showing that it was slightly more accurate than STARFM. Because the study area was large, AAD and AARD were computed from a large amount of data; their values were therefore relatively close, and the overall accuracy of the fusion results was good. In addition, both Ref-BSFM and STARFM achieved a structural similarity (SSIM) as high as 0.8, indicating that both could accurately predict detailed spatial information, but their PSNR values differed greatly (Table 3). This indicates that Ref-BSFM had a greater advantage in maintaining overall image quality and was less sensitive to outliers, which can be corrected over time. The differences for Ref-BSFM were concentrated at low values, its larger errors were evenly distributed, and no region of the image showed large errors (Figure 17d,i). In contrast, the STARFM results in the northwest region of the red band and the southeast region of the near-infrared band were obviously higher than the Landsat values (Figure 17e,j). Combining the image quality with the statistical results in Table 3, when the study area was large, the results obtained by Ref-BSFM were closer to the observed values, and the image quality was improved. In summary, Ref-BSFM had distinct advantages over the other fusion methods when applied over a large area.
When Ref-BSFM was applied to a single image, it took less time to obtain the Landsat-like reflectance time series throughout the year than other methods, which is essential when applying fusion products to practical applications. When monitoring the changes in the Earth's land surface, we usually cannot derive an accurate result from a single image; hence, we need to analyze changes over the same area across an entire year or several years, using algebraic combinations between the various bands to calculate indices that can reflect changes on the land surface, allowing the fusion products to be used to the greatest extent.
Lastly, we obtained the annual reflectance curves for Hebei and extracted NDVI time-series curves for several representative land-cover types, such as forest, grassland, and farmland, comparing them with the MODIS data (Figure 18).
Lastly, we obtained the annual reflectance curve for Hebei and extracted the NDVI time series curves for several representative land-cover types such as forest, grassland, and farmland, comparing them with the MODIS data ( Figure 18 The NDVI curves for grassland and forest showed a single growing season, while the farmland in the Hebei area reflected the two crop cultivation methods in the region. One method involves cultivating a single crop of corn, while the other involves a double crop rotation of winter wheat and summer maize. In the fusion result of Ref-BSFM, we found pixels for both cultivation types, examining them separately. From a temporal perspective, NDVI changes for single-season crops were similar to those for grassland and forest, while the NDVI changes for double-season crops were very different. The NDVI of the entire area reached its highest value on DOY 105, and then decreased, reaching the lowest value on DOY 153. This was caused by the planting and harvesting of winter wheat. After that, due to the planting of summer maize, the NDVI gradually increased, reaching its highest value on DOY 225. Corn harvesting was basically completed on DOY 273. Using reflectance data obtained by Ref-BSFM fusion to calculate vegetation indices, we demonstrated that the results were consistent with the mature MODIS products. The NDVI curves for grassland and forest showed a single growing season, while the farmland in the Hebei area reflected the two crop cultivation methods in the region. One method involves cultivating a single crop of corn, while the other involves a double crop rotation of winter wheat and summer maize. In the fusion result of Ref-BSFM, we found pixels for both cultivation types, examining them separately. From a temporal perspective, NDVI changes for single-season crops were similar to those for grassland and forest, while the NDVI changes for double-season crops were very different. 
The NDVI of the entire area reached its highest value on DOY 105, and then decreased, reaching the lowest value on DOY 153. This was caused by the planting and harvesting of winter wheat. After that, due to the planting of summer maize, the NDVI gradually increased, reaching its highest value on DOY 225. Corn harvesting was basically completed on DOY 273. Using reflectance data obtained by Ref-BSFM fusion to calculate vegetation indices, we demonstrated that the results were consistent with the mature MODIS products.
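Extracting per-class NDVI time series like those in Figure 18 requires only the red and near-infrared bands of the fused images plus a land-cover map. The sketch below is a minimal illustration, not the processing chain used in this study; the integer class codes are assumed labels.

```python
import numpy as np

def ndvi(red, nir):
    """NDVI = (NIR - red) / (NIR + red), guarding against a zero denominator."""
    den = nir + red
    return (nir - red) / np.maximum(den, 1e-12)

def class_mean_ndvi_series(red_stack, nir_stack, landcover, class_id):
    """Mean NDVI per date for one land-cover class.

    red_stack, nir_stack: reflectance arrays of shape (dates, rows, cols).
    landcover: (rows, cols) map of integer class codes (assumed labels).
    Returns a 1-D curve with one value per date.
    """
    mask = landcover == class_id
    return np.array([ndvi(r, n)[mask].mean()
                     for r, n in zip(red_stack, nir_stack)])
```

With a full year of 8-day composites (46 dates), this returns a 46-point curve per class, from which seasonal features such as the DOY 153 winter-wheat trough can be read off.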

Discussion
We found that Ref-BSFM generated good prediction results even in the presence of clouds, demonstrating that its correction process, which uses prior information and quality control, makes it insensitive to outliers. In addition, the accuracy for the images from DOY = 94, 206, and 238 was higher than that for DOY = 270, 286, and 318: despite the cloud coverage, the time spans for the former were short enough to overcome any issues. The time span of the available data is the factor to which all current data fusion methods are most sensitive; by shortening it as much as possible, we maximized the accuracy of the Ref-BSFM fusion method.
In fact, many current mainstream data fusion methods based on unmixing or weight functions adopt a weight distribution system to reduce the influence of noise in the paired images as much as possible. Nevertheless, poor-quality pixels in the images are inevitable, and if these pixels are used in the decomposition of coarse-resolution pixels or in the calculation of similar nearby pixels, they inevitably produce incorrect results. In Ref-BSFM, we use Bayesian theory to introduce prior information about the reflectance of the various land-cover types into the decomposition of mixed pixels. When the quality of a MODIS pixel is poor, increasing the weight of the prior information greatly reduces the contribution of that pixel to the fusion result, and error propagation is reduced. For areas covered by clouds, the prior information replaces the outliers, which makes the fusion results more reasonable. In this way, the processing strategy resolves problems caused by cloud coverage.
During this study, we applied many different data fusion methods to the same area to determine their performance, and we compared Ref-BSFM with the current mainstream algorithms for data fusion (FSDAF, STARFM, and ESTARFM), showing that Ref-BSFM has several advantages. Unlike other current mainstream data fusion methods, Ref-BSFM first constructs the difference between two adjacent MODIS images and establishes the prior information of that difference. Because MODIS data are easy to obtain, we can acquire MODIS time-series data for one or more years [31]. In the Ref-BSFM method, we use not only the two MODIS images at t1 and t2 but all of the MODIS data acquired from t1 to t2, which provides more change information for the prediction process. As long as a change could be captured by MODIS, it could be reflected in the prediction results, and the results were closer to the observed data than those of other methods. This additional information allowed the Ref-BSFM method to achieve higher accuracy than the other methods.
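The difference-construction step described above can be sketched briefly: given every 8-day MODIS composite acquired between t1 and t2, the differences between adjacent composites form a change trajectory, and their sum telescopes to the total change over the interval. This is a schematic illustration under assumed array shapes, not the paper's implementation.

```python
import numpy as np

def adjacent_differences(modis_stack):
    """Differences between consecutive 8-day composites.

    modis_stack: reflectance array of shape (T, rows, cols), ordered in time.
    Returns an array of shape (T-1, rows, cols), one change map per step.
    """
    return np.diff(modis_stack, axis=0)

def total_change(modis_stack):
    """Sum of the adjacent differences; telescopes to last - first,
    while the individual steps retain *when* each change happened."""
    return adjacent_differences(modis_stack).sum(axis=0)
```

Using the whole trajectory rather than only the endpoint images is what lets the method time-localize a change (e.g., a harvest between two composites) instead of smearing it across the t1-to-t2 interval.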
Similar to the NDVI-BSFM method, Ref-BSFM takes the multi-year average reflectance time series of each land-cover type as a trend and calculates the difference between adjacent time periods to construct the needed prior information. Using the framework of Bayesian unmixing [18][19][20], we combined MODIS pixel quality control datasets and made a reasonable weight assignment for the contributions of good-quality and poor-quality pixels to the prediction results. If the quality of a MODIS observation is too poor to reflect the true land surface state, then the prior information is given greater weight. Using this strategy, when the study area is covered by a large area of clouds, Ref-BSFM gives the prior information a greater weight to correct the values of the cloud pixels, so that the predicted results are reasonable and have low uncertainty. In this way, we combine the stability of the prior information with the validity of the observations. The traditional decomposition process of mixed pixels inevitably generates some unrealistic results [32][33][34][35], but Bayesian theory uses prior information to constrain the unmixing and keep the decomposition results within a reasonable range.
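The weighting strategy described above can be sketched with the standard Gaussian (inverse-variance) combination of a prior and an observation: a poor quality flag inflates the observation's variance, so the posterior falls back toward the prior. The variance assigned to each QA class here is an illustrative assumption, not the calibrated weighting used in Ref-BSFM.

```python
import numpy as np

# Illustrative observation variances per QA class (assumed values):
# 0 = good, 1 = marginal, 2 = cloud/snow-contaminated.
QA_VARIANCE = {0: 1e-4, 1: 1e-2, 2: 1e2}

def posterior_difference(obs_diff, prior_diff, prior_var, qa):
    """Inverse-variance (Gaussian conjugate) blend of observed and prior change.

    obs_diff:   observed MODIS reflectance difference per pixel.
    prior_diff: multi-year-average (prior) difference for the same period.
    prior_var:  variance of the prior difference (scalar or per-pixel).
    qa:         integer QA class per pixel, mapped to an observation variance.
    """
    obs_var = np.vectorize(QA_VARIANCE.get)(qa).astype(float)
    w_obs = 1.0 / obs_var      # good pixels -> large weight
    w_prior = 1.0 / prior_var  # clouds -> tiny w_obs, prior dominates
    return (w_obs * obs_diff + w_prior * prior_diff) / (w_obs + w_prior)
```

For a good-quality pixel the posterior tracks the observation; for a cloud-flagged pixel it collapses onto the prior trend, which is the behavior that makes the fusion robust to cloud coverage.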

Conclusions
In this article, we proposed a new method for constructing time-series Landsat-like reflectance datasets and discussed its principles and performance in detail. Ref-BSFM estimates possible changes in Landsat data during the next 8 days and calculates the difference between adjacent time periods of MCD43A4 in the same area as prior information; together, these drive the Bayesian mixed-pixel decomposition process. Lastly, a spatial information reconstruction model introduces detailed spatial information into the decomposition results to obtain Landsat-like reflectance datasets with high spatial (30 m) and temporal (8 days) resolution. Compared with the original NDVI-BSFM, we changed the data input form and added fitting coefficients to the reconstruction model, which more accurately expresses the relationship between Landsat and MODIS data. Ref-BSFM is more sensitive to change information; thus, it can make more precise predictions when land-cover changes or within-class changes occur.
The reflectance datasets produced by the Ref-BSFM method can be used to construct vegetation indices or other remote sensing indices, and the method responds effectively to operational needs requiring both high spatial and high temporal resolution. As such, it can play an important role in mapping land cover, monitoring dynamic surfaces, and estimating biogeochemical parameters. Moreover, the application of Ref-BSFM is not restricted to the OLI and MODIS sensors: using this method to fuse MODIS and Sentinel-2 MSI images was shown to provide good results [41][42][43][44][45][46][47][48][49][50][51]; therefore, Ref-BSFM can be extended for use with a variety of remote sensing satellite platforms.