A Simple Spatio–Temporal Data Fusion Method Based on Linear Regression Coefficient Compensation

High spatio–temporal resolution remote sensing images are of great significance in the dynamic monitoring of the Earth’s surface. However, due to cloud contamination and the hardware limitations of sensors, it is difficult to obtain image sequences with both high spatial and temporal resolution. Combining coarse resolution images, such as the moderate resolution imaging spectroradiometer (MODIS), with fine spatial resolution images, such as Landsat or Sentinel-2, has become a popular means to solve this problem. In this paper, we propose a simple and efficient enhanced linear regression spatio–temporal fusion method (ELRFM), which uses fine spatial resolution images acquired at two reference dates to establish a linear regression model for each pixel and each band between the image reflectance and the acquisition date. The obtained regression coefficients are used to help allocate the residual error between the real coarse resolution image and the simulated coarse resolution image upscaled by the high spatial resolution result of the linear prediction. The developed method consists of four steps: (1) linear regression (LR), (2) residual calculation, (3) distribution of the residual and (4) singular value correction. The proposed method was tested in different areas and using different sensors. The results show that, compared to the spatial and temporal adaptive reflectance fusion model (STARFM) and the flexible spatio–temporal data fusion (FSDAF) method, the ELRFM performs better in capturing small feature changes at the fine image scale and has high prediction accuracy. For example, in the red band, the proposed method has the lowest root mean square error (RMSE) (ELRFM: 0.0123 vs. STARFM: 0.0217 vs. FSDAF: 0.0224 vs. LR: 0.0221). Furthermore, the lightweight algorithm design and calculations based on the Google Earth Engine make the proposed method computationally less expensive than the STARFM and FSDAF.


Introduction
With the development of Earth observation technology over the last few decades, a large amount of time series of satellite images have been accumulated, and the number of freely available satellite images is growing at fast pace. For example, Landsat data became available at no cost in 2008 [1]. In 2015 and 2017, Sentinel-2A and Sentinel-2B satellites were launched with the data freely available to everyone without any restrictions [2]. In recent years, the Google Earth Engine (GEE), a planetary-scale platform with the capability to process massive satellite images, facilitated the application of satellite It includes three steps of regression model fitting, spatial filtering, and residual compensation. Several studies showed that, compared with the STARFM and ESTARFM methods, the linear interpolation model (LIM) can produce higher prediction accuracy sometimes [40].
Here, we propose a new simple enhanced linear regression spatio-temporal fusion method (ELRFM). This method uses two pairs of fine-coarse (i.e., Landsat-MODIS) spatial resolution images acquired at the reference dates and one coarse MODIS image acquired at a prediction date to generate a fine resolution image at the prediction date ( Figure 1). The developed method builds a linear regression model, using two fine resolution images acquired at the reference dates, between the image reflectance and the acquisition date for each pixel and each band. The regression coefficients are used to assign residuals which are calculated from the coarse images to compensate for the result of the first linear prediction to further improve the prediction accuracy. The proposed method can be used to predict fine spatial resolution images in heterogeneous areas with abrupt LULC changes and has good adaptability on different remote sensing data sources (e.g., Sentinel-2 and Sentinel-3).
Remote Sens. 2020, 12, x FOR PEER REVIEW 3 of 16 methods, the linear interpolation model (LIM) can produce higher prediction accuracy sometimes [40].
Here, we propose a new simple enhanced linear regression spatio-temporal fusion method (ELRFM). This method uses two pairs of fine-coarse (i.e., Landsat-MODIS) spatial resolution images acquired at the reference dates and one coarse MODIS image acquired at a prediction date to generate a fine resolution image at the prediction date ( Figure 1). The developed method builds a linear regression model, using two fine resolution images acquired at the reference dates, between the image reflectance and the acquisition date for each pixel and each band. The regression coefficients are used to assign residuals which are calculated from the coarse images to compensate for the result of the first linear prediction to further improve the prediction accuracy. The proposed method can be used to predict fine spatial resolution images in heterogeneous areas with abrupt LULC changes and has good adaptability on different remote sensing data sources (e.g., Sentinel-2 and Sentinel-3).

Data
Landsat 8 top-of-atmosphere (TOA) images archived in GEE platform with a 30 m spatial resolution acquired on 4 June ( ), 6 July ( ), and 7 August ( ) 2019 from operational land imager (OLI) sensor, with Worldwide Reference System (WRS)-2 path/row of 44/34 were used. The study area is a subset of one TOA scene covering an area of 28.8 km × 28.8 km (960 × 960 pixels) ( Figure 2). The LULC of the areas marked by yellow rectangles in Figure 2 changes during the study period, which can help to evaluate the performance of the ELRFM. The coarse spatial resolution images used are the simulated MODIS aggregated from Landsat. Figure 1. Schema of the enhanced linear regression spatio-temporal fusion method (ELRFM). F 1 /C 1 , F 2 /C 2 , and F 3 /C 3 indicate fine/coarse resolution images at t 1 , t 2 , and t 3 , respectively.

Data
Landsat 8 top-of-atmosphere (TOA) images archived in GEE platform with a 30 m spatial resolution acquired on 4 June (t 1 ), 6 July (t 2 ), and 7 August (t 3 ) 2019 from operational land imager (OLI) sensor, with Worldwide Reference System (WRS)-2 path/row of 44/34 were used. The study area is a subset of one TOA scene covering an area of 28.8 km × 28.8 km (960 × 960 pixels) ( Figure 2). The LULC of the areas marked by yellow rectangles in Figure 2 changes during the study period, which can help to evaluate the performance of the ELRFM. The coarse spatial resolution images used are the simulated MODIS aggregated from Landsat.

Data Pre-Processing and Environment
The strategy of conducting tests on simulated images was adopted, that is, the high spatial resolution images used the original Landsat 8 images, while the coarse spatial resolution images used the simulated MODIS images, which were aggregated from Landsat. This strategy is adopted by many spatio-temporal fusion studies [26,41,42] to avoid the interference caused by the radiometric or geometric inconsistencies between two sensors. The application of the proposed method to real scenarios and the evaluation of the effects of the above are beyond the scope of this research, but they will be tested and discussed in our future research.

Data Pre-Processing and Environment
The strategy of conducting tests on simulated images was adopted, that is, the high spatial resolution images used the original Landsat 8 images, while the coarse spatial resolution images used the simulated MODIS images, which were aggregated from Landsat. This strategy is adopted by many spatio-temporal fusion studies [26,41,42] to avoid the interference caused by the radiometric or geometric inconsistencies between two sensors. The application of the proposed method to real scenarios and the evaluation of the effects of the above are beyond the scope of this research, but they will be tested and discussed in our future research.
The simulated MODIS images were obtained by upscaling the Landsat images to a spatial resolution of 480 m. The Landsat 8 images acquired on 4 June and 7 August 2019 and the MODIS image simulated from the Landsat 8 image acquired on 6 July 2019 were used as the input for the fusion algorithm. The task is to predict the high spatial resolution Landsat-like image on 6 July 2019. The original Landsat 8 image acquired on 6 July 2019 is later adopted for evaluation (i.e., not used as the algorithm input).
The proposed method runs on the GEE platform and is written with JavaScript.

Enhanced Linear Regression Spatio-temporal Fusion Method
The ELRFM is based on two assumptions: (1) if there is no change in the reflectance of the ground object between the two Landsat reference images, the reflectance of the object in the predicted image does not change either (that is, the prediction error mainly comes from the areas where the reflectance changes greatly between the two reference images); and (2) the reflectance relationship between the Landsat image and the MODIS image at different times is consistent. The simulated MODIS images were obtained by upscaling the Landsat images to a spatial resolution of 480 m. The Landsat 8 images acquired on 4 June and 7 August 2019 and the MODIS image simulated from the Landsat 8 image acquired on 6 July 2019 were used as the input for the fusion algorithm. The task is to predict the high spatial resolution Landsat-like image on 6 July 2019. The original Landsat 8 image acquired on 6 July 2019 is later adopted for evaluation (i.e., not used as the algorithm input).
The proposed method runs on the GEE platform and is written with JavaScript.

Enhanced Linear Regression Spatio-Temporal Fusion Method
The ELRFM is based on two assumptions: (1) if there is no change in the reflectance of the ground object between the two Landsat reference images, the reflectance of the object in the predicted image does not change either (that is, the prediction error mainly comes from the areas where the reflectance changes greatly between the two reference images); and (2) the reflectance relationship between the Landsat image and the MODIS image at different times is consistent.
The method is divided into four steps ( Figure 3). In step 1 (linear regression), the first predicted image at the prediction time (t 2 ) and the change slope (i.e., the regression coefficient) are generated using linear regression of the two fine resolution images at t 1 and t 3 .
Step 2 (residual calculation) calculates the residual from the coarse image at t 2 and the first predicted image.
Step 3 (distribution of the residual) is the key to the method in which the residual is allocated based on the change slope and assumption 1 to produce the final predicted image. The singular values in the final predicted image are further revised in step 4 (singular value correction).
image at the prediction time ( ) and the change slope (i.e., the regression coefficient) are generated using linear regression of the two fine resolution images at and .
Step 2 (residual calculation) calculates the residual from the coarse image at and the first predicted image.
Step 3 (distribution of the residual) is the key to the method in which the residual is allocated based on the change slope and assumption 1 to produce the final predicted image. The singular values in the final predicted image are further revised in step 4 (singular value correction). Step 1. Linear Regression With the two reference fine resolution images, the linear regression model is built for the correlation between the observed reflectance and the acquisition date for each band and each pixel to obtain the regression coefficient and the first linear predicted image at the prediction date. Here, the image of the first linear prediction is also the result of the linear interpolation prediction. The calculation is performed per-pixel without a contextual neighborhood evaluation. This enables it to provide the best capability to model local spatial variability.
The calculation of the first linear predicted image is shown in Equation (1): where ( , , ) is the reflectance of pixel ( , ) in band on the first predicted image at , ( , , ) is the reflectance of pixel ( , ) in band on the fine resolution at , ( , , ) is the calculated change slope from the linear regression model based on the two fine reference resolution images and ∆ is the time interval from reference time to prediction time .
Step 2. Residual Calculation The predicted coarse resolution image at time is obtained by upscaling the first predicted fine resolution image ( , , ), as shown in Equation (2). That is, the reflectance of each coarse Step 1. Linear Regression With the two reference fine resolution images, the linear regression model is built for the correlation between the observed reflectance and the acquisition date for each band and each pixel to obtain the regression coefficient and the first linear predicted image at the prediction date. Here, the image of the first linear prediction is also the result of the linear interpolation prediction. The calculation is performed per-pixel without a contextual neighborhood evaluation. This enables it to provide the best capability to model local spatial variability.
The calculation of the first linear predicted image is shown in Equation (1): is the reflectance of pixel (x i , y i ) in band b on the fine resolution at t 1 , a(x i , y i , b) is the calculated change slope from the linear regression model based on the two fine reference resolution images and ∆t is the time interval from reference time t 1 to prediction time t 2 .
Step 2. Residual Calculation The predicted coarse resolution image C p at time t 2 is obtained by upscaling the first predicted fine resolution image F lp (x i , y i , b), as shown in Equation (2). That is, the reflectance of each coarse pixel is the average of all the fine pixels under its corresponding position. The residual R(x, y, b) is computed as the reflectance difference between the simulated and predicted MODIS image at t 2 (Equation (3)).
Remote Sens. 2020, 12, 3900 where C p (x, y, b) and C 2 (x, y, b) are the predicted and simulated coarse MODIS reflectance of the pixel (x, y) in band b at t 2 , respectively, and n is the number of Landsat pixels used to aggregate to one MODIS pixel (here n is 256). R(x, y, b) is the residual of the pixel (x, y) in band b at t 2 .
Step 3. Distribution of the Residual According to assumption 1, regions with a larger surface reflectance change (i.e., regions with a larger change slope) have higher prediction uncertainty, resulting in a greater residual. This means that residual compensation should be performed in these areas. For example, a positive residual R means that the observed surface reflectance is larger than the predicted reflectance. Therefore, the areas with a positive change slope may need an increment to make the reflectance change faster to offset the residual, or the areas with a negative change slope may also need to be increased by an increment to make the reflectance change more slowly to offset the residual. In an extreme case, if the change slope is 0 in some areas, meaning there is barely any change in the reflectance in these areas during the study period. Therefore, the linear prediction in these areas is considered to be highly reliable and no residual distribution will take place in these areas.
In this study, approximate calculations and thresholds are used to identify the Landsat pixels with great reflectance changes in each MODIS pixel. Considering that the difference between the positive and negative changes of the surface reflectance is always large, the areas with the largest positive and negative changes (i.e., the maximum and minimum value of change) of surface reflectance are identified, respectively. Assume that the residuals that should be allocated to the two areas are , respectively, then the calculations are as in Equations (4) and (5): where R(x i , y i , b) is the residual with a 30 m spatial resolution downscaled by R(x, y, b), T is the positive threshold set as half the absolute value of extremum change slope of the Landsat pixels in one MODIS pixel, −T is the corresponding negative threshold, n 1 and n 2 are the numbers of pixels where the slope is above T and below −T, respectively, and a(x i , y i , b) >T and a(x i , y i , b) <−T are the change slopes above T and below −T, respectively. There may be some independent pixels in the identified areas that need to be compensated for residuals. A morphological open operation is adopted to remove them. Because the distribution of residuals is performed on each MODIS pixel, the residuals allocated to one object composed of multiple MODIS pixels are often uneven, which leads to a patch effect. Therefore, the compensation values of each connected compensation area are averaged to eliminate the patch effect. This process is shown in Figure 4.
The final predicted fine resolution image is calculated using Equation (6):  The final predicted fine resolution image is calculated using Equation (6): where ( , , ) is the reflectance of pixel ( , ) in band on the final predicted fine resolution image at prediction time .
Step 4. Singular Value Correction Due to the approximate calculation of the residuals and some cases that do not meet the assumptions, singular values may exist after the residual compensation. This could be partly corrected by removing the single pixels that appear in the areas that need to be compensated. On the other hand, we adopted a linear increment during the study period as the extreme value of the compensation. For the areas that exceed the compensation extreme value in the final prediction results, the linear prediction results are used as replacements.

Comparison and Evaluation
The results of the proposed method are compared with those of the STARFM [33], FSDAF [26], and LR methods. The STARFM and FSDAF methods have different adaptability in different scenarios. With their relatively desirable prediction effect and open source code (code websites, STARFM: https://www.ars.usda.gov/; FSDAF: https://xiaolinzhu.weebly.com/open-source-code.html), they are widely used and constantly compared. The LR method here also refers to the linear interpolation method, which is based on the interpolation or the regression of the fine resolution images at two dates. The proposed method uses the residual calculated from the coarse resolution images to compensate for the linear prediction result, which improves the LR prediction results. Hence, the LR method is also used for comparison.
The performance of the ELRFM was evaluated using a variety of methods, including calculating the quantitative evaluation indicators, drawing scatter plots between the predicted and observed image on the sample points, and comparing the average of the predicted reflectance of each method at the predicted time. Quantitative assessment indicators of the root mean square error (RMSE), average difference (AD), average absolute difference (AAD), and correlation coefficient are used. These indicators are commonly adopted in many spatio-temporal fusion method studies [26,27]. The RMSE and AAD refer to the deviation and average of the absolute error between the predicted and Step 4. Singular Value Correction Due to the approximate calculation of the residuals and some cases that do not meet the assumptions, singular values may exist after the residual compensation. This could be partly corrected by removing the single pixels that appear in the areas that need to be compensated. On the other hand, we adopted a linear increment during the study period as the extreme value of the compensation. For the areas that exceed the compensation extreme value in the final prediction results, the linear prediction results are used as replacements.

Comparison and Evaluation
The results of the proposed method are compared with those of the STARFM [33], FSDAF [26], and LR methods. The STARFM and FSDAF methods have different adaptability in different scenarios. With their relatively desirable prediction effect and open source code (code websites, STARFM: https://www.ars.usda.gov/; FSDAF: https://xiaolinzhu.weebly.com/open-source-code.html), they are widely used and constantly compared. The LR method here also refers to the linear interpolation method, which is based on the interpolation or the regression of the fine resolution images at two dates. The proposed method uses the residual calculated from the coarse resolution images to compensate for the linear prediction result, which improves the LR prediction results. Hence, the LR method is also used for comparison.
The performance of the ELRFM was evaluated using a variety of methods, including calculating the quantitative evaluation indicators, drawing scatter plots between the predicted and observed image on the sample points, and comparing the average of the predicted reflectance of each method at the predicted time. Quantitative assessment indicators of the root mean square error (RMSE), average difference (AD), average absolute difference (AAD), and correlation coefficient r are used. These indicators are commonly adopted in many spatio-temporal fusion method studies [26,27]. The RMSE and AAD refer to the deviation and average of the absolute error between the predicted and observed reflectance, respectively. AD is used to evaluate whether the result is overestimated or underestimated as opposed to the observed reflectance. A positive AD value indicates that the predicted results are higher than the observed reflectance, while a negative AD value implies the opposite. As the proposed method is an improvement on LR, the absolute difference between the predicted and the observed reflectance of the ELRFM and LR methods is compared on every sample point.
The sample data for evaluation includes 853 points, which are randomly generated from forest land, bare land, and residential areas manually selected through visual interpretation.
In addition, to test the adaptability of the developed method to different sensors and areas, the fusion of Sentinel-2 and simulated Sentinel-3 are carried out in the other two regions. The results are also evaluated by quantitative evaluation and scatter plots.

Comparison
The prediction results of the STARFM, FSDAF, LR and ELRFM are shown in Figure 5. Two sub-regions of the study area are zoomed-in to show the result differences in detail between the various methods.
Remote Sens. 2020, 12, x FOR PEER REVIEW 8 of 16 observed reflectance, respectively. AD is used to evaluate whether the result is overestimated or underestimated as opposed to the observed reflectance. A positive AD value indicates that the predicted results are higher than the observed reflectance, while a negative AD value implies the opposite. As the proposed method is an improvement on LR, the absolute difference between the predicted and the observed reflectance of the ELRFM and LR methods is compared on every sample point. The sample data for evaluation includes 853 points, which are randomly generated from forest land, bare land, and residential areas manually selected through visual interpretation.
In addition, to test the adaptability of the developed method to different sensors and areas, the fusion of Sentinel-2 and simulated Sentinel-3 are carried out in the other two regions. The results are also evaluated by quantitative evaluation and scatter plots.

Comparison
The prediction results of the STARFM, FSDAF, LR and ELRFM are shown in Figure 5. Two subregions of the study area are zoomed-in to show the result differences in detail between the various methods. One sub-region is the area bounded by the red rectangle in Figure 5a. From to , a dark object appears. This feature can be identified on the fine resolution Landsat image, but it is too small to be recognized on the coarse resolution MODIS image. In reality, it exists at . The different methods are applied to determine whether the object cloud be predicted at the prediction time. The results are shown in the middle row of Figure 5. Neither the STARFM nor the FSDAF could capture this change. The linear regression method captures the feature at but with certain reflectance One sub-region is the area bounded by the red rectangle in Figure 5a. From t 1 to t 3 , a dark object appears. This feature can be identified on the fine resolution Landsat image, but it is too small to be recognized on the coarse resolution MODIS image. In reality, it exists at t 2 . The different methods are applied to determine whether the object cloud be predicted at the prediction time. The results are shown in the middle row of Figure 5. Neither the STARFM nor the FSDAF could capture this change. The linear regression method captures the feature at t 2 but with certain reflectance errors. The proposed ELRFM enhances the linear regression result and produces an image with a reflectance closest to the observed image.
The other sub-region is the area enclosed by the yellow rectangle in Figure 5a. Similar to the first sub-region, a dark object emerges in this area from t 1 to t 3 . This object is distinguishable on the Landsat image, while it is only represented by a few pixels on the MODIS image. Unlike the first sub-region, the object does not exist at t 2 . The prediction results of the four methods are shown in the bottom row of Figure 5. The result predicted by the ELRFM is comparable to that of the STARFM and FSDAF, and they are all close to the original image, while the linear regression performs the worst with the most errors.
It is worth mentioning that the STARFM and FSDAF use one pair of Landsat-MODIS images for prediction, which has advantages in regions with less cloud-free data. However, due to the reduction in effective information, the prediction accuracy is also lowered. Although it is possible to perform STARFM or FSDAF predictions on two pairs of data separately, the prediction results may show great differences, which leads to a decrease in the credibility of the comprehensive results. In addition, the operation time will be doubled when conducting prediction operations on two pairs of data.
By visual inspection, the proposed ELRFM enhances the linear regression prediction results and presents acceptable prediction results in various situations. Particularly, the ELRFM is able to capture small changes in ground objects with high prediction accuracy.

Accuracy Assessment
We calculated the quantitative indicators and drew a scatter plot between the predicted and observed reflectance for each band of the sample points. The quantitative evaluation results are shown in Table 1. It shows that the proposed ELRFM has the smallest RMSE and AAD in all bands, and in the red and near-infrared (NIR) bands, it has the lowest prediction deviation AD. However, the STARFM and FSDAF have a smaller AD in the green band. The ELRFM also has the highest correlation coefficient in each band. Overall, our method is comparable to the other three methods. Quantitative indicators show that the proposed ELRFM has less prediction deviation, and its prediction result is closer to the original image. Table 1. Quantitative assessment results of the four data fusion methods. Root mean square error (RMSE), average difference (AD), average absolute difference (AAD), and correlation coefficient r, near-infrared (NIR). Scatter plots of the observed and the predicted reflectance from the four methods in different bands are plotted ( Figure 6). It can be found that here STARFM is more accurate than FSDAF with a higher R 2 in red and green bands, while ELRFM is the most accurate of the four methods, and it exhibits an improvement over the linear regression method. The ELRFM was also compared with LR specifically. The absolute difference between the observed and the predicted reflectance from these two methods in the green, red and NIR bands of the sample points are plotted with the corresponding histograms attached (Figure 7). The closer the point is to the origin zero, the smaller the absolute difference between the predicted and observed reflectance is, which means the greater the accuracy. From Figure 7, it can be seen that the ELRFM has more points near zero than the LR in the three bands, especially in the green and red bands, which indicates that the ELRFM performs more accurately than the LR. This conclusion could also be reached from the attached histogram, which shows that the ELRFM histograms are positively skewed and the LR histograms are more flattened and spread out. The ELRFM was also compared with LR specifically. The absolute difference between the observed and the predicted reflectance from these two methods in the green, red and NIR bands of the sample points are plotted with the corresponding histograms attached (Figure 7). The closer the point is to the origin zero, the smaller the absolute difference between the predicted and observed reflectance is, which means the greater the accuracy. From Figure 7, it can be seen that the ELRFM has more points near zero than the LR in the three bands, especially in the green and red bands, which indicates that the ELRFM performs more accurately than the LR. This conclusion could also be reached from the attached histogram, which shows that the ELRFM histograms are positively skewed and the LR histograms are more flattened and spread out.

Band
Remote Sens. 2020, 12, x FOR PEER REVIEW 11 of 16 Figure 7. Distribution of the absolute difference between the observed and predicted reflectance from the LR and ELRFM in the green (B3), red (B4), and near-infrared (B5) bands. Figure 8 shows the average of the observed and predicted reflectance from the four fusion methods in different bands. In the green band, the STARFM and FSDAF are closer to the observed average reflectance than the ELRFM and LR. In the red band, the ELRFM, STARFM, and FSDAF are similarly close to the observed reflectance, while LR is farther away. In the NIR band, among the four methods, the ELRFM is the closest to the observed average reflectance.

Further Verification
To investigate whether the proposed ELRFM could also perform well in different areas and using different sensors, two other test areas were selected. One site (zone 1) is located near Hatton in Adams County, Washington, United States (Lat/Lng: 46.81/−118.92), which is dominated by cropland and bare land. The other site (zone 2) is located in a section of Green River, Wyoming (Lat/Lng: 42.06/−110.12), United States. This area is dominated by water and shrubs. Sentinel-2 (10 m spatial resolution for green, red and NIR bands) and Sentinel-3 (300 m spatial resolution) data are selected as the fine and coarse resolution images, respectively, for the fusion experiment. Sentinel-2 used here is the level-1C orthorectified TOA reflectance data from GEE. Similarly, to avoid the interference caused by the radiometric or geometric inconsistencies between the two sensors, the simulated Sentinel-3 (also called Sentinel-3-like) images are aggregated from the Sentinel-2 images. For zone 1, the fine-coarse image pairs (i.e., Sentinel-2 and simulated Sentinel-3) acquired on 29 May and 3 July 2020, respectively, and the simulated Sentinel The fusion results from different methods are shown in Figure 9. In zone 1, ELRFM achieves a similar result to that of STARFM and FSDAF methods, while it is superior to detect the feature type changes in some areas. For example, in the yellow ellipses of the original image, the ELRFM detects the transition between the crops and the bare land. In zone 2, the ELRFM works better where the water area changes significantly. For example, in the yellow ellipses shown in the original image, the ELRFM predicts the water body more accurately than the other methods. The boundaries of the water Figure 7. Distribution of the absolute difference between the observed and predicted reflectance from the LR and ELRFM in the green (B3), red (B4), and near-infrared (B5) bands. Figure 8 shows the average of the observed and predicted reflectance from the four fusion methods in different bands. In the green band, the STARFM and FSDAF are closer to the observed average reflectance than the ELRFM and LR. In the red band, the ELRFM, STARFM, and FSDAF are similarly close to the observed reflectance, while LR is farther away. In the NIR band, among the four methods, the ELRFM is the closest to the observed average reflectance.
Remote Sens. 2020, 12, x FOR PEER REVIEW 11 of 16 Figure 7. Distribution of the absolute difference between the observed and predicted reflectance from the LR and ELRFM in the green (B3), red (B4), and near-infrared (B5) bands. Figure 8 shows the average of the observed and predicted reflectance from the four fusion methods in different bands. In the green band, the STARFM and FSDAF are closer to the observed average reflectance than the ELRFM and LR. In the red band, the ELRFM, STARFM, and FSDAF are similarly close to the observed reflectance, while LR is farther away. In the NIR band, among the four methods, the ELRFM is the closest to the observed average reflectance.

Further Verification
To investigate whether the proposed ELRFM could also perform well in different areas and using different sensors, two other test areas were selected. One site (zone 1) is located near Hatton in Adams County, Washington, United States (Lat/Lng: 46.81/−118.92), which is dominated by cropland and bare land. The other site (zone 2) is located in a section of Green River, Wyoming (Lat/Lng: 42.06/−110.12), United States. This area is dominated by water and shrubs. Sentinel-2 (10 m spatial resolution for green, red and NIR bands) and Sentinel-3 (300 m spatial resolution) data are selected as the fine and coarse resolution images, respectively, for the fusion experiment. Sentinel-2 used here is the level-1C orthorectified TOA reflectance data from GEE. Similarly, to avoid the interference caused by the radiometric or geometric inconsistencies between the two sensors, the simulated Sentinel-3 (also called Sentinel-3-like) images are aggregated from the Sentinel-2 images. For zone 1, the fine-coarse image pairs (i.e., Sentinel-2 and simulated Sentinel-3) acquired on 29 May and 3 July 2020, respectively, and the simulated Sentinel The fusion results from different methods are shown in Figure 9. In zone 1, ELRFM achieves a similar result to that of STARFM and FSDAF methods, while it is superior to detect the feature type changes in some areas. For example, in the yellow ellipses of the original image, the ELRFM detects the transition between the crops and the bare land. In zone 2, the ELRFM works better where the water area changes significantly. For example, in the yellow ellipses shown in the original image, the ELRFM predicts the water body more accurately than the other methods. The boundaries of the water

Further Verification
To investigate whether the proposed ELRFM could also perform well in different areas and using different sensors, two other test areas were selected. One site (zone 1) is located near Hatton in Adams County, Washington, United States (Lat/Lng: 46.81/−118.92), which is dominated by cropland and bare land. The other site (zone 2) is located in a section of Green River, Wyoming (Lat/Lng: 42.06/−110.12), United States. This area is dominated by water and shrubs. Sentinel-2 (10 m spatial resolution for green, red and NIR bands) and Sentinel-3 (300 m spatial resolution) data are selected as the fine and coarse resolution images, respectively, for the fusion experiment. Sentinel-2 used here is the level-1C orthorectified TOA reflectance data from GEE. Similarly, to avoid the interference caused by the radiometric or geometric inconsistencies between the two sensors, the simulated Sentinel-3 (also called Sentinel-3-like) images are aggregated from the Sentinel-2 images. For zone 1, the fine-coarse image pairs (i.e., Sentinel-2 and simulated Sentinel-3) acquired on 29 May and 3 July 2020, respectively, and the simulated Sentinel The fusion results from different methods are shown in Figure 9. In zone 1, ELRFM achieves a similar result to that of STARFM and FSDAF methods, while it is superior to detect the feature type changes in some areas. For example, in the yellow ellipses of the original image, the ELRFM detects the transition between the crops and the bare land. In zone 2, the ELRFM works better where the water area changes significantly. For example, in the yellow ellipses shown in the original image, the ELRFM predicts the water body more accurately than the other methods. The boundaries of the water are almost within one coarse pixel, which shows that the ELRFM has the advantage of being able to predict small LULC type changes within one coarse pixel. Moreover, the tests in zone 1 and 2 illustrate that the ELRFM cloud works well even when the fine resolution and the coarse resolution differ by 30 times.
Remote Sens. 2020, 12, x FOR PEER REVIEW 12 of 16 are almost within one coarse pixel, which shows that the ELRFM has the advantage of being able to predict small LULC type changes within one coarse pixel. Moreover, the tests in zone 1 and 2 illustrate that the ELRFM cloud works well even when the fine resolution and the coarse resolution differ by 30 times. In addition, 599 and 700 sample points are randomly generated in zone 1 and 2, respectively, for quantitative evaluation and scatter plot drawing. The results are shown in Table S1, Figures S3 and  S4 of the Supplementary Materials.

Discussion
Unlike the STARFM and FSDAF methods that use similar pixels to predict reflectance, ELRFM builds a linear model for each pixel. Therefore, it is able to preserve spatial detail. In addition, ELRFM is more computationally efficient than STARFM and FSDAF, as it is based on the GEE platform and uses GEE's cloud computing capability.
The proposed method also has certain limitations. It is not always better than the STARFM and FSDAF methods at predicting the reflectance. The scatter plots of the observed and predicted reflectance of zone 1 (see Figure S3) show that the ELRFM is inferior to STARFM and FSDAF, with a lower (e.g., for the green band, ELRFM vs. STARFM and FSDAF: 0.6526 vs. 0.8044 and 0.7999). From the quantitative evaluation results of zone 1 (see Table S1), ELRFM only predicts the lowest RMSE (ELRFM vs. STARFM, FSDAF, and LR: 171.61 vs. 183.75, 172.69 and 282.56) and a higher (0.96 vs. 0.95, 0.96 and 0.92) in the red band with slight advantages. In the prediction of the green and NIR bands, the FSDAF has the smallest RMSE. In all three bands, the predicted result from the STARFM has the smallest AAD. The above evaluation results show that the STARFM and FSDAF methods perform better than the ELRFM. This is because zone 1 is mainly faced with surface reflectance changes rather than LULC changes. The STARFM and FSDAF methods are advantageous in this case.
The observed and predicted reflectance scatter plots of zone 2 (see Figure S4), where some areas experienced LULC type changes, show that the ELRFM is superior to the STARFM and FSDAF methods by predicting the highest in all three bands. Taking the green band as an example, the of the ELRFM is 0.951, while those of the STARFM, FSDAF, and LR methods are 0.5913, 0.8716 and 0.9368, respectively. The quantitative evaluation index results of the four methods in zone 2 are shown in Table S1. In this zone, the RMSE and AAD of the ELRFM in the three bands are the smallest (e.g., in the green band, ELRFM vs. STARFM, FSDAF, and LR, RMSE: 111.07 vs. 146.23, 173.67 and 123.66; AAD: 67.27 vs. 75.14, 102.14 and 71.46) and the is the largest (0.98 vs. 0.96, 0.93 and 0.97), which shows that the ELRFM has the best prediction accuracy in zone 2. This is because LULC Figure 9. Comparison of the original Sentinel-2 image (near infrared, red, green bands as RGB) and the predictions of different fusion methods for the other two test areas.
In addition, 599 and 700 sample points are randomly generated in zone 1 and 2, respectively, for quantitative evaluation and scatter plot drawing. The results are shown in Table S1, Figures S3 and S4 of the Supplementary Materials.

Discussion
Unlike the STARFM and FSDAF methods that use similar pixels to predict reflectance, ELRFM builds a linear model for each pixel. Therefore, it is able to preserve spatial detail. In addition, ELRFM is more computationally efficient than STARFM and FSDAF, as it is based on the GEE platform and uses GEE's cloud computing capability.
The proposed method also has certain limitations. It is not always better than the STARFM and FSDAF methods at predicting the reflectance. The scatter plots of the observed and predicted reflectance of zone 1 (see Figure S3) show that the ELRFM is inferior to STARFM and FSDAF, with a lower R 2 (e.g., for the green band, ELRFM vs. STARFM and FSDAF: 0.6526 vs. 0.8044 and 0.7999). From the quantitative evaluation results of zone 1 (see Table S1), ELRFM only predicts the lowest RMSE (ELRFM vs. STARFM, FSDAF, and LR: 171.61 vs. 183.75, 172.69 and 282.56) and a higher r (0.96 vs. 0.95, 0.96 and 0.92) in the red band with slight advantages. In the prediction of the green and NIR bands, the FSDAF has the smallest RMSE. In all three bands, the predicted result from the STARFM has the smallest AAD. The above evaluation results show that the STARFM and FSDAF methods perform better than the ELRFM. This is because zone 1 is mainly faced with surface reflectance changes rather than LULC changes. The STARFM and FSDAF methods are advantageous in this case.
The observed and predicted reflectance scatter plots of zone 2 (see Figure S4), where some areas experienced LULC type changes, show that the ELRFM is superior to the STARFM and FSDAF methods by predicting the highest R 2 in all three bands. Taking the green band as an example, the R 2 of the ELRFM is 0.951, while those of the STARFM, FSDAF, and LR methods are 0.5913, 0.8716 and 0.9368, respectively. The quantitative evaluation index results of the four methods in zone 2 are shown in Table S1. In this zone, the RMSE and AAD of the ELRFM in the three bands are the smallest (e.g., in the green band, ELRFM vs. STARFM, FSDAF, and LR, RMSE: 111.07 vs. 146.23, 173.67 and 123.66; AAD: 67.27 vs. 75.14, 102.14 and 71.46) and the r is the largest (0.98 vs. 0.96, 0.93 and 0.97), which shows that the ELRFM has the best prediction accuracy in zone 2. This is because LULC changes occurred in many parts of this region, and the ELRFM is based on the linear regression of each pixel, which can better capture this change.
Developing generic fusion algorithms suitable for all types of scenarios and applications remains an open problem. In general, the ELRFM is advantageous in monitoring the changes in surface feature categories. Therefore, it could be used in applications focused on surface feature changes, such as water monitoring. For applications that focus on changes in surface reflectance, such as crop growth, the STARFM and FSDAF methods are a better choice.
The algorithm proposed in this paper is an improvement on the linear fitting method, and it generally more accurate than the linear method. However, due to the statistical calculation and compensation of the residual error within each coarse pixel, the ELRFM has a plate effect on the coarse pixels scale. In addition, a fixed threshold strategy in determining the degree of change also has certain limitations. Further research could be conducted on automatic threshold determination based on the statistical information in the coarse pixels. As data acquisition and processing become easier, further work could also focus on the comprehensive use of time series information to aid data spatio-temporal fusion. For example, a time series could be used to make preliminary predictions and then compensate for residuals.

Conclusions
Dense daily time series Earth observation data are important for surface monitoring. However, they are currently not freely available. Therefore, data fusion methods can be applied to generate synthetic high-resolution and dense images. In this paper, we developed an efficient spatio-temporal fusion model, ELRFM, based on linear regression. This method compensates for the residual error between the linear regression prediction results and the actual coarse resolution image with a regression coefficient.
The ELRFM maintains spatial details and presents a better ability to capture small feature changes at the fine image scale compared with the spatial and temporal adaptive reflectance fusion model (STARFM) and the flexible spatio-temporal data fusion (FSDAF) method. In addition, it is more computationally efficient than the STARFM and FSDAF due to the simple design and implementation on the Google Earth Engine platform. The ELRFM is applicable not only to the fusion of Landsat and the moderate resolution imaging spectroradiometer (MODIS), but also to the fusion of Sentinel-2 and Sentinel-3. However, since the ELRFM is based on certain assumptions and the residual compensation process in it is an approximate calculation, the prediction accuracy will not be high in cases that do not meet these assumptions. For example, a longer study period will result in lower prediction accuracy, because the land-use/land-cover types do not typically show linear changes over a long period. Overall, the proposed method is typically more accurate than the linear fitting method, and, in some cases (such as feature type changes in the coarse pixel), outperforms the existing STARFM and FSDAF methods. The ELRFM could be used as a supplement for the spatio-temporal fusion algorithm community.
The method proposed performs well on the fusion of simulated data. Future research could consider its testing and application in real scenarios. In addition, with more satellite launches and an increasing number of freely available medium and high-resolution images, future research may include other higher spatial resolution data such as Gaofen-1 wide field of view (GF-1 WFV) sensor, which became free in 2019. These data have a higher temporal resolution (a 2-day revisit cycle) than Landsat and presents a great potential for time series applications.