An Improved STARFM with Help of an Unmixing-Based Method to Generate High Spatial and Temporal Resolution Remote Sensing Data in Complex Heterogeneous Regions

Remote sensing technology plays an important role in monitoring rapid changes of the Earth's surface. However, sensors that can simultaneously provide satellite images with both high temporal and spatial resolution haven’t been designed yet. This paper proposes an improved spatial and temporal adaptive reflectance fusion model (STARFM) with the help of an Unmixing-based method (USTARFM) to generate the high spatial and temporal data needed for the study of heterogeneous areas. The results showed that the USTARFM had higher accuracy than STARFM methods in two aspects of analysis: individual bands and of heterogeneity analysis. Taking the predicted NIR band as an example, the correlation coefficients (r) for the USTARFM, STARFM and unmixing methods were 0.96, 0.95, 0.90, respectively (p-value < 0.001); Root Mean Square Error (RMSE) values were 0.0245, 0.0300, 0.0401, respectively; and ERGAS values were 0.5416, 0.6507, 0.8737, respectively. The USTARM showed consistently higher performance than STARM when the degree of heterogeneity ranged from 2 to 10, highlighting that the use of this method provides the capacity to solve the data fusion problems faced when using STARFM. Additionally, the USTARFM method could help researchers achieve better performance than STARFM at a smaller window size from its heterogeneous land surface quantitative representation.


Introduction
High spatial and temporal resolution remote sensing technology plays an important role in land-cover detection, crop growth monitoring and phenological parameter inversion [1]. Unfortunately, it is impossible to obtain high temporal resolution and high spatial resolution images simultaneously from one sensor mounted on a satellite [2,3]. For example, Landsat series multi-spectral images at 30-m resolution have wide applications in extracting vegetation indices, monitoring land cover dynamic changes and ecological system variation studies. This system is widely used because of its finer spatial resolution, rich archive, and free availability [4][5][6]. However, the 16-day revisit cycle and the influence of bad weather, such as rain and clouds, make it difficult to acquire the continuous and cloudless remote sensing images that may be required to monitor certain Earth surface changes [7,8]. To resolve the difficulties that STARFM presents from the mixed pixel of MODIS in heterogeneous regions, we developed an improved STARFM named Unmixing-based STARFM (USTARFM) with the help of an Unmixing-based algorithm. The main objectives of this study include: (1) improve the performance of STARFM in heterogeneous areas, through unmixing the coarse pixel to gain specific land cover reflectance as the basis for further fusion rather than directed resampled data; (2) compare the fusion results of USTARFM and STARFM at different window sizes, and (3) assess the influence of landscape heterogeneity on the fusion performance of USTARFM. The USTARFM approach was applied to generate Landsat-like resolution and MODIS-like frequency images, and was tested using the Landsat 8/7 and MODIS reflectance as the reference data in two study areas.

Description of USTARFM
The coarse image on date t 0 (predicted date) and a pair of fine-resolution and coarse-resolution remote sensing images on the same date t k (basis date) are needed for the USTARFM algorithm. Incorporating the clusters obtained from the fine resolution image on the t k as input component definition, the coarse resolution images (t 0 and t k ) were unmixed to get the land cover cluster reflectance. Unmixing data was used in place of the directly resampled data for the STARFM. The subsequent steps, like the STARFM, were run to predict the Landsat-like image. A sketch map of the USTARFM algorithm is shown in Figure 1, and the detailed implementation steps are listed as follows.

Land Cover Cluster and Abundance Extraction
The class types and related abundance within the MODIS mixed pixel are the two fundamental parameters for unmixing mixed pixels. The ISODATA algorithm has some further refinements by splitting and merging of clusters. Clusters are merged if either the number of members (pixel) in a cluster is less than a certain threshold or if the centers of two clusters are closer than a certain threshold. The ISODATA method has been used in many previous studies [1,[18][19][20][21]. In our study, land cover was clustered using this method. The abundance of each class within MODIS pixel was calculated by summing the number of pixels occupied by each class and then we compared that to the total pixel number of coarse pixels [31,32]. The calculated abundance of each class and the class types were assumed constant during the prediction period [18,19,31], which is the precondition to downscaling the low resolution images at different dates (i.e., t k and t 0 ).

Unmixing Data
Mixed pixel decomposition based on the linear spectral mixture model is a popular method for estimating land cover fraction [15]. The surface reflectance of mixed pixels can be calculated as the sum of mean reflectance values of the different land-cover classes within the pixel, weighted by the corresponding abundance [33]. The abundance of classes and surface reflectance within coarse pixels (usually in window size) as input parameters can be solved with the ordinary least squares technique, producing the mean surface reflectance value (r) for each land cover component as shown in Equation (1). Then mean reflectance values are assigned to each fine-resolution pixel of the classification map according to the class labels. The spectral unmixing model is sensitive to the co-linearity caused by high correlations between endmembers, leading to inversion of ill-posed matrices [34]. This problem could be decreased by increasing the number of equations through enlarging the search window to adopt more classes, i.e., increasing n in Equation (1) [22]. Furthermore, due to the existence of spatial heterogeneity, the solution of Equation (1) should also be conducted in a rectangular window, which can preserve spatial heterogeneity to the extent [31]. The oversize window scale containing too many pixels to make the model weakens spatial variation and leads to degradation in the accuracy of mixed pixel decomposition [1,31]. Hence, an appropriate window size is crucial to the decision of the coarse pixels decomposition. A set of different window sizes were tested to determine the optimal window size, and the optimal window size was measured with three metrics: the correlation coefficient (γ), Root Mean Square Error (RMSE), and the Erreur Relative Globale Adimensionalle de Synthèse (ERGAS) between the synthetic image and the reference fine-resolution image [1,31,35]: where R pi, tq is the reflectance of coarse-pixel i at time t, f c (i,c) is the abundance of class c in the i th coarse pixel, r pc, tq is the mean reflectance of class c at time t, ξ pi, tq is the residual error term, and k is the number of classes, n is the total coarse pixels number within the predefined window. Apparently, the class number and the window size in Equation (1) are key parameters for the unmixing-based method. Therefore, by varying the class number (k) and the sliding window size (W), a series of unmixing data can be obtained and assessed by the metrics of quantative assessment indices. The optimal combination (k,W) of class number and window size was finally determined to unmix the MODIS scale resolution images [1].

Fused Image Generation
The two unmixed images from MODIS on date t k and t 0 , and one fine-resolution image on date t k are applied to predict the fine-resolution image on date t 0. The critical step to implement the USTARFM algorithm, as well as the STARFM, is weighting the spatial information from neighboring pixels which are used to estimate reflectance of the central pixel and allow the weight function to be flexibly adjusted according to land surface complexity and heterogeneity [25]. To ensure that suitable information is acquired from neighboring pixels, similar spectral pixels from fine-resolution images within the moving window are involved to compute the reflectance of the central pixel. Then these similar pixels are weighed and used to calculate the reflectance of the central pixel as shown in Equation (3).
where L(x w/2 ,y w/2 ,t 0 ) represents the central pixel estimated reflectance of the moving window w at Landsat-like scale, L(x i ,y j ,t k ) and M(x i ,y j ,t k ) are the given location ( x i ,y j ) pixel surface reflectance of the fine resolution image and unmixing data at the base date t k and M(x i ,y j ,t 0 ) is the value of the unmixing data of MODIS on the prediction date t 0 . The W ijk refers to the weight of similar pixel (x i ,y j ,t k ) , which is determined by three indices: spectral difference (S ijk ), temporal difference (T ijk ) and spatial distance between the central pixel and the similar pixel (D ijk ). A smaller value of S ijk implies that the fine-resolution pixel has spectral features that are similar to the coarse pixel, and thus, pixel reflectance should be assigned a higher weight in Equation (2). A smaller T ijk means there is less change in the surface reflectance between t k and t 0 ; thus, the pixel should be assigned with a higher weight. A smaller D ijk means there is a closer distance to the central pixel, and therefore a higher weight for the pixel should be assigned.
For a more detailed description of the above algorithm readers may refer to [25].

Test Data and Preprocessing
The study region is located in the territory of Hengshui, Hebei Province in China (37.52°-37.

Test Data and Preprocessing
The study region is located in the territory of Hengshui, Hebei Province in China (37.52˝-37.73˝N, 115.44˝-115.71˝E) (Figure 2), and covers approximately 25 kmˆ25 km. This region is dominated by fragmented cultivated land as shown in Figure 2a that is mixed with small areas of urbanized land ( Figure 2b) and water bodies (Figure 2c). Broken parcels of cultivated land in this area are a typical feature because small land parcels are the basic management unit for farmers in China's agricultural policy. These characteristics make this area a good place to test the applicability of USTARFM to identify fragmented agricultural units in the landscape. The study dataset includes two Landsat 8 OLI images and two date 500 m daily MODIS surface reflectance products (MOD09GA) acquired on 19 August and 4 September 2014 (Table 1), which were downloaded from the USGS website (URL = http://earthexplorer.usgs.gov/). Images from these dates are of good quality and represent clear sky conditions. The Landsat 8 data products have been processed in geometric correction based on the terrain data. The main pre-processing included two steps: radiance calibration and atmospheric correction, which were conducted using FLAASH tools [36], then capable of comparing with the corresponding band of MODIS [26]. MOD09GA is the standard surface reflectance and provides MODIS band 1-7 daily surface reflectance at 500 m resolution accompanied with high-precision geolocation (approximately 50 m at NADIR) [1]. These MODIS images were re-projected from the native SIN projection to a UTM-WGS84 geospatial system and sampled by means of the MRT (MODIS Re-projection Tool). Similar to previous research, no further geometric correction was needed because the MODIS data were assumed to be well co-registered and matched the geographical position with the Landsat 8 data [1]. Additionally, compared to the models of bilinear filtering and cubic convolution, nearest neighbor resampling is an effective method to inherit the maximum original remote sensing spectral value. The MODIS at 500 m resolution were resampled into 480 m in order to match 16 times of 30 m of Landsat images, which facilitated the subsequent decomposition of MOD09GA pixels [1,31]. The green, red, and near-infrared bands, like [1,30], were chosen to test USTARFM performance. The study dataset includes two Landsat 8 OLI images and two date 500 m daily MODIS surface reflectance products (MOD09GA) acquired on 19 August and 4 September 2014 (Table 1), which were downloaded from the USGS website (URL = http://earthexplorer.usgs.gov/). Images from these dates are of good quality and represent clear sky conditions. The Landsat 8 data products have been processed in geometric correction based on the terrain data. The main pre-processing included two steps: radiance calibration and atmospheric correction, which were conducted using FLAASH tools [36], then capable of comparing with the corresponding band of MODIS [26]. MOD09GA is the standard surface reflectance and provides MODIS band 1-7 daily surface reflectance at 500 m resolution accompanied with high-precision geolocation (approximately 50 m at NADIR) [1]. These MODIS images were re-projected from the native SIN projection to a UTM-WGS84 geospatial system and sampled by means of the MRT (MODIS Re-projection Tool). Similar to previous research, no further geometric correction was needed because the MODIS data were assumed to be well co-registered and matched the geographical position with the Landsat 8 data [1]. Additionally, compared to the models of bilinear filtering and cubic convolution, nearest neighbor resampling is an effective method to inherit the maximum original remote sensing spectral value. The MODIS at 500 m resolution were resampled into 480 m in order to match 16 times of 30 m of Landsat images, which facilitated the subsequent decomposition of MOD09GA pixels [1,31]. The green, red, and near-infrared bands, like [1,30], were chosen to test USTARFM performance.

Implementation Procedure
The ISODATA algorithm was applied to cluster a Landsat image collected August 19, generating land cover component maps with class number (k) as [5,10,15,20,25,30]. Two kinds of window scales which have pronounced impacts on the performance of the USTARFM method are concerned: the optimized window (W, MODIS pixel) for unmixing MODIS spectrum and the window (w, Landsat pixel) of searching the similar pixels for STARFM and USTARFM. For the W, a series of window size scales, [5ˆ5, 7ˆ7, 11ˆ11, 15ˆ15, 21ˆ21, 31ˆ31, 41ˆ41], are used to unmix the MODIS data acquired on Aug. 19 at different class numbers, respectively. Next, the correlation coefficient (γ), RMSE and ERGAS were iteratively tested to unmix MODIS images to achieve the best results. This was completed as the popular benchmark to measure the difference between the reference Landsat 8 image and the unmixing data at different window sizes (W) and class number (k), and the optimal combination of the two parameters. An assumption considered here was that the relationship between land surface and landscape derived from the Landsat data are stable under the condition of the same area and same sensor [31]. So, the optimal window size determined from the 19 August 2014 images was inherited and further used to unmix the MODIS data acquired on 4 September 2014. The unmixing data for 19 August 19 and 4 September 2014 and the base date Landsat 8 data for 19 August 2014 were used together to predict the Landsat-like images on 4 September 2014 by weighting the similar pixels within a certain window size. The window sizes w (Landsat OLI pixel) were set as [7ˆ7, 11ˆ11, 31ˆ31, 61ˆ61, 101ˆ101, 151ˆ151].
The performance of the synthetic images were assessed using the correlation coefficient (γ) and p-value, RMSE, and ERGAS metrics [20,22]. The γ indicates the linear correlativity between fusion data and the reference data, and higher γ values represent better correlation between synthetic images and reference images; lower RMSE values show the lower fusion errors of synthetic images. In addition, ERGAS is a common comprehensive evaluation index for assessing the quality of a synthetic image which is able to quantity the spatial similarity degree of the fusion image and reference image [18,37]. These assessment metrics have different focus to show the performance of the synthetic images as shown in Equations (4)-(6): where N refers to the pixels number of fine-resolution image; x j and y j refer to the j th pixel values of the synthetic and reference images, respectively; and x and y are the mean value of synthetic data and reference data, respectively. ERGAS is calculated as: where H is the fine-resolution pixel size, such as 30 m; L is the coarse-resolution pixel size (480 m); N ban refers to the band number; RMSE refers to the i th band between the synthetic and reference data; and M i refers to the mean spectrum value of reference data in i th band.

Algorithm Performance Analysis Influenced by W, k and w
We tested whether USTARFM is an improved STARFM algorithm and compared the results with STARFM to determine whether improvements could be obtained from this algorithm. The unmixed image as traditional Unmixing model is also produced during the USTARFM process, so we put the three fusion methods together to achieve a more comprehensive understanding of the synthetic image performance. Table 2 shows the performances of the three fused methods with varying W and k settings using the metrics of γ, RMSE and ERGAS as benchmarks to determine the optimal combination of window size and class number for unmixing MODIS images (the best options are underlined and bold in Table 2). For the green and red bands, the optimal combination of window size (W) and class number is 31ˆ31 and 10, and for NIR, the window size (W) and class number are 31ˆ31 and 5, respectively. Considering that the metrics for the NIR band are similar to the green and red band, and more classes are feasible to guarantee class features uniform, we chose the window size of 31ˆ31 MODIS pixels and the class number of 10 as the optimal combination for further unmixing MODIS data acquired Sept. 4.
From the above analysis, the W and k were determined. The next step was to focus on how to calculate w for USTARFM. Table 3 shows the performance of the USTARFM and STARFM methods at different window sizes (w) to search the similar pixels for the central pixel. The USTARFM algorithm achieves the optimal effect when the window size w is 11ˆ11 OLI pixels for red and green bands, and 31ˆ31 OLI pixels for the NIR band. Compared to 31ˆ31 for the STARFM, the difference of optimal window size between NIR and visible bands may be caused by their reflectance characteristics which affect the similar pixels search efficiency. Relative to NIR, the green and red are the short-wave bands, which are sensitive to weather related factors, such as haze.
Though the atmosphere process has been applied for MODIS data, the atmosphere effects cannot be entirely removed [38]. For the other two bands, the indices for the three methods have the same trend as the green band. Table 3 also illustrates that the best window size of USTARFM is smaller than that of STARFM, and at the same window size, the three indices of USTARFM are also better than those of STARFM. This is mainly because the unmixing data used in USTARFM is efficient to reflect the surface reflectance comparing to the directed resampled data used in STARFM.

Accuracy Assessment Under the Best Parameters Setting
For the green band at the optimal window size (W = 31ˆ31), the γ = 0.91 of the USTARFM method is higher than those of the STARFM (0.89) and the unmixing method (0.82); meanwhile, USTARFM shows the lowest error (RMSE = 0.0116) compared to the prediction of the STARFM (RMSE = 0.0127) and the unmixing method (RMSE = 0.0182); ERGAS shows the same trend (1.1502 vs. 1.2559 vs. 1.8052, respectively). Figure 3 shows the scatterplots between reflectance estimate from USTARFM, STARFM and unmixing based method, with the optimized parameter setting, (W, k) and w, and the corresponding reference bands of Landsat 8 image. USTARFM inspired from STARFM improves the performance of central pixel reflectance estimation by weighting the neighboring similar pixels within the moving window, which reflected the spatial variability well, producing scatter point distributions close to 1:1. The γ of each band of images generated by USTARFM and STARFM is superior to that of the unmixing method, by about 0.02. The reason is that the reflectance calculated by the unmixing method is the mean reflectance of each clustered class within the window, which blurs the spatial heterogeneity and makes the scatterplot appear in "stripe" pattern (Figure 3g-i). Generally, the images at Landsat-like image predicted by the USTARFM are more accurate than those derived from the STARFM and the unmixing methods.
1:1. The γ of each band of images generated by USTARFM and STARFM is superior to that of the unmixing method, by about 0.02. The reason is that the reflectance calculated by the unmixing method is the mean reflectance of each clustered class within the window, which blurs the spatial heterogeneity and makes the scatterplot appear in "stripe" pattern (Figure 3g-i). Generally, the images at Landsat-like image predicted by the USTARFM are more accurate than those derived from the STARFM and the unmixing methods.  Figure 3. Scatterplots of the real reflectance and the predicted product produced by the three algorithms for the green, red and NIR-infrared bands.

Landscape Heterogeneity Impact on USTARFM Performance
The heterogeneity over the regions where complex mixtures of land-cover type commonly exist represents a big challenge for STAFRM to identify pure pixels [25,30]. To analyze the performance of the USTARFM algorithm in a complex landscape, the MODIS pixel scale, 480 m × 480 m, was defined as a grid to divide the study area, and each grid was marked according to its heterogeneous degree. Here, we developed the degree of heterogeneity (h) metric using the clustered of thematic map from the Landsat 8 image as described in Equation (7). This idea is based on the quantity of land cover classes in one grid. More than one class type in one grid implies a higher degree of heterogeneity . If (Nc > 0) (Nc is the pixel number of class c) is a determinant function to judge whether class c appears in the grid. P is marked as 1 if Nc is more than 1: Figure 3. Scatterplots of the real reflectance and the predicted product produced by the three algorithms for the green, red and NIR-infrared bands.

Landscape Heterogeneity Impact on USTARFM Performance
The heterogeneity over the regions where complex mixtures of land-cover type commonly exist represents a big challenge for STAFRM to identify pure pixels [25,30]. To analyze the performance of the USTARFM algorithm in a complex landscape, the MODIS pixel scale, 480 mˆ480 m, was defined as a grid to divide the study area, and each grid was marked according to its heterogeneous degree. Here, we developed the degree of heterogeneity (h) metric using the clustered of thematic map from the Landsat 8 image as described in Equation (7). This idea is based on the quantity of land cover classes in one grid. More than one class type in one grid implies a higher degree of heterogeneity . If (N c > 0) (N c is the pixel number of class c) is a determinant function to judge whether class c appears in the grid. P is marked as 1 if N c is more than 1: Num is the total class in the thematic map, here maximum number of the land cover type is 10. From this definition of the heterogeneity metric, the heterogeneity level of a grid with only one class is defined as 1 which represents the lowest heterogeneous degree. Otherwise, the higher the number of classes that appear in a pixel, the higher the degree of heterogeneity. The highest level of heterogeneity assigned was 10 ( Figure 4). From Figures 2 and 4 it can be seen clearly that the areas of higher heterogeneity are distributed from the north east to south west area where fragmented cultivated land is common. The areas of lower heterogeneous grids were concentrated in the northwest which is dominated by larger cultivated fields.
Num is the total class in the thematic map, here maximum number of the land cover type is 10. From this definition of the heterogeneity metric, the heterogeneity level of a grid with only one class is defined as 1 which represents the lowest heterogeneous degree. Otherwise, the higher the number of classes that appear in a pixel, the higher the degree of heterogeneity. The highest level of heterogeneity assigned was 10 ( Figure 4). From Figures 2 and 4, it can be seen clearly that the areas of higher heterogeneity are distributed from the north east to south west area where fragmented cultivated land is common. The areas of lower heterogeneous grids were concentrated in the northwest which is dominated by larger cultivated fields.   Figure 5 shows the RMSE and γ of Green, Red and NIR data generated by the three fused methods at different heterogeneity levels. For the three bands, the change trend decreases is accompanied by the heterogeneity degree increases, highlighting that the heterogeneity is a sensitive variable for the STARFM and has side-effects on the performance of the proposed method. However, an interesting result is that higher performance was achieved from USTARFM than those of STARFM at each heterogeneity level, except for level one (Figure 5a-d).  Figure 5 shows the RMSE and γ of Green, Red and NIR data generated by the three fused methods at different heterogeneity levels. For the three bands, the change trend decreases is accompanied by the heterogeneity degree increases, highlighting that the heterogeneity is a sensitive variable for the STARFM and has side-effects on the performance of the proposed method. However, an interesting result is that higher performance was achieved from USTARFM than those of STARFM at each heterogeneity level, except for level one (Figure 5a-d).  Table 4 analyzes the relationship of the unmixing and resampled MODIS data at different heterogeneity levels to the Landsat reference data on 4 September 2014, which illustrates why USTARFM is superior to the STARFM. The relationship was obviously higher from 2 to 10 level, consistent with the trend of USTARFM trend. This is because the unmixing data instead of directed sampling data as basis for USTARFM can describe the mixed landscape clearly, especially in the higher heterogeneity level area, while the resampled method just offer the STARM uniform spot without reflecting the land cover heterogeneous characteristics. However, the areas at the heterogeneity level of 1 are special as the most homogeneous region, where STARFM has better performance. The USTARFM precedes the unmixing method but is inferior to STARFM for the green and red bands. Taking the result of the green band as an example, the RMSE and γ of the green band predicted by the STARFM are better than those for USTARFM (for green, RMSE: 0.0097 vs. 0.0102; γ: 0.82 vs. 0.79). This is because the accuracy of the directly resampled data in the homogeneous area for the STARFM is better than the unmixing data for USTARFM (for green, γ: 0.89 vs. 0.77; RMSE: 0.0102 vs. 0.0128) ( Table 4), resulting in the lower accuracy of the proposed method.  Table 4 analyzes the relationship of the unmixing and resampled MODIS data at different heterogeneity levels to the Landsat reference data on 4 September 2014, which illustrates why USTARFM is superior to the STARFM. The relationship was obviously higher from 2 to 10 level, consistent with the trend of USTARFM trend. This is because the unmixing data instead of directed sampling data as basis for USTARFM can describe the mixed landscape clearly, especially in the higher heterogeneity level area, while the resampled method just offer the STARM uniform spot without reflecting the land cover heterogeneous characteristics. However, the areas at the heterogeneity level of 1 are special as the most homogeneous region, where STARFM has better performance. The USTARFM precedes the unmixing method but is inferior to STARFM for the green and red bands. Taking the result of the green band as an example, the RMSE and γ of the green band predicted by the STARFM are better than those for USTARFM (for green, RMSE: 0.0097 vs. 0.0102; γ: 0.82 vs. 0.79). This is because the accuracy of the directly resampled data in the homogeneous area for the STARFM is better than the unmixing data for USTARFM (for green, γ: 0.89 vs. 0.77; RMSE: 0.0102 vs. 0.0128) ( Table 4), resulting in the lower accuracy of the proposed method. In Figure 6g, the higher reflectance values for red and green bands seem to be underestimated in some scatter points (residential region, a MODIS pixel scale), which may be because the pure coarse pixel was affected by the surrounding pixels when it is decomposed in Equation (1), and the error of the unmixing data causes accumulated error in the USTARFM (Figure 6a). However, for the NIR band, the accuracy of the unmixing data is slightly better than directly resampled data in the homogeneous area (γ: 0.99 vs. 0.98; RMSE: 0.0487 vs. 0.0492), which makes the RMSE of the USTARFM lowest compared to the STARFM and unmixing methods (0.0198 vs. 0.0235 vs. 0.0487) (Figure 5e) and the scatterplot more close to 1:1 (Figure 6c). This is mainly because, relative to the green and red, NIR is the long-wave bands, which are not sensitive to weather related factors such as haze and have a high signal-to-noise when they are decomposed linearly [37].

Synthetic Image Analysis
The synthetic images of these three methods look similar to the reference Landsat 8 images acquired 4 September 2014 (Figure 7a). Figure 7b-d shows that land cover with large area, e.g., water bodies, residential areas and wide roads, can be clearly identified. From the subplots of the synthetic images (Figure 7f-h), the synthetic image in the heterogeneous region produced by the unmixing method appears somewhat "blurry" and a homogeneous "plot" in which we could not discern the small objects. In contrast, areas from the images generated by the USTARFM and STARFM represent more spatial details (Figure 7g-h). Compared with the image predicted by STARFM (Figure 7g), the result produced by the USTARFM has a lower spectral deviation to the actual spectrum (Figure 7e) in the fragmentation region. This is because the directly resampled data used in STARFM has uniform spectral reflectance, which is unable to reveal the heterogeneity within the MODIS pixel.

Synthetic Image Analysis
The synthetic images of these three methods look similar to the reference Landsat 8 images acquired 4 September 2014 (Figure 7a). Figure 7b-d shows that land cover with large area, e.g., water bodies, residential areas and wide roads, can be clearly identified. From the subplots of the synthetic images (Figure 7f-h), the synthetic image in the heterogeneous region produced by the unmixing method appears somewhat "blurry" and a homogeneous "plot" in which we could not discern the small objects. In contrast, areas from the images generated by the USTARFM and STARFM represent more spatial details (Figure 7g-h). Compared with the image predicted by STARFM (Figure 7g), the result produced by the USTARFM has a lower spectral deviation to the actual spectrum (Figure 7e) in the fragmentation region. This is because the directly resampled data used in STARFM has uniform spectral reflectance, which is unable to reveal the heterogeneity within the MODIS pixel.

Algorithm Applicability Analysis
We analyzed another series of study dates to test the applicability of USTARFM. This study area is situated in a region of Virginia, in the Eastern United States (about 37.51°-37.72° N and 76.99°-77.26° W) where there are many small patches of different land types including forest, bare soil, and residential patches. The dataset includes Landsat 7 ETM+ and MOD09GA images acquired for 25 January, 26 February, and 17 May, in 2002. In the study, the Landsat 7 ETM+ and MOD09GA for 25 January is the base data for predicting the Landsat-like data collected 26 February and 17 May 2002. The dataset has been applied for data fused study, and the detailed description about study area and dataset can be found in publication [30].
The series parameters for the USTARFM were the same as for the previous experiment. Through analyzing the unmixed results of MODIS data for data acquired 25 January 2002, the optimal combination of window size and the class number (W = 31, k = 10) were also determined for unmixing the MODIS data acquired for 26 3487 vs. 1.6289). However, the performance of the predicted result is lower than those acquired on 26 February 2002, because some changes of land-cover types from the larger time span occurred, which is also consistent with Zhu's result [30]. Figure 8 shows the synthetic images of 26 February and 17 May 2002 for the three methods. The prediction image by USTARFM (Figure 8(b1)) on 26 February 2002 is almost the same as that by STARFM (Figure 8(c1)) visually, but the quantitative evaluation of the former is better than the latter, and the unmixed image is worse for loss of texture details of surface ground in Figure 8(d1) (marked in circle). However, due to land cover changed over the long time span that single date Landsat was unable to record the changes, it was also unfulfillable to reflect the changed information from the fusion data [25]. This is why, in this paper, the predicted images from 17 May 2002 by USTARFM and STARFM have some variances from the actual image shown in Figure 8(a2) (marked by a rectangle). Therefore, the STAARCH model, proposed by Hilker [28], will offer us some new ideas to solve the

Algorithm Applicability Analysis
We analyzed another series of study dates to test the applicability of USTARFM. This study area is situated in a region of Virginia, in the Eastern United States (about 37.51˝-37.72˝N and 76.99˝-77.26˝W) where there are many small patches of different land types including forest, bare soil, and residential patches. The dataset includes Landsat 7 ETM+ and MOD09GA images acquired for 25 January, 26 February, and 17 May, in 2002. In the study, the Landsat 7 ETM+ and MOD09GA for 25 January is the base data for predicting the Landsat-like data collected 26 February and 17 May 2002. The dataset has been applied for data fused study, and the detailed description about study area and dataset can be found in publication [30].
The series parameters for the USTARFM were the same as for the previous experiment. Through analyzing the unmixed results of MODIS data for data acquired 25 January 2002, the optimal combination of window size and the class number (W = 31, k = 10) were also determined for unmixing the MODIS data acquired for 26 . 1.6289). However, the performance of the predicted result is lower than those acquired on 26 February 2002, because some changes of land-cover types from the larger time span occurred, which is also consistent with Zhu's result [30]. Figure 8 shows the synthetic images of 26 February and 17 May 2002 for the three methods. The prediction image by USTARFM (Figure 8(b1)) on 26 February 2002 is almost the same as that by STARFM (Figure 8(c1)) visually, but the quantitative evaluation of the former is better than the latter, and the unmixed image is worse for loss of texture details of surface ground in Figure 8(d1) (marked in circle). However, due to land cover changed over the long time span that single date Landsat was unable to record the changes, it was also unfulfillable to reflect the changed information from the fusion data [25]. This is why, in this paper, the predicted images from 17 May 2002 by USTARFM and STARFM have some variances from the actual image shown in Figure 8(a2) (marked by a rectangle). Therefore, the STAARCH model, proposed by Hilker [28], will offer us some new ideas to solve the problem. Besides, the high temporal frequency MODIS at long time scale can record the land surface changed trace, which may be a good alternative to reflect the crop changes in agricultural landscape. problem. Besides, the high temporal frequency MODIS at long time scale can record the land surface changed trace, which may be a good alternative to reflect the crop changes in agricultural landscape.  (2) and their enlarged sub-images (a1) and (a2); and the corresponding prediction sub-images by USTARFM (b1, b2), STARFM (c1, c2), and unmixing-based method (d1, d2).

Conclusions
The temporal-spatial fusion of remote sensing data is an effective approach to solve the dilemma of the simultaneous attainment of high temporal and spatial resolution remote sensing images. In this study, an USTARFM algorithm based on improved STARFM supported by an unmixing-based method was developed to generate high-temporal-spatial remote sensing data. The USTARFM algorithm was tested by the experimental data from Landsat 8/Landsat 7 and MODIS images in two study areas.
USTARFM adopts the unmixed fraction from the unmixing data, which is able to reflect surface information in heterogeneous areas, to take place of the directed resampled data used for STARFM, which can increase the probability of searching the "pure pixels" and guarantee the fusion accuracy. This enables the USTARFM to achieve better performance than STARFM at the same window size, even smaller (w, 11 × 11 vs. 31 × 31). In the Hengshui study area, taking the predicted NIR band reflectance as an example, with the best combination of window size, class number and similar pixels searching window, γ of the USTARFM, STARFM and unmixing were 0.96, 0.95, 0.90, respectively (p-value < 0.001); RMSE were 0.0245, 0.0300, 0.0401, respectively; and ERGAS were 0.5416, 0.6507, 0.8737, respectively. However, the scatterplots of synthetic data produced by USTARFM are more close to 1:1. The synthetic images produced by the USTARFM look more similar to the reference images in RGB combination manner (Figure 7). Especially in the broken area, USTARFM is skilled in reducing the homogeneous "plot" generated by the unmixing method and maintains the fidelity of the spectrum. In addition, the similar conclusions for USTARFM were also obtained for a second study area in Virginia which is a relatively heterogeneous area.
The influence of heterogeneity on the three methods shows the certain regularity ( Figure 5). The USTARM showed consistent higher performance than those of STARM, except for the Red and Green bands when heterogeneity degree as 1, highlighting that this method is capable of solving the problem of fused data that is present in STARM. It is an exciting result that, for the NIR band, the USTARFM is performs consistently better than STARFM when heterogeneity degree ranges from 1 to 10. Compared to the short-wave band as Red and Blue, the long-wave band, NIR, it is insensitive  (2) and their enlarged sub-images (a1) and (a2); and the corresponding prediction sub-images by USTARFM (b1, b2), STARFM (c1, c2), and unmixing-based method (d1, d2).

Conclusions
The temporal-spatial fusion of remote sensing data is an effective approach to solve the dilemma of the simultaneous attainment of high temporal and spatial resolution remote sensing images. In this study, an USTARFM algorithm based on improved STARFM supported by an unmixing-based method was developed to generate high-temporal-spatial remote sensing data. The USTARFM algorithm was tested by the experimental data from Landsat 8/Landsat 7 and MODIS images in two study areas.
USTARFM adopts the unmixed fraction from the unmixing data, which is able to reflect surface information in heterogeneous areas, to take place of the directed resampled data used for STARFM, which can increase the probability of searching the "pure pixels" and guarantee the fusion accuracy. This enables the USTARFM to achieve better performance than STARFM at the same window size, even smaller (w, 11ˆ11 vs. 31ˆ31). In the Hengshui study area, taking the predicted NIR band reflectance as an example, with the best combination of window size, class number and similar pixels searching window, γ of the USTARFM, STARFM and unmixing were 0.96, 0.95, 0.90, respectively (p-value < 0.001); RMSE were 0.0245, 0.0300, 0.0401, respectively; and ERGAS were 0.5416, 0.6507, 0.8737, respectively. However, the scatterplots of synthetic data produced by USTARFM are more close to 1:1. The synthetic images produced by the USTARFM look more similar to the reference images in RGB combination manner (Figure 7). Especially in the broken area, USTARFM is skilled in reducing the homogeneous "plot" generated by the unmixing method and maintains the fidelity of the spectrum. In addition, the similar conclusions for USTARFM were also obtained for a second study area in Virginia which is a relatively heterogeneous area.
The influence of heterogeneity on the three methods shows the certain regularity ( Figure 5). The USTARM showed consistent higher performance than those of STARM, except for the Red and Green bands when heterogeneity degree as 1, highlighting that this method is capable of solving the problem of fused data that is present in STARM. It is an exciting result that, for the NIR band, the USTARFM is performs consistently better than STARFM when heterogeneity degree ranges from 1 to 10. Compared to the short-wave band as Red and Blue, the long-wave band, NIR, it is insensitive to the atmospheric effect to ensure the unmixing result, even in the area when heterogeneous degree is 1.
On the whole, USTARFM works better than STARFM and the unmixing based method. As the heterogeneity increased, the USTARFM was better than the STARFM and unmixing method. But if the land-cover patches are "extremely" broken and the high-spatial remote sensing images are unable to capture the texture details, it is still difficult to achieve an acceptable result.
There are some issues that need to be addressed in further study. First, the accuracy of the USTARFM depends on the performance of the unmixing data generated from Equation (1). The USTARFM method could get better results depending on the higher precision unmixing data. How to achieve better unmixed data from development of unmixing model or finer remote sensing is still an important research topic. Next, the land-cover types changed greatly over the long time span of this research, the performances of USTARFM and STARFM are degraded, such as 0.12 and 0.16 of correlation coefficient in green band lower than usual analyzed in Virginia area in this study. This problem urgently needs to be addressed because this affects the ability of USTARFM to detect cultivated land where usually had abrupt change at short temporal scale. Third, the co-registration errors between Landsat 8 and MODIS is still potential factor with side-effect on the fusion result, which are usually neglected. Previous researchers have done related research on the influence of co-registration errors on data fusion [39,40]. Maybe, this is another way needing to be concerned to improve USTARFM performance. Last but not least, the applicability of the USTARFM method in inversing parameters, such as e.g. NDVI or biomass, need to be further tested. Although the work in this paper is based on Landsat and MODIS surface reflectance data are popular satellite sources, the USTARFM algorithm still has a prospect for other high resolution images (e.g., Chinese GF-1 satellite data with 16-m resolution, French SPOT-5/6satellite data with 10/6 m resolution, etc.) and medium resolution images (e.g., the NPOESS VIIRS data U.S. operational polar satellite constellation which is the new generation satellite after MODIS).