Generating Daily Synthetic Landsat Imagery by Combining Landsat and MODIS Data

Owing to low temporal resolution and cloud interference, there is a shortage of high spatial resolution remote sensing data. To address this problem, this study introduces a modified spatial and temporal data fusion approach (MSTDFA) to generate daily synthetic Landsat imagery. This algorithm was designed to avoid the limitations of the conditional spatial temporal data fusion approach (STDFA) including the constant window for disaggregation and the sensor difference. An adaptive window size selection method is proposed in this study to select the best window size and moving steps for the disaggregation of coarse pixels. The linear regression method is used to remove the influence of differences in sensor systems using disaggregated mean coarse reflectance by testing and validation in two study areas located in Xinjiang Province, China. The results show that the MSTDFA algorithm can generate daily synthetic Landsat imagery with a high correlation coefficient (R) ranged from 0.646 to 0.986 between synthetic images and the actual observations. We further show that MSTDFA can be applied to 250 m 16-day MODIS MOD13Q1 products and the Landsat Normalized Different Vegetation Index (NDVI) data by generating a synthetic NDVI image highly similar to actual Landsat NDVI observation with a high R of 0.97.


Introduction
Because Earth surface observations can be obtained periodically by satellite remote sensing, this technology has become a foremost technique for monitoring land surface processes [1]. Since 1978, satellite remote sensing of the land surface process has been dominated by polar-orbiting sensors including the Advanced Very High Resolution Radiometer (AVHRR) [2], Systeme Pour l'Observation de la Terre (SPOT) Vegetation (VGT) [3], and the Moderate Resolution Imaging Spectroradiometer (MODIS) [4]. The temporal resolution of these coarse resolution sensors is one-two days, which means they can image the entire Earth at one-two-day intervals. Owing to their high temporal resolutions, the time series data of these sensors are widely used in land surface processes dynamic monitoring applications [5][6][7][8] such as land cover and land change [9,10], crop mapping and production forecasts [11,12], disasters such as fires [13], floods [14], and algal blooms, and forest [15] and grassland [16] ecosystems. However, these high temporal resolution data have low spatial resolution; therefore, the signals recorded by these sensors are not suitable for highly spatially variant land surface processes monitoring. Medium spatial resolution sensors such as the Thematic Mapper (TM), Enhanced Thematic Mapper Plus (ETM+), and Operational Land Imager (OLI) sensors on Landsat satellites are other types satellite data widely used in land surface monitoring for applications such as detailed land use and land cover mapping [17,18], environmental monitoring [19], and ecological system dynamic monitoring [20,21]. However, these data have low temporal resolution and cloud interference; therefore, their application to land surface monitoring leads to a shortage of valid data. Leckie [22] found that the probability of acquiring cloud-free Landsat images for a given time with cloud cover <10% can be as low as 10%. Therefore, no single satellite can provide data to meet the challenges of high spatial and temporal land surface process monitoring.
To address this problem, several spatial and temporal data fusion approaches have been proposed to generate high spatial and temporal data by fusing coarse and medium spatial resolution data. Gao et al. [23] introduced the spatial and temporal adaptive reflectance fusion model (STARFM) for blending MODIS and Landsat imagery, and Roy et al. [24] used a semi-physical fusion approach to fuse multi-temporal MODIS-Landsat data. Several studies have applied the STARFM to urban environmental variable extraction, vegetated dry-land ecosystem monitoring, public health studies, and daily land surface temperature generation [25][26][27][28][29]. Hilker et al. [25] improved the STARFM for the spatial temporal adaptive algorithm for mapping reflectance change (STAARCH) for producing synthetic imagery and detecting changes. Zhu et al. [30] enhanced the STARFM for complex heterogeneous regions. Emelyanova et al. [31] assessed the accuracy of STARFM and ESTARFM in two landscapes with contrasting spatial and temporal dynamics. However, most of these models are based on the assumption that the change in reflectance r of each land cover class is linear, which is not accurate for land cover types such as vegetation [32].
In response, other scholars have proposed methods based on a linear mixed model that assumes that the reflectance of each coarse spatial resolution pixel is a linear combination of the responses of each land cover class contributing to the mixture [33][34][35]. However, owing to differences in environmental factors such as altitude, morphology, and soil type or management factors such as sowing date and fertilization, this assumption is not always valid. Zhukov et al. [36] and Maselli [37] addressed this problem by using the neighboring pixel information based on the assumption that spectral properties of a land cover class do not show great variations in the surrounding pixels. On the basis of their work [36,37], Busetto et al. [32] developed a new method in which the neighboring pixels are selected and weighted on the basis of their Euclidean distances from the target pixel; this method also considers the spectral similarity of the subcomponents with those of the targets. However, the reflectance disaggregated by these proposed methods is the mean reflectance of each land cover class in the identified subset s, which is still not equal to the real surface reflectance r of fine-resolution pixels. To solve this problem, Wu et al. [38] proposed a spatial and temporal data fusion approach (STDFA) based on the assumption that the temporal variation properties of each fine pixel in the same class are constant. They applied this method to the estimation of the high spatial and temporal resolution land surface temperature [39] and leaf area index [40]. They also validated that ESTARFM and STDFA can be applied to combine Huanjing (HJ) charge coupled device (CCD) and MODIS reflectance data together with Gaofen satellite no. 1 (GF-1) wide field of view camera (WFV) and MODIS reflectance data [41]. Gevaert and Garcí a-Haro [42] introduced an unmixing-based algorithm and compared it with STARFM. They recommended using unmixing-based data fusion for situations in which the spectral characteristics of the medium-resolution input images are downscaled.
However, STDFA has several limitations. The differences in sensor systems are not considered, and the window sizes used to select coarse pixels involved in the solution of the linear mixed models is fixed. The best window sizes for different land cover classes may vary according to the different spatial distribution of each land cover class; a constant window size may result in lower accuracy solution of the linear mixed models of some land cover classes. To address these limitations, the objectives of the present study are (1) to modify STDFA by introducing sensor difference correction and adaptive window size selection methods; (2) to test and analyze the applicability of the modified spatial and temporal data fusion approach (MSTDFA) in other data such as MODIS MOD13Q1 products and Landsat Normalized Different Vegetation Index (NDVI) data; and (3) to test and analyze the availability of Finer Resolution Observation and Monitoring of Global Land Cover (FROM-GLC) data in MSTDFA.

Method Inputs and Processing Steps
To address the weaknesses of STDFA in the estimation of daily synthetic Landsat imagery, MSTDFA is proposed in this paper. The input of this algorithm includes a Landsat image, land cover and time series MODIS reflectance data that were acquired on the same day as was the Landsat image, and at least one MODIS reflectance dataset acquired on the same day as was the Landsat image selected for simulation. The output of this algorithm is a time series of synthetic Landsat imageries in which the acquisition date can be considered the same as that of the time series MODIS data. The algorithm includes four steps: (1) best window size selection; (2) mean reflectance disaggregation; (3) sensor difference adjustment; and (4) calculation of each pixel's reflectance and outputting of the daily synthetic Landsat imagery. A flowchart of the algorithm is shown in Figure 1. The algorithm is run by one band. To generate a multispectral synthetic Landsat imagery, we need to apply this method for each band, respectively.

Selecting the Best Window Size
According to the unmixing theory, the reflectance of a coarse-resolution spatial pixel is assumed to be a linear combination of the responses of each land cover class contributing to the mixture [33]. The coarse spatial reflectance R (i, t) of the landscape thus consists of k discrete land cover class c weighted by their class fractional cover as Constrained:   f i r c t for each class ranges from 0 to 1. Disaggregated mean surface reflectance out of this constraint were not used to build a linear model between the disaggregated mean surface reflectance and actual TM mean surface reflectance for the adjustment of sensor difference. To reduce the influence of spatial variation and geolocation errors, the solution of Equation (1) was conducted in a rectangular window centered to the MODIS target pixel. Owing to the different spatial distribution of each land cover class, the best window size s for each land cover class may be different. To determine the best window size s for each land cover class, rectangles of lengths of 3-120 MODIS pixels for different classes were tested. Firstly, for window length l, a subset s of MODIS pixels centered to the MODIS target pixel mk, fractional cover data, and Landsat pixels was extracted. Then, the disaggregated mean surface reflectance value ( , , ) where m1, m2, and mn is the MODIS target pixel; n is the number of the MODIS pixel. Obviously, the best window length l for land cover class c will have the highest correlation coefficient. Thus, the window length l with the highest correlation coefficient R was set to the best window length for land cover class c.

Disaggregating Mean Reflectance
After the best window size for land cover class c is determined, the mean reflectance for land cover class c of target MODIS pixel mk can be calculated by solving Equation (1) using the ordinary least squares technique by inputting a subset s for window length l of the MODIS pixels centered to the MODIS target pixel i, fractional cover data, and Landsat pixels.

Adjusting Sensor Difference
Owing to sensor system differences in bandwidth, acquisition time, spectral response functions, geolocation errors, and atmospheric correction, there is a need to adjust the disaggregated mean reflectance ( , )

Calculating Pixels Reflectance and Method Outputs
Since the time series mean fine reflectance ( , ) f i r c t from time t1 to time tn was calculated, the time series reflectance of each fine resolution pixel can be determined using the SRCM model proposed by Wu et al. [38], which is described as where ( , )

Study Area
Two study areas located in Xinjiang Province, West China, were selected to test and validate this method ( Figure 2). The first is Bole County, Xinjiang Province, China, located in the valleys between Alatau and Gang Giger mountains, Boertala River. The area to the west of Bole is mountainous, whereas that to the east is plains. In the plains area, the main land use type is farmland, in which the crop plots are usually large. Therefore, the land cover types in this area are relatively homogeneous. The second study area is Luntai County, Xinjiang Province, China, located in southern Tianshan, northern Tarim Basin. The areas north of Luntai are hills, whereas the middle and the southern parts are plains. The crop plots in Luntai are very small; thus, the landscapes are heterogeneous.

Landsat Data and Pre-Processing
Three Landsat-5 TM datasets in Bole and Three Landsat-8 OLI datasets in Luntai were used in this study (Table 1). All data were acquired in clear sky conditions and were provided by the United States Geological Survey (USGS). The Landsat data used in the Bole study area were surface reflectance products, whereas those used in Luntai were Level L1T products. The three Landsat L1T products were atmospherically corrected by using the Fast Line-of-Sight Atmospheric Analysis of Spectral Hypercubes (FLAASH) Atmospheric Correction Model in software ENVI 5.0. Then, the six Landsat datasets were georeferenced by using a second-order polynomial warping approach based on the selection of 43 ground control points (GCPs) using a 1:10,000 topographic map by the nearest neighbor resampling method with the position error within 0.74 Landsat pixels. Landsat images acquired on 11 July 2011 in Bole and on 6 October 2013 in Luntai were used as reference images for building a linear model between Landsat and MODIS mean reflectance and to calculate the reflectance of fine pixels from the mean reflectance. These images were also used for land cover mapping, which is explained in Section 3.2.3. The subsequent Landsat images were used to evaluate this algorithm.

MODIS Data and Pre-Processing
Six daily MODIS surface reflectance products (MOD09GA, 500 m) obtained in clear-sky conditions were used in this study (Table 1). Ideally, the MODIS image acquisition date should be the same as the acquisition date of Landsat data. However, the quality of MODIS data acquired on the same date as the Landsat data was not good in Bole and Luntai; therefore, these MODIS data were replaced with data of good quality acquired one day earlier or later than the Landsat data.
These six MODIS images were reprojected from the native Sinusoidal projection to a UTM-WGS84 reference system and were resized to the selected study area using MODIS Reprojection Tool (MRT) software. We also resized the spatial resolution from 500 m to 480 m with a nearest neighbor resampling method in MRT. All of these MODIS data were then georeferenced by a second-order polynomial warping approach based on the selection of 38 GCPs on 480 m Landsat images with a nearest neighbor resampling method in which the position error was within 0.63 MODIS pixels. The 480 m Landsat images were resized from georeferenced Landsat images by using the pixel aggregate resampling method.

Land Cover Data
Two types of land cover data were used in this study. The first was mapped by using the maximum likelihood classification method from the reference Landsat images with 1196 field survey data including 334 plots in Bole and 862 plots in Luntai. The field survey data in Bole and Luntai was obtained in 2011 and 2013, respectively. These land cover data were used in the generation of high spatial and temporal synthetic Landsat imageries. Currently, abundant global and regional information of land cover and use are provided, for example, by National Land Cover Database (NLCD) and FROM-GLC data. Usually these sources will be updated every five years. To test the applicability of these data in MSTDFA, the FROM-GLC data in Bole mapped by using Landsat-5 TM data acquired on 21 July 2009 was used in MSTDFA. The classification accuracy of the FROM-GLC data in Bole was evaluated by using a confusion matrix with regions of interest (ROIs) selected by using visual interpretation methods and field survey data. Table 2 shows the accuracy evaluation results. The overall accuracy and Kappa coefficient of the FROM-GLC data in Bole is 78.06% and 0.65, respectively.

Results of Landsat Mean Reflectance Regressing
A linear model between Landsat mean reflectance and disaggregated mean coarse reflectance at time t1 (Bole: 11 July 2011; Luntai: 6 October 2013) was built successfully by using linear regression analysis. Table 3 shows the best window size for each band and each land class. Table 4 shows that high-correlation coefficients R were acquired in these two study areas. This result demonstrates that the adaptive window size and moving steps selection method have the ability to select the best window size for the disaggregation of coarse pixels.

Results of Synthetic Landsat Image Generation
By using MSTDFA, four synthetic Landsat images were outputted that contained six bands including blue, green, red, near infrared (NIR), short-wave infrared 1 (SWIR1), and SWIR2. The acquisition date of these data can be considered the same as that for MODIS. Figure 3a shows the actual observation of MODIS surface reflectance on the NIR band acquired on 28 July 2011 in Bole and on 21 October 2013 in Luntai, and Figure 3b shows the synthetic surface reflectance imagery on the NIR band generated by MSTDFA in the two study areas. Figure 3c shows the actual observation of Landsat NIR band surface reflectance acquired on 27 July 2011 in Bole and on 22 October 2013 in Luntai. Through visual interpretation, we determined that the synthetic and actual Landsat data are highly similar and were unable to be distinguished with the unaided eye.

Accuracy Assessment
In this study, four actual observations of Landsat data were used to evaluate the accuracy. Closer similarity of the synthetic Landsat image to the actual image relates to the higher precision of the method. To quantitatively evaluate the similarity between the actual observations and synthetic images, correlation analysis was used to calculate the correlation between the synthetic Landsat imagery and the actual observation of Landsat data. Several indicators such as the coefficient (R), variance, mean absolute difference (MAD), bias, and RMSE were calculated. Table 5 shows the results of this analysis. As indicated in Table 5, MSTDFA can generate synthetic Landsat images with high similarity to the actual images. Most synthetic Landsat images had a high correlation with the actual Landsat imageries with a coefficient (R) higher than 0.90. Figure 4 shows the scatter plots between the actual and synthetic Landsat images, which were close to the 1:1 line. These results indicate a high similarity between the actual and synthetic Landsat data and that MSTDFA has high accuracy in generating synthetic Landsat images.    Table 6 and Figure 5 show the results of the conditional STDFA. It is evident that in the Luntai area, MSTDFA had better accuracy than the conditional STDFA in nearly all parameters. In the Bole area, MSTDFA had much better accuracy than the conditional STDFA in SWIR1 and SWIR2 bands. However, the precision of MSTDFA was slightly lower in the blue, green, and red bands. A comparison of Figures 4 and 5 revealed that in the Luntai area, the scatter diagram of MSTDFA was closer to the 1:1 line than that of STDFA. Two factors can explain these results. Firstly, MSTDFA has two important improvements over STDFA. In particular, the differences in sensor systems are considered in MSTDFA. In Figure 5, a high correlation is shown between the synthetic and actual Landsat images. However, a certain deviation appeared between the regression line and the 1:1 line that was caused mainly by differences in the sensor systems. Therefore, these deviations were eliminated in the scatter plots of MSTDFA. Secondly, many plots indicate that land cover type changed in the Bole area from 11 to 27 July 2011. For example, the northwest corner of the study area is Aibi Lake which is a huge shallow lake. In addition, Boertala River flows east from the west of the study area. As the water level changed, a lot of bare land changed into water. These changes in land cover type will reduce the precision of the model, and they had a more severe influence for MSTDFA. In STDFA, two days of Landsat images were used to detect the land cover change areas. The land cover change areas were classified as other class. However, only one Landsat image was used in MSTDFA and the change area cannot be detected. So, the land cover type change leads to more reflectance changes of blue, green, and red bands. Thus, MSTDFA performed worse than STDFA in these bands. Details of the influence of land cover change are described in Section 4.5. Table 6. Results of correlation analysis between synthetic and actual Landsat imageries by using the spatial and temporal data fusion approach (STDFA).

Improvement
Compared with traditional STDFA, MSTDFA has two improvements. Firstly, in the traditional STDFA, the best window size for the solution of Equation (1) was set to a fixed value of 40 coarse pixels. However, owing to the different spatial distribution of each land cover class, the best window size for each land cover class may be different. Thus, a fixed window size may not be appropriate for all classes. In MSTDFA, an adaptive window size and moving step length selection method was used to avoid this problem. This method tested every window size and moved the step length from minimum to maximum by using the exhaustive method, and the correlation coefficient between the reference Landsat mean reflectance and the MODIS disaggregated mean coarse reflectance of every test was calculated. The best window size and moving step length were designed to be those which can lead to the maximum correlation coefficient between the fine and coarse mean reflectance.
Secondly, the differences in sensor systems are not considered in traditional STDFA. In MSTDFA, the sensor differences were removed by using linear models between the Landsat mean reflectance and the disaggregated mean reflectance.
To show the improvements by the above steps, we tested those methods step by step in Luntai at NIR band. Firstly, the basic STDFA model with a fixed window size of 40 MODIS pixels and without sensor difference adjustment was used to generate a synthetic NIR image. Then the adaptive window size selection method and sensor difference adjustment were added step by step to generate a synthetic NIR image. Finally, the similarity between those synthetic NIR imageries and the actual Landsat NIR image were evaluated by correlation analysis. Table 7 shows the results of correlation analysis. From Table 7, we can see improvements of correlation coefficient R and declines of Variance, RMSE, MAD, and bias in each step. So, the traditional STDFA methods were enhanced by those improvements.

Landsat and MODIS Fusion Using FROM-GLC Data
To determine the applicability of the FROM-GLC data in MSTDFA, the FROM-GLC data in Bole mapped by using Landsat-5 TM data acquired on 21 July 2009 was used in MSTDFA. By inputting FROM-GLC data rather than classification data mapped using the maximum likelihood classification method, six synthetic Landsat images ware generated. Table 8 shows the results of accuracy assessment of these six synthetic Landsat images. As indicated in Table 8, most synthetic Landsat images had a high correlation with the actual Landsat imageries with coefficient R higher than 0.90. Figure 6 shows the scatter plots between the actual and synthetic Landsat images, which were close to the 1:1 line. These results indicate a high similarity between the actual and synthetic Landsat data and that MSTDFA has high accuracy in generating synthetic Landsat images. Therefore, the FROM-GLC data can be used in MSTDFA. Table 8. Results of correlation analysis between synthetic and actual Landsat imageries using the modified spatial and temporal data fusion approach (MSTDFA) with input of Finer Resolution Observation and Monitoring of Global Land Cover (FROM-GLC) data.

Influence of the Image Extents
The Ordinary Least Squares technique was used to disaggregate the time series mean surface reflectance value ( , ) di r c t for each class. However, the solutions will affect the number of pixels an image contains from two aspects. It is easier to generate outliers in the solution of Ordinary Least Squares in a small area. In addition, the images with different extents will produce different disaggregated mean reflectance. These effects were reduced by three steps in this method. Firstly, abnormal disaggregated mean surface reflectance value was not used. Then, all the normal disaggregated mean reflectance for every target MODIS pixels was used to adjust the sensor difference. Finally, the sensor adjusted time  Figure 7 shows that the correlation coefficient R has a logarithmic relationship with the size of applied area. The bigger of the study area, the higher the correlation coefficient R. So, we recommend using the MSTARFM model in a large area and using the MSTDFA model in areas greater than 200 × 200 MODIS pixels.

Figure 7.
Relationship between correlation coefficient R and study area size.

Comparison of Actual NDVI and NDVI Calculated Using Synthetic Data
This method was applied to MODIS and Landsat NDVI data in Bole to test its applicability with different data. The MODIS NDVI data included 250 m 16-day MOD13Q1 products acquired on 12 July 2011 and 28 July 2011. The Landsat NDVI data was calculated by using the red and NIR bands of Landsat data acquired on 11 July 2011. By inputting these NDVI data and the land cover data mapped by using the maximum likelihood classification method, a synthetic Landsat NDVI image was generated by using MSTDFA and conditional STDFA. Table 9 shows the accuracy assessment results of the two synthetic Landsat NDVI images. Both STDFA and MSTDFA can be used to fuse NDVI data, and the results of the latter were better than those of the former. Another method to generate synthetic NDVI data is to calculate NDVI using the synthetic red and NIR data generated by STDFA and MSTDFA. Wu et al. [41] compared the two methods in the generation of synthetic NDVI and leaf area index.

Limitations of the Method
Although MSTDFA can generate daily synthetic Landsat images with high similarity to actual Landsat images, this method has weaknesses.
Firstly, all the spatial and temporal data fusion methods are based on the assumption that the land cover classes do not change over time. Weng et al. [28] found that the accuracy of those models will be substantially reduced if this assumption is violated. As discussed in Section 4.2, many plots include land cover changes over time in the Bole area. For example, the northwest corner of the study area is Aibi Lake which is a huge shallow lake. In addition, Boertala River flows east from the west of the study area. As the water level changes, a lot of bare land changes into water. To evaluate the influence of land cover change, we extracted the land change plots by using the differences in Landsat images acquired on 11 July 2011 and 27 July 2011. The residual image between the synthetic and actual images was also calculated. Then, we calculated the correlation coefficient between these two data sets to obtain a high correlation coefficient R of 0.992. We also found that pixels with large errors were located in the plots in which the land cover types changed. Therefore, land cover class change has a very important influence in MSTDFA. To assess the influence, we masked these plots and calculated the accuracy assessment parameters again, as shown in Table 10. A comparison of Tables 5 and 10 revealed that the maximum fusion accuracy improvement was 0.32 in correlation coefficient R if the plots are not considered. Secondly, this method considered only the fusion of multi-sensor optical images. Therefore, it cannot provide effective data under cloudy conditions because all optical satellites are affected by clouds. In this situation, radar satellite must be considered as a solution [43][44][45][46][47][48].
Thirdly, a simple method in which solutions out of the range of 0%-100% were not used was employed to satisfy the constraints of the solutions of the linear mixed model. Optimization algorithms, such as the normalization algorithm which uses all solutions, minimizing the squared errors between the predicted response variable and observed data, can be used to improve this method [49]. Furthermore, nonlinear least squares regressions for spectral quantitative analysis is also an important direction to modify this method for future research [50].

Conclusions
In this study, an MSTDFA was developed and validated for two study areas located in Xinjiang, China. By inputting MODIS reflectance data, Landsat data, and land cover data, this method is able to generate daily synthetic Landsat images in which the spatial resolution is the same as that of the Landsat data and the temporal resolution is the same as that of the MODIS data. A comparison with the actual Landsat image revealed the following findings: (1) The adaptive window size and moving step selection method can select the best window size for disaggregation of coarse pixels. The disaggregated mean coarse reflectance had a strong linear relationship with the Landsat mean reflectance. (2) MSTDFA had higher accuracy than STDFA but was more easily influenced by land cover change.
Land cover data such as that of FROM-GLC can be used in MSTDFA. Synthetic Landsat images with high similarity to actual Landsat images with a correlation coefficient R of 0.96 can be generated. (3) Land cover class change had a very important influence in MSTDFA, which can lead to a reduction in the correlation coefficient R of 0.32 in the blue band. (4) MSTDFA can be applied in 250 m 16-day MODIS MOD13Q1 products and Landsat NDVI data.
A synthetic NDVI image with very high similarity to the actual NDVI observation with a high correlation coefficient R of 0.97 can be generated.
Technology Major Project of China, and the Yunnan Provincial Science and Technology Program (2010AD004). The funders had no role in choosing the study design, in the collection, analysis, and interpretation of the data, in the writing of the report, or in the decision to submit the article for publication.

Author Contributions
Mingquan Wu, Wenjiang Huang, Zheng Niu and Changyao Wang conceived and designed the experiments. Mingquan Wu performed the experiments and the manuscript draft. Wenjiang Huang, Zheng Niu and Changyao Wang revised the manuscript draft. All authors read and approved the final version.

Conflicts of Interest
The authors declare no conflict of interest.