Combining HJ CCD, GF-1 WFV and MODIS Data to Generate Daily High Spatial Resolution Synthetic Data for Environmental Process Monitoring

The limitations of satellite data acquisition mean that there is a lack of satellite data with high spatial and temporal resolutions for environmental process monitoring. In this study, we address this problem by applying the Enhanced Spatial and Temporal Adaptive Reflectance Fusion Model (ESTARFM) and the Spatial and Temporal Data Fusion Approach (STDFA) to combine Huanjing satellite charge coupled device (HJ CCD), Gaofen satellite no. 1 wide field of view camera (GF-1 WFV) and Moderate Resolution Imaging Spectroradiometer (MODIS) data to generate daily high spatial resolution synthetic data for land surface process monitoring. Actual HJ CCD and GF-1 WFV data were used to evaluate the precision of the synthetic images using the correlation analysis method. Our method was tested and validated for two study areas in Xinjiang Province, China. The results show that both the ESTARFM and STDFA can be applied to combine HJ CCD and MODIS reflectance data, and GF-1 WFV and MODIS reflectance data, to generate synthetic HJ CCD data and synthetic GF-1 WFV data that closely match actual data with correlation coefficients (r) greater than 0.8989 and 0.8643, respectively. Synthetic red- and near-infrared (NIR)-band data generated by ESTARFM are more suitable for the calculation of the Normalized Difference Vegetation Index (NDVI) than the data generated by STDFA.


Introduction
Coarse-resolution satellite data, obtained for example from the Advanced Very High Resolution Radiometer (AVHRR) [1], the Systeme Pour l'Observation de la Terre (SPOT) Vegetation (VGT) sensor [2] and the Moderate Resolution Imaging Spectroradiometer (MODIS) [3], are widely used in areas such as land cover and land use mapping [4,5], crop mapping and yield forecasting [6,7], global change [8], vegetation trend and phenology estimation [9,10], disaster monitoring [11][12][13][14], atmospheric environment monitoring [15][16][17] and water environment monitoring [18]. The return cycle of these satellites is one to two days, making them suitable for dynamic monitoring of land surface processes; AVHRR in particular provides the longest time series among global satellite measurements [19]. However, the spatial resolutions of these data are no finer than 250 m. When land objects are smaller than the spatial resolution of the images acquired by these sensors, the recorded signals are often a mixture of different land cover types, which makes these data difficult to apply to high spatial resolution surface process monitoring. Medium spatial resolution satellite data, such as data from Landsat and the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER), can also be used for dynamic monitoring of land surface processes. However, because of the long return cycles of these satellites (>16 days) and the influence of clouds, the rate at which they can obtain useful data is very low [20], and it is difficult to use their data to monitor rapid changes in land surface processes. Therefore, these satellite data are often used only for annual dynamic analyses, including the spatiotemporal dynamic analysis of ecosystems such as wetlands [21,22], forests [23][24][25], water [26], crops [27] and cities [28], for monitoring plant phenology [29] and for land management [30,31].
There is a lack of satellite data with high enough spatial and temporal resolutions to monitor rapid changes in land surface processes.
A solution to this problem is to combine coarse and medium spatial resolution satellite data to generate synthetic satellite data with high spatial and temporal resolutions. This approach is called spatial and temporal data fusion, and several such methods have recently been proposed. Gao et al. [32] introduced the Spatial and Temporal Adaptive Reflectance Fusion Model (STARFM) for blending MODIS and Landsat imagery. Several studies have applied STARFM, mainly in coniferous areas, for urban environmental variable extraction, vegetated dry-land ecosystem monitoring, public health studies and daily land surface temperature generation [33][34][35][36][37]. Zhu et al. [38] enhanced STARFM for complex heterogeneous regions. Emelyanova et al. [39] assessed the accuracy of STARFM and ESTARFM (Enhanced STARFM) for two landscapes with contrasting spatial and temporal dynamics. Jarihani et al. [40] evaluated the accuracy of STARFM and ESTARFM in downscaling MODIS indices to match the spatial resolution of Landsat. Other scholars have proposed methods based on linear mixing models [41][42][43]. Wu et al. [44] proposed a Spatial and Temporal Data Fusion Approach (STDFA) that calculates the real surface reflectance of fine-resolution pixels from the mean reflectance of each land cover class, disaggregated using unmixing methods. They also applied this method to the estimation of high spatial and temporal resolution Leaf Area Index [45] and land surface temperature [46] data.
Gevaert and García-Haro [47] compared STARFM and an unmixing-based algorithm. STARFM and ESTARFM are more suitable for complex heterogeneous regions, while unmixing methods such as STDFA are more suitable for cases that downscale the spectral characteristics of medium-resolution input imagery [47].
The spatial and temporal data fusion approaches proposed so far have mainly focused on the fusion of Landsat and MODIS data. However, with the recent launch of new satellites, these methods need to be validated for the new sensors. In recent years, China has launched two moderate-resolution satellites, the Huanjing satellite (HJ) and Gaofen satellite no. 1 (GF-1). Wei et al. [48] compared the data quality of the HJ charge coupled device (CCD) and Landsat Thematic Mapper (TM) sensors and found that the radiation accuracy, clarity and signal-to-noise ratio (SNR) of the HJ CCD data were lower than those of the Landsat TM data. Validating fusion methods is therefore particularly important for Chinese satellite data because of this lower data quality.
To address this problem, the objectives of the present study are: (1) to validate the applicability of ESTARFM and STDFA to HJ and GF satellite data and (2) to analyse the influence of MODIS data of different spatial resolutions on the application of ESTARFM and STDFA.

Study Area
Two counties located in Xinjiang Province, China, were selected as the study areas (Figure 1). The first county, Kuche, Aksu City, Xinjiang Province, China, is located at 40°46′–42°35′ N, 82°35′–84°17′ E. The second county, Luntai, Bayinguoleng Mongolian Autonomous Prefecture, Xinjiang Province, China, is located at 41°05′–42°32′ N, 83°38′–85°25′ E. Both counties lie south of the middle Tianshan Mountains and north of the Tarim Basin, and belong to a warm temperate zone with a continental arid climate. The north of both counties is mountainous, while the central and southern regions consist of plains, which are mainly occupied by agriculture, cities and deserts.

Data and Pre-Processing
Three HJ CCD datasets, six GF-1 wide field of view camera (WFV) datasets and six MODIS surface reflectance datasets were used in this study (Table 1). All of these data were acquired under clear sky conditions and are of good quality. The HJ satellite constellation consists of two satellites (HJ-1-A and HJ-1-B) and was launched on 6 September 2008. HJ-1-A is equipped with a CCD camera and a hyperspectral imager (HSI). HJ-1-B is equipped with a CCD camera and an infrared camera (IRS). The HJ satellite constellation can image the entire Earth at two-day intervals. Table 2 lists the sensor parameters for the HJ satellite constellation.
The three HJ CCD datasets were downloaded from the China Centre for Resources Satellite Data and Application (http://www.cresda.com/n16/index.html). These data were atmospherically corrected using the Fast Line-of-Sight Atmospheric Analysis of Spectral Hypercubes (FLAASH) Atmospheric Correction Model in the ENVI 5.0 software. They were then georeferenced using a second-order polynomial warping approach, based on 45 ground control points (GCPs) selected from a 1:10,000 topographic map, with nearest neighbour resampling and a position error within 0.67 HJ CCD pixels. The GF-1 satellite is a Chinese sun-synchronous, high-resolution satellite launched on 26 April 2013. It is equipped with two types of sensors: two panchromatic multispectral sensors (MPS) and four WFV cameras. It can acquire panchromatic images at a resolution of 2 m and multispectral images at a resolution of 8 m or 16 m. Table 3 lists the sensor parameters for the GF-1 satellite. Six GF-1 WFV datasets, provided by the GF Satellite Application Technology Centre, Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences, were used in this study (Table 1). These data were orthorectified with the ENVI 5.0 software, using rational polynomial coefficients (RPC) with a Digital Elevation Model (DEM) and 36 GCPs selected from a 1:10,000 topographic map. The data were then atmospherically corrected using the FLAASH Atmospheric Correction Model in the ENVI 5.0 software provided by Esri China Information Technology Co., Ltd., Beijing, China.

MODIS Data
Three 500 m resolution daily MODIS surface reflectance products (MOD09GA) and three 250 m resolution daily MODIS surface reflectance products (MOD09GQ) were used in this study (Table 1). These MODIS images were reprojected from the native sinusoidal projection onto the UTM-WGS84 reference system and resized to the selected study areas using the MODIS Reprojection Tool (MRT) software. The MODIS data were then georeferenced using a second-order polynomial warping approach, based on 28 or 32 GCPs selected from 500 m or 250 m GF-1 WFV images, respectively, with nearest neighbour resampling and position errors within 0.53 and 0.67 GF-1 WFV pixels, respectively. The 500 m and 250 m GF-1 WFV images were resized from georeferenced GF-1 WFV images using the pixel aggregate resampling method.
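The pixel aggregate resampling used to build the 500 m and 250 m reference images is, in essence, a block mean over the fine pixels covered by each coarse cell. A minimal numpy sketch of this idea (the function name and the divisibility requirement are our own simplifications; production tools such as ENVI also handle partial blocks and geolocation):

```python
import numpy as np

def pixel_aggregate(img, factor):
    """Block-mean ('pixel aggregate') resampling of a 2-D reflectance
    array: each output pixel is the mean of a factor x factor block.
    Assumes the image dimensions are exact multiples of `factor`."""
    rows, cols = img.shape
    assert rows % factor == 0 and cols % factor == 0
    # Reshape so each block becomes its own pair of axes, then average.
    return img.reshape(rows // factor, factor,
                       cols // factor, factor).mean(axis=(1, 3))
```

For example, aggregating a 4 x 4 array with `factor=2` averages each 2 x 2 block into one output pixel.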

ESTARFM
ESTARFM was proposed to improve the STARFM algorithm for the accurate prediction of surface reflectance in heterogeneous landscapes, using the observed reflectance trend between two points in time and spectral unmixing theory [38]. According to the linear mixture model, the coarse-resolution reflectance at times $t_0$ and $t_k$ can be expressed as:

$C(x, y, t_0) = \sum_{k=1}^{n} f_k(x, y) F_k(t_0) + \varepsilon$  (1)

$C(x, y, t_k) = \sum_{k=1}^{n} f_k(x, y) F_k(t_k) + \varepsilon$  (2)

where $C$ is the coarse-resolution reflectance, $f_k(x, y)$ is the fraction of the k-th endmember within the coarse pixel, $F_k(t)$ is the fine-resolution reflectance of the k-th endmember at time $t$ and $\varepsilon$ is the residual error. Assuming that the reflectance of each endmember changes linearly at a constant rate $h_k$ from time $t_0$ to $t_k$, Equations (1) and (2) can be rewritten as:

$F_k(t_k) - F_k(t_0) = h_k (t_k - t_0)$  (3)

$C(x, y, t_k) - C(x, y, t_0) = \sum_{k=1}^{n} f_k(x, y) h_k (t_k - t_0)$  (4)

Substituting Equation (3) into Equation (4), the ratio $v_k$ of the change in reflectance for the k-th endmember to the change in reflectance for a coarse pixel can be described as:

$v_k = \dfrac{F_k(t_k) - F_k(t_0)}{C(x, y, t_k) - C(x, y, t_0)} = \dfrac{h_k}{\sum_{k=1}^{n} f_k(x, y) h_k}$  (5)

so that, for a fine-resolution pixel belonging to the k-th endmember, Equation (5) can be rewritten as:

$F(x, y, t_k) = F(x, y, t_0) + v_k \left[ C(x, y, t_k) - C(x, y, t_0) \right]$  (6)

where $(x, y)$ is the position of the target pixel. By introducing additional information from the neighbouring pixels to reduce the influence of land cover changes, surface heterogeneity and solar geometry/bi-directional reflectance distribution function (BRDF) changes, the weighted ESTARFM prediction can be determined as Equation (7):

$F(x_{w/2}, y_{w/2}, t_k) = F(x_{w/2}, y_{w/2}, t_0) + \sum_{i=1}^{w} \sum_{j=1}^{w} W_{ijk} v_{ijk} \left[ C(x_i, y_j, t_k) - C(x_i, y_j, t_0) \right]$  (7)

where $w$ is the size of the search window; $W_{ijk}$ is the weight determined by the spectral difference $S_{ijk}$ and the temporal difference $T_{ijk}$ between the fine- and low-resolution data, and by the location distance $D_{ijk}$ between the target pixel $(x_{w/2}, y_{w/2})$ and the candidate pixel $(x_i, y_j)$; $k$ indexes the candidate pixels $(x_i, y_j)$ in window $w$. These parameters are calculated as follows:

$S_{ijk} = \left| F(x_i, y_j, t_0) - C(x_i, y_j, t_0) \right|$  (8)

$T_{ijk} = \left| C(x_i, y_j, t_k) - C(x_i, y_j, t_0) \right|$  (9)

$D_{ijk} = 1 + \sqrt{(x_{w/2} - x_i)^2 + (y_{w/2} - y_j)^2} \, / \, (w/2)$  (10)

$W_{ijk} = \dfrac{1 / (S_{ijk} T_{ijk} D_{ijk})}{\sum_{i=1}^{w} \sum_{j=1}^{w} 1 / (S_{ijk} T_{ijk} D_{ijk})}$  (11)
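As an illustration, the weighted prediction of Equation (7) can be sketched in numpy. This is a simplified sketch, not the published ESTARFM implementation: the function name is ours, the conversion coefficients v are assumed to have been precomputed per pixel, and the weight here combines only a spectral difference and a location distance, omitting the temporal-difference term and the similar-pixel screening of the full algorithm.

```python
import numpy as np

def estarfm_predict(fine_t0, coarse_t0, coarse_tk, conv_coef, window=31):
    """Simplified weighted prediction in the spirit of Equation (7).

    fine_t0   : fine-resolution reflectance at the base date t0
    coarse_t0 : coarse reflectance at t0, resampled to the fine grid
    coarse_tk : coarse reflectance at the prediction date tk
    conv_coef : per-pixel conversion coefficient v (hypothetical input,
                assumed precomputed from the unmixing step)
    """
    half = window // 2
    rows, cols = fine_t0.shape
    pred = np.empty_like(fine_t0)
    delta_c = coarse_tk - coarse_t0  # coarse-resolution change
    for r in range(rows):
        for c in range(cols):
            r0, r1 = max(0, r - half), min(rows, r + half + 1)
            c0, c1 = max(0, c - half), min(cols, c + half + 1)
            # Spectral difference S and location distance D for every
            # candidate pixel in the search window.
            s = np.abs(fine_t0[r0:r1, c0:c1] - coarse_t0[r0:r1, c0:c1])
            yy, xx = np.mgrid[r0:r1, c0:c1]
            d = 1.0 + np.hypot(yy - r, xx - c) / half
            # Inverse difference/distance weight, normalised over the window.
            w = 1.0 / (s * d + 1e-8)
            w /= w.sum()
            # Base value plus weighted, converted coarse change.
            pred[r, c] = fine_t0[r, c] + np.sum(
                w * conv_coef[r0:r1, c0:c1] * delta_c[r0:r1, c0:c1])
    return pred
```

A useful sanity check: if the coarse image does not change between the two dates, the prediction reduces exactly to the base image.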

STDFA
STDFA is based on a linear mixing model that assumes the reflectance of each coarse spatial resolution pixel is a linear combination of the responses of each land cover class contributing to the mixture [49]:

$C(x, y, t) = \sum_{i=1}^{n} f_i(x, y) \bar{F}_i(t) + \varepsilon(x, y, t)$  (12)

where $C(x, y, t)$ is the reflectance of the coarse pixel at position $(x, y)$ and time $t$, $f_i(x, y)$ is the fraction of land cover class $i$ within the coarse pixel, $\bar{F}_i(t)$ is the mean reflectance of land cover class $i$ at time $t$ and $\varepsilon(x, y, t)$ is the residual error term. By inputting the fraction data $f_i(x, y)$ extracted from the land cover map, the mean reflectance of each land cover class can be calculated by solving Equation (12) using the ordinary least squares technique. Then, based on the assumption that the temporal variation of every fine-resolution pixel in the same class is the same, STDFA predicts the synthetic fine-resolution imagery as:

$F(x, y, t_k) = F(x, y, t_0) + \bar{F}_i(t_k) - \bar{F}_i(t_0)$  (13)

where $F(x, y, t)$ is the fine-resolution reflectance of a pixel belonging to class $i$.
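The two STDFA steps, unmixing the coarse pixels by ordinary least squares (Equation (12)) and transferring each class's mean temporal change to the fine pixels of that class, can be sketched as follows. This is a single-band illustration with hypothetical function names; the full method also requires co-registered imagery and a fine-resolution land cover classification.

```python
import numpy as np

def class_mean_reflectance(coarse, fractions):
    """Solve the linear mixing model (Equation (12)) by ordinary
    least squares.

    coarse    : (P,) reflectances of P coarse pixels
    fractions : (P, n) fraction of each of n land cover classes
                inside every coarse pixel (from the land cover map)
    returns   : (n,) mean reflectance of each class
    """
    means, *_ = np.linalg.lstsq(fractions, coarse, rcond=None)
    return means

def stdfa_predict(fine_t0, labels, coarse_t0, coarse_tk, fractions):
    """Predict fine-resolution reflectance at tk: every fine pixel is
    shifted by the change in its class mean between t0 and tk."""
    m_t0 = class_mean_reflectance(coarse_t0, fractions)
    m_tk = class_mean_reflectance(coarse_tk, fractions)
    # labels holds the class index of each fine pixel; fancy indexing
    # broadcasts the per-class change onto the fine grid.
    return fine_t0 + (m_tk - m_t0)[labels]
```

With exact (noise-free) mixtures, the least-squares step recovers the class means exactly, so the prediction equals the base image plus each class's mean change.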

Model Application
According to Equation (7), three images are needed for ESTARFM: fine-resolution data acquired at time t0, called the base image, and two low-resolution datasets acquired at times t0 and tk, called the time series low-resolution data. Two pairs of fine-resolution and low-resolution datasets acquired at times t0 and tl are also required to calculate the spectral similarity index. According to Equation (12), three images are needed for STDFA: a fine-resolution dataset acquired at time t0, called the base image, and two low-resolution datasets acquired at times t0 and tk, called the time series low-resolution data. Two fine-resolution datasets acquired at times t0 and tl are also required for classification. The outputs of ESTARFM and STDFA are synthetic fine-resolution data at time tk. In this study, the HJ-1 CCD and GF-1 WFV data acquired on 3 October 2013 were used as the base images.

Validation of Results
Since the objective of the ESTARFM and STDFA methods was to generate synthetic fine-resolution data, actual HJ-1 CCD and GF-1 WFV data acquired on 7 October 2013 were used to validate the algorithms using the methods proposed by Wu et al. [44]. First, the results were qualitatively evaluated by visual interpretation: the greater the similarity between the synthetic and actual fine-resolution data, the higher the accuracy of the model. Second, the results were quantitatively evaluated using the correlation analysis method. The correlation coefficient (r), variance (Var), mean absolute difference (MAD), bias and root mean square error (RMSE) were calculated to quantify the precision of the models. A higher r and lower variance, MAD, bias and RMSE indicate higher accuracy.
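These evaluation metrics can be computed directly from the actual and synthetic images. A numpy sketch follows; the text does not state the exact definition of Var, so here it is taken (an assumption) as the variance of the difference image.

```python
import numpy as np

def fusion_accuracy(actual, synthetic):
    """Accuracy metrics comparing synthetic with actual imagery:
    correlation coefficient r, variance of the difference (assumed
    definition of Var), mean absolute difference, bias and RMSE."""
    a = np.asarray(actual, dtype=float).ravel()
    s = np.asarray(synthetic, dtype=float).ravel()
    diff = s - a
    return {
        "r": np.corrcoef(a, s)[0, 1],
        "var": diff.var(),
        "mad": np.abs(diff).mean(),
        "bias": diff.mean(),
        "rmse": np.sqrt((diff ** 2).mean()),
    }
```

For a synthetic image that differs from the actual one by a constant offset, r is 1 while bias, MAD and RMSE all equal the offset, which illustrates why several metrics are reported together.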

Accuracy Comparison for the Fusion Results Using 250 m and 500 m MODIS Data
There are two reflectance MODIS data products: MOD09GQ and MOD09GA. Both products can be used in spatial and temporal data fusion, but the question remains as to whether the spatial resolution differences between these two products have an impact on the fusion accuracy. To answer this question, we applied STDFA and ESTARFM using MOD09GQ data in Kuche and Luntai, and compared the fusion results with results obtained using MOD09GA data. The correlation analysis method was used to evaluate the similarity of the actual fine-resolution data and the synthetic fine-resolution data generated by inputting MOD09GQ and MOD09GA. By comparing the r, variance, MAD, bias and RMSE we can analyse the influence of spatial resolution differences for these two data products.

Comparison of Actual NDVI and NDVI Calculated Using Synthetic Data
The synthetic red- and near-infrared (NIR)-band data were generated using STDFA and ESTARFM in Kuche and Luntai, allowing Normalized Difference Vegetation Index (NDVI) images to be calculated for the two study areas. The correlation analysis method was then used to evaluate the similarity between the NDVI image calculated from synthetic fine-resolution data and the NDVI image calculated from actual fine-resolution data.
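NDVI is computed per pixel as (NIR − red)/(NIR + red). A small sketch with a guard against zero denominators:

```python
import numpy as np

def ndvi(red, nir):
    """NDVI = (NIR - red) / (NIR + red); pixels with a zero
    denominator are returned as NaN."""
    red = np.asarray(red, dtype=float)
    nir = np.asarray(nir, dtype=float)
    denom = nir + red
    out = np.full(denom.shape, np.nan)
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out
```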

HJ and MODIS Fusion Results
By inputting a base HJ CCD image, MOD09GA data for two dates and multi-spectral HJ CCD images for two dates, a synthetic multi-spectral HJ CCD image was generated by STDFA. By inputting a base HJ CCD image, MOD09GA data for two dates and two paired HJ CCD and MODIS datasets, a synthetic multi-spectral HJ CCD image was also generated by ESTARFM. These synthetic multi-spectral HJ CCD images contained four bands: blue, green, red and NIR. Figure 2a shows the actual observed MODIS surface reflectance in the red band acquired on 7 October 2013 in Kuche and Luntai. Figure 2b,c show the synthetic surface reflectance imagery in the red band generated by STDFA and ESTARFM, respectively, for the two study areas. Figure 2d shows the actual observed HJ CCD red-band surface reflectance acquired on 7 October 2013 in Kuche and Luntai. The actual high spatial resolution data acquired on 7 October 2013 were used to evaluate the accuracy of these models. Through visual interpretation, we found that the synthetic and actual HJ CCD data are very similar and hard to distinguish with the naked eye. Table 4 shows the results of the quantitative evaluation using the correlation analysis method. From Table 4, we can see that both STDFA and ESTARFM can generate synthetic HJ CCD data that are very similar to actual HJ CCD data, with r values greater than 0.8989. STDFA showed a slightly better performance than ESTARFM for Luntai and Kuche.

GF and MODIS Fusion Results
By inputting a base GF-1 WFV image, MOD09GA data for two dates and multi-spectral GF-1 WFV images for two dates, a synthetic multi-spectral GF-1 WFV image was generated by STDFA. By inputting a base GF-1 WFV image, MOD09GA data for two dates and two paired GF-1 WFV and MODIS datasets, a synthetic multi-spectral GF-1 WFV image was also generated by ESTARFM. These synthetic multi-spectral GF-1 WFV images contained four bands: blue, green, red and NIR. Figure 3a shows the actual observed MODIS surface reflectance in the red band acquired on 7 October 2013 in Kuche and Luntai. Figure 3b,c show the synthetic GF-1 WFV surface reflectance imagery in the red band generated by STDFA and ESTARFM, respectively, for the two study areas. Figure 3d shows the actual observed GF-1 WFV red-band surface reflectance acquired on 7 October 2013 in Kuche and Luntai.
Through visual interpretation, we found that the synthetic and actual GF-1 WFV data are very similar and were not distinguishable with the naked eye. Table 5 shows the results of the quantitative evaluation using the correlation analysis method. From Table 5, we also find that both STDFA and ESTARFM can generate synthetic GF-1 WFV data that are very similar to actual GF-1 WFV data, with r values greater than 0.8643. ESTARFM performed slightly better than STDFA in Kuche, while the accuracies of these two methods were very similar in Luntai.

Accuracy Comparison for the Fusion Results Using 250 m and 500 m MODIS Data
To analyse the influence of spatial resolution differences, we applied STDFA and ESTARFM using MOD09GQ data in Kuche and Luntai, and compared the fusion results with the results obtained using MOD09GA data. Table 6 shows the results of the correlation analysis between synthetic data generated using MOD09GQ data and actual data. Comparing Table 6 with Tables 4 and 5, we find that the spatial resolution differences had more impact on the fusion accuracy for ESTARFM than for STDFA in these two study areas. This is mainly because the MODIS reflectance is used directly to calculate the fine reflectance in ESTARFM, whereas it is used to calculate the mean reflectance of each land cover type in STDFA.

Comparison of Actual NDVI and NDVI Calculated Using Synthetic Data
NDVI images calculated using actual data were used to evaluate the quality of the synthetic NDVI images. Table 7 shows the results of the correlation analysis between synthetic and actual NDVI data. From Table 7, we find that the NDVI data calculated using synthetic data generated by ESTARFM were more similar to actual NDVI than the NDVI data calculated using synthetic data generated by STDFA.

Discussion
This study applied and demonstrated the use of ESTARFM and STDFA in combining HJ CCD, GF-1 WFV and MODIS data to generate high spatial resolution data. Although the quality of HJ CCD data is lower than that of Landsat data, both the ESTARFM and STDFA algorithms can generate daily synthetic high spatial resolution data accurately, with r values higher than 0.8643 and RMSEs lower than 0.0360. As the MODIS sensor can acquire daily images, ESTARFM and STDFA can be used to increase the proportion of useful HJ CCD and GF-1 WFV data. For example, the proportion of useful MODIS data in Luntai for 2013 is 48.49%, while the proportions of useful HJ CCD data and useful GF-1 WFV data are 15.07% and 10.29%, respectively. Owing to the high proportion of useful data, this method is potentially useful for high spatial resolution environmental process monitoring and can be applied in natural resource damage assessments and environmental policy and management. However, some issues should be addressed in the application of this method: (1) Influence of the satellite data quality. The test areas for the HJ-MODIS fusion and the GF-MODIS fusion in Kuche were the same, while the test area for the GF-MODIS fusion was a little larger than the area for the HJ-MODIS fusion in Luntai. Therefore, we can only analyse the influence of the satellite data quality on the data fusion for the Kuche area. As an early Chinese moderate-resolution satellite, the HJ satellite provides data of lower quality than the GF-1 satellite. The position error of HJ CCD data is greater than 1 km, while the position error of GF-1 WFV data is less than 100 m. Furthermore, the SNR of HJ CCD data is much lower than the SNR of GF-1 WFV data. Table 8 lists the differences between Table 5 and Table 4, showing the influence of the sensor differences on the model accuracy. From Table 8, we can see that the influence of the sensor differences is less significant for STDFA than for ESTARFM.
This is because STDFA has better noise immunity. Similar results were found for the fusion of ASTER and MODIS land surface temperature products [46]. (2) Influence of bidirectional reflectance distribution function (BRDF) changes. For different data acquisition dates, the solar and satellite azimuth and zenith angles differ (Table 9). This leads to changes in the BRDF and hence in the information received by the sensor. More seriously, it can lead to differences in the shadow direction and length for the same object on different dates. Figure 4 shows an example of the difference in the shadow direction and length for trees on 3 October 2013, 7 October 2013 and 15 October 2013 recorded by the GF-1 WFV sensor. In the shaded area, the NIR-band reflectance differences between actual and synthetic GF-1 WFV data can reach 0.0676 for STDFA and 0.0638 for ESTARFM. In the unshaded area, however, the differences were only 0.0078 for STDFA and 0.0153 for ESTARFM. Further study of this problem is therefore required. Figure 4. An example of the difference in shadow direction and length.
(3) The ESTARFM and STDFA algorithms were proposed to enhance the temporal resolution of high spatial resolution data, and high spatial and temporal resolution remote sensing data can be generated using them. However, spectral fusion is not considered by these methods, so they can only be used for spatial and temporal fusion; we recommend combining them with spectral fusion methods in the future to improve the spectral resolution of sensors [50]. In addition, these methods fuse optical images, which are easily affected by cloudy weather. Although they can improve the proportion of useful data, a significant proportion of the data will still be contaminated by cloud. The development of an optical and radar data fusion algorithm is therefore an important direction for multi-source remote sensing data studies.

Conclusions
In two study areas located in Xinjiang Province, China, the ESTARFM and STDFA methods were applied to combine HJ CCD, GF-1 WFV and MODIS reflectance data. We draw the following conclusions: (1) Both the ESTARFM and STDFA methods can be applied to combine HJ CCD and MODIS reflectance data, and GF-1 WFV and MODIS reflectance data. The synthetic HJ CCD data generated by the two methods are very similar to actual HJ CCD data, with r values greater than 0.8989, and the synthetic GF-1 WFV data are very similar to actual GF-1 WFV data, with r values greater than 0.8643. (2) The accuracy differences between the fusion of HJ CCD and MODIS data and the fusion of GF-1 WFV and MODIS data were lower for STDFA than for ESTARFM; STDFA had better noise immunity than ESTARFM. (3) The spatial resolution differences between MOD09GQ and MOD09GA had a more significant impact on the fusion accuracy for ESTARFM than for STDFA. (4) Synthetic red- and NIR-band data generated by ESTARFM are more suitable for the calculation of NDVI than the data generated by STDFA.