A Modified Spatiotemporal Fusion Algorithm Using Phenological Information for Predicting Reflectance of Paddy Rice in Southern China

Satellite data for studying surface dynamics in heterogeneous landscapes are missing due to frequent cloud contamination, low temporal resolution, and technological difficulties in developing satellites. A modified spatiotemporal fusion algorithm for predicting the reflectance of paddy rice is presented in this paper. The algorithm uses phenological information extracted from a moderate-resolution imaging spectroradiometer enhanced vegetation index time series to improve the enhanced spatial and temporal adaptive reflectance fusion model (ESTARFM). The algorithm is tested with satellite data on Yueyang City, China. The main contribution of the modified algorithm is the selection of similar neighborhood pixels by using phenological information to improve accuracy. Results show that the modified algorithm performs better than ESTARFM in visual inspection and quantitative metrics, especially for paddy rice. This modified algorithm provides not only new ideas for the improvement of spatiotemporal data fusion method, but also technical support for the generation of remote sensing data with high spatial and temporal resolution.


Introduction
With the rapid progress of remote sensing techniques, the need for remotely sensed images with high temporal, spatial, and spectral resolution has increased [1][2][3].In particular, remotely sensed images with high spatial resolution and frequent coverage are needed in the monitoring of global biophysical processes, which change quickly during growing season.However, satellite data with high temporal and spatial resolutions are lacking due to the cost, long revisit cycles [4], frequent cloud contamination [5], capacity of satellite platforms [6], and technological difficulties in developing satellites.This situation presents significant disadvantages and challenges in observing and monitoring ground status in a timely and effective manner [7].Moreover, detecting the condition of rice growth in the southern area of China is difficult because of missing data caused by frequent cloudiness and precipitation.Meanwhile, the growth period of rice is short, and rice changes rapidly in its growing period.Hence, the shortage of remote sensing data on the main growth periods of rice hinders the development of research on rice, such as research involving rice condition monitoring and crop yield assessment.
The Thematic Mapper (TM), Enhanced Thematic Mapper Plus (ETM+), and Operational Land Imager (OLI) sensors aboard the Landsat satellites and the moderate-resolution imaging spectroradiometer (MODIS) sensor aboard the Terra and Aqua satellites provide two types of widely used satellite data [8].Landsat is suitable for long-term surface change monitoring at the regional scale due to its spatial resolution of 30 m.The most commonly used satellite sensor for mapping biophysical vegetation parameters and land cover types is Landsat [9].However, the 16-day revisit cycle of Landsat and the 35% average cloud cover of images [10] have long limited the use of Landsat in studying global biophysical processes that evolve rapidly during the growing season [11].By contrast, MODIS has a high temporal resolution and covers the Earth multiple times a day.However, it has spatial resolutions of 250, 500, and 1000 m [12], which limit the sensors' capability to quantify biophysical processes in heterogeneous landscapes.By combining Landsat and MODIS data, we can capitalize on the spatial detail of Landsat and the temporal regularity of MODIS acquisition [13].
Satellite image fusion is an effective, feasible, and inexpensive means to solve this problem.Satellite image fusion is the synergistic blending of multiple colocated images that possess distinct yet complementary attributes, and the goal is to produce a result that mitigates or transcends the individual limitations of each contributing dataset.Typical fusion merges low-temporal/high-spatial resolution (fine-resolution) data with high-temporal/low-spatial resolution (coarse-resolution) data (e.g., Landsat and MODIS) to increase the availability of detailed imagery.The most widely used fusion algorithm is the spatial and temporal adaptive reflectance fusion model (STARFM) [14] developed by Gao et al. [11].The algorithm has achieved success in monitoring seasonal changes in vegetation [15].However, this algorithm does not explicitly address the directional dependence of reflectance as a function of the sun-target-sensor geometry described by the bidirectional reflectance distribution function (BRDF) [10] and the mixed pixel problem.For the BRDF problem, Roy et al. [10] proposed a semiphysical fusion approach that uses the MODIS BRDF/albedo land surface product and Landsat ETM+ data to predict ETM+ reflectance on the same, antecedent, or subsequent data [4] to solve the limitation.To address the shortcomings of STARFM, Zhu [4] developed the enhanced spatial and temporal adaptive reflectance fusion model (ESTARFM), which adds a pair of satellite data to deduce the trend of the change in land on the basis of STARFM.Studies [4] have shown that ESTARFM improves the accuracy of the predicted fine-resolution reflectance, especially for heterogeneous landscapes, and preserves spatial details.The input data of ESTARFM limit the use of the algorithm.Consequently, flexible spatiotemporal data fusion (FSDAF) that requires minimal input data and can capture gradual land cover type change was proposed [16].Furthermore, Gevaert (Gevaert and García-Haro 2015) introduced the spatial and temporal reflectance unmixing model (STRUM), which can produce an accurate reconstruction of the normalized difference vegetation index (NDVI) trajectory with the unmixing-based method via experiments that simulate situations wherein few input high-resolution images are available.However, current STARFM-based models can only capture temporal changes caused by phenology and cannot capture changes in land cover type [4,17].Hilker et al. [15] developed the spatial-temporal adaptive algorithm for mapping reflectance change (STAARCH) model.The STAARCH model employs Landsat images taken at the start and end dates of an observation period as inputs to predict Landsat surface reflectance.By selecting the optimal Landsat acquisition, this method can determine spatial changes from Landsat and temporal changes from MODIS.Afterward, Huang proposed the unmixing-based spatiotemporal reflectance fusion model (U-STFM) to estimate the reflectance change trend without reference to the change type or land cover change.This model was proven to be more capable of capturing both types of change compared with STARFM and ESTARFM [18].FSDAF [16] also closely captures reflectance changes caused by land cover conversions.
Nevertheless, all these methods are linear models that do not consider reflectance of each endmember nonlinear change.The prior assumptions of current STARFM-based models is that the reflectance change rate of each endmember is stable [4].This assumption is reasonable for short periods, but it might cause larger errors in some time, such as phenological change of vegetation.Paddy rice in the same region might be in different phenological periods due to different planting/harvest schedules and local environmental conditions (soil water content, nutrition, and health) in southern China [12].The spatiotemporal fusion method must be made highly sophisticated to suit cases where vegetation phenology changes between the base time and the prediction time.This study proposes a nonlinear algorithm based on ESTARFM that considers phenological changes through the vegetation index curve for the fusion of Landsat-8 OLI and MODIS images.We divide the rice growth process into four different phenology periods and assume that the growth rate of rice in the same phenology period is stable.We reclassify similar pixels (neighboring pixels with the same land cover type as the central pixel are called ''similar" pixels) of rice on the basis of this theory to ensure that pixels that participate in fusion calculation (similar pixels) appear reasonable and resemble the central pixel.The new algorithm consists of four stages, including construction of enhanced vegetation index (EVI) time series, extraction of the rice phenology period, creation of new rules of searching for similar neighborhood pixels, and fusion with the modified algorithm using phenological information.This method is a promising choice for cases where vegetation phenology changes occur from the time of the available image pair to prediction and is helpful in monitoring the dynamics of surface vegetation in situations where data are missing.

Proposed Methodology
The ultimate task of the spatiotemporal data fusion method is to structure an image at prediction time by using two sets remote sensing data with different resolutions at base time.The two different sets remote sensing data should be similar in wavelength range, response function, and time of passing territory [19].Landsat and MODIS data have good consistency.ESTARFM is the most popular and widely accepted algorithm in remote sensing.The advantage of the time series of MODIS vegetation indices can effectively cover the shortage of the information missing of Landsat data [20].Hence, our study proposes a modified algorithm using phenological information extracted from time series data based on ESTARFM to fuse Landsat and MODIS data.We first extract rice in the study area from a Landsat image and structure the EVI time series in the study area through annual MODIS eight-day composite reference data.Second, we extract and smooth the EVI curve of paddy rice and divide it into four main phenology periods.Third, we change the rule of selecting similar neighborhood pixels by using the phenological information extracted from the EVI curve.Finally, we structure the image of rice in the key growth period with MODIS and Landsat-8 OLI data by using the new spatiotemporal algorithm that is modified by the phenological information of paddy rice.The flow chart of this algorithm is shown in Figure 1.

Essential Theories of ESTARFM
ESTARFM, which is based on STARFM, considers the spectral and spatial similarity between pixels and uses two pairs of base time images and one coarse-resolution image to construct an image with high spatial-temporal resolution at prediction time.It searches for similar pixels with the center pixel in a moving window within a fine-resolution image.Then, it calculates the weight of similar pixels and the conversion coefficient of each similar pixel in a search window by using spatial and spectral similarity.The predicted reflectance of the center pixel can be estimated as follows: where w is the width of the search window, t 0 is the base date, t p is the prediction time, N is the number of similar pixels, (x w/2 , y w/2 ) is the location of the central pixel, (x i , y i ) is the location of the ith similar pixel, W i is the weight of the ith similar pixel, and V i is the conversion coefficient of the ith similar pixel.F(x w/2, y w/2 , t p , B) is the reflectance of the central pixel of band B at prediction time of fine resolution, F(x w/2 , y w/2 , t 0 , B) is the reflectance of the central pixel of band B at base time of fine resolution, C(x i , y i , t p , B) is the reflectance of the similar pixel of band B at prediction time of coarse resolution, and C(x i , y i , t 0 , B) is the reflectance of the similar pixel of band B at base time of coarse resolution.
W i , which decides the contribution of the ith similar pixel to the prediction of reflectance change at the central pixel, is expressed as follows: where d i is the geographic distance between the ith similar pixel and the central pixel, R i is the correlation coefficient between each similar pixel and its correlation coarse-resolution pixel that determines the spectral similarity, and D i is a synthetic index that combines spectral and geographic distances.
Remote Sens. 2018, 10, x FOR PEER REVIEW 4 of 18 Wi, which decides the contribution of the ith similar pixel to the prediction of reflectance change at the central pixel, is expressed as follows: where di is the geographic distance between the ith similar pixel and the central pixel, Ri is the correlation coefficient between each similar pixel and its correlation coarse-resolution pixel that determines the spectral similarity, and Di is a synthetic index that combines spectral and geographic distances.To acquire accurate predicted reflectance, ESTARFM uses data at two different times that contribute to different temporal weights for predicting reflectance.Hence, the final predicted fineresolution reflectance at predicted time tp is calculated as where Tk (k = m or n) is the temporal weight between time tk and prediction time tp.
The main theoretical basis of ESTARFM is to maximize the utilization of image correlation and minimize the system error.ESTARFM improves the accuracy of predicted fine-resolution reflectance, especially for heterogeneous landscapes, and preserves spatial details [4].To acquire accurate predicted reflectance, ESTARFM uses data at two different times that contribute to different temporal weights for predicting reflectance.Hence, the final predicted fine-resolution reflectance at predicted time t p is calculated as where T k (k = m or n) is the temporal weight between time t k and prediction time t p .
The main theoretical basis of ESTARFM is to maximize the utilization of image correlation and minimize the system error.ESTARFM improves the accuracy of predicted fine-resolution reflectance, especially for heterogeneous landscapes, and preserves spatial details [4].

The Modified Spatiotemporal Algorithm Using Phenological Information for Predicting the Reflectance of Paddy Rice
A prior assumption in ESTARFM is that the proportion of each endmember and the reflectance change rate of each endmember are stable [4].This assumption is reasonable during relatively short periods in which the proportion of each endmember and the change rate of the reflectance of each endmember are stable.However, reflectance change might not be linear in several situations, such as phenological change of vegetation.This assumption might not be reasonable between base time and prediction time because paddy rice has a short growth cycle and quick changes occur in the south area of China.In this study, we divide the paddy rice growth process into different phenologyperiods and fuse these phenological information to reduce errors and improve algorithm accuracy.

Construction of the EVI Time Series of Coarse-Resolution Images
Phenology can be estimated from the change patterns of spectral vegetation indices captured by satellites (Xiao et al. 2006).Studies have shown that the annual time series of MODIS can be used to monitor vegetation phenology successfully [21,22].We take annual MODIS images (46 tiles) and land classification of the study area to obtain the EVI time series of paddy rice.The calculation formula of EVI is EVI is an improved vegetation index that considers the effects of residual atmospheric contamination and soil background.It is much less sensitive to aerosols than NDVI and does not easily reach saturation in regions covered with thick vegetation [23,24].MODIS EVI can map crop phenology stages in highly intense agricultural areas [25].Therefore, the change in vegetation growth can be objectively reflected by EVI.In summary, employing EVI for time series construction of rice is appropriate.
Although MODIS eight-day composite surface reflectance products are strictly preprocessed to reduce the impact of clouds, shadows, and aerosols, obvious residual noises still exist in regions with lasting heavy clouds, and this exerts a serious impact on information extraction [26].The EVI time series curve needs to be smoothed for noise removal.The TIMESAT [27] program (http://www.natgeo.lu.se/personal/Lars.Eklundh/TIMESAT/timesat.html) can effectively smooth time series data and reduce noise.It reveals the periodical change rule of phenology on the basis of maintaining the original shape of time series data.We select the Savitzky-Golay filter [28] to smooth the EVI time series in the TIMESAT program.The Savitzky-Golay method essentially performs local polynomial regression in the time series to determine the smoothed value for each point, and the main characteristic of this filter is that it can ensure that the shape and width of the signal are invariable during noise filtering.This method is suitable for smoothing EVI time series to determine the rule of vegetation growth.We then use linear interpolation to generate the annual EVI time series of paddy rice.

Extraction of Paddy Rice Phenology Period from EVI Time Series
In this part, we divide paddy rice growing period into four main phenology periods, which are the tillering, jointing, heading, and milk-ripe periods, respectively.We apply the vegetation index dynamic threshold method to define the start and the end of the growing season because it eliminates the influence of the background value and overcomes the inherent defect of fixed threshold.We use 20% over the minimum value of EVI to define the start and end of the growing season.The tillering and jointing periods are the beginning of the development of rice root and leaf systems, since then the EVI increases rapidly.The vegetative development of rice reaches the maximum till the heading period, and then rice changes its growth phase from vegetative growth to reproductive growth.The nutrients are transferred to seeds, and EVI decreases gradually.In the maturation period, the leaves lose most of the chlorophyll and begin to wither rapidly.We select the maximum/minimum slope method [29] that judges the growth stages by the magnitude of variation in the time series vegetation index to estimate the beginning of the jointing period and the end of the heading period.The maximum point of time series EVI determines the start of the rice heading period.
The process for the identification of the main growth stages of paddy rice using MODIS data is shown in Figure 2.
Remote Sens. 2018, 10, x FOR PEER REVIEW 6 of 18 index to estimate the beginning of the jointing period and the end of the heading period.The maximum point of time series EVI determines the start of the rice heading period.
The process for the identification of the main growth stages of paddy rice using MODIS data is shown in Figure 2.

Creation of a New Rule of Searching for Similar Neighborhood Pixels
The pixels within the window with the same land cover type as the central pixel ("similar pixels") provide specific temporal and spatial information to compute fine-resolution reflectance for the central pixel [4].To reduce the adverse impact of misclassification, we select the threshold method to select similar pixels within a local search window.The threshold is determined by the standard deviation of a population of pixels from the fine-resolution image and the estimated number of land cover classes of the image [11].When all of the bands for the ith neighbor pixel satisfy Function (7), the ith neighbor pixel is selected as a similar pixel in ESTARFM.
where σ(B) is the standard deviation of reflectance for band B and m is the estimated number of classes.The original ESTARFM uses two fine-resolution images to select similar pixels and then extracts the intersection of the two results to acquire an accurate set of similar pixels.Nevertheless, the original ESTARFM is inapplicable to cases where vegetation is in different phenology stages.
In such cases, the reflectance of similar pixels does not change linearly (or changes in the same trend).For example, in Figure 3, the central pixel is a rice pixel in the heading period, and all of the colored pixels are similar pixels selected by the original ESTARFM.The red pixels in the heading period decline in the EVI time series curve.Although the yellow pixels in milk-ripe period show the same trend as the red pixels, the rate of change is different between the red and yellow pixels.The rate of change of red pixels is faster than that of yellow pixels.Therefore, calculating the predicted reflectance by using yellow pixel might not be appropriate.The green pixels in the jointing period and the blue pixels in the tillering period show a rising trend in the EVI time series curve, which indicates that the growth of rice at this time is flourishing.Using the green and blue pixels in the

Creation of a New Rule of Searching for Similar Neighborhood Pixels
The pixels within the window with the same land cover type as the central pixel ("similar pixels") provide specific temporal and spatial information to compute fine-resolution reflectance for the central pixel [4].To reduce the adverse impact of misclassification, we select the threshold method to select similar pixels within a local search window.The threshold is determined by the standard deviation of a population of pixels from the fine-resolution image and the estimated number of land cover classes of the image [11].When all of the bands for the ith neighbor pixel satisfy Function (7), the ith neighbor pixel is selected as a similar pixel in ESTARFM.
where σ(B) is the standard deviation of reflectance for band B and m is the estimated number of classes.The original ESTARFM uses two fine-resolution images to select similar pixels and then extracts the intersection of the two results to acquire an accurate set of similar pixels.Nevertheless, the original ESTARFM is inapplicable to cases where vegetation is in different phenology stages.In such cases, the reflectance of similar pixels does not change linearly (or changes in the same trend).For example, in Figure 3, the central pixel is a rice pixel in the heading period, and all of the colored pixels are similar pixels selected by the original ESTARFM.The red pixels in the heading period decline in the EVI time series curve.Although the yellow pixels in milk-ripe period show the same trend as the red pixels, the rate of change is different between the red and yellow pixels.The rate of change of red pixels is faster than that of yellow pixels.Therefore, calculating the predicted reflectance by using yellow pixel might not be appropriate.The green pixels in the jointing period and the blue pixels in the tillering period show a rising trend in the EVI time series curve, which indicates that the growth of rice at this time is flourishing.Using the green and blue pixels in the calculation of predicted reflectance is unreasonable because their reflectance change trends are not the same.
To reduce the impact of dissimilar pixels, we use the rice EVI time series curve to reveal the process of rice growth.First, we acquire the land cover type of the study area using the support vector machine method (SVM) from the first fine-resolution image.The overall accuracy of SVM classification is 94.23% and the kappa coefficient is 0.8844, which shows that the classification result is convincing and can satisfy the needs of the study.Second, for the vegetation pixels, we calculate the EVI of the central pixel in the fine-resolution images to determine which phenology period in which the central pixel exists.Third, we calculate the EVI of similar pixels selected by the original ESTARFM to determine the phenology period in which they reside.The new similar pixels are reselected by the EVI threshold of different phenology periods from the previous similar pixels selected by the original ESTRFM.Therefore, the new similar pixels within the search window not only have the same land cover type but are also in the same phenology period in which the central pixel exists.For nonvegetation, the predicted reflectance is estimated using the original ESTARFM method.In this manner, we can ensure that the reflectance change is linear in the phenologyical change of vegetation.Moreover, the assumption that the proportion of each endmember and the reflectance change rate of each endmember are stable is reasonable through the algorithm because the vegetation change rate is stable and nearly linear in the same phenology period.
Remote Sens. 2018, 10, x FOR PEER REVIEW 7 of 18 calculation of predicted reflectance is unreasonable because their reflectance change trends are not the same.
To reduce the impact of dissimilar pixels, we use the rice EVI time series curve to reveal the process of rice growth.First, we acquire the land cover type of the study area using the support vector machine method (SVM) from the first fine-resolution image.The overall accuracy of SVM classification is 94.23% and the kappa coefficient is 0.8844, which shows that the classification result is convincing and can satisfy the needs of the study.Second, for the vegetation pixels, we calculate the EVI of the central pixel in the fine-resolution images to determine which phenology period in which the central pixel exists.Third, we calculate the EVI of similar pixels selected by the original ESTARFM to determine the phenology period in which they reside.The new similar pixels are reselected by the EVI threshold of different phenology periods from the previous similar pixels selected by the original ESTRFM.Therefore, the new similar pixels within the search window not only have the same land cover type but are also in the same phenology period in which the central pixel exists.For nonvegetation, the predicted reflectance is estimated using the original ESTARFM method.In this manner, we can ensure that the reflectance change is linear in the phenologyical change of vegetation.Moreover, the assumption that the proportion of each endmember and the reflectance change rate of each endmember are stable is reasonable through the algorithm because the vegetation change rate is stable and nearly linear in the same phenology period.

Calculation of the Prediction Reflectance
After performing all preparatory work, we estimate the reflectance of prediction time by using two fine-resolution images and three coarse-resolution images.According to Function (1), we should know the weight of similar pixels (Wi) and the conversion coefficient (Vi) between coarse-and fineresolution images.For the weight of similar pixels, higher similarity and smaller distance of the similar pixel to the central pixel produce a higher weight for the similar pixel [16].This relation indicates that the selection of similar pixels is vital for the spatiotemporal algorithm.We suppose that the reflectance of the surface pixel (mixed pixel) is modeled as a linear combination of the reflectance of different land cover components [30].Similar pixels whose corresponding coarse-resolution pixels are pure are given the largest weight (e.g., if there are n similar pixels, then the weight of the pure similar pixel is 1/n) to ensure that the reflectance predicted by similar pixels is accurate and to reduce the impact of the mixed pixel in the heterogenous region.Conversion coefficient Vi can be acquired

Calculation of the Prediction Reflectance
After performing all preparatory work, we estimate the reflectance of prediction time by using two fine-resolution images and three coarse-resolution images.According to Function (1), we should know the weight of similar pixels (W i ) and the conversion coefficient (V i ) between coarse-and fine-resolution images.For the weight of similar pixels, higher similarity and smaller distance of the similar pixel to the central pixel produce a higher weight for the similar pixel [16].This relation indicates that the selection of similar pixels is vital for the spatiotemporal algorithm.We suppose that the reflectance of the surface pixel (mixed pixel) is modeled as a linear combination of the reflectance of different land cover components [30].Similar pixels whose corresponding coarse-resolution pixels are pure are given the largest weight (e.g., if there are n similar pixels, then the weight of the pure similar pixel is 1/n) to ensure that the reflectance predicted by similar pixels is accurate and to reduce the impact of the mixed pixel in the heterogenous region.Conversion coefficient V i can be acquired by linear regression analysis of each similar pixel from the coarse-and fine-resolution reflectance at base time in a search window.As shown in Figure 3, a search window contains 8 intact and 12 partial coarse-resolution pixels.To prevent the condition wherein similar pixels that are located in a partial coarse-resolution pixel are too few to establish a reliable regression, the original ESTARFM supposes that similar pixels within the same coarse pixel have the same conversion coefficient [4].Specifically, the similar pixels outside the search window are used to build regression and obtain the conversion coefficient for the partial coarse-resolution pixel, but they are not used to estimate the reflectance of the central pixel because they are outside the search window.
The basic concept of the ESTARFM-based method is to fully use the spectral similarity and spatial information of neighborhood pixels.Temporal information is important for the fusion because fine-resolution samples that are closer in date to the prediction date should have closer reflectance values.Therefore, a temporal weight is set up, which can be calculated according to the change magnitude detected by the resampled coarse-resolution reflectance between the base time and prediction time.Prediction reflectance is then calculated by the weight of similar pixels, conversion coefficient, and temporal weight through Function (5).

Study Area
The study area was located in the city of Yueyang along the middle of the Changjiang River in the northeast part of Hunan Province, China (Figure 4).It covers an area from 28 • 25 33"N to 29 • 51 00"N and 12 • 18 31"E to 114 • 09 06"E.It lies between the Dongting Lake Plain and Jianghan Plain and is famous for being a fish, rice, and commodity grain base in China.Its territory is characterized by red, yellow, and paddy soils.Yueyang is a northern subtropical monsoon humid climate zone with four distinct seasons, abundant rainfall, long warm periods, and short cold periods.The average annual rainfall is 1289.8-1556.2mm, and the annual average temperature is between 16.5 • C and 17.2 • C, which is highly suitable for rice planting.
by linear regression analysis of each similar pixel from the coarse-and fine-resolution reflectance at base time in a search window.As shown in Figure 3, a search window contains 8 intact and 12 partial coarse-resolution pixels.To prevent the condition wherein similar pixels that are located in a partial coarse-resolution pixel are too few to establish a reliable regression, the original ESTARFM supposes that similar pixels within the same coarse pixel have the same conversion coefficient [4].Specifically, the similar pixels outside the search window are used to build regression and obtain the conversion coefficient for the partial coarse-resolution pixel, but they are not used to estimate the reflectance of the central pixel because they are outside the search window.
The basic concept of the ESTARFM-based method is to fully use the spectral similarity and spatial information of neighborhood pixels.Temporal information is important for the fusion because fine-resolution samples that are closer in date to the prediction date should have closer reflectance values.Therefore, a temporal weight is set up, which can be calculated according to the change magnitude detected by the resampled coarse-resolution reflectance between the base time and prediction time.Prediction reflectance is then calculated by the weight of similar pixels, conversion coefficient, and temporal weight through Function (5).

Study Area
The study area was located in the city of Yueyang along the middle of the Changjiang River in the northeast part of Hunan Province, China (Figure 4).It covers an area from 28°25′33″N to 29°51′00″N and 12°18′31″E to 114°09′06″E.It lies between the Dongting Lake Plain and Jianghan Plain and is famous for being a fish, rice, and commodity grain base in China.Its territory is characterized by red, yellow, and paddy soils.Yueyang is a northern subtropical monsoon humid climate zone with four distinct seasons, abundant rainfall, long warm periods, and short cold periods.The average annual rainfall is 1289.8-1556.2mm, and the annual average temperature is between 16.5 °C and 17.2 °C, which is highly suitable for rice planting.
Paddy rice is the main crop of Yueyang.In this study, we used 1000 × 1000 pixels of a Landsat image as the study area.The image contained four types of coverage, including water, building, rice, and non-paddy vegetation.Obtaining Landsat images of the main growth periods of paddy rice was difficult because of the frequent precipitation and cloudy sky in this area.Paddy rice is the main crop of Yueyang.In this study, we used 1000 × 1000 pixels of a Landsat image as the study area.The image contained four types of coverage, including water, building, rice, and non-paddy vegetation.Obtaining Landsat images of the main growth periods of paddy rice was difficult because of the frequent precipitation and cloudy sky in this area.

Satellite Data and Preprocessing
The primary data used in this study were from Landsat-8 OLI and MODIS.We used Landsat-8 surface reflectance products with a spatial resolution of 30 m.These products are available from the United States Geological Survey (https://espa.cr.usgs.gov/index/).The MODIS eight-day composite reflectance data (MOD09A1), which were selected from the highest-quality observations on cloud states, aerosol quantity, and view angle with a spatial resolution of 500 m, were obtained from the National Aeronautics and Space Administration (https://ladsweb.modaps.eosdis.nasa.gov/).These standard MODIS data were converted in terms of their format and projection by using the MODIS Reprojection Tool provided by the Land Processes Distributed Active Archive Centre.Table 1 summarizes the acquisition dates and types of remote sensing data.A brief description of the selected data is provided below.
We assembled predominantly clear (<2% cloud cover) Landsat-8 OLI scenes for the study.However, we could not obtain all of the images of these dates from the same number of row/line because of the frequent atmospheric contamination.Luckily, the study area covered several different numbers of rows/lines.Two scenes of Landsat-8 images (5 June, 30 July) were used as input data in the temporal-spatial fusion algorithm, and one scene (23 July) was used for the accuracy evaluation.The MOD09A1 series products (46 tiles for a year) were used to extract the rice EVI curve of the study area in 2016, and three tiles were used as input data in the temporal-spatial fusion algorithm.All images were collected and cut to the size of the study area.A land cover map with 30 m resolution of the study area was obtained from Landsat-8 surface reflectance by using the classification of SVM and was combined with surface investigation.All MODIS data were resampled to the same resolution with as the Landsat data.

Algorithm Implementation
The algorithms were applied to the Landsat-8 OLI and MODIS images.Six bands (blue, green, red, NIR, SWIR1, and SWIR2) of Landsat-8 OLI participated in the fusion.However, only green, red, and NIR, which corresponded to bands 3, 4, and 5 of Landsat-8 OLI and bands 4, 1, and 2 of MODIS, respectively, were used in the accuracy rating.We set the size of the search window to 3 for MODIS 500 m resolution pixels or 50 for Landsat 30 m resolution pixels (1500 m × 1500 m) due to the difficulty of land cover types, fragmented land surface, and topographic condition of the study area.The number of land cover was set to four, namely, building, water, paddy rice, and non-paddy vegetation.The land cover changes were small during this short observation period, and most spectral changes were caused by phenological change.The other parameters were similar to those in the work in Zhu et al. [4].

Results
In this section, our new spatiotemporal data fusion algorithm is evaluated through a comparison with the well-known ESTARFM.We use the 23 July 2016 (208 DOY) Landsat surface reflectance image as a validation source to assess the accuracy of the algorithms.The results estimated by the two algorithms are shown in the following three parts.

Comparison with an Actual Image in Visual Details
Figure 6 presents actual Landsat-8 OLI images and Landsat-like images produced by spatiotemporal data fusion.(a,d) are actual images, and (b,e) and (c,f) are images produced by ESTARFM and the modified algorithm using phenological information, respectively.Among these images, (a)-(c) are true-color composite images (red, green, and blue), and (d)-(f) are standard falsecolor composite images (NIR, red, and green).In terms of the visual effect, ESTARFM and the modified algorithm using phenological information perform well in the study area, and their predicted images are similar to the actual image acquired from Landsat-8 OLI.The two algorithms perform well in predicting the vegetation, building, and river path in the images, but the lake (which is beside the land) is not distinct enough.Moreover, no patch exists in the two images.Clear land and water boundaries can be predicted with the two algorithms.Linear objects, such as road and artificial river, are also distinct when the two algorithms are used.Although the resemblance between these images is remarkable, a difference exists between the prediction and observation images in detail.In Figure 7, the image predicted by ESTARFM is vague along the road in the south of the image (the lower rows), and some spatial details are smoothed, such as buildings.Meanwhile, the images produced by the modified algorithm do not have this problem.The image produced by ESTARFM cannot distinguish small surface objects and the texture information within surface objects.Generally, ESTARFM and the modified algorithm perform well in predicting images in visual inspection, but the modified algorithm performs better than ESTARFM in detail.

Comparison with an Actual Image in Reflectance of Paddy Rice
To quantitatively assess the accuracy of the two algorithms, we employed the regression coefficient of the linear fitting equation (ρ), correlation coefficient (r), and root mean square error (RMSE) of predicted reflectance compared with real reflectance.ρ represents the prediction accuracy.The larger the value is, the higher the precision is (the maximum is 1; when ρ = 1, the prediction image is similar to the observed image).r is a statistical index that studies the degree of the linear correlation

Results
In this section, our new spatiotemporal data fusion algorithm is evaluated through a comparison with the well-known ESTARFM.We use the 23 July 2016 (208 DOY) Landsat surface reflectance image as a validation source to assess the accuracy of the algorithms.The results estimated by the two algorithms are shown in the following three parts.

Comparison with an Actual Image in Visual Details
Figure 6 presents actual Landsat-8 OLI images and Landsat-like images produced by spatiotemporal data fusion.(a,d) are actual images, and (b,e) and (c,f) are images produced by ESTARFM and the modified algorithm using phenological information, respectively.Among these images, (a)-(c) are true-color composite images (red, green, and blue), and (d)-(f) are standard false-color composite images (NIR, red, and green).In terms of the visual effect, ESTARFM and the modified algorithm using phenological information perform well in the study area, and their predicted images are similar to the actual image acquired from Landsat-8 OLI.The two algorithms perform well in predicting the vegetation, building, and river path in the images, but the lake (which is beside the land) is not distinct enough.Moreover, no patch exists in the two images.Clear land and water boundaries can be predicted with the two algorithms.Linear objects, such as road and artificial river, are also distinct when the two algorithms are used.Although the resemblance between these images is remarkable, a difference exists between the prediction and observation images in detail.In Figure 7, the image predicted by ESTARFM is vague along the road in the south of the image (the lower rows), and some spatial details are smoothed, such as buildings.Meanwhile, the images produced by the modified algorithm do not have this problem.The image produced by ESTARFM cannot distinguish small surface objects and the texture information within surface objects.Generally, ESTARFM and the modified algorithm perform well in predicting images in visual inspection, but the modified algorithm performs better than ESTARFM in detail.

Comparison with an Actual Image in Reflectance of Paddy Rice
To quantitatively assess the accuracy of the two algorithms, we employed the regression coefficient of the linear fitting equation (ρ), correlation coefficient (r), and root mean square error (RMSE) of predicted reflectance compared with real reflectance.ρ represents the prediction accuracy.The larger the value is, the higher the precision is (the maximum is 1; when ρ = 1, the prediction image is similar to the observed image).r is a statistical index that studies the degree of the linear correlation between variables.The function is as follows: where x is the value of each pixel in predicted images, y is the value of each pixel in observed images, and x, y are the mean values of x and y, respectively.The value of r is larger, which means the relevance of x and y is larger.RMSE measures the deviation of the predicted values from true value and detects the consistency between the two values.The function is as follows: where n is the number of pixels participating in the calculation.The smaller the value of RMSE is, the denser the data are.The modified algorithm proposed in this paper mainly aims to improve the reflectance of paddy rice in the main phenology period.To evaluate the accuracy of the reflectance of paddy rice, we extracted all paddy rice pixels from the prediction and actual images for comparisons.
The results are shown in Table 2.
Remote Sens. 2018, 10, x FOR PEER REVIEW 11 of 18 where x is the value of each pixel in predicted images, y is the value of each pixel in observed images, and ⎯x, ⎯y are the mean values of x and y, respectively.The value of r is larger, which means the relevance of x and y is larger.RMSE measures the deviation of the predicted values from true value and detects the consistency between the two values.The function is as follows: where n is the number of pixels participating in the calculation.The smaller the value of RMSE is, the denser the data are.The modified algorithm proposed in this paper mainly aims to improve the reflectance of paddy rice in the main phenology period.To evaluate the accuracy of the reflectance of paddy rice, we extracted all paddy rice pixels from the prediction and actual images for comparisons.The results are shown in Table 2.     To display the results clearly, we employ a scatter diagram to show the density distribution of data using MATLAB.The scatter diagram in Figure 8 shows the reflectance relationship between the predicted values of the Landsat-like image and the actual values of the Landsat-8 OLI image.The line in the scatter diagram is 1:1.The points that are close to the line indicate that the algorithm can capture the reflectance change in surface objects and can achieve high accuracy in predicting the reflectance of the pixels.The color yellow means high centralization of points.For the red band, ρ is comparable for the two algorithms (0.2956 of ESTARFM vs. 0.3665 of the modified algorithm).This result means that the image produced by the modified algorithm is more similar to the observed image than that of ESTARFM.The r of the modified algorithm is slightly better than that of ESTARFM.Its RMSE is smaller than that of ESTARFM, which illustrates that the modified algorithm performs better in maintaining consistency.Higher consistency means more spatial features are maintained.From the aspect of point distribution in the scatter diagram, the points on the left of the scatter diagram in Figure 8a are too many, which means the values predicted by ESTARFM are underestimated too much when the value of reflectance of paddy rice is small.For the green band, the ρ and RMSE of the modified algorithm are better than those of ESTARFM.The r of the modified algorithm is worse than that of ESTARFM, but it does not influence the overall precision.The point distribution in the scatter diagram of ESTARFM in Figure 8c has the same problem as that in Figure 8a.However, this time, the green band of the modified algorithm performs well in keeping points around the line when the value of reflectance of paddy rice is small in scatter diagram (d).For the NIR band, the RMSE and r of the two algorithms are almost the same.ρ is improved (0.8855 of ESTARFM vs. 0.9173 of the modified algorithm), which indicates that the modified algorithm performs well in keeping the spatial features of surface objects.The modified algorithm performs better than ESTARFM in the NIR band when the reflectance value is large, as shown in scatter diagrams (f,e).To display the results clearly, we employ a scatter diagram to show the density distribution of data using MATLAB.The scatter diagram in Figure 8 shows the reflectance relationship between the predicted values of the Landsat-like image and the actual values of the Landsat-8 OLI image.The line in the scatter diagram is 1:1.The points that are close to the line indicate that the algorithm can capture the reflectance change in surface objects and can achieve high accuracy in predicting the reflectance of the pixels.The color yellow means high centralization of points.For the red band, ρ is comparable for the two algorithms (0.2956 of ESTARFM vs. 0.3665 of the modified algorithm).This result means that the image produced by the modified algorithm is more similar to the observed image than that of ESTARFM.The r of the modified algorithm is slightly better than that of ESTARFM.Its RMSE is smaller than that of ESTARFM, which illustrates that the modified algorithm performs better in maintaining consistency.Higher consistency means more spatial features are maintained.From the aspect of point distribution in the scatter diagram, the points on the left of the scatter diagram in Figure 8a are too many, which means the values predicted by ESTARFM are underestimated too much when the value of reflectance of paddy rice is small.For the green band, the ρ and RMSE of the modified algorithm are better than those of ESTARFM.The r of the modified algorithm is worse than that of ESTARFM, but it does not influence the overall precision.The point distribution in the scatter diagram of ESTARFM in Figure 8c has the same problem as that in Figure 8a.However, this time, the green band of the modified algorithm performs well in keeping points around the line when the value of reflectance of paddy rice is small in scatter diagram (d).For the NIR band, the RMSE and r of the two algorithms are almost the same.ρ is improved (0.8855 of ESTARFM vs. 0.9173 of the modified algorithm), which indicates that the modified algorithm performs well in keeping the spatial features of surface objects.The modified algorithm performs better than ESTARFM in the NIR band when the reflectance value is large, as shown in scatter diagrams (f,e).
Generally, the two spatiotemporal data fusion algorithms can capture the change in reflectance of paddy rice to some degree.From the scatter diagram, we can see that the two algorithms underestimate the actual value of rice reflectance on the whole.Even so, the modified algorithm performs relatively better than ESTARFM in paddy rice pixels from the aspect of statistical results.

Comparison with an Actual Image in EVI of Paddy Rice
EVI is an important index for detecting the vegetation phenology of remote sensing.We employ the scatter diagram of EVI of paddy rice to verify the algorithms' accuracy.The results are shown in Figures 9 and 10 and Table 3.The two algorithms closely match the actual observations (1:1 line).However, differences exist between the two scatter diagrams.The modified algorithm has a better regression coefficient than ESTARFM (0.8538 vs. 0.7794).Moreover, the EVI predicted by the two  were used to predict the image at Landsat spatial resolution at 3 July 2016.Then, the predicted images were compared with an actual Landsat-8 OLI image acquired on 7 July 2016 to assess the performance of our algorithm ESTARFM.We also used the same size as the study (1000 × 1000 pixels of a Landsat image).All data processes and input control parameters were identical to Sections 3.2 and 3.3.Since our study is mainly for paddy rice, we only compared reflectance and EVI of paddy rice pixels in predicted images with actual images.The results are shown in Table 4.These results display that our modified algorithm performs better than ESTARFM despite the fact that both algorithms applied in this area are not as good as the study area.The spatiotemporal data fusion methods have some challenges and problems that influence the overall accuracy, which is discussed in detail in Section 5.In general, our modified algorithm has better performance than that of ESTARFM in all aspects of the robustness test, which indicates that our modified algorithm is robust in improving the accuracy of ESTARFM.

Discussions
The comparison of the fusion results with the remote sensing data demonstrates that our methods are more accurate than ESTARFM.The improved performance of the modified algorithm highlights the importance of selecting similar pixels within the same phenology period for paddy rice.The modified algorithm presented in this paper has several desirable properties.ESTARFM assumes that the reflectance change rate of each endmember is stable in the fusion period.This assumption is unreasonable in several situations because vegetation grows vigorously in the main periods, contributing to a dramatic change in reflectance.Moreover, reflectance change differs in different phenology periods.Our modified algorithm comprehensively considers the phenology of paddy rice by using more coarse-resolution images.This nonlinear change is well captured by the modified algorithm, which may be partly due to the use of division of different phenology periods in our algorithm to simulate the temporal evolution of paddy rice in images.This operation eliminates the assumption.Another problem in ESTARFM is that reflectance cannot be predicted when two contradicting changes occur within a coarse-resolution pixel simultaneously and compensate for each other.Gao [11] believed that an approach is needed to use the difference in coarse-resolution image observations during a period, and small changes in coarse-resolution image observations ensure better prediction for a heterogeneous area.This problem does not exist in our modified algorithm for paddy rice pixels because the development and decline of paddy rice is divided into different types for fusion.Although the improvement of our algorithm over ESTARFM is not as significant as that over STARFM, the advantage of our modified method over ESTARFM is that we can accurately predict the reflectance of paddy rice when the phenology change occurs in the estimation period, and our algorithm can capture the reflectance changes of paddy rice that are not recorded in the based fine-resolution images using the rice phenological information from EVI time series data.By contrast, ESTARFM can only predict reflectance changes recorded in the based fine-resolution images because it fuses images based on two input image pairs.Some limitations also exist in our study.We employed MODIS eight-day composite reflectance data to fuse with Landsat data due to heavy cloud contamination.There are several days' interval of date between Landsat and MODIS pairs (four-day interval in the first pair and three-day interval in the second pair).Moreover, a four-day interval exists between the prediction and verification images which may cause errors in the result analysis.Moreover, the extraction of paddy rice pixels depends on the accuracy of classification, and any misclassification exerts an adverse impact on the entire algorithm.Finally, the study area contained other types of vegetation, and we did not take these non-paddy vegetation phenology change into consideration, which is a direction for our further work.Two parameters in both algorithms limit large-scale applications, namely, the size of the moving window and the number of land cover types.The size of moving window decides the size of area of selecting similar pixels, and the number of land cover types is decided by the surface features in actual situations.Although we can set the parameters according to the homogeneity of land surface observed from Landsat images, different conditions of regional surface suit different parameters.If we can set a dynamically changing window size to suit different surfaces, it will improve the accuracy of spatiotemporal data fusion when the fusion area is very large (e.g.,China or a global scale).Lastly, we used three fine-resolution images in this study because the study area ran across different paths and rows.However, sometimes very few fine-resolution images that were temporally close to the prediction time were available for use due to cloud and shadow contamination.These challenges need to be considered in further studies.

Conclusions
This research demonstrated the feasibility of improving the ESTARFM algorithm by adding phenological information to create a synthetic image with high spatial and temporal resolutions.We constructed the EVI time series curve from annual MODIS eight-day composite reflectance data to divide the growth of paddy rice into four main phenology periods.We established a new rule for selecting similar pixels using the phenology division and then fusing with the new rule.We tested the proposed algorithm through an experiment using real remote sensing data in the study area.Compared with the well-known fusion method ESTARFM, our algorithm showed better performance based on visual inspection and quantitative metrics, especially for NIR and red bands.The regression coefficient of prediction reflectance produced by the modified algorithm for the NIR band reached 0.9173, whereas that by ESTARFM was 0.8855.The regression coefficient of the former was 0.3665 compared with 0.2956 of the latter for the red band.We also verified the EVI results of two predicted images, and the regression coefficient of the modified algorithm was 0.8538 compared with the 0.7794 of ESTARFM.This indicates that our algorithm has better performance in EVI for paddy rice.
In summary, this study focused on eliminating the assumption that the change rate of each endmember is linear in the fusion period to improve the accuracy of prediction images for paddy rice.The modified algorithm using phenological information is helpful for structuring images of

Figure 1 .
Figure 1.Framework of the modified algorithm using phenological information extracted from MODIS time series for paddy rice.

Figure 1 .
Figure 1.Framework of the modified algorithm using phenological information extracted from MODIS time series for paddy rice.

Figure 2 .
Figure 2. Growing period EVI curve of paddy rice after smoothing by the Savitzky-Golay filter and linear interpolation.The period is divided into four phenology periods.① to ④ represent the tillering, jointing, heading, and milk-ripe periods, respectively.

Figure 2 .
Figure 2. Growing period EVI curve of paddy rice after smoothing by the Savitzky-Golay filter and linear interpolation.The period is divided into four phenology periods. 1 to 4 represent the tillering, jointing, heading, and milk-ripe periods, respectively.

Figure 3 .
Figure 3. Schematic of selecting similar pixels within a same coarse-resolution pixel.

Figure 3 .
Figure 3. Schematic of selecting similar pixels within a same coarse-resolution pixel.

Figure 4 .
Figure 4. Location and land cover types in the study area.Figure 4. Location and land cover types in the study area.

Figure 4 .
Figure 4. Location and land cover types in the study area.Figure 4. Location and land cover types in the study area.

18 Figure 5 .
Figure 5. Date of MODIS, Landsat, and prediction Landsat-like images in the spatiotemporal data fusion implementation.

Figure 5 .
Figure 5. Date of MODIS, Landsat, and prediction Landsat-like images in the spatiotemporal data fusion implementation.

Figure 6 .
Figure 6.(a,d) Actual images observed on 23 July 2016; (b,e) Prediction images by ESTARFM; (c,f) Prediction images by the modified algorithm using phenological information.The upper rows are red-green-blue composites of Landsat surface reflectance and the lower rows are NIR-red-green composites of Landsat surface reflectance.

Figure 6 .
Figure 6.(a,d) Actual images observed on 23 July 2016; (b,e) Prediction images by ESTARFM; (c,f) Prediction images by the modified algorithm using phenological information.The upper rows are red-green-blue composites of Landsat surface reflectance and the lower rows are NIR-red-green composites of Landsat surface reflectance.

Figure 7 .
Figure 7. (a) Part of the actual image observed on 23 July 2016 and (b) its corresponding prediction images by ESTARFM and (c) the modified algorithm using phenological information.The lower rows are amplified scenes of the area marked in the upper rows.

Figure 7 .
Figure 7. (a) Part of the actual image observed on 23 July 2016 and (b) its corresponding prediction images by ESTARFM and (c) the modified algorithm using phenological information.The lower rows are amplified scenes of the area marked in the upper rows.

18 Figure 8 .
Figure 8. Scatter plots of the real reflectance and the predicted ones produced by ESTARFM (left) and the modified algorithm (right) for red (a,b), green (c,d), and NIR (e,f) bands.

Figure 8 .
Figure 8. Scatter plots of the real reflectance and the predicted ones produced by ESTARFM (left) and the modified algorithm (right) for red (a,b), green (c,d), and NIR (e,f) bands.

Table 1 .
Remote sensing data types and acquisition dates.

Table 2 .
Regression coefficient of the linear fitting equation, correlation coefficient, and root mean square error for 19 July 2016, and predicted reflectance compared with real reflectance on 23 July 2016, for paddy rice.

Table 2 .
Regression coefficient of the linear fitting equation, correlation coefficient, and root mean square error for 19 July 2016, and predicted reflectance compared with real reflectance on 23 July 2016, for paddy rice.

Table 3 .
Regression coefficient of the linear fitting equation, correlation coefficient, and root mean square error of EVI of predicted reflectance compared with real reflectance, for paddy rice.In order to test the modified spatiotemporal algorithm, we employed another area to conduct the spatiotemporal fusion.The area is located around 27 • N and 113 • E, which is in Zhuzhou city, Hunan province.Three images of Landsat-8 OLI were acquired on 5 June 2016, 7 July 2016, and 23 July 2016, respectively.Accordingly, three images of MODIS were acquired on 1 June 2016, 3 July 2016, and 19 July 2016.The two pairs of Landsat-8 OLI acquired at 5 June 2016 and 23 July 2016 and the MODIS images acquired at 1 June 2016 and 19 July 2016 and another MODIS image acquired at 3 July 2016

Table 4 .
Regression coefficient of the linear fitting equation, correlation coefficient, and root mean square error of reflectance and EVI of predicted reflectance compared with real reflectance for paddy rice.