Reconstruction of Cloud-free Sentinel-2 Image Time-series Using an Extended Spatiotemporal Image Fusion Approach

Time-series for medium spatial resolution satellite imagery are a valuable resource for environmental assessment and monitoring at regional and local scales. Sentinel-2 satellites from the European Space Agency (ESA) feature a multispectral instrument (MSI) with 13 spectral bands and spatial resolutions from 10 m to 60 m, offering a revisit range from 5 days at the equator to a daily approach of the poles. Since their launch, the Sentinel-2 MSI image time-series from satellites have been used widely in various environmental studies. However, the values of Sentinel-2 image time-series have not been fully realized and their usage is impeded by cloud contamination on images, especially in cloudy regions. To increase cloud-free image availability and usage of the time-series, this study attempted to reconstruct a Sentinel-2 cloud-free image time-series using an extended spatiotemporal image fusion approach. First, a spatiotemporal image fusion model was applied to predict synthetic Sentinel-2 images when clear-sky images were not available. Second, the cloudy and cloud shadow pixels of the cloud contaminated images were identified based on analysis of the differences of the synthetic and observation image pairs. Third, the cloudy and cloud shadow pixels were replaced by the corresponding pixels of its synthetic image. Lastly, the pixels from the synthetic image were radiometrically calibrated to the observation image via a normalization process. With these processes, we can reconstruct a full length cloud-free Sentinel-2 MSI image time-series to maximize the values of observation information by keeping observed cloud-free pixels and calibrating the synthetized images by using the observed cloud-free pixels as references for better quality.


Introduction
With the European Space Agency's (ESAs) 2017 launch of Sentinel-2B satellite and 2015 launch of Sentinel-2A, the satellite constellation offers a revisit range from 5 days at the equator to a daily approach of the poles [1]. The Sentinel-2A/2B (Sentinel-2) satellites carry a multispectral instrument (MSI) sensor with 13 bands in the short-wave spectrum with spatial resolutions ranging from 10 m to 60 m. Since their launch, a vast number of time-series images have been acquired and become one of the most valuable Earth observation resources for land surface studies and environment monitoring at regional and local scales due to its high spatial and temporal resolutions [2][3][4][5][6]. However, Sentinel-2 cannot fully achieve all surface-reflectance measurements for each acquisition date due to contamination of clouds and cloud shadows on images, especially in cloudy regions. This makes a great number of the images useless and impends their applications and full value realization. Studies have found that temporally sparse earth observations, especially for areas with high probability of cloud coverage, are not sufficient for monitoring environment dynamics in a vegetation growing season [2,7,8].

Overview of the Reconstruction Method
The reconstruction approach began by gathering all time-series of Sentinel-2 images for a specified period and a study area. Then, using a spatiotemporal image fusion model to predict cloud-free synthetic Sentinel-2 images for all dates when cloud-free observation images were not available with clear-sky Sentinel-2 and MODIS images. After that, we replaced those cloud and cloud shadow contaminated pixels of the Sentinel-2 observations with the corresponding pixels of the synthetic images predicted at the same acquisition dates. The reconstruction procedure consisted of four steps: (1) an image fusion model was used to predict Sentinel-2 images at acquisition dates when cloud-free images (with either partial or full cloud coverage) were not available; (2) the cloudy and cloud shadow pixels of the cloud contaminated images were identified based on analysis of the differences of the synthetic and observation image pairs; (3) the cloudy and cloud shadow pixels of the observations were replaced by the corresponding pixels of the synthetic image; (4) the pixel values from the synthetic image were calibrated to the pixel values of the observation image with a radiometric normalization process. After these processes, an entire cloud-free Sentinel-2 images time-series can be reconstructed.
By using PSRFM [24] as the spatiotemporal image fusion model to generate the cloud-free synthetic Sentinel-2 image time-series, Figure 1 shows the main work flow of the reconstruction procedures. The procedural details are given as follows: (1) Gather the Sentinel-2 image time-series of a study area's sub-period (t 0 , t 1 , t 2 , . . . , t n ).
Without losing generality, assume this period starts at t 0 and ends at t n , and the images at t 0 and t n are clear-sky ones. Between the two acquisition dates, all other images at dates t k (k = 1, 2, . . . , n−1) are either partially cloud-covered or fully cloud-covered (with more than 50% cloud coverage). Therefore, all images at t k need to be processed for cloud-free time-series reconstruction. An entire study period can consist of multiple such sub-periods. (2) Retrieve the time-series MODIS images at the Sentinel-2 image acquisition dates. They are then processed (resampled, re-projected, and geometrically registered with the Sentinel-2 images) for image fusion modelling. (3) Generate synthetic Sentinel-2 images for all acquisition dates t k (k = 1, 2, . . . , n−1) from the clear-sky MODIS and Sentinel-2 image pairs at t 0 and t n , and MODIS images at t k using PSRFM. (4) Make a cloud and cloud shadow mask for every cloud contaminated Sentinel-2 image at t k by analyzing the reflectance differences between the observational and synthetic Sentinel-2 images. (5) Replace the cloudy and cloud shadow pixels of the Sentinel-2 observation images with the corresponding pixels of the synthetic Sentinel-2 images according to the cloud and cloud shadow mask.

Brief Description of PSRFM
The reconstruction quality of a cloud-free Sentinel-2 image time-series using our proposed approach essentially depends on two key processes. The first is to generate high quality synthetic Sentinel-2 images for cloudy and cloud shadow pixel replacement and the second is to identify cloudy and cloud shadow pixels that replace the corresponding pixels of the synthetic images. Herein, PSRFM [24,26] was used to generate cloud-free synthetic Sentinel-2 image time-series, i.e., the first key process of the reconstruction procedure. We selected PSRFM for the image fusion processing because it had been evaluated [24,26] against different landscape environments and dynamics, as well as being compared to published and well-known image fusion models, such as STARFM [18] and ESTARFM [19]. The evaluations and comparisons indicated that PSRFM outperformed STARFM and ESTARFM visually and quantitively.
When interpolating estimates of surface reflectance, PSRFM requires two co-located image pairs of clear-sky fine and coarse resolution images acquired at start date t0 and end date tn and a timeseries of cloud-free coarse spatial resolution images acquired between the start and end dates donated as t1, t1,…, tn−1 for synthetic fine resolution image prediction. PSRFM can also be applied for extrapolation when only one pair of clear-sky fine and coarse spatial resolution images are available. For example, for predicting images earlier than the date of the first clear-sky image pair at t0 or images later than the date of the last clear-sky image pair at tn, a synthetic image can be generated by the forward or backward prediction.

Brief Description of PSRFM
The reconstruction quality of a cloud-free Sentinel-2 image time-series using our proposed approach essentially depends on two key processes. The first is to generate high quality synthetic Sentinel-2 images for cloudy and cloud shadow pixel replacement and the second is to identify cloudy and cloud shadow pixels that replace the corresponding pixels of the synthetic images. Herein, PSRFM [24,26] was used to generate cloud-free synthetic Sentinel-2 image time-series, i.e., the first key process of the reconstruction procedure. We selected PSRFM for the image fusion processing because it had been evaluated [24,26] against different landscape environments and dynamics, as well as being compared to published and well-known image fusion models, such as STARFM [18] and ESTARFM [19]. The evaluations and comparisons indicated that PSRFM outperformed STARFM and ESTARFM visually and quantitively.
When interpolating estimates of surface reflectance, PSRFM requires two co-located image pairs of clear-sky fine and coarse resolution images acquired at start date t 0 and end date t n and a time-series of cloud-free coarse spatial resolution images acquired between the start and end dates donated as t 1 , t 1 , . . . , t n−1 for synthetic fine resolution image prediction. PSRFM can also be applied for extrapolation when only one pair of clear-sky fine and coarse spatial resolution images are available. For example, for predicting images earlier than the date of the first clear-sky image pair at t 0 or images later than the date of the last clear-sky image pair at t n , a synthetic image can be generated by the forward or backward prediction.
The modelling process involved a spatial partition of the fine resolution pixels sharing spectral similarity and fitting a linear transformation relating fine and coarse resolution spectra by unmixing each coarse resolution pixel based on land cover clusters. Then, the spectral change between the reference (t 0 and t n ) and prediction dates (t k ) was predicted forward and backward, respectively. Following that, a smoothing process combined the forward and backward predictions through an average filter weighted by either the time elapsed between reference and predicted date or the uncertainties of the two predicted images [24].
For the reconstruction process, we downloaded all Sentinel-2 images for which a cloud-free image time-series was required and then grouped every image into one of three classes: clear-sky images (cloud coverage < 5%), partially cloud-covered images (cloud coverage < 50%), and fully cloud-covered images (cloud coverage ≥ 50%). Here, an image with greater than 50% of cloud coverage was considered as full cloud coverage. This was based on the requirement of following cloud and cloud shadow identification methods for a reliable result. After the images were grouped, the clear-sky images were used as reference images to predict synthetic Sentinel-2 images at those dates when the images were partially or fully cloud-covered. For an image with full cloud coverage, it was entirely replaced by its corresponding synthetic image and for a partially cloud-covered image, its cloudy and cloud shadow pixels were replaced by the corresponding pixels of the synthetic image. Consequently, the final reconstructed cloud-free image time series consisted of three types of images: the clear-sky observations, the blended images that replaced the fully cloud-covered images, and the observation images with cloudy and cloud shadow pixels replacement from the synthetic images.

General Formula
Following the generation of synthetic images using PSRFM, the next challenging was to identify cloudy and cloud shadow pixels of the cloud contaminated images. If we compared a synthetic image at t k to the corresponding cloud contaminated image acquired at the same date, based on previous studies [24,26], the reflectance represented by digital numbers (DN) of the two images should be close to each other for most cloud-free pixels as the root mean square error and absolute mean error were small for all individual bands [24,26]. Consequently, a large difference of a pixel value for the two images was most likely caused by clouds or cloud shadows and other issues such as the haze on the observational image: where (i, j) is a pixel location and DN o and DN s are the digital number of the observation and synthetic images of band b, respectively. dDN is the pixel value difference of the two images. T is the threshold to separate cloud contaminated pixels from cloud-free ones.

Cloudy Pixel Identification
Clouds, especially thick clouds, have high spectral reflectance compared to most of the surface objects in all six bands (blue, green, red, near infrared, shortwave infrared 1, and shortwave infrared 2 as listed in Table 1 in Section 3). Therefore, a cloudy pixel had larger values in every band compared to a cloud-free pixel of the same kind. For cloudy pixel identification, we simply used the average of the six bands in Equation (1): where is the summation on all six pixel bands (i, j) and N is the total number of the bands that equals to 6. The summation on all the bands of a pixel (i, j) is used by Zhai et al. [27] as a cloud index. For the identification of cloudy pixels, Equation (2) becomes: The question of identifying cloudy pixels becomes how to determine T c in Equation (4). Through our analysis, we found that a graph of sorted dDN c value in ascending order for an entire image against their accumulated pixel number can help determine the threshold T c . Figure 2 presents a typical graph of a cloud contaminated image. In this figure, a cloudy pixel shows a large positive dDN c value at the right end due to its significantly increased reflectance while a cloud shadow pixel shows a large negative dDN c at the left end of the graph due to its considerably decreased reflectance. The majority of cloud-free pixels between the two extremes indicate small differences between the observation and its corresponding synthetic image, which mainly coming from modelling bias or error.
Remote Sens. 2020, 12, x FOR PEER REVIEW 6 of 23 For the identification of cloudy pixels, Equation (2) becomes: , The question of identifying cloudy pixels becomes how to determine in Equation (4). Through our analysis, we found that a graph of sorted value in ascending order for an entire image against their accumulated pixel number can help determine the threshold . Figure 2 presents a typical graph of a cloud contaminated image. In this figure, a cloudy pixel shows a large positive value at the right end due to its significantly increased reflectance while a cloud shadow pixel shows a large negative at the left end of the graph due to its considerably decreased reflectance. The majority of cloud-free pixels between the two extremes indicate small differences between the observation and its corresponding synthetic image, which mainly coming from modelling bias or error. To visually determine the threshold value in Equation (4), we pinpointed a few tries to identify cloudy pixels. The threshold was largely located where the change rate of the curve had a large jump towards the right end of the graph. Mathematically, this can be determined by differencing the sorted and comparing their changes to locate the point where a large curve change rate occurs. Specifically, can be derived by the following steps. First, the sorted values were arranged into multiple bins by a given interval, e.g., 100. Then the average of all the values of every bin was computed. Thereafter, the difference between the consecutive two bins can be calculated as: where , and , are the average values of the K th bin and (K+1) th bin, respectively, and is the difference between the two average values. Assuming that the middle portion of the graph represents the values of those cloud-free pixels, a pixel with any value significantly greater than the average of those values could be considered as a cloudy pixel. Hence, the threshold can be determined by using the in the middle portion (taking one third of the entire values conservatively) to calculate their mean and standard deviation and then to find the first value towards the right end of the graph that meets . Here, was an adjustable parameter. From our experiment, = 5 was a good choice for cloudy pixel identification, i.e., when 5 , the corresponding value of was the threshold for cloudy pixel identification. In comparison to other methods for cloudy pixel identification, the proposed method requires only one parameter, while other methods such as FMASK [28] require a few parameters in order to To visually determine the threshold T c value in Equation (4), we pinpointed a few tries to identify cloudy pixels. The threshold was largely located where the change rate of the curve had a large jump towards the right end of the graph. Mathematically, this can be determined by differencing the sorted dDN c and comparing their changes to locate the point where a large curve change rate occurs. Specifically, T c can be derived by the following steps. First, the sorted dDN c values were arranged into multiple bins by a given interval, e.g., 100. Then the average of all the values of every bin was computed. Thereafter, the difference between the consecutive two bins can be calculated as: where dDN c,k and dDN c,k+1 are the average values of the Kth bin and (K+1)th bin, respectively, and d kc is the difference between the two average values.
Assuming that the middle portion of the graph represents the values of those cloud-free pixels, a pixel with any value significantly greater than the average of those values could be considered as a cloudy pixel. Hence, the threshold T c can be determined by using the d kc in the middle portion (taking one third of the entire values conservatively) to calculate their mean d m and standard deviation σ m and then to find the first dDN c value towards the right end of the graph that meets d kc ≥ d m + C σ m . Here, C was an adjustable parameter. From our experiment, C = 5 was a good choice for cloudy pixel identification, i.e., when d kc ≥ d m + 5 σ m , the corresponding value of dDN c was the threshold T c for cloudy pixel identification.
In comparison to other methods for cloudy pixel identification, the proposed method requires only one parameter, while other methods such as FMASK [28] require a few parameters in order to produce a cloud mask. In addition, the threshold T c can be derived automatically from the image data and once it is determined, a cloud mask can be produced simply by comparison of the dDN c of a pixel to T c , while the parameter settings and mask computation of other methods are much more complicated.

Cloud Shadow Pixel Identification
Since cloud shadows alter the pixel value of near infrared (NIR) band and shortwave infrared band 1 (SWIR1) more significantly than other bands, we applied the average of the two bands (NIR and SWIR1) in Equation (1) for cloud shadow pixel identification. Similar to the identification of cloudy pixels, the difference of the digital number of the synthetic and the observation images was used to screen cloud shadow pixels as follows: where is the summation on NIR and SWIR1 of a pixel (i, j) of a Sentinel-2 image, and its corresponding synthetic Sentinel-2 image. In comparison to a cloud-free pixel, according to Equation (6), a cloud shadow pixel has a large negative dDN cs as clouds block the incoming radiance on cloud shadow pixels. Hence, to identify cloud shadow pixels, we have: where T cs is the threshold to be determined for distinguishing cloud shadow pixels from cloud-free ones.
If we plotted a distribution of sorted dDN cs with its accumulated pixel number of a cloud contaminated image (Figure 3), it would be similar to the distribution of the sorted dDN c in Figure 2. Hence, we determined the threshold T cs by using the same approach for the threshold identification for cloudy pixels. The threshold T cs can be pinpointed visually, as shown in Figure 3 or by the mathematic method described above for cloud threshold T c determination. The only difference was that the comparison of the curve change rate was from the middle of the graph towards the left side of the graph and C = 3 was reasonable for cloud shadow identification according to our experiment.
Remote Sens. 2020, 12, x FOR PEER REVIEW 7 of 23 produce a cloud mask. In addition, the threshold can be derived automatically from the image data and once it is determined, a cloud mask can be produced simply by comparison of the of a pixel to , while the parameter settings and mask computation of other methods are much more complicated.

Cloud Shadow Pixel Identification
Since cloud shadows alter the pixel value of near infrared (NIR) band and shortwave infrared band 1 (SWIR1) more significantly than other bands, we applied the average of the two bands (NIR and SWIR1) in Equation (1) for cloud shadow pixel identification. Similar to the identification of cloudy pixels, the difference of the digital number of the synthetic and the observation images was used to screen cloud shadow pixels as follows: where ∑ is the summation on NIR and SWIR1 of a pixel (i, j) of a Sentinel-2 image, and its corresponding synthetic Sentinel-2 image. In comparison to a cloud-free pixel, according to Equation (6), a cloud shadow pixel has a large negative as clouds block the incoming radiance on cloud shadow pixels. Hence, to identify cloud shadow pixels, we have: where is the threshold to be determined for distinguishing cloud shadow pixels from cloud-free ones.
If we plotted a distribution of sorted with its accumulated pixel number of a cloud contaminated image (Figure 3), it would be similar to the distribution of the sorted in Figure  2. Hence, we determined the threshold by using the same approach for the threshold identification for cloudy pixels. The threshold can be pinpointed visually, as shown in Figure 3 or by the mathematic method described above for cloud threshold determination. The only difference was that the comparison of the curve change rate was from the middle of the graph towards the left side of the graph and = 3 was reasonable for cloud shadow identification according to our experiment.

Haze Pixel Identification
In comparison to cloudy and cloud shadow pixel identification, haze pixel identification is more challenging because a haze pixel usually has mixed surface reflectance of objects on land surface.

Haze Pixel Identification
In comparison to cloudy and cloud shadow pixel identification, haze pixel identification is more challenging because a haze pixel usually has mixed surface reflectance of objects on land surface. Considering that the blue band is one of the most sensitive bands to haze effects, we used the difference of blue bands between the observation and its synthetic image in Equation (1) to identify haze affected pixels.
where DN o and DN s represent digital number of observation and its synthetic image in blue bands, respectively. dDN h is the pixel value difference between the two images. Haze usually makes a pixel value larger than that of a haze-free pixel. Similar to the cloudy pixel identification, to detect a haze pixel we have: where T h is the threshold to be determined for distinguishing between haze and haze-free pixels.
Using the same approach of determining the threshold T c for cloudy pixel identification, the threshold T h can be determined. However, the selection of the parameter value C was not as obvious as for cloudy and cloud shadow pixel identification. It depended on not only the degrees (e.g., thin or thick) of haze on the image but also the prediction accuracy of its corresponding synthetic image. As the difference ranges of dDN h (i, j) in Equation (8), it was usually much smaller than dDN c in Equation (4) or dDN cs in Equation (6), and the difference of some thin haze pixels may be close to the largest prediction errors of the synthetic image, wherein a smaller C was recommended. For our test case, we selected C = 3 for haze pixel identification.
To improve the accuracy of haze detection, we introduced an additional condition DN o (blue, i, j ) ≥ DN o,min to Equation (9). This states that for a haze pixel its observation value of blue band should be bigger than a minimum value DN o,min , otherwise the difference dDN h (i, j) is considered as contribution from the synthetic image prediction error even though the value is bigger than the threshold T h . Therefore, some misidentified haze pixels can be excluded. The minimum value DN o,min can be determined as: where DN o,mean is the mean value of all DN o (blue, i, j) and σ DN 0 is their standard deviation. n is an acceptable factor from 0.5 to 2.

Normalization of Reconstructed Images
After cloudy, cloud shadow, and haze pixels of a Sentinel-2 image were identified, they were replaced with the corresponding pixels of its synthetic image to form a cloud-free image. Since the synthetic image had minor visual differences from its counterpart observation on the reconstructed cloud-free image, there was some small roughness such as salt-and-pepper noises, especially around the edges of cloudy and cloud shadow pixels. To homogenize the differences, a normalization processing for calibrating the synthetic pixel values to the observational values was required. There are two main types of normalization methods: mapping and regression [16]. The mapping method directly establishes a pixel value equation between the processed images and uses the output of the mapping equation to replace the pixel values under consideration. The regression method creates a regression model to establish the pixel value distortion relationship between the images via pseudo-invariant features.
In this study, we took the mapping approach. The normalization process can be expressed as follows: where y n is the pixel value after the normalization process that maps a synthetic image to the observed image; x b is the replacement pixel value of the synthetic image; a and b are normalization parameters that are determined by a linear regression.
Considering that different clusters may have different reflectance change rates, we performed the linear regression to determine the normalization parameters a and b for each cluster. We followed four steps in the normalization process: (1) Segment the synthetic Sentinel-2 image into k clusters. The cluster number k can refer to the optimized clusters used for the image blending by PSRFM [24]. (2) Identify, for each cluster, all the cloud-free pixels that are common to the synthetic and observed images and belong to the same cluster, then use the pixel pairs to estimate normalization parameters a and b in Equation (11). (3) Apply the estimated normalization parameters to map the replacement pixels of the cluster to the observed image. (4) Repeat steps 2 and 3 for all the clusters.

Study Area
The study area was located in Saskatchewan, Canada, as shown in Figure 4, and had a dimension of 28 by 40 km, as well as an image size of 1400 by 2000 with 20 m spatial resolution. The landscape of the study area was in an agricultural environment. In an attempt to monitor land cover changes from pasture to cropland, we generated a daily time-series synthetic Sentinel-2 images for the vegetation growing season from 26 April 2018 to 10 October 2018 from clear-sky Sentinel-2 observations and MODIS images (MCD43A4). The image time-series was used to demonstrate our proposed methods in this study for cloud-free time-series image reconstruction.
Remote Sens. 2020, 12, x FOR PEER REVIEW 9 of 23 Considering that different clusters may have different reflectance change rates, we performed the linear regression to determine the normalization parameters a and b for each cluster. We followed four steps in the normalization process: (1) Segment the synthetic Sentinel-2 image into k clusters. The cluster number k can refer to the optimized clusters used for the image blending by PSRFM [24]. (2) Identify, for each cluster, all the cloud-free pixels that are common to the synthetic and observed images and belong to the same cluster, then use the pixel pairs to estimate normalization parameters a and b in Equation (11). (3) Apply the estimated normalization parameters to map the replacement pixels of the cluster to the observed image. (4) Repeat steps 2 and 3 for all the clusters.

Study Area
The study area was located in Saskatchewan, Canada, as shown in Figure 4, and had a dimension of 28 by 40 km, as well as an image size of 1400 by 2000 with 20 m spatial resolution. The landscape of the study area was in an agricultural environment. In an attempt to monitor land cover changes from pasture to cropland, we generated a daily time-series synthetic Sentinel-2 images for the vegetation growing season from 26 April 2018 to 10 October 2018 from clear-sky Sentinel-2 observations and MODIS images (MCD43A4). The image time-series was used to demonstrate our proposed methods in this study for cloud-free time-series image reconstruction.

Image Data
Two types of time-series images were used as input in the case study: one was Sentinel-2 timeseries and the other was MODIS MCD43A4 time-series. The Sentinel-2 images of the time-series had a great number of pixels contaminated by clouds and cloud shadows. Our goal was to reconstruct a cloud-free Sentinel-2 time-series from the two type images.

Image Data
Two types of time-series images were used as input in the case study: one was Sentinel-2 time-series and the other was MODIS MCD43A4 time-series. The Sentinel-2 images of the time-series had a great number of pixels contaminated by clouds and cloud shadows. Our goal was to reconstruct a cloud-free Sentinel-2 time-series from the two type images.

Sentinel-2 MSI Image Time-series
The Sentinel-2 MSI image time-series of the study area came from both Sentinel-2A and Sentinel-2B satellites and covered the vegetation growing season from 26 April 2018 to 10 October 2018. The temporal intervals of these images were 2 to 3 days. For the entire period, there were 67 observed Sentinel-2 images. Among them, there were only 14 (20.90%) clear-sky images (cloud coverage <5%) and 21 (31.34%) partially cloud-covered images (cloud coverage < 50%), as well as 32 (47.76%) fully cloud-covered images (cloud coverage ≥ 50%). Figure 5 shows the cloud-free pixel distribution in this time series. The Sentinel-2 MSI image time-series of the study area came from both Sentinel-2A and Sentinel-2B satellites and covered the vegetation growing season from 26 April 2018 to 10 October 2018. The temporal intervals of these images were 2 to 3 days. For the entire period, there were 67 observed Sentinel-2 images. Among them, there were only 14 (20.90%) clear-sky images (cloud coverage <5%) and 21 (31.34%) partially cloud-covered images (cloud coverage < 50%), as well as 32 (47.76%) fully cloud-covered images (cloud coverage ≥ 50%). Figure 5 shows the cloud-free pixel distribution in this time series.
As presented in Figure 5, majority of the pixels (about 60% of the total pixels) have 33 to 35 cloudfree days (about 18% pixels of 33 days, 22% of 34 days, and 19% of 35 days) that corresponded to about half of the total 67 image acquisition dates. Besides these, the number of pixels drop quickly with an increasing or decreasing number of cloud-free days. For example, while there were about 12.5% pixels with 36 cloud-free days, the percentage was down to 5% with 37 clear-sky days, and only about 1% with 38 days. These numbers indicated that, within the time period, fewer than half of the total image pixels of the time-period were useable. In the reconstruction process of the timeseries, all cloud contaminated pixels were replaced by the corresponding pixels of the synthetic images while keeping and maximizing the value of all observed cloud-free pixels.

MODIS MCD43A4
The second type of the images used in this study is daily MODIS MCD43A4 (Version 6). There are several collections of daily MODIS images that could be used to match the acquisition dates of the Sentinel-2 images time-series: MOD09GA from Terra, MYD09GA from Aqua, and MCD43A4. MOD09GA and MYD09GA were daily MODIS images, and MCD43A4 was daily but was with combined images from Terra and Aqua satellites with nadir bidirectional reflectance distribution function (BRDF) adjusted reflectance. Pixels of MCD43A4 were temporally weighted from the data of 16 days period for the 9th day of the period. As both Sentinel-2 satellites and MODIS had almost the same local overpass time, most likely if a Sentinel-2 image has cloud coverage issues, the daily MODIS MOD09GA/MYD09GA image acquired at the same day may also have the same problem. On the other hand, MCD43A4 came from 16 days images and cloudy pixels were not used for the data production. Furthermore, the view angle effects of MCD43A4 were removed according to bidirectional reflectance distribution function and they were alike from the nadir view angle, resulting in a stable and consistent NBAR (nadir BRDF adjusted reflectance) product [29]. Considering that the main purpose of this study was to reconstruct cloud-free Sentinel-2 image time-series, MCD43A4 images were more appropriate. Thus, MCD43A4 images at the same acquisition dates of the Sentinel-2 image time-series were used for image fusion exercise in this study. All the observation images were surface reflectance and downloaded from the USGS Earth Resources Observation and Science (EROS) Center [30]. As presented in Figure 5, majority of the pixels (about 60% of the total pixels) have 33 to 35 cloud-free days (about 18% pixels of 33 days, 22% of 34 days, and 19% of 35 days) that corresponded to about half of the total 67 image acquisition dates. Besides these, the number of pixels drop quickly with an increasing or decreasing number of cloud-free days. For example, while there were about 12.5% pixels with 36 cloud-free days, the percentage was down to 5% with 37 clear-sky days, and only about 1% with 38 days. These numbers indicated that, within the time period, fewer than half of the total image pixels of the time-period were useable. In the reconstruction process of the time-series, all cloud contaminated pixels were replaced by the corresponding pixels of the synthetic images while keeping and maximizing the value of all observed cloud-free pixels.

MODIS MCD43A4
The second type of the images used in this study is daily MODIS MCD43A4 (Version 6). There are several collections of daily MODIS images that could be used to match the acquisition dates of the Sentinel-2 images time-series: MOD09GA from Terra, MYD09GA from Aqua, and MCD43A4. MOD09GA and MYD09GA were daily MODIS images, and MCD43A4 was daily but was with combined images from Terra and Aqua satellites with nadir bidirectional reflectance distribution function (BRDF) adjusted reflectance. Pixels of MCD43A4 were temporally weighted from the data of 16 days period for the 9th day of the period. As both Sentinel-2 satellites and MODIS had almost the same local overpass time, most likely if a Sentinel-2 image has cloud coverage issues, the daily MODIS MOD09GA/MYD09GA image acquired at the same day may also have the same problem. On the other hand, MCD43A4 came from 16 days images and cloudy pixels were not used for the data production. Furthermore, the view angle effects of MCD43A4 were removed according to bi-directional reflectance distribution function and they were alike from the nadir view angle, resulting in a stable and consistent NBAR (nadir BRDF adjusted reflectance) product [29]. Considering that the main purpose of this study was to reconstruct cloud-free Sentinel-2 image time-series, MCD43A4 images were more appropriate. Thus, MCD43A4 images at the same acquisition dates of the Sentinel-2 image time-series were used for image fusion exercise in this study. All the observation images were surface reflectance and downloaded from the USGS Earth Resources Observation and Science (EROS) Center [30].

Synthetic Sentinel-2 Image Production by Image Fusion
Sentinel-2 and the MODIS images had six similar spectral bands (bandwidths) but different spatial resolutions, as is listed in Table 1. Before the image blending exercise, all the image bands were resampled to a unified spatial resolution of 20 m and the MODIS images were re-projected to map projection of the Sentinel-2 images. Then, the clear-sky image pairs of the Sentinel-2 and MCD43A4 images were prepared for model calibration and the MODIS images at all other Sentinel-2 acquisition dates were used for prediction of synthetic Sentinel-2 images by PSRFM.

Results and Discussions
As an experimental study, we reconstructed the cloud-free Sentinel-2 image time-series with 67 images of the study period from 26 April to 10 October 2018 using the proposed methods and data described in Sections 2 and 3. For the results and discussions, without losing generality, we employed six typical cloud contaminated images from the reconstructed time-series to illustrate the effectiveness of the methods. Figure 6 shows the false color composition of the six images. Three of these images (16 May,15 June and 5 October 2018) were cloud contaminated, two of them were mainly contaminated by haze (30 July and 21 August 2018), and one (3 October 2018) was contaminated by mixed thin clouds and thick clouds. We select these six images as examples for two reasons. The first was that the total contaminated pixels should be less than 50%. This was required by the method to make sure that the difference comparison was reliable for an effective result. The second was that they should include images contaminated by both cloud and haze for some challenged cases.

Cloudy, Cloud Shadow, and Haze Pixels Masks
Using the Sentinel-2 MSI observations shown in Figure 6 and their corresponding synthetic images generated by PSRFM, we tested the cloudy, cloud shadow and haze pixel identification methods proposed in Section 2. For a comparison, we also applied the well-developed FMASK method [28] to the same images for clouds and cloud shadows detection. Figure 7 displays the six cloud contaminated images with their clouds and cloud shadow masks resulted from the two clouds and cloud shadow detection methods. The cloud and cloud shadow identification method proposed by this study is called image fusion cloud masking (IFCM). Here, cloud mask includes clouds, cloud shadows, and haze elements.

Cloudy, Cloud Shadow, and Haze Pixels Masks
Using the Sentinel-2 MSI observations shown in Figure 6 and their corresponding synthetic images generated by PSRFM, we tested the cloudy, cloud shadow and haze pixel identification methods proposed in Section 2. For a comparison, we also applied the well-developed FMASK method [28] to the same images for clouds and cloud shadows detection. Figure 7 displays the six cloud contaminated images with their clouds and cloud shadow masks resulted from the two clouds and cloud shadow detection methods. The cloud and cloud shadow identification method proposed by this study is called image fusion cloud masking (IFCM). Here, cloud mask includes clouds, cloud shadows, and haze elements.
It is evident, as shown in Figure 7, that the cloud, cloud shadow, and haze masks by IFCM matched well to the cloudy, cloud shadows, and haze pixels of the images. In comparison to the results of FMASK, IFCM looked better visually, especially for the thin clouds and haze detection. For example, IFCM masked the thin clouds in the images on 3-5 October 2018 (the last two rows) more accurately. The image on 3 October 2018 had both thin and thick clouds; IFCM identified both to a certain extent while FMASK only detected thick clouds. For the image on 5 October 2018, the FMASK mask showed more omissions while the IFCM mask more accurately reflected the reality.
For the two images (the third and fourth rows) with the haze contamination, IFCM could identify the haze pixels to a certain extent, though some non-haze pixels were over identified. The over identified non-haze pixels were a result of compromise in determining the threshold value T h for the haze detection. The bigger the threshold value T h , the less the identified haze pixels, including the over identified non-haze pixels. Unlike the clouds and cloud shadow pixel detection, the difference between a haze contaminated pixel and its corresponding synthetic pixel can be smaller than the top image fusion modelling bias or errors, so that the over identified non-haze pixels are not avoidable to keep the real haze pixels detected. As FMASK was not specially developed for haze detection, it over detected the haze pixels as cloud and cloud shadow pixels more considerably. From this point of view, IFCM was more flexible than FMASK for haze detection.   White indicates clouds, dark grey shows cloud shadows, light grey represents haze, and the rest is cloud-free.

Reconstruction Results of Cloud-free Sentinel-2 Images
Based on the cloud masks resulted from the cloud and cloud shadow detection, we reconstructed the cloud-free Sentinel-2 image time-series by replacing the cloudy, cloud shadow and haze pixels of the original cloud contaminated images with the corresponding pixels of its synthetic images. Figure 8 presents the observations, synthetic, and reconstructed images for their visual comparison. From Figure 8, it can be found that both the synthetic and reconstructed images are cloud-free and the differences between the two images were small for all six images. The main reason was that the synthetic images were of reasonably high quality and the replacement of the clouds and cloud From Figure 8, it can be found that both the synthetic and reconstructed images are cloud-free and the differences between the two images were small for all six images. The main reason was that the synthetic images were of reasonably high quality and the replacement of the clouds and cloud shadow pixels in the observations from the synthetic images overall matched well to the rest of the cloud-free pixels of the observations although some minor differences between the synthetic and reconstructed images were still visible.

Normalization Results
To verify the effect of the normalization process, we compared the reconstructed Sentinel-2 images before and after the normalization process for all six images shown in Figure 9. It was apparent that the differences between the reconstructed images before and after the normalization process were not considerable for all six images. As discussed above, the main reason was that the synthetic Sentinel-2 images were well predicted and resembled their observations. The normalization process only made minor changes to those replacement pixels from the synthetic Sentinel-2 images and brought them to match better to their true observation based on the cloud-free pixels of the observations. Overall, the images after the normalization looks smoother and more natural than the reconstructed images without the normalization. For example, we saw some differences between the cloud-free pixels and the replacement pixels on the image dated 5 October 2018 before the normalization process. However, the differences were greatly reduced on the same image after the normalization.

Time-series Normalized Difference Vegetation Index (NDVI)
As shown in Figure 4, the study area was in an agricultural region. Majority of the landscape was covered by cropland (~66.3%), followed by grassland (~15.8%), shrubland (~9%), and deciduous forest (~1.6%). The four types of vegetation covered about 92.6% of study area. The rest was waterbody, buildup area, and others. As part of effort to check the quality of the reconstructed time-series Sentinel-2 images, we calculated the NDVI time-series of the four vegetation types of the study period as shown in Figure 10 from sample data of the clear-sky Sentinel-2 images, MODIS, synthetic Sentinel-2 images, and reconstructed Sentinel-2 images. The clear-sky Sentinel-2 and MODIS images were used to generate synthetic Sentinel-2 images when cloud-free images were not available. The sample data used for calculation of NDVI were randomly selected from these images and the only requirement was that all its neighbor pixels of 25 by 25 (the size of a MODIS pixel) had the same land cover type. The requirement for the deciduous sample was relaxed to 15 by 15 pixels due to a small portion of deciduous forest land cover in the study area.
As shown in Figure 10, although different land cover has different NDVI shapes, the NDVI patterns from the four images were similar for all the same land cover types. The NDVI time-series from MODIS and the synthetic Sentinel-2 images looked similar with exception of the deciduous forest. The NDVI time-series from the synthetic Sentinel-2 images were generally larger than that from the MODIS data though the general patterns of the two were the same. This was because the sample MODIS pixel for the NDVI calculation was not a homogeneous deciduous forest land cover. In general, the NDVI time-series from the synthetic Sentinel-2, reconstructed and clear sky images were very similar. This confirmed that first the synthetic images were well predicted and the second reconstructed cloud-free Sentinel-2 image time-series was well produced.

Time-series Normalized Difference Vegetation Index (NDVI)
As shown in Figure 4, the study area was in an agricultural region. Majority of the landscape was covered by cropland (~66.3%), followed by grassland (~15.8%), shrubland (~9%), and deciduous forest (~1.6%). The four types of vegetation covered about 92.6% of study area. The rest was waterbody, buildup area, and others. As part of effort to check the quality of the reconstructed timeseries Sentinel-2 images, we calculated the NDVI time-series of the four vegetation types of the study period as shown in Figure 10 from sample data of the clear-sky Sentinel-2 images, MODIS, synthetic Sentinel-2 images, and reconstructed Sentinel-2 images. The clear-sky Sentinel-2 and MODIS images were used to generate synthetic Sentinel-2 images when cloud-free images were not available. The sample data used for calculation of NDVI were randomly selected from these images and the only requirement was that all its neighbor pixels of 25 by 25 (the size of a MODIS pixel) had the same land cover type. The requirement for the deciduous sample was relaxed to 15 by 15 pixels due to a small portion of deciduous forest land cover in the study area. As shown in Figure 10, although different land cover has different NDVI shapes, the NDVI patterns from the four images were similar for all the same land cover types. The NDVI time-series from MODIS and the synthetic Sentinel-2 images looked similar with exception of the deciduous forest. The NDVI time-series from the synthetic Sentinel-2 images were generally larger than that from the MODIS data though the general patterns of the two were the same. This was because the sample MODIS pixel for the NDVI calculation was not a homogeneous deciduous forest land cover. In general, the NDVI time-series from the synthetic Sentinel-2, reconstructed and clear sky images were very similar. This confirmed that first the synthetic images were well predicted and the second reconstructed cloud-free Sentinel-2 image time-series was well produced.

Discussions
There were two critical issues when reconstructing cloud-free Sentinel-2 images time-series using our proposed image fusion approach: the quality of the blended images and the accuracy of clouds/cloud shadow pixels identification.
In this study, we used the PSRFM model for predicting cloud-free synthetic Sentinel-2 images when clear-sky Sentinel-2 images were not available. Some pixels of these synthetic images were used to replace those cloudy and cloud shadow pixels of the corresponding observations. Apparently, high quality of the synthetic Sentinel-2 images was the first key component in the chain of the reconstruction process to warrant high quality output. PSRFM is a well-developed spatiotemporal reflectance fusion model and has been validated and compared to some well-known image fusion models [24,26]. It works well for both gradual and abrupt land cover change situations, as well as for heterogeneous landscapes. Therefore, it was chosen for the synthetic image production. In addition to PSRFM model, other well-developed image fusion models can also be used for the reconstruction purpose as long as it can predict high quality of synthetic images such as STARFM, ESTARFM, FSDAF, and KFRFM [25].
To reconstruct cloud-free time-series Sentinel-2 images, MODIS MCD43A4 images were used as the coarse resolution image input. MODIS MCD43A4 images were generated from the cloud-free pixels of both Terra and Aqua satellites over a 16 days period for NBAR computation. The product provided a consistent cloud-free and nadir view image that was one of the best MODIS datasets for image fusion even though some pixels were of higher quality (full inversion), and some pixels were of lower quality (magnitude inversion) [29]. The latter could be a source of the synthetic Sentinel-2 image bias.
When using MODIS MCD43A4 images to reconstruct cloud-free time-series Sentinel-2 images, a question was raised: Why not apply the same composite method of generating cloud-free MODIS MCD43A4 images to reconstruct the cloud-free Sentinel-2 image time-series? Technically, it is possible as long as there are enough cloud-free pixels available from the Sentinel-2 observations over a 16 day period. Practically, depending on the season and the location of interest, the useful cloud-free pixels from Sentinel-2 observations over the 16 days period may be limited. For instance, according to the analysis about the cloud-free pixels of the Sentinel-2 images used for the test case in Section 3.2.1, fewer than half of the total image pixels of the time-period from 26 April 2018 to 10 October 2018 were useable. With MODIS's daily revisit, the number of images available for the composite was triple the number of Sentinel-2 images (at about a 3-day revisit frequency) available for the same period of 16 days. Therefore, in comparison to the composite method, the proposed method in this study can augment the stronger temporal intensity of MODIS and circumvent the sparsity of clear-sky observations in Sentinel-2 time series.
The second critical issue of this proposed approach was the identification of clouds and cloud shadows. The case study proposed a new method for cloudy and cloud shadow as well as haze pixel identification by analyzing the differences between observations and their synthetic images. Our proposed method has obvious advantages. It is simple and effective. A conventional cloud identification was based on the spectral difference between cloudy pixels and other cloud-free ones within an image. As some land surface objects such as buildings, bare dry soil, and roads have similar high reflectance as clouds do, it is usually problematic to distinguish clouds from those objects. However, by our method, those objects can be easily excluded from clouds. The reason is that if a land surface object, for example, a building or a road, has high spectral reflectance in nature, it should have high spectral reflectance in both observation and its synthetic images (if the synthetic image is correctly predicted which is the case from our PSRFM modelling). Therefore, their reflectance difference expressed in Equation (3) should be small, and the object can be easily identified as non-cloud object. In contrast, assuming an object has small spectral reflectance in nature, on the one hand, it will present big spectral reflectance if it is covered by clouds in a cloud contaminated image, and on the other hand, it still has a small spectral reflectance value in its corresponding cloud-free synthetic image. Hence, the large difference of the pixel values of the observation and the synthetic image would reveal it as a cloudy pixel.
The advantage for the cloud identification can be extended to the cloud shadow identification. For example, cloud shadow pixels can often be mixed to water pixels as both cloud shadow and water have small spectral reflectance. For similar analysis, assuming a synthetic image is well predicted, a water pixel can have a small value in both observation and synthetic images. Therefore, a small difference between them indicates that it is not a cloud shadow pixel. That being said, a medium to high surface reflectance object has a small value if covered by cloud shadow. Hence, a large difference of a pixel between a cloud contaminated image and its corresponding synthetic image can uncover the cloud shadow pixel.
It is possible that the difference between a haze or thin cloud contaminated pixel and its corresponding synthetic pixel is smaller than the top image fusion modelling bias or errors. This could result in false haze and thin cloud pixel masking. However, the chance is relatively small since from PSRFM model evaluation, majority of the pixels are well predicted for all the bands. A small number of outliers will not affect the overall accuracy of haze, cloudy, and cloud shadow pixel identification.
To distinguish cloud-free and cloud contaminated pixels, our method determined the values of only three threshold parameters: T c for clouds, T cs for cloud shadows, and T h for haze. They were easily determined by our proposed method, either visually or automatically. Like other methods, the cloud contaminated pixels (clouds, cloud shadow, and haze) can be over-or underestimated. Cloudier, cloud shadow, or haze pixels can be masked if decreasing the absolute value of the thresholds and vice versa. In our proposed reconstruction process, after we identified the cloud contaminated pixels of an observation image, those pixels were replaced by the pixels of the corresponding synthetic image. Since the synthetic images had high quality, we preferred to overestimate the cloud contaminated pixels instead of underestimating them as inclusion of more cloud contaminated pixels in the mask resulted in better results than omission of true cloud contaminated pixels in the mask. In the latter situation, unidentified cloud contaminated pixels remained in the reconstructed time-series. In contrast, in the former situation, some cloud-free pixels (which were wrongly identified as haze, cloudy/cloud shadow pixels) were replaced by the corresponding pixels of the synthetic image, which later were radiometrically calibrated to the observation values. Therefore, the impact of overestimating cloudy, cloud shadow, and haze pixels on the final reconstruction time-series was small.
Like other cloudy pixel identification, this proposed method worked better for images with thick cloud contamination than with thin cloud coverage as thick clouds made the affected pixels dramatically distinguishable from cloud-free pixels, and a pixel with thin cloud or haze coverage showed surface reflectance underneath. For an image with thin cloud and/or haze coverage, we adjusted the value of threshold T h . A smaller T h could identify more thin cloud or haze covered pixels. In this situation, omission of some true cloud-free pixels could be possible. However, as discussed above, such small omission would not significantly impact the quality of the reconstructed cloud-free time-series images.
Among the three types of cloud contaminated pixels (cloudy, cloud shadow, and haze pixels), the accuracy of cloudy and cloud shadow pixel identification was higher than that of haze pixel identification. The main reason was that a cloudy and cloud shadow pixel usually had a large reflectance difference compared to a cloud-free pixel while the value difference of a haze and a haze-free pixel was relatively small. A haze pixel may also contain some of the surface reflectance underneath. Hence, due to the smaller difference, bias from the synthetic image prediction may contribute to over or under identification of haze pixels. Because of this, the method for haze pixel identification needs to be further polished.

Conclusions
Time-series for Sentinel-2A/2B (Sentinel-2) MSI images are valuable resources for environmental study and change monitoring. Although the revisiting capability of Sentinel-2 constellation has been greatly improved compared to other sensors such as Landsat, values of Sentinel-2 images cannot be fully realized due to cloud coverage, especially in the cloudy regions. This study aimed to reconstruct cloud-free Sentinel-2 image time-series from a limited number of clear-sky Sentinel-2 observations and MODIS MCD43A4 images using an extended spatiotemporal image fusion approach. With this approach, the cloud-free pixels of a time-series of Sentinel-2 observations were kept while the cloudy and cloud shadow pixels were replaced by the pixels of their corresponding synthetic images. The proposed new algorithm of this study to identify cloudy, cloud shadow, and haze pixels was simple but effective. To take advantages of high quality synthetic Sentinel-2 images, the identification of cloudy, cloud shadow, and haze pixels was based on analysis of different cloud contaminated Sentinel-2 observations and synthetic image from clear-sky Sentinel-2 and MODIS MCD43A4 images. By replacing only cloud contaminated pixels while keeping all cloud-free pixels of observations, the method produced the highest quality of a reconstructed time-series image. Experiment with reconstruction of a cloud-free time-series of 67 Sentinel-2 images of the case study indicated that the proposed method worked well. Hence, our proposed approach could make Sentinel-2 images more usable and useful.
This reconstruction method depends on the quality of synthetic images. The higher the quality of synthetic images, the higher quality of a reconstructed image time-series. In this regard, choice of a good image fusion model is necessary. In this study we used PSRFM for image fusion as previous studies indicated that it produced high quality of synthetic Sentinel-2 images at various landscape situations.
While this study demonstrated the proposed method with Sentinel-2 images, this method can be also applied to reconstructing time series Landsat 8 imagery since the PSRFM model was successfully used to produce synthetic Landsat 8 image time-series in their earlier studies.
One of the main limitations of this technique was that it generated only six bands of cloud-free time-series Sentinel-2 MSI images due to the constraint of spectral bandwidth scope. The image fusion methods, including PSRFM, were built on the similarity of reflectance bandwidths between fine (e.g., Sentinel-2) and coarse (e.g., MODIS) images. As the two sensors had only six similar bands, as shown in Table 1, only the reflectance of the six bands were blended and processed for time-series image reconstruction.