Next Article in Journal
True2 Orthoimage Map Generation
Next Article in Special Issue
Editorial to Special Issue “Remote Sensing Image Denoising, Restoration and Reconstruction”
Previous Article in Journal
Near-Field Multiple Target Localization in Frequency Diverse Array Based on Tensor Decomposition
Previous Article in Special Issue
Column-Spatial Correction Network for Remote Sensing Image Destriping
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Reconstruction of Sentinel-2 Image Time Series Using Google Earth Engine

1
College of Global Change and Earth System Science, Beijing Normal University, Beijing 100875, China
2
State Key Laboratory of Remote Sensing Science, Beijing Normal University, Beijing 100875, China
*
Author to whom correspondence should be addressed.
Remote Sens. 2022, 14(17), 4395; https://doi.org/10.3390/rs14174395
Submission received: 22 August 2022 / Accepted: 28 August 2022 / Published: 4 September 2022
(This article belongs to the Special Issue Remote Sensing Image Denoising, Restoration and Reconstruction)

Abstract

:
Sentinel-2 NDVI and surface reflectance time series have been widely used in various geoscience research, but the data is deteriorated or missing due to the cloud contamination, so it is necessary to reconstruct the Sentinel-2 NDVI and surface reflectance time series. At present, there are few studies on reconstructing the Sentinel-2 NDVI or surface reflectance time series, and these existing reconstruction methods have some shortcomings. We proposed a new method to reconstruct the Sentinel-2 NDVI and surface reflectance time series using the penalized least-square regression based on discrete cosine transform (DCT-PLS) method. This method iteratively identifies cloud-contaminated NDVI over NDVI time series from the Sentinel-2 surface reflectance data by adjusting the weights. The NDVI and surface reflectance time series are then reconstructed from cloud-free NDVI and surface reflectance using the adjusted weights as constraints. We have made some improvements to the DCT-PLS method. First, the traditional discrete cosine transformation (DCT) in the DCT-PLS method is matrix generated from discrete and equally spaced data, we reconfigured the DCT formulas to adapt for irregular interval time series, and optimized the control parameters N and s according to the typical vegetation samples in China. Second, the DCT-PLS method was deployed in the Google Earth Engine (GEE) platform for the efficiency and convenience of data users. We used the DCT-PLS method to reconstruct the Sentinel-2 NDVI time series and surface reflectance time series in the blue, green, red, and near infrared (NIR) bands in typical vegetation samples and the Zhangjiakou and Hangzhou study area. We found that this method performed better than the SG filter method in reconstructing the NDVI time series, and can identify and reconstruct the contaminated NDVI as well as surface reflectance with low root mean square error (RMSE) and high coefficient of determination (R2). However, in cases of a long range of cloud contamination, or above water surface, it may be necessary to increase the control parameter s for a more stable performance. The GEE code is freely available online and the link is in the conclusions of this article, researchers are welcome to use this method to generate cloudless Sentinel-2 NDVI and surface reflectance time series with 10 m spatial resolution, which is convenient for landcover classification and many other types of research.

1. Introduction

Land surface reflectance is basic parameter to describe the characteristics of the Earth’s surface [1], and the vegetation index is obtained by linear or nonlinear combination of reflectance in different spectral bands. Normalized Difference Vegetation Index (NDVI) is one of the most widely used vegetation indexes [2,3], it is simple to calculate and can be used for crop yield estimation [4], leaf area index (LAI) estimation [5] and other studies. Surface reflectance and NDVI time series have been widely used in land cover classification, environmental change, crop monitoring and other fields, and have become important indicators to describe global climate change and its impact on terrestrial ecosystems [3,6,7,8,9]. There are many related geoscience studies mainly based on the surface reflectance and NDVI time series data of Advanced Very High Resolution Radiometer (AVHRR), Moderate Resolution Imaging Spectroradiometer (MODIS), Landsat and Sentinel-2 images. Compared with the limited resolution in temporal or spatial dimension for AVHRR, MODIS and Landsat images, the 10 m spatial resolution and 5-day revisit period of Sentinel-2 data has a better application range. The Sentinel-2 mission provides a combination of two satellites, Sentinel-2a and Sentinel-2b, equipped with the Multi Spectral Instrument (MSI) sensor with 13 bands from the visible and the near infrared (NIR) to short wave infrared spectral range (SWIR) at different spatial resolutions from 10 to 60 m, including the red edge band that are beneficial for vegetation physiology research [10]. Sentinel-2 images have become one of the most valuable Earth observation resources in remote sensing studies and applications [7,11,12,13]. However, clouds are common in remote sensing images which cause the retrieved surface parameters incomplete, and residual clouds or undetected cloud shadows can still be observed in Sentinel-2 surface reflectance products. Therefore, Sentinel-2 surface reflectance and NDVI time series have outliers or vacancies in time and space, which greatly limits their application range. If Sentinel-2 surface reflectance and NDVI time series can be reconstructed, the temporal and spatial resolution of related geoscience research will be greatly improved.
Many methods have been developed to reconstruct the surface reflectance or NDVI time series. Based on several studies [7,14,15], methods to reconstruct surface reflectance time series can be divided into four types: (1) temporal-based methods [16,17]; (2) spectral-based methods [18,19]; (3) hybrid methods [20,21]; (4) multi-source fusion [7,22,23,24]. In comparison, methods to reconstruct NDVI time series can be divided into four categories: (1) temporal-based methods, which can be further sub-divided into four sub-classes: (a) temporal interpolation-replacement [25,26]; (b) temporal filter [27,28,29]; (c) temporal function-fitting methods [30,31]; (d) temporal deep learning models [32,33]; (2) frequency-based methods [6,34,35]; (3) hybrid methods [36,37,38]; (4) multi-source Fusion [22,23,24]. However, most of temporal based methods are applicable only to the sensors such as AVHRR and MODIS, which have high temporal resolution but low spatial resolution. For the high spatial resolution sensor such as Landsat, the multi-source spatio-temoral fusion method is usually adopted [22,23,24]. Sentinel-2 data is similar to Landsat data but with higher spatio-temporal resolution. Some researchers [7,39,40] used the multi-source fusion method to reconstruct the Sentinel-2 image time series, among them, Zhou et al. [7] used a spatiotemporal image fusion approach, the prediction smooth reflectance fusion model (PSRFM) [24], to reconstruct the cloud-free Sentinel-2 image time series, and achieved good results; Xiong et al. [40] proposed a deep learning-based fusion algorithm, enhanced residual dense network (ERDN), that can simultaneously fuse Landsat-7,8 and Sentinel-2 surface reflectance to generate dense Sentinel-2 image time series. However, spatiotemporal or deep learning-based fusion approaches faces the problems of spatial resolution inconsistency and spectral band difference, and the latter method requires high computational power and is time-consuming. Thanks to the improved revisiting frequency of Sentinel-2 data, it is now possible to reconstruct Sentinel-2 surface reflectance and NDVI time series without the aid of external data, using methods suitable for high-frequency temporal data. In the meanwhile, the reconstruction of Sentinel-2 surface reflectance and NDVI time series on the local computer is a great challenge for the storage and computing power of the local computer. The Google Earth Engine (GEE) can solve the problem of local computer limitations.
GEE is a cloud computing-based application and development platform with powerful data storage, management and data processing capabilities, providing technical means for a wide range of earth science applications [41]. Users can directly program online using AVHRR, MODIS, Landsat and Sentinel-2 data stored on the GEE platform. However, at present, the GEE only provides simple time series analysis and composite methods, such as maximum composite method and polynomial fitting, etc., which do not meet the advanced requirements of time series analysis. In recent years, some researchers have conducted remote sensing image time series reconstruction or smoothing research on the GEE. Kong et al. [42] developed weighted Whitter with dynamic parameter λ in spatial (wWHD) method to reconstruct MODIS EVI time series on GEE. Khanal et al. [43] compared the performance of Fourier-transformation smoothing, Whittaker smoother and Linear Fit Smoothing on Landsat 5, 7 and 8 images based on yearly composites to classify land cover in the study area using GEE. Chen et al. [9] developed a simple but effective Gap Filling and Savitzky-Golay filtering method (GF-SG) on GEE to reconstruct high-quality Landsat NDVI time series. Liu et al. [44] developed a robust reconstruction method based on envelope detection and the Savitzky-Golay filter (ED-SG) method to reduce noise in the NDVI time series, and the ED-SG method was programmed on GEE to smooth the MOD09GQ-NDVI, Landsat-NDVI and Sentinel-2 NDVI. Most of their image time series reconstruction or smoothing studies deal with vegetation index. To the best of our knowledge, there are few studies on the reconstruction of surface reflectance on GEE platform.
The penalized least square regression based on discrete cosine transform (DCT-PLS) method is a smoothing algorithm proposed by Garcia [45], has the advantages of less parameters, high execution efficiency and high reconstruction accuracy. Based on the DCT-PLS method, Xiao et al. [6] proposed a method which incorporates upper envelopes of time series vegetation indices as constraint conditions to reconstruct time series of NDVI and surface reflectance from MODIS/TERRA surface reflectance product (MOD09A1). However, the DCT-PLS method is applicable to the interpolation of equal interval data with high temporal resolution. Although the approximate revisit cycle of Sentinel-2 data is 5 days, the actual acquisition date is irregular. Coupled with the influence of cloud, the amount of valid clear sky observation data is far less than that in the ideal situation, so the DCT-PLS method needs to be adjusted to adapt for unequally spaced data. The aim of this study is to reconfigure the DCT-PLS method to make it suitable for reconstruction of unequal interval data, and generate high-quality and cloudless Sentinel-2 NDVI and surface reflectance time series data, which will provide valuable information source for agriculture, landcover classification, and other research.

2. Study Area and Data

2.1. Study Area

2.1.1. Samples of Different Vegetation Species

In order to establish an image time series reconstruction algorithm that can be used for a variety of typical vegetation in different regions, 10 typical vegetation sample pixels were selected in Liaoning, Hebei, Zhejiang and Henan provinces in east China for different vegetations, including larch, rice, tobacco, corn, grape and other plants (Table 1). Sample 2 and 4 are rotated crop of two seasons, other samples are crops of one season. These samples serve the purpose of testing the DCT-PLS algorithm in local computer, as well as optimizing the control parameters in Section 3.3. They represent the typical situation of crop land in middle to north part of China.

2.1.2. Large Scale Study Area

In order to carry out the research on image time series reconstruction on the GEE platform, two large-scale research areas, Area A and Area B as presented in Figure 1, were selected to evaluate the DCT-PLS method’s performance of reconstructing large-scale NDVI and surface reflectance under different atmospheric and aerosol conditions. Area A has less precipitation throughout the year, the main crops are corn and grape, and the proportion of valid Sentinel-2 images that can be used for time series reconstruction is relatively large; Area B is cloudy and rainy throughout the year, the main crop is rice, and the proportion of effective images that can be used for time series reconstruction is small. Figure 1 shows the geographical location of the two study areas.
Area A is located in Zhangjiakou City, Hebei Province. The geographical coordinate is 115°12′–115°29′E, 40°20′–42°33′N, with an area of about 539.0 km2. Area A with the temperate continental monsoon climate. The annual average rainfall is about 380 mm, temperature is about 9.1 °C, and sunshine hours are about 3000 h. The land coverage of the Area A includes cultivated land, forests, buildings, water bodies, etc., and the main crops are corn and grape.
Area B is located in Hangzhou City, Zhejiang Province. The geographical coordinate is 119°51′–120°12′E, 29°59′–30°15′N, with an area of about 993.5 km2. It is with the subtropical monsoon climate. The annual average rainfall is about 1454 mm, temperature is about 17.8 °C, and sunshine hours are about 1765 h. The land coverage of the Area B includes buildings, water bodies, forests and cultivated land, etc., and the main crop is rice and vegetables.

2.2. Data

Sentinel-2 is a wide-swath, high-resolution, multi-spectral imaging mission supporting Copernicus Land Monitoring studies, including the monitoring of vegetation, soil and water cover, as well as observation of inland waterways and coastal areas, and the Sentinel-2 satellite refers to the design of Landsat and SPOT series satellites [10] in terms of their operating orbits, sensor parameters design, etc. The Sentinel-2 Level-1C (L1C) product provides geo-coded Top of Atmosphere (TOA) reflectance, and the Level-2A (L2A) product is obtained by atmospheric correction of L1C product, which provides Bottom of Atmosphere reflectance, i.e., surface reflectance. The DCT-PLS method is applied to reconstruct time series of NDVI and surface reflection from the Sentinel-L2A product, which is provided in the GEE (https://earthengine.google.com, accessed on 1 June 2022). We selected Sentinel-2 images in 10 m resolution bands, i.e., the blue, green, red and near infrared (NIR) bands, as well as NDVI for image time series reconstruction. In addition, the QA60 band are used in the study as a means to preliminarily screen for valid cloud-free observations. The information of the selected bands is shown in Table 2.
For the 10 typical vegetation samples in Table 1, the pixel-scale NDVI and reflectance values of the selected bands from 1 June 2019 to 1 June 2021 were selected on the GEE platform, and the Sentinel-2 L2A images properties were added, finally the Feature-Collection were exported to Google Drive in CSV format to provide test data for the time series reconstruction experiments on the MATLAB platform in local computer.
For the two large-scale study areas, Sentinel-2 L2A images can be directly used on the GEE platform. All the available images with the coverage of clouds less than 80% from 1 June 2019 to 1 June 2021 were selected to reconstruct Sentinel-2 image time series in 2020. The number of used and total scenes, as well as the average cloud coverage of used and total images are presented in Table 2.

3. Methodology

3.1. Introduction of Original DCT-PLS Method

The formulation of DCT-PLS method is detailed in the article of Garcia [45], for the convenience of its adjustment in Section 3.2, its main ideas will be briefed here. Let y be a vector containing a series of equal interval raw data ( y i , i = 1 , 2 , , n . ), and y ^ is a vector containing a series of equal interval predictions ( y ^ i , i = 1 , 2 , , n . ). W is matrix containing weights, and each w i corresponds to y i , ranging from 0 to 1. If the weight is 0, it means that either the data is missing or it is judged to be an outlier (cloud contaminated) and not used. The goal of this method is to find the value of y ^ that minimizes the F y ^ in Equation (1).
F y ^ = W · RSS + s · P y ^ = W 1 / 2 · y ^ y 2 + s · D · y ^ 2
where RSS is the residual sum-of-squares; · indicates the Euclidean norm; s is a positive scalar that controls the degree of smoothing, and the smoothing of y ^ increases with the increase in s ; P is the penalty function that reflects the roughness of the smoothed data; D is the Laplace operator, acting as the implementation of P . The minimization of F y ^ can be achieved by using an iteration procedure over y ^ .
With type-2 discrete cosine transformation (DCT) and inverse discrete cosine transform (IDCT), ŷ can be written as:
y ^ k + 1 = IDCT Γ · DCT W y y ^ k + y ^ k
where y ^ k refers to y ^ calculated at the k th iteration step, Γ is a diagonal matrix composed of the eigen values of D , its element Γ i , i can be expressed as:
Γ i , i = 1 + s · 2 2 cos i 1 · π / n 2 1
In our study, the initial value of w i is set to 0 or 1 according to the Sentinel-2 L2A data in QA60 band (referred to hereafter as cloud flag). When the value of the cloud flag is 0, then the y i passed the initial screening and its initial w i is set to 1, and it is referred as the initial-screened samples. In other words, when the cloud flag = 1, the y i is abelled as abnormal data or missing data and its w i is set to 0. Through the preliminary screen method by cloud flag, the initial-screened data is used in the following process. However, due to snow and other factors, the cloud flag is not always accurate, which affects the image time series reconstruction result. Therefore, in the DCT-PLS method, it is necessary to adjust the value of w i by the iteration procedure, assigning a relatively high weight to the high-quality data, and giving the abnormal data a weight close to or equal to 0. This process uses the specified weighting function based on the current residuals to construct weights, and updates the weights from iteration to iteration until the residuals remain unchanged [45,46], thus achieving the effect of removing outliers. The weighting function [45] is given by:
w i = { 1.0 u i > 0.0 ( 1 ( u i 4.685 ) 2 ) 2 1.0 < u i 4.685 < 0.0 0.0    u i 4.685 < 1.0
where u i is the Studentized residual which is adjusted for standard deviation and leverage, calculated by
u i = r i 1.486 MAD r 1 1 + 1 + 16 s 2 1 + 16 s 1
where r i = y i y ^ i is the residual of the i th observation, and the MAD represents the median absolute deviation.
The DCT-PLS method is applicable to the interpolation of data with high temporal resolution and equal interval, our work is to modify it into an algorithm for irregular interval data, and transplant the algorithm method to the GEE platform for parallel operation.

3.2. Reconfigure of the DCT-PLS for Irregular Interval Data

The DCT-PLS method realizes parameter estimation and reconstruction of data image time series through the DCT and IDCT, but the traditional DCT and IDCT are matrices generated from discrete and equally spaced data. Therefore, we need to discuss their forms under the condition of unequally spacing. Mathematically, DCT is also a form of function fitting, that is:
y ^ t = i = 0 N 1 x i a i t  
where y ^ t is the smoothed data time series, x i is the result of DCT transformation, or can be called the coefficient of DCT decomposition, a 0 t , a 1 t ,…, a N 1 t are a set of base functions of DCT, a i t can be expressed as:
a i t = c i cos t + 0.5 i π N , i = 0 , 1 , , N 1
where c i is constant as:
c 0 = 1 N ,   c 1 = 2 N , i = 1 , , N 1
where N is the order.
If t are discrete and unequally spaced value, t j | j = 0 , 1 , , M 1 , and M > N , DCT base can be expressed as a matrix ‘ A ’, which is a matrix with n rows and m columns:
A = b i , j | b i , j = c i c o s t j + 0.5 i π N , i = 0 , 1 , , N 1 ; j = 0 , 1 , , M 1
The DCT transform can be expressed as
Y ^ = A X
Where X be a vector containing the coefficient of the DCT decomposition ( x 0 ,   x 1 ,…, x N 1 . ), Y ^ be a vector containing the smoothed data time series ( y ^ 0 , y ^ 1 , , y ^ N 1 ).
In order to solve X , we need to review the Equation (1), that is, the optimal solution is the value of X that minimize the Equation (1). The DCT-PLS method uses the Equation (2) to find the best solution for the equal intervals data time series, while for the unequally spaced data time series we can use linear least squares or the linear multi variant regression function in the general math toolkit to solve X . If the linear least squares algorithm is used, Equation (1) needs to be expressed as a matrix form about the unknown vector X , which can be expressed as
F X = A X Y T W A X Y + s · X T Γ X
where w is a diagonal matrix whose diagonal elements are w j , and the remaining elements are 0, w j is weight of the j th discrete observation data. Γ is the diagonal matrix in Equation (2), its diagonal element is shown in Equation (3).
The least squares solution of the Equation (11) is as follows
X = i n v A T W A + s · Γ A T W Y
In addition to linear least square, we can use the linear multiple multi-regression function ‘mvregress’ in MATLAB, or the linear multiple regression function ‘ee.Reducer.linearRegression’ provided by the GEE platform to solve X . With the optimal estimate of X , we can not only use the Equations (9) and (10) to construct Y ^ , but also predict the y ^ t at any user-specified t .

3.3. The Setting of N and s Parameters

In the original DCT-PLS method, the N parameter can only be equal to the length of the data time series. With the reconfiguration, not only the DCT-PLS method can adapt to unequally spaced data, but also its order N can be arbitrarily defined. This provides us the opportunity to discuss the optimal setting of N . When N is low, DCT can not only reflect the changes of low-frequency information, as N increases, although more high-frequency information changes can be fitted, the amount of calculation and the demand for data storage increase in proportion to the square of N , and may also lead to the overfitting of the time series data. Therefore, it is necessary to select an appropriate value of N that balances the temporal resolution of Sentinel-2 images, the complexity of vegetation phenological changes and computer resources. Since there are 12 months and 24 solar terms in a year, 12, 24, 36, and 48, which are multiples of 12, were selected for experiments to determine the value of N .
According to Equation (2), the degree of smoothing is controlled by parameter s . If the value of s is low, it will lead to under-smoothing of the result data, while the high value of s may lead to over smoothing which results in ignoring local characteristics of the smoothed data. Therefore, by obtaining the optimal estimate of s , errors caused by over- or under-smoothing can be avoided, and a smoothing result close to the high-quality original value can be obtained. Garcia [45] used the method of generalized cross validation (GCV) to estimate the optimal of s through automatic iteration process. But the automatic iteration process is too computationally laborious for remote sensing images, especially for the Sentinel-2 image with 10 m spatial resolution. Due to the characteristics of the GEE platform itself, it is very difficult automatically iterate to estimate the optimal value of s . Moreover, if the value of s between different pixels are different, it will cause lack of comparability between pixels. On the other hand, although the optimal value of s varies due to different characteristics of the input data, in fact, when s is within a range around the optimal value, the increase in error is not obvious. So, we set the s as a constant to reduce the computation cost.
In order to evaluate the performance of DCT-PLS method with different settings of N and s parameters on the reconstruction of NDVI time series. Leave one out cross validation (LOOCV) method was adopted to estimate the accuracy of the reconstructed NDVI time series. The root mean square error (RMSE) and the coefficient of determination (R2) were used as the accuracy evaluation indicators. We also calculated the fit error RMSE of all samples as another evaluation indicator. For the sale of discrimination, RMSE_fit represents the RMSE of all data fitting, RMSE_cv (cross validation) represents the RMSE of the LOOCV method.
The 10 typical sample pixels of different vegetation species in Section 2.1.1 are adopted to determine the optimal setting for N and s . Finally, the N was selected in the multiples of 12, and s was tested in the range of [0.1, 300], and we found when N = 24, the DCT-PLS method performed best for most typical vegetation samples with low RMSE_cv and high R2 (see Table 3), and when N = 24, s usually achieve the best accuracy in range of [5,45] for most samples (see Table 4). In addition, we also can see, in the excel table included in the Supplementary Materials, when s < s _opt, the RMSE_CV decrease, while s > s _opt, the RMSE_cv increases, corresponding to the overfitting and underfitting, and when N = 24, although s > s_opt will lead to underfitting, i.e., over-smoothing, but even if s derivates from the optimal value as large as s = 300, the RMSE_cv of most typical vegetation samples are still acceptable, and the corresponding R2 is larger than 0.9, which proves the stability of DCT-PLS method; the detail of the test can also be seen in the Figure S1 and the excel files included in the Supplementary Materials. After the value of N is determined, we need to determine the value of s . Then, we narrowed the range of s to [0.1, 60], calculated its optimal value (referred as s_opt) when the RMSE_cv of NDVI time series is the minimum (RMSE_cv-min) under N = 24 conditions for 10 typical vegetation samples, and the validating data is the raw NDVI time series with outliers removed. Since NDVI is a normalization index in the range of [–1, 1], the error of about 0.005 is acceptable. We extracted the acceptable range of s when RMSE_cv-(RMSE_cv-min) < 0.005 to find a common s that is suitable for time series reconstruction of most vegetation species in middle and north China. This process of determined the best value of N and s can be seen in the Table 3 and Table 4, and the Figure S1 in the Supplementary Materials. Finally, we set s = 16 which is in the middle of the range of 13–20 as the value of s , and N = 24 for subsequent research.
Another detail is that the number of iterations in Equation (2), in which the weight of initial-screened samples is updated, needs to be determined. If the number of iterations is too small, the reconstruction error will be high, and if the number of iterations is too large, the calculation cost will increase. According to experience, we set the maximum number of iterations as 6 which can generally screen out outlier samples.

3.4. Relationship between the NDVI Reconstruction and Surface Reflectance Reconstruction

The surface reflectance data of Sentinel-2 is affected by the incorrectly detected clouds, cloud shadows and short-term snowfall, resulting in surface reflectance outliers. In addition, even under clear-sky conditions, there will be fluctuation in the surface reflectance time series due to terrain and bidirectional reflections on the surface. In contrast, as a vegetation index, NDVI is relatively less disturbed by cloud shadows, terrain, and bidirectional reflectance on the surface, and is more sensitive to cloud and snow. Additionally, NDVI time series usually shows obvious seasonal changes. So, it is more appropriate to use NDVI time series to judge abnormal data. In the article of Xiao et al. [6], they proposed of the strategy that first use the NDVI time series to judge the abnormal data, then reconstruct the NDVI time series, and finally, based on the empirical regression equation of NDVI and multi-band reflectance, use reconstructed NDVI time series to predict the multi-band reflectance time series.
The strategy adopted in this study is consistent with the article of Xiao et al. [6] in terms of NDVI time series reconstruction, but different for multi-band reflectance reconstruction. Since the weights of the NDVI samples have been obtained during the iteration of NDVI time series reconstruction, which reflects the judgment result for the credibility of NDVI quality, so these weights can be directly used for the DCT-PLS reconstruction of the multi-band reflectance time series without iteration. The advantage of this strategy is that the reconstruction process of multi-band reflectance does not require iteration or additional empirical regression, which reduces the computational complexity of image data processing. In addition, on the surface without vegetation coverage, the reconstruction effect of the new strategy is better than that of the strategy in Xiao et al. [6].
According to the Section 3.1, when the cloud flag is equal to 0, the corresponding NDVI and the surface reflectance in selected bands are marked as initial-screened samples, and they were used for the following reconstruction test. The flow chart of using DCT-PLS to reconstruct time series of Sentinel-2 NDVI and surface reflectance data is shown in Figure 2.

4. Results

4.1. The Performance of Removing Outliers through Iterations

The DCT-PLS method needs to get correct weights through multiple iterations over the NDVI time series. We take larch sample as an example to illustrate the multiple iterations process of the DCT-PLS method for single-point reconstruction. Figure 3 shows the NDVI time series for the larch sample in 2020, together with the reconstructed NDVI time series through multiple iterations, and the judgment results of NDVI outliers. There are 145 NDVI points in 2020, and 58 initial-screened NDVI samples were selected through the preliminarily screen with the criterion of cloud flag = 0. Weights of initial-screened NDVI were initialized as 1 and updated through multiple iterations by the DCT-PLS method until the studentized residual basically reach minimum. In Figure 3, the 1st iterated NDVI points are higher than the other iterated NDVI points on days 89–117. This is because the initial-screened NDVI sample on day 103 is not completely correct and its value is high. Another place worth noticing is that the 1st iterated NDVI time series has an obvious downward curve at days 215–270, which is due to the influence of the abnormal initial-screened NDVI samples with low values on day 233, 253 and 268. Through the 2nd iteration, the studentized residuals of these initial-screened NDVI samples began to decrease. We can see that the 2nd NDVI time series has deviated more from the abnormal samples than the 1st iterated NDVI time series. The NDVI time series are basically consistent after the 4th, 5th, and 6th iterations, and the studentized residuals for those abnormal samples are basically close to the minimum. Finally, 6 NDVI outliers were identified on the days 103, 138, 165, 233, 253 and 268, and their weights were assigned to 0. After the 6th process, there remained 52 reasonable NDVI samples. The final weights for all the 58 samples after the 6 iterations will be adopted for reconstruction of multiband reflectance time series.

4.2. The Performance of Time Series Reconstruction on Typical Samples

In this section, we used the DCT-PLS method to reconstruct 4-day interval NDVI and surface reflectance time series in typical vegetation samples on local computer with MATLAB software. In order to better illustrate the performance of the DCT-PLS method, the Savitzky–Golay (SG) filter method [47] is introduced to reconstruct the NDVI time series for comparison. The SG filter method is a local polynomial fitting method, which is used to smooth and compute the derivative of a set of continuous value [27].
Due to space limitations of this paper, we only show the reconstruction results of NDVI and surface reflectance in larch, rotation of rice and corn samples. The reconstruction performance of NDVI and surface reflectance in other vegetation species samples can be seen in the Figures S2–S8. For convenience, we only display the initial-screened NDVI and surface reflectance time series (referred as the NDVI obs and corresponding bands obs in Figure 4, Figure 5 and Figure 6 and Figures S2–S8 for simplicity).
Figure 4a shows the comparison of the initial-screened NDVI time series and NDVI time series reconstructed by the SG filter method and the DCT-PLS method for the larch sample in 2020. In Figure 4a, we can see that not all the initial-screened NDVI samples are correct, many of them fluctuate due to contamination of clouds, shadow or aerosols. From day 1 to day 90, the NDVI time series reconstructed by SG filter and DCT-PLS method are consistent. On the day 113, the value of the NDVI sample is 0.0755 which is in line with the trend of vegetation growth. While on the day 103, the value of NDVI sample suddenly reaches 0.2303, which may be affected by the cloud shadow, and NDVI points on days 89–103 reconstructed by SG filter method are affected by this contaminated NDVI sample. While the DCT-PLS method can correctly identify this contaminated NDVI sample and assigns its weight as 0. Therefore, the reconstructed NDVI points by the SG filter method are larger than the reconstructed NDVI from the DCT-PLS method on days 89–103. Similarly, the values of these initial-screened NDVI are abnormally low on the days 138, 165, 233, 253 and 268 and the values of these initial-screened NDVI samples are abnormally high on the days 265 and 270, and the reconstructed NDVI values by SG filterer are also significantly affected by these abnormal NDVI points. In contrast, the DCT-PLS method can assign low weights to these NDVI points, so the reconstructed NDVI time series from the DCT-PLS are smooth and can best reflect the real phenological information of the larch sample.
Figure 4b–e shows the comparisons of the initial-screened reflectance time series in blue, green, red, and NIR bands, and NDVI time series reconstructed by the DCT-PLS method for the larch sample in 2020. The weights obtained by multiple iterations over the NDVI time series by the DCT-PLS method were reused to reconstruct the surface reflectance time series. It can be seen that the days of these abnormal initial-screened surface reflectance are consistent with the days of abnormal initial-screened NDVI samples, and the reconstructed surface reflectance time series are smooth and not affected by these abnormal surface reflectance values. The reconstructed NDVI and surface reflectance time series are all shown in Figure 4f. We can see that the reconstructed NDVI series has one peak in summer and early autumn, and the reconstructed surface reflectance time series in the red band are lower than that of green band in summer, but larger than that of the green band in winter and spring, which is in line with the characteristics of larch phenology. Both the reconstructed DNVI series and surface reflectance series in the 4 bands are smooth.
Figure 5a shows the comparison of the initial-screened NDVI and surface reflectance time series reconstructed by the SG filter method and the DCT-PLS method for the rotation of rice sample in 2020. We can see that the there are two NDVI peaks for rotation of rice sample in 2020. There are fewer initial-screened NDVI samples than the larch samples, which is due to the cloudy and rainy weather in the Zhejiang province. On day 49, the initial-screened NDVI is abnormally high with 0.6707, which may be caused by the influence of cloud shadow. Since the SG filter method cannot identify contaminated NDVI samples, the NDVI values reconstructed by the SG filter method are larger than those from the DCT-PLS method on days 29–69. On the other hand, the reconstructed NDVI time series by the SG filter method deviates from the high-quality NDVI sample on day 159 because there are few valid NDVI samples on days 134–214. In the second vegetation growth period, the initial-screened NDVI sample are obvious low on the day 274 which may be caused by clouds. The DCT-PLS method can successfully identify those contaminated initial-screened NDVI put their weight to 0, so the reconstructed NDVI time samples from the DCT-PLS is smooth and can best reflect the seasonal information of the rotation of rice sample.
Figure 5b–e presents the comparisons of the reflectance from band blue, green, red and NIR of Sentinel-2 data, and reconstructed reflectance from the DCT-PLS method for the rotation of rice sample in 2020. We can see that the DCT-PLS method can reconstruct the surface reflectance in the selected bands to reflect the spectral feature of rice, especially in day 135 and 210, the NIR reflectance is small because the surface is covered by water. Figure 5f shows the NDVI and surface reflectance time series reconstructed by the DCT-PLS method. We can see that the reconstructed surface reflectance in the NIR band is larger than those in other bands, and the reconstructed surface reflectance in the green band is slightly than the that in the blue and red bands during the growing season, which is in line with the spectral characteristics of vegetation, and both the reconstructed NDVI series and surface reflectance series in the 4 bands are smooth.
Figure 6a shows the comparison of the initial-screened NDVI time series and NDVI time series reconstructed by the SG filter method and the DCT-PLS method from the corn sample in 2020. The comparisons of the reflectance in blue, green, red and NIR band from the Sentinel-2 and the DCT-PLS method for the corn sample in 2020 are displayed in Figure 6b–e. In Figure 6a, the initial NDVI sample on day 8 is abnormally low, and the corresponding reflectance samples (Figure 6b–e) are also abnormally high, which may be affected by the snow. Since the DCT-PLS method can identify the contaminated NDVI and surface reflectance samples, the reconstructed NDVI points are larger than those of the SG filter method on days 3–17. Similarly, the initial-screened NDVI and surface reflectance samples are abnormal on the days 23, 58, 243, 268 and 328, etc., while the DCT-PLS method can assign low weights to these NDVI points, so the reconstructed NDVI and surface time series from the DCT-PLS method are smooth.
Figure 6f represents the reconstructed NDVI and surface reflectance time series by the DCT-PLS method. The phenology as well as spectral contrast in the 4 bands are in line with the characteristics of vegetation.
Finally, in order to show the overall reconstruction performance for the 10 typical vegetation samples by the DCT-PLS method, density scatterplots of the reconstructed surface reflectance and NDVI time series from the DCT-PLS method versus the cloud-free surface reflectance from Sentinel-2 L2A product over the 10 vegetation species samples in 2020 are shown in Figure 7. The cloud-free data is the initial-screened data with outliers removed. The reconstructed NDVI and surface reflectance are referred as Simu1 NDVI and Simu1 reflectance in Figure 7. We can see that the reconstructed NDVI and surface reflectance time series illustrate good agreement with the cloud-free NDVI and surface reflectance time series. The highest R2 which reaches 0.9885 with NDVI series, and all the R2 are above 0.89 for the 4 band reflectances, and the RMSE_fit values are also low. It proves that the DCT-PLS algorithm can reconstruct the NDVI and reflectance time series well.

4.3. The Performance of Time Series Reconstruction for Images

For 10 m spatial resolution Sentinel-2 images, the number of all pixels in the region of a Sentinel-2 scene in a year is in units of 100 million. Therefore, it is particularly necessary to transplant the DCT-PLS algorithm for single-pixel reconstruction to an algorithm in cloud platform for large-scale area image reconstruction. Due to the advantages of parallel operation on the GEE platform, all pixels of an image can be reconstructed at the same time. During the development of the DCT-PLS method for large-scale area image reconstruction on the GEE, the value of N is 24 and the value of s is 16. The values of N and s parameters are consistent with the DCT-PLS method for single-point reconstruction, and the Sentinel-2 image with cloud coverage of less than 80% from 1 June 2019 to 1 June 2021 were selected to reconstruct the Sentinel-2-like image on user-specified date in 2020, the obtained image is referred as Simu1.
In Section 4.2, we presented the reconstruction result of NDVI and reflectance time series for typical sample pixels, and the validation reference data for calculating R2 and RMSE_fit are the cloud-free Sentinel-2 data after outlier removal. In this Section, reconstruction of the large-scale images of a certain date will be carried out in study Area A and B. Similar to other reconstruction methods, the performance of DCT-PLS is affected by the number as well as quality of available cloud-free images on adjacent dates, so the image quality of adjacent dates is discussed in this Section. We conducted experiments with three cases as follows.
Case 1: It is an ideal case that the Sentinel-2 image acquired on a certain day is in high-quality without clouds, and there are many high-quality images on adjacent dates. The aim of this experiment is to reconstruct the image of this certain day by excluding the actual cloud-free Sentinel-2 image of this certain day. The reason for this experiment is that we can use the actual Sentinel-2 image as a reference to evaluate the reconstructed image from DCT-PLS method. We used the density scatterplot of the Simu1 image versus the cloud-free Sentinel-2 image to verify the reconstruction performance.
Case 2: It is a case that the actual Sentinel-2 image acquired on a certain day is partly covered by clouds, and there are many high-quality images on adjacent dates. The reconstructed Simu1 image of this certain day is shown to verify the cloud-removing ability of the DCT-PLS method.
Case 3: It is a situation where the actual Sentinel-2 image acquired on a certain day is all covered by clouds, and there are few high-quality images in adjacent dates. The reliability of the DCT-PLS method greatly depends on the number of cloud-free Sentinel-2 observations in adjacent dates. Therefore, when there are few high-quality images on adjacent dates, the image reconstruction performance of the DCT-PLS will not be as good as in Case 1 and Case 2. We proposed a means of increasing the value of s to improve the reconstruction performance.
The actual Sentinel-2 and Simu1 images are shown with an RGB color scheme that uses red, green, blue bands. Figure 8 shows the cloud coverage of Sentinel-2 images for the Area A and B in 2020. The cloud coverage is the attribute data of the Sentinel-2 scene. Due to the limitation of computational memory on the GEE, the Area A and B we chose only occupy a part of a Sentinel-2 scene, so Figure 8 can be used as a reference for cloud coverage for Area A and B in 2020. The dates corresponding to the three cases in Table 5 have been marked on Figure 8.

4.3.1. Case 1

In this case, we selected the Sentinel-2 image acquired on 24 October 2020 for Area A (Figure 9a) and on 28 April 2020 for Area B (Figure 9c) that satisfy the situation that the image is cloud-free and there are many high-quality images on adjacent dates. Figure 9a shows that the Area A on 24 October 2020 has entered autumn, and there is basically no vegetation, while the most areas of Area B on 28 April 2020 are covered with vegetation (Figure 9c). We visually observed almost no difference between the Simu1 images reconstructed by the DCT-PLS method on the GEE platform with the actual Sentinel-2 images filtered out and the actual Sentinel-2 images for the two study areas. The Simu1 NDVI is also close to the actual Sentinel-2 NDVI. Density scatterplot of the Simu1 NDVI and surface reflectance versus the actual Sentinel-2 NDVI and surface reflectance for the two areas are shown in Figure 10 and Figure 11.
The density scatter-plot in Figure 10 indicates that the reconstruction effect of the DCT-PLS method in the Area A is fine, and the Simu1 is generally consistent with the actual acquired Sentinel-2 image. Among them, the reconstruction effect of Simu1 blue reflectance is the worst (Figure 10b), but still shows fine agreement with the actual Sentinel-2 blue reflectance (R2 = 0.8249 and Rmse_fit = 0.0116). Compared with the blue reflectance, the green, red, NIR reflectance between the Simu1 image and actual Sentinel-2 image has better consistency: R2 = 0.8249 and Rmse_fit = 0.0116 for green band; R2 = 0.8498 and Rmse_fit = 0.0132 for red band; R2 = 0.8715 and Rmse_fit = 0.0161 for NIR band. However, two scatter-plots are provided for NDVI in different scope of statistics, one is the positive range of NDVI ([0, 1], Figure 10a), with R2 = 0.845 and Rmse_fit = 0.0397; another is the full range of NDVI ([−1, 1], Figure 10f), with R2 = 0.8549 and Rmse_fit = 0.0411. As can be seen from Figure 10f, the distribution of negative NDVI in the density scatterplot deviates from the diagonal line, which indicates the increased error of the reconstructed NDVI series in pixels of negative NDVI. We found out that the land cover type of the deviation part is water, while the other land cover types show good agreement between the reconstructed and actual Sentinel-2 NDVI image.
Figure 11 shows that the reconstruction effect of the DCT-PLS method in the Area B is also satisfactory, and the reconstructed images are consistent with the actual Sentinel-2 images. Among them, the R2 in blue and green reflectance are less satisfactory, but still around 0.81. Compared with the blue and green reflectance, the red, NIR reflectance and NDVI have better agreement with the Simu1 image and actual Sentinel-2 image (Figure 11a,c–f). As can be seen from Figure 10f, the distribution of negative NDVI in the density scatterplot also deviates from the diagonal line, which corresponds to pixels of water in the river across the Area B. Similar to the findings in Area A, we find the land coverage of the deviation part is also water. While the other land cover types of Simu1 are also in good agreement with the actual Sentinel-2 image in Area B. We also tried to identify other pixels with large fitting error (deviate from diagonal line), and found out most of these pixels locate in the urban area where the neighborhood of the pixels exhibits high heterogeneity.
From the experiments in the two study areas, it can be concluded that when there are many high-quality images of adjacent dates, the DCT-PLS algorithm has a good performance to reconstruct the Sentinel-2 image, and the obtained Simu1 image has a great agreement with the actual cloud-free Sentinel-2 image. The DCT-PLS method has a satisfactory performance on the reconstruction of image with other land cover types such as vegetation and buildings, etc., but has an unsatisfactory performance on the reconstruction of image covered by water.

4.3.2. Case 2

For each quarter of 2020, we selected Sentinel-2 images in Area A (Figure 12a) and Area B (Figure 12c) that satisfy the situation that the image is partly covered by clouds and there are many high-quality images on adjacent dates. It can be seen that the DCT-PLS method can remove almost all of the clouds (Figure 12b,d). The reconstructed images at four different dates in Area A and the reconstructed images of the first three columns in Area B are basically cloud-free. The fourth Sentinel-2 image in Area B (Figure 12c), which is acquired on 29 December is covered by thick clouds. We can see that the most of the clouds were removed by the DCT-PLS method in corresponding reconstructed image (the fourth columns in Figure 12d), but there are still some clouds in the eastern region of Area B. This is because the images of this region on adjacent dates are mostly all covered by heavy cloud. We can see that most part of the reconstructed image has been greatly improved. The results of Case 2 situation prove the applicability of DCT-PLS method for large-scale image reconstruction.

4.3.3. Case 3

However, the performance of the DCT-PLS method depends on the number of clear Sentinel-2 observations in adjacent dates. When there are few available high-quality images in a long date range, the quality of reconstructed image will be reduced. In this case, we selected the Sentinel-2 image acquired on 11 July 2020 in Area A (Figure 13a) and the Sentinel-2 image acquired on 22 June 2020 in Area B (Figure 14a) that meet the situation that the acquired Sentinel-2 image is all covered by clouds, and there are few high-quality images on adjacent dates. We used the DCT-PLS method with the value of s is 16 to reconstruct these two Sentinel-2 images, but the obtained Simu1 images (Figure 13b and Figure 14b) are still contaminated by remnant clouds. As mentioned in Section 3.3, when N = 24, even if s derivates from the optimal value as large as s = 300, the DCT-PLS method still has relatively good performance. Therefore, we try to increase the value of s ( s = 30, 50, 100, 150). As shown in Figure 13b–e and Figure 14b–e, when s increases from 16 to 100, the reconstructed image becomes more and more clear. When the value of s is 150, the reconstructed image is not very different from the reconstructed image when s = 100 (Figure 13e,f and Figure 14e,f). The analysis in Section 3.3 recommends a fixed value of s = 16 for reconstruction of Sentinel-2 images in general situation. However, when there is a long period of cloudy days, such as in the rainy season, the performance of the DCT-PLS method may be compromised. In this case, the value of s should be increased, such as to 100, to achieve better performance of cloud removal. On the other hand, the side effect is that the increase in s may reduce the actual temporal resolution of the reconstructed image series, and that is not desired in phenology studies for crop of two or more seasons.

5. Conclusions

As remote sensing applications become more and more quantitative, it becomes increasingly important to obtain accurate and seamless surface reflectance in high spatio-temporal resolution. In this study, the DCT-PLS method was adopted for the reconstruction of Sentinel-2 NDVI and surface time series. For this purpose, the formula of DCT-PLS is reconfigured to adapt for data sequence of irregular interval. After the algorithm has been tested with typical sample of different vegetation species in local computer, it is transplanted to GEE cloud platform for large scale application. The code of this algorithm in GEE platform is open to public at https://code.earthengine.google.com/ac1878091de6f5879b072ab322d45053 (accessed on 1 June 2022), which can be used to obtain reconstructed cloudless images at 10 m resolution to facilitate studies on land use, urban development, agriculture, ecosystem, etc.
In the reconstruction experiment of 10 typical samples of different vegetation species, the DCT-PLS and SG filter method were used to reconstruct the NDVI time series in 2020, and it was found that DCT-PLS method could well identify, mask and reconstruct the contaminated NDVI through the iteration process, and the reconstructed NDVI time series can faithfully reflect the phenology of both one and two season crops, while the SG filter method could not. Then, the weights in the construction of NDVI series are adopted to construct the surface reflectance series in the blue, green, red and NIR bands. The surface reflectance time series reconstructed by the DCT-PLS method are smooth, the reconstructed spectral and phenology characteristics are also in line with the vegetation species. In the large-scale image reconstruction in the two study areas on the GEE platform, the DCT-PLS algorithm can effectively identify and remove cloud-contaminated pixels and reconstruct them, and the reconstructed image is in good agreement with the actual cloud-free Sentinel-2 image.
This research is mainly aimed at the reconstruction of images covered with vegetation. For the general purpose, a fixed setting of N = 24 and s = 16 is recommended, as it satisfactorily catches the phenology for the 10 typical vegetation samples that represent the typical situation of crop land in middle to north part of China. However, the images covered with other land cover types such as buildings, water and bare land are not discussed in depth, and the reconstruction performance of pixels covered by water is poor. In the extreme case that there are few clear images in a long date range, such as during the rainy season, the reconstruction performance is also less satisfactory. We recommend to increase the value of s in these situations, such as s = 30, 50, 100, 150, etc., and we found the cloud removal effect is good when the value of s is around 100.
Unfortunately, due to the memory limitation of the GEE platform, if the image area is too large, such as a scene of more than 4000 × 4000 pixels, an error will be reported and “user memory limit exceeded” will be displayed on the console. Currently, only Sentinel-2 image time series in medium-sized areas can be reconstructed.
In the future, more samples of typical vegetation, as well as other types of land cover, especially natural ecosystems, will be selected to determine a more flexible scheme for setting N and s. In addition, the performance of the DCP-PLS method for reflectance in the red-edge and shortwave infrared band, and the possibility of combining the data of other satellite sensors, such as Landsat-OLI and Terra/Aqua-MODIS, with Sentinel-2 data series, will be discussed in our follow-up research.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/rs14174395/s1, Figure S1: The calculated RMSE_cv values by the DCT-PLS method under different s and N parameters for different vegetation species samples in 2020. (a) Larch; (b) Rotation of rice; (c) Rice; (d) Rotation of wheat and corn; (e) Grass; (f) Soybean; (g) Potato; (h) Grape; (i) Tobacco; (j) Corn. (Bold points are the RMSE_cv minimum over different vegetation species samples; Asterisk points are the RMSE_cv values in the the acceptable range of s ). Figure S2–S8: The comparison of NDVI and band reflectance time series from Sentinel-2, the DCT-PLS and SG filter methods for the rice, rotation of wheat and corn, grass, soybean, potato, grape, tobacco samples in 2020, respectively. (a) The NDVI time series; (b–e) The surface reflectance time series in blue, green, red and NIR band respectively; (f) The reconstructed NDVI and all the band reflectance time series from the DCT-PLS method.

Author Contributions

Conceptualization, Q.L.; methodology, Q.L. and K.Y.; software, K.Y., Y.L. and S.Z.; validation, K.Y., Y.L., M.L. and S.Z.; writing—original draft preparation, K.Y. and Q.L.; writing—review and editing, K.Y., Q.L., Y.L., M.L., S.Z. and X.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key Research and Development Program of China (No. 2020YFA0608703) and the Key Program of the National Natural Science Foundation of China (No. 42130104).

Data Availability Statement

Not applicable.

Acknowledgments

The authors would like to thank the anonymous reviewers for their valuable comments on the manuscript, which helped improve the quality of the paper. We would also like to thank the Google Earth Engine team for their wonderful work and service.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

References

  1. Liang, S.L.; Fang, H.L.; Chen, M.Z. Atmospheric Correction of Landsat Etm+ Land Surface Imagery-Part I: Methods. IEEE Trans. Geosci. Remote Sens. 2001, 39, 2490–2498. [Google Scholar] [CrossRef]
  2. Huang, S.; Tang, L.N.; Hupy, J.P.; Wang, Y.; Shao, G.F. A Commentary Review on the Use of Normalized Difference Vegetation Index (NDVI) in the Era of Popular Remote Sensing. J. For. Res. 2021, 32, 1–6. [Google Scholar] [CrossRef]
  3. Li, S.; Xu, L.; Jing, Y.H.; Yin, H.; Li, X.H.; Guan, X.B. High-Quality Vegetation Index Product Generation: A Review of NDVI Time Series Reconstruction Techniques. Int. J. Appl. Earth Obs. Geoinf. 2021, 105, 18. [Google Scholar] [CrossRef]
  4. Xie, Y.; Huang, J.X. Integration of a Crop Growth Model and Deep Learning Methods to Improve Satellite-Based Yield Estimation of Winter Wheat in Henan Province, China. Remote Sensing 2021, 13, 4372. [Google Scholar] [CrossRef]
  5. Sun, Y.H.; Ren, H.Z.; Zhang, T.Y.; Zhang, C.Y.; Qin, Q.M. Crop Leaf Area Index Retrieval Based on Inverted Difference Vegetation Index and NDVI. IEEE Geosci. Remote Sens. Lett. 2018, 15, 1662–1666. [Google Scholar] [CrossRef]
  6. Xiao, Z.; Liang, S.; Wang, T.; Liu, Q. Reconstruction of Satellite-Retrieved Land-Surface Reflectance Based on Temporally-Continuous Vegetation Indices. Remote Sens. 2015, 7, 9844–9864. [Google Scholar] [CrossRef]
  7. Zhou, F.Q.; Zhong, D.T.; Peiman, R. Reconstruction of Cloud-Free Sentinel-2 Image Time-Series Using an Extended Spatiotemporal Image Fusion Approach. Remote Sens. 2020, 12, 22. [Google Scholar] [CrossRef]
  8. Sun, L.; Gao, F.; Xie, D.; Anderson, M.; Chen, R.; Yang, Y.; Yang, Y.; Chen, Z. Reconstructing Daily 30 m NDVI over Complex Agricultural Landscapes Using a Crop Reference Curve Approach. Remote Sens. Environ. 2021, 253, 112156. [Google Scholar] [CrossRef]
  9. Chen, Y.; Cao, R.; Chen, J.; Liu, L.; Matsushita, B. A Practical Approach to Reconstruct High-Quality Landsat NDVI Time-Series Data by Gap Filling and the Savitzky–Golay Filter. ISPRS J. Photogramm. Remote Sens. 2021, 180, 174–190. [Google Scholar] [CrossRef]
  10. Drusch, M.; Del Bello, U.; Carlier, S.; Colin, O.; Fernandez, V.; Gascon, F.; Hoersch, B.; Isola, C.; Laberinti, P.; Martimort, P.; et al. Sentinel-2: Esa’s Optical High-Resolution Mission for Gmes Operational Services. Remote Sens. Environ. 2012, 120, 25–36. [Google Scholar] [CrossRef]
  11. Hunt, M.L.; Blackburn, G.A.; Carrasco, L.; Redhead, J.W.; Rowland, C.S. High Resolution Wheat Yield Mapping Using Sentinel-2. Remote Sens. Environ. 2019, 233, 15. [Google Scholar] [CrossRef]
  12. Llorens, R.; Sobrino, J.A.; Fernandez, C.; Fernandez-Alonso, J.M.; Vega, J.A. A Methodology to Estimate Forest Fires Burned Areas and Burn Severity Degrees Using Sentinel-2 Data. Application to the October 2017 Fires in the Iberian Peninsula. Int. J. Appl. Earth Obs. Geoinf. 2021, 95, 102243. [Google Scholar] [CrossRef]
  13. Xiao, D.Y.; Niu, H.P.; Guo, F.C.; Zhao, S.X.; Fan, L.X. Monitoring Irrigation Dynamics in Paddy Fields Using Spatiotemporal Fusion of Sentinel-2 and MODIS. Agric. Water Manag. 2022, 263, 107409. [Google Scholar] [CrossRef]
  14. Shen, H.; Li, X.; Cheng, Q.; Zeng, C.; Yang, G.; Li, H.; Zhang, L. Missing Information Reconstruction of Remote Sensing Data: A Technical Review. IEEE Geosci. Remote Sens. Mag. 2015, 3, 61–85. [Google Scholar] [CrossRef]
  15. Tang, Z.P.; Amatulli, G.; Pellikka, P.K.E.; Heiskanen, J. Spectral Temporal Information for Missing Data Reconstruction (Stimdr) of Landsat Reflectance Time Series. Remote Sens. 2022, 14, 172. [Google Scholar] [CrossRef]
  16. Tang, H.; Yu, K.; Hagolle, O.; Jiang, K.; Geng, X.; Zhao, Y. A Cloud Detection Method Based on a Time Series of MODIS Surface Reflectance Images. Int. J. Digit. Earth 2013, 6, 157–171. [Google Scholar] [CrossRef]
  17. Pouliot, D.; Latifovic, R. Reconstruction of Landsat Time Series in the Presence of Irregular and Sparse Observations: Development and Assessment in North-Eastern Alberta, Canada. Remote Sens. Environ. 2018, 204, 979–996. [Google Scholar] [CrossRef]
  18. Li, X.H.; Shen, H.F.; Zhang, L.P.; Li, H.F. Sparse-Based Reconstruction of Missing Information in Remote Sensing Images from Spectral/Temporal Complementary Information. ISPRS J. Photogramm. Remote Sens. 2015, 106, 1–15. [Google Scholar] [CrossRef]
  19. Wu, W.; Ge, L.Q.; Luo, J.C.; Huan, R.H.; Yang, Y.P. A Spectral-Temporal Patch-Based Missing Area Reconstruction for Time-Series Images. Remote Sens. 2018, 10, 1560. [Google Scholar] [CrossRef]
  20. Malambo, L.; Heatwole, C.D. A Multitemporal Profile-Based Interpolation Method for Gap Filling Nonstationary Data. IEEE Trans. Geosci. Remote Sens. 2016, 54, 252–261. [Google Scholar] [CrossRef]
  21. Yan, L.; Roy, D.P. Large-Area Gap Filling of Landsat Reflectance Time Series by Spectral-Angle-Mapper Based Spatio-Temporal Similarity (Samsts). Remote Sens. 2018, 10, 609. [Google Scholar] [CrossRef]
  22. Zhu, X.L.; Chen, J.; Gao, F.; Chen, X.H.; Masek, J.G. An Enhanced Spatial and Temporal Adaptive Reflectance Fusion Model for Complex Heterogeneous Regions. Remote Sens. Environ. 2010, 114, 2610–2623. [Google Scholar] [CrossRef]
  23. Zhu, X.L.; Helmer, E.H.; Gao, F.; Liu, D.S.; Chen, J.; Lefsky, M.A. A Flexible Spatiotemporal Method for Fusing Satellite Images with Different Resolutions. Remote Sens. Environ. 2016, 172, 165–177. [Google Scholar] [CrossRef]
  24. Zhong, D.T.; Zhou, F.Q. A Prediction Smooth Method for Blending Landsat and Moderate Resolution Imagine Spectroradiometer Images. Remote Sens. 2018, 10, 1371. [Google Scholar] [CrossRef]
  25. Julien, Y.; Sobrino, J.A. Comparison of Cloud-Reconstruction Methods for Time Series of Composite NDVI Data. Remote Sens. Environ. 2010, 114, 618–625. [Google Scholar] [CrossRef]
  26. Roy, D.P.; Yan, L. Robust Landsat-Based Crop Time Series Modelling. Remote Sens. Environ. 2020, 238, 16. [Google Scholar] [CrossRef]
  27. Chen, J.; Jönsson, P.; Tamura, M.; Gu, Z.; Matsushita, B.; Eklundh, L. A Simple Method for Reconstructing a High-Quality NDVI Time-Series Data Set Based on the Savitzky–Golay Filter. Remote Sens. Environ. 2004, 91, 332–344. [Google Scholar] [CrossRef]
  28. Ma, M.; Veroustraete, F. Reconstructing Pathfinder Avhrr Land NDVI Time-Series Data for the Northwest of China. Adv. Space Res. 2006, 37, 835–840. [Google Scholar] [CrossRef]
  29. Zhu, W.Q.; Pan, Y.Z.; He, H.; Wang, L.L.; Mou, M.J.; Liu, J.H. A Changing-Weight Filter Method for Reconstructing a High-Quality NDVI Time Series to Preserve the Integrity of Vegetation Phenology. IEEE Trans. Geosci. Remote Sens. 2012, 50, 1085–1094. [Google Scholar] [CrossRef]
  30. Jonsson, P.; Eklundh, L. Seasonality Extraction by Function Fitting to Time-Series of Satellite Sensor Data. IEEE Trans. Geosci. Remote Sens. 2002, 40, 1824–1832. [Google Scholar] [CrossRef]
  31. Beck, P.S.A.; Atzberger, C.; Hogda, K.A.; Johansen, B.; Skidmore, A.K. Improved Monitoring of Vegetation Dynamics at Very High Latitudes: A New Method Using MODIS NDVI. Remote Sens. Environ. 2006, 100, 321–334. [Google Scholar] [CrossRef]
  32. Das, M.; Ghosh, S.K. A Deep-Learning-Based Forecasting Ensemble to Predict Missing Data for Remote Sensing Analysis. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2017, 10, 5228–5236. [Google Scholar] [CrossRef]
  33. Ahmad, R.; Yang, B.; Ettlin, G.; Berger, A.; Rodriguez-Bocca, P. A Machine-Learning Based Convlstm Architecture for NDVI Forecasting. Int. Trans. Oper. Res. 2020, 28, 12887. [Google Scholar] [CrossRef]
  34. Roerink, G.J.; Menenti, M.; Verhoef, W. Reconstructing Cloudfree NDVI Composites Using Fourier Analysis of Time Series. Int. J. Remote Sens. 2000, 21, 1911–1917. [Google Scholar] [CrossRef]
  35. Lu, X.L.; Liu, R.G.; Liu, J.Y.; Liang, S.L. Removal of Noise by Wavelet Method to Generate High Quality Temporal Data of Terrestrial MODIS Products. Photogramm. Eng. Remote Sens. 2007, 73, 1129–1139. [Google Scholar] [CrossRef]
  36. Cao, R.Y.; Chen, Y.; Shen, M.G.; Chen, J.; Zhou, J.; Wang, C.; Yang, W. A Simple Method to Improve the Quality of NDVI Time-Series Data by Integrating Spatiotemporal Information with the Savitzky-Golay Filter. Remote Sens. Environ. 2018, 217, 244–257. [Google Scholar] [CrossRef]
  37. Padhee, S.K.; Dutta, S. Spatio-Temporal Reconstruction of MODIS NDVI by Regional Land Surface Phenology and Harmonic Analysis of Time-Series. GISci. Remote Sens. 2019, 56, 1261–1288. [Google Scholar] [CrossRef]
  38. Chu, D.; Shen, H.F.; Guan, X.B.; Chen, J.M.; Li, X.H.; Li, J.; Zhang, L.P. Long Time-Series NDVI Reconstruction in Cloud-Prone Regions Via Spatio-Temporal Tensor Completion. Remote Sens. Environ. 2021, 264, 112632. [Google Scholar] [CrossRef]
  39. Wang, Q.M.; Atkinson, P.M. Spatio-Temporal Fusion for Daily Sentinel-2 Images. Remote Sens. Environ. 2018, 204, 31–42. [Google Scholar] [CrossRef]
  40. Xiong, S.P.; Du, S.H.; Zhang, X.Y.; Ouyang, S.; Cui, W.H. Fusing Landsat-7, Landsat-8 and Sentinel-2 Surface Reflectance to Generate Dense Time Series Images with 10m Spatial Resolution. Int. J. Remote Sens. 2022, 43, 1630–1654. [Google Scholar] [CrossRef]
  41. Gorelick, N.; Hancher, M.; Dixon, M.; Ilyushchenko, S.; Thau, D.; Moore, R. Google Earth Engine: Planetary-Scale Geospatial Analysis for Everyone. Remote Sens. Environ. 2017, 202, 18–27. [Google Scholar] [CrossRef]
  42. Kong, D.; Zhang, Y.; Gu, X.; Wang, D. A Robust Method for Reconstructing Global MODIS Evi Time Series on the Google Earth Engine. ISPRS J. Photogramm. Remote Sens. 2019, 155, 13–24. [Google Scholar] [CrossRef]
  43. Khanal, N.; Matin, M.A.; Uddin, K.; Poortinga, A.; Chishtie, F.; Tenneson, K.; Saah, D. A Comparison of Three Temporal Smoothing Algorithms to Improve Land Cover Classification: A Case Study from Nepal. Remote Sens. 2020, 12, 2888. [Google Scholar] [CrossRef]
  44. Liu, X.K.; Ji, L.Y.; Zhang, C.; Liu, Y.H. A Method for Reconstructing NDVI Time-Series Based on Envelope Detection and the Savitzky-Golay Filter. Int. J. Digit. Earth 2022, 15, 553–584. [Google Scholar] [CrossRef]
  45. Garcia, D. Robust Smoothing of Gridded Data in One and Higher Dimensions with Missing Values. Comput. Stat. Data Anal. 2010, 54, 1167–1178. [Google Scholar] [CrossRef]
  46. Rousseeuw, P.J.; Leroy, A.M. Robust Regression and Outlier Detection; John Wiley & Sons: New York, NY, USA, 1987. [Google Scholar]
  47. Savitzky, A.; Golay, M.J.E. Smoothing + Differentiation of Data by Simplified Least Squares Procedures. Anal. Chem. 1964, 36, 1627–1639. [Google Scholar] [CrossRef]
Figure 1. Locations of the Area A and Area B in China.
Figure 1. Locations of the Area A and Area B in China.
Remotesensing 14 04395 g001
Figure 2. Flow chart of reconstructing Sentinel-2 NDVI and Surface reflectance time series data through the DCT-PLS method.
Figure 2. Flow chart of reconstructing Sentinel-2 NDVI and Surface reflectance time series data through the DCT-PLS method.
Remotesensing 14 04395 g002
Figure 3. Iterative curve fitting process to update the weights for the NDVI time series of larch sample in 2020.
Figure 3. Iterative curve fitting process to update the weights for the NDVI time series of larch sample in 2020.
Remotesensing 14 04395 g003
Figure 4. The comparison of NDVI and band reflectance time series from Sentinel-2, the DCT-PLS and SG filter methods for the larch sample in 2020. (a) The NDVI time series; (be) The surface reflectance time series in blue, green, red and NIR band, respectively; (f) The reconstructed NDVI and all the band reflectance time series from the DCT-PLS method.
Figure 4. The comparison of NDVI and band reflectance time series from Sentinel-2, the DCT-PLS and SG filter methods for the larch sample in 2020. (a) The NDVI time series; (be) The surface reflectance time series in blue, green, red and NIR band, respectively; (f) The reconstructed NDVI and all the band reflectance time series from the DCT-PLS method.
Remotesensing 14 04395 g004
Figure 5. The comparison of NDVI and band reflectance time series from Sentinel-2, the DCT-PLS and SG filter methods for the rotation of rice sample in 2020. (a) The NDVI time series; (be) The surface reflectance time series in blue, green, red and NIR band, respectively; (f) The reconstructed NDVI and all the band reflectance time series from the DCT-PLS method.
Figure 5. The comparison of NDVI and band reflectance time series from Sentinel-2, the DCT-PLS and SG filter methods for the rotation of rice sample in 2020. (a) The NDVI time series; (be) The surface reflectance time series in blue, green, red and NIR band, respectively; (f) The reconstructed NDVI and all the band reflectance time series from the DCT-PLS method.
Remotesensing 14 04395 g005
Figure 6. The comparison of NDVI and band reflectance time series from Sentinel-2, the DCT-PLS and SG filter methods for the corn sample in 2020. (a) The NDVI time series; (be) The surface reflectance time series in blue, green, red and NIR band, respectively; (f) The reconstructed NDVI and all the band reflectance time series from the DCT-PLS method.
Figure 6. The comparison of NDVI and band reflectance time series from Sentinel-2, the DCT-PLS and SG filter methods for the corn sample in 2020. (a) The NDVI time series; (be) The surface reflectance time series in blue, green, red and NIR band, respectively; (f) The reconstructed NDVI and all the band reflectance time series from the DCT-PLS method.
Remotesensing 14 04395 g006
Figure 7. Density scatterplots between the reconstructed and the cloud-free Sentinel-2 L2A NDVI series and the surface reflectance time series in 4 bands over the 10 typical vegetation samples in 2020. (a) NDVI; (be) surface reflectance in blue, green, red and NIR band, respectively.
Figure 7. Density scatterplots between the reconstructed and the cloud-free Sentinel-2 L2A NDVI series and the surface reflectance time series in 4 bands over the 10 typical vegetation samples in 2020. (a) NDVI; (be) surface reflectance in blue, green, red and NIR band, respectively.
Remotesensing 14 04395 g007
Figure 8. The cloud coverage of Sentinel-2 images in the study areas in 2020. (a) Area A; (b) Area B. (Low means that the cloud coverage of the Sentinel-2 image is ≤10%, Middle means that the cloud coverage of the Sentinel-2 image > 10% and ≤ 60%, and High means that the cloud coverage of the Sentinel-2 image > 60%).
Figure 8. The cloud coverage of Sentinel-2 images in the study areas in 2020. (a) Area A; (b) Area B. (Low means that the cloud coverage of the Sentinel-2 image is ≤10%, Middle means that the cloud coverage of the Sentinel-2 image > 10% and ≤ 60%, and High means that the cloud coverage of the Sentinel-2 image > 60%).
Remotesensing 14 04395 g008
Figure 9. Comparing the actual acquired Sentinel-2 reflectance and NDVI images to their reconstructed correspondence. (a) Actual images in Area A (24 October 2020); (b) Reconstructed images in Area A. (c) Actual images in Area B (28 April 2020); (d) Reconstructed images in Area B.
Figure 9. Comparing the actual acquired Sentinel-2 reflectance and NDVI images to their reconstructed correspondence. (a) Actual images in Area A (24 October 2020); (b) Reconstructed images in Area A. (c) Actual images in Area B (28 April 2020); (d) Reconstructed images in Area B.
Remotesensing 14 04395 g009
Figure 10. Scatter plot of the actually acquired and reconstructed image in Case 1 situation in Area A (24 October 2020). (a) NDVI in the range of [0–1]; (be) surface reflectance in blue, green, red and NIR band, respectively; (f) NDVI in the range of [−1–1]).
Figure 10. Scatter plot of the actually acquired and reconstructed image in Case 1 situation in Area A (24 October 2020). (a) NDVI in the range of [0–1]; (be) surface reflectance in blue, green, red and NIR band, respectively; (f) NDVI in the range of [−1–1]).
Remotesensing 14 04395 g010
Figure 11. Scatter plot of the actually acquired and reconstructed image in Case 1 situation in Area B (28 April 2020). (a) NDVI in the range of [0–1]; (be) surface reflectance in blue, green, red and NIR band, respectively; (f) NDVI in the range of [−1–1]).
Figure 11. Scatter plot of the actually acquired and reconstructed image in Case 1 situation in Area B (28 April 2020). (a) NDVI in the range of [0–1]; (be) surface reflectance in blue, green, red and NIR band, respectively; (f) NDVI in the range of [−1–1]).
Remotesensing 14 04395 g011
Figure 12. Comparisons of actual cloud-contaminated Sentinel-2 images and reconstructed images over the Area A and B. (a) Cloud-contaminated Sentinel-2 images on different dates in the Area A; (b) Sentinel-2 images in the Area A reconstructed by the DCT-PLS method. (c) Cloud-contaminated Sentinel-2 images on different dates in the Area B; (d) Sentinel-2 images reconstructed in the Area B by the DCT-PLS method.
Figure 12. Comparisons of actual cloud-contaminated Sentinel-2 images and reconstructed images over the Area A and B. (a) Cloud-contaminated Sentinel-2 images on different dates in the Area A; (b) Sentinel-2 images in the Area A reconstructed by the DCT-PLS method. (c) Cloud-contaminated Sentinel-2 images on different dates in the Area B; (d) Sentinel-2 images reconstructed in the Area B by the DCT-PLS method.
Remotesensing 14 04395 g012
Figure 13. The actually acquired Sentinel-2 image (a) in Case 3 situation in Area A (11 July 2020), and reconstructed images (bf) with different values of s .
Figure 13. The actually acquired Sentinel-2 image (a) in Case 3 situation in Area A (11 July 2020), and reconstructed images (bf) with different values of s .
Remotesensing 14 04395 g013
Figure 14. The actually acquired Sentinel-2 image (a) in Case 3 situation in Area B (22 June 2020), and reconstructed images (bf) with different values of s .
Figure 14. The actually acquired Sentinel-2 image (a) in Case 3 situation in Area B (22 June 2020), and reconstructed images (bf) with different values of s .
Remotesensing 14 04395 g014
Table 1. Vegetation Species Samples Information.
Table 1. Vegetation Species Samples Information.
Typical Vegetation SamplesSample IDProvinceLongitude (°E)Latitude (°N)
Larch1Hebei117.5042.58
Rotation of rice2Zhejiang120.4730.37
Rice3Liaoning122.0940.99
Rotation of wheat and corn4Henan113.7833.92
Grass5Hebei114.7641.29
Soybean6Hebei114.7941.18
Potato7Hebei114.9241.35
Grape8Hebei115.4340.34
Tobacco9Hebei114.6639.81
Corn10Hebei115.3140.37
Table 2. Sentinel-2 L2A products information in the selected bands and Sentinel-2 L2A image information at the two study areas.
Table 2. Sentinel-2 L2A products information in the selected bands and Sentinel-2 L2A image information at the two study areas.
Sentinel-2 L2A Product Information in the Selected Bands
Band NameDescriptionCentral Wavelength (nm)Bandwidth (nm)Resolution (m)
B2Blue4906510
B3Green5603510
B4Red6653010
B8NIR84211510
QA60Cloud mask--60
Information of available images in the two study areas
Study AreaDate rangeNumber (used/total)Average cloud coverage (used/total)
Area A1 June 2019–1 June 2021117/14925.74%/40.28%
Area B1 June 2019–1 June 2021162/30128.50%/59.61%
Table 3. The lowest RMSE_cv and corresponding R2 of different N of typical vegetation samples.
Table 3. The lowest RMSE_cv and corresponding R2 of different N of typical vegetation samples.
N12243648
Vegetation Samples RMSE_cvR2RMSE_cvR2RMSE_cvR2RMSE_cvR2
Larch0.06010.95940.04870.97500.04630.97740.04890.9746
Rotation of rice0.11210.85610.08430.92680.09410.91020.12030.8708
Rice0.04720.96670.04020.97600.03700.98000.03070.9865
Rotation of wheat and corn0.05970.95690.04860.97150.03540.98550.02100.9949
Grass0.03410.96190.02200.98410.02350.98240.02120.9857
Soybean0.04050.95440.03290.97240.03250.97190.03390.9687
Potato0.03180.98220.02160.99190.02000.99300.02580.9901
Grape0.03990.96380.02590.98460.03750.96790.04070.9624
Tobacco0.02960.98070.02250.98890.02240.98900.02240.9890
Corn0.04100.96920.04830.95960.05220.95320.05710.9444
Mean0.04960.95510.03950.97310.04010.97100.04220.9667
Table 4. The acceptable range of s of typical vegetation samples when N = 24.
Table 4. The acceptable range of s of typical vegetation samples when N = 24.
Typical Vegetation SamplesRMSE_cv-mi s _optThe Acceptable Range of s
Larch0.0487168–48
Rotation of rice0.08432213–59
Rice0.0402430.1–60
Rotation of wheat and corn0.048652–22
Grass0.0220140.1–57
Soybean0.0329160.1–60
Potato0.021660.1–20
Grape0.0259186–60
Tobacco0.0225160.1–52
Corn0.04834513–60
Table 5. The corresponding dates of the images selected in Area A and B under different cases.
Table 5. The corresponding dates of the images selected in Area A and B under different cases.
CaseArea AArea B
Case110–244–28
Case21–28, 5–22, 8–20, 12–283–14, 4–18, 8–11,12–29
Case37–116–22
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Yang, K.; Luo, Y.; Li, M.; Zhong, S.; Liu, Q.; Li, X. Reconstruction of Sentinel-2 Image Time Series Using Google Earth Engine. Remote Sens. 2022, 14, 4395. https://doi.org/10.3390/rs14174395

AMA Style

Yang K, Luo Y, Li M, Zhong S, Liu Q, Li X. Reconstruction of Sentinel-2 Image Time Series Using Google Earth Engine. Remote Sensing. 2022; 14(17):4395. https://doi.org/10.3390/rs14174395

Chicago/Turabian Style

Yang, Kaixiang, Youming Luo, Mengyao Li, Shouyi Zhong, Qiang Liu, and Xiuhong Li. 2022. "Reconstruction of Sentinel-2 Image Time Series Using Google Earth Engine" Remote Sensing 14, no. 17: 4395. https://doi.org/10.3390/rs14174395

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop