1. Introduction
In passive remote sensing, clouds and cloud shadows (CCS), as well as numerous other factors (such as the Scan Line Corrector Off (SLC-OFF) condition of the Enhanced Thematic Mapper Plus (ETM+) onboard Landsat 7), may obscure or contaminate the land surface information recorded by a remote sensor, causing information distortion or missing data in the corresponding areas of the acquired images [1,2]. In cloudy and rainy regions, the acquisition of cloud-free images is particularly difficult. As a result, the lack of cloud-free images becomes one of the most restrictive factors for land-related applications, and filling the missing parts of partially contaminated images will increase data availability. Time-series remote sensing applications, such as crop classification [3], forest disturbance history monitoring [4], and farmland abandonment or cultivation mapping [5], require dense time-series images to describe land-cover dynamics and evolution. For these applications, CCS-induced missing information not only prolongs the time interval between temporally adjacent images and produces irregular time intervals, it also causes the features derived from the time-series images to have inconsistent dimensions [6]. Therefore, the reconstruction of missing areas in time-series images will benefit any time-series image-based application.
For high temporal resolution images, such as those of the Advanced Very High Resolution Radiometer (AVHRR) and the MODerate resolution Imaging Spectroradiometer (MODIS), an image composite method has been utilized to fill CCS-induced information gaps. This is because vegetation indices, such as the normalized difference vegetation index (NDVI), are underestimated under CCS and other noise compared with clear observations. A maximum NDVI composite within a certain period can therefore suppress the negative effect of noise and obtain a relatively noise-free composite image [7]. This composite method has been extensively employed to create 8-day (8-d) or 16-d NDVI products from high temporal resolution images. However, multiple observations, including at least one cloud-free observation, are needed within the composite period to obtain a fully noise-free composite. In practice, satisfying this precondition in cloudy and rainy regions is difficult, so a certain amount of noise remains in 8-d or 16-d composites. To further remove the residual error in time-series observations, well-defined mathematical models, such as the nonlinear Fourier model and the asymmetric Gaussian model, are utilized to describe the temporal evolution of the land cover. The missing values can then be estimated using the predicted values of the model to obtain a cloud-free image [8,9].
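As a minimal illustration of the maximum-value composite idea described above (the NDVI values below are toy numbers, not from any real product):

```python
import numpy as np

# Two hypothetical NDVI observations of a 2x2 area within one
# compositing period; clouds depress NDVI, so the per-pixel maximum
# approximates the clear-sky value.
ndvi = np.array([[[0.10, 0.70],
                  [0.20, 0.10]],     # day 1, partly cloudy
                 [[0.60, 0.20],
                  [0.65, 0.60]]])    # day 9, partly cloudy
composite = ndvi.max(axis=0)         # maximum-value composite
print(composite)                     # [[0.6, 0.7], [0.65, 0.6]]
```

Note that a pixel cloudy on every date in the period keeps a depressed value, which is why residual noise survives in 8-d or 16-d products.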
However, the spatial resolution of high temporal resolution data, such as MODIS, is generally 250 m or coarser, which prevents the data from accurately describing the spatial detail of the land surface and restricts their fine-scale applications in small regions. Medium spatial resolution images, acquired by Landsat-like satellites, have a relatively fine spatial resolution, but their revisit period approaches 16 days. Selective observations and the presence of CCS further prolong the time interval between temporally adjacent images; as a result, CCS-induced missing information cannot be reconstructed by the maximum-value composite technique. Moreover, the time interval between adjacent medium and high spatial resolution images is relatively long and inconsistent, which hinders the ability to obtain smooth modeled time-series curves. Thus, fitting methods, such as those of Brooks [8] and Vuolo [9], cannot be used to reconstruct missing areas. New methods are needed for missing area reconstruction aimed at low temporal resolution data (which usually have a fine spatial resolution). According to whether reference images are employed, missing area reconstruction methods can be classified into two main categories, namely, restoration methods and mosaic methods.
Image restoration methods extract the spatial structural distribution characteristics of an image from its clear part, extend these characteristics to the CCS-induced missing region, and thereby reconstruct it. For example, the multiscale segmentation method divides the image into homogeneous regions with similar spectral characteristics, and the missing region is then filled using the remaining clear parts of the segments in the multiscale space [10]. The autocorrelation between neighboring pixels is a well-recognized feature, and Kriging interpolation is used to fill the missing CCS region [11]. Another extensively employed method adopts the Bandelet transform to extract the geometric field that describes the variation in the grayscale values of an image, and the missing region is reconstructed according to the extracted geometric field [12]. Image restoration methods can achieve satisfactory results for small, homogeneous missing regions; however, their applicability to large and complicated missing regions remains unexamined.
The mosaic method assumes that images of the same region taken at different times are not simultaneously contaminated by CCS; therefore, a cloud-free reference image can be used to fill the CCS-induced missing area and obtain a synthetic cloud-free image. Due to the radiometric difference between images obtained on different dates, a direct image mosaic may produce patch-based color differences in the resultant image. Therefore, radiometric normalization is applied prior to image mosaicking to eliminate the radiometric difference. Linear regression, regression trees, and histogram matching are extensively employed models for radiometric normalization in missing area filling [13,14]. Since various land covers exhibit different radiometric variation over time, adopting an identical radiometric normalization equation for pixels of different types is unreasonable. Considering this problem, Melgani proposed establishing class-based linear and nonlinear (support vector regression) equations for the different classes in an image and filling the CCS-induced missing region based on the corresponding equations [15]. Since pixel-by-pixel filling is prone to salt-and-pepper noise, Lin et al. viewed the CCS-induced missing region as a whole and adopted a global optimization method to fill the missing area [16]. In contrast to directly establishing a mapping relationship between two scenes in the original grayscale space, some recently developed methods perform missing area reconstruction in a transformed space. For example, the compressed sensing-based method extracts low-rank and other high-level characteristics from one cloud-free reference image by dictionary learning and transfers the learned characteristics to the image with missing areas; the new feature space can describe the nonlinear relationship between different images and achieve satisfactory results [17]. The image mosaic procedure described above implies that the resultant image is mosaicked from an additional image, so its acquisition date becomes ambiguous. As the image acquisition time is essential for time-series analysis, this condition may restrict the applicability of the mosaic method.
As images are often only partially contaminated by CCS, an alternative method is to employ similar pixels from the clear part of the same image to reconstruct the CCS region. The determination of similar pixels thus becomes the critical problem for this method. The most similar pixel method [18] and the spatial–temporal Markov model [19] have been successively employed to search for similar pixels. Considering that crops or natural vegetation of the same class have similar (but not identical) phenological cycles, utilizing time-series images to select pixels with a similar spectral–temporal evolution, such as the profile-based interpolator (PBI) [20], the spectral angle mapper-based spatiotemporal similarity [21], and the tempo-spectral angle model [22], will improve the reconstruction accuracy. Since the reference pixel is selected from the clear part of the image, this method maintains the time stamp for various types of pixels [18,19,20,21,22,23,24], and is a relatively ideal method for reconstructing cloud-free images aimed at time-series analysis. Thus, we develop a method characterized by these merits.
Using pixels in the clear part of the image to fill the missing areas exhibits great potential for time-series applications; however, a pixel-by-pixel reconstruction strategy will produce serious salt-and-pepper noise in the reconstructed area, which conceals rich textural information [25]. The inconsistent accuracy will also produce a visual false edge between the reconstructed part and the remaining clear part [26]. Salt-and-pepper noise and false edges can be attributed to the following points [25,26]:
- ➢ The temporal variation in a diverse landscape is inconsistent. For example, pseudo-invariant features (PIFs), such as buildings and bare land, remain stable over time, whereas vegetation cover, such as forest and cultivated farmland, exhibits distinct seasonal changes. The PIFs are expected to be reconstructed with better accuracy than the temporally variant landscape.
- ➢ The reconstruction accuracy differs among bands. Different bands are characterized by different responses to solar radiation, which yields different reconstruction accuracies.
- ➢ The reference pixel for a pixel with missing observations is selected independently, without any consideration of neighboring pixels; improper reference pixel selection will produce a different reconstruction accuracy.
- ➢ The reconstruction residuals induced by the previously mentioned factors will produce a visual seam line (i.e., a false edge) between the reconstructed part and the remaining clear part.
Given the different accuracies obtained for different land covers, different bands, and independently selected reference pixels, when the pixel-by-pixel method is applied to rich-textured regions of multiband images, the inconsistent reconstruction accuracy will produce inconsistent pixels (i.e., salt-and-pepper noise) and a false edge between the reconstructed part (corresponding to the missing region) and the remaining clear part (corresponding to the valid region). Inspired by object-oriented image analysis [25], we propose a spectral–temporal patch (STP)-based time-series cloud-free image reconstruction method to overcome the influence of salt-and-pepper noise and the false edge. An extracted STP has not only homogeneous spectral features but also a similar temporal evolution. Instead of obtaining one cloud-free mosaic, our method aims to simultaneously reconstruct all of the missing areas in the time-series images. The main contributions of our paper are summarized as follows:
- ➢ A multi-temporal image segmentation strategy, which incorporates spectral homogeneity and temporal evolution consistency, is utilized to extract the STPs.
- ➢ The textural information from the clear temporally adjacent image and the spectral information from the clear part of the same image are used to simultaneously reconstruct a missing STP, which suppresses the salt-and-pepper noise.
- ➢ The seam line passes through the actual edges defined by the STPs, instead of the original seam line defined by the boundary between the missing region and the valid region. As demonstrated by Soille [27], placing the seam on the actual edges between different STPs helps conceal the false edge and obtain a seamless image.
The remainder of this paper is structured as follows.
Section 2 introduces the reconstruction method of the missing area.
Section 3 presents the experimental setting and the evaluation of the results.
Section 4 discusses and analyzes the results of the method in this paper.
Section 5 presents the conclusions of this paper.
2. Methods
In this paper, we propose an STP-based missing area reconstruction method for time-series images. Our method treats the STP with Missing Observation (STPMO) as a basic unit, for which a reference STP is searched and which is reconstructed as a whole. At the same time, the textural information, denoted by the spatial arrangement of colors or intensities of neighboring pixels [28], is extracted from the clear temporally adjacent STP and injected into the missing observation. Through these measures, the reconstruction result not only avoids salt-and-pepper noise in the interior of the reconstructed STP, it also helps conceal the false edge between the reconstructed part and the remaining clear part.
For implementation, the proposed method includes three main procedures (as shown in Figure 1). (1) Multi-temporal image-based STP extraction: we divide the image into an ensemble of homogeneous patches using image segmentation on multi-temporal, mostly cloud-free images. Since the segments obtained by our method have similar spectral and temporal evolution characteristics, we refer to the segmentation results as STPs. (2) Reference STP selection for the STPMO: we adopt a similarity-based measurement between STPs to select the reference STP for the STPMO. (3) Missing STP reconstruction: we reconstruct the missing STP in each image according to the reference STP, and obtain cloud-free time-series images.
For the convenience of method description, we assume that we have obtained n scenes of images of the same study area acquired on different dates. According to the date of acquisition, the n images can be expressed as I:

I = {I_1, I_2, …, I_n}.

Due to the CCS factor, some parts of the images are missing and should be reconstructed by our method. First, we divide the region of every image in the time series into the valid part and the missing part using an automatic mask method or a human-labeled method. For the image I_t (t = 1, 2, …, n), the valid part and the missing part can be expressed as V_t and M_t, respectively, where V_t and M_t satisfy:

V_t ∪ M_t = I_t, V_t ∩ M_t = ∅,

where a pixel (x, y) ∈ M_t is an invalid observation contaminated by CCS, i.e., a missing value, which should be estimated using the method proposed in this paper. By reconstructing the missing values of all of the images in the time series, we can obtain seamless time-series images.
2.1. Multi-Temporal Image Segmentation
Traditionally, image segmentation groups adjacent pixels into homogeneous regions according to the similarities among the spectral characteristics of the pixels. The obtained homogeneous region forms the basic unit of analysis, which is generally referred to as a patch or segment. As a basic image-processing method, image segmentation has attracted increased attention, and a large number of image segmentation methods and improved versions have been proposed [29,30,31,32,33]. Mean shift segmentation has been extensively employed and has achieved satisfactory performance in diverse applications. In mean shift segmentation, the cluster center constantly moves in the direction of ascending density gradient, which is referred to as drift, and thereby locates regions of high density; the pixels passed through during the search for a high-density region form a segment [32]. Numerous studies have addressed the principles and improvements of mean shift segmentation [33], which will not be repeated here.
The objective of the traditional image segmentation method is to split the image into an ensemble of homogeneous regions, referred to as segments, objects, or patches. However, the pixels of a segment in our paper require not only similar spectral characteristics but also a similar temporal evolution; we denote such segments as STPs. To obtain the STPs, we adopt a multi-temporal image segmentation strategy similar to those of Dutrieux [31] and Desclée [34]. We select multiple images from the time series to perform image segmentation. The selection of the images for segmentation considers two factors. (1) Proportion of valid pixels in the image: to obtain segmentation results for the entire study area, an image that is cloud-free over the entire study area is needed. (2) Date of image acquisition: since vegetative land cover (such as forest and crops) has a distinct temporal evolution, selecting images acquired within the optimal temporal window (the best time(s) of image acquisition for vegetative land cover discrimination) will better differentiate the various STP types [35].
For convenience in processing, we select three scenes of images for segmentation. However, if we separately segment the three images, inconsistent segmentation results will likely be obtained for the various scenes, and many small STPs can appear around the borders of the segmentation results. Since vegetative information is primarily concentrated in the near-infrared band, the near-infrared bands of the three images are stacked to construct a new image; the color of the obtained image indicates the temporal evolution, and the texture exhibits local contrast and similarity information. The new image is fed to segmentation, and we obtain the STP set P with m STPs, which can be expressed as:

P = {P_1, P_2, …, P_m}.
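The multi-temporal segmentation step can be illustrated with a numpy-only toy: the NIR bands of three dates are stacked into per-pixel feature vectors and grouped by a minimal flat-kernel mean-shift procedure, a stand-in for the full mean shift segmentation used by the paper; all array names, the bandwidth, and the pixel values are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def mean_shift_modes(features, bandwidth=0.2, n_iter=30):
    """Shift every feature vector toward the mean of its neighbours
    (flat kernel) until it settles on a density mode."""
    pts = features.astype(float).copy()
    for _ in range(n_iter):
        # pairwise distances between the current points and the data
        d = np.linalg.norm(pts[:, None, :] - features[None, :, :], axis=2)
        w = (d < bandwidth).astype(float)          # flat kernel weights
        pts = (w @ features) / w.sum(axis=1, keepdims=True)
    return pts

def label_modes(modes, tol=1e-2):
    """Group pixels whose modes coincide into one segment label."""
    labels = -np.ones(len(modes), dtype=int)
    next_label = 0
    for i, m in enumerate(modes):
        if labels[i] >= 0:
            continue
        same = np.linalg.norm(modes - m, axis=1) < tol
        labels[same] = next_label
        next_label += 1
    return labels

# Toy image: the NIR band of 3 dates stacked as channels (4x4 pixels).
# The left half behaves like stable bare land; the right half like
# cropland with a strong seasonal peak on the second date.
nir = np.zeros((4, 4, 3))
nir[:, :2, :] = [0.30, 0.32, 0.31]   # pseudo-invariant feature
nir[:, 2:, :] = [0.20, 0.80, 0.25]   # vegetative temporal evolution
features = nir.reshape(-1, 3)

labels = label_modes(mean_shift_modes(features)).reshape(4, 4)
print(labels)   # two spectral-temporal patches
```

Pixels end up in the same patch only when both their spectra and their temporal trajectories agree, which is exactly the STP property the stacked NIR image is meant to enforce.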
By multi-temporal image segmentation, the pixels within an STP will have not only similar spectral characteristics, but also a similar temporal evolution. We calculate the mean μ_{i,t} and standard deviation σ_{i,t} of the grayscale value of STP P_i on image I_t:

μ_{i,t} = (1 / N_i) Σ_{(x,y)∈P_i} g_t(x, y),

σ_{i,t} = sqrt( (1 / N_i) Σ_{(x,y)∈P_i} ( g_t(x, y) − μ_{i,t} )² ),

where (x, y) represents a pixel with row x and column y that belongs to the STP P_i; g_t(x, y) denotes the grayscale value of the pixel at row x and column y on image I_t; and N_i is the number of pixels in the STP P_i. On this basis, the mean and standard deviation of the grayscale value of the STP P_i are calculated for every image in I; the results form the mean vector U_i and the standard deviation vector S_i, which can be expressed as:

U_i = (μ_{i,1}, μ_{i,2}, …, μ_{i,n}), S_i = (σ_{i,1}, σ_{i,2}, …, σ_{i,n}).

The vector U_i depicts the temporal evolution of the mean grayscale value of the STP P_i, and the vector S_i describes its variation magnitude.
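As a sketch of these per-STP statistics (a hypothetical helper, not the authors' code), the mean and standard deviation vectors can be accumulated per segment label over a masked image stack, using −1 as the paper's flag for dates on which the STP has no valid observation:

```python
import numpy as np

def stp_statistics(stack, labels, valid):
    """Per-STP mean and std vectors over a time series.

    stack  : (n, H, W) grayscale images
    labels : (H, W) STP label of every pixel
    valid  : (n, H, W) boolean mask, False where CCS contaminates
    Returns U, S of shape (m, n); entries are -1 where the STP has
    no valid observation on that date (the missing flag).
    """
    n = stack.shape[0]
    m = labels.max() + 1
    U = -np.ones((m, n))
    S = -np.ones((m, n))
    for i in range(m):
        member = labels == i
        for t in range(n):
            ok = member & valid[t]
            if ok.any():
                U[i, t] = stack[t][ok].mean()
                S[i, t] = stack[t][ok].std()
    return U, S

# Toy series: 2 dates, 2 STPs; a cloud covers STP 1 on the second date.
stack = np.array([[[1., 1., 5., 5.]], [[2., 2., 8., 8.]]])  # (2, 1, 4)
labels = np.array([[0, 0, 1, 1]])
valid = np.ones_like(stack, dtype=bool)
valid[1, 0, 2:] = False                   # STP 1 missing on date 2
U, S = stp_statistics(stack, labels, valid)
print(U)   # [[1., 2.], [5., -1.]]
```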
2.2. Reference STP Selection for the STPMO
Assume that the data are missing in the region of STP P_i on image I_t (i.e., P_i ⊆ M_t); P_i is then an STPMO, in other words, μ_{i,t} and σ_{i,t} are unknown. For the convenience of description, we set μ_{i,t} and σ_{i,t} to −1. Therefore, we should estimate μ_{i,t} and σ_{i,t} using a reference STP P_j, whose mean (μ_{j,t}) and standard deviation (σ_{j,t}) are known on the image I_t. According to this analysis, the problem has been transformed into determining the reference STP.
The reference STP should have temporal–spectral features similar to those of the missing STP. The correlation coefficient describes the linear correlation between two variables. When it is applied to describe the similarity of time-series spectral measurements (such as the grayscale value), the correlation coefficient indicates the reliability of using the reference variable to predict the variable with a missing observation. According to this analysis, we calculate the linear correlation coefficient r_{i,j} between STPs P_i and P_j:

r_{i,j} = Σ_t (μ_{i,t} − ū_i)(μ_{j,t} − ū_j) / sqrt( Σ_t (μ_{i,t} − ū_i)² · Σ_t (μ_{j,t} − ū_j)² ),

where μ_{i,t} and μ_{j,t} are the mean grayscale values of STPs P_i and P_j on image I_t, respectively, and ū_i and ū_j are the mean values of the vectors U_i and U_j, respectively.
While calculating the correlation coefficient, we use only the valid values of U_i and U_j (i.e., μ_{i,t} ≠ −1 and μ_{j,t} ≠ −1, as indicated by Equation (10)); hence, the dimension of the input vectors differs among STP pairs. Since the correlation coefficient is a real number in the range of [−1, 1], the lack of one dimension will not cause excessive fluctuation, and the coefficient still reflects the linear correlation among the remaining input variables.
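A minimal sketch of this masked correlation (the helper name and the toy vectors are assumptions): only the dates on which both mean vectors carry a valid value enter the Pearson formula.

```python
import numpy as np

def masked_corr(u_i, u_j, miss=-1):
    """Pearson correlation of two STP mean vectors, using only the
    dates on which both STPs have a valid (non-missing) observation."""
    ok = (u_i != miss) & (u_j != miss)
    a = u_i[ok] - u_i[ok].mean()
    b = u_j[ok] - u_j[ok].mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return (a * b).sum() / denom if denom > 0 else 0.0

u_i = np.array([10., 30., -1., 20.])   # the STPMO: one date missing
u_j = np.array([11., 29., 40., 21.])   # a candidate reference STP
print(round(masked_corr(u_i, u_j), 3))   # 0.998
```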
We use the k STPs with the largest correlation coefficients r_{i,j} as the candidate reference STP set C_i of STP P_i. However, the candidate reference STPs can only ensure that the reference STP and the STPMO have a similar temporal–spectral variation; their grayscale value distribution ranges may still differ significantly. The standard deviation quantitatively describes the variation amplitude of the grayscale values in an STP. The standard deviation similarity between two STPs P_i and P_j can be described by the Euclidean distance d_{i,j}, which is calculated as:

d_{i,j} = || S_i − S_j ||,

where STP P_j ∈ C_i. We consider the STP with the smallest Euclidean distance in C_i as the reference STP of STP P_i and denote it as P_ref:

P_ref = argmin_{P_j ∈ C_i} d_{i,j}.
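The two-stage selection can be sketched as follows (hypothetical helpers under the paper's −1 missing flag; k and the toy vectors are illustrative): top-k correlation between the mean vectors first, then the smallest Euclidean distance between the standard deviation vectors, computed over valid entries only.

```python
import numpy as np

def _corr(u, v, miss=-1.0):
    """Correlation over the dates on which both STPs are valid."""
    ok = (u != miss) & (v != miss)
    return np.corrcoef(u[ok], v[ok])[0, 1]

def select_reference(i, U, S, k, miss=-1.0):
    """Two-stage selection: the k most correlated mean vectors first,
    then the smallest distance between the std vectors."""
    corr = np.array([_corr(U[i], U[j], miss) if j != i else -np.inf
                     for j in range(U.shape[0])])
    candidates = np.argsort(corr)[-k:]            # top-k correlation
    dists = [np.linalg.norm(S[i][(S[i] != miss) & (S[j] != miss)] -
                            S[j][(S[i] != miss) & (S[j] != miss)])
             for j in candidates]
    return candidates[int(np.argmin(dists))]

U = np.array([[10., 30., -1., 20.],   # STP 0: the STPMO
              [11., 29., 40., 21.],   # similar evolution, similar spread
              [20., 60., 80., 40.],   # similar evolution, larger spread
              [50., 10.,  5., 45.]])  # opposite evolution
S = np.array([[2., 3., -1., 2.],
              [2., 3.,  4., 2.],
              [9., 9.,  9., 9.],
              [2., 3.,  4., 2.]])
print(select_reference(0, U, S, k=2))   # STP 1 wins on std distance
```

STP 2 is as well correlated as STP 1, but its much larger variation amplitude would distort the reconstructed grayscale range, which is exactly what the second stage guards against.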
2.3. Missing Value Estimation for STPMO
Missing value estimation for STPMO includes two steps: (I) estimation of the mean and standard deviation of the STPMO, and (II) textural information injection. The implementation details are introduced as follows:
(I) Estimation of the mean and standard deviation of the STPMO
According to the above analysis, the mean vectors of the missing STP P_i and the reference STP P_ref have a high linear correlation. Therefore, a linear regression model is used to describe their relationship. The linear regression coefficients (a, b) can be obtained by least squares regression with the remaining valid observations in the time series. The estimated mean value for the STP P_i, μ̂_{i,t}, can be expressed as:

μ̂_{i,t} = a · μ_{ref,t} + b.

Since the reference STP is selected using the minimum Euclidean distance of the standard deviation, we directly obtain the estimate σ̂_{i,t} using σ_{ref,t}:

σ̂_{i,t} = σ_{ref,t}.
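Step (I) can be sketched with an ordinary least-squares fit (the helper name and toy values are assumptions): the regression is fitted on the dates where both mean vectors are valid and then evaluated at the reference mean of the missing date.

```python
import numpy as np

def estimate_missing_stats(u_i, u_ref, s_ref, t, miss=-1.0):
    """Fit u_i ≈ a*u_ref + b over the dates where both means are
    valid, then predict the missing mean at date t; the std is
    copied directly from the reference STP."""
    ok = (u_i != miss) & (u_ref != miss)
    a, b = np.polyfit(u_ref[ok], u_i[ok], 1)
    return a * u_ref[t] + b, s_ref[t]

u_i   = np.array([10., 30., -1., 20.])   # date 2 is missing
u_ref = np.array([11., 29., 40., 21.])
s_ref = np.array([ 2.,  3.,  4.,  2.])
mean_hat, std_hat = estimate_missing_stats(u_i, u_ref, s_ref, t=2)
print(round(mean_hat, 3), std_hat)   # ≈ 41.762 4.0
```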
(II) Textural information injection

To recover the rich texture of the STP P_i on image I_t, we "inject" the rich textural information exhibited in the corresponding region of a temporally adjacent image I_{t′} into the missing part of image I_t. First, we should select a reference image from which to extract the textural information. The determination of the temporally adjacent image I_{t′} for image I_t depends on two conditions: (1) the region of STP P_i on the image I_{t′} is valid; and (2) provided that condition (1) is satisfied, the acquisition dates of the two images should be as close as possible, i.e., the temporal interval between I_t and I_{t′} is the smallest.

The textural information from the temporally adjacent image will be injected into the STPMO under the restrictive condition of the estimated mean (μ̂_{i,t}) and standard deviation (σ̂_{i,t}). For a pixel (x, y) whose value is missing, the estimated value ĝ_t(x, y) can be calculated as:

ĝ_t(x, y) = ( g_{t′}(x, y) − μ_{i,t′} ) / σ_{i,t′} · σ̂_{i,t} + μ̂_{i,t},

where μ_{i,t′} and σ_{i,t′} are the mean and standard deviation, respectively, of the grayscale values of the STP P_i on the image I_{t′}; μ̂_{i,t} and σ̂_{i,t} are the estimated mean and standard deviation, respectively, of the grayscale values. As shown in Equation (15), the grayscale values of the reconstructed STPMO will obey a distribution whose mean is μ̂_{i,t} and whose standard deviation is σ̂_{i,t}. From the above procedure, we can see that neighboring pixels in the reconstructed area will have similar reconstruction accuracy, which helps preserve the spatial arrangement of color or intensity information; removing inconsistent pixels suppresses the salt-and-pepper noise effect.
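The injection step amounts to a z-score rescaling of the donor texture: normalize the adjacent-date STP pixels, then restore them to the estimated statistics of the STPMO. A short sketch (helper name, toy pixel values, and the estimated statistics are assumptions):

```python
import numpy as np

def inject_texture(donor, mean_hat, std_hat):
    """Rescale the grayscale values of the clear adjacent-date STP so
    that the result has the estimated mean and std of the STPMO; the
    spatial arrangement (texture) of the donor is preserved."""
    z = (donor - donor.mean()) / donor.std()
    return z * std_hat + mean_hat

donor = np.array([40., 42., 44., 50.])   # clear adjacent-date pixels
recon = inject_texture(donor, mean_hat=41.76, std_hat=4.0)
print(recon.round(2))
```

Because every pixel of the patch is transformed by the same affine map, the relative contrasts of the donor survive intact, which is why the reconstructed interior shows texture rather than salt-and-pepper noise.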
For the STP P_i on the image I_t, the missing part may be partial or complete. For cases in which only part of the STP data is missing, if the missing ratio exceeds r, we treat the valid pixels as missing pixels as well and reconstruct the whole STP using the method above. If the missing ratio is smaller than r, then we use the remaining clear part (denoted as P_i^V) to reconstruct the missing part (denoted as P_i^M). The setting of r should take into consideration that the valid part must provide enough information to reconstruct the missing part. By trial and error, we find that r = 0.5 is a proper value to balance valid pixel preservation and missing pixel restoration. We then calculate the mean and standard deviation of the grayscale values of the valid region P_i^V on the image I_t, which are denoted as μ_{i,t}^V and σ_{i,t}^V, respectively. The mean and standard deviation of the grayscale values on the temporally adjacent image I_{t′} are μ_{i,t′} and σ_{i,t′}, respectively. For the missing values, we adopt the following equation for the reconstruction:

ĝ_t(x, y) = ( g_{t′}(x, y) − μ_{i,t′} ) / σ_{i,t′} · σ_{i,t}^V + μ_{i,t}^V.
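A sketch of this partial-missing case (the helper and toy arrays are assumptions): if more than a fraction r of the STP is contaminated, the caller falls back to the full reference-STP path; otherwise the clear pixels of the STP itself supply the target statistics and only the gaps are filled.

```python
import numpy as np

def reconstruct_partial(values, missing, donor, r=0.5):
    """values  : STP pixels on the target image (missing entries bogus)
    missing : boolean mask of CCS-contaminated pixels in the STP
    donor   : the same pixels on the clear temporally adjacent image
    Returns None when the missing ratio exceeds r (fall back to the
    reference-STP path); otherwise fills only the missing pixels."""
    if missing.mean() > r:
        return None
    mu_v, sd_v = values[~missing].mean(), values[~missing].std()
    z = (donor - donor.mean()) / donor.std()
    out = values.copy()
    out[missing] = z[missing] * sd_v + mu_v
    return out

vals = np.array([10., 12., 0., 0.])        # last two pixels are cloudy
miss = np.array([False, False, True, True])
donor = np.array([30., 32., 34., 40.])
print(reconstruct_partial(vals, miss, donor))
```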
Since CCS can cause missing data on any date of the time-series images, the number of missing observations in the time series is not consistent among different STPs. Generally, the smaller the number of missing observations in the time series of an STP, the stronger the ability of the correlation coefficient to describe the linear correlation, which increases the probability of selecting a proper reference STP. Therefore, we sort the STPs in ascending order of their missing observation numbers, and sequentially reconstruct the missing STPs. For example, we first reconstruct the STPs with one missing observation using the missing-free STPs. The STPs with two missing observations are then reconstructed using the missing-free STPs and the previously reconstructed STPs with one missing observation. We repeat this process until all of the STPs with missing observations have been reconstructed, and can thus obtain an entirely seamless time series of images.
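This reconstruction schedule reduces to a simple sort on the per-STP missing counts (the names and counts below are hypothetical):

```python
# Hypothetical per-STP missing-observation counts.
missing_count = {"P1": 0, "P2": 1, "P3": 2, "P4": 1}

# STPs with fewer gaps go first, so the harder STPs can later draw on
# the already-reconstructed ones as additional reference candidates.
order = [p for c, p in sorted((c, p) for p, c in missing_count.items()
                              if c > 0)]
print(order)   # ['P2', 'P4', 'P3']
```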
According to the implementation procedures, the mean and standard deviation are estimated using the reference STP in the same image, which guarantees that the reconstructed STP is consistent with the remaining clear part. Since the texture of the missing STP is injected from the temporally adjacent image and normalized by the estimated mean and standard deviation, the pixels in the missing STP will have similar reconstruction accuracy. The seam line between the reconstructed part and the remaining part has been shifted to the actual edges between different STPs, which helps conceal the false edge.
5. Conclusions
In this paper, we propose an STP-based missing area reconstruction method for time-series images. The reconstruction results are characterized by rich textural information (without salt-and-pepper noise) and are consistent with the original image (without a false edge). Since the missing area is reconstructed STP by STP, which considerably decreases the search space for the reference STP, the computational efficiency is substantially improved compared with pixel-by-pixel selection: our method completes the processing in hours, whereas the contrasting method requires days. The STPs represent different land covers with similar spectral characteristics and temporal evolution, and the actual edges between different STPs help conceal the false edge induced by the error between the reconstructed part and the remaining clear part.
Although many factors, such as the SLC-OFF for ETM+ onboard Landsat 7, may cause missing information in remote sensing images, CCS is the most common factor. For convenience of description, we use the CCS-induced missing area to represent the missing information caused by all of these factors. Our method can also be directly applied to missing area reconstruction induced by other factors.
However, this paper has some drawbacks to be addressed in future studies. (1) Numerous factors, such as the number of missing observations in the time series and the missing land-cover type, affect the accuracy of our method, which may cause uncertainty in the reconstruction results. (2) The direct utilization of grayscale values may introduce additional noise into the reconstructed result. (3) The STPs were obtained by multi-temporal image segmentation using only the NIR band of the three scenes selected for segmentation, which may omit information contained in the remaining bands or images. An improved method may adopt an image segmentation method that can simultaneously handle additional bands and process incomplete data.