Satellite remote sensing systems can acquire observations/images of the atmosphere, ocean, and the Earth's surface, providing regional or even global coverage at different spatial resolutions. However, due to satellite sensor malfunctions and poor atmospheric conditions, remote sensing images often suffer from missing-pixel problems, such as thick cloud cover, dead pixels, and scan line corrector (SLC) failure [1]. In particular, the cloud fraction over land and ocean is about 55% and 72%, respectively [2]. Therefore, missing-pixel problems are common in remote sensing images and may affect subsequent image analysis and applications.
Land surface temperature (LST), one of the most important surface parameters for meteorological and hydrological studies [3], is key to monitoring Earth's surface energy and water balance and the related landscape processes and responses [4]. Over the past several decades, although the accuracy of LST retrieval from satellite thermal infrared (TIR) measurements has improved significantly, missing-pixel problems still exist in satellite-based LST products/images (e.g., Landsat 8 Analysis Ready Data (ARD) LST tiles [5]).
For remote sensing imagery, including LST images, there are a variety of traditional missing-information reconstruction methods. They can be categorized into four groups [1]: (i) spatial-based methods, which assume that statistical or geometrical structures from the unobscured part are transferable (e.g., geostatistical interpolation [6], propagated diffusion [7], norm regularization [8], and exemplar-based inpainting [10]); (ii) spectral-based methods, which utilize the redundant spectral information in multispectral and hyperspectral images (e.g., [9]); (iii) temporal-based methods, which use the temporal dependency among images (e.g., temporal replacement [11], temporal filtering [12], and temporal learning models [13]); and (iv) spatial-temporal-spectral-based methods (e.g., spatial completion with temporal guidance [14] and joint spectral-temporal methods [15]). In addition, Sun et al. (2018) [16] proposed a special spectral method, the graph Fourier transform of a matrix network, to recover an entire matrix network from incomplete observations.
Among the traditional methods, source (or reference) images are mostly used in temporal-based methods. These source images are implicitly or explicitly correlated with the corrupted images and can therefore provide information for recovering the missing pixels. For example, satellite LST images exhibit seasonal patterns [17]; thus, a corrupted LST image can be repaired from adjacently acquired complete LST images, assuming they are highly correlated with the corrupted image.
Recently, as a powerful category of deep learning, convolutional neural networks (CNNs) [18] have been successfully applied to image-related tasks in many fields, including remote sensing [19]. For example, CNNs have been used for super-resolution of remote sensing images [20] and for denoising hyperspectral images [23]. The success of CNNs comes from their capability to capture high-level abstract features from images and transform them into task-specific outputs by optimizing trainable parameters on large image datasets. Several typical CNN structures can be applied to the recovery of images with missing pixels. The first is the encoder–decoder structure, in which the encoder (implemented by a CNN) extracts high-level features while reducing spatial resolution, and the decoder (also implemented by a CNN) performs the opposite operations to recover the input image. Li et al. (2017) [26] used an encoder–decoder structure to generate contents for missing regions of face images. The second, U-Net [27], extends the encoder–decoder structure; its distinctive characteristic is the skip connections between matching encoder and decoder levels, which facilitate the recovery of spatial details. U-Net has therefore been adopted for filling in missing regions of any shape with sharp structures and fine-detailed textures [28]. The third is the generative adversarial network (GAN), which consists of two neural networks (a generative network and a discriminative network) that contest with each other until both reach an equilibrium. Specifically, the generative network learns to map from a latent space to a data distribution of interest, while the discriminative network distinguishes candidates produced by the generator from the true data distribution. Ledig et al. (2017) [29] successfully applied a GAN to image super-resolution.
The goal of this paper is to develop a neural network that repairs the corrupted/masked part of a Landsat 8 ARD LST image patch (the target) with the assistance of a collocated, adjacently acquired complete LST image (the source). For missing-pixel reconstruction in satellite images, there are standard CNN-based methods that take advantage of auxiliary complementary data from the spatial, spectral, and temporal domains of satellite images (e.g., [30]). Regarding fundamental improvements of the CNN structure for image inpainting, the partial convolution developed by Liu et al. (2018) [32] improves convolution on masked images by accounting for only the available pixels in each convolution window.
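The idea of partial convolution can be illustrated with a minimal single-channel NumPy sketch (function and variable names are illustrative, not from the paper): each window is evaluated only over valid pixels, renormalized by the ratio of window size to valid-pixel count, and the mask is updated so that an output pixel becomes valid whenever its window contains at least one valid input.

```python
import numpy as np

def partial_conv2d(x, mask, kernel, bias=0.0):
    """Mask-aware 2D convolution (single channel, valid padding).

    x: (H, W) image with holes; mask: (H, W) of {0, 1}, 1 = valid pixel.
    Each window is renormalized by n_total / n_valid so outputs near
    holes are not biased toward zero, following Liu et al. (2018).
    """
    kh, kw = kernel.shape
    H, W = x.shape
    oh, ow = H - kh + 1, W - kw + 1
    out = np.zeros((oh, ow))
    new_mask = np.zeros((oh, ow))
    n_total = kh * kw
    for i in range(oh):
        for j in range(ow):
            m = mask[i:i + kh, j:j + kw]
            n_valid = m.sum()
            if n_valid > 0:
                # Convolve only the valid pixels, then renormalize.
                window = x[i:i + kh, j:j + kw] * m
                out[i, j] = (kernel * window).sum() * (n_total / n_valid) + bias
                new_mask[i, j] = 1.0  # window saw at least one valid pixel
    return out, new_mask
```

Because the updated mask grows the valid region at every layer, stacking partial convolutions progressively shrinks the holes as features propagate through the encoder.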
Therefore, the proposed model adopts the partial-convolution-enabled U-Net as its main architecture, with two modifications: (1) the encoders are extended to incorporate the source information; and (2) a partial merge layer is used to generate complete skip-connection images for the corresponding decoders. In addition, the partial convolution is improved by considering weights during its mask correction.
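To convey the intent of the partial merge layer, here is a rough, hypothetical sketch (the model's exact formulation may differ): a hole-free skip-connection image is formed by keeping target features where the target mask is valid and falling back to the collocated, complete source features elsewhere.

```python
import numpy as np

def partial_merge(target, target_mask, source):
    """Hypothetical sketch of a partial merge.

    target: (H, W) target features with holes.
    target_mask: (H, W) of {0, 1}, 1 = valid target pixel.
    source: (H, W) complete collocated source features.
    Returns a hole-free image usable as a skip connection.
    """
    # Keep target where valid; fill holes from the complete source.
    return target_mask * target + (1.0 - target_mask) * source
```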
The performance of the proposed model and four baseline models was benchmarked and analyzed. Two of the baseline models come from previous studies (i.e., [31]), and the other two are variants of the proposed model in which the partial convolution (and related operations) is replaced by standard convolution and by the original partial convolution [32], respectively.
The remainder of this paper is organized as follows. Section 2 describes: (1) the architecture of the proposed model; (2) the baseline models; (3) the training procedure; (4) the implementation; and (5) the LST datasets for training and testing the models. Section 3 summarizes the training process and validation results and compares model performance on selected cases. Section 4 discusses potential explanations for the performance differences between the proposed model and the baseline models.
This paper develops a deep learning model, Source-Augmented Partial Convolution v2 (SAPC2), to reconstruct missing pixels in a partially corrupted 64 × 64 Landsat 8 ARD LST image patch (the target) with the assistance of a collocated, adjacently acquired complete Landsat 8 ARD LST image patch (the source). To achieve this goal, SAPC2 uses the partial-convolution-enabled U-Net as its framework and accommodates the source by: (1) performing shared partial convolutions on both the source and the target in the encoders to extract high-level features; and (2) merging the source and the target using the partial merge layer to create complete skip-connection images for the corresponding decoders. SAPC2 is trained with 2.352 million target–source image pairs, and its performance on the validation dataset is compared with that of four baseline models: two previously published models (SAPC1 and STS-CNN) and two variants of SAPC2 (SAPC2-OPC and SAPC2-SC). The results show that SAPC2 has the best performance in terms of nine validation metrics. For example, the MSEmasked of SAPC2 is 7%, 20%, 44%, and 59% lower than that of SAPC1, SAPC2-OPC, SAPC2-SC, and STS-CNN, respectively, and the Lsobel,masked of SAPC2 is 4%, 28%, and 29% lower than that of SAPC1, SAPC2-OPC, and SAPC2-SC, respectively. On selected validation cases, the repaired target images generated by SAPC2 have the fewest artifacts near the mask boundary and the best recovery of color scales and fine textures among the compared models.
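Metrics with the "masked" subscript restrict the error to the originally corrupted pixels only, so that easy, uncorrupted regions do not dilute the score. A minimal sketch of the masked MSE, assuming a hole mask that marks corrupted pixels with 1 (Lsobel,masked would apply the same masking to Sobel-gradient differences):

```python
import numpy as np

def masked_mse(pred, truth, hole_mask):
    """Mean squared error over the masked (missing) pixels only.

    pred, truth: (H, W) repaired and reference images.
    hole_mask: (H, W) of {0, 1}, 1 = pixel was corrupted in the target.
    """
    n_holes = hole_mask.sum()
    return float((hole_mask * (pred - truth) ** 2).sum() / n_holes)
```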