Single Remote Sensing Image Super-Resolution with an Adaptive Joint Constraint Model

Remote sensing images have been widely used in many applications. However, the resolution of the obtained remote sensing images may not meet the increasing demands for some applications. In general, the sparse representation-based super-resolution (SR) method is one of the most popular methods to solve this issue. However, traditional sparse representation SR methods do not fully exploit the complementary constraints of images. Therefore, they cannot accurately reconstruct the unknown HR images. To address this issue, we propose a novel adaptive joint constraint (AJC) based on sparse representation for the single remote sensing image SR. First, we construct a nonlocal constraint by using the nonlocal self-similarity. Second, we propose a local structure filter according to the local gradient of the image and then construct a local constraint. Next, the nonlocal and local constraints are introduced into the sparse representation-based SR framework. Finally, the parameters of the joint constraint model are selected adaptively according to the level of image noise. We utilize the alternate iteration algorithm to tackle the minimization problem in AJC. Experimental results show that the proposed method achieves good SR performance in preserving image details and significantly improves the objective evaluation indices.

The multi-frame SR methods aim to recover a high-resolution (HR) image from multiple low-resolution (LR) frames. A multi-frame SR method based on frequency domain was proposed in [13]. Though this method can be implemented fast, it leads to serious visual artifacts [12]. To address the above issue, many spatial domain-based multi-frame SR methods have been proposed [15,16]. Farsiu et al. [15] proposed an iterative method using l 1 -norm in the fidelity and regularization terms. Patanavijitt and Jitapunkult [16] proposed a stochastic regularization term according to the Bayesian maximum a posteriori (MAP) estimation. Huang et al. [14] proposed a bidirectional recurrent does not consider the continuity and correlation of an image's local neighborhood, and the edge preservation ability should be further improved. To address this issue, we propose a local structure constraint based on the local gradient of the image, which can further improve the edge-preserving ability. For the sparse representation-based SR methods, it is usually assumed that the sparse coefficients corresponding to the LR image and the HR image are equal. However, due to the degradation of the LR image such as blurring and down-sampling, there is a gap between the LR sparse coefficient and the HR sparse coefficient, i.e., sparse noise. In addition, image artifacts tend to appear in the reconstructed images. Since the sparse noise and image artifacts can be well suppressed by exploiting these overlapping nonlocal similar patches, as shown in [7,31], we use the nonlocal self-similarity of the image to construct a nonlocal constraint. By using the nonlocal constraint and the proposed local structure constraint, we further propose a novel adaptive joint constraint (AJC) for the single remote sensing image SR. In the reconstruction phase, according to the level of image noise, we adaptively select regularized parameters for the proposed AJC model to obtain better reconstructed images. Our main contributions are summarized as follows: (1) Since human visual system is more sensitive to image edges, we propose a local structure filter based on the local gradient of the remote sensing image and then construct a local structural prior.
(2) The fusion of the complementary local and nonlocal priors can achieve higher SR performance. Based on this, we combine the nonlocal self-simialrity prior and the local structural prior and then propose a novel adaptive joint constraint (AJC) for the single remote sensing image SR.
(3) To further improve the proposed AJC SR method, the regularization parameters are selected adaptively according to the level of remote sensing image noise.
The organization of this paper is as follows. Section 1 is the introduction. Section 2 introduces the related work about the sparse coding-based single image SR. In Section 3, the proposed AJC algorithm is described in detail. Section 4 introduces the basic parameters setting and shows experiment results. Section 5 summarizes our work.

Related Work
The single-image SR reconstruction methods aim to transform the LR image into the HR image. To achieve this goal, sparse priors have been proposed [22,27,30,31,36,37]. Researchers have found that the image can be sparsely represented as x ≈ Θα, where α is a sparse vector composed of coefficients, and Θ denotes an over-complete dictionary, such as a discrete cosine transform (DCT) or a wavelet dictionary [30,38]. The sparse vector α can be evaluated via the following formula: where ι represents a regularization parameter, x is a HR image, Θ denotes the dictionary of x, and α represents the sparse vector of x. Since l 0 -minimization is an NP-hard problem, in general, l 1 -minimization is used to approximate the l 0 -minimization. In the single-image SR problem, we want to restore the missing details of the LR image y to enhance the spatial resolution. For this problem, we can address it by the following equation: whereα y denotes the estimated sparse vector of the LR image y, andλ is a regularization parameter of the sparse term. Since the sparse vector of the HR patch is the same as the corresponding LR patch [22], we can get the reconstructed HR image byx ≈ Θα y .

Proposed Method
To remove artifacts and improve the quality of reconstructed HR images, we propose the AJC model that includes the nonlocal and local priors. The proposed framework is shown in Figure 1, the nonlocal prior is constructed by exploiting the nonlocal self-similarity of the sparse vector, and the local prior can be obtained according to the proposed local structure filter. The proposed AJC model can be formulated as:α where the first term is the data term, J NL (α) denotes the nonlocal constraint, J L (x) represents the local prior, η is a parameter to balance the data term and the nonlocal constraint, and γ is a regularization parameter used to keep balance between the data term and the local structural constraint. For the subsequent reconstruction process, we first need to use the self-learning method to train the compact dictionary. Then, the nonlocal and local priors are constructed. Finally, we use the iterative shrinkage algorithm [39] to solve the SR reconstruction problem.

The Compact Dictionary Learning
In sparse representation, it is important to select a suitable dictionary for the image. In this paper, we learn a compact dictionary for the image according to [31]. It is worth noting that our dictionary learning uses a LR image as the input of training. The compact dictionary learning first needs to extract the HR image patches X ( For the first reconstruction, the HR image was reconstructed from the corresponding LR image by Bicubic.). Considering that human vision is sensitive to edges, we remove extremely smooth image patches in X and express them in X h . Furthermore, to further supplement the high frequency details of the images, we learn the dictionary in the high-frequency feature space of patches. Specifically, we adopt a rotationally symmetric Laplacian of Gaussian filter (size 5 × 5 with standard deviation 0.5), which can improve the average PSNR of the test images by 0.11dB in SR application. The high-frequency feature of X h is denoted by C = [C 1 , C 2 , . . . , C N ], where N represents the number of patches. Then, we adopt the K-means algorithm to divide C into K clusters (H = [H 1 , H 2 , . . . , H K ]). Next, the compact dictionary Θ can be learned from H. Specially, since the PCA approach can reduce dimension and realize de-correlation, we apply the PCA algorithm to each cluster H k (k = 1, 2, . . . , K). After PCA, the sub-dictionary Θ k corresponding to each cluster H k can be obtained according to [31]. Finally, the compact dictionary Θ corresponding to the reconstructed imagex can be constructed by Θ = [Θ 1 , Θ 2 , . . . , Θ K ]. After learning the dictionary Θ, we apply the dictionary to the proposed AJC model for obtaining the reconstructed HR imagex. It should be noted that the estimatedx will be used to update the dictionary Θ and K will be updated in the dictionary learning phase.

The Nonlocal Self-Similarity Prior
The remote sensing image has a large number of repetitive structures. That means, for a given image patch, we can find its similar patches in other parts of the image. We illustrate the nonlocal self-similarity of the remote sensing image by using the "airplane" image in Figure 1. The red patch is a target patch, its nonlocal similar patches can be found in the image, such as the blue, green, black patches. Thus, we can use the nonlocal self-similarity of the remote sensing image to guide the image SR reconstruction. According to [31], the sparse vector also has the nonlocal self-similarity. Thus, we can exploit the nonlocal self-similarity of the sparse vector to guide the image SR reconstruction, where the nonlocal prior can be formulated as: where α = [α 1 , . . . , α i , . . . , α N ], β = [β 1 , . . . , β i , . . . , β N ], and β i is the estimated α i by using the nonlocal self-similarity of α i . In view of the above, the closer the patches are, the greater the similarity weight will be. This can be achieved by gaussian weighting. Thus, we can estimate β by the nonlocal gaussian weighting of the nonlocal similar sparse vector α. For each patch x i , β i can be computed by the following equation: where C o is the collection of similar patches of x i , x s i denotes the s-th similar patch of x i , 2 is a constant that controls the degree of decay, and α s i represents the s-th similar vector of α i .

The Local Structure Prior
However, the previous prior term can not sufficiently constrain the local structures of remote sensing images. Therefore, we propose a local prior as a complementary prior. The methods [7,22,40] exploit local features of the images to constrain the solution of SR problem. JIDM [7] and SRSC [22] employed local image patches as local features. K-SVD [40] ensured that mean square error (MSE) of each pixel neither reduction nor change in each iteration, which constrains the local structures. However, it is not sufficient for these methods to preserve edge energy while taking into account the relationship between adjacent pixels. To address the above issue, we use the local gradient of the image to solve the local structures-preserving problem in the SR reconstruction. The image gradient is sensitive to the image edge. The large gradients correspond to sharp edges of the image, while the small gradients correspond to smooth regions of the image. The objective function based on the local gradient of the image can be expressed as: where p is the pixel location in the image, Z ∈ R m×n (m × n represents the size of the image) denotes the clean image, O ∈ R m×n represents the smoother image of Z, Q (Q = m × n) is the number of pixels in the image, a h,p and a v,p are weight vectors, ( h,p (O)) 2 and ( v,p (O)) 2 are the square of gradients along the horizontal and the vertical directions at the p-th position of the image O, respectively. In Equation (7), the first term is the data term to ensure that the image O is close to the image Z, the second term is a regularization term that makes the gradient of O as small as possible (except where the image has significant gradients) and constrains the edge energy, and the parameter ω balances the data term and the regularization term. By using matrix notations, we can rewrite Equation (7) as: where S h and S v ∈ R Q×Q are discrete difference operations along the h and v directions, respectively, which are defined as follows: where c and r are the column and row numbers of S h and S v , respectively, c, r = 1, 2, . . . , Q, and r 0 is the value of r in the case of r = c. Minimizing Equation (8), we can acquire the following formula: where I denotes an identity matrix, and we call E = (I + ωM) the local structure filter. Inspired by [41], we calculate weight vectors a h,p and a v,p by: where 1 is a constant for mathematical stability, and 1 = 0.0001. Since O is unknown, a h,p and a v,p are estimated by exploiting the image Z. The calculation of the local structure filter E is summarized in Algorithm 1.

Algorithm 1
The calculation of the local structure filter E.

Input: Z
Output: E 1: set parameters 1 , ω and compute a h,p , a v,p by Equation (12) The proposed local structure filter E has strong local correlation because it takes into account the local gradient of the image. Moreover, remote sensing images are usually terrain and targets images, such as airplane, harbor, parking lot, river, and mountain. These images have local correlation. To describe the local correlation of remote sensing images, we construct a local prior for remote sensing images. In order to explore the local property, we perform statistical analysis on a large number of remote sensing images. Since E can describe the local continuity of the image, we analyze the statistic property of ψ = Ex − x. For the image "Building" as an example, the distribution function curves are shown in Figure 2. We can find that the probability density of ψ is close to Gaussian distribution (note that the value of each entry in ψ is normalized to [−1, 1] in this test). According to the results, we can obtain the following formula: where F(ψ) is the the probability density of ψ, and σ h denotes a standard deviation. Then the local constraint J L (x) can be constructed according to ψ in the MAP estimate, and the mathematical expression is as follows:

Regularized Parameter Settings
Proper parameters are conducive to improve the performance of SR reconstruction, so we adopt the parametric adaptive method. Since the noise level is related to the regularization parameters [23], in this paper, η and γ are both selected adaptively according to the noise level. The noise level can be calculated by [42]. Let ξ = α − β, and assume that ξ and ψ are independent. The MAP estimation considering both ξ and ψ is a multi-prior estimation. Multi-prior estimation methods have been widely used in image processing [43,44]. The MAP estimation of ξ and ψ can be expressed as: Since E is a pre-calculated matrix and ψ = Ex − x, ψ is only related to x. We can obtain F(y|ψ) = F(y|x). F(y|ξ) and F(y|x) are generally characterized by the Gaussian distribution [43,44]: where ξ and β are assumed to be independent according to [31]. Since Θ is pre-calculated and x = Θα, F(y|ξ) = F(y|x).
F(ξ) can be obtained according to [31,45]: where ξ i,g is the g-th entry of ξ i , and σ i,g represents a standard deviation of the error ξ i,g corresponding to the Laplace distribution. Substituting Equations (13), (16), (17) and (18) into Equation (15), and ignoring constant terms, we can get: where α i,g denotes the g-th entry of α i , and β i,g represents the g-th entry of β i . Comparing Equation (4) with Equation (19), we can obtain parameters η and γ. Furthermore, to ensure mathematical stability, we add additional parameters η and γ in the calculation of η and γ, respectively.
where σ h denotes the evaluated noise level fromx during each iteration. Finally, we apply the iterative shrinkage algorithm [39] to the AJC model, and the details of solution are given in Algorithm 2. (1) obtain the noise level σ h of the reconstructed HR imagex l ; (2) estimate η and γ by Equation (20); (3) calculate the filter E by using Algorithm 1, wherex l as the input; denotes the corresponding sub-dictionary for the patch R ix (l+1/2) ; (7) reconstruct the HR imagex (l+1) according tô where R i denotes the extraction matrix ofx i ; end of inner loop; end of outer loop. 3: obtain the reconstructed HR imagex.

Experimental Setting
In this subsection, we show minutely the datasets and parameters settings. Test images are shown in Figure 3. These images come from three different remote sensing image datasets: UCMLUD [46], USC-SIPI-aerials [47] and NWPU-RESISC45 [48]. UCMLUD contains 21 classes of remote images, and each class contains 100 images with size 256 × 256. The spatial resolution for this dataset is 0.3 m/pixel. The spatial resolution for USC-SIPI-aerials is 1 m/pixel. NWPU-RESISC45 contains more kinds of images than UCMLUD. Specifically, NWPU-RESISC45 contains 45 categories, and each class has 700 images with size 256 × 256. The spatial resolution of this database ranges from 0.2 m/pixel to 30 m/pixel. Figure 3. Experimental test images, including the following images: aerial, airplane, residential, parking-lot, terrace, farmland, mountain, industrial-area, river, building, meadow, island, runaway, storage-tank, harbor.
Furthermore, the basic parameters settings in the proposed AJC method are as follows: the number of inner loop J is 160, the number of outer loop L is 5, ω is set to 0.0001, and δ = 7. In addition, to generate the test LR images, first, the HR image is blurred by the Gaussian Blur Kernel with size 7 × 7 and standard deviation 1.6. Then, we downsample the blurred image by a scale factor of 3 [7,49].

Parameters Setting
The size (τ × τ) of patch and the number (K) of clustering have an important impact on the SR performance. Too few clusters will eliminate the gaps between classes. Too many clusters will make the dictionary lose its representativeness and reliability. So we need to find an optimal K by [30]. Specifically, we first divide the training patches into K clusters, and then merge the classes containing a few image patches into the nearest neighboring classes. We analyze the impact of τ and K 0 on peak signal-to-noise ratio (PSNR) for all the test images, where K 0 denotes the predefined clustering number of K, and the results are shown in Table 1. The average PSNR varies with the patch size. In the case of the patch size of 5, the average PSNRs of the test images are superior. In the case of the same patch size and K 0 larger than 10, PSNRs are close. This phenomenon shows the robustness of the method to find an optimal K [30]. Furthermore, the larger the number of clustering, the more time it takes. In order to obtain a higher PSNR at a reasonable time, we set τ to 5 and K 0 to 50.

Comparison with Different Traditional Methods
To demonstrate the SR performance of the proposed AJC algorithm, we compare it with other SR methods, including Bicubic, SRSC [22], ASDS [30], MSEPLL [43], NARM [50] and LANR-NLM [51]. PSNR, structural similarity index (SSIM) [52] and erreur relative globale adimensionnelle de synthèse (ERGAS) [53] are used as the objective evaluation indices. The result with higher PSNR/SSIM and smaller ERGAS means the quality of the reconstructed image is better.

Noiseless Remote Sensing Images
In the noiseless case, the reconstruction results using different methods are shown in Table 2. For "Island", the LANR-NLM method acquires better objective evaluation indices. However, considering all the test images, our algorithm has superior objective indices. Specifically, the average PSNR, SSIM and ERGAS are 31.90 dB, 0.8876 and 2.2912, respectively. In addition, a graph is drawn in Figure 4 for the results provided in Table 2. From Figure 4, it can be intuitively observed that our method has better objective indices than other methods. To intuitively show the visual quality of the reconstructed image, we compare the visual results as shown in Figures 5 and 6. The SR performance of Bicubic interpolation is the worst. NARM produces smoother images. As shown in Figures 5f and 6f, the MSEPLL method also smoothes many details of the image. In Figure 5, compare with other SR reconstructed methods, the proposed method reconstructs the HR image with fewer artifacts and clearer edges.  In Figure 6, ASDS and LANR-NLM tend to smooth out image details to some extent. As shown in Figure 6h, the image reconstructed using the proposed method has a clearer white line than others.

Noisy Remote Sensing Images
To demonstrate the effectiveness of our method in the noisy case, we add noise with a standard deviation of 5 to the degraded image. The results are shown in Table 3. Compared with other methods, the ASDS method achieves better SR results on the "Aerial" image. However, the average PSNR, SSIM and ERGAS of our algorithm are the highest among these SR methods, which are 29.58 dB, 0.8144 and 2.8257, respectively. In addition, to more intuitively reflect the performance of our method, we draw a graph for the results provided in Table 3, as shown in Figure 4. Furthermore, we give visual effects as shown in Figures 7 and 8. We can find that the proposed method also has better SR performance in suppressing image noise and preserving details and edges.
Consequently, according to the experiments on test images, the proposed method can achieve the best SR results in both noiseless and noisy cases.

Comparison with Different Deep Learning Methods
To reasonably compare with deep learning methods, we chose the classic deep learning methods SRCNN [32] and SRGAN [35], and the recently proposed remote sensing images method LGCnet [54]. For these methods, we retrain and fine tune their models, and then use these models for SR reconstruction. The training data come from three remote sensing image datasets: UCMLUD [46], USC-SIPI-aerials [47] and NWPU-RESISC45 [48]. All the training images are first blurred and then down-sampled to obtain the low-resolution images. Next, both the obtained low-resolution images and their original versions are collected as training pairs to train SRCNN [32], SRGAN [35], and LGCnet [54]. When the training error stops decreasing, we reduce their initial learning rates to fine-tuned their models to achieve their best performance. The results are shown in Table 4. For "Runway", the resolution reconstructed by the LGCnet method is higher. However, for all the test images, our method is superior to SRCNN, SRGAN and LGCnet, and the average PSNR/SSIM gains over SRCNN, SRGAN and LGCnet are 0.84 dB/0.0215, 1.64 dB/0.0778 and 0.2 dB/0.0048, respectively. For ERGAS, our method is 0.0394 better than SRGAN on the test images. Our average ERGAS is 0.04 better than LGCnet's average ERGAS and 0.1984 better than the average ERGAS of SRCNN. In order to show that our method does not need a lot of external data compared with the deep learning methods, we provide the training time and the number of training images, as shown in Table 5. Compared with other methods, the proposed approach can train with only one image, and saves lots of training time. The experiment results show that our method has better SR performance in noiseless cases.

Comparison with Different Methods on Datasets
To verify our method performance, we use different methods to reconstruct the test databases, including the "Airplane" and "Storage-tank" sub-databases from the UCMLUD, and the "Island" sub-database from the NWPU-RESISC45. The results are shown in Table 6. In Table 6, in addition to our approach, ASDS is the best traditional method, LGCnet is superior to SRCNN in deep learning methods. For the "Airplane" sub-database, our method is superior to the ASDS and LGCnet methods, and the average PSNR/SSIM gains over ASDS and LGCnet are 0.81 dB/0.0106 and 0.32 dB/0.0026, respectively. Meanwhile, our method has a better ERGAS than other methods, it is 1.7553. For the "Storage-tank" sub-database, the PSNR/SSIM of our algorithm is 0.56 dB/0.0119 higher than ASDS. Our ERGAS is 0.1194 better than the ERGAS of ASDS. Compare with LGCnet, although LGCnet has better SSIM and ERGAS, our approach gets the larger PSNR. For the "Island" sub-database, the PSNR/SSIM of our method is 0.34 dB/0.0046 superior to ASDS, and the PSNR/SSIM of LGCnet is 0.08 dB/0.0019 inferior to ours. The results show that the images reconstructed using our method has better performance.
The previous experimental analysis shows the comprehensive performance of our proposed method. Specifically, for images that satisfy the joint constraint (i.e., images with repeated structures and many edges), the proposed method has superior performance. For images with highly complex texture, the proposed method is not effective enough. In our paper, the proposed approach is based on local gradient constraint and nonlocal similarity. Because images with repeated structures and many edges can perfectly satisfy these two features, they can acquire superior reconstruction performance. Thus, in most cases, this approach performs better than many other super-resolution methods. However, for images with highly complex texture, these two features are not reliable. If we use these two features for reconstruction, the reconstructed image will not be very good. In fact, many SR approaches do not work well under such extreme conditions.

The Effectiveness of Joint Constraint
The image itself comprises repetitive structures, i.e., self-similarity. Researchers often exploit the nonlocal self-similarity of images for the SR reconstruction. However, it is not enough to consider only the nonlocal self-similarity to improve the resolution of the input image. To address the above problem, we construct a local constraint according to the proposed filter. Considering the joint constraint, the quality of the reconstructed image will be greatly improved. In order to demonstrate the performance of the joint constraint, we compare the SR performance for the joint constraint and the single nonlocal constraint. The results are shown in Table 7. For all the test images, the method for the joint constraint achieves superior objective qualities. Specifically, the average PSNR and SSIM gains of the joint constraint over the single nonlocal constraint are 0.38 dB and 0.0071 , respectively, and the average ERGAS of the joint constraint is 0.097 better than the single nonlocal constraint. Experiment results demonstrate that the complementary joint constraint can effectively improve the image quality.

The Effectiveness of Adaptive Parameters
In the classic sparse coding problem, the choice of regularization parameters is very important. However, the parameters of most methods are fixed. In this paper, we adaptively select the parameters according to the noise level. To demonstrate the effectiveness of our adaptive parameters, we compare the results of the fixed parameters with those of the adaptive parameters as shown in Figure 9. In the case of fixed parameters, λ 1 and λ 2 are set to 0.33 and 0.001, respectively. In Figure 9, the maximum PSNR gain is 0.36 dB, and the corresponding image is "Runway". At the same time, the image with the largest SSIM gain is also "Runway". Specifically, it is 0.0061. For "Runway", the ERGAS with adaptive parameters is 0.0408 better than that with fixed parameters. Experiment results indicate that the adaptive parameters are beneficial to improve the SR performance.

Complexity Analysis
Our method takes major cost on three part: the sub-dictionaries learning, the calculation of local structural filter, and the nonlocal search. In the sub-dictionaries learning, the core procedure involves the clustering of K-means. Its computational complexity is O(Lτ 2 mnK). The calculation of local structural filter E takes O(LJmn). The nonlocal search is related to the patch size, the search window size b × b and the number of the images. It takes O(mnτ 2 b 2 ). Take the "Airplane" with size 256 × 256 as a test image, the LANR-NLM method takes 40.82 s, LGCnet takes 0.74s, NARM and ASDS take 68.02 s and 107.92 s, respectively, the MSEPLL and our methods take 215.88 s and 297.91 s, respectively. Therein, traditional SR methods are implemented in MATLAB 2014a on a computer with Intel(R) Core(TM) i7-7700K CPU @ 4.20GHZ 4.20GHZ, 16.0GB RAM and 64-bit Windows 7 operating system. Our method takes a little longer time than others. This is because the proposed method learns the dictionary online for the input image and performs adaptive filtering for each iteration of the HR image. However, our method does not need external training examples, which saves a large amount of training time. In conclusion, our approach achieves the best visual and objective quality with reasonable running time.

Conclusions
In this paper, we propose a novel SR scheme based on sparse representation for the single remote sensing image. First, we use the single-dictionary method to learn the compact dictionary that exploits only the unique information of remote sensing image itself. Compared with the double-dictionary method and the deep learning method, this method has an advantage in the absence of external samples. Second, we propose a local structure filter based on the local gradient of image, and then a local structure prior is constructed. After that, the joint prior is constructed, including the local structure prior and the nonlocal self-similarity prior, which can effectively improve the fine structures recovery ability. Finally, the reconstructed HR images can be obtained by using the iterative shrinkage algorithm. The results show that the proposed local structure prior shows superior edge-preserving performance and the complementary prior constructed is more conducive in improving the SR performance. Compared with other methods, the HR images reconstructed by our scheme have better visual quality and higher objective evaluation indices. In the future, we will extend the proposed method to other image processing applications.