Infrared Image Super-Resolution Reconstruction Based on Quaternion and High-Order Overlapping Group Sparse Total Variation

Owing to the limitations of imaging principles and system imaging characteristics, infrared images generally suffer from low resolution, insufficient detail, and blurred edges. Therefore, it is of practical significance to improve the quality of infrared images. To make full use of the information on adjacent points, preserve the image structure, and avoid staircase artifacts, this paper proposes a super-resolution reconstruction method for infrared images based on quaternion total variation and high-order overlapping group sparsity. The method uses quaternion total variation to exploit the correlation between adjacent points, which improves noise robustness and reconstruction quality. It uses the sparsity of higher-order gradients to reconstruct a clear image structure and restore smooth changes. In addition, we use the regularization by denoising (RED) framework, the alternating direction method of multipliers (ADMM), and the fast Fourier transform (FFT) to improve the efficiency and robustness of our method. Our experimental results show that this method performs well in both objective evaluation and subjective visual quality.


Introduction
Image super-resolution reconstruction (SRR) uses digital signal processing to generate high-resolution (HR) images from a single or multiple frames of low-resolution (LR) images, mainly through the super-resolution method. Image super-resolution reconstruction can efficiently utilize the potential value of existing image data and has applications such as military remote sensing reconnaissance [1], target tracking and monitoring [2][3][4], target location and recognition [5], astronomical observation [6], and medical imaging [7].
There are three types of super-resolution reconstruction methods: methods based on regular-term representation, learning-based methods, and partial differential equation-based methods. Learning-based image super-resolution reconstruction has been studied extensively in recent years. For example, based on the convolutional neural network (CNN), Lim proposed an enhanced deep super-resolution network (EDSR) by removing unnecessary modules [8]. Dong redesigned the super-resolution CNN (SRCNN) structure by introducing a deconvolution layer at the end of the network, reformulating the mapping layer, and adopting smaller filter sizes [9]. Xu proposed a novel global dense feature fusion convolutional network (DFFNet), which can take full advantage of global intermediate features, leading to a continuous global information memory mechanism [10]. To restore various scales of image details, Du enhanced the multi-scale inference capability of CNNs by introducing competition among multi-scale convolutional filters [11]. Chi proposed a uniform deep CNN (DCNN) framework to handle the denoising and super-resolution of CT images at the same time [12]. Zhang made a comparative study of fast super-resolution CNN (FSRCNN), deeply recursive convolutional networks (DRCN), very deep super-resolution convolutional networks (VDSR), and SRCNN for single-image super-resolution in space applications, and concluded that DRCN is the best model, with better generalization for space object images [13]. Xiao formulated a joint loss function by combining the output and high-dimensional features of a non-linear mapping network, which uses the satellite video data itself as a training set [14]. For infrared images, Liu proposed a classified dictionary learning method that classifies the features of the samples into several reasonable clusters and trains a dictionary pair for each cluster [15]. He proposed a cascaded architecture of deep neural networks with multiple receptive fields for a large scale factor (×8) [16]. These methods learn the mapping between HR and LR images from pre-selected training samples and reconstruct HR images accordingly. They can achieve good reconstruction results; however, their computational complexity is high.
The image reconstruction method based on the partial differential equation model has good results. The most popular of these methods are those based on the total variation (TV) regularization model [17]. This method preserves the edges of the images well, while removing image noise. However, there are "staircase artifacts" and unclear texture problems in the reconstructed image. To reduce the staircase artifacts, some scholars have proposed high-order variational models [18,19]. For example, Bredies, Kunisch, and Pock proposed total generalized variation (TGV) based on the combination of TV regularization with higher-order derivatives [20]. Although these methods can reduce staircase artifacts and protect the edges of the image, they produce "spots effect" in the processed image. To balance staircase artifacts and spot effect, a fractional-order variational model, which uses a fractional gradient instead of an integer gradient, has been proposed [21][22][23]. We have also proposed a super-resolution method, which combines quaternion [24,25] and fractional-order total variation, and uses the ADMM acceleration algorithm, achieving good results in image objective evaluation, visual effect and duration [26].
The regular term representation is an image representation model that captures the main information and intrinsic geometry of the image with a few parameters and achieves good results in terms of image restoration, target tracking, and other applications. Since Yang et al. first applied sparse representation to super-resolution reconstruction [27,28], many scholars have proposed improved methods for super-resolution reconstruction based on sparse representation [29][30][31][32][33][34][35]. In recent years, Selesnick and Chen proposed overlapping group sparse total variation (OGSTV) [36], which is a non-separating regular term that preserves the sparsity of the objective function [37]. The overlapping group sparse regularization term considers the sparsity of the image difference domain and also mines the neighborhood difference information of each point, thus mining structural sparsity characteristics of the image gradient. By overlapping the combined gradients, the difference between the smooth region and the boundary region can be improved, thereby suppressing the staircase artifacts of the TV model. The work of Selesnick and Chen, Liu et al. generalized the one-dimensional overlapping sparse regularization term into a two-dimensional overlapping sparse regularization term and introduced it into an anisotropic total variational model for denoising and deconvolution [38][39][40]. Using the Lp quasinorm instead of the L1 norm, we have also proposed a method for infrared image deblurring with an overlapping group sparse total variation method, in which the Lp quasinorm introduces another degree of freedom, better describes image sparsity characteristics, and improves image restoration [41].
Besides, there are some other types of image reconstruction models. Wang proposed an image self-embedding method, using authentication watermark and recovery watermark to complete image restoration. The authentication watermark locates the tampered area. The recovery watermark is compressed into different categories and encoded into variable lengths to improve the quality of the recovered images [42]. Xia proposed a new fast and accurate image matching algorithm, which first presents the district-identification method to obtain the integer-pixel matching result, then introduce gradient algorithm to match the sub-pixel position [43]. Wang proposed an image authentication and a recovery algorithm based on chaos and Hamming code, which can effectively detect image tampering and complete image recovery [44]. Wang proposed an image tampering detection and recovery algorithm based on jitter and chaos technology. The algorithm uses chaos technology to complete watermark embedding and encryption. Combined with the Chinese remainder theorem, it further reduces the impact of watermark embedding on image quality [45].
In fact, for the noisy images, the conventional super-resolution way is to denoise the images as a pre-processing step and then super-resolve the denoised images. In some new methods [46][47][48][49], such as the median filter transform (MFT) with parallelogram-shaped windows [47], denoising and super-resolving are integrated to provide improved results in comparison to the conventional way.
Super-resolution models based on regular terms can be solved by the alternating direction method of multipliers (ADMM) algorithm [50]. In recent years, many scholars have proposed various algorithms based on the classic ADMM, such as the plug-and-play (PnP) ADMM [51][52][53][54][55] and the regularization by denoising (RED) framework [56][57][58][59]. These are powerful image-recovery frameworks that aim to minimize an explicit regularization objective constructed from a plug-in image-denoising function. Since their introduction, they have demonstrated extremely promising results in image restoration and signal recovery problems [60][61][62].
In this study, we explore quaternion total variation and high-order gradients to improve the sparsity exploitation of OGSTV. Our proposed method is called the quaternion and high-order overlapping group sparse (HOGS4) method, and it is solved efficiently through the RED framework. The novelty of our work is two-fold. First, the HOGS4 method is considerably less restrictive than the OGSTV method for infrared image reconstruction: by incorporating high-order image derivatives, it preserves details well and measures the sparsity of the prior more accurately. Second, it provides fast and efficient closed-form solutions for computationally complex sub-minimization problems using the FFT.
The remainder of this paper is organized as follows. Section 2 briefly introduces the majorization-minimization (MM) method and RED framework. Section 3 describes the proposed method. In Section 4, our experiments and results are described. Finally, Sections 5 and 6 present the discussion and conclusions, respectively.

Overlapping Group Sparse Total Variation
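The overlapping group sparse total variation (OGSTV) model [36] is as follows:

$$R_{\mathrm{OGSTV}}(F) = \varphi(K_1 * F) + \varphi(K_2 * F), \qquad \varphi(V) = \sum_{i,j} \left\| \tilde{V}_{i,j,K,K} \right\|_2, \tag{1}$$

where the symbol $*$ is the convolution operator; $F \in \mathbb{R}^{N \times N}$ is the reconstructed image; and $K_1 = [-1, 1]$ and $K_2 = [-1, 1]^T$ are the horizontal and vertical differential convolution kernels, respectively.
$\| \tilde{V}_{i,j,K,K} \|_2$ is used to compute the combined gradient, where $\tilde{V}_{i,j,K,K}$ is defined as

$$\tilde{V}_{i,j,K,K} = \begin{bmatrix} V_{i-K_l,\,j-K_l} & V_{i-K_l,\,j-K_l+1} & \cdots & V_{i-K_l,\,j+K_r} \\ V_{i-K_l+1,\,j-K_l} & V_{i-K_l+1,\,j-K_l+1} & \cdots & V_{i-K_l+1,\,j+K_r} \\ \vdots & \vdots & \ddots & \vdots \\ V_{i+K_r,\,j-K_l} & V_{i+K_r,\,j-K_l+1} & \cdots & V_{i+K_r,\,j+K_r} \end{bmatrix} \in \mathbb{R}^{K \times K}, \tag{2}$$

where $K$ is the group size, $K_l = \lfloor (K-1)/2 \rfloor$, and $K_r = \lfloor K/2 \rfloor$. Here $\lfloor x \rfloor$ is the largest integer less than or equal to $x$.
From Equation (2), it can be seen that the combined gradient considers the gradient information of the neighboring pixels, and the gradient information of these neighboring pixels is recombined by the L2 norm, thereby increasing the difference between the smooth regions and the edge regions of the image [39].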
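To make the combined gradient concrete, the following Python sketch sums the L2 norms of the K × K groups around every pixel of a small gradient array. It is only an illustration: the function name `ogs_functional` and the choice to treat entries outside the image as zero are assumptions of this example, not details taken from the paper.

```python
import math

def ogs_functional(V, K=3):
    """Sum, over all pixels, of the L2 norm of the K x K group around the pixel.

    V is a list of rows (a 2D gradient array). Entries outside the array are
    treated as zero, which is an assumption of this sketch.
    """
    rows, cols = len(V), len(V[0])
    k_l = (K - 1) // 2   # entries taken before the centre pixel
    k_r = K // 2         # entries taken after the centre pixel
    total = 0.0
    for i in range(rows):
        for j in range(cols):
            energy = 0.0
            for a in range(i - k_l, i + k_r + 1):
                for b in range(j - k_l, j + k_r + 1):
                    if 0 <= a < rows and 0 <= b < cols:
                        energy += V[a][b] ** 2
            total += math.sqrt(energy)
    return total
```

With K = 1 the function reduces to the ordinary sum of absolute gradient values. With K = 3, two neighboring nonzero gradients share several groups and therefore cost less than two isolated ones of the same magnitude, which is the sense in which the grouped norm separates structured edges from scattered noise.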
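The overlapping group sparse model can be solved using the MM method [63]:

$$\min_{V} P(V) = \frac{1}{2} \left\| V - V_0 \right\|_2^2 + \mu\, \varphi(V), \qquad \varphi(V) = \sum_{i,j} \left\| \tilde{V}_{i,j,K,K} \right\|_2, \tag{3}$$

where $\varphi(V)$ is the overlapping group sparse regular term, $\mu > 0$ is the regularization parameter, and $\tilde{V}_{i,j,K,K}$ is an overlapping group sparse matrix of size $K \times K$. According to the MM method, to minimize $P(V)$, we need to find a function $Q(V, U)$ such that $Q(V, U) \ge P(V)$ for all $V$ and $U$, with equality when $V = U$. Minimizing $Q(V, U)$ at each iteration then decreases $P(V)$, and Equation (3) can be written as

$$V^{(k+1)} = \arg\min_{V} Q\left(V, V^{(k)}\right). \tag{4}$$

Consider the following inequality, valid for $U \neq 0$:

$$\frac{1}{2} \left( \frac{1}{\|U\|_2} \|V\|_2^2 + \|U\|_2 \right) \ge \|V\|_2, \tag{5}$$

where equality holds when $U = V$. From Equations (3) and (5), we obtain a majorizer of $\varphi(V)$:

$$S(V, U) = \frac{1}{2} \sum_{i,j} \left[ \frac{1}{\left\| \tilde{U}_{i,j,K,K} \right\|_2} \left\| \tilde{V}_{i,j,K,K} \right\|_2^2 + \left\| \tilde{U}_{i,j,K,K} \right\|_2 \right] \ge \varphi(V). \tag{6}$$

Equation (6) can be written as

$$S(V, U) = \frac{1}{2} \left\| D(U)\, v \right\|_2^2 + C(U), \tag{7}$$

where $v$ is the vector form of the matrix $V$; $C(U)$ is independent of $V$ and can be considered a constant term with respect to $V$; and $D(U)$ is a diagonal matrix whose diagonal elements are defined as follows:

$$[D(U)]_{m,m} = \sqrt{ \sum_{i,j=-K_l}^{K_r} \left[ \sum_{k_1,k_2=-K_l}^{K_r} \left| U_{r-i+k_1,\, t-j+k_2} \right|^2 \right]^{-1/2} }, \qquad m = (t-1)N + r. \tag{8}$$

By combining Equations (4) and (6), Equation (3) can be transformed into the following iterative optimization problem:

$$V^{(k+1)} = \arg\min_{V} \frac{1}{2} \left\| V - V_0 \right\|_2^2 + \mu\, S\left(V, V^{(k)}\right) = \arg\min_{v} \frac{1}{2} \left\| v - v_0 \right\|_2^2 + \frac{\mu}{2} \left\| D\left(V^{(k)}\right) v \right\|_2^2. \tag{9}$$

Its iterative optimal solution is as follows:

$$V^{(k+1)} = \mathrm{mat}\left\{ \left( I + \mu D^2\left(V^{(k)}\right) \right)^{-1} v_0 \right\}, \tag{10}$$

where $I$ is the identity matrix, $v_0$ is the vector form of $V_0$, and $\mathrm{mat}$ denotes the operator that reshapes a vector into a matrix. Therefore, we obtain Algorithm 1 to solve Equation (3).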
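The MM iteration described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions rather than the paper's Algorithm 1: the names `box_sum`, `ogs_cost`, and `ogs_mm`, the zero padding at the image border, the weight `mu` on the group sparse term, and the small constant `eps` that avoids division by zero are all choices made for this example. Because the matrix to invert is diagonal, each update is an element-wise division.

```python
import numpy as np

def box_sum(A, before, after):
    """Sum A over a window reaching `before` entries back and `after` entries
    forward along each axis (zeros are assumed outside the array)."""
    P = np.pad(A, ((before, after), (before, after)))
    out = np.zeros(A.shape, dtype=float)
    for di in range(before + after + 1):
        for dj in range(before + after + 1):
            out += P[di:di + A.shape[0], dj:dj + A.shape[1]]
    return out

def ogs_cost(V, V0, mu, K, eps):
    """0.5 * ||V - V0||^2 + mu * (sum of smoothed group norms)."""
    k_l, k_r = (K - 1) // 2, K // 2
    energy = box_sum(V ** 2, k_l, k_r)
    return 0.5 * np.sum((V - V0) ** 2) + mu * np.sum(np.sqrt(energy + eps))

def ogs_mm(V0, mu=1.0, K=3, n_iter=5, eps=1e-10):
    """Majorization-minimization for the group sparse denoising problem."""
    k_l, k_r = (K - 1) // 2, K // 2
    V = V0.astype(float).copy()
    costs = [ogs_cost(V, V0, mu, K, eps)]
    for _ in range(n_iter):
        energy = box_sum(V ** 2, k_l, k_r)       # squared L2 norm of every group
        weights = 1.0 / np.sqrt(energy + eps)    # one weight per group
        d2 = box_sum(weights, k_r, k_l)          # diagonal of D^2: groups containing each pixel
        V = V0 / (1.0 + mu * d2)                 # diagonal system solved element-wise
        costs.append(ogs_cost(V, V0, mu, K, eps))
    return V, costs
```

Tracking `costs` is a convenient sanity check: since every step minimizes a majorizer of the (smoothed) objective, the recorded values should never increase.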

Regularization by Denoising
For image super-resolution reconstruction, the model can be expressed as where H is a circular matrix that represents the convolution for the anti-aliasing filter. S is a binary sampling matrix, where the rows are subsets of the identity matrix. Further, G is an observation image, and F represents the corresponding original image.
To solve the above model, we can transform it into image denoising using regularization by denoising (RED) [56,57], which relies on a general structural smoothness penalty term for regularizing any desired inverse problem. Specifically, the regularization term $R(F)$ is defined as

$$R(F) = \frac{1}{2} F^T \left[ F - f(F) \right],$$

where $f(F)$ is the image denoising engine. The denoising engine is applied to the image $F$, and the induced penalty is proportional to the inner product between the image and its denoising residual. This smoothness regularization effectively uses an image-adaptive Laplacian, whose definition is drawn from an arbitrary image denoising engine $f(\cdot)$. Interestingly, under mild assumptions on $f(\cdot)$, the gradient of the regularization term is shown to be manageable: it is simply the denoising residual $F - f(F)$ [58,59].
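As a small numerical illustration of this penalty, the sketch below evaluates the inner product between a signal and its denoising residual and returns the residual as the gradient. The denoiser here is a toy periodic 3-tap smoothing filter acting on a 1D signal, chosen only because it is linear and symmetric, so the gradient identity holds exactly; it stands in for a real denoising engine, and all function names are hypothetical.

```python
import numpy as np

def toy_denoiser(x):
    """Periodic 3-tap smoothing filter; a stand-in for a real denoising engine."""
    return 0.25 * np.roll(x, 1) + 0.5 * x + 0.25 * np.roll(x, -1)

def red_regularizer(x, denoise=toy_denoiser):
    """RED penalty: half the inner product of x with its denoising residual."""
    return 0.5 * float(np.dot(x, x - denoise(x)))

def red_gradient(x, denoise=toy_denoiser):
    """Gradient of the RED penalty under the RED assumptions: the residual."""
    return x - denoise(x)
```

For a general nonlinear denoiser the residual form of the gradient depends on the assumptions discussed in [58,59]; the toy filter simply makes the identity easy to check with finite differences.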

Proposed Method
Inspired by the overlapping group sparse and quaternion total variation methods, this paper proposes a denoising model that uses the RED framework to complete infrared image super-resolution reconstruction (HOGS4). The traditional OGSTV considers only first-order information and therefore does not fully exploit the relations between pixels [40]. To improve the denoising effect, we extend the traditional OGSTV to a high-order total variation model. The proposed high-order overlapping group sparse total variation model not only considers first-order information but also adds the high-order gradient information of the horizontal, vertical, back diagonal, and diagonal directions to the prior term. The quaternion and high-order information makes the prior knowledge more accurate, thus protecting the edges of the image [26] and suppressing the influence of small edges on the estimation of the blur kernel [20]. The denoising model is defined as follows: where $K_i$ ($i = 1, 2, 3, 4$) represents the convolution kernels along the horizontal, vertical, back diagonal, and diagonal directions, respectively. These are defined as follows: Then, according to Equation (11), the HOGS4 method for infrared image super-resolution reconstruction based on the RED framework can be expressed as: where the regularization term $R_{\mathrm{HOGS4}}(F)$ in the RED framework can be defined as To solve the HOGS4 model in the RED framework, according to the principle of ADMM, an auxiliary variable $Z$ is required to convert the unconstrained problem given by Equation (16) into a constrained problem: Consequently, the corresponding augmented Lagrangian function is as follows: where $Y$ is a Lagrange multiplier, and $\rho > 0$ is a penalty parameter. Because $F$ and $Z$ are decoupled, the minimizer of Equation (18) can be found by solving the following sequence of $F$ and $Z$ sub-problems: The procedure comprises the following steps:
1. To solve the sub-problem of $F$, let $W = SH$. Then, Equation (20) can be represented as follows: Considering that $Z^{(k)}$ and $Y^{(k)}$ are fixed, by setting the first-order derivative with respect to $F$ in Equation (22) to zero, we have According to the ADMM, the solution of the sub-problem of $F$ is
2. To solve the sub-problem of $Z$, according to Equation (17), Equation (21) can be transformed as follows: Considering that $F^{(k+1)}$ and $Y^{(k)}$ are fixed, by setting the first-order derivative with respect to $Z$ in Equation (25) to zero, we have which can be solved by the fixed-point strategy, leading to the following update rule for $Z$ [56]: where $f_{\mathrm{HOGS4}}(\cdot)$ is the HOGS4 denoising engine, which is defined as: Equation (27) means that our approach in this case is computationally more expensive, as it requires several activations of the denoising engine $f_{\mathrm{HOGS4}}$ [56].
3. Then we update the Lagrange multiplier as The proposed SRR method is summarized in Algorithm 2.
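The three steps above follow the general pattern of ADMM within the RED framework, which can be sketched on a toy 1D problem as follows. This is a generic scaled-form RED-ADMM loop under stated assumptions, not a reproduction of Algorithm 2: the regularization weight `lam`, the dense matrix `W`, the number of inner fixed-point steps, and the toy linear denoiser `smooth` (a placeholder for the HOGS4 denoising engine) are all choices made for this example.

```python
import numpy as np

def smooth(x):
    """Toy linear denoiser (periodic 3-tap average); placeholder for f_HOGS4."""
    return 0.25 * np.roll(x, 1) + 0.5 * x + 0.25 * np.roll(x, -1)

def red_admm(W, g, lam=1.0, rho=1.0, outer=200, inner=3, denoise=smooth):
    """Scaled-form ADMM for 0.5*||W F - g||^2 + 0.5*lam*<F, F - denoise(F)>."""
    n = W.shape[1]
    F = np.zeros(n)
    Z = np.zeros(n)
    Y = np.zeros(n)                      # scaled Lagrange multiplier
    lhs = W.T @ W + rho * np.eye(n)      # fixed system matrix of the F step
    for _ in range(outer):
        # Step 1: F sub-problem, a quadratic solved through its normal equations.
        F = np.linalg.solve(lhs, W.T @ g + rho * (Z - Y))
        # Step 2: Z sub-problem, a few fixed-point steps (one denoiser call each).
        for _ in range(inner):
            Z = (lam * denoise(Z) + rho * (F + Y)) / (lam + rho)
        # Step 3: multiplier update.
        Y = Y + F - Z
    return F
```

In an image-sized problem the dense solve in step 1 would not be practical; it is used here only to keep the sketch short. The inner loop makes visible why the fixed-point strategy costs several activations of the denoising engine per outer iteration.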

Regarding the sub-problem of $Z$, Equation (28) can be converted into the following constrained problem: Accordingly, the augmented Lagrangian function is: where $U_{vi}$ and $U_{wi}$ ($i = 1, 2, 3, 4$) are the Lagrange multipliers, and $\eta_{vi} > 0$ and $\eta_{wi} > 0$ are penalty parameters.
The minimizer of Equation (30) is the saddle point of $L(Z, V_i, W_i; U_{vi}, U_{wi})$, which can be found by solving the following sequence of sub-problems: The procedure comprises the following steps:
1. To solve the sub-problem of $Z$, the 2D Fourier transform of $Z$ can be obtained by employing the convolution theorem [64]: where the symbol $\circ$ represents component-wise multiplication. Considering that $V_i^{(n)}$, $W_i^{(n)}$, $U_{vi}^{(n)}$, and $U_{wi}^{(n)}$ are fixed, by setting the first-order derivative with respect to $Z$ in Equation (35) to zero, we have For simplicity, we abbreviate Equation (36) as where Then, according to Equation (37), we have where $[\mathcal{F}(K_i)]^*$ is the complex conjugate of $\mathcal{F}(K_i)$.
2. To solve the sub-problem of $V_i$ in Equation (33), the MM method (Algorithm 1) can be used: where $V_{i(m+1)}^{(n+1)}$ represents the iteration of the MM algorithm for $V_i$.
3. To solve the sub-problem of $W_i$, we set the first-order derivative with respect to $W_i$ in Equation (34) to zero, and get:
4. Lastly, the Lagrange multipliers can be updated as
In this manner, all the sub-problems of Equation (28) are solved independently. In all iterations, the sub-problem of $V$ is solved by the MM algorithm according to Equations (41) and (42). Considering the special structure of the differential matrices in the sub-problem of $Z$, we regard the differential operators as convolution operators. By introducing the convolution theorem [64], the sub-problem of $Z$ is solved in the frequency domain. The entire algorithm to solve Equation (28) is summarized in Algorithm 3. Besides, Algorithm 3, regarded as the HOGS4 denoising engine, can also be used as an independent denoising algorithm that uses quaternion and high-order overlapping group sparse total variation to complete image denoising.
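The role of the convolution theorem in such a sub-problem can be illustrated with a generic quadratic problem: minimize 0.5‖Z − Z0‖² + (η/2) Σᵢ ‖Kᵢ * Z − Bᵢ‖² under periodic boundary conditions. The sketch below is not the paper's exact update; the objective, the names `otf` and `solve_quadratic_fft`, the single penalty `eta`, and the use of only two difference kernels are assumptions made to keep the example short. The point is that the normal equations become a per-frequency division, so no large matrix is ever formed.

```python
import numpy as np

def otf(kernel, shape):
    """Fourier transform of a small kernel zero-padded to the image size."""
    padded = np.zeros(shape)
    padded[:kernel.shape[0], :kernel.shape[1]] = kernel
    return np.fft.fft2(padded)

def solve_quadratic_fft(Z0, kernels, targets, eta):
    """Minimize 0.5*||Z - Z0||^2 + 0.5*eta*sum_i ||K_i * Z - B_i||^2,
    where * is circular convolution, by one division in the Fourier domain."""
    numerator = np.fft.fft2(Z0)
    denominator = np.ones(Z0.shape)
    for kernel, target in zip(kernels, targets):
        Fk = otf(kernel, Z0.shape)
        numerator = numerator + eta * np.conj(Fk) * np.fft.fft2(target)
        denominator = denominator + eta * np.abs(Fk) ** 2
    return np.real(np.fft.ifft2(numerator / denominator))

# Horizontal and vertical first-order difference kernels.
K_H = np.array([[-1.0, 1.0]])
K_V = np.array([[-1.0], [1.0]])
```

A call such as `solve_quadratic_fft(Z0, [K_H, K_V], [B1, B2], eta=0.7)` returns the minimizer directly; further directions would simply add more terms to the numerator and denominator.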

Materials and Method
In this section, we present several numerical results to illustrate the performance of the proposed method. RED-HOGS4 is compared with several other methods under different noise levels and Gaussian blur conditions, including the MFT [47], RED-TV [17], RED-TGV [20], and RED-OGSTV [36] methods. Among these four methods, the MFT method used the scripts provided in [47], while the other methods are implemented from the literature and combined with the RED framework for super-resolution reconstruction. Eight infrared images are selected from the infrared image databases LTIR [65] and IRData [66] as test pictures, as shown in Figure 1. Our experiments were performed on a PC with a 2.8 GHz Intel CPU and 8 GB RAM using MATLAB R2014a.
Figure 1. Test infrared images.
For the objective evaluation, we calculated the peak signal-to-noise ratio (PSNR) [67] and structural similarity (SSIM) [68]. PSNR is an engineering measure that compares the similarity of two input images or signals based on the mean square error. SSIM also measures the similarity between two input images and is designed to improve on measures such as PSNR, which are not consistent with human visual perception. These can be defined as follows:

$$\mathrm{PSNR}(X, Y) = 10 \log_{10} \frac{255^2}{\frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \left( X_{ij} - Y_{ij} \right)^2},$$

$$\mathrm{SSIM}(X, Y) = \frac{\left( 2 u_X u_Y + (255 k_1)^2 \right) \left( 2 \sigma_{XY} + (255 k_2)^2 \right)}{\left( u_X^2 + u_Y^2 + (255 k_1)^2 \right) \left( \sigma_X^2 + \sigma_Y^2 + (255 k_2)^2 \right)},$$

where $X$ and $X_{ij}$ denote the original image and its pixels; $Y$ and $Y_{ij}$ denote the reconstructed image and its pixels; and $u_X$ and $u_Y$ are the mean values of $X$ and $Y$, respectively. Further, $\sigma_X^2$ and $\sigma_Y^2$ are the variances of $X$ and $Y$, respectively, and $\sigma_{XY}$ is the covariance of $X$ and $Y$. The parameters $k_1$ and $k_2$ are set such that the denominator of SSIM is a nonzero number. In this study, we set $k_1 = 0.01$ and $k_2 = 0.03$ [68].
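For readers who want to reproduce the two measures, the following Python sketch computes PSNR from the mean square error and a single-window SSIM from global means, variances, and covariance. It is a simplified illustration: the peak value of 255 (8-bit images), the use of global statistics rather than the local sliding windows of common SSIM implementations, and the function names are assumptions of this example. Images are passed as flat lists of pixel values.

```python
import math

def psnr(x, y, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)
    if mse == 0:
        return float("inf")
    return 10.0 * math.log10(peak ** 2 / mse)

def ssim_global(x, y, k1=0.01, k2=0.03, peak=255.0):
    """SSIM computed once from the global statistics of the two images."""
    n = len(x)
    ux, uy = sum(x) / n, sum(y) / n
    var_x = sum((a - ux) ** 2 for a in x) / n
    var_y = sum((b - uy) ** 2 for b in y) / n
    cov_xy = sum((a - ux) * (b - uy) for a, b in zip(x, y)) / n
    c1, c2 = (k1 * peak) ** 2, (k2 * peak) ** 2   # keep the denominator nonzero
    return ((2 * ux * uy + c1) * (2 * cov_xy + c2)) / (
        (ux ** 2 + uy ** 2 + c1) * (var_x + var_y + c2))
```

Identical inputs give an infinite PSNR and an SSIM of 1; any distortion lowers both values.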
In general, larger values of PSNR and SSIM indicate better performance. Therefore, in this experiment, we focus on the PSNR as well as the SSIM. In all experiments, we set the parameters empirically as follows: µ = 1, ρ = 0.001, N = 3 [57]. If γ = 1, Algorithm 3 reduces to the classic ADMM; however, γ = 1.618 makes it converge noticeably faster than γ = 1 [38], so we set γ = 1.618. For the tol value in Algorithm 3, with N set to 3 as recommended in the literature [57], we found that tol = 0.001 yields high PSNR values, so we set tol = 0.001 in all experiments. The blur matrix H in Equation (11) is the matrix corresponding to the blur kernel generated by the MATLAB built-in command "fspecial ('gaussian', 7, 1.6)". S is set as a K-fold downsampling operator, which is generated by the MATLAB built-in function "downsample(X,K)".
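For readers working outside MATLAB, the blur kernel and the sampling operator can be approximated as in the Python sketch below. It is intended to mirror a 7 × 7 Gaussian kernel with standard deviation 1.6 normalized to unit sum, and a sampling step that keeps every K-th sample starting from the first; applying the sampling along both image dimensions, as well as the function names, are assumptions of this example rather than details stated in the paper.

```python
import math

def gaussian_kernel(size=7, sigma=1.6):
    """Square Gaussian kernel centred on the middle entry, normalized to sum 1."""
    half = (size - 1) / 2.0
    raw = [[math.exp(-((r - half) ** 2 + (c - half) ** 2) / (2.0 * sigma ** 2))
            for c in range(size)]
           for r in range(size)]
    total = sum(sum(row) for row in raw)
    return [[value / total for value in row] for row in raw]

def downsample2d(image, factor):
    """Keep every `factor`-th row and column, starting from the first."""
    return [row[::factor] for row in image[::factor]]
```

For instance, `downsample2d(image, 4)` applied to a blurred HR image would produce a ×4 LR test image of the kind used in the experiments.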

Infrared Image Super-Resolution Experiment without Noise
In the experiment, the LR images without noise are obtained by downsampling the HR images (2-fold, 3-fold, and 4-fold). To evaluate the performance objectively, PSNR and SSIM are calculated under different levels of super-resolving operators (corresponding to ×2, ×3, and ×4). The experimental results of each method are listed in Table 1.
It can be seen from the experimental results in Table 1  The difference between them is little. In contrast, in the ×2 reconstruction results, RED-HOGS4 achieves the best PSNR values for Garden, Gate, Car, and Sidewalk (44.1652 dB, 32.1942 dB, 32.6179 dB, and 33.8876 dB, respectively). Taken together, although the RED-HOGS4 method has the best PSNR value for only four images in the ×2 reconstruction results, the average PSNR of the eight images processed by the RED-HOGS4 method is greater than that of the other methods. The RED-HOGS4 method exhibits better SSIM only for individual pictures; its mean SSIM is worse than that obtained by the MFT method. At the same time, the PSNR values of all the images processed by the MFT method are poor. As the super-resolution factor increases to ×3 and ×4, the results of RED-HOGS4 become consistently better than those of the other methods. For example, in the ×3 reconstruction results, the PSNR values of RED-HOGS4 are higher than those of RED-OGSTV by about 0.02 dB to 0.08 dB. Further, in the ×4 reconstruction results, the difference widens to 0.02 dB to 0.23 dB. Meanwhile, the SSIM values of RED-HOGS4 are also higher than those of the other methods. However, as the RED-HOGS4 method is relatively complex, it takes the longest time of all the methods.
The following is a comparison of the visual effects of the five methods on three images (Street, Station, and Gate) after 4-fold downsampling of the original images without any noise. The LR images are shown in Figure 2, in which the marked rectangles are compared with the SRR results of the five methods in Figures 3-5, respectively.
As can be seen from Figures 3-5, when no noise is introduced, after ×4 super-resolution processing, the result of MFT is the worst among the images obtained by the five methods. The three images obtained by MFT have unclear boundaries and blur. In the visual comparison of the images generated through RED-TV, RED-TGV, RED-OGSTV, and RED-HOGS4, we can see that RED-HOGS4 is better for the boundaries and the overall processing of the image. Especially under ×4 magnification in Figure 5, the results of the RED-HOGS4 method clearly show the outlines of letters and strokes, which are clearly better than those of the other methods.

Infrared Image Super-Resolution Experiment with Added White Gaussian Noise
In this experiment, the LR infrared images were generated by downsampling the original images by a factor of two after adding white Gaussian noise at different noise levels (σ = 5, 10, 20, 30). To evaluate objectively how the performance of each method varies with the noise content, PSNR and SSIM were calculated for the ×2 super-resolving operator. These results are listed in Table 2.
The experimental results show that the MFT method has better PSNR for a few images but worse mean PSNR values; the results of RED-TV and RED-TGV are better than those of the MFT but worse than those of RED-OGSTV and RED-HOGS4. When the noise is small, the PSNR of the RED-OGSTV method is lower than that of RED-HOGS4, while its SSIM value is higher than that of RED-HOGS4. As the noise increases, the reconstruction results of the RED-OGSTV method fall further below those of the RED-HOGS4 method. In terms of processing time, RED-HOGS4 is more time consuming than the other methods.
The visual effects comparison based on the Street, Station, and Gate images, which had white Gaussian noise (σ = 10) added and were downsampled by a factor of two, is shown as an example in Figure 6, in which the marked rectangles are compared with the SRR results of the five methods in Figures 7-9, respectively.
From Figures 7-9, we see that the MFT leaves less noise in the reconstructed image, but the entire image is too smooth, resulting in serious loss of boundary information; the RED-TV and RED-TGV reconstructed images have inadequate noise removal and information protection, whereas RED-OGSTV and RED-HOGS4 produce better reconstructed images, as shown in Figure 7. In Figures 8 and 9, the proposed method shows better results than RED-OGSTV in terms of image edge reconstruction and noise suppression.

Discussion
The HOGS4 method adopts quaternion TV and high-order OGSTV, which fully utilizes image correlations in quaternion and extends the first-order overlapping group sparsity to a higher-order such that a clear image can be reconstructed in the presence of noise interference. As to the OGSTV method, in the presence of noise, staircase artifacts are still present, and the noise removal is not as good as that of the HOGS4.
When the MFT method is used to reconstruct an image, regardless of the image being noiseless or noisy, the reconstruction result is very smooth, because of which the details are unclear.
The TV method preserves the edge and detail information of the image and smooths the image piecewise; hence, the result usually includes staircase artifacts. The TGV method effectively reduces staircase artifacts by using first-order and second-order gradients during image processing. However, it also causes excessive smoothing and image distortion.
However, compared with the other methods, the proposed method is more time consuming because it introduces high-order OGSTV and quaternion operations, which have higher computational complexity. In future work, we may use accelerated iterative methods to improve the convergence speed of the algorithm, thus reducing the time consumption. As in [26,69], an acceleration operator can be used to reduce the number of iterations of the ADMM algorithm, thereby reducing the run time of the super-resolution reconstruction algorithm. Besides, the proposed method may have other shortcomings. For example, the parameters are chosen mainly empirically; because of the limited number of test infrared images, the parameters may not transfer fully to other sets of infrared images. For practical applications, the parameters still need to be optimized for the image set at hand. Alternatively, an adaptive parameter optimization mechanism can be adopted in conjunction with this method.

Conclusions
In this paper, an infrared image super-resolution reconstruction method based on quaternion and high-order overlapping group sparse total variation is proposed. This method improves image super-resolution reconstruction because it combines quaternion total variation with high-order group sparse methods. In addition, by introducing the RED framework, the super-resolution problem is transformed into multiple denoising sub-problems. When addressing these sub-problems, multiple difference operators are processed in convolution form. According to the convolution theorem, they can then be converted to frequency domain operations, thereby avoiding large-scale matrix operations. Compared to the MFT, TV, TGV, and OGSTV methods, the experimental results show that the proposed method achieves better overall performance.
Although the proposed method only focuses on HOGS4, it can be easily extended to other regular models, such as the TGV model, and combined with other methods, such as the Lp quasinorm, to improve the performance of super-resolution reconstruction. Besides, in practical applications, the method can be used for super-resolution reconstruction or denoising of grayscale images. We will continue to pursue these extensions in our follow-up work.
Author Contributions: X.L. wrote this manuscript. Y.C., Z.P. and J.W. contributed to the writing, direction, and content, and revised the manuscript.