Robust Multi-Frame Super-Resolution Based on Adaptive Half-Quadratic Function and Local Structure Tensor Weighted BTV

It is difficult to improve image resolution in hardware due to the limitations of technology and too high costs, but most application fields need high resolution images, so super-resolution technology has been produced. This paper mainly uses information redundancy to realize multi-frame super-resolution. In recent years, many researchers have proposed a variety of multi-frame super-resolution methods, but it is very difficult to preserve the image edge and texture details and remove the influence of noise effectively in practical applications. In this paper, a minimum variance method is proposed to select the low resolution images with appropriate quality quickly for super-resolution. The half-quadratic function is used as the loss function to minimize the observation error between the estimated high resolution image and low-resolution images. The function parameter is determined adaptively according to observation errors of each low-resolution image. The combination of a local structure tensor and Bilateral Total Variation (BTV) as image prior knowledge preserves the details of the image and suppresses the noise simultaneously. The experimental results on synthetic data and real data show that our proposed method can better preserve the details of the image than the existing methods.


Introduction
Super Resolution (SR) refers to the process of reconstructing a High Resolution (HR) image from a single Low Resolution (LR) image or multiple low resolution images by the software method without modifying the hardware environment. SR is often used in satellite remote sensing [1], medical diagnosis [2] and video surveillance [3] and so forth. According to the number of input low resolution images, it can be divided into Single Image Super Resolution (SISR) and Multi-Frame Super Resolution (MFSR). This paper mainly studies MFSR.
The first MFSR was proposed [4] in 1984, and is based on the frequency domain. The frequency domain method is easy to implement and cheap computationally. However, this method has no way to add image prior knowledge, and can only achieve good results for images without noise and degradation, which is not feasible in practical applications. Therefore, algorithms in the spatial domain are widely studied. There are interpolation-based methods [5,6], reconstruction-based methods [7][8][9][10] and learning-based methods [11,12]. With the popularity of deep learning architectures, many deep learning networks are implemented for MFSR [13][14][15][16]. Learning-based methods need a large number of external training sets; the effect is closely related to the type and number of training sets. Although learning-based SR methods have been hot topics of research in the last few decades, are implemented for MFSR [13][14][15][16]. Learning-based methods need a large number of external training sets; the effect is closely related to the type and number of training sets. Although learning-based SR methods have been hot topics of research in the last few decades, reconstruction-based methods, which do not require training sets, are very popular. The quality of the reconstructed image only depends on our mathematical model and input LR images. This paper mainly studies reconstruction-based MFSR methods that reconstruct an HR image by using the redundancy and supplementary information between multiple LR images in the same scene. In order to find useful information redundancy, an important condition for realizing MFSR is to have sub-pixel displacement as shown in Figure 1. Because SR itself is an ill-posed problem, regularization techniques are widely used to solve the minimization problem. The methods of regularization mainly contain the fidelity term and the regularization term. The purpose of the fidelity term is to minimize the observation errors between the reconstructed HR image and the input LR images. The purpose of the regularization term is to make the reconstructed HR image reach a robust state.
The widely-used fidelity terms in the SR are the L1 norm and the L2 norm. Both the L1 norm and the L2 norm have their advantages and drawbacks. The L2 norm can produce lower variance than the L1 norm. However, the L2 norm is sensitive to outliers. Mestimators are explored in MFSR later. For example, the Huber function [17] is proposed for the fidelity term. In [18], the Lorentzian norm is applied to MFSR to increase the robustness. In [19], the performances of the Tukey, Lorentzian and Huber norms are studied concerning outliers. A locally adaptive L1-L2 fidelity norm is also proposed in [20]. Xueying Zeng et al. [8] introduce the adaptive fidelity term based on a new M-estimator halfquadratic function and bilateral edge-preserving (BEP) regularization method. Köhler et al. [9] propose a weighted Gaussian observation model which takes into account the spatial variant noise and weighted bilateral total variation to exploit the sparsity of natural images. Xiaohong Liu et al. [10] use the half-quadratic estimation method to adaptively select the error norm and propose an adaptive bilateral total variation regularization method.
Image prior knowledge is used to regularize the image reconstruction. Tikhonov regularization [21] is one of most commonly used methods. The Total Variation (TV) family, such as Bilateral Total Variation (BTV) [7], is another popular regularization technique. The BTV uses the L1 norm to handle outliers, unlike Tikhonov regularization which is based on the L2 norm.
In recent years, many researchers have proposed a large number of MFSR methods, but MFSR is an ill-posed problem and it is very difficult to reconstruct a satisfactory HR image because the result of MFSR is affected by motion estimation, image registration, unknown blur, noise, and so forth. Even if they are studied separately, each affecting factor is extremely challenging.
In order to simultaneously preserve the image edge and texture details and remove the influence of noise, we propose a new robust MFSR method. In this paper, there are three major contributions to improving the quality of the final reconstructed HR image, as follows: Because SR itself is an ill-posed problem, regularization techniques are widely used to solve the minimization problem. The methods of regularization mainly contain the fidelity term and the regularization term. The purpose of the fidelity term is to minimize the observation errors between the reconstructed HR image and the input LR images. The purpose of the regularization term is to make the reconstructed HR image reach a robust state.
The widely-used fidelity terms in the SR are the L1 norm and the L2 norm. Both the L1 norm and the L2 norm have their advantages and drawbacks. The L2 norm can produce lower variance than the L1 norm. However, the L2 norm is sensitive to outliers. M-estimators are explored in MFSR later. For example, the Huber function [17] is proposed for the fidelity term. In [18], the Lorentzian norm is applied to MFSR to increase the robustness. In [19], the performances of the Tukey, Lorentzian and Huber norms are studied concerning outliers. A locally adaptive L1-L2 fidelity norm is also proposed in [20]. Xueying Zeng et al. [8] introduce the adaptive fidelity term based on a new M-estimator half-quadratic function and bilateral edge-preserving (BEP) regularization method. Köhler et al. [9] propose a weighted Gaussian observation model which takes into account the spatial variant noise and weighted bilateral total variation to exploit the sparsity of natural images. Xiaohong Liu et al. [10] use the half-quadratic estimation method to adaptively select the error norm and propose an adaptive bilateral total variation regularization method.
Image prior knowledge is used to regularize the image reconstruction. Tikhonov regularization [21] is one of most commonly used methods. The Total Variation (TV) family, such as Bilateral Total Variation (BTV) [7], is another popular regularization technique. The BTV uses the L1 norm to handle outliers, unlike Tikhonov regularization which is based on the L2 norm.
In recent years, many researchers have proposed a large number of MFSR methods, but MFSR is an ill-posed problem and it is very difficult to reconstruct a satisfactory HR image because the result of MFSR is affected by motion estimation, image registration, unknown blur, noise, and so forth. Even if they are studied separately, each affecting factor is extremely challenging.
In order to simultaneously preserve the image edge and texture details and remove the influence of noise, we propose a new robust MFSR method. In this paper, there are three major contributions to improving the quality of the final reconstructed HR image, as follows:

•
We propose a fast and effective method for selecting appropriate LR images, which provide a relatively good input for the reconstruction; • A novel fidelity term is proposed. We adopt the half-quadratic function as the error norm adaptively. All parameters are automatically adjusted according to observation errors; • We propose a local structure tensor weighted BTV regularization term. The novel image prior can better simultaneously preserve image details and suppress noise by assigning a certain weight to each pixel according to the local structure tensor information of the image.
The rest of this paper is organized as follows. Section 2 introduces the observation model and the basic framework of MFSR. Section 3 describes our proposed algorithm in detail. Section 4 presents our experimental results on synthetic data and real data. Section 5 summarizes our paper.

Observation Model and Basic Framework of MFSR
In practical applications, noise, blur, motion and other factors will affect the final SR. Therefore, the observation model of MFSR can be expressed as follows where Y k represents the kth LR image, k = 1, 2, . . . , K. The size of Y k is mn × 1, K is the number of LR images. X represents the HR image, the size is rmrn × 1,the down sampling factor is r. M k represents the geometric motion matrix (sub-pixel displacement) of the kth LR image, the size is mrn × rmrn. B k represents the blur matrix of the kth LR image, the size is rmrn × rmrn. D represents the down sampling matrix, the size is mn × rmrn. n k represents the noise of the kth LR image, the size is mn × 1. We can simplify Equation (1) by combining DB k M k as a system matrix W k [22], and Equation (1) can be rewritten as follows The observation error r k of the LR image Y k can be defined as where |·| means the absolute value andX represents the estimated HR image. The basic framework of MFSR can be expressed aŝ where γ(X) is the regularization term with respect to X, p represents L p norm, and λ is the trade-off parameter between the fidelity term and the regularization term. BTV is calculated simply and is easy to implement, so it is often selected in the regularization term. The formula of BTV can be expressed as where P represents the size of the sliding window. ζ is a scaled weight parameter. The range of ζ is 0< ζ < 1. S n x , S m y denotes shifting X by n pixels in the horizontal direction and by m pixels in the vertical direction, respectively.

Proposed MFSR Algorithm
We first select appropriate LR images from a set of LR frames as the input for MFSR. The overall SR framework of this paper is to employ a coarse to fine strategy, enlarging LR images to the target image size gradually in order to adapt to the large scale factor. The half-quadratic function as the loss function is used to control residuals between the unknown HR image and LR images for the fidelity term. For the regularization term, we combine the local structure tensor information of the image and the image sparsity as the image prior knowledge to regularize the estimated HR image.

Selecting the Appropriate LR Images and Alignment
Many existing MFSR algorithms input all LR images instead of selecting input LR images. Some LR images not only do not improve the quality of the reconstructed HR image, but also consume more time and increase the computational complexity. In order to improve the practical applications of MFSR, it is necessary to select LR images with appropriate quality. In this paper, we can remove some low quality LR images quickly to SR.
The number of images required for SR is at least twice the square of the magnification factor under the ideal environment according to [23]. The number of appropriate frames K can be expressed as where M is the number of all LR images, r is the amplification factor. Although there are many quality evaluation methods [24][25][26] for image selection, most of them are computationally intensive. We need a simple and effective method to quickly select appropriate LR images, so we propose a method of minimum variance. The specific steps for selecting appropriate LR images are as follows: 1.
Calculate the variance of all LR images. The variance formula of the kth LR image Y k is as follows where m,n represents the width and height of the LR image Y k , respectively, Y k is the mean value of the LR image Y k , and the calculation formula is:

2.
Calculate the average of the variance of all LR images, which can be expressed as: 3. Calculate the difference between variance per LR image and the average of the variance which can be defined as: 4. Sort ∆s and choose small ∆s corresponding LR images. Appropriate LR images are adaptively selected combining Equation (6) and small ∆s.
This paper uses the optical flow algorithm to register selected LR images.

Proposed Fidelity Term
Our proposed algorithm is based on the maximum a posteriori (MAP) estimation and Bayesian theory, the expression of the estimated HR image can be written as follows: Since p(Y 1 , Y 2 , . . . , Y K ) has no effect on the estimate X, the above formula can be rewritten as: Let us suppose that each LR image Y k is independent. The Equation (12) can be simplified toX = arg max where p( Y k |X) represents the conditional probability of the LR image Y k . Given the HR image X, p(X) is the prior probability for the HR image X. The Half-Quadratic (HQ) function is first proposed in [27] as a potential function. The formula is as follows: where a is a positive constant. The L1 norm, L2 norm, half-quadratic function, Huber function, Lorentzian function and Tukey function are shown in Figure 2. We find that the half-quadratic function is close to the L2 norm when observation errors are small and it is close to the L1 norm when observation errors are big. This paper uses the half-quadratic function, which combines the advantages of the L1 norm and the L2 norm as the loss function. The half-quadratic function is strictly convex and twice continuously differentiable to obtain the optimum value easily. Let us suppose that each LR image is independent. The Equation (12) can be simplified to where ( | ) represents the conditional probability of the LR image . Given the HR image , ( ) is the prior probability for the HR image .
The Half-Quadratic (HQ) function is first proposed in [27] as a potential function. The formula is as follows: where is a positive constant. The L1 norm, L2 norm, half-quadratic function, Huber function, Lorentzian function and Tukey function are shown in Figure 2. We find that the half-quadratic function is close to the L2 norm when observation errors are small and it is close to the L1 norm when observation errors are big. This paper uses the half-quadratic function, which combines the advantages of the L1 norm and the L2 norm as the loss function. The half-quadratic function is strictly convex and twice continuously differentiable to obtain the optimum value easily. The fidelity term of our robust MFSR by the half-quadratic function is: where is the half-quadratic parameter for the kth LR image, , represents the observation error of the ith pixel of the kth LR image. Figure 3 shows the performance of the half-quadratic function with different values. We can see from Figure 3 that the larger the value is, the closer the half-quadratic function is to the L2 norm. The smaller the value is, the closer it is to the L1 norm. Therefore, the value is inversely proportional to observation errors. The fidelity term of our robust MFSR by the half-quadratic function is: where a k is the half-quadratic parameter for the kth LR image, r k,i represents the observation error of the ith pixel of the kth LR image. Figure 3 shows the performance of the half-quadratic function with different a values. We can see from Figure 3 that the larger the a value is, the closer the half-quadratic function is to the L2 norm. The smaller the a value is, the closer it is to the L1 norm. Therefore, the a value is inversely proportional to observation errors.
The observation error of the kth LR image r k can be expressed as: where |·| means absolute value, I is the total number of pixels per LR image.  The observation error of the kth LR image can be expressed as: where |•| means absolute value, is the total number of pixels per LR image. can be defined as: where max( ) represents the maximum value of . is the confidence matrix composed of confidence weights, = ( , … , , … , ) . For each LR image, the observation error is larger and the corresponding confidence weight should be smaller. So, the confidence weight of the kth LR image can be expressed as: where mean( ) represents the mean value of . is a positive constant, is the estimate value of the scale parameter in each iteration to discriminate inliers and outliers adaptively. is set to 2 in this paper. We use the median absolute deviation (MAD) [28] to estimate , the formula can be expressed as: where we set = 1.4826 for the Gaussian distribution.

Proposed Local Structure Tensor Weighted BTV Regularization Term
Due to the BTV regularization term, we cannot distinguish the edge information of the image well. We propose combining the local structure tensor information of the image with the image sparsity as image prior knowledge in order to simultaneously better preserve the edge information of the image and suppress the noise.
The proposed regularization term can be expressed as: a k can be defined as: where max(r k ) represents the maximum value of r k . C is the confidence matrix composed of confidence weights, C = (β 1 , . . . , β k , . . . , β K ) T . For each LR image, the observation error is larger and the corresponding confidence weight should be smaller. So, the confidence weight of the kth LR image β k can be expressed as: where mean(r k ) represents the mean value of r k . c is a positive constant, σ t f is the estimate value of the scale parameter in each iteration t to discriminate inliers and outliers adaptively. c is set to 2 in this paper. We use the median absolute deviation (MAD) [28] to estimate σ t f , the formula can be expressed as: where we set σ 0 = 1.4826 for the Gaussian distribution.

Proposed Local Structure Tensor Weighted BTV Regularization Term
Due to the BTV regularization term, we cannot distinguish the edge information of the image well. We propose combining the local structure tensor information of the image with the image sparsity as image prior knowledge in order to simultaneously better preserve the edge information of the image and suppress the noise.
The proposed regularization term can be expressed as: The value of w T depends on the local structure tensor of the image. The local structure tensor can well describe the local structure information of the image, which can be expressed as: where I x , I y represents the gradient in horizontal and vertical directions, respectively. Because the local structure tensor T is a positive semi-definite matrix and has two nonnegative eigenvalues, λ 1 and λ 2 (λ 1 ≥ λ 2 ), the spatial structure information of the image can be divided into three cases: When λ 1 ≈ λ 2 ≈ 0, it means that the gray level change of the image along any direction is very small; it is a flat area. When λ 1 > λ 2 ≈ 0, it means that the gray change rate of the image along a certain direction at this point is larger; it is an edge region. When λ 1 ≥ λ 2 > 0, it means that the gray level of the image changes greatly in both vertical directions; the point is a corner.
In order to describe the local structure information of the image well, we construct a local structure tensor weight matrix w T, which is expressed as: where w is a fine tuning parameter; we set w = 0.5. As the image prior weight, ζ in Equation (20) related to X − S n x S m y X can be calculated adaptively as: where c is a positive constant, c is set to 2 in our paper. σ t p is obtained from the distribution and the image prior weight ζ t−1 , the formula similar to σ t f is as follows where σ 0 = 1 for the Laplacian distribution.

Image Reconstruction
In this paper, the formula of the reconstructed HR image can be written aŝ The regularization parameter λ is used to balance the fidelity term and the regularization term. We use cross validation, which is used by Köhler et al. [9] to determine the value of λ.
There are many optimization algorithms for solving the minimization problem for MFSR. We employ Scaled Conjugate Gradient (SCG) [29] to solve the problem in the Equation (25) in this paper because the convergence of SCG is fast and the SCG algorithm can adjust the step size adaptively. We set the iteration threshold to 10 −3 . In the process of optimization, the first-order derivative function of X can be calculated as: where S −n x , S −m y is the transpose matrix of S n x , S m y respectively. I is an identity matrix.

Experiments on Synthetic Data
Our proposed method was first measured on synthetic data quantitatively, because the ground truth HR images were available. We used peak-signal-to-noise ratio (PSNR), structural similarity (SSIM) and Information fidelity criterion (IFC) to assess image quality in this paper. We used common HR images to generate synthetic LR images by using the Set 14 dataset [30], shown in Figure 4. We created 30 LR images from one HR image by adding random motion, blur, down sampling and noise. The range of random translations was from −3 to +3 pixels. The range of random rotation angles was from −1 • to +1 • . Each LR image was blurred by a Gaussian PSF with σ = 0.4. We put in mixed noises by Gaussian noise and Poisson noise.

Experiments on Synthetic Data
Our proposed method was first measured on synthetic data quantitatively, because the ground truth HR images were available. We used peak-signal-to-noise ratio (PSNR), structural similarity (SSIM) and Information fidelity criterion (IFC) to assess image quality in this paper. We used common HR images to generate synthetic LR images by using the Set 14 dataset [30], shown in Figure 4. We created 30 LR images from one HR image by adding random motion, blur, down sampling and noise. The range of random translations was from −3 to +3 pixels. The range of random rotation angles was from −1° to +1°. Each LR image was blurred by a Gaussian PSF with = 0.4. We put in mixed noises by Gaussian noise and Poisson noise. We first selected 25 appropriate LR images by using our method mentioned in Section 3.1. The results of PSNR, SSIM and IFC of these eight synthetic LR images are presented in Tables 1-3, respectively. Figure 5 shows the results of the visual comparison of several algorithms for the Cameraman image. The results of the visual comparison between our algorithm and the ground truth for some images are shown in Figure 6. We first selected 25 appropriate LR images by using our method mentioned in Section 3.1. The results of PSNR, SSIM and IFC of these eight synthetic LR images are presented in Tables 1-3, respectively. Figure 5 shows the results of the visual comparison of several algorithms for the Cameraman image. The results of the visual comparison between our algorithm and the ground truth for some images are shown in Figure 6.        Tables 1-3 show that our proposed method in PSNR, SSIM and IFC is better th other algorithms. Our degradation model and the setting of the weight function are m reasonable. Figure 5 shows the results of the HR image estimated by these algorithms the Cameraman image. Our algorithm performs better with image details. Figure 6 sho the comparison results of our estimated HR image and the ground truth. The result of method is very close to the ground truth.

Experiments on Real Data
Except for synthetic data, we tested our proposed algorithm on real data which ca from the Multi-Dimensional Signal Processing Research Group (MDSP) [31]. In pract applications, the image assessment matrices such as PSNR, SSIM and IFC cannot be u Tables 1-3 show that our proposed method in PSNR, SSIM and IFC is better than other algorithms. Our degradation model and the setting of the weight function are more reasonable. Figure 5 shows the results of the HR image estimated by these algorithms for the Cameraman image. Our algorithm performs better with image details. Figure 6 shows the comparison results of our estimated HR image and the ground truth. The result of our method is very close to the ground truth.

Experiments on Real Data
Except for synthetic data, we tested our proposed algorithm on real data which came from the Multi-Dimensional Signal Processing Research Group (MDSP) [31]. In practical applications, the image assessment matrices such as PSNR, SSIM and IFC cannot be used to evaluate the quality of the real images in the absence of ground truth images. We only tested our algorithm by visual comparison of reconstructed results for real data. Figures 7 and 8 show the results of the visual comparison of EIA and Alpaca, respectively. Tables 1-3 show that our proposed method in PSNR, SSIM and IFC is better than other algorithms. Our degradation model and the setting of the weight function are more reasonable. Figure 5 shows the results of the HR image estimated by these algorithms for the Cameraman image. Our algorithm performs better with image details. Figure 6 shows the comparison results of our estimated HR image and the ground truth. The result of our method is very close to the ground truth.

Experiments on Real Data
Except for synthetic data, we tested our proposed algorithm on real data which came from the Multi-Dimensional Signal Processing Research Group (MDSP) [31]. In practical applications, the image assessment matrices such as PSNR, SSIM and IFC cannot be used to evaluate the quality of the real images in the absence of ground truth images. We only tested our algorithm by visual comparison of reconstructed results for real data. Figures  7 and 8    (g) (h) (i)  Figures 7 and 8 show that our proposed algorithm still has better results when the magnification factor is relatively large because we employ the coarse to fine strategy which gradually enlarges to the target size. In addition, our estimated HR image, which has no obvious artifacts or noise, can maintain the details of the image and suppress the noise simultaneously, mainly because we propose the local structure tensor weighted BTV regularization term which is universal for natural images.
The experimental results on synthetic data and real data show that our proposed method has a better effect compared with current methods; especially for preserving edge information, our method has better performance.

Conclusions
MFSR is a very challenging problem. In this paper, we propose a new robust MFSR algorithm to preserve richer image detail information while suppressing noise. We firs select the LR images with appropriate quality simply and quickly instead of all LR image as the input of MFSR, which reduces the computational cost of MFSR and makes it more practical. For the fidelity term, we analyze and use the half-quadratic function as the los function. We adjust the function parameter and confidence weights adaptively according to the observation errors. We gradually reduce the observation errors of the LR image  Figures 7 and 8 show that our proposed algorithm still has better results when the magnification factor is relatively large because we employ the coarse to fine strategy which gradually enlarges to the target size. In addition, our estimated HR image, which has no obvious artifacts or noise, can maintain the details of the image and suppress the noise simultaneously, mainly because we propose the local structure tensor weighted BTV regularization term which is universal for natural images.
The experimental results on synthetic data and real data show that our proposed method has a better effect compared with current methods; especially for preserving edge information, our method has better performance.

Conclusions
MFSR is a very challenging problem. In this paper, we propose a new robust MFSR algorithm to preserve richer image detail information while suppressing noise. We first select the LR images with appropriate quality simply and quickly instead of all LR images as the input of MFSR, which reduces the computational cost of MFSR and makes it more practical. For the fidelity term, we analyze and use the half-quadratic function as the loss function. We adjust the function parameter and confidence weights adaptively according to the observation errors. We gradually reduce the observation errors of the LR images and the estimated HR image through iteration. We better suppress the noise and preserve the image edge details simultaneously using the local structure tensor weighted BTV regularization term. The combination of the local structure tensor and the sparsity of the image, as the prior knowledge of the image, can reflect the structural characteristics of the natural images.
Our proposed method is a non-blind reconstruction, which assumes that the point spread function is known. The blur kernel is often unknown in practical applications. Our next task is to study the unknown blur in MFSR. One direction of our future work is to extend our method to blind MFSR.
Author Contributions: S.L. is the first author of this paper. Her main contributions include the basic idea and writing of this paper. M.W. is the corresponding author of this paper. His main contributions include analyzing the basic idea. The main contributions of Q.H. and X.L. include checking the experimental results. All authors have read and agreed to the published version of the manuscript.