Hyperspectral Image Super-Resolution via Nonlocal Low-Rank Tensor Approximation and Total Variation Regularization

Abstract: Hyperspectral images (HSIs) possess three intrinsic characteristics: global correlation across the spectral domain, nonlocal self-similarity across the spatial domain, and local smooth structure across both the spatial and spectral domains. This paper proposes a novel tensor-based approach to the problem of HSI spatial super-resolution by modeling these three underlying characteristics. Specifically, a nonconvex tensor penalty is used to exploit the first two characteristics, hidden in several 4D tensors formed by grouping nonlocal similar patches within the 3D HSI. In addition, the local smoothness in both the spatial and spectral modes of the HSI cube is characterized by a 3D total variation (TV) term. We then develop an effective algorithm for solving the resulting optimization problem by combining the local linear approximation (LLA) strategy with the alternating direction method of multipliers (ADMM). A series of experiments is carried out to illustrate the superiority of the proposed approach over several state-of-the-art approaches.


Introduction
Hyperspectral images (HSIs) record the reflectance of light from real-world scenes or objects in hundreds of spectral bands ranging from the ultraviolet to the infrared wavelengths [1,2]. The abundant spectral bands of HSIs can capture fine spectral differences between various materials of interest and enable many computer vision and image processing tasks to be accomplished more successfully. However, due to various factors, e.g., imaging sensor constraints and acquisition time constraints, the acquired HSIs are unfortunately of low spatial resolution, which cannot satisfy the high-precision processing requirements in many fields, including mineralogy, manufacturing, environmental monitoring, and surveillance. Therefore, enhancing the spatial resolution of HSIs is a valuable research issue that has received much attention in recent years.
Super-resolution is a widely used signal post-processing technique for image spatial resolution enhancement, which produces a high resolution (HR) image from an observed low resolution (LR) image or sequence that is noisy, blurred and downsampled [3,4]. Tsai and Huang [5] presented the first algorithm for reconstructing an HR image from a set of undersampled, aliased and noise-free LR images. Kim et al. [6] then extended this work to account for observation noise as well as the spatial blurring effect. Along this line, several frequency domain methods have been proposed; see, e.g., [7,8]. However, such methods cannot utilize prior information in the spatial domain. To overcome this weakness, many kinds of spatial domain approaches have been developed. For instance, Ur et al. [9] proposed a non-uniform interpolation that exploits structural information in the spatial domain, and Irani et al. [10] proposed an iterative back projection method, which can be viewed as solving a simple least squares problem.
In the last few years, incorporating the prior information of HR auxiliary images into the process of HSI super-resolution has attracted growing interest. In particular, multispectral and hyperspectral data fusion has become more and more popular; see, e.g., [11][12][13][14] and the references therein. Due to the limitations of remote sensing systems, however, such HR auxiliary images are often unavailable. Therefore, single HSI super-resolution remains of great interest in many real-world scenarios. In [2], Akgun et al. proposed a novel hyperspectral image acquisition model, based on which a projection-onto-convex-sets super-resolution method was developed to enhance the resolution of HSIs. Guo et al. [15] used unmixing information and total variation (TV) minimization to produce a higher resolution HSI. By modeling the sparse prior underlying HSIs, a sparse HSI super-resolution model was proposed in [16]. Zhang et al. [17] proposed a maximum a posteriori based HSI super-resolution reconstruction algorithm, in which PCA is employed to reduce the computational load and simultaneously remove noise. Huang et al. [18] presented a novel super-resolution approach for HSIs based on joint low-rank and group-sparse modeling; their approach can also deal with the situation where the system blurring is unknown. In a recent study, Li et al. [19] explored sparse properties in both the spectral and spatial domains for HSI super-resolution. Recently, sparse representation and compressed sensing have been successfully used in various real image super-resolution applications; see, e.g., [20,21]. Such methods can naturally be used to enhance the spatial resolution of a single HSI cube in a band-by-band manner.
In this paper, following the ideas of our preliminary work [22] and the work [23] on MRI super-resolution, we consider the HSI cube as a tensor with three modes, namely width, height, and band, and then exploit the underlying structures in both the spatial and spectral domains using direct tensor modeling techniques to achieve spatial resolution enhancement. Precisely, the spectral bands of an HSI commonly have strong global correlations, and, for each local fullband patch of an HSI (which is stacked from the patches at the same spatial location over all bands), there are many same-sized fullband patches similar to it; this spatial-and-spectral correlation is modeled by a nonconvex low-rank tensor penalty. In addition, the intensity of each voxel tends to be close to those of its neighbourhood from both the spatial and spectral viewpoints; this local spatial-and-spectral smoothness is exploited using a 3D TV penalty. As such, the HSI super-resolution task reduces to solving a nonconvex optimization problem, which can be effectively solved by combining the local linear approximation (LLA) strategy and the alternating direction method of multipliers (ADMM), as we shall show later.
This paper is organized as follows. Section 2 introduces some notations and preliminaries of tensors, which will be used in presenting our procedure. In Section 3, the proposed tensor model and its motivations are introduced. Then, an effective algorithm is developed for solving the resulting nonconvex optimization problem in Section 4. Extensive experiments on three real datasets are carried out in Section 5 to illustrate the superiority of our model over several state-of-the-art ones. Finally, we conclude this paper with some discussions on future research in Section 6.

Notation and Preliminaries
It is known that a tensor can be regarded as a multi-index numerical array, and its order is defined as the number of its modes or dimensions. A real-valued tensor of order K is denoted by X ∈ R^(I_1×I_2×...×I_K) and its entries by x_{i_1,i_2,...,i_K}. We can then consider an n × 1 vector x as a tensor of order one, and an n × m matrix X as a tensor of order two. Following [24], we shall provide a brief introduction to tensor algebra.
The inner product of two same-sized tensors X, Y is defined as ⟨X, Y⟩ = ∑_{i_1,i_2,...,i_K} x_{i_1,i_2,...,i_K} y_{i_1,i_2,...,i_K}. The corresponding Frobenius norm can then be defined as ||X||_F = √⟨X, X⟩. The so-called mode-n matricization or unfolding of the tensor X is denoted by X_(n), where the tensor element (i_1, i_2, ..., i_K) is mapped to the matrix element (i_n, j) with j = 1 + ∑_{k=1, k≠n}^{K} (i_k − 1) J_k, J_k = ∏_{m=1, m≠n}^{k−1} I_m. The operation "fold" is then defined as fold_n(X_(n)) := X. The mode-n multiplication of X with a matrix U is denoted by X ×_n U and, elementwise, we have (X ×_n U)_{i_1,...,i_{n−1},j,i_{n+1},...,i_K} = ∑_{i_n} x_{i_1,i_2,...,i_K} · u_{j,i_n}. The multi-linear rank is defined as the array (r_1, r_2, ..., r_K), where r_n = rank(X_(n)), n = 1, 2, ..., K.
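To make the tensor operations above concrete, the following minimal NumPy sketch implements the mode-n unfolding, its inverse folding, and the mode-n product (the function names are ours, chosen purely for illustration; the unfolding uses a particular index ordering, and other orderings are equally valid as long as unfold and fold are consistent):

```python
import numpy as np

def unfold(X, n):
    """Mode-n matricization: move mode n to the front and flatten the rest."""
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def fold(M, n, shape):
    """Inverse of unfold for a tensor of the given original shape."""
    full = [shape[n]] + [s for i, s in enumerate(shape) if i != n]
    return np.moveaxis(M.reshape(full), 0, n)

def mode_n_product(X, U, n):
    """Mode-n multiplication X x_n U, computed via the unfolding."""
    M = U @ unfold(X, n)
    shape = list(X.shape)
    shape[n] = U.shape[0]
    return fold(M, n, shape)
```

The multi-linear rank of a tensor can then be obtained by applying `np.linalg.matrix_rank` to each `unfold(X, n)`.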
Following [25], for a given matrix X, its folded-concave norm is defined as ||X||_{P_λ} := ∑_{j=1}^{r} P_λ(σ_j(X)), where σ_j(X) is the j-th singular value of X, r is its rank, and P_λ is the so-called folded-concave penalty function defined in [26]. In this work, we use one popular folded-concave penalty, i.e., the minimax concave plus (MCP) penalty, which has the form:

P_λ(t) = λ|t| − t²/(2a), if |t| ≤ aλ;  aλ²/2, if |t| > aλ.   (2)

Here, a > 1 is a fixed constant. Then, similar to the tensor nuclear norm defined in [27], we can define the tensor MCP penalty by applying the MCP penalty function to each matricization X_(i):

||X||_{P_λ} := ∑_{i=1}^{N} α_i ||X_(i)||_{P_λ},   (3)

where α_i ≥ 0 and ∑_{i=1}^{N} α_i = 1.
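As an illustration, the MCP penalty (2), the matrix folded-concave norm, and the tensor MCP penalty (3) can be sketched in a few lines of NumPy (a and the weights α_i are passed as arguments; the default a = 5 matches the value used in our experiments):

```python
import numpy as np

def mcp(t, lam, a=5.0):
    """Scalar MCP penalty P_lambda(t), applied elementwise."""
    t = np.abs(t)
    return np.where(t <= a * lam, lam * t - t**2 / (2 * a), a * lam**2 / 2)

def matrix_mcp_norm(X, lam, a=5.0):
    """Folded-concave norm of a matrix: sum of P_lambda over singular values."""
    s = np.linalg.svd(X, compute_uv=False)
    return float(mcp(s, lam, a).sum())

def tensor_mcp_penalty(T, lam, alphas, a=5.0):
    """Tensor MCP penalty: weighted sum of the MCP norms of all unfoldings."""
    return sum(al * matrix_mcp_norm(np.moveaxis(T, i, 0).reshape(T.shape[i], -1), lam, a)
               for i, al in enumerate(alphas))
```

Note that, unlike the nuclear norm, the MCP norm saturates at aλ²/2 per singular value, which is what keeps large singular values from being over-penalized.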

HSI Super-Resolution via Nonlocal Low-Rank Tensor Approximation and TV Regularization
In this section, we first present the observation model. Then, we use a 3D TV regularization to describe the local smoothness of an HSI, and adopt a patch-based tensor folded-concave penalty to characterize both the global correlation and the nonlocal self-similarity of the HSI. Finally, we propose a novel regularization model for dealing with the HSI super-resolution task. The detailed framework of our procedure is presented in Figure 1.

Observation Model
The low spatial resolution HSI is expressed by the following observation model:

Y = DSX + E,   (4)

where the tensor Y ∈ R^(w_l×h_l×B) denotes the observed low spatial resolution HSI, D is the downsampling operator, S is the blurring operator, X ∈ R^(w_h×h_h×B) is the HR image to be reconstructed, and E represents the observation noise, of the same size as Y, which can be modeled as additive zero-mean Gaussian noise with variance σ². It is easy to see that this is an ill-posed problem (w_l h_l < w_h h_h), and regularization terms on X based on its prior knowledge, denoted by R(X), can be introduced to make the problem well-posed, leading to:

min_X ||Y − DSX||_F² + λ R(X),   (5)

where λ is a regularization parameter that balances the fidelity term and the regularization term. The main objective is then to design suitable regularization terms to explore the underlying priors of X.
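A minimal simulation of the observation model (4) might look as follows, where, purely for illustration, S is taken to be a band-wise Gaussian blur and D a regular subsampling by an integer factor (the concrete operators of a given sensing system may differ, and all parameter values here are illustrative):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def observe(X, sigma_blur=1.0, factor=2, noise_std=0.0, rng=None):
    """Simulate Y = D S X + E: band-wise Gaussian blur (S), regular
    spatial downsampling (D), and additive Gaussian noise (E)."""
    rng = rng or np.random.default_rng(0)
    # Blur each band spatially; leave the spectral mode untouched.
    blurred = gaussian_filter(X, sigma=(sigma_blur, sigma_blur, 0))
    Y = blurred[::factor, ::factor, :]
    return Y + noise_std * rng.standard_normal(Y.shape)
```

With `factor=2`, a 256 × 256 × B cube is mapped to 128 × 128 × B, matching the downsampling used in our experiments.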

3D TV Regularization
Total variation (TV) has been widely used to exploit the spatially piecewise smooth structure in various HSI processing tasks [15,28,29]. It has the ability to preserve local spatial consistency and suppress observation noise. Treating the high spatial resolution HSI to be reconstructed as a 3rd-order tensor, its local spatial-and-spectral smoothness can be exploited by a weighted 3D TV term of the form:

TV_w(X) = ∑_{i,j,k} ( w_1 |x_{i+1,j,k} − x_{i,j,k}| + w_2 |x_{i,j+1,k} − x_{i,j,k}| + w_3 |x_{i,j,k+1} − x_{i,j,k}| ),   (6)

where x_{i,j,k} is the (i, j, k)-th element of X, and w_j (j = 1, 2, 3) is the weight along the j-th mode of X that controls its regularization strength. This weighted 3D TV models the piecewise smooth structure in both the spatial and spectral domains. For simplicity, we fix w_j (j = 1, 2) to 1, since the two spatial dimensions have a similar effect, and tune the spectral weight w_3 between 0 and 1. We find that the reconstruction performance is stable for 0.4 ≤ w_3 ≤ 0.8. We therefore fix w_3 to 0.6 in the experimental study presented in this paper.
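The anisotropic weighted 3D TV term (6) amounts to summing weighted absolute forward differences along the three modes; a direct NumPy sketch (with the default weights used in this paper):

```python
import numpy as np

def weighted_tv3d(X, w=(1.0, 1.0, 0.6)):
    """Anisotropic weighted 3D total variation of an HSI cube: sum of
    weighted absolute forward differences along width, height, and band."""
    return sum(wj * np.abs(np.diff(X, axis=j)).sum() for j, wj in enumerate(w))
```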

Nonlocal Low-Rank Tensor Approximation
Nonlocal self-similarity [30][31][32] is a useful patch-based prior, which means that, for a given local patch in an image, there are many patches similar to it. Motivated by [33], we separate X into a set of image patches Ω = {X_p ∈ R^(b×b×B)}_{p=1}^{P} (where b is the patch size and P is the number of overlapping 3D patches), and, by performing block matching [34], a group of patches most similar to each patch X_p can be extracted. By stacking all these patches together, we get a clustered 4th-order tensor T_k of size b × b × B × d, where d is the number of 3D patches in the k-th cluster. The global spectral correlation among all bands implies that T_k(3) is low-rank. In addition, the observation that the patches in each cluster possess very similar structures implies that T_k(4) is also low-rank. Combining these two points with the correlations in both spatial modes, we can measure the low-rank structure of each 4th-order tensor T_k by a weighted sum of the ranks of its unfoldings, that is,

rank_w(T_k) = ∑_{i=1}^{4} α_i rank(T_k(i)),   (7)

where α_i ≥ 0 and ∑_{i=1}^{4} α_i = 1. Since minimizing the rank function (7) is generally intractable, and the matrix nuclear norm is usually used as a tight convex surrogate of the matrix rank [35], the rank function (7) can be replaced with the tensor nuclear norm [27]:

||T_k||_* = ∑_{i=1}^{4} α_i ||T_k(i)||_*,   (8)

where ||Z||_* = ∑_r σ_r(Z) is the so-called nuclear norm of a matrix Z of size m × n. Although a convex tensor nuclear norm such as (8) can provide satisfactory results in various tensor recovery problems, studies like [35] demonstrated that the matrix nuclear norm over-penalizes large singular values and thus gives a biased estimator in low-rank structure learning. Fortunately, a folded-concave penalty [26] can be used to remedy such modeling bias [25,26]. Thus, we use the tensor MCP penalty defined by (3) to exploit the low-rank structure of T_k, namely,

||T_k||_{P_λ} = ∑_{i=1}^{4} α_i ||T_k(i)||_{P_λ},   (9)

where P_λ is the MCP penalty function defined in (2).
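The grouping step can be sketched as follows: a simplified, exhaustive block matching over non-overlapping fullband patches that stacks the d patches nearest (in Frobenius distance) to a reference patch into a 4th-order tensor T_k. A practical implementation would restrict the search to a local window and use overlapping patches; this sketch only illustrates the construction:

```python
import numpy as np

def block_match(X, ref_idx, b=4, d=8):
    """Group the d fullband patches most similar to the reference b x b x B
    patch at ref_idx into a 4th-order tensor of size b x b x B x d."""
    W, H, B = X.shape
    patches = []
    for i in range(0, W - b + 1, b):          # non-overlapping grid, for brevity
        for j in range(0, H - b + 1, b):
            patches.append(X[i:i+b, j:j+b, :])
    ref = X[ref_idx[0]:ref_idx[0]+b, ref_idx[1]:ref_idx[1]+b, :]
    dists = [np.linalg.norm(p - ref) for p in patches]
    order = np.argsort(dists)[:d]             # indices of the d nearest patches
    return np.stack([patches[k] for k in order], axis=3)
```

The low-rank measure (9) can then be evaluated on the returned tensor with the `tensor_mcp_penalty` style of computation over its four unfoldings.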
Thus far, we have only considered how to model the low-rank characteristic of one cluster T_k. To exploit the spatial-and-spectral correlation of the whole X, we use the following form:

R(X) = ∑_{k=1}^{N} ||T_k||_{P_λ},   (10)

where N is the number of clustered groups.

Proposed Model
With the above discussions, we now propose the following regularization model for the HSI super-resolution task:

min_X ||Y − DSX||_F² + λ_1 TV_w(X) + λ_2 ∑_{k=1}^{N} ||T_k||_{P_λ},   (11)

where the scalars λ_1 and λ_2 are regularization parameters. It is not hard to see that (11) is a nonconvex optimization problem due to the nonconvexity of the MCP penalty function; thus, we only aim to find a good local solution. Specifically, as shown in Figure 1, by choosing the directly upsampled version of the observed LR tensor Y as the initialization point, we estimate a good HR tensor X via the ADMM.

Optimization Procedure
The model (11) can be rewritten in the following equivalent form:

min_X ||Y − DSX||_F² + λ_1 ||G_w X||_1 + λ_2 ∑_{k=1}^{N} ||P_k X||_{P_λ},   (12)

where G_w = [w_1 G_h; w_2 G_v; w_3 G_t] is the so-called weighted three-dimensional difference operator, and G_h, G_v, G_t are the first-order difference operators with respect to the three directions of the HSI cube. Define the operator P_k that first extracts patches within the 3rd-order HSI cube and then applies nonlocal block matching on these 3rd-order patches to form the k-th 4th-order tensor, so that P_k X = T_k. Introducing the auxiliary variables {M_i}_{i=1}^{4} and F into (12) leads to

min ||Y − DSX||_F² + λ_1 ||F||_1 + λ_2 ∑_{i=1}^{4} α_i ∑_{k=1}^{N} ||M_{i,k}||_{P_λ}
s.t. F = G_w X, M_{i,k} = (P_k X)_(i), i = 1, ..., 4, k = 1, ..., N.   (13)

Based on the methodology of ADMM [36], the augmented Lagrangian function of the above problem is written as follows:

L = ||Y − DSX||_F² + λ_1 ||F||_1 + ⟨V, G_w X − F⟩ + (μ/2)||G_w X − F||_F² + λ_2 ∑_{i=1}^{4} α_i ∑_{k=1}^{N} ||M_{i,k}||_{P_λ} + ∑_{i,k} ⟨U_{i,k}, (P_k X)_(i) − M_{i,k}⟩ + (ν/2) ∑_{i,k} ||(P_k X)_(i) − M_{i,k}||_F²,   (14)

where {U_i}_{i=1}^{4} and V are the Lagrange multipliers, and μ, ν > 0 are penalty parameters. We transform (14) into three subproblems and iteratively update each variable with the others fixed. Let l denote the l-th iteration step; then, in the (l + 1)-th iteration, the variables involved in model (13) are updated as follows.

Update X. Extracting all terms related to X from (14), it can be easily deduced that:

X^(l+1) = argmin_X ||Y − DSX||_F² + (μ/2)||G_w X − F^(l) + V^(l)/μ||_F² + (ν/2) ∑_{i,k} ||(P_k X)_(i) − M_{i,k}^(l) + U_{i,k}^(l)/ν||_F²,   (15)

which is equivalent to solving the following linear equation:

(2 S^⊤D^⊤DS + μ G_w^⊤G_w + ν ∑_{i,k} P_{k,i}^⊤ P_{k,i}) X = 2 S^⊤D^⊤ Y + G_w^⊤ (μ F^(l) − V^(l)) + ∑_{i,k} P_{k,i}^⊤ (ν M_{i,k}^(l) − U_{i,k}^(l)),   (16)

where S^⊤ and D^⊤ are the adjoints of S and D, respectively, G_w^⊤ is the adjoint of G_w, and P_{k,i} X := (P_k X)_(i). This linear system can be efficiently solved with the well-known preconditioned conjugate gradient method.
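The X-subproblem (16) is a large, symmetric positive definite linear system, which conjugate gradients solves matrix-free. The SciPy sketch below assumes the caller supplies a function applying the full left-hand-side operator to a flattened tensor; assembling that operator from D, S, G_w and the P_k is problem-specific and not shown:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def solve_x_update(A_op, rhs, shape):
    """Solve the X-subproblem's linear system with conjugate gradients.
    A_op: function applying the SPD left-hand-side operator to a
    flattened tensor; rhs: the right-hand-side tensor."""
    n = int(np.prod(shape))
    A = LinearOperator((n, n), matvec=A_op, dtype=rhs.dtype)
    x, info = cg(A, rhs.ravel())        # info == 0 signals convergence
    return x.reshape(shape), info
```

A preconditioner (e.g. built from the diagonal of the operator) can be passed to `cg` via its `M` argument to accelerate convergence, as suggested by the preconditioned variant used in the paper.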
Update {M_i}_{i=1}^{4}. Similar to the works [22,25], we adopt the LLA algorithm to transform the MCP penalization problem into a weighted nuclear norm penalization problem. Specifically, we need to solve

min_{M_{i,k}} λ_2 ∑_{i=1}^{4} α_i ∑_{k=1}^{N} Q(M_{i,k} | M_{i,k}^(l)) + (ν/2) ∑_{i,k} ||(P_k X^(l+1))_(i) − M_{i,k} + U_{i,k}^(l)/ν||_F²,   (17)

where Q(X | X^(l)) = ||X^(l)||_{P_λ} + ∑_j P'_λ(σ_j(X^(l)))(σ_j(X) − σ_j(X^(l))) is the local linear approximation of ||X||_{P_λ} for a given X^(l). To solve (17), we can first consider the following problem for each k-th patch set:

min_{M_{i,k}} λ_2 α_i Q(M_{i,k} | M_{i,k}^(l)) + (ν/2)||(P_k X^(l+1))_(i) − M_{i,k} + U_{i,k}^(l)/ν||_F².   (18)

Then, according to [27], we can update each M_{i,k} separately. Thus, by applying Theorem 3.1 in [25], we get the following closed-form solution:

M_{i,k}^(l+1) = S_{λ_2 α_i W_i / ν}((P_k X^(l+1))_(i) + U_{i,k}^(l)/ν).   (19)

Here, the singular value shrinkage operator S_τ(X) is defined by S_τ(X) = U D_τ(Σ) V^⊤, where X = UΣV^⊤ is the singular value decomposition of the matrix X and, for a given matrix A, [D_τ(A)]_{ij} = sgn(A_{ij})(|A_{ij}| − τ)_+. Additionally, the weight matrix W_i is defined by W_i = diag(λ(1 − σ_j(M_{i,k}^(l))/(aλ))_+), i.e., the MCP derivative evaluated at the previous singular values, for some fixed a > 1. Then, as shown in Figure 1, the estimate of X at the patch level is obtained by aggregating all such updated patch sets.

Update F. Extracting all terms containing F from (14), we get

F^(l+1) = argmin_F λ_1 ||F||_1 + (μ/2)||G_w X^(l+1) − F + V^(l)/μ||_F².   (20)

Introducing the so-called soft-thresholding operator

T_Δ(x) = sgn(x) max(|x| − Δ, 0),   (21)

where x ∈ R and Δ > 0, we then update F^(l+1) as

F^(l+1) = T_{λ_1/μ}(G_w X^(l+1) + V^(l)/μ).   (22)

Update {U_i}_{i=1}^{4} and V. The multipliers are updated by

U_{i,k}^(l+1) = U_{i,k}^(l) + γν((P_k X^(l+1))_(i) − M_{i,k}^(l+1)),  V^(l+1) = V^(l) + γμ(G_w X^(l+1) − F^(l+1)),   (23)

where γ is a fixed parameter, namely 1.05, and the penalty parameters μ and ν follow an adaptive updating scheme, which facilitates the convergence of the proposed optimization procedure. Take μ as an example: we first initialize μ with a small value, i.e., μ = 10^(−3), and then update it by the scheme:

μ^(l+1) = c_1 μ^(l), if Res > c_2 · Res_pre;  μ^(l), otherwise,   (24)

where Res = ||Y − DSX^(l+1)||_F and Res_pre = ||Y − DSX^(l)||_F, and c_1 and c_2 can be taken as 1.15 and 0.95, respectively.
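The two shrinkage operators that drive the M- and F-updates, together with the LLA weights given by the MCP derivative P'_λ(t) = (λ − t/a)_+, can be sketched as follows (a minimal version; in the full algorithm the per-singular-value weights are scaled by λ_2 α_i / ν as in (19)):

```python
import numpy as np

def soft_threshold(x, delta):
    """Elementwise soft-thresholding operator T_delta."""
    return np.sign(x) * np.maximum(np.abs(x) - delta, 0.0)

def mcp_weight(s_prev, lam, a=5.0):
    """LLA weights: MCP derivative P'_lambda(t) = (lambda - t/a)_+ at the
    singular values of the previous iterate."""
    return np.maximum(lam - s_prev / a, 0.0)

def weighted_svt(X, weights_fn):
    """Weighted singular value shrinkage: shrink each singular value by a
    weight computed from the current singular values."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s_shrunk = np.maximum(s - weights_fn(s), 0.0)
    return U @ np.diag(s_shrunk) @ Vt
```

Because the MCP weight vanishes once a singular value exceeds aλ, large singular values pass through unshrunk, which is precisely the debiasing effect the folded-concave penalty is meant to provide.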

Experimental Study
Three popular real-world HSI data sets were used in our experiments, i.e., the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) Moffett Field data set [37], the Hyperspectral Digital Imagery Collection Experiment (HYDICE) Urban data set [38] and the HYDICE Washington DC Mall data set [39]. After removing seriously polluted bands and cropping the images of each data set, the HSI cubes used in the experiments are of size 256 × 256 × 146, 256 × 256 × 140 and 256 × 256 × 140, respectively. To thoroughly evaluate the performance of the proposed approach, we considered three popular super-resolution methods for comparison, namely the nonlocal autoregressive model (NARM) proposed by [21], the spatial-spectral group sparsity method (SSGS) proposed by [19], and the low-rank and total variation regularizations (LRTV) method proposed by [23]. We also considered the nearest neighbor interpolation (NN) method, which is used to obtain the upsampled HSI, for comparison. For brevity, our proposed approach is dubbed NLRTATV. In this set of experiments, the blurring kernel is chosen as the popular Gaussian kernel, and all the LR HSIs are obtained by downsampling the original HR HSIs with a factor of 2 or 3, i.e., the LR HSIs are of spatial size 128 × 128 or 85 × 85. In the following experiments, similar to [19], the gray values of each band of the HSI were normalized to [0, 255] to facilitate the numerical calculation, though this operation may change the relative spectral properties of the HSI bands. In addition, the parameters a and α_i in (2) and (3) were fixed to 5 and 1/4, respectively. The two regularization parameters λ_1 and λ_2 of model (12) should in principle be carefully tuned; based on the analysis presented in Section 5.3, we set λ_1 to 0.3 and λ_2 to 0.04. For the ease of interested readers to reproduce the results of NLRTATV, we have released the code at http://vision.sia.cn/our%20team/Hanzhi-homepage/vision-ZhiHan%28English%29.html.

Quantitative Comparison
Tables 1 and 2 give the super-resolution reconstruction results of all the compared methods under the noise-free setting and the noisy setting with σ = 5, respectively, in terms of three image quality measures: the mean peak signal-to-noise ratio (MPSNR) and the mean structural similarity (MSSIM), as used in [40], and the spectral angle mapper (SAM) [41]. The PSNR (in dB) [42] and SSIM [43] are two traditional image quality measures in image processing and computer vision, which evaluate the similarity between the target image and the reference image based on mean squared error and structural consistency, respectively. Note that, despite its popularity, PSNR may not be a very good image quality measure for HSI-based problems. SAM (in rad) calculates the average angle between the spectrum vectors of the target HSI and the reference one across all spatial positions. It is well known that the higher the value of PSNR or SSIM, the better the image quality; the lower the value of SAM, the smaller the spectral distortion.
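For reference, MPSNR and SAM can be computed as follows (a minimal sketch assuming bands scaled to [0, 255]; MSSIM requires a full SSIM implementation and is omitted here):

```python
import numpy as np

def mpsnr(X, Y, peak=255.0):
    """Mean PSNR (dB) over bands between a target cube X and reference Y."""
    psnrs = [10 * np.log10(peak**2 / np.mean((X[:, :, k] - Y[:, :, k])**2))
             for k in range(X.shape[2])]
    return float(np.mean(psnrs))

def sam(X, Y, eps=1e-12):
    """Spectral angle mapper (radians), averaged over all spatial positions."""
    a = X.reshape(-1, X.shape[2])
    b = Y.reshape(-1, Y.shape[2])
    cos = np.sum(a * b, axis=1) / (np.linalg.norm(a, axis=1)
                                   * np.linalg.norm(b, axis=1) + eps)
    return float(np.mean(np.arccos(np.clip(cos, -1.0, 1.0))))
```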
As shown in Tables 1 and 2, firstly, the proposed NLRTATV method significantly outperforms the other competing ones in terms of the aforementioned three quality measures. This clearly shows that finer priors underlying the HSI cube were exploited by NLRTATV. Secondly, LRTV shows superior performance over NN, SSGS and NARM because of its direct tensor sparse representation of the hyperspectral cube. Both of these findings demonstrate the power of tensor modeling techniques for tackling HSI super-resolution tasks. To provide a more detailed quantitative comparison, we also show in Figures 2 and 3 the PSNR and SSIM values of each band for all four cases listed in Tables 1 and 2, respectively. As can be seen, the tensor methods, including LRTV and the proposed NLRTATV, achieve much higher SSIM and PSNR values than the other methods for almost every band. In addition, NLRTATV provides significantly improved reconstruction results over LRTV, especially in the noise-free cases. All of these results show that the proposed NLRTATV can discover more intrinsic structures of HSIs than the other competing methods, and thus possesses better reconstruction ability.

Visual Quality Comparison
In terms of visual quality, one representative band of the recovered HSIs in three typical cases obtained by all of the compared methods is shown in Figures 4-6. It is evident that the images shown in panel (f) of Figures 4-6, recovered by our NLRTATV method, preserve clearer and sharper edges than the images shown in panels (b-e) of Figures 4-6. One can zoom in on each figure for more visual comparison details.

Analysis of the Parameters
In the proposed NLRTATV model, there are two regularization parameters λ_1 and λ_2 that need to be carefully identified. We therefore provide some experimental analysis to show how to choose them in practice. (a) and (b) in Figure 7 describe the relationship between MPSNR and the regularization parameters λ_1 and λ_2, respectively, when NLRTATV is performed on the DC Mall data under the noise-free setting, with the other parameters fixed at their optimal values. It can be observed that, as λ_1 and λ_2 vary over a wide range, the MPSNR value remains rather stable and relatively stable, respectively. We also performed NLRTATV on several other experimental cases and found behaviour similar to that shown in Figure 7. Therefore, λ_1 can be chosen from [0.1, 1] and λ_2 from [0.005, 0.05] in real implementations. For simplicity, in all the experiments conducted in this work, we set λ_1 and λ_2 to 0.3 and 0.04, respectively.
Since the nonlocal block matching operation is used in NLRTATV, the number of similar patches in each group is another important parameter that needs to be carefully considered. Figure 8 depicts the MPSNR and MSSIM gains versus the number of similar patches when NLRTATV is performed on the DC Mall data under the noise-free setting. We can observe that, as the number of similar patches increases beyond 10, the MPSNR and MSSIM curves both become stable. For simplicity, in all of the experiments, the number of similar patches was fixed to 30.

Conclusions
In this paper, we have proposed a novel method named NLRTATV for dealing with the problem of HSI super-resolution by using tensor structural modeling. The proposed method considers the global correlation across the spectral domain, the nonlocal self-similarity across the spatial domain, and the local smooth structure across both the spatial and spectral domains of the HSI cube by combining low-rank tensor and 3D total variation regularizations. Extensive experimental studies on three HSI datasets have demonstrated the superior performance of the proposed NLRTATV over several popular methods, which clearly shows the advantages of NLRTATV in exploring the intrinsic characteristics of the HSI cube.
Along this line, several desirable research directions can be pursued in future study. Firstly, although the proposed NLRTATV has only been tested in experiments with a known Gaussian blurring kernel, a similar idea to that used in [18] could be incorporated into our model to tackle HSI super-resolution tasks with unknown blurring kernels. Secondly, it could be interesting to extend the deep learning ideas of [44] and [45] to design a deep tensor architecture that learns more intrinsic characteristics of the HSI cube and, as a result, provides even better super-resolution reconstruction results. Thirdly, the problem of hyperspectral and multispectral data fusion has received much attention in recent years. It would also be interesting to extend the presented tensor modeling technique to such problems, which could yield more effective methods for enhancing the spatial resolution of hyperspectral data.

Table 1 .
Super-resolution reconstruction results by different methods under noise free setting.