TNNG: Total Nuclear Norms of Gradients for Hyperspectral Image Prior

Abstract: We introduce a novel regularization function for hyperspectral images (HSIs) based on the nuclear norms of gradient images. Unlike conventional low-rank priors, we achieve a gradient-based low-rank approximation by minimizing the sum of nuclear norms associated with rotated planes in the gradient of an HSI. Our method explicitly and simultaneously exploits the correlation in the spectral domain as well as the spatial domain, and it exploits the low-rankness of a global region to enhance the dimensionality reduction by the prior. Since our method considers the low-rankness in the gradient domain, it detects anomalous variations more sensitively. Our method achieves high-fidelity image recovery using a single regularization function without the explicit use of any sparsity-inducing priors such as the ℓ0, ℓ1, and total variation (TV) norms. We also apply this regularization to a gradient-based robust principal component analysis and show its superiority in HSI decomposition. To demonstrate its superior performance, the proposed regularization is validated on a variety of HSI reconstruction/decomposition problems with comparisons to state-of-the-art methods.

In this paper, we propose a regularization scheme that directly and simultaneously considers correlation in the spatio-spectral domains. We name the regularization function Total Nuclear Norms of Gradients (TNNG); it calculates nuclear norms for all planes in the gradient domain of an HSI to exploit correlations in all spatio-spectral directions and obtains a low-rank approximation of its structure. The function utilizes the a priori property that each gradient possesses strong inter-band correlation, and it directly controls each gradient, which expresses smoothness in the spatial direction and correlation in the spectral direction. We also apply the TNNG regularization scheme to a Gradient-based Robust Principal Component Analysis (GRPCA) and show its superior performance in image decomposition.

1. Exploiting the low-rankness of a global region (i.e., a high-dimensional matrix) allows us to enhance the dimensionality reduction by the prior.
2. Inherently, HSIs have little correlation in oblique directions within the xz and yz planes. We efficiently exploit this property through our regularization.
3. Since our method considers the low-rankness in the gradient domain, not the low-rankness in the image domain, it detects anomalous variations more sensitively.
Another notable property of our regularization is that it does not explicitly use any sparsity-inducing norms such as the ℓ0 or ℓ1 norms. By taking the nuclear norms on all the xz and yz planes, we implicitly reduce variations in the spatial domain as well. As a result, our simple regularization simultaneously exploits the low-rank property and sparsity in spatial variations within a single function. Due to this property, we can apply our regularization function to RPCA as well as to HSI restoration, and unlike conventional methods, our method is effective for various applications in image restoration and decomposition.
We solve the resulting convex optimization problems using primal-dual splitting (PDS) [25] and apply them to various hyperspectral image processing tasks. Experimentally, we address HSI restoration, RPCA-based image decomposition, and HSI pansharpening as example problems to confirm the performance of our method.
This paper is organized as follows. After reviewing conventional methods for HSI processing in Section 2, we describe the details of the algorithm and the solutions for the proposed optimization problem in Section 3. In Section 4, we present several experiments on image reconstruction/decomposition to illustrate the superiority of the proposed method. Section 6 presents a summary of this study and future prospects.

Related Work
Many restoration methods based on TV for multi-channel images have been proposed [17][18][19][20][21]. Yuan et al. [21] proposed a hyperspectral TV (HTV), which extends the TV model to HSIs. They also proposed a spectral-spatial adaptive HTV (SSAHTV), which introduces adaptive weights to HTV. SSAHTV uses spectral information for its weighting and shows superior performance over HTV. The structure tensor TV (STV) [17] was proposed for multi-channel images and considers the low-rankness of gradient vectors in local regions. Furthermore, its arranged version (ASTV) was proposed in [18] and outperforms STV in some image restoration problems. These methods show high performance in HSI restoration, but they only consider correlation in the spatial domain explicitly. Meanwhile, spectral information is used only indirectly in these regularization schemes; therefore, they are relatively prone to over-smoothing because of the substantial influence of the spatial properties. Although some of them consider spectral gradients, few methods exploit the low-rankness in the gradient domain through a single regularization function.
It is well known that there exist high correlations in HSIs along the spectral direction, especially in aerial images, as each spectral signature can be well represented by a linear combination of a small number of pure spectral endmembers, which induces the low-rank property. Low-rank matrix recovery (LRMR) [23], which utilizes the low-rankness of local blocks in an HSI, has been shown to outperform SSAHTV and even VBM3D [26], one of the best non-local noise-reduction methods. Furthermore, the spatio-spectral TV (SSTV) was proposed in [27]; it applies TV to the HSI gradient in the spectral direction and thus indirectly considers spatial correlation through the spectral gradient. Although it is a simple method, SSTV is confirmed to be superior to LRMR. In contrast to STV and ASTV, noise tends to remain in the restoration results of LRMR and SSTV because of the weak influence of the spatial properties. Methods such as LRTV [22], HSSTV [28], and LRTDTV [29] attempt to improve on STV, SSTV, and their variants by introducing new regularization; however, issues such as over-smoothing remain to be addressed.
Recent DNN-based approaches have been successful for 2D image restoration tasks such as super-resolution [30,31]. Although there are a few successful methods for HSI such as [32][33][34][35][36], the superiority of DNN for HSI restoration is limited. This is due to the following reasons: (1) Acquiring high-quality large HSI data sets is not straightforward.
(2) DNN-based methods are sensitive to attributes such as the number of spectral bands, the spectral range of wavelengths, and the type of sensor, and such a domain mismatch in attributes between training and test data often degrades performance. On the other hand, regularization-based approaches like ours are more flexible and can be applied to many HSI processing tasks, such as restoration, unmixing, and classification.

Low-Rankness of Spectral Gradients
Letting the spatial coordinates be x, y, and the spectral coordinate z, we define "front" as the direction in which we view the xy planes along the z axis. The "top" and "side" are defined as the directions in which we view the xz and yz planes, respectively. Here, we define x ∈ R^{n_v×n_h×B} to be an HSI with an image size of n_v × n_h and number of bands B. Furthermore, we define G_v ∈ R^{n_h×B×3n_v} to be a combined 3D gradient image, where the three gradient images of x (w.r.t. the vertical, horizontal, and spectral directions), observed from the top view, are concatenated to form a 3D cuboid. Similarly, we define G_h ∈ R^{n_v×B×3n_h} to be the 3D cuboid of the gradient images observed from the side view. We explain the detailed formulation in Section 3.2.
We represent the sum of ranks for all single-structure planes in G_v and G_h as follows: where G_v^{(i)} ∈ R^{n_h×B} represents the i-th plane extracted from G_v, and similarly G_h^{(i)} ∈ R^{n_v×B} is the i-th plane extracted from G_h. Given that G_v, which is observed from the top view, possesses planes of size n_h × B and depth 3n_v, the total number of planes G_v^{(i)} is 3n_v; likewise, the total number of planes G_h^{(i)} is 3n_h. Since (1) uses the rank, which takes discrete values and is a non-convex function, the minimization problem with (1) is NP-hard. Therefore, we incorporate the nuclear norm (‖·‖_* = ∑_k σ_k), which is the optimal convex relaxation of the rank, and substitute it for (1). We refer to this regularization as the Total Nuclear Norms of Gradients (TNNG). TNNG is used as regularization to approximate G_v and G_h through a low-rank model.
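As a concrete illustration, the sum of nuclear norms over all planes of the gradient cuboids can be sketched in a few lines of NumPy (we use Python here rather than the Matlab implementation used in the paper; the forward differences with periodic boundaries and the plane-extraction convention are our own simplifying assumptions):

```python
import numpy as np

def nuclear_norm(M):
    """Sum of singular values of a 2-D matrix."""
    return np.linalg.svd(M, compute_uv=False).sum()

def tnng(x):
    """Sum of nuclear norms over all 'top' and 'side' planes of the
    spatio-spectral gradients of an HSI x (shape: n_v x n_h x B).

    Forward differences with periodic boundaries stand in for the
    discretization, which the text does not fully specify."""
    grads = [np.roll(x, -1, axis=a) - x for a in range(3)]  # d/dv, d/dh, d/dz
    total = 0.0
    for g in grads:
        # planes seen from the top view: fix the vertical index -> (n_h x B) matrices
        total += sum(nuclear_norm(g[i, :, :]) for i in range(x.shape[0]))
        # planes seen from the side view: fix the horizontal index -> (n_v x B) matrices
        total += sum(nuclear_norm(g[:, j, :]) for j in range(x.shape[1]))
    return total
```

A spatially and spectrally constant HSI has zero gradients everywhere, so its TNNG value is zero, matching the intuition that the prior only penalizes variation.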
When singular value decomposition is performed on a plane G_•^{(i)}, one can observe that G_•^{(i)} approximately forms a low-rank structure. In the singular values S_h^{(i)} of the noisy HSI, the energy is dispersed, which violates the low-rank structure, while the energy is concentrated in a small portion of S_h^{(i)} in the restored HSI. Therefore, one can see that the restored image has a low-rank structure and more closely approximates the original HSI. As this property can be seen in most images, it is expected that the low-rank approximation in the gradient domain using TNNG will work for restoration.

TNNG Regularization
In this section, we design G_v and G_h and specifically formulate TNNG. We redefine an HSI with an image size of N = n_v · n_h and number of bands B as a vector x = [x_1^T, ..., x_B^T]^T ∈ R^{NB}. A single plane x_j ∈ R^N (j = 1, ..., B) represents the image of the j-th band of x, and x_j^T represents the transpose of x_j.
We define the linear operator D to obtain the spatio-spectral gradients. Let the vertical and horizontal difference operators be D_v, D_h ∈ R^{NB×NB}, and let the spectral difference operator be D_s ∈ R^{NB×NB}, as: where I ∈ R^{N×N} is an identity matrix and O ∈ R^{N×N} is a zero matrix. Using these single-direction operators, we define the spatio-spectral difference operator D ∈ R^{3NB×NB} as follows: Then, we calculate the spatio-spectral gradients of the HSI, Dx ∈ R^{3NB}. This procedure is illustrated in the left part of Figure 2. D_v x, D_h x, and D_s x ∈ R^{NB} represent the vectorized versions of the three 3D components (vertical, horizontal, and spectral, respectively) of the gradient image. Next, we introduce two operators P_v, P_h ∈ R^{3NB×3NB}, which rearrange pixels so that the top and side planes of Dx are located in the front: where P_v is a matrix that rotates each of D_v x, D_h x, and D_s x such that the yz, xy, and zx planes face in the directions of "front", "side", and "top", respectively, and similarly P_h rotates the cuboids such that the zx, xy, and yz planes face in the directions of "front", "side", and "top", respectively (see the right part of Figure 2). In addition, we use the operator Γ, which converts a vector into a 3D cuboid, to rewrite the vectors P_v Dx and P_h Dx ∈ R^{3NB} as follows: The right side of Figure 2 shows a diagram of G_v and G_h designed using D, P_v, and P_h. Rewriting (2) using D, P_v, and P_h defined by (4) and (5), TNNG in (2) is expressed as follows: where ‖Γ(P_v Dx)^{(i)}‖_* represents the nuclear norm of the i-th plane in the 3D cuboid Γ(P_v Dx).
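For small problem sizes, the stacked difference operator D can be assembled explicitly with Kronecker products. The following NumPy sketch assumes periodic boundary handling and band-major, column-major vectorization, neither of which is fully specified in the text:

```python
import numpy as np

def diff_matrix(n):
    """Periodic forward-difference matrix of size n x n (an assumption;
    the paper's boundary handling is not specified)."""
    return np.roll(np.eye(n), -1, axis=0) - np.eye(n)

def spatio_spectral_D(nv, nh, B):
    """Stack the vertical, horizontal, and spectral difference operators
    into D in R^{3NB x NB}, acting on a vectorized HSI x in R^{NB}
    (band-major ordering x = [x_1; ...; x_B], each band column-major)."""
    N = nv * nh
    Dv = np.kron(np.eye(B), np.kron(np.eye(nh), diff_matrix(nv)))  # vertical
    Dh = np.kron(np.eye(B), np.kron(diff_matrix(nh), np.eye(nv)))  # horizontal
    Ds = np.kron(diff_matrix(B), np.eye(N))                        # spectral
    return np.vstack([Dv, Dh, Ds])
```

In practice the dense matrix is only useful for toy sizes; for real HSIs, D would be applied matrix-free or as a sparse operator.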

HSI Restoration Model
In this section, we formulate the image restoration problem. The HSI observation model is represented by the following equation using an original image x ∈ R^{NB} and an observed image y ∈ R^K: where A ∈ R^{K×NB} (K ≤ NB) is the linear operator that represents degradation (for example, random sampling in the case of compressed sensing [37,38]), and n ∈ R^K represents additive noise. Based on the observation model, the convex optimization problem for HSI restoration using TNNG can be formulated as follows: Here, λ is the weight of TNNG, C = [ρ_min, ρ_max]^{NB} represents the box constraint of the pixel values as a closed convex set, and ρ_min and ρ_max represent the minimum and maximum pixel values taken by x, respectively. In addition, F_y is a data-fidelity function, and a suitable F_y must be selected depending on the distribution of the observed noise. We assume that the additive noise follows a Gaussian distribution, and thus we use the ℓ2-norm to express the convex optimization problem of Problem (9) as follows:

Gradient-Based Robust Principal Component Analysis
We apply TNNG to 3D Robust Principal Component Analysis (RPCA). RPCA [39] has been used in many computer vision and image processing applications. The goal of RPCA in the case of 2D images is to decompose an image matrix Y ∈ R^{M×N} into a low-rank component L ∈ R^{M×N} and a sparse residual S ∈ R^{M×N} by solving the following problem: min_{L,S} ‖L‖_* + λ‖S‖_1 s.t. Y = L + S. We extend this to HSI decomposition by applying TNNG as follows: By solving (12), we achieve the decomposition of the observed measurement y ∈ R^{NB} as y = x + s, where x ∈ R^{NB} is low-rank in the gradient domain and s ∈ R^{NB} is the sparse component. The two constraints in (12) guarantee exact reconstruction and a reasonable range of pixel values. Unlike the existing 3D RPCA methods [40,41], we analyze the low-rankness of an HSI in the gradient domain. As it is gradient-based, it also has the capability of excluding sparse anomalies more sensitively, so the performance of the decomposition is expected to improve. We call this Gradient-based RPCA (GRPCA). This problem is also a convex optimization, and thus a globally optimal solution is efficiently obtained by convex optimization algorithms.
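For reference, the classic 2D RPCA baseline that GRPCA generalizes can be solved by the well-known inexact augmented Lagrangian scheme. This sketch is the standard principal component pursuit, not the proposed gradient-domain model, and the parameter choices (the μ schedule and λ = 1/√max(M,N)) are conventional defaults rather than values from the paper:

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: prox of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def soft(M, tau):
    """Soft-thresholding: prox of tau * l1 norm."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def rpca(Y, lam=None, iters=100):
    """Classic 2-D RPCA (principal component pursuit) via a simple
    inexact augmented Lagrangian loop."""
    if lam is None:
        lam = 1.0 / np.sqrt(max(Y.shape))
    mu = 1.25 / np.linalg.norm(Y, 2)      # conventional initialization
    S = np.zeros_like(Y)
    Z = np.zeros_like(Y)                  # Lagrange multiplier
    for _ in range(iters):
        L = svt(Y - S + Z / mu, 1.0 / mu)
        S = soft(Y - L + Z / mu, lam / mu)
        Z = Z + mu * (Y - L - S)
        mu = min(mu * 1.5, 1e7)           # increase penalty each step
    return L, S
```

On a synthetic low-rank-plus-sparse matrix, this loop typically separates the two components to high accuracy within a few dozen iterations.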

HSI Pansharpening Model
Pansharpening is a technique for generating an HSI with high spatial and spectral resolution by combining a panchromatic image (pan image) with high spatial resolution and an HSI with high spectral resolution. It is important in the field of remote sensing and earth observation.
Let x ∈ R^{NB} be a true HSI with high spatial resolution, y_hs ∈ R^{N′B} (N′ < N) be an observed HSI with low spatial resolution, and y_pan ∈ R^N be an observed gray-scale pan image. The observation model of the HSI for pansharpening can be formulated as follows: where B ∈ R^{NB×NB} and S ∈ R^{N′B×NB} represent the lowpass operator and sub-sampling operator, respectively. The matrix R ∈ R^{N×NB} represents an averaging operator; in our setting, we simply average all bands to make a gray-scale image. n_hs ∈ R^{N′B} and n_pan ∈ R^N represent additive noise. The goal of the pansharpening problem is to recover the full spatial- and spectral-resolution image x from the two observed images y_hs and y_pan. Using the proposed regularization defined by (7), we adopt the following convex optimization problem to estimate x:
where ε_hs and ε_pan are user-defined parameters that control the fidelity of x to y_hs and y_pan, respectively.

Optimization
In this study, we adopt primal-dual splitting (PDS) [25] to solve the convex optimization problems in (10), (12), and (14). PDS is an iterative algorithm to solve a convex optimization problem of the form min_x F(x) + G(x) + H(Lx), where F represents a differentiable convex function whose gradient is β-Lipschitz continuous, G and H represent non-smooth convex functions whose proximity operators can be efficiently calculated, and L is a linear operator. The proximity operator [42,43] is defined, for a lower semi-continuous convex function ψ and an index γ > 0, as prox_{γψ}(x) = argmin_y ψ(y) + (1/(2γ))‖y − x‖^2. The PDS algorithm to solve Problem (15) is given as follows [25,44,45]: x^{(k+1)} = prox_{γ1 G}(x^{(k)} − γ1(∇F(x^{(k)}) + L^T z^{(k)})), v^{(k)} = z^{(k)} + γ2 L(2x^{(k+1)} − x^{(k)}), z^{(k+1)} = v^{(k)} − γ2 prox_{(1/γ2)H}(v^{(k)}/γ2), where ∇F is the gradient of F, x is the primal variable, z is the dual variable of Lx, v is the mediator variable used to calculate z, and the superscript (k) indicates the k-th iteration.
The algorithm converges to an optimal solution of Problem (15) by iteratively solving (17) under appropriate conditions on γ 1 and γ 2 .
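To make the F/G/H/L split concrete, here is a minimal NumPy sketch of a Condat-Vu-type PDS iteration applied to a 1D toy analogue of Problem (10): quadratic fidelity, a box constraint, and an ℓ1 penalty on forward differences standing in for TNNG. The toy problem and the step sizes are our own choices, not the paper's:

```python
import numpy as np

def pds_tv_denoise(y, lam=0.5, iters=500):
    """PDS for min 0.5*||x - y||^2 + lam*||Dx||_1  s.t. x in [0,1]^n,
    mirroring the F (fidelity) / G (box) / H (l1) / L (= D) split.
    Step sizes satisfy the usual condition 1/gamma1 - gamma2*||L||^2 >= beta/2
    (here beta = 1 and ||D||^2 <= 4)."""
    n = len(y)
    D = np.eye(n, k=1)[: n - 1] - np.eye(n)[: n - 1]   # forward differences
    gamma1, gamma2 = 0.25, 0.5
    x = y.copy()
    z = np.zeros(n - 1)                                # dual variable of Dx
    for _ in range(iters):
        grad_F = x - y                                 # gradient of 0.5*||x - y||^2
        x_new = np.clip(x - gamma1 * (grad_F + D.T @ z), 0.0, 1.0)  # box prox
        v = z + gamma2 * D @ (2 * x_new - x)           # mediator variable
        # Moreau decomposition: the dual prox of lam*||.||_1 is clipping to [-lam, lam]
        z = np.clip(v, -lam, lam)
        x = x_new
    return x
```

Denoising a noisy piecewise-constant signal with this routine reduces the error relative to the noisy input while keeping the output inside the box.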
In this paper, we explain only the procedure to solve Problem (10), but Problems (12) and (14) can be solved in a similar way; the respective algorithms are shown in Algorithm 1 and Algorithm 2. To apply PDS to Problem (10), we define the indicator function ι_C of a closed convex set C as follows: ι_C(x) = 0 if x ∈ C, and ι_C(x) = ∞ otherwise. We use this indicator function to turn Problem (10) into a convex optimization problem with no apparent constraints as follows: Problem (19) is formed by convex functions allowing efficient calculation of the proximity operator. To represent (19) in the same format as Problem (15), to which PDS can be applied, we separate the terms of Problem (19) into F, G, and H as follows: The function F : R^{NB} → R is formed only by the differentiable ℓ2-norm term, and thus its gradient ∇F is obtained by ∇F(x) = A^T(Ax − y). The function G : R^{NB} → R ∪ {∞} becomes the indicator function ι_C. The prox_{γ1 G} of (17) becomes prox_{γ1 ι_C} : R^{NB} → R^{NB}, the projection onto the box constraint. Setting prox_{γ1 ι_C}(x) = P_C(x), the projection P_C(x) is given, for i = 1, . . . , NB, as follows: [P_C(x)]_i = min(max(x_i, ρ_min), ρ_max). We perform this operation separately for each element x_i of x. The function H : R^{6NB} → R corresponds to the regularization function, and the dual variable z and the linear operator L are given as follows: z = [z_1^T, z_2^T]^T and L = [(P_v D)^T, (P_h D)^T]^T. Here, z_1 and z_2 are dual variables corresponding to P_v Dx and P_h Dx, respectively. Given that TNNG uses nuclear norms, prox_{(1/γ2)H} performs soft-thresholding with respect to the singular values of each matrix G_•^{(i)} in (7). Let G_•^{(i)} = U diag(σ_1, . . . , σ_M) V^T be the singular value decomposition of G_•^{(i)}. The proximity operator prox_{(1/γ2)‖·‖_*} is then given as follows: prox_{γ‖·‖_*}(G_•^{(i)}) = U diag(max(σ_1 − γ, 0), . . . , max(σ_M − γ, 0)) V^T, where diag(·) represents a diagonal matrix with diagonal elements (·), and M is the maximum number of singular values.
Based on the above discussion, we can solve the convex optimization problem of Problem (10) using PDS, whose steps are shown in Algorithm 3. The calculation of the projection P_C (step 4) is indicated in (22), and the proximity operator of the nuclear norms (steps 7, 8) is given by (24). We set the stopping criterion to ‖x^{(k+1)} − x^{(k)}‖ / ‖x^{(k+1)}‖ < 10^{-5}.
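In NumPy terms, steps 7 and 8 of Algorithm 3, i.e., the per-plane singular-value soft-thresholding of (24), could look like the following sketch (the frontal-plane indexing is an assumption about the cuboid layout):

```python
import numpy as np

def prox_tnng_planes(G, gamma):
    """Apply singular-value soft-thresholding to every frontal plane of
    a 3-D cuboid G: each plane's singular values are shrunk by gamma
    and truncated at zero, as in (24)."""
    out = np.empty_like(G)
    for i in range(G.shape[2]):
        U, s, Vt = np.linalg.svd(G[:, :, i], full_matrices=False)
        out[:, :, i] = U @ np.diag(np.maximum(s - gamma, 0.0)) @ Vt
    return out
```

With γ = 0 this is the identity, and for very large γ every plane collapses to zero; in between, the nuclear norm of each plane can only decrease, which is exactly the low-rank-promoting behavior TNNG relies on.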
We turn Problem (12) into a convex optimization problem with no apparent constraints as follows: Problem (25) is formed by convex functions allowing efficient calculation of the proximity operator. To represent (25) in the same format as Problem (15), to which PDS can be applied, we separate the terms of Problem (25) into F, G, and H as follows: Step 12 of Algorithm 1 is the proximity operator of the ℓ1-norm, prox_{(1/γ2)‖·‖_1}, which is the element-wise soft-thresholding [prox_{γ‖·‖_1}(x)]_i = sgn(x_i) max(|x_i| − γ, 0) for i = 1, . . . , NB, where sgn(·) denotes the sign of (·). The proximity operator of the indicator function of {y}, prox_{γ ι_{y}}(·), is the constant y.

We turn Problem (14) into a convex optimization problem with no apparent constraints as follows: Problem (28) is formed by convex functions allowing efficient calculation of the proximity operator. To represent (28) in the same format as Problem (15), to which PDS can be applied, we separate the terms of Problem (28) into F, G, and H as follows: Steps 12 and 13 of Algorithm 2 are updated using the following ℓ2-ball projection: P_{B(c,ε)}(v) = v if ‖v − c‖_2 ≤ ε, and P_{B(c,ε)}(v) = c + ε (v − c)/‖v − c‖_2 otherwise, where ε is the radius of the ℓ2-norm ball.
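The ℓ2-ball projection used in steps 12 and 13 of Algorithm 2 is a one-liner; this sketch assumes a ball centered at the observed data with radius ε:

```python
import numpy as np

def project_l2_ball(v, center, eps):
    """Projection onto the l2 ball {x : ||x - center||_2 <= eps}, used to
    enforce the data-fidelity constraints in the pansharpening problem."""
    d = v - center
    n = np.linalg.norm(d)
    return v if n <= eps else center + (eps / n) * d
```

Points inside the ball are returned unchanged; points outside are pulled radially back onto the sphere of radius ε.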

Results
To show the effectiveness of the proposed method, we conducted several experiments on compressed sensing (CS), signal decomposition, and pansharpening. The results were compared with those of several conventional methods.
We used PSNR and the Structural Similarity Index (SSIM) [46] as objective measures for CS and signal decomposition, and SAM [47], ERGAS [48], and Q2n [49] for pansharpening. The evaluation measures SAM, ERGAS, and Q2n are explained below, where y ∈ R^{NB} is a ground-truth image and x ∈ R^{NB} is a restored image.
• ERGAS is defined as follows: where d is the resolution ratio between the low-spatial-resolution HSI and the gray-scale pan image, and μ_k is the sample mean of the k-th band of y.
• Q2n is defined as follows: where x̄ = (1/P) ∑_{j=1}^{P} x_j and ȳ = (1/P) ∑_{j=1}^{P} y_j. For PSNR, SSIM, and Q2n, higher values indicate better performance, and for SAM and ERGAS, lower values indicate better performance. We used 17 HSIs available online, e.g., [50,51], which have 102 to 198 spectral bands. The dynamic ranges of these HSIs vary widely; for consistent evaluation on images with various ranges, we linearly scale the images to the range [0, 1] and then apply the standard metrics. For CS and signal decomposition, we experimentally adjusted the weight λ and adopted the value that gave the highest PSNR. For pansharpening, ε_hs was calculated from the noise level of n_hs and y_hs, and ε_pan was calculated from the noise level of n_pan and y_pan. For the conventional methods, we experimentally adopted the parameters that gave the highest PSNR.
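For reproducibility, SAM and ERGAS can be computed directly from their definitions. The sketch below follows the common conventions (mean spectral angle in radians; ERGAS with a 100·d prefactor, noting that the placement of the ratio d varies between papers):

```python
import numpy as np

def sam(x, y, eps=1e-12):
    """Mean spectral angle (radians) between per-pixel spectra of the
    restored image x and the reference y, both shaped (n_pixels, bands)."""
    num = (x * y).sum(axis=1)
    den = np.linalg.norm(x, axis=1) * np.linalg.norm(y, axis=1) + eps
    return np.mean(np.arccos(np.clip(num / den, -1.0, 1.0)))

def ergas(x, y, d):
    """ERGAS between restored x and reference y, shaped (n_pixels, bands);
    d is the resolution ratio. Prefactor convention is an assumption."""
    rmse = np.sqrt(((x - y) ** 2).mean(axis=0))  # per-band RMSE
    mu = y.mean(axis=0)                          # per-band sample mean
    return 100.0 * d * np.sqrt(np.mean((rmse / mu) ** 2))
```

Both metrics vanish when the restored image equals the reference, and SAM reaches π/2 for orthogonal spectra.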

Compressed Sensing
We conducted experiments on compressed sensing, in which the observation model is represented by y = Ax + n, and the linear operator A ∈ R^{rNB×NB} denotes a known random sampling matrix (r is the ratio of sampling points). The convex optimization problem for the compressed sensing reconstruction using a regularization function R is as follows: We compared the performances by replacing the regularization function. The additive noise n is fixed to Gaussian noise with standard deviation σ = 0.1, and the sampling ratio is set to r = 10, 20, and 30%. Table 1 shows the numerical results of the compressed sensing reconstruction. As LRTDTV is not designed for compressed sensing and yielded inferior results in our implementation, we did not include it in this experiment. The proposed method achieved 0.9 to 2.2 dB higher PSNR on average and the best SSIM scores for most of the images. Moreover, we confirmed the strong superiority of the proposed method for every sampling ratio r. Figure 3 shows the experimental results for the sampling ratio r = 20%. The PSNR [dB] values are 31.02 for (c), 32.75 for (d), 35.02 for (e), and 35.86 for (f). From this figure, we can observe increased blurring in the ASTV and HSSTV results, introduced to compensate for the resolution loss and noise caused by the reconstruction. In the experiments on ASTV and HSSTV, the blurring was reduced by decreasing the weight λ, but we confirmed that the resulting insufficient HSI reconstruction led to lower PSNR and SSIM. For SSTV, noise remained in the restored results; SSTV has weak smoothing effects in the spatial direction, so the noise remains regardless of the weight λ. The proposed method produces better restoration results than the conventional methods because the detailed areas are maintained while the noise and loss caused by the reconstruction are reduced.
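The random sampling operator A and its adjoint can be represented simply by index selection and scattering. A sketch follows (the paper does not specify the exact sampling scheme, so uniform sampling without replacement is our assumption):

```python
import numpy as np

def random_sampling(x, r, rng):
    """Observation y = A x: keep a random fraction r of the entries of
    the vectorized HSI x, mimicking the CS sampling operator A."""
    K = int(round(r * x.size))
    idx = rng.choice(x.size, size=K, replace=False)
    return x.ravel()[idx], idx

def adjoint(y, idx, n):
    """Adjoint A^T: scatter the K observed values back into R^n."""
    out = np.zeros(n)
    out[idx] = y
    return out
```

A quick sanity check is the adjoint identity <Ax, w> = <x, A^T w>, which any correct operator/adjoint pair must satisfy and which PDS implicitly relies on.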

Image Decomposition
To evaluate our GRPCA, we first applied it to a standard sparse component decomposition. Following conventional methods such as [41], we randomly selected 30% of the pixels in an HSI and replaced them with random values drawn from a uniform distribution, which is a typical benchmark for measuring RPCA performance. The goal is to decompose the corrupted image into a latent low-rank HSI and the random values. We evaluated the performance with PSNR and SSIM between the ground-truth HSI and the obtained component x in (12) and compared our method with the low-rank-based decomposition methods TRPCA [41] and LRTDTV [29].
We show the results in Table 2, where one can confirm that our method outperforms the conventional methods for most of the HSIs. This result indicates that the proposed regularization function evaluates the low-rank nature of the image more accurately than the conventional methods.

Mixed Anomaly Removal
We tested the decomposition capability for images with mixed anomalies composed of dead lines/circles and impulsive noise, comparing our performance with TRPCA and LRTDTV. We conducted the removal of an anomaly pattern consisting of sparsely scattered impulsive noise and dead lines/circles, where we prepare a mask for the dead lines and circles and generate images with dead pixels by multiplying the mask with an image (see Figure 4). We assume that the anomalies appear sparsely so that they can be removed from the low-rank component x in (12). We show our decomposition results in Figure 5 and the objective comparison in Table 3, where one can confirm that our method outperforms the conventional methods for most of the HSIs. Under mixed anomalies, we experimentally confirmed that TNNG maintains stable performance compared with the conventional methods, whereas the performance of LRTDTV and TRPCA depends on the experimental conditions and types of images. This indicates that, owing to the evaluation of low-rankness, the image can be restored using the spectral information of other pixels even when there are defects in the gradient domain.

Pansharpening
We compared the proposed method defined by (14) with four conventional pansharpening methods: GFPCA [52], BayesNaive and BayesSparse [53], and HySure [54]. We conducted experiments on three noise levels, Type I (σ_hs = 0.05, σ_pan = 0.01), Type II (σ_hs = 0.1, σ_pan = 0.01), and Type III (σ_hs = 0.15, σ_pan = 0.01), where σ_hs and σ_pan are the standard deviations of the noise added to the hyperspectral and pan images, respectively. Table 4 shows the comparison with the conventional methods. One can see from the results that our method outperforms the conventional methods in many cases. For subjective comparison, we show an example of the estimated HSIs from the Type I data on PaviaC in Figure 6.
The estimated HSIs with high spatial resolution are shown in Figure 7. The PSNR [dB] values are 25.45 for (b), 26.90 for (c), 28.86 for (d), 30.0 for (e), and 31.16 for (f). We divided the bands of these HSIs into three parts and visualized the average of each part as an RGB image. The details of the results obtained by GFPCA and BayesNaive are over-smoothed, and their shading is unnatural. BayesSparse and HySure estimate sharper images; however, the texture of trees and buildings is lost. In contrast, the proposed method is particularly accurate in restoring the shading on the upper right and the texture of the central building on the upper left. As shown in (14), the optimization problem for pansharpening minimizes only the proposed regularization function, and the number of hyperparameters is reduced by imposing constraints on the HS and pan images.

Discussion
Computing Time. All methods were executed using Matlab 2019a on a MacBook Air (Retina, 13-inch, 2019) with a 1.6 GHz dual-core Intel Core i5 and 16 GB of 2133 MHz LPDDR3 memory. The proposed method takes much more time than the conventional methods because the singular value decomposition in (7) must be performed 3n_v + 3n_h times. Speeding up the processing is a major issue that we plan to pursue in future work.

Conclusions
In this paper, we proposed a new regularization scheme for HSI reconstruction called TNNG. We focused on the HSI property that the gradient image possesses strong inter-band correlation and utilized this property in convex optimization. TNNG is a regularization scheme aimed at enforcing a low-rank gradient structure when observed from the spectral direction. We achieved high-performance HSI restoration by solving this regularized convex optimization problem using PDS. In our experiments, we conducted compressed sensing reconstruction, image decomposition, and pansharpening. While the effectiveness of conventional methods depends on the application, the proposed method is superior to them in both subjective and objective evaluations across various applications. The proposed method processes a whole image without exploiting local features; introducing local processing to further improve performance, as well as improving the computation speed, remains for future work.