Article

Multispectral and Hyperspectral Image Fusion Based on Joint-Structured Sparse Block-Term Tensor Decomposition

1 School of Computer Science and Engineering, North Minzu University, Yinchuan 750021, China
2 School of Electronic Engineering, Xidian University, Xi'an 710071, China
* Author to whom correspondence should be addressed.
Remote Sens. 2023, 15(18), 4610; https://doi.org/10.3390/rs15184610
Submission received: 12 July 2023 / Revised: 10 September 2023 / Accepted: 15 September 2023 / Published: 19 September 2023

Abstract:
Multispectral and hyperspectral image fusion (MHF) aims to reconstruct high-resolution hyperspectral images by fusing spatial and spectral information. Unlike the traditional canonical polyadic decomposition and Tucker decomposition models, the block-term tensor decomposition model is able to improve the quality of fused images using known endmember and abundance information. This paper presents an improved hyperspectral image fusion algorithm. Firstly, the two abundance matrices are combined into a single block matrix, to which the $L_{2,1}$-norm is applied to promote structured sparsity and eliminate the scaling effects present in the model. Secondly, the counter-scaling effect is eliminated by adding the $L_2$-norm to the endmember matrix. Finally, the block matrix and the endmember matrix are coupled together, and the $L_{2,1}$-norm is imposed on the resulting matrix to promote block elimination; the model is solved using an extended iterative reweighted least squares (IRLS) method. This addresses the inability to accurately estimate the tensor rank in the block-term tensor decomposition model and the noise/artifact problem arising from overestimation of the rank. Experiments are conducted on standard and local datasets, and the fusion results are compared and analyzed in four ways: visual result analysis, metric evaluation, running time, and classification results. The experimental results show that the proposed algorithm achieves significant improvements in reconstruction error, signal-to-noise ratio, and image quality compared with existing methods. Especially in the case of a low signal-to-noise ratio, the proposed algorithm shows stronger robustness and accuracy. These results show that the proposed algorithm has significant advantages in multispectral and hyperspectral image fusion.

1. Introduction

In the field of remote sensing, hyperspectral and multispectral imaging systems occupy an important position. Hyperspectral imaging has been widely used in mineral detection, environmental detection [1], climate monitoring [2], and classification [3]. Hyperspectral images are obtained by hyperspectral sensors capturing sunlight reflected from an object or scene. During acquisition, the limited light energy available per band reduces the spatial resolution of the image [4], although the image carries rich spectral information. In contrast, multispectral images have only a few bands, but they are rich in spatial information. In the past two decades, much effort has been devoted to obtaining hyperspectral images with high spatial resolution, and among all methods, image fusion is the most economical, practical, and effective. Fusion mainly extracts effective high-spatial-resolution information from source images such as multispectral images and panchromatic (PAN) images and incorporates it into hyperspectral images, yielding hyperspectral images with high spatial resolution that help to accurately identify observed scenes. The most common fusion approach obtains a high-spatial-resolution hyperspectral image (HRHS) by fusing a high-spatial-resolution multispectral image (HR-MSI) with a low-spatial-resolution hyperspectral image (LR-HSI). At the same time, when an image is reduced in size, the redistribution of pixel information and interpolation algorithms result in a loss of detail or distortion of the original image, a process known as the "scaling effect". Conversely, when a low-resolution image is scaled up to a higher resolution, this is often accompanied by a loss of information, and the image may appear blurred, distorted, or show unrealistic details, a process known as the "counter-scaling effect".
In early studies of fusion algorithms, sharpening methods were mainly used to fuse PANs and MSIs that possess high spatial resolution [5]. In 2019, Meng et al. [6] reviewed the development history of pansharpening and divided it into two main categories: component substitution (CS) [7,8] and multi-resolution analysis (MRA) [9,10,11], among which the better-known methods are based on intensity–hue–saturation (IHS) [12], wavelet-based techniques [13,14], principal component analysis (PCA) [15], the Laplacian pyramid [16], etc. After conducting extensive experiments, Yokoya et al. [17] found that sharpening-based fusion methods do not improve the spatial resolution of each spectral band very well. There is also growing interest in multispectral and hyperspectral image fusion (MHF) algorithms. Often referred to as the multi-sharpening process, MHF involves popular techniques including matrix factorization [18,19,20,21,22,23,24] and tensor decomposition [25,26,27,28,29,30,31,32]. In addition, new research results have appeared in recent years. For example, in 2020, Feng et al. [33] proposed a hyperspectral band selection method based on pairwise matching with a spatial–spectral graph regularization operator, which encodes the geometric structure by constructing a spatial–spectral information map using spatial–spectral adjacencies. In 2021, Huang et al. [34] proposed a subspace clustering algorithm for hyperspectral images, which improves the model's noise immunity by introducing an adaptive spatial regularization method. In 2022, Qin et al. [35] proposed a multi-focus image fusion method based on sparse representation (DWT-SR), decomposing multiple frequency bands and shortening the algorithm's running time by utilizing GPU parallel computing, which yields high contrast and richer spatial details. In 2022, Zhang et al. [36] proposed a sparse unmixing method that avoids inaccuracy in extracting the endmember information and retains the original-domain super-pixel inter-correlation and spectral variability in the approximated image domain. In 2023, Y et al. [37] designed a new attentional reinforcement learning network to accomplish the spatio-temporal prediction of wind power.
The matrix-decomposition-based method is widely used in MHF problems because of its clear motivation and excellent performance. After matrix decomposition, the latent factors have a strong physical meaning; for example, the endmembers and abundances are non-negative [22,38,39]. In 2011, Yokoya et al. [18] used a coupled non-negative matrix factorization (CNMF) unmixing method to alternately decompose hyperspectral and multispectral data into endmember matrices and abundance matrices; the cost function in the model is non-convex, resulting in unsatisfactory fusion. In the same year, Kawakami et al. [40] decomposed the target HRHS into a basis matrix and a sparse matrix and then recombined the spectral basis matrix of the LR-HSI and the coefficient matrix of the HR-MSI to obtain it. In 2014, Simoes et al. [21] introduced total variation (TV) regularization in a subspace-based HRHS fusion framework, which effectively eliminated the noise. In 2015, Lanaras et al. [41] obtained HRHSs by jointly unmixing the two input images into a pure reflection spectral matrix and a mixed sparse matrix. In 2018, Kahraman et al. [42] proposed a graph-regularized $L_{1/2}$-sparsity-constrained non-negative matrix factorization image fusion method that enhances fusion performance by using $L_{1/2}$ and manifold regularization to improve the unmixing. In 2021, Mu et al. [43] introduced a spectral-constrained regularization term, based on the coupled non-negative matrix factorization (CNMF) algorithm, to avoid spectral distortion and maintain spectral integrity. In 2022, Khader et al. [44] proposed a new non-negative matrix-factorization-inspired deep unfolding network (NMF-DuNet) to fuse LR-HSIs and HR-MSIs using a non-negative coefficient matrix and orthogonal transformation coefficient constraints. In 2022, Han et al. [45] proposed a fusion model based on spatial nonlocal similarity and low-rank prior learning to address the inability of existing methods to simultaneously consider the spatial- and spectral-domain structure of hyperspectral image cubes; however, problems remain, such as the partial loss of spectral correlation, and the complexity of the algorithm is high. In 2023, Lin et al. [46] proposed a highly interpretable deep HSI-MSI fusion method based on a 2D-image-CNN Gaussian denoiser, which can fuse LR-HSIs and HR-MSIs under a known imaging system and achieve good fusion performance.
While there are many other improved algorithms based on matrix decomposition, they all require converting 3D data into a 2D format, making it difficult to exploit the spatial–spectral correlation in LR-HSIs. In contrast, fusion methods based on tensor decomposition can effectively avoid such defects and have become a new research hotspot for the MHF problem. They regard LR-HSIs and HR-MSIs as 3D data cubes consisting of two spatial dimensions and one spectral dimension (i.e., spatial × spatial × spectral), where the spectral dimension is the key to identifying objects in the scene. Based on this fact, in 2018, Kanatsoulis et al. [47] proposed a coupled tensor decomposition (CTD) framework to solve the MHF problem and, using canonical polyadic decomposition (CPD) uniqueness, proved that HRHSs can be recovered under mild conditions. Xu et al. [48] constructed the nonlocally similar patch tensor of the LR-HSI based on the HR-MSI, calculated the smoothing order of all patches used for clustering, preserved the relationship between HRHSs and HR-MSIs through the CPD, and proposed a multispectral and hyperspectral fusion algorithm based on the CPD of the nonlocally coupled tensor. In 2020, Prévost et al. [25] extended a similar idea to the Tucker decomposition for solving MHF problems with better results. In the same year, Xu et al. [27] proposed a unidirectional-TV-based fusion method, which decomposes the HRHS into a core tensor and three modal dictionaries and imposes the $L_1$-norm and the unidirectional TV to characterize sparsity and piecewise smoothness, respectively. In 2021, Peng et al. [49] introduced a global gradient sparse nonlocal low-rank tensor decomposition model to characterize the nonlocal and global properties of HRHSs, which can effectively capture the spatial and spectral similarity of the nonlocal cube, and introduced spatial–spectral total variation (SSTV) regularization to reconstruct HRHSs on the nonlocal-domain low-rank cube, thus obtaining global spatial piecewise smoothness and spectral consistency. In 2022, Prévost et al. [50] proposed a fast coupling method based on the Tucker approximation to take full advantage of the coupling information contained in hyperspectral and multispectral images of the same scene. After a large number of experiments, the CPD and Tucker decomposition have been proven to have excellent recoverability for the MHF problem, and both frameworks have shown excellent performance; however, the failure to utilize the physical interpretation of the latent factors makes it difficult for either to exploit prior information to improve fusion. In addition, a new tensor decomposition, called tensor ring (TR) decomposition, was proposed in [51], which decomposes a higher-order tensor into a sequence of cyclically contracted third-order tensors. This method inherits the representation capability of the CPD and extends the low-rank exploration capability of the Tucker decomposition. In 2022, Chen et al. [52] proposed a factor-smoothed TR decomposition model, which solved the problem that the spatial–spectral continuity of HRHSs could not be preserved in existing TR decomposition models; however, the algorithm places high demands on hardware when the amount of spectral data is large. In summary, the CPD and Tucker decomposition are not necessarily better than low-rank matrix estimation methods, and TR decomposition is not necessarily the best choice.
In contrast, linear mixing models (LMMs) that exploit the low-rank structure of the matricized LR-HSI and HR-MSI can circumvent such problems, and their endmembers and abundances possess a strong physical interpretation, which can be further improved by imposing structured constraints or regularization on them; this information is particularly important in the presence of noise. In 2018, Li et al. [32] demonstrated, under a relatively restrictive condition, that the coupled matrix decomposition model is able to identify the recovered HRHS, for example, if the abundance matrix of the HR-MSI is sparse. In tensor notation, the LMM can be rewritten to approximate the 3D spectral data as the sum of outer products of endmembers (vectors) and abundance maps (matrices). This formulation exactly matches the matrix-vector third-order tensor decomposition, which decomposes the tensor into a sum of component tensors, each of which in turn consists of a matrix-vector outer product [53]. The matrix-vector third-order tensor factorization can also be regarded as a block-term tensor decomposition (BTD). Therefore, the matrices and vectors in the BTD can serve as abundance maps and endmember vectors, respectively, with non-negativity and coupling properties. In 2019, Zhang et al. [54] proposed a coupled non-negative BTD model [55,56,57]. The BTD was used to decompose the spectral images to take full advantage of the physical significance of their latent factors, but the use of regularization to improve the algorithm's performance and its application in multiple scenarios were not considered. In 2021, Ding et al. [58] reformulated the HRHS recovery process as a special block-term tensor model, imposing constraints and regularization to improve performance, but failed to take into account the effects between individual abundance matrices, and the experimental scenarios were limited. In 2022, Guo et al. [59] proposed a regularization-based coupled non-negative block-term tensor decomposition algorithm that considers the sparsity and smoothness of the individual abundances and the endmember matrix by adding the $L_1$-norm and bidirectional TV. However, the $L_1$-norm is non-smooth and is not the best choice of regularizer, and the algorithm takes too long to run. Therefore, a joint-structured sparse block-term tensor decomposition model (JSSLL1) is proposed: the $L_{2,1}$-norm is introduced to promote the structured sparsity of the block matrix composed of latent factors and to eliminate the scaling effects present in the model, and the $L_2$-norm is introduced on the endmembers to counteract the counter-scaling effects in the decomposition model. Finally, the block matrix and the endmembers are coupled together, and the $L_{2,1}$-norm is imposed on them to promote the elimination of entire blocks. The contributions of the proposed algorithm are as follows.
  • During the reconstruction of an HRHS, in order to promote the sparsity of the abundance matrices and counteract the scaling effect arising in them during image recovery, the two abundance matrices are stacked to obtain a block matrix, and the mixed $L_{2,1}$-norm is applied to it.
  • In the process of reconstructing an HRHS, the counter-scaling effect of the endmember vectors is considered and avoided by adding the $L_2$-norm to each endmember vector.
  • In the block-term tensor factorization model, the accuracy of the rank, which is unknown a priori, is a key factor. Inaccurate rank estimation may cause noise or artifacts to interfere with the experimental results. To solve this problem, this paper couples the block matrix and the endmember matrix through reweighting and introduces the mixed norm. Through this method, the influence of inaccurate rank estimation can be reduced while maintaining flexibility, improving the reliability of the experimental results.
The remainder of this article is organized as follows. Section 2 describes the background and related works. Section 3 describes the MHF method and optimization scheme based on the joint-structured sparse LL1 model. Section 4 describes the experimental setup, visualization of experimental results, evaluation metrics, and classification results and develops the discussion. Section 5 summarizes the proposed algorithm and future improvement directions.

2. Background

Because LR-HSIs possess two spatial dimensions and one spectral dimension, they can be regarded as third-order tensors. In recent years, with the continuous improvement of MHF algorithms, tensor decomposition methods have been widely used because of their ability to avoid excessive loss of spatial and spectral information. The two most common models are the CPD and Tucker models.
The CPD model [60] is a tensor rank decomposition model that decomposes a tensor into a sum of rank-one tensors. The mathematical model is as follows (note: see Appendix A for an explanation of the relevant symbols in the text).
$$\mathcal{Y} = \sum_{r=1}^{R} A_r \circ B_r \circ C_r.$$
where $A_r \in \mathbb{R}^{I \times 1}$, $B_r \in \mathbb{R}^{J \times 1}$, and $C_r \in \mathbb{R}^{K \times 1}$ are factor vectors, $\circ$ denotes the outer product, and $\mathcal{Y} \in \mathbb{R}^{I \times J \times K}$ is a third-order tensor. Figure 1 represents the flow of the CPD.
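As an illustration, a minimal numerical sketch of this synthesis follows (the sizes are illustrative assumptions, not values from the paper):

```python
import numpy as np

# Minimal sketch of CPD synthesis: Y = sum_r A_r ∘ B_r ∘ C_r, where the
# factors are vectors stored as columns; sizes are illustrative assumptions.
I, J, K, R = 6, 5, 4, 3
rng = np.random.default_rng(0)
A = rng.random((I, R))  # column r is the vector A_r
B = rng.random((J, R))
C = rng.random((K, R))

# Sum of R rank-one outer products.
Y = np.einsum('ir,jr,kr->ijk', A, B, C)
```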
The Tucker model [61] is also widely used in the imaging field; it is a higher-order form of principal component analysis (PCA) that decomposes a tensor into a core tensor and three factor matrices. The Tucker decomposition process is shown in Figure 2. The mathematical model is expressed as [62]:
$$\mathcal{Y} = \mathcal{G} \times_1 A \times_2 B \times_3 C.$$
where $A \in \mathbb{R}^{I \times n_1}$, $B \in \mathbb{R}^{J \times n_2}$, and $C \in \mathbb{R}^{K \times n_3}$ are the factor matrices, and $\mathcal{G} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ is the third-order core tensor.
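Analogously, a minimal sketch of the Tucker synthesis (again with assumed illustrative sizes) is:

```python
import numpy as np

# Minimal sketch of Tucker synthesis: Y = G x1 A x2 B x3 C, with a core
# tensor G of size n1 x n2 x n3; sizes are illustrative assumptions.
I, J, K = 6, 5, 4
n1, n2, n3 = 3, 3, 2
rng = np.random.default_rng(0)
G = rng.random((n1, n2, n3))
A, B, C = rng.random((I, n1)), rng.random((J, n2)), rng.random((K, n3))

# Each mode product contracts one core dimension with a factor matrix.
Y = np.einsum('pqr,ip,jq,kr->ijk', G, A, B, C)
```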
In addition to the two tensor models mentioned above, the BTD model is gradually being applied to the field of image fusion because its latent factors have a good physical interpretation, as in previous work [59]. In solving the MHF problem, unlike in the CPD and Tucker models, the latent factors in the BTD model represent the abundances and the endmembers, respectively, which are physically non-negative and spatially smooth, facilitating the use of various kinds of prior knowledge in the image fusion process.
By combining the advantages of the CPD and Tucker models, the BTD of rank-$(L_r, M_r, N_r)$ (abbreviation: LMN model) in general form can be expressed as:
$$\mathcal{Y} = \sum_{r=1}^{R} \mathcal{G}_r \times_1 A_r \times_2 B_r \times_3 C_r.$$
where $A_r \in \mathbb{R}^{I \times L_r}$, $B_r \in \mathbb{R}^{J \times M_r}$, $C_r \in \mathbb{R}^{K \times N_r}$, and $\mathcal{G}_r \in \mathbb{R}^{L_r \times M_r \times N_r}$. The LMN decomposition process is shown in Figure 3. The above description shows that the BTD approximates the tensor $\mathcal{Y}$ by $R$ component tensors, each of which is a Tucker model of rank-$(L_r, M_r, N_r)$. If the rank of each component tensor is one, the model can be treated as a CPD. Similarly, if $R = 1$, then Equation (3) is the Tucker model of rank-$(L_1, M_1, N_1)$. In other words, the BTD decomposes a tensor into the sum of multiple component tensors; this process can be viewed as a CPD, while the decomposition of each component tensor is viewed as a Tucker decomposition. This overcomes the restriction that each component tensor in the CPD must have rank one and lifts the restriction that the Tucker decomposition has only one component tensor. It is a combination of the CPD and the Tucker decomposition.
Usually, a special BTD model is adopted to solve the MHF problem. The special BTD model of rank-$(L, L, 1)$ (abbreviation: LL1 model) is obtained when each core tensor degenerates to an $L_r \times L_r$ matrix and each $L_r$ takes the same value. The decomposition process of the LL1 model is shown in Figure 4. The mathematical model is as follows:
$$\mathcal{Y} = \sum_{r=1}^{R} \left(A_r B_r^{\mathsf{T}}\right) \circ \mathbf{c}_r.$$
where $A_r \in \mathbb{R}^{I \times L_r}$, $B_r \in \mathbb{R}^{J \times L_r}$, and $C = [\mathbf{c}_1, \ldots, \mathbf{c}_R] \in \mathbb{R}^{K \times R}$. Here, $A_r$ and $B_r$ are, respectively, the subblocks of the block matrices $A = [A_1, \ldots, A_R] \in \mathbb{R}^{I \times \sum_{r=1}^{R} L_r}$ and $B = [B_1, \ldots, B_R] \in \mathbb{R}^{J \times \sum_{r=1}^{R} L_r}$. In general, the full-column-rank requirement for $A$ and $B$ is $\min(I, J) \geq \sum_{r=1}^{R} L_r$. Under mild conditions, the LL1 model possesses a nice property: $A_r B_r^{\mathsf{T}}$ and $\mathbf{c}_r$ are identifiable up to permutation and scaling ambiguities (Theorem 4.7 of [56]). The latent factors $A_r B_r^{\mathsf{T}}$ and $\mathbf{c}_r$ represent, respectively, the abundance map of the $r$th endmember and the $r$th endmember itself.
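For concreteness, a minimal sketch of the LL1 synthesis in Equation (4) follows (block sizes are assumptions); each term is a rank-$L$ abundance map $A_r B_r^{\mathsf{T}}$ replicated along the spectral axis with the weights of the endmember $\mathbf{c}_r$:

```python
import numpy as np

# Minimal sketch of the rank-(L,L,1) BTD: Y = sum_r (A_r B_r^T) ∘ c_r.
# Sizes I, J, K, R, L are illustrative assumptions.
I, J, K, R, L = 8, 7, 5, 3, 2
rng = np.random.default_rng(0)
A_blocks = [rng.random((I, L)) for _ in range(R)]   # subblocks A_r of A
B_blocks = [rng.random((J, L)) for _ in range(R)]   # subblocks B_r of B
C = rng.random((K, R))                              # endmembers c_r = C[:, r]

Y = np.zeros((I, J, K))
for r in range(R):
    # (A_r B_r^T) is an I x J abundance map; the outer product with c_r
    # replicates it along the spectral dimension with endmember weights.
    Y += (A_blocks[r] @ B_blocks[r].T)[:, :, None] * C[:, r][None, None, :]
```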
In summary, the CPD and Tucker models are both low-rank models (see Figure 1 and Figure 2), capable of "encoding" dependencies in different dimensions of the data and often used in spectral image modeling with good recoverability [25,47]. Although this is useful for recovering an HRHS, it is not necessarily better in practice than the BTD-based MHF approach. The BTD-based MHF framework can introduce the structural information of the latent factors through their physical interpretation, which helps to regularize the MHF criteria with prior knowledge.

3. MHF Method Based on Joint-Structured Sparse LL1 Model

3.1. The MHF Method for the LL1 Model

How the model can be physically interpreted is described in a separate work [59]. The LL1 model is used to model the spectral images. All spectral images can be viewed as third-order tensors of the form "space × space × spectrum". $\mathcal{Y} \in \mathbb{R}^{I_M \times J_M \times K_H}$, $\mathcal{Y}_H \in \mathbb{R}^{I_H \times J_H \times K_H}$, and $\mathcal{Y}_M \in \mathbb{R}^{I_M \times J_M \times K_M}$ denote the HRHS, LR-HSI, and HR-MSI, respectively. Further, $I_H$, $J_H$ and $I_M$, $J_M$ denote the spatial dimensions of the LR-HSI and HR-MSI, respectively; $K_H$ and $K_M$ denote the spectral dimensions of the LR-HSI and HR-MSI, respectively. Due to the limitations of current optical sensors, multispectral images are rich in spatial information, and hyperspectral images are rich in spectral information; i.e.:
$$I_M \gg I_H, \quad J_M \gg J_H, \quad K_H \gg K_M.$$
The goal of the MHF problem is to obtain the HRHS by fusing an HR-MSI and an LR-HSI of the same scene. The theoretical validation of the LMM and the LL1 model was demonstrated in another work [59]. Thus, assuming that the abundance matrix is a low-rank matrix, the LL1 model formulation of the HRHS is:
$$\mathcal{Y}_S = \sum_{r=1}^{R} \left(A_r B_r^{\mathsf{T}}\right) \circ \mathbf{c}_r.$$
where $A_r \in \mathbb{R}^{I_M \times L_r}$, $B_r \in \mathbb{R}^{J_M \times L_r}$, and $C = [\mathbf{c}_1, \ldots, \mathbf{c}_R] \in \mathbb{R}^{K_H \times R}$, $r = 1, 2, \ldots, R$. $\mathcal{Y}_S$ denotes the fused image.
The unfoldings of the tensor $\mathcal{Y}$ can be expressed as:
$$Y^{(1)} = (C \odot B) A^{\mathsf{T}},$$
$$Y^{(2)} = (C \odot A) B^{\mathsf{T}},$$
$$Y^{(3)} = \left[(B_1 \odot_c A_1)\mathbf{1}_{L_1}, \ldots, (B_R \odot_c A_R)\mathbf{1}_{L_R}\right] C^{\mathsf{T}}.$$
where $\odot$ denotes the partition-wise Khatri–Rao product, $\odot_c$ the column-wise Khatri–Rao product, and $\mathbf{1}_{L_r}$ the all-ones vector of length $L_r$.
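The following self-contained sketch numerically checks the mode-1 unfolding identity; the row ordering of the unfolding ($k$ slow, $j$ fast) and the partition-wise Khatri–Rao convention are assumptions about notation, not statements from the paper:

```python
import numpy as np

# Sanity check of Y^(1) = (C ⊙ B) A^T for the LL1 model; the row ordering
# (k slow, j fast) of the unfolding is an assumed convention.
I, J, K, R, L = 8, 7, 5, 3, 2
rng = np.random.default_rng(1)
A_blocks = [rng.random((I, L)) for _ in range(R)]
B_blocks = [rng.random((J, L)) for _ in range(R)]
C = rng.random((K, R))
Y = np.zeros((I, J, K))
for r in range(R):
    Y += (A_blocks[r] @ B_blocks[r].T)[:, :, None] * C[:, r][None, None, :]

# Partition-wise Khatri-Rao product [c_1 ⊗ B_1, ..., c_R ⊗ B_R].
CkrB = np.hstack([np.kron(C[:, r:r+1], B_blocks[r]) for r in range(R)])
Y1 = CkrB @ np.hstack(A_blocks).T            # shape (K*J) x I
Y1_ref = Y.transpose(2, 1, 0).reshape(K * J, I)  # row index = (k-1)J + j
assert np.allclose(Y1, Y1_ref)
```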
The above expressions are used in the subsequent alternating iterations to solve for $A$, $B$, and $C$. The LR-HSI can be obtained by spatially blurring and downsampling the HRHS; thus, the blurring and downsampling matrices act by multiplying the rows and columns of each slice $\mathcal{Y}(:,:,k)$. The mathematical model is shown below:
$$\mathcal{Y}_H(:,:,k) = P_1 \, \mathcal{Y}_S(:,:,k) \, P_2^{\mathsf{T}}, \quad k = 1, 2, \ldots, K_H.$$
where $P_1 \in \mathbb{R}^{I_H \times I_M}$ and $P_2 \in \mathbb{R}^{J_H \times J_M}$ denote, respectively, the spatial blurring-and-downsampling matrices along the two spatial dimensions.
If $\mathcal{Y}$ conforms to the LL1 model and the spatial degradation model of Equation (10) holds, then:
$$\mathcal{Y}_H = \sum_{r=1}^{R} \left[(P_1 A_r)(P_2 B_r)^{\mathsf{T}}\right] \circ \mathbf{c}_r.$$
The HR-MSI can be modeled as a spectral-band aggregation of the HRHS. The mathematical model is as follows:
$$\mathcal{Y}_M(i,j,:) = P_3 \, \mathcal{Y}_S(i,j,:), \quad \forall i, j.$$
where $P_3 \in \mathbb{R}^{K_M \times K_H}$ is the band aggregation matrix associated with the sensor specification. Similarly, if $\mathcal{Y}$ conforms to the LL1 model and the spectral degradation model in Equation (12) holds, then:
$$\mathcal{Y}_M = \sum_{r=1}^{R} \left(A_r B_r^{\mathsf{T}}\right) \circ \left(P_3 \mathbf{c}_r\right).$$
Under this degradation model, the task of reconstructing the HRHS is mainly to estimate the latent factors $A_r B_r^{\mathsf{T}}$ and $\mathbf{c}_r$ from $\mathcal{Y}_H$ and $\mathcal{Y}_M$. The HRHS is then reconstructed using $\mathcal{Y} = \sum_{r=1}^{R} (A_r B_r^{\mathsf{T}}) \circ \mathbf{c}_r$. Thus, the MHF problem based on the LL1 model can be modeled as:
$$\min_{A,B,C} \; \frac{1}{2}\left\|\mathcal{Y}_H - \sum_{r=1}^{R} \left[(P_1 A_r)(P_2 B_r)^{\mathsf{T}}\right] \circ \mathbf{c}_r\right\|_F^2 + \frac{1}{2}\left\|\mathcal{Y}_M - \sum_{r=1}^{R} \left(A_r B_r^{\mathsf{T}}\right) \circ \left(P_3 \mathbf{c}_r\right)\right\|_F^2, \quad \mathrm{s.t.} \; A \geq 0, \; B \geq 0, \; C \geq 0.$$

3.2. The MHF Model Based on Joint-Structured Sparsity

The above model is ill-posed, so constraints need to be placed on the latent factors of the objective function, and regularization is added to improve the performance of the algorithm. The running time, computational complexity, and convergence speed of the BTD algorithm are affected by the ranks $R$ and $L$. In addition, too high or too low an estimate of $R$ and $L$ may allow noise/artifacts to interfere with the desired results [63]. The sparsity of the abundance matrices and the scaling/counter-scaling problems that occur in the fusion model also affect the performance of the algorithm. Based on these considerations, this paper introduces a regularization term that matches the structure of the BTD model and improves the flexibility of the model through hierarchical block and column sparsity. Firstly, the latent factors $A$ and $B$ are stacked into a block matrix $U$, on which an $L_{2,1}$-norm is introduced to promote structured sparsity and eliminate the scaling effects present in the model. Secondly, an $L_2$-norm is added to each endmember $\mathbf{c}_r$ to counteract the counter-scaling effects that occur in the decomposition model. Finally, the block matrix $U$ and $\mathbf{c}_r$ are coupled together to form a matrix $\varphi(A, B, C)$ of size $2 \times R$. The $L_{2,1}$-norm imposed on it promotes the elimination of entire blocks of $A$ and $B$, ensuring that the BTD algorithm can avoid noise/artifacts interfering with the experimental results when the rank is overestimated. In summary, the modified scheme is expressed as follows:
$$\min_{A,B,C} \; \frac{1}{2}\left\|\mathcal{Y}_H - \sum_{r=1}^{R} \left(P_1 A_r B_r^{\mathsf{T}} P_2^{\mathsf{T}}\right) \circ \mathbf{c}_r\right\|_F^2 + \frac{1}{2}\left\|\mathcal{Y}_M - \sum_{r=1}^{R} \left(A_r B_r^{\mathsf{T}}\right) \circ \left(P_3 \mathbf{c}_r\right)\right\|_F^2 + \lambda \left\|\varphi(A,B,C)\right\|_{2,1}, \quad \mathrm{s.t.} \; A \geq 0, \; B \geq 0, \; C \geq 0,$$
$$\varphi(A,B,C) = \begin{bmatrix} \|U_1\|_{2,1} & \|U_2\|_{2,1} & \cdots & \|U_R\|_{2,1} \\ \|\mathbf{c}_1\|_2 & \|\mathbf{c}_2\|_2 & \cdots & \|\mathbf{c}_R\|_2 \end{bmatrix}.$$
where $\lambda > 0$ is the regularization parameter, $A_r = [\mathbf{a}_{r1}, \mathbf{a}_{r2}, \ldots, \mathbf{a}_{rL_r}] \in \mathbb{R}^{I \times L_r}$, $B_r = [\mathbf{b}_{r1}, \mathbf{b}_{r2}, \ldots, \mathbf{b}_{rL_r}] \in \mathbb{R}^{J \times L_r}$, and $C = [\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_R] \in \mathbb{R}^{K \times R}$. $R$ and $L$ are the ranks, and $r = 1, 2, \ldots, R$. $U_r = [A_r^{\mathsf{T}}, B_r^{\mathsf{T}}]^{\mathsf{T}}$ denotes the $r$th block, of size $(I+J) \times L_r$. With this modification, while solving the MHF problem using the LL1 model, the ranks $R$ and $L$ can be overestimated and then reduced to their true values by an appropriate selection of $\lambda$. Following the ideas in the literature [64,65], the well-known IRLS [66] method is extended to solve Model (15).
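To make the regularizer concrete, the sketch below (with hypothetical helper names) assembles $\varphi(A, B, C)$ and evaluates $\lambda \|\varphi(A,B,C)\|_{2,1}$:

```python
import numpy as np

def l21_norm(M):
    # L_{2,1} norm: the sum of the Euclidean norms of the columns of M.
    return np.linalg.norm(M, axis=0).sum()

def jss_regularizer(A_blocks, B_blocks, C, lam):
    # Column r of phi stacks ||U_r||_{2,1}, with U_r = [A_r; B_r], over
    # ||c_r||_2, so ||phi||_{2,1} = sum_r sqrt(||U_r||_{2,1}^2 + ||c_r||_2^2).
    phi = np.array(
        [[l21_norm(np.vstack([Ar, Br])) for Ar, Br in zip(A_blocks, B_blocks)],
         [np.linalg.norm(C[:, r]) for r in range(C.shape[1])]])
    return lam * l21_norm(phi)
```

Driving a column of $\varphi$ to zero removes the entire $r$th block $U_r$ and the endmember $\mathbf{c}_r$ at once, which is how overestimated components are pruned.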
The flow chart of the algorithm is shown in Figure 5. The LR-HSI and HR-MSI are decomposed using the LL1 model to obtain the endmember matrix $C$ and the abundance factors $A$ and $B$. Then, $A$, $B$, and $C$ are substituted into Model (15), the extended IRLS method is used to solve for them iteratively, and they are substituted into Equation (6) to reconstruct the HRHS.
The procedure for solving Model (15) is as follows. First, express Model (15) in a more explicit form:
$$\min_{A,B,C} \; \frac{1}{2}\left\|\mathcal{Y}_H - \sum_{r=1}^{R} \left(P_1 A_r B_r^{\mathsf{T}} P_2^{\mathsf{T}}\right) \circ \mathbf{c}_r\right\|_F^2 + \frac{1}{2}\left\|\mathcal{Y}_M - \sum_{r=1}^{R} \left(A_r B_r^{\mathsf{T}}\right) \circ \left(P_3 \mathbf{c}_r\right)\right\|_F^2 + \lambda \sum_{r=1}^{R} \sqrt{\left(\sum_{l=1}^{L} \sqrt{\|\mathbf{a}_{rl}\|_2^2 + \|\mathbf{b}_{rl}\|_2^2 + \eta^2}\right)^2 + \|\mathbf{c}_r\|_2^2 + \eta^2}.$$
where $\eta$ is a small positive smoothing parameter. The objective function is convex with respect to each of the latent factors $A$, $B$, and $C$ individually. However, since the regularization term is non-differentiable in each factor matrix, an exact analytic solution cannot be obtained, and the extended IRLS method is used to solve Model (15). A hierarchical IRLS structure consisting of two independent reweighted least squares steps can be identified from the regularization term, each step producing a separate reweighting matrix. The first matrix, $O_1$, is composed of the inverse of the outer summation term of the regularizer in Model (15) and jointly weights the blocks of $A$ and $B$. The second matrix, $O_2$, is composed of the inverse of the inner summation terms of the regularizer in Model (15) and is used to balance the corresponding columns of $A_r$ and $B_r$. The expressions for $O_1$ and $O_2$, respectively, are shown below:
$$O_1^k(r,r) = \left[\left(\sum_{l=1}^{L} \sqrt{\|\mathbf{a}_{rl}^k\|_2^2 + \|\mathbf{b}_{rl}^k\|_2^2 + \eta^2}\right)^2 + \|\mathbf{c}_r^k\|_2^2 + \eta^2\right]^{-1/2},$$
$$O_2^k\big((r-1)L+l, \, (r-1)L+l\big) = \left(\|\mathbf{a}_{rl}^k\|_2^2 + \|\mathbf{b}_{rl}^k\|_2^2 + \eta^2\right)^{-1/2}.$$
where $O_1$ is a diagonal matrix of order $R$, with $O_1(r,r)$ denoting its $r$th diagonal element; $O_2$ is a diagonal matrix of order $RL$, with $O_2\big((r-1)L+l, (r-1)L+l\big)$ denoting the element in row $(r-1)L+l$, column $(r-1)L+l$.
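A sketch of how these reweighting matrices could be assembled per Equations (18) and (19) (the function name is hypothetical):

```python
import numpy as np

def reweighting_matrices(A_blocks, B_blocks, C, eta):
    # O_2 balances the columns of each block [A_r; B_r]; O_1 jointly weights
    # the block [A_r; B_r] and the endmember c_r, per Eqs. (18)-(19).
    R, L = len(A_blocks), A_blocks[0].shape[1]
    o1, o2 = np.zeros(R), np.zeros(R * L)
    for r in range(R):
        col = np.sqrt((A_blocks[r] ** 2).sum(0) + (B_blocks[r] ** 2).sum(0) + eta ** 2)
        o2[r * L:(r + 1) * L] = 1.0 / col
        o1[r] = 1.0 / np.sqrt(col.sum() ** 2 + C[:, r] @ C[:, r] + eta ** 2)
    O1, O2 = np.diag(o1), np.diag(o2)
    O = np.kron(O1, np.eye(L)) @ O2        # O = (O_1 ⊗ I_L) O_2
    return O1, O2, O
```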
The three latent factors $A$, $B$, and $C$ in Model (15) are estimated by alternating iterations.

3.2.1. Sub-Problem A

Estimating $A$ at the $(k+1)$th iteration, the subproblem with respect to $A$ is formulated as follows:
$$A^{k+1} = \arg\min_{A} \; \frac{1}{2}\left\|Y_H^{(1)} - S^k (P_1 A)^{\mathsf{T}}\right\|_F^2 + \frac{1}{2}\left\|Y_M^{(1)} - M^k A^{\mathsf{T}}\right\|_F^2 + \lambda \sum_{r=1}^{R} \sum_{l=1}^{L} \frac{\|\mathbf{a}_{rl}\|_2^2 + \|\mathbf{b}_{rl}^k\|_2^2 + \eta^2}{\sqrt{\|\mathbf{a}_{rl}^k\|_2^2 + \|\mathbf{b}_{rl}^k\|_2^2 + \eta^2} \, \sqrt{\left(\sum_{l=1}^{L} \sqrt{\|\mathbf{a}_{rl}^k\|_2^2 + \|\mathbf{b}_{rl}^k\|_2^2 + \eta^2}\right)^2 + \|\mathbf{c}_r^k\|_2^2 + \eta^2}}.$$
where $S \triangleq C \odot P_2 B$ and $M \triangleq P_3 C \odot B$. Equation (20) is obtained by the extended IRLS method. By differentiating Equation (20) and setting its derivative to zero, the Sylvester equation with respect to $A$ can be obtained as follows:
$$P_1^{\mathsf{T}} P_1 A^{k+1} S^{k\mathsf{T}} S^k + A^{k+1} \left(M^{k\mathsf{T}} M^k + \lambda O^k\right) = P_1^{\mathsf{T}} Y_H^{(1)\mathsf{T}} S^k + Y_M^{(1)\mathsf{T}} M^k.$$
where $O \triangleq (O_1 \otimes I_L) \, O_2$. Problem (21) is solved using the conjugate gradient (CG) algorithm [67].
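For illustration, a minimal matrix-form CG sketch for a generalized Sylvester equation with the shape of Problem (21), $F X G + X H = \mathrm{RHS}$ (here $F = P_1^{\mathsf{T}} P_1$, $G = S^{k\mathsf{T}} S^k$, $H = M^{k\mathsf{T}} M^k + \lambda O^k$), follows; this is a sketch under the assumption of symmetric positive (semi)definite coefficients, not the paper's implementation:

```python
import numpy as np

def sylvester_cg(F, G, H, RHS, X0, iters=500, tol=1e-10):
    # CG on the linear operator X -> F X G + X H, which is symmetric and
    # positive (semi)definite in the trace inner product for SPD F, G, H.
    op = lambda X: F @ X @ G + X @ H
    X = X0.copy()
    R_ = RHS - op(X)          # residual
    P = R_.copy()             # search direction
    rs = np.vdot(R_, R_)
    for _ in range(iters):
        AP = op(P)
        alpha = rs / np.vdot(P, AP)
        X = X + alpha * P
        R_ = R_ - alpha * AP
        rs_new = np.vdot(R_, R_)
        if np.sqrt(rs_new.real) < tol:
            break
        P = R_ + (rs_new / rs) * P
        rs = rs_new
    return X
```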

3.2.2. Sub-Problem B

Similarly, the minimization problem with respect to B is as follows:
$$B^{k+1} = \arg\min_{B} \; \frac{1}{2}\left\|Y_H^{(2)} - V^k (P_2 B)^{\mathsf{T}}\right\|_F^2 + \frac{1}{2}\left\|Y_M^{(2)} - Z^k B^{\mathsf{T}}\right\|_F^2 + \lambda \sum_{r=1}^{R} \sum_{l=1}^{L} \frac{\|\mathbf{a}_{rl}^k\|_2^2 + \|\mathbf{b}_{rl}\|_2^2 + \eta^2}{\sqrt{\|\mathbf{a}_{rl}^k\|_2^2 + \|\mathbf{b}_{rl}^k\|_2^2 + \eta^2} \, \sqrt{\left(\sum_{l=1}^{L} \sqrt{\|\mathbf{a}_{rl}^k\|_2^2 + \|\mathbf{b}_{rl}^k\|_2^2 + \eta^2}\right)^2 + \|\mathbf{c}_r^k\|_2^2 + \eta^2}}.$$
where $V \triangleq C \odot P_1 A$ and $Z \triangleq P_3 C \odot A$. Equation (22) is obtained by the extended IRLS method. By differentiating Equation (22) and setting its derivative to zero, the Sylvester equation with respect to $B$ can be obtained as follows:
$$P_2^{\mathsf{T}} P_2 B^{k+1} V^{k\mathsf{T}} V^k + B^{k+1} \left(Z^{k\mathsf{T}} Z^k + \lambda O^k\right) = P_2^{\mathsf{T}} Y_H^{(2)\mathsf{T}} V^k + Y_M^{(2)\mathsf{T}} Z^k.$$
Problem (23) is solved using the CG algorithm.

3.2.3. Sub-Problem C

Similarly, the minimization problem with respect to $C$ is as follows:
$$C^{k+1} = \arg\min_{C} \; \frac{1}{2}\left\|Y_H^{(3)} - Q^k C^{\mathsf{T}}\right\|_F^2 + \frac{1}{2}\left\|Y_M^{(3)} - E^k (P_3 C)^{\mathsf{T}}\right\|_F^2 + \lambda \sum_{r=1}^{R} \frac{\left(\sum_{l=1}^{L} \sqrt{\|\mathbf{a}_{rl}^k\|_2^2 + \|\mathbf{b}_{rl}^k\|_2^2 + \eta^2}\right)^2 + \|\mathbf{c}_r\|_2^2 + \eta^2}{\sqrt{\left(\sum_{l=1}^{L} \sqrt{\|\mathbf{a}_{rl}^k\|_2^2 + \|\mathbf{b}_{rl}^k\|_2^2 + \eta^2}\right)^2 + \|\mathbf{c}_r^k\|_2^2 + \eta^2}}.$$
where $Q \triangleq \left[(P_2 B_1 \odot_c P_1 A_1)\mathbf{1}_{L_1}, \ldots, (P_2 B_R \odot_c P_1 A_R)\mathbf{1}_{L_R}\right]$ and $E \triangleq \left[(B_1 \odot_c A_1)\mathbf{1}_{L_1}, \ldots, (B_R \odot_c A_R)\mathbf{1}_{L_R}\right]$. Equation (24) is obtained by the extended IRLS method. By differentiating Equation (24) and setting its derivative to zero, the Sylvester equation with respect to $C$ can be obtained as follows:
$$P_3^{\mathsf{T}} P_3 C^{k+1} E^{k\mathsf{T}} E^k + C^{k+1} \left(Q^{k\mathsf{T}} Q^k + \lambda O_1^k\right) = Y_H^{(3)\mathsf{T}} Q^k + P_3^{\mathsf{T}} Y_M^{(3)\mathsf{T}} E^k.$$
Then, the CG algorithm is used to solve Problem (25).
The steps of the proposed algorithm are shown in Algorithm 1.
Algorithm 1 JSSLL1-IRLSCG.
Input: $\mathcal{Y}$, $R$, $L$, $\lambda > 0$, $\eta > 0$, $k = 0$.
Output: Target image HRHS.
While unconverged do
  Calculate $O_1$ and $O_2$ by Equations (18) and (19);
  Update $O^k$ by the equation $O^k = (O_1^k \otimes I_L) O_2^k$;
  Solve Problem (21) using the CG algorithm to update $A^{k+1}$;
  Solve Problem (23) using the CG algorithm to update $B^{k+1}$;
  Solve Problem (25) using the CG algorithm to update $C^{k+1}$;
  $k \leftarrow k + 1$;
end While
Reconstruct the image via $\mathcal{Y} = \sum_{r=1}^{R} (A_r B_r^{\mathsf{T}}) \circ \mathbf{c}_r$ and obtain the HRHS.
In summary, in the proposed algorithm, when the ranks $R$ and $L$ are overestimated in the model, reweighting by the matrix $O_1$ imposes joint block sparsity on $A$ and $B$ and column sparsity on $C$, which helps to estimate $R$. Meanwhile, reweighting by the matrix $O_2$ jointly promotes column sparsity on the corresponding blocks of $A$ and $B$, thus estimating $L$.

4. Experiments and Analysis

4.1. Datasets

4.1.1. Standard Datasets

The Pavia University dataset [59] was acquired by the Reflective Optics System Imaging Spectrometer (ROSIS) over the city of Pavia. The size of the acquired image is 610 × 340, and it consists of 115 spectral bands with a spatial resolution of 1.3 m. To enhance the experimental effect, the sub-image located in the upper left corner was selected as the experimental image after removing the bands affected by water vapor. The size of the experimental image is 256 × 256 × 102, and the image contains rich features such as houses, roads, and trees.
The Indian Pines dataset [59] was acquired by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) over the Indian Pines test site in the U.S.A. The AVIRIS wavelength range is between 0.4 μm and 2.5 μm, with a spatial resolution of 20 m. After removing 24 water-absorption bands, the remaining bands were used for the study. Given the experimental requirements, the final image size selected was 144 × 144 × 200.

4.1.2. Local Dataset

The Ningxia Sand Lake dataset [59] was acquired by the WFV4 multispectral imager on the Gaofen-1 (GF1) satellite and the AHSI hyperspectral imager on the Gaofen-5 (GF5) satellite over Ningxia Sand Lake. The WFV4 wavelength range is 0.45–0.89 μm, the spatial resolution is 16 m, and the size of the acquired multispectral image is 12,000 × 13,400. The wavelength range of AHSI is 0.4–2.5 μm, the spatial resolution is 30 m, it consists of 330 bands, and the size of the acquired hyperspectral image is 2774 × 2554. After removing 187 bands with high noise, the remaining bands were used as the research object. The experimental images were cropped at the same position from the acquired multispectral and hyperspectral images, and the final selected image sizes were 254 × 254 × 3 for the HR-MSI and 254 × 254 × 143 for the LR-HSI, respectively.

4.2. Experimental Setup

4.2.1. Experimental Environment

Eight algorithms, CNMF [18], HySure [21], FUSE [22], SCOTT [25], IRSR [45], STEREO [47], BTD [54], and SCLL1 [58], were selected for comparison. Among them, SCOTT and STEREO solve the MHF problem based on the Tucker and CPD models, respectively; BTD and SCLL1 solve it based on the BTD model. Considering the different experimental environments of each algorithm, directly adopting the parameter settings of the original literature often fails to reproduce the reported performance in the current environment, so this paper optimizes the parameters of the other algorithms for the current environment. Specifically, standard normally distributed pseudo-random numbers are used as initialization values for all algorithms, and the optimal parameter values are selected through ablation experiments; the hyperspectral decomposition threshold in the CNMF experiments is set to $\theta_H = 10^{-2}$, the multispectral decomposition threshold is set to $\theta_M = 10^{-2}$, and the outer-loop decomposition threshold is set to $\theta_O = 10^{-4}$. The weighting parameter in STEREO is set to $\lambda = 10^{-5}$. The regularization parameters in SCLL1 are set to $\lambda = 10^{-3}$ and $\theta = 10^{-3}$; parameters of the other algorithms not mentioned here follow the settings in the original papers. The CPU in the experimental equipment is an Intel(R) Xeon(R) Gold 6154; the RAM size is 256 GB. The operating system is 64-bit Windows 10. The experimental platform is Matlab R2017b.

4.2.2. Degradation Model

The local dataset is acquired by multispectral and hyperspectral sensors, directly yielding the HR-MSI and LR-HSI, respectively. The standard datasets are different: they are hyperspectral datasets acquired by a hyperspectral sensor and lack corresponding multispectral data for fusion experiments. Therefore, a standard dataset is generally used as the reference image, and the HR-MSI and LR-HSI used in the experiments are obtained from it by degradation. In this paper, the degradation model follows the literature [22,32,68] and strictly follows the Wald protocol. The LR-HSI is obtained by blurring the standard dataset with a 9 × 9 Gaussian kernel and sampling at 4-pixel intervals. The process of acquiring the HR-MSI follows the literature [47]. Finally, noise is added to the LR-HSI and HR-MSI.
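A sketch of this degradation pipeline for the LR-HSI is given below; the Gaussian σ and the SNR-based noise convention are assumptions, while the 9 × 9 kernel, the 4-pixel sampling, and the added noise come from the text:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def degrade_to_lr_hsi(hrhs, sigma=2.0, stride=4, snr_db=30.0, seed=0):
    # Blur each band with a 9x9 Gaussian kernel (radius 4, since
    # truncate * sigma = 4), sample every `stride` pixels, then add
    # white Gaussian noise at the requested SNR.
    blurred = np.stack([gaussian_filter(hrhs[:, :, k], sigma, truncate=2.0)
                        for k in range(hrhs.shape[2])], axis=2)
    lr = blurred[::stride, ::stride, :]
    noise_var = lr.var() / (10.0 ** (snr_db / 10.0))
    rng = np.random.default_rng(seed)
    return lr + np.sqrt(noise_var) * rng.standard_normal(lr.shape)
```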

4.3. Numerical Evaluation Indicators

In order to evaluate the differences between the proposed algorithm and the comparison algorithms in further detail, numerical evaluation metrics are needed in addition to direct observation of the HRHS quality and the difference images of each algorithm. In recent years, many evaluation metrics have emerged with the continuous improvement of the algorithms; metrics such as the peak signal-to-noise ratio (PSNR), structural similarity (SSIM) [69], correlation coefficient (CC), universal image quality index (UIQI) [70], root mean squared error (RMSE), Erreur Relative Globale Adimensionnelle de Synthèse (ERGAS) [71], spectral angle mapper (SAM) [72], and running time (TIME), which appear in the literature [4,17], are widely used, and they can comprehensively assess the superiority of the algorithms in terms of spatial information, spectral information, and algorithm efficiency. The specific details of these evaluation metrics are described in our separate work [59]. Among these, the larger the PSNR value, the better the image quality; the closer the SSIM, CC, and UIQI values are to one, the better the quality of the HRHS fusion; the closer the RMSE, ERGAS, SAM, and TIME values are to 0, the better the fusion effect. Detailed definitions follow:
The peak signal-to-noise ratio (PSNR) evaluates the level of image distortion or noise. The PSNR generally lies between 30 dB and 50 dB; ideally, it should be as high as possible, with $\infty$ representing the optimal value. It should be specifically noted that when the PSNR exceeds 30 dB, it is difficult for the human eye to see the difference between the fused image and the reference image. The definition is as follows.
$$\mathrm{PSNR}(\mathcal{Y}, \mathcal{Y}_S) = \frac{1}{N_S} \sum_{i=1}^{N_S} \mathrm{PSNR}\left(Y_i, Y_{S_i}\right)$$
The structural similarity (SSIM) index measures the structural similarity between the HRHS and the reference image in the spatial domain [69]. The SSIM ranges from −1 to 1; when the two images are identical, the SSIM value is equal to one. The definition is as follows.
$$\mathrm{SSIM}(\mathcal{Y}, \mathcal{Y}_S) = \frac{1}{M} \sum_{i=1}^{M} \frac{\left(2 \bar{Y}_i \bar{Y}_{S_i} + c_1\right)\left(2 \sigma_{Y_i Y_{S_i}} + c_2\right)}{\left(\bar{Y}_i^2 + \bar{Y}_{S_i}^2 + c_1\right)\left(\sigma_{Y_i}^2 + \sigma_{Y_{S_i}}^2 + c_2\right)}$$
The correlation coefficient (CC) describes the degree of spectral similarity between the reference image and the fused image. The CC ranges from −1 to 1; the closer the CC is to one, the higher the positive correlation. The definition is as follows.
$$\mathrm{CC}(\mathcal{Y}, \mathcal{Y}_S) = \frac{\sum_{i=1}^{N_W} \sum_{j=1}^{N_H} \left(Y_{i,j} - \mathrm{AVE}_Y\right)\left(Y_{S_{i,j}} - \mathrm{AVE}_{Y_S}\right)}{\sqrt{\sum_{i=1}^{N_W} \sum_{j=1}^{N_H} \left(Y_{i,j} - \mathrm{AVE}_Y\right)^2} \sqrt{\sum_{i=1}^{N_W} \sum_{j=1}^{N_H} \left(Y_{S_{i,j}} - \mathrm{AVE}_{Y_S}\right)^2}}$$
The universal image quality index (UIQI) [70] is specifically used to detect the average correlation between the reference image and the fused image. If the fused image is closer to the reference image, the UIQI value is closer to a value of one, which means a better reconstruction effect. The definitions are as follows.
$$\mathrm{UIQI}(\mathcal{Y}, \mathcal{Y}_S) = \frac{1}{M} \sum_{i=1}^{M} \frac{4 \sigma_{Y_i Y_{S_i}} \bar{Y}_i \bar{Y}_{S_i}}{\left(\sigma_{Y_i}^2 + \sigma_{Y_{S_i}}^2\right)\left(\bar{Y}_i^2 + \bar{Y}_{S_i}^2\right)}$$
The root mean squared error (RMSE) is an important index describing the average error between each band of the reference image and the fused image. The closer the fused image is to the reference image, the closer the RMSE value is to 0. The definition is as follows.
$$\mathrm{RMSE}(\mathcal{Y}, \mathcal{Y}_S) = \sqrt{\frac{\left\|\mathcal{Y} - \mathcal{Y}_S\right\|_F^2}{N_W N_H N_S}}$$
The Erreur Relative Globale Adimensionnelle de Synthese (ERGAS) [71] is an important measure of the global quality and spectral variability of the reconstructed image. If ERGAS is closer to a value of 0, the reconstruction effect will be better. The definitions are as follows.
$$\mathrm{ERGAS}(\mathcal{Y}, \mathcal{Y}_S) = \frac{100}{s} \sqrt{\frac{1}{N_S} \sum_{i=1}^{N_S} \frac{\mathrm{MSE}\left(Y_i, Y_{S_i}\right)}{\mathrm{MEAN}^2\left(Y_{S_i}\right)}}$$
The spectral angle mapper (SAM) [72] is a metric used to measure the magnitude of the angle between the pixel spectra of the source image and the fused image. If the SAM value is closer to a value of 0, the two images match more, and the similarity is higher. The definitions are as follows.
$$\mathrm{SAM}(\mathcal{Y}, \mathcal{Y}_S) = \frac{1}{N_W N_H} \sum_{i=1}^{N_W N_H} \arccos\left(\frac{\left\langle \mathbf{y}_i, \mathbf{y}_{S_i} \right\rangle}{\|\mathbf{y}_i\|_2 \, \|\mathbf{y}_{S_i}\|_2}\right)$$
where $Y_i$ and $Y_{S_i}$ are the $i$th bands of the reference and fused images, $\mathbf{y}_i$ and $\mathbf{y}_{S_i}$ are the spectral vectors at pixel $i$, and $\mathrm{AVE}_Y$ and $\mathrm{AVE}_{Y_S}$ are the average pixel values of the reference and fused images, respectively. $\bar{Y}_i$ and $\bar{Y}_{S_i}$ are the mean values of $Y_i$ and $Y_{S_i}$, respectively. $\sigma$, $\sigma^2$, and $\sigma_{\cdot,\cdot}$ are used to denote the standard deviation, variance, and covariance, respectively; $c_1$ and $c_2$ are constants, and $s$ is the ratio of spatial resolutions. Here, the larger the PSNR value, the closer the SSIM, CC, and UIQI values are to 1, and the closer the RMSE, ERGAS, and SAM values are to 0, the better the fusion effect.
The signal-to-noise ratio (SNR) is the ratio of signal power to noise power in an image. A high SNR is reflected in a clean, noise-free image, while a low SNR makes the image rough and noisy, with murky quality and insufficient contrast. The definition is as follows.
$$\mathrm{SNR} = \frac{\mathrm{Signal}}{\mathrm{Noise}}$$
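Minimal sketches of three of these metrics (per-band PSNR averaged over bands, global RMSE, and SAM in radians) are given below; the peak value and normalization conventions are assumptions:

```python
import numpy as np

def psnr(ref, fused, peak=1.0):
    # Mean over bands of 10*log10(peak^2 / per-band MSE).
    mse = ((ref - fused) ** 2).mean(axis=(0, 1))
    return float((10.0 * np.log10(peak ** 2 / mse)).mean())

def rmse(ref, fused):
    return float(np.sqrt(((ref - fused) ** 2).mean()))

def sam(ref, fused, eps=1e-12):
    # Mean spectral angle over all pixels, in radians.
    r = ref.reshape(-1, ref.shape[2])
    f = fused.reshape(-1, fused.shape[2])
    cos = (r * f).sum(1) / (np.linalg.norm(r, axis=1) * np.linalg.norm(f, axis=1) + eps)
    return float(np.arccos(np.clip(cos, -1.0, 1.0)).mean())
```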

4.4. Classification Accuracy Evaluation

Here, we examine the effect of the HRHS on the pixel-level classification task. The fused images obtained by each fusion algorithm contain 11,781 labeled samples in the categories Asphalt, Meadows, Trees, Painted Metal Sheets, Self-Blocking Bricks, and Shadows, with 2705, 5195, 1089, 1338, 1250, and 204 samples, respectively. The fused images are used as the dataset for the classification algorithm; 1% of the labeled samples of each category, selected at random, are used as training samples, and the remaining 99% are used as test samples. In addition, 1.5% of each category is taken as the validation set. The classification results of each HRHS were evaluated using three metrics: the overall accuracy (OA), average accuracy (AA), and kappa coefficient ($\kappa$). Detailed definitions are as follows.
The overall accuracy (OA) refers to the ratio of the number of correctly classified pixels to the total number of pixels in an HRHS.
The kappa coefficient ( κ ) is an index to evaluate the degree of agreement or accuracy between the classification results and the actual features and is calculated as follows:
$$\kappa = \frac{N \sum_{i=1}^{g} w_{ii} - \sum_{i=1}^{g} w_{i+} w_{+i}}{N^2 - \sum_{i=1}^{g} w_{i+} w_{+i}}$$
where $g$ is the total number of categories, i.e., the number of columns of the confusion matrix; $w_{ii}$ is the number of correctly classified pixels, i.e., the count of pixels in row $i$ and column $i$ of the confusion matrix; $w_{i+}$ and $w_{+i}$ are the total pixel counts of row $i$ and column $i$, respectively; and $N$ indicates the total number of pixels used for accuracy estimation.
The average accuracy (AA) indicates the mean of the per-category classification accuracies.
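Given a confusion matrix $W$, the three scores can be computed as in the following sketch (rows as true classes and columns as predicted classes is an assumed layout):

```python
import numpy as np

def classification_scores(W):
    # W: g x g confusion matrix; w_{i+} are row sums, w_{+i} column sums.
    N = W.sum()
    oa = np.trace(W) / N                              # overall accuracy
    aa = (np.diag(W) / W.sum(axis=1)).mean()          # mean per-class accuracy
    chance = (W.sum(axis=1) * W.sum(axis=0)).sum()    # sum_i w_{i+} * w_{+i}
    kappa = (N * np.trace(W) - chance) / (N ** 2 - chance)
    return oa, aa, kappa
```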

4.5. Model Parameter Selection

4.5.1. Selection of the Parameters λ and η

The parameters appearing in the model affect the effectiveness of the fusion algorithm. The regularization parameter $\lambda$ and the smoothing parameter $\eta$ are chosen via ablation experiments, as follows. Firstly, $\lambda$ and $\eta$ are randomly selected for experiments to roughly estimate the interval of better results. Secondly, we fix the value of one parameter and vary the other, observing changes in the evaluation indicators, such as the PSNR, to find a better value. Finally, the optimal solution is sought around that value. Following this procedure, the parameter selection process for the Pavia University dataset is shown in Figure 6 and Figure 7. Fixing one parameter, the other is scaled down by a factor of 10 each time; observing the changes in the PSNR and other indexes, the evaluation indexes reach their optimum at $\lambda = 0.01$, $\eta = 0.001$, where the fusion effect is best. (Note: the parameter values for the other datasets were selected by the same method.)

4.5.2. Rough Estimation of the Ranks L and R

For the LL1-based fusion algorithm, the ranks $L$ and $R$ of the latent factors affect the overall performance of the algorithm. The selection processes for the ranks $L$ and $R$ on the Pavia University dataset and the Indian Pines dataset are listed here; the data in the experiments are the averages of ten independent runs.

4.5.3. Estimation of L and R on Pavia University Dataset

Regarding the rough estimates of the ranks $L$ and $R$ in the experiments on the Pavia University dataset, Figure 8 shows the changes in the evaluation metrics, such as the PSNR, SSIM, and CC, for different values of $L$. The value of $L$ is estimated by fixing the rank $R$; here, $R = 25$. $L$ grows from 10 in steps of 5. It is clearly observed from Figure 8a that the PSNR values vary more significantly for $L$ in the range $[10, 30]$ and remain essentially flat in $[30, 45]$. In Figure 8b, the three indicators SSIM, CC, and UIQI change more significantly for $L$ in $[10, 20]$. In summary, the optimal value of the rank $L$ can be roughly estimated to be around $L = 30$.
Figure 9 shows the variation of the evaluation metrics, such as the PSNR, SSIM, and CC, for different values of $R$ on the Pavia University dataset. The rank $L$ is fixed at $L = 35$ to estimate $R$, and $R$ grows from 10 in steps of 5. As seen in Figure 9a, the PSNR values increase rapidly for $R$ in $[10, 15]$, change slowly over $[15, 30]$, and remain essentially flat in $[30, 45]$. In Figure 9b,c, when $R$ is in $[10, 15]$, the SSIM, CC, and UIQI improve significantly, while the RMSE and SAM remain basically stable. In Figure 9c, there is a significant increase in the ERGAS values in the range $[10, 25]$. In the remaining ranges, all tested metrics level off without major improvement. In summary, the optimal value of the rank $R$ can be roughly estimated to be around $R = 25$.

4.5.4. Estimation of L and R on Indian Pines Dataset

Regarding the rough estimates of the ranks $L$ and $R$ in the Indian Pines experiment, Figure 10 shows the changes in the evaluation metrics, such as the PSNR, SSIM, and CC, for different values of $L$. The rank $R$ is fixed at $R = 25$ to estimate $L$, and $L$ grows from 5 in steps of 5. As observed in Figure 10a, the PSNR values improve significantly for $L$ in the range $[10, 30]$, decrease slightly in $[30, 35]$, and are more stable in $[35, 40]$. In Figure 10b,c, for $L$ in the interval $[5, 20]$, test indexes such as the SSIM and CC show more obvious improvement. In summary, the optimal value of the rank $L$ can be roughly estimated to be around $L = 20$.
Figure 11 shows the changes in the evaluation metrics, such as the PSNR, SSIM, and CC, for different values of $R$ on the Indian Pines dataset. The rank $L$ is fixed at $L = 30$ to estimate $R$, and $R$ grows from 5 in steps of 5. As observed in Figure 11a, the PSNR values are enhanced most significantly for $R$ in the range $[5, 10]$; over $[10, 30]$, the PSNR fluctuates up and down but largely maintains an upward trend. In Figure 11b,c, $R$ yields a more significant improvement in test metrics, such as the SSIM and CC, in the range $[5, 10]$; in the remaining ranges, the improvement is not significant. In summary, an inappropriate choice of the rank $R$ suppresses the fusion performance, so the optimal value of the rank $R$ is roughly estimated to be around $R = 10$.

4.5.5. Signal-To-Noise Ratio Selection

The signal-to-noise ratio (SNR) is also one of the important factors affecting the fusion results. The SNR of each algorithm is different, and it is necessary to select an SNR that all algorithms can "accept" to avoid comparing optimal with non-optimal results. The selected SNRs (unit: dB) range over $[15, 40]$ with a 5 dB step. All experiments were conducted on the Pavia University dataset, with the results shown in Figure 12. For SNR values in $[15, 30]$, evaluation indicators such as the PSNR, SSIM, and CC improve significantly; in the range $[30, 40]$, all evaluation indicators tend to be stable. This shows that the performance of all algorithms is maximally released at $\mathrm{SNR} = 30$ dB.

4.6. Regularization Validity Discussion

The 2D-CNBTD algorithm was proposed in previous work [55], but it is not perfect, suffering from low efficiency, long running time, and noise/artifacts. Therefore, based on these considerations, a new algorithm (JSSLL1) is proposed in this paper. Table 1 gives the numerical evaluation metrics of the three algorithms, BTD, 2D-CNBTD, and JSSLL1, on the Pavia University, Indian Pines, and Ningxia Sand Lake datasets. The better values are indicated in bold, and the best time is noted as 0 s. The evaluation metrics, such as the PSNR and SSIM, are described in Section 4.3. Among these, the PSNR is used to evaluate image distortion; the SSIM and UIQI evaluate the spatial features of the fused images; the CC, RMSE, and other metrics evaluate the spectral features. From Table 1, it can be seen that the spatial feature metrics SSIM and UIQI of 2D-CNBTD and JSSLL1 are greatly improved compared with the unimproved BTD, and the spectral feature metrics CC, RMSE, and SAM are also significantly improved. The PSNR improves significantly as well, indicating that noise can be effectively removed from the images. The results of 2D-CNBTD and JSSLL1 on the Pavia University and Ningxia Sand Lake datasets show no obvious difference in the spatial and spectral feature indexes; on the Ningxia Sand Lake dataset, the PSNR value of JSSLL1 is better, and on the Indian Pines dataset, every metric of JSSLL1 is closer to the optimal value. In summary, the JSSLL1 algorithm can effectively suppress noise, is suitable for more complex environments, has lower computational complexity, and achieves the goal in the least time.

4.7. Standard Dataset Experimental Results and Discussion

This section shows the experimental results of the eight fusion algorithms on the Pavia University dataset and the Indian Pines dataset. The experimental results include visual images of the fusion results, difference images, and the numerical results of evaluation metrics.

4.7.1. Pavia University Dataset Experimental Results and Discussion

Through the above experiments, the parameters for the Pavia University dataset are set as $\lambda = 0.01$, $\eta = 0.001$, and $\mathrm{SNR} = 30$ dB. Due to the specificity of the model in this paper, the estimated rank can be "overestimated", so the roughly estimated ranks $R$ and $L$ are adjusted slightly upwards and set to $L = 35$, $R = 25$. Algorithm superiority was evaluated by observing the visualized images. For the Pavia University dataset, bands 61, 25, and 13 were selected as the red, green, and blue bands, respectively, and fused to generate pseudo-color images. The HRHS is shown in Figure 13; Figure 13a shows the original image of the Pavia University dataset. In order to see the differences in the HRHSs more clearly, the red-boxed part of each HRHS image is enlarged, and the enlarged part is shown in the upper right corner. The difference images of all algorithms are shown in Figure 14.
As can be seen in the enlarged part of Figure 13, compared with the reference image, some spatial information is lost in the HRHS of CNMF; a large number of blurred blocks and considerable noise appear in the fusion result of SCOTT, with serious loss of spatial information; the fusion results of SCLL1 and the BTD are clearer, with no blurred blocks or noise visible, but some spatial information is lost. The HySure and STEREO fusion results show a low loss of spectral and spatial information; no noise or blurred blocks are seen in the fusion result of the proposed algorithm, and the spatial information is also well preserved.
Figure 14 shows the difference images obtained by subtracting the fused images from the target image. Among these, the difference images of CNMF, HySure, FUSE, IRSR, and SCLL1 contain a lot of texture information; SCOTT has less texture information; the difference images of STEREO, the BTD, and the proposed algorithm are similar, without much visible texture, but, in comparison, STEREO and the BTD still retain some building contour information.
When the PSNR is greater than 30 dB, it is difficult to observe differences in the images with the naked eye, and the algorithms need to be evaluated in terms of both spectral and spatial features with the help of evaluation metrics. The evaluation indexes for the Pavia dataset are shown in Table 2. As can be seen from the table, among the metrics describing spectral and spatial information, the SSIM, CC, RMSE, ERGAS, and SAM of the proposed algorithm are closest to the optimal values, and only the UIQI is slightly inferior to the CNMF algorithm, which means that the proposed algorithm performs best in terms of spatial information, spectral information, image contrast, and brightness when reconstructing images. In terms of PSNR, the proposed algorithm can effectively reject noise. Under the same number of iterations and in the same environment, the running time of the proposed algorithm is significantly improved compared with the BTD and SCLL1.

4.7.2. Pavia University Dataset Classification Experimental Results and Discussion

Image classification is one of the important application scenarios for high-resolution hyperspectral images, and acquiring such images is another important purpose of image fusion. A high-quality fused image makes the classification experiment achieve twice the result with half the effort. The experimental data contain information about buildings, vegetation, roads, bare soil, and other features. Therefore, a lightweight spectral–spatial convolutional network ($LS^2CM$) [73] with low complexity and few network parameters was chosen to perform classification experiments on the fused images from Pavia University, as well as on the original images, for each algorithm. A 3 × 3 convolutional (Conv) filter was used for the experiments; a filter with a 1 × 1 convolution kernel was chosen for spectral feature extraction, and a 3 × 3 filter was chosen for spatial feature extraction. All other parameter settings follow the literature [73]. Each experiment was repeated ten times. The results of the classification experiments on the Pavia dataset are shown in Table 3.
Table 3 reports the HRHS classification results of each algorithm in terms of the evaluation metrics. Note that the HRHS images of CNMF, HySure, and the other algorithms are reconstructed from inputs obtained by downsampling the original image, so the closer a classification result is to that of the original image, the better. By this measure, the HRHS classification results of the proposed algorithm, STEREO, and the BTD are closest to the reference image; those of CNMF and HySure rank second; and those of FUSE, SCOTT, and SCLL1 are the worst.

4.7.3. Indian Pines Dataset Fusion Experiment Results and Discussion

The parameters for the Indian Pines dataset are set to λ = 0.01, η = 0.001, and SNR = 35 dB. Because the proposed model tolerates rank "overestimation", the roughly estimated ranks R and L are adjusted slightly upward and set to L = 25 and R = 10. Again, bands 61, 25, and 13 are selected as the red, green, and blue channels, respectively, and fused to generate pseudo-color images. The HRHS results are shown in Figure 15, where Figure 15a shows the original image of the Indian Pines dataset. To show the differences in the HRHS images more clearly, the red-boxed region of each HRHS image is enlarged and displayed in the upper right corner. The difference images for all algorithms are shown in Figure 16.
As can be seen from the enlarged regions in Figure 15, compared with the reference image in Figure 15a, the HRHS of CNMF appears distorted, and the HRHS images of HySure, SCOTT, STEREO, and the BTD contain large blurred blocks and noise, with severe loss of spatial information. The HRHS images of FUSE, SCLL1, and the proposed algorithm are clear, with neither noise nor blurred blocks visible and little spatial information lost.
Figure 16 shows the difference images of each algorithm on the Indian Pines dataset. The difference images of CNMF, HySure, FUSE, SCOTT, LRSR, the BTD, and SCLL1 show considerable spatial texture, indicating a serious loss of spatial information in their HRHS results; the difference images of STEREO and the proposed algorithm show little spatial texture, indicating that their HRHS images are closer to the reference image.
The data in Table 4 show that, among the metrics describing spectral and spatial information, the UIQI, CC, RMSE, ERGAS, and SAM of the proposed algorithm are closest to the optimal values, and only the SSIM is slightly inferior to that of SCLL1. This indicates that the proposed algorithm loses less spatial and spectral information during reconstruction and that its fused HRHS images are closer to the original dataset. In terms of the PSNR, the proposed algorithm suppresses noise better; meanwhile, its running time is shorter than that of the BTD and SCLL1.

4.8. Local Dataset Fusion Experimental Results and Discussion

The Ningxia Sand Lake dataset was acquired by two high-resolution satellites passing over Sand Lake in succession; the original data were aligned and corrected with ENVI 5.3 software. Experimental verification showed that λ and η did not need to be changed, and R and L were set to R = 20 and L = 30. Bands 59, 38, and 20 were then selected as the red, green, and blue channels, respectively, and fused to generate pseudo-color images. The HRHS results are shown in Figure 17, where Figure 17a,b show the HR-MSI and LR-HSI acquired by the satellites, respectively. To show the HRHS differences between algorithms more clearly, the red-boxed region of each HRHS image is enlarged and displayed in the upper right corner.
As can be seen from the enlarged regions in Figure 17, compared with the reference image, the HRHS images of CNMF and STEREO are distorted. The many blurred blocks and the noise in the HRHS images of SCOTT and STEREO show that spatial information was lost during fusion, and the HRHS images of FUSE and the BTD contain a small number of blurred blocks. In comparison, the HRHS of the proposed algorithm is clearer, with neither noise nor blurred blocks visible and less spatial information lost.
The HySure and SCLL1 algorithms produce outliers on the Ningxia Sand Lake dataset, and the LRSR algorithm requires the collected data to be downsampled before local-dataset experiments, which amounts to "pseudo-fusion", whereas the proposed algorithm fuses the two images directly; these three algorithms are therefore excluded from the comparison. The numerical results of the remaining algorithms are listed in Table 5, where bold indicates the best value. Among the metrics describing spatial and spectral information, the UIQI, CC, RMSE, ERGAS, and SAM of the proposed algorithm are closest to the optimal values, and only the SSIM is slightly lower than that of the BTD. This means the proposed algorithm preserves the spatial structure and spectral information of the LR-HSI and HR-MSI well while reconstructing the HRHS image; its PSNR value is also the highest, showing that it performs well in suppressing noise.

4.9. Analysis of Algorithm Complexity

The computational complexity of the model is dominated by matrix operations and element-wise computations. To estimate it, we analyze the cost of each term of the objective and sum the results. Specifically, the model consists of the following components:
First term: $\frac{1}{2}\left\|\mathcal{Y}_H-\sum_{r=1}^{R}\left(\mathbf{P}_1\mathbf{A}_r\mathbf{B}_r^{T}\mathbf{P}_2^{T}\right)\circ\mathbf{c}_r\right\|_F^2$. Second term: $\frac{1}{2}\left\|\mathcal{Y}_M-\sum_{r=1}^{R}\left(\mathbf{A}_r\mathbf{B}_r^{T}\right)\circ\left(\mathbf{P}_3\mathbf{c}_r\right)\right\|_F^2$. Regularization term: $\lambda\sum_{r=1}^{R}\sqrt{\sum_{l=1}^{L}\left(\left\|\mathbf{a}_{rl}\right\|_2^2+\left\|\mathbf{b}_{rl}\right\|_2^2\right)+\eta^2}\,\sqrt{\left\|\mathbf{c}_r\right\|_2^2+\eta^2}$.
For each term, we can separately consider the computational complexity of matrix operations and element-wise computations.
The first term involves matrix multiplication, element-wise multiplication, and summation, of which the matrix multiplication is the most time-consuming. Assuming $\mathcal{Y}_H$ has dimensions H × W, $\mathbf{A}_r$ and $\mathbf{B}_r$ have dimensions H × K and W × K, and $\mathbf{P}_1$ and $\mathbf{P}_2$ are dimension-adaptation matrices, the product $\mathbf{P}_1\mathbf{A}_r\mathbf{B}_r^{T}\mathbf{P}_2^{T}$ costs approximately O(HWK), followed by element-wise multiplication and summation costing O(HW).
The second term is analogous: assuming $\mathcal{Y}_M$ has dimensions H × W, $\mathbf{A}_r$ and $\mathbf{B}_r$ have dimensions H × K and W × K, and $\mathbf{P}_3$ is a dimension-adaptation matrix, the product $\mathbf{A}_r\mathbf{B}_r^{T}$ costs O(HWK), followed by element-wise multiplication and summation costing O(HW).
The regularization term involves element-wise operations such as squaring, square root, summation, and multiplication. Assuming each vector a r l , b r l , and c r has a dimension of K, the computational complexity for each r is approximately O ( K ) , resulting in a total complexity of O ( R K ) .
Therefore, adding up the complexities of these three components, the overall computational complexity of the model is approximately O ( H W K + H W + R K ) .
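To make these terms concrete, the following NumPy sketch evaluates the objective on small synthetic data. All dimensions are illustrative assumptions, the degradation matrices P1, P2, and P3 are random stand-ins, and the regularizer is written in the smoothed coupled form reconstructed above.

```python
import numpy as np

# Illustrative sizes (not the paper's datasets): HR grid H x W with S bands,
# LR grid h x w, s multispectral bands, R block terms of rank L.
H, W, S, h, w, s, R, L = 40, 40, 30, 10, 10, 4, 3, 5

rng = np.random.default_rng(0)
A = rng.random((H, L, R)); B = rng.random((W, L, R)); C = rng.random((S, R))
P1 = rng.random((h, H)); P2 = rng.random((w, W)); P3 = rng.random((s, S))

def ll1_tensor(A, B, C):
    # X = sum_r (A_r B_r^T) outer c_r  -- the LL1 block-term model
    return np.einsum('ilr,jlr,kr->ijk', A, B, C)

X = ll1_tensor(A, B, C)
Y_H = np.einsum('pi,ijk,qj->pqk', P1, X, P2)  # spatially degraded LR-HSI
Y_M = np.einsum('ijk,sk->ijs', X, P3)         # spectrally degraded HR-MSI

def objective(A, B, C, lam=0.01, eta=0.001):
    X = ll1_tensor(A, B, C)
    f1 = 0.5 * np.linalg.norm(np.einsum('pi,ijk,qj->pqk', P1, X, P2) - Y_H) ** 2
    f2 = 0.5 * np.linalg.norm(np.einsum('ijk,sk->ijs', X, P3) - Y_M) ** 2
    # smoothed regularizer coupling each (A_r, B_r) block with c_r
    ab = np.sqrt((A ** 2).sum(axis=(0, 1)) + (B ** 2).sum(axis=(0, 1)) + eta ** 2)
    c = np.sqrt((C ** 2).sum(axis=0) + eta ** 2)
    return f1 + f2 + lam * np.sum(ab * c)

print(objective(A, B, C))  # fidelity terms vanish here; only the regularizer remains
```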

5. Summary

In this paper, a coupled non-negative block-term tensor decomposition model based on joint-structured sparsity is proposed for estimating HRHS images. A hierarchically structured regularization scheme is adopted to address the unknown tensor rank, the long running time of existing algorithms, the sparsity of the abundance matrix, and the scaling/counter-scaling effects that arise when the block-term tensor decomposition model is used to reconstruct the HRHS. Firstly, the sparsity of the abundance matrix is ensured, and the scaling phenomenon in the model is countered, by stacking the abundance matrices into a block matrix and adding the L 2 , 1 -norm; secondly, the counter-scaling effect is suppressed by adding the L 2 -norm to each endmember vector; finally, the block matrix and the endmember matrix are reweighted and coupled with an additional L 2 , 1 -norm, which makes it possible to select regularization parameters that tolerate an overestimated tensor rank when the true rank is unknown, thereby facilitating the elimination of redundant abundance matrices and their corresponding endmember vectors. On this basis, experiments were conducted on standard and local datasets, and the model was solved with the extended IRLS method. The experimental results show that the proposed algorithm performs well on both spatial and spectral information, in terms of both visual inspection and numerical comparison, and that its fusion results are closer to the reference image. Some limitations remain. The performance of the algorithm depends on several preset parameters and hyperparameters, such as the rank of the factorization and the regularization parameters, so different settings may lead to different results and careful tuning is required. Moreover, because of the many matrix operations, element-wise operations, and iterative optimization steps, the computational complexity is high, which can be time-consuming on large datasets. In future work, we hope to further reduce the running time of the algorithm without compromising its overall performance.

Author Contributions

Funding acquisition, W.B.; Validation, K.Q. and W.F.; Writing—original draft, H.G.; Writing—review and editing, S.S. and C.M. All authors have read and agreed to the published version of the manuscript.

Funding

This study received support from multiple sources, including the National Natural Science Foundation of China (Project No. 62201438), the Ningxia Key R&D Program of China (Project No. 2021BEG03030), and the Innovation Projects for Graduate Students of North Minzu University (Project No. YCX23151).

Data Availability Statement

Publicly available datasets were analyzed in this study. These data can be found here: https://www.ehu.eus/ccwintco/index.php?title=Hyperspectral_Remote_Sensing_Scenes.

Acknowledgments

The authors would like to thank the Image and Intelligence Information Processing Innovation Team of the National Ethnic Affairs Commission of China for their support.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
MHF: Multispectral and hyperspectral image fusion
HRHS: High spatial resolution hyperspectral image
LR-HSI: Low-resolution hyperspectral image
HR-MSI: High-resolution multispectral image
HSR: Hyperspectral super-resolution
PAN: Panchromatic image
CPD: Canonical polyadic decomposition
TV: Total variation
PAO: Proximal alternating optimization
ADMM: Alternating direction method of multipliers
SNR: Signal-to-noise ratio
PCA: Principal component analysis
PSNR: Peak signal-to-noise ratio
SSIM: Structural similarity
CC: Correlation coefficient
UIQI: Universal image quality index
RMSE: Root mean squared error
ERGAS: Erreur Relative Globale Adimensionnelle de Synthèse
SAM: Spectral angle mapper

Appendix A. Symbol Definition

Bold uppercase and lowercase letters are used to represent matrices and vectors, respectively. Higher-order tensors are denoted by $\mathcal{Y}$; for a tensor $\mathcal{Y}$, $\mathbf{Y}_{(n)}$ denotes its mode-n unfolding. $\otimes$ denotes the Kronecker product. The Khatri–Rao product is denoted by $\odot$ in its general version and by $\odot_c$ in the column-wise version. $\circ$ denotes the outer product. The superscript $T$ indicates a transpose. The identity matrix of order $N$ is denoted by $\mathbf{I}_N$. The $L_1$-norm, the Euclidean norm, the mixed $L_{2,1}$-norm, and the Frobenius norm are denoted by $\|\cdot\|_1$, $\|\cdot\|_2$, $\|\cdot\|_{2,1}$, and $\|\cdot\|_F$, respectively. $\mathbb{R}$ denotes the real number field.
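To make the notation concrete, a small NumPy/SciPy example of the two products (illustrative only):

```python
import numpy as np
from scipy.linalg import khatri_rao  # column-wise Khatri-Rao product

A = np.arange(6).reshape(2, 3)
B = np.arange(12).reshape(4, 3)

K = np.kron(A, B)       # Kronecker product: shape (2*4, 3*3) = (8, 9)
KR = khatri_rao(A, B)   # column-wise Khatri-Rao: shape (2*4, 3) = (8, 3)

# Each column of the Khatri-Rao product is the Kronecker product
# of the corresponding columns of A and B.
assert np.allclose(KR[:, 0], np.kron(A[:, 0], B[:, 0]))
```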

Appendix B. Algorithm Convergence Discussion

$L_2$-norm regularization is a common way to prevent model overfitting, and the $L_{2,1}$-norm is a hybrid of the $L_1$- and $L_2$-norms: it retains the overfitting-prevention property of the $L_2$-norm while also promoting sparsity like the $L_1$-norm, selecting important features and discarding unimportant ones. In this paper, the fusion model is given in Equation (17), and the minimization problems associated with its three factors $\mathbf{A}$, $\mathbf{B}$, and $\mathbf{C}$ are Equations (20), (22) and (24).
Regarding the general IRLS [66], the expression is as follows:
$$\mathbf{x}^{k+1}=\arg\min_{\mathbf{x}}\ \frac{1}{2}\left\|\mathbf{b}-\mathbf{R}\mathbf{x}\right\|_2^2+\frac{\lambda}{2}\sum_{i=1}^{n}\frac{x_i^{2}}{\sqrt{\left(x_i^{k}\right)^{2}+\eta^{2}}}\qquad\text{(A1)}$$
The two reweighting matrices can be derived by comparing Equations (20), (22) and (24) with Equation (A1). The first matrix $\mathbf{O}_1$ is formed from the inverse of the summation terms outside the regularizer's square roots in Equation (17) and jointly weights the $(\mathbf{A},\mathbf{B})$ block; the second matrix $\mathbf{O}_2$ is formed from the inverse of the terms inside the summation and balances the corresponding columns of $\mathbf{A}_r$ and $\mathbf{B}_r$. This yields a hierarchical IRLS scheme. As the iterations progress, $\mathbf{O}_1$ and $\mathbf{O}_2$ move toward smaller weight values (i.e., decaying weights), which avoids overfitting. The conjugate gradient (CG) algorithm lies between the steepest descent method and Newton's method: it overcomes the slow convergence of steepest descent while avoiding Newton's need to store, compute, and invert the Hessian matrix. We therefore choose the CG algorithm to solve for the three factors $\mathbf{A}$, $\mathbf{B}$, and $\mathbf{C}$, ensuring that each factor subproblem converges.
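For intuition, the following NumPy sketch runs the general IRLS update of Equation (A1) on a synthetic sparse recovery problem. The direct linear solve stands in for the CG solver used in our algorithm, and the problem sizes and parameter values are illustrative assumptions.

```python
import numpy as np

def irls(b, R, lam=0.1, eta=1e-3, iters=50):
    """IRLS for Eq. (A1): each step solves the normal equations
    (R^T R + lam * diag(w)) x = R^T b with weights
    w_i = 1 / sqrt((x_i^k)^2 + eta^2)."""
    x = np.zeros(R.shape[1])
    for _ in range(iters):
        w = 1.0 / np.sqrt(x ** 2 + eta ** 2)  # decaying weights near zeros
        x = np.linalg.solve(R.T @ R + lam * np.diag(w), R.T @ b)
    return x

rng = np.random.default_rng(1)
R = rng.standard_normal((40, 60))
x_true = np.zeros(60)
x_true[[3, 17, 42]] = [1.5, -2.0, 0.7]
b = R @ x_true + 0.01 * rng.standard_normal(40)
x_hat = irls(b, R)
print(np.flatnonzero(np.abs(x_hat) > 0.1))  # large entries should appear at 3, 17, 42
```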

References

  1. Plowright, A.A.; Coops, N.C.; Chance, C.M.; Sheppard, S.R.; Aven, N.W. Multi-scale analysis of relationship between imperviousness and urban tree height using airborne remote sensing. Remote Sens. Environ. 2017, 194, 391–400. [Google Scholar] [CrossRef]
  2. Posselt, R.; Mueller, R.; Stöckli, R.; Trentmann, J. Remote sensing of solar surface radiation for climate monitoring—The CM-SAF retrieval in international comparison. Remote Sens. Environ. 2012, 118, 186–198. [Google Scholar] [CrossRef]
  3. Matikainen, L.; Karila, K.; Hyyppä, J.; Litkey, P.; Puttonen, E.; Ahokas, E. Object-based analysis of multispectral airborne laser scanner data for land cover classification and map updating. ISPRS J. Photogramm. Remote Sens. 2017, 128, 298–313. [Google Scholar] [CrossRef]
  4. Loncan, L.; De Almeida, L.B.; Bioucas-Dias, J.M.; Briottet, X.; Chanussot, J.; Dobigeon, N.; Fabre, S.; Liao, W.; Licciardi, G.A.; Simoes, M.; et al. Hyperspectral pansharpening: A review. IEEE Geosci. Remote Sens. Mag. 2015, 3, 27–46. [Google Scholar] [CrossRef]
  5. Thomas, C.; Ranchin, T.; Wald, L.; Chanussot, J. Synthesis of multispectral images to high spatial resolution: A critical review of fusion methods based on remote sensing physics. IEEE Trans. Geosci. Remote Sens. 2008, 46, 1301–1312. [Google Scholar] [CrossRef]
  6. Meng, X.; Shen, H.; Li, H.; Zhang, L.; Fu, R. Review of the pansharpening methods for remote sensing images based on the idea of meta-analysis: Practical discussion and challenges. Inf. Fusion 2019, 46, 102–113. [Google Scholar] [CrossRef]
  7. Carper, W.; Lillesand, T.; Kiefer, R. The use of intensity-hue-saturation transformations for merging SPOT panchromatic and multispectral image data. Photogramm. Eng. Remote Sens. 1990, 56, 459–467. [Google Scholar]
  8. Aiazzi, B.; Baronti, S.; Selva, M. Improving component substitution pansharpening through multivariate regression of MS + Pan data. IEEE Trans. Geosci. Remote Sens. 2007, 45, 3230–3239. [Google Scholar] [CrossRef]
  9. Liu, J. Smoothing filter-based intensity modulation: A spectral preserve image fusion technique for improving spatial details. Int. J. Remote Sens. 2010, 21, 3461–3472. [Google Scholar] [CrossRef]
  10. Aiazzi, B.; Alparone, L.; Baronti, S.; Garzelli, A.; Selva, M. An MTF-based spectral distortion minimizing model for pan-sharpening of very high resolution multispectral images of urban areas. In Proceedings of the 2003 2nd GRSS/ISPRS Joint Workshop on Remote Sensing and Data Fusion over Urban Areas, Berlin, Germany, 22–23 May 2003; IEEE: Piscataway, NJ, USA, 2003; pp. 90–94. [Google Scholar]
  11. Vivone, G.; Alparone, L.; Chanussot, J.; Dalla Mura, M.; Garzelli, A.; Licciardi, G.A.; Restaino, R.; Wald, L. A critical comparison among pansharpening algorithms. IEEE Trans. Geosci. Remote Sens. 2014, 53, 2565–2586. [Google Scholar] [CrossRef]
  12. Leung, Y.; Liu, J.; Zhang, J. An improved adaptive intensity–hue–saturation method for the fusion of remote sensing images. IEEE Geosci. Remote Sens. Lett. 2013, 11, 985–989. [Google Scholar] [CrossRef]
  13. Gomez, R.B.; Jazaeri, A.; Kafatos, M. Wavelet-based hyperspectral and multispectral image fusion. In Geo-Spatial Image and Data Exploitation II; SPIE: St Bellingham, WA, USA, 2001; Volume 4383, pp. 36–42. [Google Scholar]
  14. Zhang, Y.; He, M. Multi-spectral and hyperspectral image fusion using 3-D wavelet transform. J. Electron. (China) 2007, 24, 218–224. [Google Scholar] [CrossRef]
  15. Kwarteng, P.; Chavez, A. Extracting spectral contrast in Landsat Thematic Mapper image data using selective principal component analysis. Photogramm. Eng. Remote Sens. 1989, 55, 339–348. [Google Scholar]
  16. Li, H.; Manjunath, B.; Mitra, S.K. Multisensor image fusion using the wavelet transform. Graph. Model. Image Process. 1995, 57, 235–245. [Google Scholar] [CrossRef]
  17. Yokoya, N.; Grohnfeldt, C.; Chanussot, J. Hyperspectral and multispectral data fusion: A comparative review of the recent literature. IEEE Geosci. Remote Sens. Mag. 2017, 5, 29–56. [Google Scholar] [CrossRef]
  18. Yokoya, N.; Yairi, T.; Iwasaki, A. Coupled nonnegative matrix factorization unmixing for hyperspectral and multispectral data fusion. IEEE Trans. Geosci. Remote Sens. 2011, 50, 528–537. [Google Scholar] [CrossRef]
  19. Bendoumi, M.A.; He, M.; Mei, S. Hyperspectral image resolution enhancement using high-resolution multispectral image based on spectral unmixing. IEEE Trans. Geosci. Remote Sens. 2014, 52, 6574–6583. [Google Scholar] [CrossRef]
  20. Berné, O.; Helens, A.; Pilleri, P.; Joblin, C. Non-negative matrix factorization pansharpening of hyperspectral data: An application to mid-infrared astronomy. In Proceedings of the 2010 2nd Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing, Reykjavik, Iceland, 14–16 June 2010; IEEE: Piscataway, NJ, USA, 2010; pp. 1–4. [Google Scholar]
  21. Simoes, M.; Bioucas-Dias, J.; Almeida, L.B.; Chanussot, J. A convex formulation for hyperspectral image superresolution via subspace-based regularization. IEEE Trans. Geosci. Remote Sens. 2014, 53, 3373–3388. [Google Scholar] [CrossRef]
  22. Wei, Q.; Dobigeon, N.; Tourneret, J.Y. Fast fusion of multi-band images based on solving a Sylvester equation. IEEE Trans. Image Process. 2015, 24, 4109–4121. [Google Scholar] [CrossRef]
  23. Wei, Q.; Bioucas-Dias, J.; Dobigeon, N.; Tourneret, J.Y. Hyperspectral and multispectral image fusion based on a sparse representation. IEEE Trans. Geosci. Remote Sens. 2015, 53, 3658–3668. [Google Scholar] [CrossRef]
  24. Veganzones, M.A.; Simoes, M.; Licciardi, G.; Yokoya, N.; Bioucas-Dias, J.M.; Chanussot, J. Hyperspectral super-resolution of locally low rank images from complementary multisource data. IEEE Trans. Image Process. 2015, 25, 274–288. [Google Scholar] [CrossRef] [PubMed]
  25. Prévost, C.; Usevich, K.; Comon, P.; Brie, D. Hyperspectral super-resolution with coupled tucker approximation: Recoverability and SVD-based algorithms. IEEE Trans. Signal Process. 2020, 68, 931–946. [Google Scholar] [CrossRef]
  26. Dian, R.; Fang, L.; Li, S. Hyperspectral image super-resolution via non-local sparse tensor factorization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 5344–5353. [Google Scholar]
  27. Xu, T.; Huang, T.Z.; Deng, L.J.; Zhao, X.L.; Huang, J. Hyperspectral image superresolution using unidirectional total variation with tucker decomposition. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 4381–4398. [Google Scholar] [CrossRef]
  28. Li, S.; Dian, R.; Fang, L.; Bioucas-Dias, J.M. Fusing hyperspectral and multispectral images via coupled sparse tensor factorization. IEEE Trans. Image Process. 2018, 27, 4118–4130. [Google Scholar] [CrossRef]
  29. Zhang, K.; Wang, M.; Yang, S. Multispectral and hyperspectral image fusion based on group spectral embedding and low-rank factorization. IEEE Trans. Geosci. Remote Sens. 2016, 55, 1363–1371. [Google Scholar] [CrossRef]
  30. Li, W.; Du, Q. A survey on representation-based classification and detection in hyperspectral remote sensing imagery. Pattern Recognit. Lett. 2016, 83, 115–123. [Google Scholar] [CrossRef]
  31. Zhang, K.; Wang, M.; Yang, S.; Jiao, L. Spatial–spectral-graph-regularized low-rank tensor decomposition for multispectral and hyperspectral image fusion. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 1030–1040. [Google Scholar] [CrossRef]
  32. Li, Q.; Ma, W.K.; Wu, Q. Hyperspectral super-resolution: Exact recovery in polynomial time. In Proceedings of the 2018 IEEE Statistical Signal Processing Workshop (SSP), Breisgau, Germany, 10–13 June 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 378–382. [Google Scholar]
  33. Feng, Z.; Yang, S.; Wei, X.; Gao, Q.; Jiao, L. Regularized Sparse Band Selection via Learned Pairwise Agreement. IEEE Access 2020, 8, 40096–40105. [Google Scholar] [CrossRef]
  34. Huang, S.; Zhang, H.; Pižurica, A. Subspace clustering for hyperspectral images via dictionary learning with adaptive regularization. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–17. [Google Scholar] [CrossRef]
  35. Qin, X.; Ban, Y.; Wu, P.; Yang, B.; Liu, S.; Yin, L.; Liu, M.; Zheng, W. Improved image fusion method based on sparse decomposition. Electronics 2022, 11, 2321. [Google Scholar] [CrossRef]
  36. Zhang, D.; Wang, T.; Yang, S.; Jia, Y.; Li, F. Spectral reweighting and spectral similarity weighting for sparse hyperspectral unmixing. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5. [Google Scholar] [CrossRef]
  37. Chengqing, Y.; Guangxi, Y.; Chengming, Y.; Yu, Z.; Xiwei, M. A multi-factor driven spatiotemporal wind power prediction model based on ensemble deep graph attention reinforcement learning networks. Energy 2023, 263, 126034. [Google Scholar] [CrossRef]
  38. Wu, R.; Chan, C.H.; Wai, H.T.; Ma, W.K.; Fu, X. Hi, BCD! hybrid inexact block coordinate descent for hyperspectral super-resolution. In Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, 15–20 April 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 2426–2430. [Google Scholar]
  39. Wu, R.; Ma, W.K.; Fu, X.; Li, Q. Hyperspectral super-resolution via global–local low-rank matrix estimation. IEEE Trans. Geosci. Remote Sens. 2020, 58, 7125–7140. [Google Scholar] [CrossRef]
  40. Kawakami, R.; Matsushita, Y.; Wright, J.; Ben-Ezra, M.; Tai, Y.W.; Ikeuchi, K. High-resolution hyperspectral imaging via matrix factorization. In Proceedings of the CVPR 2011, Colorado Springs, CO, USA, 20–25 June 2011; IEEE: Piscataway, NJ, USA, 2011; pp. 2329–2336. [Google Scholar]
  41. Lanaras, C.; Baltsavias, E.; Schindler, K. Hyperspectral super-resolution by coupled spectral unmixing. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 3586–3594. [Google Scholar]
  42. Kahraman, S.; Ertürk, A.; Ertürk, S. Graph Regularized L 1/2-Sparsity Constrained Non-Negative Matrix Factorization for Hyperspectral and Multispectral Image Fusion. In Proceedings of the 2018 9th Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS), Amsterdam, The Netherlands, 23–26 September 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1–4. [Google Scholar]
  43. Mu, T.; Nie, R.; Ma, C.; Liu, J. Hyperspectral and panchromatic image fusion based on CNMF. In Proceedings of the 2021 3rd International Conference on Advances in Computer Technology, Information Science and Communication (CTISC), Shanghai, China, 23–25 April 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 293–297. [Google Scholar]
  44. Khader, A.; Yang, J.; Xiao, L. NMF-DuNet: Nonnegative matrix factorization inspired deep unrolling networks for hyperspectral and multispectral image fusion. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 5704–5720. [Google Scholar] [CrossRef]
  45. Han, Y.; Bao, W.; Zhang, X.; Ma, X.; Cao, M. Hyperspectral and multispectral data fusion via nonlocal low-rank learning. J. Appl. Remote Sens. 2022, 16, 016508. [Google Scholar] [CrossRef]
  46. Lin, B.; Guo, Y. Deep Hyperspectral and Multispectral Image Fusion via Probabilistic Matrix Factorization. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–14. [Google Scholar] [CrossRef]
  47. Kanatsoulis, C.I.; Fu, X.; Sidiropoulos, N.D.; Ma, W.K. Hyperspectral super-resolution: A coupled tensor factorization approach. IEEE Trans. Signal Process. 2018, 66, 6503–6517. [Google Scholar] [CrossRef]
  48. Xu, Y.; Wu, Z.; Chanussot, J.; Comon, P.; Wei, Z. Nonlocal coupled tensor CP decomposition for hyperspectral and multispectral image fusion. IEEE Trans. Geosci. Remote Sens. 2019, 58, 348–362. [Google Scholar] [CrossRef]
  49. Peng, Y.; Li, W.; Luo, X.; Du, J. Hyperspectral image superresolution using global gradient sparse and nonlocal low-rank tensor decomposition with hyper-laplacian prior. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 5453–5469. [Google Scholar] [CrossRef]
  50. Prévost, C.; Chainais, P.; Boyer, R. Fast fusion of hyperspectral and multispectral images: A tucker approximation approach. In Proceedings of the 2022 IEEE International Conference on Image Processing (ICIP), Bordeaux, France, 16–19 October 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 2076–2080. [Google Scholar]
  51. Zhao, Q.; Zhou, G.; Xie, S.; Zhang, L.; Cichocki, A. Tensor ring decomposition. arXiv 2016, arXiv:1606.05535. [Google Scholar]
  52. Chen, Y.; Zeng, J.; He, W.; Zhao, X.L.; Huang, T.Z. Hyperspectral and Multispectral Image Fusion Using Factor Smoothed Tensor Ring Decomposition. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–17. [Google Scholar] [CrossRef]
  53. Qian, Y.; Xiong, F.; Zeng, S.; Zhou, J.; Tang, Y.Y. Matrix-vector nonnegative tensor factorization for blind unmixing of hyperspectral imagery. IEEE Trans. Geosci. Remote Sens. 2016, 55, 1776–1792. [Google Scholar] [CrossRef]
  54. Zhang, G.; Fu, X.; Huang, K.; Wang, J. Hyperspectral super-resolution: A coupled nonnegative block-term tensor decomposition approach. In Proceedings of the 2019 IEEE 8th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), Guadeloupe, West Indies, 15–18 December 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 470–474. [Google Scholar]
  55. De Lathauwer, L. Decompositions of a higher-order tensor in block terms—Part I: Lemmas for partitioned matrices. SIAM J. Matrix Anal. Appl. 2008, 30, 1022–1032. [Google Scholar] [CrossRef]
  56. De Lathauwer, L. Decompositions of a higher-order tensor in block terms—Part II: Definitions and uniqueness. SIAM J. Matrix Anal. Appl. 2008, 30, 1033–1066. [Google Scholar] [CrossRef]
  57. De Lathauwer, L.; Nion, D. Decompositions of a higher-order tensor in block terms—Part III: Alternating least squares algorithms. SIAM J. Matrix Anal. Appl. 2008, 30, 1067–1083. [Google Scholar] [CrossRef]
  58. Ding, M.; Fu, X.; Huang, T.Z.; Wang, J.; Zhao, X.L. Hyperspectral super-resolution via interpretable block-term tensor modeling. IEEE J. Sel. Top. Signal Process. 2020, 15, 641–656. [Google Scholar] [CrossRef]
  59. Guo, H.; Bao, W.; Qu, K.; Ma, X.; Cao, M. Multispectral and Hyperspectral Image Fusion Based on Regularized Coupled Non-Negative Block-Term Tensor Decomposition. Remote Sens. 2022, 14, 5306. [Google Scholar] [CrossRef]
  60. Hitchcock, F.L. The expression of a tensor or a polyadic as a sum of products. J. Math. Phys. 1927, 6, 164–189. [Google Scholar] [CrossRef]
  61. Tucker, L.R. Some mathematical notes on three-mode factor analysis. Psychometrika 1966, 31, 279–311. [Google Scholar] [CrossRef]
  62. Liu, N.; Li, W.; Wang, Y.; Tao, R.; Du, Q.; Chanussot, J. A survey on hyperspectral image restoration: From the view of low-rank tensor approximation. Inf. Sci. 2023, 66, 1–140302. [Google Scholar] [CrossRef]
  63. Hunyadi, B.; Camps, D.; Sorber, L.; Paesschen, W.V.; Vos, M.D.; Huffel, S.V.; Lathauwer, L.D. Block term decomposition for modelling epileptic seizures. EURASIP J. Adv. Signal Process. 2014, 2014, 139. [Google Scholar] [CrossRef]
  64. Beck, A. On the convergence of alternating minimization with applications to iteratively reweighted least squares and decomposition schemes. Optim. Online 2013, 25, 185–209. [Google Scholar] [CrossRef]
  65. Rontogiannis, A.A.; Kofidis, E.; Giampouras, P.V. Block-term tensor decomposition: Model selection and computation. IEEE J. Sel. Top. Signal Process. 2021, 15, 464–475. [Google Scholar] [CrossRef]
  66. Daubechies, I.; DeVore, R.; Fornasier, M.; Güntürk, C.S. Iteratively reweighted least squares minimization for sparse recovery. Commun. Pure Appl. Math. J. Issued Courant Inst. Math. Sci. 2010, 63, 1–38. [Google Scholar] [CrossRef]
  67. Liu, J.S.; Chen, R. Sequential Monte Carlo methods for dynamic systems. J. Am. Stat. Assoc. 1998, 93, 1032–1044. [Google Scholar] [CrossRef]
  68. Bioucas-Dias, J.M.; Plaza, A.; Dobigeon, N.; Parente, M.; Du, Q.; Gader, P.; Chanussot, J. Hyperspectral unmixing overview: Geometrical, statistical, and sparse regression-based approaches. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2012, 5, 354–379. [Google Scholar] [CrossRef]
  69. Sui, L.; Li, L.; Li, J.; Chen, N.; Jiao, Y. Fusion of hyperspectral and multispectral images based on a Bayesian nonparametric approach. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2019, 12, 1205–1218. [Google Scholar] [CrossRef]
  70. Wang, Z.; Bovik, A.C. A universal image quality index. IEEE Signal Process. Lett. 2002, 9, 81–84. [Google Scholar] [CrossRef]
  71. Wald, L. Quality of high resolution synthesised images: Is there a simple criterion? In Proceedings of the Third Conference “Fusion of Earth Data: Merging Point Measurements, Raster Maps and Remotely Sensed Images”, SEE/URISCA, Sophia Antipolis, France, 26–28 January 2000; pp. 99–103. [Google Scholar]
  72. Yuhas, R.H.; Goetz, A.F.; Boardman, J.W. Discrimination among semi-arid landscape endmembers using the spectral angle mapper (SAM) algorithm. In JPL, Summaries of the Third Annual JPL Airborne Geoscience Workshop. Volume 1: AVIRIS Workshop; NASA: Washington, DC, USA, 1992; Volume 1, pp. 147–149. [Google Scholar]
  73. Meng, Z.; Jiao, L.; Liang, M.; Zhao, F. A lightweight spectral-spatial convolution module for hyperspectral image classification. IEEE Geosci. Remote Sens. Lett. 2021, 19, 1–5. [Google Scholar] [CrossRef]
Figure 1. Canonical polyadic decomposition process.
Figure 2. The Tucker decomposition process.
Figure 3. LMN decomposition process.
Figure 4. LL1 decomposition model.
Figure 5. Algorithm flow chart.
Figure 6. Different λ values of the indicators. (a) Different λ values of PSNR. (b) Different λ values of SSIM, CC, and UIQI. (c) Different λ values of RMSE, ERGAS, and SAM.
Figure 7. Different η values of the indicators. (a) Different η values of PSNR. (b) Different η values of SSIM, CC, and UIQI. (c) Different η values of RMSE, ERGAS, and SAM.
Figure 8. Evaluation metrics for different L values under the Pavia University dataset. (a) Different L values of PSNR. (b) Different L values of SSIM, CC, and UIQI. (c) Different L values of RMSE, ERGAS, and SAM.
Figure 9. Evaluation metrics for different R values under the Pavia University dataset. (a) Different R values of PSNR. (b) Different R values of SSIM, CC, and UIQI. (c) Different R values of RMSE, ERGAS, and SAM.
Figure 10. Evaluation metrics for different L values under the Indian Pines dataset. (a) Different L values of PSNR. (b) Different L values of SSIM, CC, and UIQI. (c) Different L values of RMSE, ERGAS, and SAM.
Figure 11. Evaluation metrics for different R values under the Indian Pines dataset. (a) Different R values of PSNR. (b) Different R values of SSIM, CC, and UIQI. (c) Different R values of RMSE, ERGAS, and SAM.
Figure 12. Performance comparison of all algorithms under different SNRs. (a) PSNR of different algorithms. (b) SSIM of different algorithms. (c) CC of different algorithms. (d) UIQI of different algorithms. (e) RMSE of different algorithms. (f) ERGAS of different algorithms. (g) SAM of different algorithms. (h) Icons.
Figure 13. HRHSs of all algorithms on the Pavia University dataset.
Figure 14. Pavia University dataset difference image.
Figure 15. HRHSs of all algorithms on the Indian Pines dataset.
Figure 16. Indian Pines dataset difference image.
Figure 17. HRHSs of all algorithms on the Ningxia Sand Lake dataset.
Table 1. Comparison of metric values before and after the algorithm improvement (↑: higher is better; →1/→0: closer to 1/0 is better; time in seconds).

Pavia University:

| Metric | BTD | 2D-CNBTD | JSSLL1 |
|---|---|---|---|
| PSNR (↑) | 35.71 | 40.86 | 40.12 |
| SSIM (→1) | 0.8764 | 0.9632 | 0.9652 |
| CC (→1) | 0.9705 | 0.9932 | 0.9927 |
| UIQI (→1) | 0.8354 | 0.9321 | 0.9341 |
| RMSE (→0) | 0.0222 | 0.0104 | 0.0109 |
| ERGAS (→0) | 0.5454 | 0.2679 | 0.2819 |
| SAM (→0) | 0.0957 | 0.0558 | 0.0551 |
| Time (s) | 80.94 | 89.68 | 44.07 |

Indian Pines:

| Metric | BTD | 2D-CNBTD | JSSLL1 |
|---|---|---|---|
| PSNR (↑) | 25.38 | 32.42 | 33.79 |
| SSIM (→1) | 0.7436 | 0.9328 | 0.9515 |
| CC (→1) | 0.7091 | 0.8962 | 0.9338 |
| UIQI (→1) | 0.4067 | 0.6949 | 0.7565 |
| RMSE (→0) | 0.0306 | 0.0127 | 0.0112 |
| ERGAS (→0) | 0.4431 | 0.1530 | 0.1204 |
| SAM (→0) | 0.0850 | 0.0336 | 0.0261 |
| Time (s) | 7.26 | 11.62 | 5.53 |

Ningxia Sand Lake:

| Metric | BTD | 2D-CNBTD | JSSLL1 |
|---|---|---|---|
| PSNR (↑) | 36.97 | 39.49 | 42.39 |
| SSIM (→1) | 0.9753 | 0.9434 | 0.9261 |
| CC (→1) | 0.9960 | 0.9976 | 0.9978 |
| UIQI (→1) | 0.9390 | 0.9634 | 0.9518 |
| RMSE (→0) | 50.57 | 50.362 | 34.84 |
| ERGAS (→0) | 0.0882 | 0.0639 | 0.0647 |
| SAM (→0) | 0.0436 | 0.0309 | 0.0338 |
| Time (s) | 64.79 | 73.44 | 43.12 |
Table 2. Performance for the Pavia University dataset. Bold indicates the best value.

| Metric | CNMF | HySure | FUSE | SCOTT | LRSR | STEREO | BTD | SCLL1 | JSSLL1 |
|---|---|---|---|---|---|---|---|---|---|
| PSNR (↑) | 36.56 | 36.39 | 31.29 | 28.42 | 29.87 | 39.40 | 35.71 | 36.19 | **40.12** |
| SSIM (→1) | 0.9627 | 0.9606 | 0.9240 | 0.7884 | 0.8929 | 0.9545 | 0.8764 | 0.9290 | **0.9652** |
| CC (→1) | 0.9867 | 0.9853 | 0.9562 | 0.9458 | 0.9358 | 0.9913 | 0.9705 | 0.9795 | **0.9927** |
| UIQI (→1) | **0.9343** | 0.9331 | 0.8818 | 0.7123 | 0.8299 | 0.9168 | 0.8354 | 0.8937 | 0.9341 |
| RMSE (→0) | 0.0159 | 0.0169 | 0.0288 | 0.0314 | 0.0350 | 0.0120 | 0.0222 | 0.0198 | **0.0109** |
| ERGAS (→0) | 0.4165 | 0.4395 | 0.7370 | 0.7415 | 0.7652 | 0.3086 | 0.5454 | 0.4528 | **0.2819** |
| SAM (→0) | 0.0621 | 0.0684 | 0.0852 | 0.1051 | 0.1067 | 0.0643 | 0.0957 | 0.0932 | **0.0551** |
| Time (s) | 6.41 | 148.4 | **1.40** | 3.83 | 59.11 | 6.22 | 80.94 | 70.71 | 44.07 |
Table 3. Classification performance for the Pavia University dataset.

| Metric | CNMF | HySure | FUSE | SCOTT | STEREO | BTD | SCLL1 | JSSLL1 | Original Image |
|---|---|---|---|---|---|---|---|---|---|
| KAPPA × 100 | 96.52 ± 1.66 | 96.27 ± 0.88 | 94.40 ± 7.07 | 94.90 ± 5.08 | 97.78 ± 1.00 | 97.71 ± 2.13 | 94.99 ± 5.00 | 97.30 ± 2.36 | 97.93 ± 1.51 |
| OA (%) | 97.50 ± 1.21 | 97.32 ± 0.62 | 95.89 ± 5.31 | 96.24 ± 3.87 | 98.40 ± 0.71 | 98.36 ± 1.51 | 96.40 ± 3.54 | 98.08 ± 1.65 | 98.51 ± 1.09 |
| AA (%) | 95.52 ± 1.91 | 95.66 ± 1.67 | 95.43 ± 3.64 | 93.66 ± 5.50 | 97.09 ± 1.69 | 97.52 ± 1.35 | 94.56 ± 4.86 | 97.03 ± 1.90 | 97.20 ± 3.08 |
Table 4. Performance for the Indian Pines dataset. Bold indicates the best value.

| Metric | CNMF | HySure | FUSE | SCOTT | LRSR | STEREO | BTD | SCLL1 | JSSLL1 |
|---|---|---|---|---|---|---|---|---|---|
| PSNR (↑) | 27.55 | 30.44 | 31.24 | 27.76 | 26.90 | 26.27 | 25.38 | 32.08 | **33.79** |
| SSIM (→1) | 0.9356 | 0.9078 | 0.9259 | 0.8586 | 0.8670 | 0.8166 | 0.7436 | **0.9551** | 0.9515 |
| CC (→1) | 0.8175 | 0.8768 | 0.8793 | 0.7727 | 0.7754 | 0.8255 | 0.7091 | 0.8953 | **0.9338** |
| UIQI (→1) | 0.6437 | 0.6438 | 0.6650 | 0.4804 | 0.4458 | 0.6800 | 0.4067 | 0.7298 | **0.7565** |
| RMSE (→0) | 0.0241 | 0.0196 | 0.0167 | 0.0215 | 0.0264 | 0.4253 | 0.0306 | 0.0119 | **0.0112** |
| ERGAS (→0) | 0.2736 | 0.2160 | 0.1873 | 0.2707 | 0.7754 | 0.0200 | 0.4431 | 0.1540 | **0.1204** |
| SAM (→0) | 0.0664 | 0.0496 | 0.0401 | 0.0537 | 0.1793 | 0.3365 | 0.0850 | 0.0312 | **0.0261** |
| Time (s) | 2.73 | 47.44 | 0.75 | **0.58** | 31.54 | 4.05 | 7.26 | 23.85 | 5.72 |
Table 5. Performance for the Ningxia Sand Lake dataset. Bold indicates the best value.

| Metric | CNMF | FUSE | SCOTT | STEREO | BTD | JSSLL1 |
|---|---|---|---|---|---|---|
| PSNR (↑) | 13.57 | 36.20 | 20.44 | 5.52 | 36.97 | **42.385** |
| SSIM (→1) | 0.1649 | 0.9037 | 0.2383 | 0.2988 | **0.9753** | 0.92605 |
| CC (→1) | 0.4650 | 0.9953 | 0.8690 | 0.8818 | 0.9960 | **0.9978** |
| UIQI (→1) | 0.1773 | 0.9230 | 0.3617 | 0.3125 | 0.9390 | **0.9518** |
| RMSE (→0) | 1067.91 | 14.44 | 374.84 | 922.06 | 50.57 | **34.8429** |
| ERGAS (→0) | 0.8250 | 0.0780 | 0.3282 | 0.8229 | 0.0882 | **0.0648** |
| SAM (→0) | 0.2505 | 0.0702 | 0.1746 | 0.3341 | 0.0436 | **0.0338** |
| Time (s) | 27.54 | **1.91** | 8.26 | 19.82 | 64.79 | 43.12 |