Hyperspectral and Multispectral Image Fusion Using Coupled Non-Negative Tucker Tensor Decomposition

Fusing a low spatial resolution hyperspectral image (HSI) with a high spatial resolution multispectral image (MSI), aiming to produce a super-resolution hyperspectral image, has recently attracted increasing research interest. In this paper, a novel approach based on coupled non-negative tensor decomposition is proposed. The proposed method performs a tucker tensor factorization of a low resolution hyperspectral image and a high resolution multispectral image under the constraint of non-negative tensor decomposition (NTD). The conventional matrix factorization methods essentially lose spatio-spectral structure information when stacking the 3D data structure of a hyperspectral image into a matrix form. Moreover, the spectral, spatial, or their joint structural features have to be imposed from the outside as a constraint to well pose the matrix factorization problem. The proposed method has the advantage of preserving the spatio-spectral structure of hyperspectral images. In this paper, the NTD is directly imposed on the coupled tensors of the HSI and MSI. Hence, the intrinsic spatio-spectral structure of the HSI is represented without loss, and spatial and spectral information can be interdependently exploited. Furthermore, multilinear interactions of different modes of the HSIs can be exactly modeled with the core tensor of the Tucker tensor decomposition. The proposed method is straightforward and easy to implement. Unlike other state-of-the-art approaches, the complexity of the proposed approach is linear with the size of the HSI cube. Experiments on two well-known datasets give promising results when compared with some recent methods from the literature.


Introduction
Hyperspectral imagery utilizes a broad range of the electromagnetic spectrum to obtain information of the imaged scene, allowing better identification of materials or detection of processes. As each band of a HSI contains the spectral response to a narrow interval of the electromagnetic spectrum, it is necessary to collect reflectance from a wider area on the scene, decreasing the spatial resolution of HSIs. HSI acquisition instruments have extensive limitations for capturing high-resolution images, and there is always a tradeoff between spectral resolution and spatial resolution.
HRMSI to produce a HRHSI in a tensor framework. Note that the CP-based rank of a tensor is defined as the minimum of rank-one component tensors that are summed to express that tensor, which is unknown and generally not easy to estimate. Meanwhile, in the Tucker representation there is just one component tensor which is called the core tensor. It can comprehensively express the relations between the various modes and multilinear interactions among them.
Moreover, with the recent success of deep learning techniques in various image processing tasks, an increasing research interest arose for deep learning-based image fusion. In [27] a fusion method based on convolutional neural networks (CNN) was proposed, in which the two CNN branches are devoted to spectral features of the LR-HSI and spatial neighborhood features of the HR-MSI, respectively. In [28], an unsupervised deep approach was proposed for a blind fusion of HRMSI and LRHSI, considering no prior assumptions on spatial and spectral degradation functions. Instead, they are modeled with deep network structures. The largest deficiency of the deep learning-based methods is their requirement of huge amounts of training data, which is practically not available. Furthermore, deep learning-based methods have limited generalization ability regarding the different sensor characteristics such as spectral range and the observational models.
AS HSI is naturally non-negative data, it has a strong prior knowledge to apply to the tensor factorization method, which is expected to further improve the performance of the fusion. In this paper, we extend NMF to a tensor framework, applying a nonnegative tensor decomposition. Contrary to the conventional NMF-based methods, where the spatial, spectral, or their joint structural features must be additionally imposed as constraints, in this paper, the spatio-spectral joint structures of HSIs are preserved without having to impose constraints. We optimally exploit the spectral properties of the LRHSI and the spatial properties of the HRMSI, by directly applying a coupled (non-negative) Tucker decomposition to both images. We will refer to the proposed method as coupled non-negative Tucker decomposition (CNTD). effectively shows the superposition of a spectral background and the anomaly signatures. A coupled CP decomposition based method is proposed in [43], which fused LRHSI and HRMSI to produce a HRHSI in a tensor framework. Note that the CP-based rank of a tensor is defined as the minimum of rank-one component tensors that are summed to express that tensor, which is unknown and generally not easy to estimate. Meanwhile, in the Tucker representation there is just one component tensor which is called the core tensor. It can comprehensively express the relations between the various modes and multilinear interactions among them.
Moreover, with the recent success of deep learning techniques in various image processing tasks, an increasing research interest arose for deep learning-based image fusion. In [27] a fusion method based on convolutional neural networks (CNN) was proposed, in which the two CNN branches are devoted to spectral features of the LR-HSI and spatial neighborhood features of the HR-MSI, respectively. In [28], an unsupervised deep approach was proposed for a blind fusion of HRMSI and LRHSI, considering no prior assumptions on spatial and spectral degradation functions. Instead, they are modeled with deep network structures. The largest deficiency of the deep learning-based methods is their requirement of huge amounts of training data, which is practically not available. Furthermore, deep learning-based methods have limited generalization ability regarding the different sensor characteristics such as spectral range and the observational models.
AS HSI is naturally non-negative data, it has a strong prior knowledge to apply to the tensor factorization method, which is expected to further improve the performance of the fusion. In this paper, we extend NMF to a tensor framework, applying a non-negative tensor decomposition. Contrary to the conventional NMF-based methods, where the spatial, spectral, or their joint structural features must be additionally imposed as constraints, in this paper, the spatio-spectral joint structures of HSIs are preserved without having to impose constraints. We optimally exploit the spectral properties of the LRHSI and the spatial properties of the HRMSI, by directly applying a coupled (non-negative) Tucker decomposition to both images. We will refer to the proposed method as coupled non-negative Tucker decomposition (CNTD). Using the Tucker tensor representation, the proposed method comprehensively models multilinear modes of the HSI, where the core tensor precisely expresses the relations between the various modes. Therefore we proposed this tensor framework to benefit from its ability to preserve spatio-spectral joint structures. Regarding the huge Using the Tucker tensor representation, the proposed method comprehensively models multilinear modes of the HSI, where the core tensor precisely expresses the relations between the various modes. Therefore we proposed this tensor framework to benefit from its ability to preserve spatio-spectral joint structures. Regarding the huge amount of data in hyperspectral image processing and analysis, computation efficiency is a significant factor. Thus, we propose an algorithm which is straightforward and easy to implement, with a complexity, linear with the size of the hyperspectral data cube. The main contributions of this paper are highlighted below.

•
The application of non-negativity priors to the Tucker tensor decomposition of LRHSI and HRMSI, to estimate spectral and spatial mode-dictionaries in a Tucker model, respectively. To the best of our knowledge, this is the first time that a non-negative Tucker decomposition is used to represent hyperspectral images in a HSI fusion framework.

•
The preservation of spatio-spectral joint structures of HSIs without prior knowledge requirements and much lower information losses than matrix frameworks.

•
The construction of an algorithm with lower-order complexity than the state-of-the-art.
The remainder of this paper is organized as follows. Some preliminaries on tensors are given in Section 2. Section 3 formulates the HSI-MSI fusion framework. The proposed coupled non-negative tensor decomposition (CNTD) method for super-resolution HSI is introduced in Section 4. The complexity of the proposed method is elaborated in Section 5. Section 6 describes some experimental results on two well-known datasets, Pavia University and Indian Pines. Finally, conclusions and future work are described in Section 7.

Preliminaries on Tensors
Let us denote a tensor of order N as X ∈ R I 1 ×I 2 ×...×I N , having N indices i 1 , i 2 , . . . , i N and its members by x i 1 i 2 ... i N where 1 ≤ i n ≤ I n . Tensor matricization unfolds a tensor of order N into a matrix. The mode-n matricization of X reorders the elements of X to form the matrix X (n) ∈ R I n ×I n+1 I n+2 ...I N I 1 I 2 ...I n−1 . Tensor matricization can be regarded as an extension to matrix vectorization.
The mode-n product of tensor X ∈ R I 1 ×I 2 ×...×I N and matrix A ∈ R J n ×I n , is defined by M = X × n A ∈ R I 1 ×I 2 ×...×J n ×...×I N , and entries are calculated by: The mode-n product X × n A can also be denoted in matrix form as M (n) = AX (n) . The mode-n product has two important properties: (i) the order of multiple mode-n products with different modes is arbitrary: and for multiple mode-n products with the same modes, the order is relevant: The scalar product of two tensors X , Y indicated as < X , Y > = ∑ i 1 ,i 2 ,...,i N x i 1 ,i 2 ,...,i N i 1 ,i 2 ,...,i N . The Frobenius norm of a tensor X is indicated as X F = √ < X , X >. The Tucker decomposition of X ∈ R I 1 ×I 2 ×...×I N is expressed as mode products of a core tensor U ∈ R K 1 ×K 2 ×...×K N and N mode matrices V (n) ∈ R I n ×K n , which is expressed as: which has an element-wise form given by: The mode-n matricization form of X is expressed by Kronecker products ( ) of the mode-n matricization of the core tensor and mode matrices, as: where U (n) is the mode-n matricization of the core tensor U . The Kronecker product of two matrices A ∈ R I×J and B ∈ R K×L is a matrix denoted by A B ∈ R IK×JL , defined as: Remote Sens. 2021, 13, x FOR PEER REVIEW 5 of 21 where ( ) is the mode-n matricization of the core tensor . The Kronecker product of two matrices ∈ ℝ × and ∈ ℝ × is a matrix denoted by ⨂ ∈ ℝ × , defined as: The following Kronecker product properties and the vectorization operation ( (•)) are used in this paper: All basic notations are presented in Table 1.

HSI-MSI Fusion Problem Formulation
In this paper, the HRHSI, LRHSI, and HRMSI are denoted as tensors of order three. The target HRHSI is denoted by ∈ ℝ × × , where , and are the width, height and number of spectral bands, respectively. The LRHSI and HRMSI are denoted by ∈ ℝ × × and ∈ ℝ × × , respectively, with << , ℎ << and << . In this paper, we aim at estimating a HRHSI in a fusion framework, based on the observations and . In this section, we briefly describe the matrix factorization-based fusion scheme and then detail the tensor decomposition-based scheme.

Matrix Factorization-Based Fusion Scheme
In the matrix factorization-based fusion scheme, each spectral signature of the HRHSI is considered to be a linear mixture of a small number of basis vectors: where, ( ) ∈ ℝ × is the mode-three matricization of . ∈ ℝ × contains the basis vectors in its columns and ∈ ℝ × is the coefficient matrix. Similarly ( ) ∈ ℝ × and ( ) ∈ ℝ × are the mode-three matricization of and , respectively. Conventionally, they can be considered as spectral and spatial down-sampled versions of the target HRHSI. Thus: The following Kronecker product properties and the vectorization operation (vec(·)) are used in this paper: vec All basic notations are presented in Table 1. Hadamard product Mode-n matricization of tensor X X (n) n mode matrix in Tucker decomposition

HSI-MSI Fusion Problem Formulation
In this paper, the HRHSI, LRHSI, and HRMSI are denoted as tensors of order three. The target HRHSI is denoted by Z ∈ R W×H×S , where W, H and S are the width, height and number of spectral bands, respectively. The LRHSI and HRMSI are denoted by Y h ∈ R w×h×S and Y m ∈ R W×H×s , respectively, with w W, h H and s S. In this paper, we aim at estimating a HRHSI in a fusion framework, based on the observations Y h and Y m . In this section, we briefly describe the matrix factorization-based fusion scheme and then detail the tensor decomposition-based scheme.

Matrix Factorization-Based Fusion Scheme
In the matrix factorization-based fusion scheme, each spectral signature of the HRHSI is considered to be a linear mixture of a small number of basis vectors: where, Z (3) ∈ R S×W H is the mode-three matricization of Z. E ∈ R S×k contains the basis vectors in its columns and R S×wh and Y m (3) ∈ R s×W H are the mode-three matricization of Y h and Y m , respectively. Conventionally, they can be considered as spectral and spatial down-sampled versions of the target HRHSI. Thus: where, M ∈ R W H×wh is the point spread function (PSF) matrix and the spatial downsampling in the hyperspectral sensor. It can be separated regarding width and height modes [9]: M = (P 2 P 1 ) T (11) where P 1 ∈ R w×W and P 2 ∈ R h×H are spatial separable down-sampling operators of the width and height spatial modes, respectively. Similarly: where P 3 ∈ R s×S is a matrix, modeling spectral down-sampling in the multispectral sensor. Therefore, the spectral response functions (SRF) of the multispectral sensor is included within its rows. Commonly, in the HSI-MSI fusion problem formulation, the SRF and PSF are assumed to be known [4,9,11,17,25]. the proposed approach in [19] is also used to approximate the SRF and PSF from observed data. Following (10): which, according to (12), can also be written as: In, an optimization framework, based on quadratic regularizing, is presented to estimate P 3 and M from the observed LRHSI and HRMSI: where Γ 1 (·) and Γ 2 (·) are quadratic regularizers, α 1 and α 2 are the respective regularization parameters. See [19] for more details.

Tensor Decomposition-Based Fusion Scheme
As HSIs are naturally 3D data, the tensor is a more efficient representation than a matrix form, and we can benefit from its ability to exploit intrinsic structures of the HSI and multilinear interactions between its different modes. Hence, in this paper, the target HRHSI Z is formally expressed by: which is called the Tucker representation, where W ∈ R W×n w , H ∈ R H×n h and S ∈ R S×n s are width, height and spectral dictionary matrices, respectively. n w , n h and n s are the number dictionary atoms of each mode, and C ∈ R n w ×n h ×n s is the core tensor that denotes the interactions between several modes. Mode-n (n = 1, 2, 3) matricizations of Z are given by: Similarly, both LRHSI and HRMSI are represented by: where W h ∈ R w×n w , H h ∈ R h×n h and S m ∈ R s×n s are width, height, and spectral dictionary matrices, respectively, and E h ∈ R w×h×S and E m ∈ R W×H×s denote Independent-Identically-Distributed (i.i.d.) noise of Y h and Y m , respectively. As the LRHSI and HRMSI are the spatially and spectrally down-sampled form of the HRHSI, respectively, one has:

Proposed CNTD Approach
The goal of a LRHSI and HRMSI fusion framework is to estimate a high spatio-spectral target HSI. Since w W, h H and s S, the super resolution problem is severely ill-posed and prior information is needed to regularize the fusion problem. Orthogonality and statistical independency of basis vectors in the Tucker representation, and sparsity, smoothness, and non-negativity of HSIs are some constraints that help to find a unique solution for the super-resolution problem of HSIs [44][45][46].
In this paper, we propose a new algorithm based on coupled non-negative Tucker decomposition (CNTD). The proposed method performs Tucker tensor decomposition of the LRHSI and HRMSI, subject to NTD. The original NMF method inherently loses spatiospectral joint structure information when unfolding the 3D data into matrix form. Therefore, in this paper, we impose NTD to both tensors of the HSI and MSI directly. The CNTD method effectively combines multiple data tensors, where the intrinsic spatio-spectral joint structures of HSI can be represented without loss and interdependently exploited. The proposed CNTD approach is depicted in Figure 2, illustrating the fusion of the spatial information of HRMSI and the spectral information of LRHSI to produce the HRHSI.

Proposed CNTD Approach
The goal of a LRHSI and HRMSI fusion framework is to estimate a high spatio-spectral target HSI. Since ≪ , ℎ ≪ and ≪ , the super resolution problem is severely ill-posed and prior information is needed to regularize the fusion problem. Orthogonality and statistical independency of basis vectors in the Tucker representation, and sparsity, smoothness, and non-negativity of HSIs are some constraints that help to find a unique solution for the super-resolution problem of HSIs [44][45][46].
In this paper, we propose a new algorithm based on coupled non-negative Tucker decomposition (CNTD). The proposed method performs Tucker tensor decomposition of the LRHSI and HRMSI, subject to NTD. The original NMF method inherently loses spatiospectral joint structure information when unfolding the 3D data into matrix form. Therefore, in this paper, we impose NTD to both tensors of the HSI and MSI directly. The CNTD method effectively combines multiple data tensors, where the intrinsic spatio-spectral joint structures of HSI can be represented without loss and interdependently exploited. The proposed CNTD approach is depicted in Figure 2, illustrating the fusion of the spatial information of HRMSI and the spectral information of LRHSI to produce the HRHSI.  Considering (16), (18), and (19), the LRHSI and HRMSI fusion problem is formulated by the following constrained least square optimization problems: Hyperspectral and multispectral data fusion, based on NTD, is executed by the estimation of the corresponding dictionaries and the core tensor. The non-negative Tucker tensor decomposition is non-convex in its entirety, but it is convex in each of its factors. We easily derive updated rules for each mode-dictionary matrix by matricizing the Tucker model into its corresponding modes. Therefore, these can be considered as conventional NMF for which we use a block coordinate descent scheme. Therefore, the cost functions are optimized with respect to each factor while keeping the others fixed. Traditionally in gradient descent, the learning rates are positive, but since the subtraction of terms in the updated rules can lead to negative elements, Lee and Seung [30] proposed to use adaptive learning rates to avoid subtraction and thus the production of negative elements. Like the conventional NMF, the optimization algorithm is easy to implement and computationally efficient. NTD attempts to decompose a non-negative data tensor into the multilinear products of a non-negative core tensor and non-negative mode-dictionary matrices [47]. To minimize the predefined optimization problems (23) and (24), the multiplicative update rule (MUR) is applied, and can be directly achieved by NMF multiplicative rule, for which the convergence to local optima under the non-negativity constraints has been proven in [30,48].

Updating Mode-Dictionary Matrices
Updating algorithms for each mode-dictionary matrix can be easily derived by matricizing the Tucker model into its corresponding modes. Lee and Seung's multiplicative update rule has been a widely known approach owing to the simplicity of its implementation [30]. We use an extended MUR for the mode-dictionary matrices of Y h . The first mode matricization of Y h is given by: where Y h(1) and C (1) are the first mode matricization of LRHSI (Y h ) and the core tensor (C), respectively. Equation (25) can be rewritten as Y h(1) ≈ αβ, just like the conventional NMF, for which the updating rules are given by [49]: Equation (25) can be treated as the conventional NMF, where each fraction is updated using MUR (26): where the fraction line denotes the element-wise division. Similarly, the second mode matricization of Y h is given by: and the second mode-dictionary matrix is updated as: Finally, the third mode matricization of Y h is given by Y h(3) ≈ SC (3) (H h W h ) T . Therefore, the spectral mode-dictionary (S) is updated as:

Updating Core Tensor
Applying (8) to (25): 1st f raction o f the conventional N MF vec C (1) 2nd f raction o f the conventional N MF which can be treated as the conventional NMF as well. Incorporating MUR to calculate the core tensor (C): Applying (8) to the numerator of (32): Remote Sens. 2021, 13, 2930 10 of 20 and its denominator is given by: As a result, the core tensor C is updated by: When performing MUR for Y m in a similar way, the following updating relations for Y m are obtained: As the LRHSI and HRMSI contain, respectively, spectral and spatial information of the target image, the spectral mode-dictionary S is initialized using the former and the spatial mode-dictionary matrices of the height H and width W are initialized using the latter. The spectral mode-dictionary S is initialized using the simplex identification split augmented Lagrangian (SISAL) algorithm [50], which efficiently identifies a minimum volume simplex containing the LRHSI spectral vectors. The spatial mode-dictionary matrices W and H are initialized from the mode-one and -two matricization of the HRMSI, respectively, via dictionary update cycles of the KSVD method [51]. The core tensor C is initialized using the ADMM framework presented in [9].
To fully exploit its spectral information, the proposed algorithm CNTD starts with applying NTD to the LRHSI. W h and H h are initialized by (20) and (21), respectively, to inherit the reliable spatial information from the HRMSI, while the other variables are fixed. Then, W h , H h , S and C are alternately updated by (27), (29), (30), and (35), respectively, until convergence of the objective function in (23).
The next step of the proposed algorithm is to apply NTD to the HRMSI. S m is initialized by (22), to exploit the spectral information of the LRHSI. In the optimization phase, W, H, S m and C are alternately updated using (36)-(39) while the other variables are fixed, until convergence of the objective function in (24). The super-resolution HSI is calculated using the estimated core tensor and mode-dictionary matrices. Algorithm 1 gives the pseudocode of the proposed CNTD algorithm.

Algorithm 1:
The proposed coupled non-negative tensor decomposition method.

Computational Complexity
In this section, we analyze the computational complexity of the proposed method. According to Algorithm 1, the proposed method includes two sub-optimization problems, which engage MURs to estimate the NTD of Y h and Y m . Each sub-optimization problem mainly contains four updating steps. In each step, the heaviest parts are the multiplications of the core matrix with the output of the Kronecker products. For optimizing the width dictionary, this term is given by C (1) (S H h ) T , which has a complexity order of O(n w n h n s Sh). For the height dictionary, the term C (2) (S W h ) T has a complexity order of O(n w n s n h Sw). For the spectral dictionary, the term C (3) (H h W h ) T has a complexity order of O(n w n s n h hw). Finally, the highest complexity order of the core tensor is O n w n 2 s n 2 h . Similarly, the heaviest parts of the second sub-optimization step have complexity orders O(n w n h n s sH), O(n w n s n h sW), O(n w n s n h HW), and O n w n 2 s n 2 h for W, H, S m , and C, respectively. Given that, LR-HSI and HR-HSI are the spatial and spectral down-sampled versions of HR-HSI, respectively, W, H and S are multiples of w, h and s, respectively. Therefore, the overall computational complexity of the proposed algorithm can be expressed as: From (40), one can observe that the overall computational complexity is linear with the size of the HSI cube (W, H, S). This is just the same as that of the conventional NMF algorithm, owing to the fact that each update step of the proposed CNTD method can be considered as an NMF problem. Of note, the complexity of the proposed method outperforms the other state-of-the-art tensor factorization methods [4,9,23].

Data Sets
The proposed method CNTD is performed on two well-known data sets, depicted in Figure 3. The first data set is the Pavia University image [52], which is captured by the Reflective-Optics-System-Imaging-Spectrometer (ROSIS) optical sensor upon the Pavia University in Italy. The reference HRHSI is a 120 × 120 × 93 image with 1.3 m per pixel spatial resolution. The wavelength domain from 430 to 860 mm is removed because of low SNR and water vapor absorptions. The 30 × 30 × 93 LRHSI is produced by applying a Gaussian spatial blurring filter on each band of the reference image, and down-sampling with a factor of four in both width and height directions. The HRMSI of size 120 × 120 × 4 is produced by filtering the reference image with the IKONOS-like reflectance spectral response function depicted in Figure 4. (see [19] for more details about SRF and PSF).
The Indian Pines image is the second test data set. It was acquired by the NASA Aeronautics-and-Space-Administration-Airborne-Visible-Infrared-Imaging-Spectrometer (AVIRIS) [53] over the Indian Pines scene situated in North-Western Indiana. The ref-erence image of size 120 × 120 × 224 has a wavelength domain from 400 nm to 2500 nm and a 20 m per pixel spatial resolution. We reduced its number of bands to 185 after removing the water absorption and noisy bands (1-4, 104-115, 150-170, 223 and 224). The LRHSI of size 30 × 30 × 185 was constructed after down-sampling and application of the same blurring filters as which were applied to the first data set. The LANDSAT 7-like spectral response function, depicted in Figure 4, is engaged to produce a HRMSI of size 120 × 120 × 6. The reference, LRHSI and HRMSI of both data sets are depicted in Figure 3.

Evaluation Criteria
The performance of the proposed method is evaluated using five different indices. The first index, which is a measure of spectral distortion, is the spectral angle mapper (SAM) in degrees. The angle between pixel spectral responses and of the estimated HRHSI ( ) and the reference image ( ) are calculated, then averaged over all pixels. It is expressed as: the ideal SAM value being zero. The second index is the root mean squared error (RMSE) that evaluates the quality of the estimated HRHSI ( ), compared to the reference image ( ). It is defined as:

Evaluation Criteria
The performance of the proposed method is evaluated using five different indices. The first index, which is a measure of spectral distortion, is the spectral angle mapper (SAM) in degrees. The angle between pixel spectral responses and of the estimated HRHSI ( ) and the reference image ( ) are calculated, then averaged over all pixels. It is expressed as: the ideal SAM value being zero. The second index is the root mean squared error (RMSE) that evaluates the quality of the estimated HRHSI ( ), compared to the reference image ( ). It is defined as:

Evaluation Criteria
The performance of the proposed method is evaluated using five different indices. The first index, which is a measure of spectral distortion, is the spectral angle mapper (SAM) in degrees. The angle between pixel spectral responsesẐ j and Z j of the estimated HRHSI (Ẑ) and the reference image (Z) are calculated, then averaged over all pixels. It is expressed as: the ideal SAM value being zero. The second index is the root mean squared error (RMSE) that evaluates the quality of the estimated HRHSI (Ẑ), compared to the reference image (Z). It is defined as: The third evaluation index is the relative dimensionless global error in synthesis (ERGAS), which measures the spectral quality of the estimated HRHSI, and is defined as: whereẐ i,: and Z i,: denote the ith band ofẐ and Z, respectively. µ Z i,: is the mean of Z i,: . A lower ERGAS value means a lower spectral distortion between the estimated HRHSI (Ẑ) and the reference image (Z). In the case of perfect reconstruction, it is zero. The degree of the distortion (DD) is the fourth index, defined as: where · 1 is 1 norm, and vec(Z ) and vec Ẑ are the vectorization of tensors Z andẐ, respectively. The smaller the value of DD, the lower the spectral distortion. The fifth index is the universal image quality index (UIQI) [54]. It is calculated by averaging over 32 × 32 windows. The UIQI between the ith band ofẐ and Z is calculated by: where d is the number of windows,Ẑ i j and Z i j denote the jth window of the ith band ofẐ and Z, respectively, σ Z i jẐ i j is the sample covariance between Z i j andẐ i j , µ Z i j and σ Z i j are the mean and standard deviations of Z i j , respectively. The UIQI index betweenẐ and Z is the average UIQI value of all bands, which is: The ideal UIQI value is one. All of the experiments were performed using MATLAB version R2016a, and have been run by a computer with an Intel Core i5 central processing unit (CPU) at 3.4 GHz and 32 GB random access memory (RAM).

Evaluation of the Parameters
In order to evaluate the sensitivity of the proposed CNTD approach w. r. t. and its essential parameters, i.e., the number of mode (width, height, and depth) dictionary atoms n w , n h and n s , the proposed method has been performed for different numbers of mode-dictionary atoms. Figure 5a-c shows the RMSE of the estimated Pavia University and Indian Pines data sets as functions of the number of mode-dictionary atoms n w , n h and n s , respectively. As Figure 5a,b shows, the RMSE of both data sets strongly declines when n w and n h increase from 5 to 200. After that, the RMSE does not improve any further. Therefore, we chose to set the values of n w and n h to 167 for both data sets for all remaining experiments. As can be seen from Figure 5c, the RMSE for Pavia University decreases while n s increases from 5 to 40. For Indian Pines, the RMSE decreases as n s increases from 5 to 100, after which it does not decrease any further. Hence, we set n s to 30 for both aforementioned data sets. That the proposed CNTD method requires larger width and height mode-dictionaries compared to the spectral mode-dictionary is because generally the HSIs spectral vectors belong to a much lower dimension subspace.

Comparison with State of the Art Fusion Methods
The proposed fusion method is compared with state-of-the-art methods. In order to see the impact of non-negative priors, the CNTD method is compared with tensor decomposition methods CSFT [9] and NLSTF [23], which do not include non-negativity. Additionally, the proposed CNTD approach is compared with the well-known matrix framework CNMF [21], aiming to evaluate the ability of the proposed CNTD approach to preserve the spatio-spectral structure. Furthermore, our Tucker based method is compared with a CP tensor decomposition method, referred to as STEREO [43]. Finally, we have compared with the two-branched CNN method of [28].
The experiments validate the superiority of the proposed CNTD method on three aspects: the advantage of non-negative priors, the ability to preserve the spatio-spectral structure, and the computational complexity. RMSE, SAM, DD, ERGAS and UIQI for all approaches are shown in Tables 3 and 4 for the Pavia University and Indian Pines data sets, respectively. To evaluate the computational complexity of the proposed method in comparison with the matrix-based method and the other tensor frameworks, the computation times of CNMF, CSTF, NLSTF and the proposed CNTD are shown in Table 5. The best values are depicted in bold.
It can be observed from Tables 3 and 4 that the proposed method outperforms the other competing methods in terms of RMSE, DD, and UIQI indices, and shows promising results for the other indices. The proposed method outperforms CNMF, CNN and STE-REO in almost all of the indices on both data sets. Of note, the efficiency of the CNN method highly depends on the training sample rate, while a huge amount of hyperspectral training data is practically unavailable. Furthermore, the proposed CNTD method outperforms the CSFT for most indices on both data sets. It also has better values of RMSE, DD, and UIQI than NLSTF.

Comparison with State of the Art Fusion Methods
The proposed fusion method is compared with state-of-the-art methods. In order to see the impact of non-negative priors, the CNTD method is compared with tensor decomposition methods CSFT [9] and NLSTF [23], which do not include non-negativity. Additionally, the proposed CNTD approach is compared with the well-known matrix framework CNMF [21], aiming to evaluate the ability of the proposed CNTD approach to preserve the spatio-spectral structure. Furthermore, our Tucker based method is compared with a CP tensor decomposition method, referred to as STEREO [43]. Finally, we have compared with the two-branched CNN method of [28].
The experiments validate the superiority of the proposed CNTD method on three aspects: the advantage of non-negative priors, the ability to preserve the spatio-spectral structure, and the computational complexity. RMSE, SAM, DD, ERGAS and UIQI for all approaches are shown in Tables 2 and 3 for the Pavia University and Indian Pines data sets, respectively. To evaluate the computational complexity of the proposed method in comparison with the matrix-based method and the other tensor frameworks, the computation times of CNMF, CSTF, NLSTF and the proposed CNTD are shown in Table 4. The best values are depicted in bold.
It can be observed from Tables 2 and 3 that the proposed method outperforms the other competing methods in terms of RMSE, DD, and UIQI indices, and shows promising results for the other indices. The proposed method outperforms CNMF, CNN and STEREO in almost all of the indices on both data sets. Of note, the efficiency of the CNN method highly depends on the training sample rate, while a huge amount of hyperspectral training data is practically unavailable. Furthermore, the proposed CNTD method outperforms the CSFT for most indices on both data sets. It also has better values of RMSE, DD, and UIQI than NLSTF. As can be observed from Table 4, the proposed method is much more computationally efficient than the competing tensor-based approaches, and is comparable to the CNMF method. This agrees with the detailed description of the computational complexity in Section V. In contrast to the proposed method, for some of the state-of-the-art methods, such as CSTF and NLTF, the complexity increases faster than that which is linear with the size of the HSI cube. In order to validate the performance with respect to preserving spatial structures, in Figure 6, band 30 of the LRHSI and the estimated HRHSI with CNMF, CSTF, NLSTF, CNN, STEREO and the proposed CNTD are compared with the reference HRHSI. It can be observed that the proposed CNTD approach can correctly estimate most of the spatial details of the HRHSI, though there are a few distortions in the fusion results. Additionally, the error images of band 30, which reflect the differences between the estimated HRHSI and the reference image of both data sets are shown in Figure 7. The error images of the LRHSI, CNMF, CSTF, NLSTF, CNN, STEREO and the proposed CNTD are depicted. The proposed approach estimates spatial details of the HRHSI with much lower error than the competing methods. With CNMF and CNN, the edge structures of the HSI are lost, while CSTF, NLSTF and STEREO suffer from errors in homogeneous regions. The proposed approach performs better in preserving the spatial structures of HSIs at both edges and homogeneous regions.

Conclusions
The main objective of this paper was to extend the matrix formulation of non-negative matrix factorization to a tensor framework for the purpose of hyperspectral and multispectral image fusion. We proposed a coupled non-negative tensor decomposition approach that can be treated as a conventional NMF-based model. The proposed approach performs a tucker tensor factorization of a LRHSI and a HRMSI under the constraint of non-negative tensor decomposition. Unlike other state of the art methods, the complexity of the proposed approach is linear with the size of the HSI cube. The proposed approach gives competitive results with the state-of-the-art fusion approaches. As a future plan, we will incorporate prior information, such as spectral self-similarity, sparsity, smoothness and local consistence in the non-negative tensor decomposition, in order to find better

Conclusions
The main objective of this paper was to extend the matrix formulation of non-negative matrix factorization to a tensor framework for the purpose of hyperspectral and multispectral image fusion. We proposed a coupled non-negative tensor decomposition approach that can be treated as a conventional NMF-based model. The proposed approach performs a tucker tensor factorization of a LRHSI and a HRMSI under the constraint of non-negative tensor decomposition. Unlike other state of the art methods, the complexity of the proposed approach is linear with the size of the HSI cube. The proposed approach gives competitive results with the state-of-the-art fusion approaches. As a future plan, we will incorporate prior information, such as spectral self-similarity, sparsity, smoothness and local consistence in the non-negative tensor decomposition, in order to find better unique basis vectors for the Tucker representation. Furthermore, in this paper we assumed the SRF and PSF to be known and also two input images considered to be registered. Therefore, we will try to overcome these limitations in our future work.
Author Contributions: M.Z.: conceptualization, methodology, software, writing, review and editing. M.S.H.: supervision. K.K. and P.S.: investigation, writing, review and editing. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.