Abstract
This paper addresses the large size of hyperspectral pansharpening networks and the difficulty of learning their spatial-spectral features. We propose a deep mutual-learning-based framework (SSML) for spectral-spatial information mining and hyperspectral pansharpening. In this framework, a deep mutual-learning mechanism is introduced so that spatial and spectral features are learned from each other through information transmission, which achieves better fusion results without introducing too many parameters. The proposed SSML framework consists of two separate networks for learning the spectral and spatial features of hyperspectral images (HSIs) and panchromatic images (PANs). A hybrid loss function containing constrained spectral and spatial information is designed to enforce mutual learning between the two networks. In addition, a mutual-learning strategy is used to balance spectral and spatial feature learning so that each network in the SSML framework outperforms its original counterpart. Extensive experimental results demonstrate the effectiveness of the mutual-learning mechanism and the proposed hybrid loss function for hyperspectral pansharpening. Furthermore, a typical deep-learning method was used to confirm the proposed framework’s capacity for generalization, and desirable performance was observed in all cases. Moreover, multiple experiments analyzing the parameters used showed that the proposed method achieved better fusion results without adding too many parameters. Thus, the proposed SSML represents a promising framework for hyperspectral pansharpening.
1. Introduction
Hyperspectral images (HSIs) usually contain information on tens to hundreds of continuous spectral bands in the target area. HSIs therefore have a high spectral resolution but, due to hardware limitations, a low spatial resolution. In contrast, panchromatic images (PANs) are usually single-band images in the visible range with high spatial resolution but low spectral resolution. Pansharpening reconstructs a high-resolution HSI (HR-HSI) from a low-resolution (LR) HSI and a high-resolution (HR) PAN, and has been widely used in image classification [1], target detection [2], and road recognition [3].
Traditional HSI pansharpening technologies can be broadly divided into four categories: component substitution-based methods [4,5], model-based methods [6,7], multi-resolution analysis [8], and hybrid methods [9]. Each of these categories has certain limitations. Component substitution-based methods can cause certain types of spectral distortion; multi-resolution analysis-based methods require complex calculations; hybrid methods combine component substitution and multi-resolution analysis, thus providing good spectral retention but fewer spatial details; and, finally, model-based methods are limited by the number of model parameters and their computational complexity.
In recent years, deep learning has been widely used in the field of image processing [10,11,12,13,14,15,16], while pansharpening has remained at a primary stage of exploration [17]. Yang et al. [18] proposed a convolutional neural network (CNN) for pansharpening (PanNet), which was implemented via ResNet [19] in the high-pass filter domain. Zhu et al. [20] designed a spectral attention module (SeAM) to extract the spectral features of HSIs. Zhang et al. [21] designed a residual channel attention module (RCAM) to solve the spectral reconstruction problem. However, as is well known, a CNN learns a single feature more easily than multiple features, and requires fewer parameters to do so. Moreover, in the feature extraction process, simultaneously learned features affect one another. To reduce this influence, Zhang et al. [22] improved classification results by measuring the difference in probabilistic behavior between the spectral features of two pixels. Xie et al. [23] used the mean square error (MSE) loss and spectral angle mapper (SAM) loss to constrain spatial and spectral feature losses, respectively. Qu et al. [15] proposed a residual hyper-dense network and a CNN with cascaded residual hyper-dense blocks. The former network extends DenseNet to solve the spatial-spectral fusion problem. The latter network allows direct connections between pairs of layers within the same stream and across different streams, which means that it learns more complex combinations between the HS and PAN images.
The above studies show that, for deep-learning-based hyperspectral pansharpening methods, the better the spatial and spectral feature learning, the better the fusion result. However, HSIs contain a large amount of data because of their many bands. Thus, it is a challenge for a hyperspectral pansharpening method to fully learn and utilize spatial and spectral features without excessively increasing computation. Commonly, learning a single feature is easier than learning multiple features, while collaborative learning of multiple features is more effective than learning a single feature. Inspired by mutual learning, this paper explores a novel pansharpening method that learns the spatial and spectral characteristics separately and establishes a relationship between them so that they learn from each other to achieve desirable results.
In recent years, a deep mutual-learning (DML) strategy [24] has been proposed for image classification, in which multiple networks learn mutually from each other. This unique training strategy has great potential for multi-feature learning of a single task with few parameters, and therefore has research value in the field of HSI pansharpening. To the authors’ knowledge, DML has not previously been applied to HSI pansharpening.
This paper proposes a deep mutual-learning framework integrating spectral-spatial information-mining (SSML) for HSI pansharpening. In the SSML framework, two simple networks, a spectral and a spatial network, are designed for mutual learning. The two networks learn different features independently; for instance, the spectral network captures only spectral features, while the spatial network focuses only on spatial details. Then, the DML strategy enables them to learn each other’s features. In addition, a hybrid loss function is derived by constraining spectral and spatial information between the two networks. The main contributions of this paper are summarized below:
- This paper proposes an SSML framework that introduces a DML strategy into HSI pansharpening for the first time; four cross experiments are performed to verify the proposed SSML framework’s effectiveness, and the network’s generalization ability is confirmed using the latest research results in the field of HSI pansharpening.
- A hybrid loss function, which considers the HSI characteristics, is designed to enable each network in the SSML framework to learn a certain feature independently, thus improving its overall performance so that the SSML framework can successfully generate a high-quality HR-HSI.
2. Related Work
The DML strategy [24] was initially proposed for image classification, but, after several years of development, it has been applied in many fields [25,26,27]. The DML strategy uses a mutual learning loss function, which allows multiple small networks to learn the same task together under different initial conditions, thereby improving the performance of each of the networks [24]. For classification problems, Kullback–Leibler (KL) divergence [28] has often been used as the mutual learning loss function in the DML because it can calculate the asymmetric measure of the probability distribution between two networks; it is defined by:

$$ D_{KL}(p_2 \,\|\, p_1) = \sum_{i=1}^{N} p_2(x_i) \log \frac{p_2(x_i)}{p_1(x_i)} $$

where $D_{KL}(p_2 \,\|\, p_1)$ calculates the distance from $p_1$ to $p_2$, and $p_1$ and $p_2$ denote the predicted probability distributions of the two networks.
However, in the field of HSI pansharpening, it is usually necessary to evaluate the image quality rather than the probability distribution of pixels. HSIs have a high correlation between pixels in each band. Therefore, it is necessary to consider other loss functions as the mutual learning loss function instead of the KL divergence. Traditionally, MSE and SAM [29] have been used to evaluate the spatial quality and spectral distortion of HSIs. Therefore, the effects of the MSE and SAM on the proposed SSML framework’s performance are examined in this paper.
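To make the contrast concrete, the following is a minimal PyTorch sketch (not the paper’s code) of the candidate mutual learning losses: the KL divergence used in classification-oriented DML, and the MSE and SAM losses considered here for image quality. The function names, the `detach` of the peer prediction, and the `eps` stabilizer are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def kl_mutual_loss(logits_a, logits_b):
    # KL divergence D_KL(p_b || p_a), the classification-style DML loss:
    # network A is pulled toward network B's class distribution.
    log_p_a = F.log_softmax(logits_a, dim=1)
    p_b = F.softmax(logits_b, dim=1).detach()   # peer treated as target
    return F.kl_div(log_p_a, p_b, reduction="batchmean")

def sam_loss(x, y, eps=1e-8):
    # Mean spectral angle between per-pixel spectra of two HSI tensors
    # shaped (N, C, H, W); a spectral-quality mutual loss.
    dot = (x * y).sum(dim=1)
    denom = x.norm(dim=1) * y.norm(dim=1) + eps
    return torch.acos((dot / denom).clamp(-1 + eps, 1 - eps)).mean()

def mse_loss(x, y):
    # Plain MSE; a spatial-quality mutual loss.
    return F.mse_loss(x, y)
```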
3. Method
This section describes the proposed SSML framework and introduces the hybrid loss function.
In general, the HSI pansharpening problem can be considered a process in which a network generates an HR-HSI $\hat{H}$ by taking an LR-HSI $H$ and an HR-PAN $P$ as inputs, with a loss function constraining the network learning, which can be expressed as:

$$ \hat{H} = f(H, P; \Theta), \qquad \Theta^{*} = \arg\min_{\Theta} \mathcal{L}\big(f(H, P; \Theta), Y\big) $$

where $f(\cdot)$ represents the mapping function between a CNN’s input and output data, $\Theta$ denotes the parameters to be optimized, $Y$ is the reference HR-HSI, and $\mathcal{L}$ is the loss function.
3.1. Image Preprocessing
As shown in Figure 1, the proposed framework first performs bicubic interpolation on the LR-HSI $H$ to obtain $\tilde{H}$, which has the same size as the HR-PAN $P$ [30]. Then, contrast-limited adaptive histogram equalization is applied to $P$ to obtain $\tilde{P}$, which has richer edge details [31,32]. Finally, the network input $X$ is obtained by injecting $\tilde{P}$ into $\tilde{H}$ through guided filtering, that is, $X = \mathrm{GF}(\tilde{H}, \tilde{P})$, to enhance the spatial details of the HSI.
Figure 1.
The structure of the proposed SSML framework.
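A minimal sketch of this preprocessing pipeline is shown below, assuming OpenCV with the contrib `ximgproc` module; the CLAHE clip limit, guided-filter radius, and `eps` are illustrative placeholders rather than the paper’s settings.

```python
import cv2
import numpy as np

def preprocess(lr_hsi, pan, radius=8, eps=1e-3):
    """Sketch of the Figure 1 preprocessing: bicubic upsampling of the
    LR-HSI, CLAHE on the PAN, and band-wise guided filtering.
    lr_hsi: (h, w, L) float array in [0, 1]; pan: (H, W) float in [0, 1]."""
    h, w = pan.shape
    # 1) Bicubic interpolation of the LR-HSI to the PAN size.
    up = cv2.resize(lr_hsi, (w, h), interpolation=cv2.INTER_CUBIC)
    # 2) Contrast-limited adaptive histogram equalization on the PAN.
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    pan_eq = clahe.apply((pan * 255).astype(np.uint8)).astype(np.float32) / 255
    # 3) Inject the enhanced PAN into each upsampled band via guided
    #    filtering (requires opencv-contrib's ximgproc module).
    out = np.stack(
        [cv2.ximgproc.guidedFilter(pan_eq, up[..., b].astype(np.float32),
                                   radius, eps)
         for b in range(up.shape[-1])], axis=-1)
    return out
```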
3.2. SSML Framework
As previously mentioned, the proposed SSML framework includes two networks: a spectral network and a spatial network. Each uses a specific structure to extract specific features—for instance, residual blocks for extracting spatial features and channel attention blocks for extracting spectral features. In addition, they constrain each other to learn the other’s features by minimizing the hybrid loss function. Without loss of generality, their structures are designed to be universal and simple, as shown in Figure 2. The spectral network uses a spectral attention structure to extract spectral information, while the spatial network adopts residual learning and a spatial attention structure to capture spatial information.
Figure 2.
(a) The structure of the spectral network ($S_1$); (b) the structure of the spatial network ($S_2$).
Two popular spectral network structures are illustrated in Figure 3a,b, and their specific settings are shown in Table 1. RCAM uses four convolutional layers: the convolution kernels of the first two layers are 3 × 3 in size, and those of the last two layers are 1 × 1. A sigmoid function processes the output of the four convolutional layers, and the result is multiplied by the output of the second layer. The product and the block input are then combined by element-wise addition. The SeAM splits into two branches after the first two convolutional layers, which are the same as in RCAM. The first branch has the same structure as the third and fourth layers of RCAM, while the second branch replaces the AvgPooling of the first branch with MaxPooling. The outputs of the two branches are combined by element-wise addition, and the subsequent steps are the same as in RCAM.
Figure 3.
The structures of the candidate networks. (a) RCAM; (b) SeAM; (c) traditional ResNet; (d) MSRNet.
Table 1.
The specific parameter settings of the spectral network.
Most spectral attention structures are designed with a pooling operation followed by an excitation step. The equation is:

$$ w = f\big(\mathrm{pool}(F)\big) $$

where $f$ represents the excitation process (e.g., a sigmoid activation) and $\mathrm{pool}(\cdot)$ indicates the pooling operation. Then, by multiplying $w$ by the feature map $F$, a new feature map $F'$ can be obtained as follows:

$$ F'_{i} = w_{i} \cdot F_{i} $$

where $w_i$ and $F_i$ represent the weight and feature map of the $i$th feature channel.
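As an illustration, the following is a minimal PyTorch sketch of this pooling-excitation scheme; the reduction ratio and layer widths are assumptions, and RCAM/SeAM in Figure 3 elaborate on this basic pattern.

```python
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Minimal pooling-excitation block implementing w = f(pool(F)) and
    F'_i = w_i * F_i over the channel dimension."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # squeeze: global pooling
        self.excite = nn.Sequential(                 # 1x1 convs, as in RCAM
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),                            # f: the excitation
        )

    def forward(self, x):
        w = self.excite(self.pool(x))                # per-channel weights w_i
        return x * w                                 # F'_i = w_i * F_i
```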
Two popular spatial network structures are presented in Figure 3c,d, and their specific settings are shown in Table 2. ResNet uses two convolutional layers of equal size with 3 × 3 kernels, and the convolution result and the input are combined by element-wise addition. The first layer of MSRNet uses 1 × 1 convolution kernels. The convolution results are split into four feature maps of equal size, which are sent to four corresponding branches for convolution operations. The first branch uses a 1 × 1 convolution layer, and branches 2, 3, and 4 each add a ReLU layer and a convolution layer relative to the previous branch. Finally, the outputs of the four branches are concatenated, and a 1 × 1 convolution is applied in the last layer.
Table 2.
The specific parameter settings of the spatial network.
Assume $H$ denotes an HR-HSI and $\tilde{H}$ denotes the corresponding upsampled LR-HSI, and suppose there is a residual $R$ between $H$ and $\tilde{H}$, which is expressed as:

$$ R = H - \tilde{H} $$

A CNN can be used to learn the residual $R$ between $H$ and $\tilde{H}$, and $H$ can then be obtained from $\tilde{H}$ and $R$ as follows:

$$ H = \tilde{H} + R $$
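A minimal PyTorch sketch of a residual block of this kind (Figure 3c) follows; the two 3 × 3 convolutions and the element-wise addition follow the description above, while the intermediate ReLU and the channel width are illustrative assumptions.

```python
import torch.nn as nn

class ResBlock(nn.Module):
    """Plain residual block: two 3x3 convolutions whose output is added
    element-wise to the input, so the block only has to learn the residual
    R while the surrounding network realizes H = H_tilde + R."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)  # element-wise addition of input and residual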
3.3. Hybrid Loss Function
Inspired by the KL divergence, this paper defines a hybrid loss function for the SSML framework according to the characteristics of the two networks in the proposed framework, forcing them to learn from each other. The hybrid loss function is defined by:

$$ \mathcal{L}_{S_1} = \mathcal{L}_{main}(y_1, y) + \lambda_1 \, \mathcal{L}_{spa}(y_1, y_2) $$

$$ \mathcal{L}_{S_2} = \mathcal{L}_{main}(y_2, y) + \lambda_2 \, \mathcal{L}_{spe}(y_2, y_1) $$

where $y_1$ is the prediction of $S_1$, $y_2$ is the prediction of $S_2$, $y$ is the ground truth, $\lambda_1$ and $\lambda_2$ are the weights of the hybrid loss function, $\mathcal{L}_{spa}$ and $\mathcal{L}_{spe}$ are additional loss functions that constrain spatial information and spectral information, respectively, and $\mathcal{L}_{main}$ is the main loss function that constrains the whole network.

In the two networks of the SSML framework, the $\ell_1$-norm is used as the main loss function ($\mathcal{L}_{main}$) due to its good convergence [33], and is defined by:

$$ \mathcal{L}_{main}(\hat{y}, y) = \frac{1}{N} \sum_{n=1}^{N} \big\| \hat{y}^{(n)} - y^{(n)} \big\|_{1} $$

where $N$ is the number of training samples. For spectral feature learning in the $S_1$ network, $\mathcal{L}_{spa}$ uses the MSE to constrain the spatial information loss between $y_1$ and $y_2$ as follows:

$$ \mathcal{L}_{spa}(y_1, y_2) = \frac{1}{N} \sum_{n=1}^{N} \big\| y_1^{(n)} - y_2^{(n)} \big\|_{2}^{2} $$

Similarly, for spatial feature learning in the $S_2$ network, $\mathcal{L}_{spe}$ uses the SAM to constrain the spectral information loss between $y_2$ and $y_1$.

Finally, the SSML framework alternately updates the weights of $S_1$ and $S_2$ using SGD as follows:

$$ \Theta_{S_1} \leftarrow \Theta_{S_1} - \eta \frac{\partial \mathcal{L}_{S_1}}{\partial \Theta_{S_1}}, \qquad \Theta_{S_2} \leftarrow \Theta_{S_2} - \eta \frac{\partial \mathcal{L}_{S_2}}{\partial \Theta_{S_2}} $$

where $\Theta_{S_1}$ and $\Theta_{S_2}$ denote the parameters of the two networks and $\eta$ is the learning rate.
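The alternating update can be sketched in PyTorch as below; `s1`, `s2`, the optimizers, and the weights `lam1`/`lam2` are placeholders (the paper’s actual values and network definitions are not reproduced), and detaching the peer network’s prediction is one common way to realize the alternate optimization.

```python
import torch

def l1_loss(a, b):                    # main loss L_main (l1-norm)
    return (a - b).abs().mean()

def mse_loss(a, b):                   # spatial constraint L_spa
    return ((a - b) ** 2).mean()

def sam_loss(a, b, eps=1e-8):         # spectral constraint L_spe
    dot = (a * b).sum(dim=1)
    denom = a.norm(dim=1) * b.norm(dim=1) + eps
    return torch.acos((dot / denom).clamp(-1 + eps, 1 - eps)).mean()

def train_step(s1, s2, opt1, opt2, x, y, lam1=0.1, lam2=0.1):
    """One alternating SSML-style update (illustrative weights lam1/lam2)."""
    # Update the spectral network S1: main loss plus the MSE term that
    # pulls spatial information from S2's (frozen) prediction.
    y1, y2 = s1(x), s2(x).detach()
    loss1 = l1_loss(y1, y) + lam1 * mse_loss(y1, y2)
    opt1.zero_grad()
    loss1.backward()
    opt1.step()
    # Update the spatial network S2: main loss plus the SAM term that
    # pulls spectral information from S1's (frozen) prediction.
    y1, y2 = s1(x).detach(), s2(x)
    loss2 = l1_loss(y2, y) + lam2 * sam_loss(y2, y1)
    opt2.zero_grad()
    loss2.backward()
    opt2.step()
    return loss1.item(), loss2.item()
```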
4. Results
4.1. Datasets and Metrics
The proposed method was evaluated on two public datasets, CAVE [34] and Pavia Center [35]. The CAVE dataset covers the wavelength range 400–700 nm with a spatial resolution of 512 × 512 pixels and 31 bands per image, for a total of 32 HSIs. The Pavia Center dataset covers 430–860 nm with a resolution of 1096 × 715 pixels, and 102 bands were used for one HSI. In training, part of the overall data was selected as a training set, and the remaining data were used as a test set (see Section 4.3). Before training, the Wald protocol [30] was adopted to obtain LR-HSIs through down-sampling. The batch size was 32. In testing, the original image size was the same as the input size. All networks were developed using the PyTorch framework, and the experiments were performed on an NVIDIA GeForce RTX 2080 Ti GPU. Training used SGD with fixed weight decay and momentum, and the learning rate was reduced by half every 1000 iterations. The proposed method was implemented in Python 3.7.3.
The performance of the proposed method was analyzed both quantitatively and visually. The evaluation indicators used in the performance analysis included the SAM [29], peak signal-to-noise ratio (PSNR) [36], correlation coefficient (CC) [37], erreur relative globale adimensionnelle de synthèse (ERGAS) [38], and root mean squared error (RMSE) [39]. These metrics respectively reflect the spectral distortion (SAM), image distortion (PSNR), image similarity (CC), global synthesis quality (ERGAS), and the difference between the fused image and the reference image (RMSE); they are described below.
Peak signal-to-noise ratio (PSNR): The PSNR is used to evaluate the spatial quality of the fused image on a per-band basis. The PSNR of the $k$th band is defined as

$$ \mathrm{PSNR}_k = 10 \log_{10} \frac{\max(\mathbf{R}_k)^2}{\|\mathbf{R}_k - \mathbf{Z}_k\|_2^2 \,/\, (HW)} $$

where $H$ and $W$ represent the height and width dimensions of the reference image, respectively, $\mathbf{R}_k$ and $\mathbf{Z}_k$ represent the reference image and the fused image of the $k$th band, and $\|\cdot\|_2$ refers to the two-norm. The final PSNR is the average of the PSNRs of all bands. The higher the PSNR, the better the performance.
Correlation coefficient (CC): This is mainly used to score the similarity of content between two images, and for the $k$th band it is defined as

$$ \mathrm{CC}_k = \frac{\sum_{i,j} \big(\mathbf{R}_k(i,j) - \mu_{\mathbf{R}_k}\big)\big(\mathbf{Z}_k(i,j) - \mu_{\mathbf{Z}_k}\big)}{\sqrt{\sum_{i,j} \big(\mathbf{R}_k(i,j) - \mu_{\mathbf{R}_k}\big)^2} \, \sqrt{\sum_{i,j} \big(\mathbf{Z}_k(i,j) - \mu_{\mathbf{Z}_k}\big)^2}} $$

where $\mu_{\mathbf{R}_k}$ and $\mu_{\mathbf{Z}_k}$ denote the mean values of the $k$th band of the reference image and the fused image, respectively. The CC in HSI fusion is calculated as the average over all bands. The larger the CC, the better the fused image.

Spectral angle mapper (SAM): The SAM is generally utilized to evaluate the degree of spectral information preservation at each pixel, and is defined as

$$ \mathrm{SAM}\big(\mathbf{r}_{(i,j)}, \mathbf{z}_{(i,j)}\big) = \arccos \frac{\langle \mathbf{r}_{(i,j)}, \mathbf{z}_{(i,j)} \rangle}{\|\mathbf{r}_{(i,j)}\|_2 \, \|\mathbf{z}_{(i,j)}\|_2} $$

where $\mathbf{r}_{(i,j)}$ and $\mathbf{z}_{(i,j)}$ denote the spectral vectors of the reference image and the fused image, respectively, at pixel position $(i,j)$, and $\langle \mathbf{r}_{(i,j)}, \mathbf{z}_{(i,j)} \rangle$ refers to their inner product; the overall SAM is the average of the SAMs of all pixels. The lower the SAM, the better the performance.
Erreur relative globale adimensionnelle de synthèse (ERGAS): The ERGAS is specially designed to assess the quality of high-resolution synthesized images, and measures the global statistical quality of the fused image. It is defined as

$$ \mathrm{ERGAS} = \frac{100}{r} \sqrt{\frac{1}{L} \sum_{k=1}^{L} \frac{\mathrm{RMSE}_k^2}{\mu_k^2}} $$

where $r$ refers to the spatial downsampling ratio from the HR-HSI to the LR-HSI, and $\mu_k$ denotes the mean value of the reference image of the $k$th band. The smaller the ERGAS, the better the performance.
Root mean squared error (RMSE): The RMSE measures the difference between the reference image $\mathbf{R}$ and the fused image $\mathbf{Z}$, and is defined as

$$ \mathrm{RMSE} = \sqrt{\frac{1}{LHW} \sum_{k=1}^{L} \sum_{i=1}^{H} \sum_{j=1}^{W} \big(\mathbf{R}_k(i,j) - \mathbf{Z}_k(i,j)\big)^2} $$

where $L$ represents the number of spectral bands, and $\mathbf{R}_k(i,j)$ and $\mathbf{Z}_k(i,j)$ denote the element values at spatial location $(i,j)$ in band $k$ of the reference image and the fused image, respectively. The smaller the RMSE, the better the performance.
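For reference, the four metrics can be computed with NumPy as below; this is a sketch under the definitions above, assuming reference/fused arrays of shape (H, W, L) scaled to [0, 1].

```python
import numpy as np

def psnr(ref, fus):
    """Band-averaged PSNR; ref/fus are (H, W, L) arrays in [0, 1]."""
    mse = ((ref - fus) ** 2).mean(axis=(0, 1))            # per-band MSE
    return float(np.mean(10 * np.log10(ref.max(axis=(0, 1)) ** 2 / mse)))

def sam(ref, fus, eps=1e-8):
    """Mean spectral angle (radians) over all pixels."""
    dot = (ref * fus).sum(axis=-1)
    denom = np.linalg.norm(ref, axis=-1) * np.linalg.norm(fus, axis=-1) + eps
    return float(np.arccos(np.clip(dot / denom, -1, 1)).mean())

def ergas(ref, fus, ratio):
    """Global relative error; `ratio` is the HR/LR spatial resolution ratio."""
    rmse_k = np.sqrt(((ref - fus) ** 2).mean(axis=(0, 1)))
    mu_k = ref.mean(axis=(0, 1))
    return float(100.0 / ratio * np.sqrt(np.mean((rmse_k / mu_k) ** 2)))

def rmse(ref, fus):
    """Overall RMSE across all bands and pixels."""
    return float(np.sqrt(((ref - fus) ** 2).mean()))
```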
4.2. DML Strategy Validation for Different Cases
The comparison results of the SSML framework for different deep networks are presented in Table 3 and Table 4. Four cases were analyzed: the $S_1$ network uses RCAM or SeAM, and the $S_2$ network uses MSRNet or ResNet. Based on experience, the hybrid loss weights $\lambda_1$ and $\lambda_2$ were set empirically.
Table 3.
Comparison results of the SSML for different deep networks on the CAVE dataset.
Table 4.
Comparison results of the SSML for different deep networks on the Pavia Center dataset.
As shown in Table 3 and Table 4, the performance of the $S_1$ and $S_2$ networks in the SSML exceeded that of the original networks in most cases. Without loss of generality, the loss value curve of the SSML, with the SeAM as $S_1$ and the ResNet as $S_2$, was analyzed on the Pavia Center dataset to determine the reasons for the advantage of the DML strategy. A comparison of the loss value curves of $S_1$ in the SSML and the original SeAM during 5000 training iterations on the Pavia Center dataset is presented in Figure 4a, and their difference curve is presented in Figure 4b. As shown in Figure 4b, the loss values of $S_1$ in the SSML were slightly higher than those of the original SeAM before 1000 iterations; however, after 1000 iterations, the loss values of $S_1$ in the SSML were lower than those of the original SeAM. Thus, it can be concluded that the SSML had a slow convergence speed in the early training stage because of its alternate optimization. Nonetheless, it exhibited advantages in minimum loss value and convergence speed as the number of training iterations increased. This indicates that introducing the DML strategy in the SSML can help to achieve better results in HSI pansharpening.
Figure 4.
(a) Loss function curves of the SeAM ($S_1$) in the SSML framework and the original SeAM during 5000 training iterations; (b) the difference curve between the loss values of the original SeAM and the SeAM ($S_1$) in the SSML framework during 5000 training iterations.
4.3. Effect of the Number of Training Samples
This experiment investigated the effect of the proportion of the training set on the fusion result. In deep-learning-based hyperspectral pansharpening, fixed proportions of the data are usually selected for the training and test sets. In the experiment, three different splits between the training and testing sets were compared, with the number of iterations, learning rate, and other parameters kept the same. Each group of experiments was repeated 10 times; the experimental results are shown in Table 5. It can be seen that the moderate proportion of training samples yielded improved fusion results. Therefore, this split of the training and testing sets was selected for subsequent experiments.
Table 5.
Experimental results of different proportions of training samples.
4.4. Comparisons with Advanced Methods
The proposed SSML was compared with five state-of-the-art methods: three traditional methods, namely, CNMF [6], Bayesian naive [7], and GFPCA [9], and two deep-learning-based methods, namely, PanNet [18] and DDLPS [40]. For the two deep-learning methods and our method, each group of experiments was repeated 10 times. The experiments were performed on the CAVE and Pavia Center datasets.
4.4.1. Results on CAVE Data Set
The results of different methods on the CAVE dataset are presented in Figure 5, Figure 6 and Figure 7. Figure 5b shows a blurry result; Figure 5d is over-sharpened, and Figure 5e exhibits a color difference. In the colormaps, Figure 5a includes a large area of spectral distortion on the surface of the balloon, and Figure 5b,c,e have significant spectral distortions at the edges.
Figure 5.
The visual results of different methods on the CAVE dataset. (a) CNMF; (b) Bayesian naive; (c) GFPCA; (d) PanNet; (e) DDLPS; (f) Original RCAM; (g) Original MSRNet; (h) $S_1$ (RCAM) in the SSML framework; (i) $S_2$ (MSRNet) in the SSML framework; (j) Ground truth. Note that a false color image is selected for clear visualization (red: 30, green: 20, and blue: 10). The even rows show the difference maps of the corresponding methods.
Figure 6.
The visual results of different methods on the CAVE dataset. (a) CNMF; (b) Bayesian naive; (c) GFPCA; (d) PanNet; (e) DDLPS; (f) Original SeAM; (g) Original ResNet; (h) $S_1$ (SeAM) in the SSML framework; (i) $S_2$ (ResNet) in the SSML framework; (j) Ground truth. Note that a false color image is selected for clear visualization (red: 30, green: 20, and blue: 10). The even rows show the difference maps of the corresponding methods.
Figure 7.
The visual results of different methods on the CAVE dataset. (a) CNMF; (b) Bayesian naive; (c) GFPCA; (d) PanNet; (e) DDLPS; (f) Original RCAM; (g) Original ResNet; (h) $S_1$ (RCAM) in the SSML framework; (i) $S_2$ (ResNet) in the SSML framework; (j) Ground truth. Note that a false color image is selected for clear visualization (red: 30, green: 20, and blue: 10). The even rows show the difference maps of the corresponding methods.
The results of the SSML framework with the (SeAM and ResNet) combination and the other methods are presented in Figure 6. There is a certain spectral distortion in Figure 6h,i, which were generated by $S_1$ (SeAM) and $S_2$ (ResNet) in the SSML framework, but it is lower than that of the other methods. The results of the SSML framework with the (RCAM and ResNet) combination and the other methods are presented in Figure 7; the results in Figure 7h,i have higher visual image quality than the other results.
Table 6 and Table 7 show the evaluation indicators for the proposed method and several state-of-the-art methods. As shown in Table 6, CNMF, Bayesian naive, and GFPCA are not deep-learning methods; their results were stable and their running times were short, but they were found to be less effective than the deep-learning methods. The SSML framework with $S_1$ (RCAM) had slightly lower ERGAS and RMSE values than the original RCAM; in most cases, the SSML framework with $S_1$ (RCAM) and $S_2$ (MSRNet) achieved better results than the other methods on all evaluation indicators. Regarding time consumption, the proposed framework took much less time than DDLPS and slightly more than PanNet, but its fusion performance was improved.
Table 6.
The quality indicator results of different methods on the CAVE data set.
Table 7.
The quality indicator results of different methods on the Pavia Center data set.
4.4.2. Results on Pavia Center Dataset
The results of different methods on the Pavia Center dataset are presented in Figure 8, where the SSML framework used the (RCAM and MSRNet) combination. The colormaps in Figure 8a,c,d indicate that the corresponding methods performed relatively poorly in the shadowed areas; in Figure 8e, certain details, such as the river surface, are missing. Figure 8h,i show that the proposed framework improved the image details compared to the original networks. This also demonstrates the effectiveness of the proposed hybrid loss function in the mutual-learning strategy.
Figure 8.
The visual results of different methods on the Pavia Center dataset. (a) CNMF; (b) Bayesian naive; (c) GFPCA; (d) PanNet; (e) DDLPS; (f) Original RCAM; (g) Original MSRNet; (h) $S_1$ (RCAM) in the SSML framework; (i) $S_2$ (MSRNet) in the SSML framework; (j) Ground truth. Note that a false color image is selected for clear visualization (red: 70, green: 53, and blue: 19). The even rows show the difference maps of the corresponding methods.
As presented in Table 7, the indicator results of the proposed SSML framework were better than those of the comparison methods. Compared with the original networks, the SSML achieved obvious improvements for all indicators, which demonstrated the effectiveness of the proposed hybrid loss function in the mutual learning strategy.
4.5. Hybrid Loss Function Analysis
In this section, the reason for using a hybrid loss function consisting of two different loss functions ($\mathcal{L}_{spa}$ and $\mathcal{L}_{spe}$) instead of a single mutual learning loss function is explained, and the proposed SSML framework is compared with the typical DML model [24].
Table 8 shows the effect of different mutual learning loss functions on the model performance. The SSML framework used the combination of the SeAM ($S_1$) and MSRNet ($S_2$) networks on the CAVE dataset. When $S_1$ and $S_2$ both used the SAM loss as the mutual learning loss, there was a positive effect on $S_2$ but a negative effect on $S_1$. The reason was that $S_1$ already paid more attention to spectral features, so no more spatial features could be learned from $S_2$, while $S_2$ benefited in the opposite way. When $S_1$ used the MSE loss and $S_2$ used the SAM loss, $S_1$ exploited its own spectral feature learning advantage and obtained spatial information from $S_2$, which yielded good results in the PSNR and SAM. Thus, the experimental results demonstrated the feasibility of the proposed hybrid loss function.
Table 8.
Effects of different loss functions.
4.6. Generalization Ability of SSML
To verify the generalization ability of the proposed SSML framework, we applied it to the state-of-the-art residual hyper-dense network (RHDN) method [15]. The original fusion results of the RHDN method were used as the input of the SSML framework, as shown in Figure 1. Then the spectral ($S_1$) and spatial ($S_2$) networks, and their hybrid loss functions based on the mutual-learning strategy, were used to transfer information of different features and improve the results.
In the experiments performed, we used the Pavia Center dataset, which was divided into 160 × 160 image blocks for training the RHDN method. As shown in Figure 9, four cases were again analyzed: the $S_1$ network used RCAM or SeAM, and the $S_2$ network used MSRNet or ResNet. The fusion results of the RHDN network were refined through mutual learning. From the five performance indexes, especially the SAM, RMSE, and ERGAS, it can be seen that the SSML framework was able to effectively improve the fusion result when an appropriate spectral and spatial network structure was selected. Furthermore, the SSML framework took only a short time to improve the fusion results. Thus, the proposed SSML framework demonstrated generalization ability for HSI pansharpening.
Figure 9.
Quality evaluation for the comparison of results before and after mutual learning. (a) PSNR; (b) CC; (c) SAM; (d) RMSE; (e) ERGAS.
4.7. Effect of Deep Network Parameter Number on SSML Performance
The SSML aims to let two networks learn the same task from each other to achieve optimal results. Table 9 compares the parameter numbers of $S_1$ and $S_2$ in the SSML framework with those of PanNet and DDLPS. Compared with PanNet, the number of parameters of the SSML networks was greatly reduced; in particular, the parameter number of the SeAM was only one fifth that of PanNet. Compared with DDLPS, the parameter number of the SeAM was reduced by 24.8%, that of MSRNet by 28%, that of ResNet by 31%, and that of RCAM by 62.2%. These results indicate that the SSML has better feature extraction capability with fewer parameters for the same task.
Table 9.
The number of parameters of different deep-learning networks.
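For completeness, parameter counts like those in Table 9 can be obtained in PyTorch with a short helper; the function name below is our own, not from the paper.

```python
import torch.nn as nn

def count_params(model: nn.Module) -> int:
    """Count trainable parameters, as compared in Table 9."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)
```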
5. Conclusions
This paper proposes an SSML framework integrating spectral-spatial information mining for HSI pansharpening. In contrast to existing CNN-based hyperspectral pansharpening frameworks, we designed, based on the DML strategy, spectral and spatial networks for learning the spectral and spatial features. Furthermore, a set of hybrid loss functions based on the mutual-learning strategy is proposed to transfer information of different features, which allows features to be extracted without introducing excessive computation. In the experiments, several cases were examined to evaluate the effect of the DML on the pansharpening result. The results demonstrated that introducing the DML strategy into the SSML framework helps to achieve improved results in HSI pansharpening. The performance of the SSML framework was compared with several state-of-the-art methods; the comparisons demonstrated the effectiveness and advantages of the proposed framework. The latest fusion results were used to verify the generalization ability of the SSML framework, with improved results observed. The discussion of the feasibility of the hybrid loss function and of the number of deep network parameters suggests that the proposed SSML framework is a promising framework for HSI pansharpening.
In future work, HSI pansharpening under the SSML framework will be explored further to identify improved spectral-spatial features for HSIs. A further research direction will involve the application of the DML strategy to other image-processing fields.
Author Contributions
X.P.: conceptualization, methodology, validation, writing—original draft; Y.F.: conceptualization, methodology, visualization, writing—original draft; S.P.: methodology, supervision, formal analysis; K.M.: conceptualization, writing—review and editing, supervision; L.L.: supervision, investigation; J.W.: supervision, writing—review. All authors have read and agreed to the published version of the manuscript.
Funding
This research is supported by the National Natural Science Foundation of China (62101446, 62006191), the Xi’an Key Laboratory of Intelligent Perception and Cultural Inheritance (No. 2019219614SYS011CG033), the Key Research and Development Program of Shaanxi (2021ZDLSF06-05, 2021ZDLGY15-04), and the Program for Changjiang Scholars and Innovative Research Team in University (No. IRT_17R87). It is also supported by the International Science and Technology Cooperation Research Plan in Shaanxi Province of China (No. 2022KW-08).
Data Availability Statement
Not applicable.
Conflicts of Interest
The authors declare no conflict of interest.
Abbreviations
| HSIs | Hyperspectral images |
| PAN | Panchromatic |
| HR | High resolution |
| LR | Low resolution |
| CNN | Convolutional neural network |
| PNN | Pansharpening neural network |
| KL | Kullback–Leibler |
| DML | Deep mutual-learning strategy |
| CC | Correlation coefficient |
| PSNR | Peak signal-to-noise ratio |
| SAM | Spectral angle mapper |
| RMSE | Root mean squared error |
| ERGAS | Erreur relative globale adimensionnelle de synthèse |
| SSIM | Structural similarity index measurement |
References
- Camps-Valls, G.; Tuia, D.; Bruzzone, L.; Benediktsson, J.A. Advances in hyperspectral image classification: Earth monitoring with statistical learning methods. IEEE Signal Process. Mag. 2013, 31, 45–54. [Google Scholar] [CrossRef]
- Wang, Z.; Zhu, R.; Fukui, K.; Xue, J.H. Matched shrunken cone detector (MSCD): Bayesian derivations and case studies for hyperspectral target detection. IEEE Trans. Image Process. 2017, 26, 5447–5461. [Google Scholar] [CrossRef] [PubMed]
- Abdollahi, A.; Pradhan, B.; Shukla, N.; Chakraborty, S.; Alamri, A. Deep learning approaches applied to remote sensing datasets for road extraction: A state-of-the-art review. Remote Sens. 2020, 12, 1444. [Google Scholar] [CrossRef]
- Aiazzi, B.; Baronti, S.; Selva, M. Improving component substitution pansharpening through multivariate regression of MS + Pan data. IEEE Trans. Geosci. Remote Sens. 2007, 45, 3230–3239. [Google Scholar] [CrossRef]
- Garzelli, A.; Nencini, F.; Capobianco, L. Optimal MMSE pansharpening of very high resolution multispectral images. IEEE Trans. Geosci. Remote Sens. 2007, 46, 228–236. [Google Scholar] [CrossRef]
- Yokoya, N.; Yairi, T.; Iwasaki, A. Coupled nonnegative matrix factorization unmixing for hyperspectral and multispectral data fusion. IEEE Trans. Geosci. Remote Sens. 2011, 50, 528–537. [Google Scholar] [CrossRef]
- Wei, Q.; Dobigeon, N.; Tourneret, J.Y. Fast fusion of multi-band images based on solving a Sylvester equation. IEEE Trans. Image Process. 2015, 24, 4109–4121. [Google Scholar] [CrossRef]
- Aiazzi, B.; Alparone, L.; Baronti, S.; Garzelli, A.; Selva, M. MTF-tailored multiscale fusion of high-resolution MS and Pan imagery. Photogramm. Eng. Remote. Sens. 2006, 72, 591–596. [Google Scholar] [CrossRef]
- Liao, W.; Huang, X.; Van Coillie, F.; Gautama, S.; Pižurica, A.; Philips, W.; Liu, H.; Zhu, T.; Shimoni, M.; Moser, G.; et al. Processing of multiresolution thermal hyperspectral and digital color data: Outcome of the 2014 IEEE GRSS data fusion contest. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2015, 8, 2984–2996. [Google Scholar] [CrossRef]
- Cao, F.; Guo, W. Cascaded dual-scale crossover network for hyperspectral image classification. Knowl.-Based Syst. 2020, 189, 105122. [Google Scholar] [CrossRef]
- Liu, L.; Wang, J.; Zhang, E.; Li, B.; Zhu, X.; Zhang, Y.; Peng, J. Shallow—Deep convolutional network and spectral-discrimination-based detail injection for multispectral imagery pan-sharpening. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 1772–1783. [Google Scholar] [CrossRef]
- Peng, J.; Liu, L.; Wang, J.; Zhang, E.; Zhu, X.; Zhang, Y.; Feng, J.; Jiao, L. PSMD-Net: A Novel Pan-Sharpening Method Based on a Multiscale Dense Network. IEEE Trans. Geosci. Remote Sens. 2021, 59, 4957–4971. [Google Scholar] [CrossRef]
- Tan, Y.; Xiong, S.; Li, Y. Automatic Extraction of Built-Up Areas From Panchromatic and Multispectral Remote Sensing Images Using Double-Stream Deep Convolutional Neural Networks. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 3988–4004. [Google Scholar] [CrossRef]
- Tang, X.; Li, M.; Ma, J.; Zhang, X.; Liu, F.; Jiao, L. EMTCAL: Efficient Multi-Scale Transformer and Cross-Level Attention Learning for Remote Sensing Scene Classification. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1. [Google Scholar]
- Qu, J.; Xu, Z.; Dong, W.; Xiao, S.; Li, Y.; Du, Q. A Spatio-Spectral Fusion Method for Hyperspectral Images Using Residual Hyper-Dense Network. IEEE Trans. Neural Netw. Learn. Syst. 2022, PP, 1–15. [Google Scholar] [CrossRef]
- Tang, X.; Zhang, H.; Mou, L.; Liu, F.; Zhang, X.; Zhu, X.; Jiao, L. An Unsupervised Remote Sensing Change Detection Method Based on Multiscale Graph Convolutional Network and Metric Learning. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5626915. [Google Scholar] [CrossRef]
- Qu, J.; Shi, Y.; Xie, W.; Li, Y.; Wu, X.; Du, Q. MSSL: Hyperspectral and Panchromatic Images Fusion via Multiresolution Spatial-Spectral Feature Learning Networks. IEEE Trans. Geosci. Remote. Sens. 2021, 60, 5504113. [Google Scholar] [CrossRef]
- Yang, J.; Fu, X.; Hu, Y.; Huang, Y.; Ding, X.; Paisley, J. PanNet: A deep network architecture for pan-sharpening. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 5449–5457. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Zhu, M.; Jiao, L.; Liu, F.; Yang, S.; Wang, J. Residual Spectral–Spatial Attention Network for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote. Sens. 2020, 59, 449–462. [Google Scholar] [CrossRef]
- Zhang, T.; Fu, Y.; Wang, L.; Huang, H. Hyperspectral image reconstruction using deep external and internal learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 8559–8568. [Google Scholar]
- Zhang, E.; Zhang, X.; Yang, S.; Wang, S. Improving hyperspectral image classification using spectral information divergence. IEEE Geosci. Remote. Sens. Lett. 2013, 11, 249–253. [Google Scholar] [CrossRef]
- Xie, W.; Cui, Y.; Li, Y.; Lei, J.; Du, Q.; Li, J. HPGAN: Hyperspectral pansharpening using 3-D generative adversarial networks. IEEE Trans. Geosci. Remote. Sens. 2020, 59, 463–477. [Google Scholar] [CrossRef]
- Zhang, Y.; Xiang, T.; Hospedales, T.M.; Lu, H. Deep mutual learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4320–4328. [Google Scholar]
- Wu, R.; Feng, M.; Guan, W.; Wang, D.; Lu, H.; Ding, E. A mutual learning method for salient object detection with intertwined multi-supervision. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 8150–8159. [Google Scholar]
- Rajamanoharan, G.; Kanaci, A.; Li, M.; Gong, S. Multi-task mutual learning for vehicle re-identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019. [Google Scholar]
- Li, K.; Yu, L.; Wang, S.; Heng, P.A. Towards cross-modality medical image segmentation with online mutual knowledge distillation. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 775–783. [Google Scholar]
- Kullback, S.; Leibler, R.A. On information and sufficiency. Ann. Math. Stat. 1951, 22, 79–86. [Google Scholar] [CrossRef]
- Yuhas, R.H.; Goetz, A.F.; Boardman, J.W. Discrimination among semi-arid landscape endmembers using the spectral angle mapper (SAM) algorithm. In Proceedings of the Summaries 3rd Annual JPL Airborne Geoscience Workshop, Pasadena, CA, USA, 1–5 June 1992; Volume 1, pp. 147–149. [Google Scholar]
- Wald, L.; Ranchin, T.; Mangolini, M. Fusion of satellite images of different spatial resolutions: Assessing the quality of resulting images. Photogramm. Eng. Remote. Sens. 1997, 63, 691–699. [Google Scholar]
- Ma, J.; Fan, X.; Yang, S.X.; Zhang, X.; Zhu, X. Contrast limited adaptive histogram equalization-based fusion in YIQ and HSI color spaces for underwater image enhancement. Int. J. Pattern Recognit. Artif. Intell. 2018, 32, 1854018. [Google Scholar] [CrossRef]
- Zheng, Y.; Li, J.; Li, Y.; Cao, K.; Wang, K. Deep residual learning for boosting the accuracy of hyperspectral pansharpening. IEEE Geosci. Remote. Sens. Lett. 2019, 17, 1435–1439. [Google Scholar] [CrossRef]
- Zhao, H.; Gallo, O.; Frosio, I.; Kautz, J. Loss functions for image restoration with neural networks. IEEE Trans. Comput. Imaging 2016, 3, 47–57. [Google Scholar] [CrossRef]
- Yasuma, F.; Mitsunaga, T.; Iso, D.; Nayar, S.K. Generalized assorted pixel camera: Postcapture control of resolution, dynamic range, and spectrum. IEEE Trans. Image Process. 2010, 19, 2241–2253. [Google Scholar] [CrossRef] [PubMed]
- Fauvel, M.; Tarabalka, Y.; Benediktsson, J.A.; Chanussot, J.; Tilton, J.C. Advances in spectral-spatial classification of hyperspectral images. Proc. IEEE 2012, 101, 652–675. [Google Scholar] [CrossRef]
- Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef] [PubMed]
- Alparone, L.; Wald, L.; Chanussot, J.; Thomas, C.; Gamba, P.; Bruce, L.M. Comparison of pansharpening algorithms: Outcome of the 2006 GRS-S data-fusion contest. IEEE Trans. Geosci. Remote. Sens. 2007, 45, 3012–3021. [Google Scholar] [CrossRef]
- Wald, L. Data Fusion: Definitions and Architectures: Fusion of Images of Different Spatial Resolutions; Presses des MINES: Paris, France, 2002. [Google Scholar]
- Yang, Y.; Wan, W.; Huang, S.; Lin, P.; Que, Y. A novel pan-sharpening framework based on matting model and multiscale transform. Remote Sens. 2017, 9, 391. [Google Scholar] [CrossRef]
- Li, K.; Xie, W.; Du, Q.; Li, Y. DDLPS: Detail-based deep Laplacian pansharpening for hyperspectral imagery. IEEE Trans. Geosci. Remote Sens. 2019, 57, 8011–8025. [Google Scholar] [CrossRef]
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).