Multispectral Transforms Using Convolution Neural Networks for Remote Sensing Multispectral Image Compression

A multispectral image is a three-order tensor, i.e., a three-dimensional matrix with one spectral dimension and two spatial dimensions. Multispectral image compression can therefore exploit the advantages of tensor decomposition (TD), such as Nonnegative Tucker Decomposition (NTD). Unfortunately, TD suffers from high calculation complexity and cannot be used in on-board, low-complexity cases (e.g., multispectral cameras) where hardware resources and power are limited. Here, we propose a low-complexity compression approach for multispectral images based on convolution neural networks (CNNs) with NTD. We construct a new spectral transform using CNNs, which transforms the three-dimensional spectral tensor from a large-scale to a small-scale version. NTD resources are allocated only to the small-scale three-dimensional tensor, which improves calculation efficiency. We obtain the optimized small-scale spectral tensor by minimizing the error between the original and reconstructed three-dimensional spectral tensors in the self-learning CNNs. Then, the NTD is applied to the optimized three-dimensional spectral tensor in the DCT domain to obtain high compression performance. We experimentally confirmed the proposed method on multispectral images. Compared to the case where the new spectral tensor transform with CNNs is not applied to the original three-dimensional spectral tensor, the reconstructed image quality was improved at the same compression bit-rates. Compared with the full NTD-based method, the computation efficiency was obviously improved with only a small sacrifice of PSNR and without affecting the visual quality of the images.


Introduction
Multispectral images are acquired by a multispectral camera that collects the reflected, emitted, or backscattered energy from an object or scene in multiple bands of the electromagnetic spectrum [1][2][3][4][5]. Multispectral images are considered the holy grail of observation tools in Earth observation because they provide both spectral and spatial information about an object or scene. A multispectral camera is usually used as a satellite payload in multiple applications, such as environmental monitoring, mineral surveying, military target detection, and so on [6][7][8][9]. Unfortunately, the information content of a multispectral image with large spatial and spectral information (high spectral and spatial resolution) [10,11] far exceeds the capacity of currently available on-orbit memory [12] and the image-transmission downlink bandwidth of satellites. Therefore, it is helpful to compress remote sensing multispectral images without compromising quality.
Since multispectral images are three-dimensional, multispectral image compression fundamentally removes both spatial and spectral redundancies. Unfortunately, tensor decomposition suffers from high calculation complexity (see Table 1) and conflicts with the low-complexity requirements of on-board multispectral cameras.
In our previous works, we investigated tensor decomposition compression methods to achieve low complexity. In [40], Tucker decomposition is applied to multispectral images with comparatively few bands in the post-transform domain. In [41], an accelerated nonnegative tensor decomposition (i.e., a pair-wise multilevel grouping approach) for hyperspectral image compression was developed. This method reduces the calculation complexity at a slight sacrifice in compression performance.
In this work, we propose a new spectral tensor transform using convolutional neural networks (CNNs) to reduce the computational complexity of large-scale tensor decomposition. We use the constructed spectral tensor transform in conjunction with nonnegative tensor decomposition (NTD) in the DCT domain to compress multispectral images, which reduces the total resource utilization of tensor decomposition. This method focuses on transforming large-scale spectral tensors into small-scale versions. NTD resources are allocated only to the small-scale spectral tensor in the DCT domain; thus, low computation complexity is achieved. By exploiting the compact spectral information hidden in the multispectral image tensor, this method requires relatively few total NTD resources, which enables efficient calculation.
The remainder of the paper is organized as follows. Section 2 presents the proposed compression framework, including the compression principle of NTD, the spectral transform of the CNNs, and the compression scheme. Experimental results are reported in Section 3. In Section 4, we conclude this paper.

Materials and Methods
In Section 2.1, we present the principle of multispectral compression based on NTD, followed in Section 2.2 by the proposed spectral transform of the CNNs. Finally, Section 2.3 demonstrates the proposed compression scheme using the CNNs.

NTD for Multispectral Compression
A multispectral image is a 3D matrix and fundamentally includes both spatial redundancy and spectral redundancy. Multispectral compression aims at removing both redundancies. Figure 1 shows the general principle of multispectral compression using NTD approaches. The NTD-based compression method involves three steps. In the first step, a two-dimensional (2D) transform (e.g., 2D-DWT or 2D-DCT) is utilized to remove intra-band spatial correlation. In the second step, all transformed bands form a three-dimensional tensor, and a Tucker decomposition is applied to this tensor to generate a core tensor and several factor matrixes. In the last step, an adaptive arithmetic encoder is used to process the core tensor and factor matrixes.
A tensor is a multi-way array or multi-dimensional matrix. We use Y ∈ R^(I1×I2×I3) to express a three-dimensional tensor, where each element is a transformed coefficient. Here, I1 × I2 × I3 is the dimension of the tensor Y. Decomposing a three-dimensional tensor means expressing a given tensor through component (or factor) matrixes. The important notations used to demonstrate the principle of three-dimensional tensor decomposition are given in Table 2. When a Tucker decomposition (also called the best rank approximation) is applied to a three-dimensional tensor, a low-dimension core tensor and several factor matrixes are generated [42]. Here, the low-dimension core tensor is denoted by G ∈ R^(J1×J2×J3) (J1 × J2 × J3 is the dimension of the tensor G), and the factor matrixes are denoted by A^(n) ∈ R^(In×Jn) (n = 1, 2, 3). Given a tensor Y, a core tensor G and three factor matrixes A^(n) are found that perform the following approximate decomposition:

Y = G ×1 A^(1) ×2 A^(2) ×3 A^(3) + E = Ŷ + E,  (1)

where ×n denotes the mode-n product, E is the estimation error of tensor Y in the decomposition process, and Ŷ is the equivalent tensor (i.e., approximately evaluated tensor) built from the generated core tensor and factor matrixes.
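As a concrete illustration of the reconstruction step in Equation (1), the following sketch (illustrative only, not the paper's code; the name `tucker_reconstruct` and all sizes are assumptions) builds Ŷ from a core tensor and three factor matrixes with a single einsum:

```python
import numpy as np

# Illustrative sketch: Tucker reconstruction Y_hat = G x_1 A1 x_2 A2 x_3 A3
# for a three-dimensional tensor, where G has shape (J1, J2, J3) and each
# factor matrix A_n has shape (I_n, J_n).
def tucker_reconstruct(G, A1, A2, A3):
    # Contract each core mode with the matching factor matrix.
    return np.einsum('abc,ia,jb,kc->ijk', G, A1, A2, A3)

rng = np.random.default_rng(0)
I, J = (8, 8, 8), (3, 3, 3)                 # tensor and core dimensions
G = rng.random(J)
A1, A2, A3 = (rng.random((I[n], J[n])) for n in range(3))
Y_hat = tucker_reconstruct(G, A1, A2, A3)
print(Y_hat.shape)                          # (8, 8, 8)
```

Note that a small 3 × 3 × 3 core and three thin factor matrixes reproduce a full 8 × 8 × 8 tensor, which is exactly where the compression gain of TD comes from.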
Figure 2 shows a graphic illustration of the Tucker decomposition of Equation (1) on the three-dimensional tensor.
To achieve Equation (1), the Tucker decomposition searches for the optimal core tensor and factor matrixes that minimize the residual error tensor.
The Tucker decomposition achieves compression because the original high-dimensional tensor is reduced to a lower-dimensional core tensor. The tensor decomposition is performed on the whole multispectral image, which means it can simultaneously remove the spectral and residual spatial redundancies. Here, we use a fast algorithm based on HALS-NTD [43,44] to implement the Tucker decomposition. The important notations of the decomposition algorithm with HALS-NTD are given in Table 3. Algorithm 1 shows the HALS-NTD procedure for a three-order tensor. In Algorithm 1, Step 11 and Step 16 are the learning rules for the NTD factors A^(n) and the core tensor G; detailed explanations of the learning rules can be found in [43,44].

Algorithm 1 (HALS-NTD [43,44]). Input: a given tensor Y of size I1 × I2 × I3 and the core tensor size J1 × J2 × J3. Begin by initializing G and all A^(n); then alternately apply the HALS learning rules for the NTD factors A^(n) and for the core tensor G until the stopping criterion is met. Output: the core tensor G and the factor matrixes A^(n), n = 1, 2, 3.

Notation used in the algorithm: ⊛ Hadamard (element-wise) product; ⊘ element-wise division; ⊗ Kronecker product; ||x||_p, the p-norm (length) of the vector x, where p = 1, 2.
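For intuition about the decomposition itself, a plain truncated HOSVD (a simpler relative of HALS-NTD, without the nonnegativity constraint or the HALS update rules; all names, sizes, and rank choices here are illustrative assumptions) shows how a three-order tensor is reduced to a small core with factor matrixes:

```python
import numpy as np

def unfold(Y, mode):
    # Mode-n unfolding: move axis `mode` to the front, flatten the rest.
    return np.moveaxis(Y, mode, 0).reshape(Y.shape[mode], -1)

def hosvd(Y, ranks):
    # Factor matrixes: leading left singular vectors of each unfolding.
    A = [np.linalg.svd(unfold(Y, n), full_matrices=False)[0][:, :ranks[n]]
         for n in range(3)]
    # Core tensor: G = Y x_1 A1^T x_2 A2^T x_3 A3^T.
    G = np.einsum('ijk,ia,jb,kc->abc', Y, A[0], A[1], A[2])
    return G, A

def reconstruct(G, A):
    # Y_hat = G x_1 A1 x_2 A2 x_3 A3.
    return np.einsum('abc,ia,jb,kc->ijk', G, A[0], A[1], A[2])

rng = np.random.default_rng(1)
Y = rng.random((16, 16, 8))
G, A = hosvd(Y, (16, 16, 8))   # full ranks: reconstruction is exact
err = np.linalg.norm(Y - reconstruct(G, A))
print(round(err, 6))           # 0.0
```

With smaller ranks, e.g. `hosvd(Y, (4, 4, 4))`, the core shrinks and the reconstruction becomes approximate, which is the lossy-compression trade-off the NTD exploits.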

Proposed Multispectral Tensor with CNNs
Here, we propose an efficient spectral tensor transform method using convolution neural networks to compress remote sensing multispectral image tensors. Different from the traditional NTD method, which is directly applied to the large-scale original multispectral tensor representation, the proposed concept uses a learning network to transform the original large-scale spectral tensor into a small-scale version. Then, the NTD is applied to the small-scale spectral tensor in the DCT domain. Thus, the total NTD resources are relatively small, which achieves efficient calculation. Considering that the transform is an unsupervised learning task, we construct a self-learning network using two convolution neural networks. Figure 3 shows the spectral transform schematic of the convolution neural networks.

Here, the CNN in the forward channel is called the forward CNN; accordingly, the CNN in the backward channel is called the backward CNN. We use X ∈ R^N to express a multispectral three-dimensional tensor, which is the input of the forward CNN. The first step of the forward CNN is a convolution between X and a filter bank with m1 learned filters of length n0. The convolution can be modeled as a matrix multiplication

X1 = W1 X,  (2)

where W1 is the convolution filter of the first layer and X1 is the created feature map. In the second step, a nonlinear function is applied to the created feature map to obtain the first-layer CNN feature, denoted by Z1:

Z1 = Ψ(W1 X),  (3)

where Ψ can be implemented using a Rectified Linear Unit (ReLU), Ψ(t) = max(t, 0). The second layer is obtained in the same way:

Z2 = Ψ(W2 Z1),  (4)

where W2 is the convolution filter of the second layer. Using the same principle, the other layers are obtained; the nth layer can be expressed as

Zn = Ψ(Wn Zn−1).  (5)

We use C1(·) to express the whole operation model of the forward CNN. We use V to
express the resultant representation through the CNN. Equation (6) of the forward CNN can be expressed as

V = C1(W, X),  (6)

where W is the filter parameter of the forward CNN. The DCT transform is applied to the resultant representation from the forward CNN, where the calculation model can be expressed as

F = DT(V) = B V,  (7)

where DT(·) expresses the DCT transform and B is the DCT basis.
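A minimal numeric sketch of the forward pass and the DCT step (illustrative only: layer widths, filter values, and function names are assumptions, and the convolution is modeled as a plain matrix multiply, as described in the text):

```python
import numpy as np
from scipy.fft import dct, idct

def relu(t):
    # ReLU nonlinearity, psi(t) = max(t, 0).
    return np.maximum(t, 0.0)

def forward_cnn(X, weights):
    # Each layer: Z_n = ReLU(W_n @ Z_{n-1}); convolution modeled as a
    # matrix multiply for illustration.
    Z = X
    for W in weights:
        Z = relu(W @ Z)
    return Z

rng = np.random.default_rng(2)
X = rng.random((64, 10))                # 64-dim spectral vectors, 10 samples
weights = [rng.standard_normal((32, 64)) * 0.1,   # illustrative filters
           rng.standard_normal((16, 32)) * 0.1]
V = forward_cnn(X, weights)             # small-scale representation V

# Orthonormal DCT: with norm='ortho' the basis B satisfies B^{-1} = B^T,
# so the inverse transform recovers V exactly.
F = dct(V, axis=0, norm='ortho')
V_back = idct(F, axis=0, norm='ortho')
print(np.allclose(V, V_back))           # True
```

The round trip through `dct`/`idct` confirms the orthogonality property B^(−1) = B^T that the back-forward channel relies on.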
To obtain the optimized features, i.e., V*, we add a back-forward channel constituting an inverse DCT transform and a CNN. Moreover, the DCT transform basis is an orthogonal basis, which means B^(−1) = B^T. The backward CNN can be expressed by

X̃ = C2(U, Ṽ),  (8)

where U is the filter parameter of the backward CNN, C2(·) is the calculation function of the backward CNN, and Ṽ is the inverse DCT transform signal, Ṽ = DT^(−1)(F). The back-forward calculation model can be expressed as

X̃ = C2(U, DT^(−1)(DT(C1(W, X)))).  (9)

To obtain the best small-scale representation (i.e., V*), we minimize the error between the input signal X and the reconstruction X̃ to form the learning goal

min over W, U of ||X − C2(U, DT^(−1)(DT(C1(W, X))))||²₂.  (10)

To solve Equation (10) quickly, we use an alternate iteration concept, which can be divided into two steps:

Step 1: Ŵ = argmin over W of ||X − C2(Û, DT^(−1)(DT(C1(W, X))))||²₂ + λ1||W||²₂,  (11)
Step 2: Û = argmin over U of ||X − C2(U, DT^(−1)(DT(C1(Ŵ, X))))||²₂ + λ2||U||²₂,  (12)

where λ1 and λ2 are the regularization parameters. To efficiently calculate Ŵ and Û, we replace Equations (11) and (12) with Equations (13) and (14), which add a further regularization term weighted by β1 and β2, respectively; the detailed derivation from Equations (11) and (12) to Equations (13) and (14) can be seen in Appendix A. The two-step learning algorithm of the multispectral transform can be summarized as follows and is used to obtain the best small-scale spectral tensor and the CNN parameters.

Algorithm 2 (two-step transform learning): Input the training representation X and the maximum iteration number K; initialize Ŵ0 and Û0 and set k = 0. At each iteration, Step 1 updates Ŵk+1 by Equation (13), Step 2 updates Ûk+1 by Equation (14), and the representation is refreshed as V̂k+1 = C1(Ŵk+1, X). Repeat until the iteration condition is met; here, the iteration condition is k = K. Return: V* = V̂K, U* = ÛK, W* = ŴK.

By the new spectral tensor transform with two CNNs, the best small-scale spectral tensor can be obtained. It is a small-scale version of the original large-scale spectral tensor that preserves the structural information of the original, has a small size, and therefore reduces the utilization of the total NTD resources.
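To make the two-step alternating idea concrete, here is a toy linear analogue (an assumption for illustration only: the encoder C1 and decoder C2 are replaced by plain matrices W and U, and each step has a closed-form ridge solution; the paper's method uses CNNs and gradient-based learning):

```python
import numpy as np

def ridge(A, B, lam):
    # Closed-form solution of min_M ||B - A M||_F^2 + lam ||M||_F^2.
    return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ B)

rng = np.random.default_rng(3)
X = rng.random((32, 200))                 # 32-dim signals, 200 samples
J = 8                                     # small-scale dimension
W = rng.standard_normal((J, 32)) * 0.1    # linear stand-in for C1 (forward)
U = rng.standard_normal((32, J)) * 0.1    # linear stand-in for C2 (backward)
lam = 1e-6
err_init = np.linalg.norm(X - U @ W @ X)  # error at random initialization

errs = []
for k in range(20):
    V = W @ X                                            # current code
    U = ridge(V.T, X.T, lam).T                           # Step 1: update U
    W = np.linalg.solve(U.T @ U + lam * np.eye(J), U.T)  # Step 2: update W
    errs.append(np.linalg.norm(X - U @ W @ X))

print(errs[-1] < err_init)   # True: alternating steps reduce the error
```

Each step fixes one factor and solves for the other, mirroring the Step 1/Step 2 structure of Equations (11) and (12); the reconstruction error shrinks toward the best rank-J approximation of X.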

Proposed Multispectral Compression Scheme with CNNs
We use CNNs and NTD to construct an image compressor that obtains both low complexity and high compression performance. The overall architecture of the proposed compression approach is shown in Figure 4 and includes three parts: an encoding CNN channel, a decoding CNN channel, and an NTD channel. The encoding CNN channel and decoding CNN channel form a closed-loop link to train the filter parameters of the two CNNs. First, multispectral training images are input to the closed-loop link, and the two-step algorithm (i.e., Algorithm 2) is performed to obtain the optimized filter parameters of the two networks. Second, a multispectral image passes through the encoding CNN channel, the NTD channel, and the entropy coding unit to complete the compression task. In the encoding CNN channel, a CNN first transforms the large-scale image representation into small-scale features (i.e., compact image tensors). After that, a 3D-DCT is applied to the small-scale image tensors to obtain the DCT tensors. The NTD channel performs the NTD process and completes the final encoding in connection with entropy encoders. When the decompression task is required, the compressed bit-streams are used to reconstruct the original multispectral images via the decoding CNN channel. The decoding CNN channel has two functions: first, the use of another CNN helps the encoding channel's CNN obtain the best small-scale representation; second, it decodes the compressed bit-streams when a reconstruction task is required.

Given the training data S of input-target pairs {X(m), Û}, the first CNN proceeds by minimizing the mean squared error (MSE) between the original spectral image and the spectral image reconstructed via the second CNN:

L1(W) = (1/M_S) Σ (m = 1 to M_S) ||X(m) − X̃(m)||²,  (15)

where X̃(m) = C2(Û, C1(W, X(m))), W is the learnable parameter of the first CNN, Û is the learnable parameter of the second CNN, X is the original image representation, and M_S is the number of samples in the training set S. In Equation (15), Û and X are input parameters. Accordingly, the loss function of the second CNN can be expressed as

L2(U) = (1/M_S) Σ (m = 1 to M_S) ||X(m) − C2(U, V̂(m))||²,  (16)

where V̂ is the small-scale feature from the first CNN. The best small-scale image representation is obtained by minimizing the error between the original and reconstructed image representations in the two CNNs. Then, the small-scale image representation is further transformed to a small-scale DCT tensor using the 3D-DCT to remove spatial and spectral correlation. The small-scale DCT tensor is decomposed by the NTD method; in the DCT domain, the NTD removes the residual spatial and spectral correlation. Finally, an entropy encoder completes the final encoding task. The learned convolution filters from the two CNNs can be used as side information of the compressor.
In the NTD channel, we use a multilevel decomposition (MD) method to implement the fast NTD. Here, we take a band number of 32 as an example to demonstrate the MD principle. Figure 5 shows the MD principle when the band number is 32; the number of process levels is 3. In the first level, there are eight sub-tensors, and each sub-tensor is processed by TD to produce a core tensor and three factor matrixes. In the second level, every four core tensors produced in the first level are considered as a new tensor; there are two regrouped tensors, and each is processed by TD. In the third level, the two core tensors produced in the second level are considered as a new tensor, which is processed by TD. Finally, one core tensor and several factor matrixes are produced.
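The grouping schedule of the 32-band example (8 sub-tensors, then 2, then 1) can be sketched as follows; the function name `multilevel_schedule` and the parameter choices are illustrative assumptions consistent with the description above:

```python
def multilevel_schedule(n_bands, bands_per_subtensor, group_sizes):
    # Level 1: split the bands into sub-tensors; each later level regroups
    # the core tensors produced by the previous level.
    counts = [n_bands // bands_per_subtensor]
    for g in group_sizes:
        counts.append(counts[-1] // g)
    return counts

# 32 bands, 4 bands per sub-tensor, then regroup by 4 and by 2.
print(multilevel_schedule(32, 4, [4, 2]))   # [8, 2, 1]
```

Because each level decomposes only small sub-tensors, the total work is spread over many cheap decompositions instead of one expensive full-tensor NTD.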
We use F = {f(η,ρ)} to express a spectral group in the multilevel TD structure, where η is the decomposition level and ρ is the tensor order in F. In F, the jth tensor in level i is denoted f(i,j). Let S(f(i,j)) be the size of the current tensor f(i,j) and Th(f(i,j)) be the size of the core tensor decomposed at the current level. S(f(i,j)) equals J(1,i,j) × J(2,i,j) × J(3,i,j) in the ith level, and Th(f(i,j)) is J(1,i+1,j) × J(2,i+1,j) × J(3,i+1,j) in the next level. At each level, the resulting core tensors have the same size J(1,i+1,j) × J(2,i+1,j) × J(3,i+1,j), j = 1, 2, . . ., J. The core tensor size J(1,i+1,j) × J(2,i+1,j) × J(3,i+1,j) at each level can be determined by the compression bit-rate; across all levels of the TD, different levels have different core tensor sizes. Let G(f(i,j)) express the resulting core tensor of f(i,j); the tensors of adjacent levels are related in that the core tensors G(f(i,j)) of one level are regrouped to form the tensors f(i+1,j) of the next. We allocate bit-rates to the tensors f(i,j) in F based on an l1-norm approach. In the first level, the bit-rate allocated to the jth tensor can be expressed as

T(j) = T(total) · ||f(1,j)||₁ / Σ(i) ||f(1,i)||₁,  (17)

where T(total) is the total bit-rate. The other levels use the same approach.
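A sketch of the l1-norm bit-rate allocation, assuming allocation proportional to each sub-tensor's l1 norm (the function name and data here are illustrative assumptions):

```python
import numpy as np

def allocate_bitrates(subtensors, total_rate):
    # Share of the total bit budget proportional to each sub-tensor's
    # l1 norm (sum of absolute coefficient values).
    norms = np.array([np.abs(t).sum() for t in subtensors])
    return total_rate * norms / norms.sum()

rng = np.random.default_rng(4)
groups = [rng.standard_normal((4, 8, 8)) for _ in range(8)]  # 8 sub-tensors
rates = allocate_bitrates(groups, total_rate=2.0)
print(round(rates.sum(), 6))   # 2.0 -- allocations sum to the budget
```

Sub-tensors with larger coefficient energy thus receive a larger share of the bit budget, while the shares always sum to the total rate.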

Results
To evaluate the compression performance of the proposed algorithm with the spectral transform using CNNs, we used MATLAB to perform the experiments on a personal computer (PC) with a 3.6 GHz CPU and 4 GB of memory. We tested our CNN-based NTD method on 50 multispectral images containing a variety of buildings, cities, and mountains. All 50 multispectral images were compressed with the following processing. First, each band of the original image was 1024 × 1024 pixels. Second, the CNN-based spectral transform method was applied to the original spectral image to obtain the small-scale spectral tensor. Third, a DCT transform was applied to the small-scale spectral tensor to remove spatial correlations, and in the DCT domain, NTD was performed to remove the residual spectral and spatial correlations. Figure 6 shows the reconstructed multispectral images, where the bit-rates were set from 0.25 bpp to 2.0 bpp. Figure 7 shows a zoomed area of the fourth band, where the area was 100 × 100 pixels. As the bit-rate increased, the reconstruction quality gradually increased. From a subjective aspect, the reconstructed bands at the bit-rate of 2.0 bpp were very close to the original bands, and even at the low bit-rate of 0.25 bpp the reconstructed images retained good quality, reflecting the algorithm's rate-distortion performance.
We objectively evaluated the proposed method by means of the peak signal-to-noise ratio (PSNR), mean structural similarity (MSSIM), and visual information fidelity (VIF). In [36], the TD in the wavelet domain has the best compression performance, achieving a higher PSNR than PCA+JPEG2000 and 3D-SPECK. We also used the conventional NTD to process all DCT coefficient blocks, where NTD resources were equally allocated to the large-scale DCT coefficient blocks. In addition, we used DWT+BPE as a reference method to process all wavelet coefficient blocks. The bit-rate was set to 0.5-3 bpp. Figure 8 shows the test PSNR, MSSIM, and VIF of the different methods.
The proposed algorithm on the experimental multispectral images was compared with PCA, SPIHT+2D-DWT with KLT [45], SPECK+2D-DWT with KLT [46], and POT [47]. The average PSNR was taken as the PSNR of the corresponding method. The PSNR comparisons are given in Table 4. Due to the full usage of the CNNs and NTD in the DCT domain, the proposed method achieved good compression performance, improving PSNR by 0.46-1.63 dB over SPIHT+2D-DWT with KLT at 2-0.25 bpp.

Finally, we tested the compression time of the two methods. Figure 9 shows the comparison of the processing time of our algorithm and the conventional NTD approach; as shown in Figure 9, the proposed method was faster than the conventional NTD method at different bit-rates. Figure 10 shows the comparison of compression performance and compression time at different bit-rates; as shown in Figure 10, the proposed method sacrificed only a small amount of PSNR compared with the conventional NTD method at different bit-rates. Table 5 shows the average improvement of the compression time and the decrease of the compression performance. The computation efficiency of our compression approach improved by 49.66% compared to the traditional approach while sacrificing only 0.3369 dB, which does not noticeably affect the quality of the images. These results indicate that our algorithm has low complexity and high performance, making it a candidate for space applications.

Discussion
We discuss the proposed method from the following aspects, which are summarized in Table 6. As shown in Figures 8-10, the compression performance of the proposed method was slightly lower than that of the traditional NTD, but the proposed method improved the computation efficiency by 49.66%. In the traditional NTD method, the whole spectral tensor is directly decomposed by the NTD algorithm in the DCT domain. In the proposed method, the whole spectral tensor is processed in two compression steps: first, the CNNs transform a large-scale spectral tensor into a small-scale spectral tensor; second, the NTD is applied to the small-scale spectral tensor. Together, these two steps achieve a high compression performance. The PSNR loss comes firstly from the first step, because the NTD and inverse NTD are not included in the self-learning network; we made this choice in view of the high calculation complexity of the NTD and inverse NTD. The PSNR loss also comes from the multilevel NTD. We also tested the compression performance of the CNNs with the NTD and inverse NTD included, and found that it was essentially the same as that of the traditional NTD method, but at a very high calculation complexity. Our current method therefore represents a tradeoff between compression performance and complexity.
As shown in Figures 9 and 10, the proposed method improved the computation efficiency by 49.66% compared with the traditional NTD method. This improvement comes from two stages. In the first stage, we use the CNN to obtain a compact spectral tensor, which is itself a compression process; for the traditional NTD method to obtain the same compact spectral tensor, substantial additional calculation resources would be needed. In the second stage, we use a fast NTD method based on multilevel decomposition technology, which improves the decomposition speed of the three-order tensor. In the future, we will implement the compression algorithm on hardware platforms, such as FPGA, DSP, and GPU, and analyze the hardware resource utilization.
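The two compression steps above can be sketched in code. The following is an illustrative NumPy sketch of the first step only: an untrained 1x1 spectral convolution with assumed sizes (32 input bands, 8 output bands, a 64x64 spatial patch), not the paper's trained network. It shows why the subsequent NTD is cheaper: the decomposition only has to process a tensor a quarter of the size.

```python
import numpy as np

# Hypothetical sizes, chosen for illustration only.
rng = np.random.default_rng(0)
bands_in, bands_out, height, width = 32, 8, 64, 64

x = rng.random((bands_in, height, width))              # input spectral tensor
kernel = rng.random((bands_out, bands_in)) / bands_in  # untrained 1x1 filters

# A 1x1 convolution along the spectral axis is a matrix multiply over
# the band dimension: it maps each pixel's 32-band spectrum to 8
# compact features.
y = np.tensordot(kernel, x, axes=([1], [0]))           # shape (8, 64, 64)

print(x.size, y.size)  # 131072 32768: the NTD input is 4x smaller
```

In the actual method the kernel weights would be learned by the closed-loop training described in the paper rather than drawn at random.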
In the CNNs, the DCT and the entropy encoder are used because our compression algorithm uses the DCT as the sparse representation tool and the entropy encoder for entropy coding. In the self-learning network, we also tried replacing the DCT with the Hadamard Transform (HT). After obtaining the small-scale spectral tensors, we still use the DCT and NTD to remove the spatial and spectral correlations. We also used multispectral images to perform experiments with the different learning networks. Figure 11 shows the comparison of the two different learning networks. The experimental results show that the HT-based CNNs obtained a lower PSNR (i.e., lower compression performance) than the proposed method. Thus, the sparse representation method and entropy encoder in the convolution neural learning network need to remain the same as those in the latter stage of the compression method.
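The gap between the HT-based and DCT-based networks is consistent with the transforms' energy-compaction behavior on smooth signals. The following minimal sketch compares how much of a signal's energy the first two coefficients capture under each transform; the 8-point transform size and the synthetic ramp signal are illustrative assumptions, not the paper's data.

```python
import numpy as np

N = 8
n = np.arange(N)

# Orthonormal DCT-II matrix.
C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * N))
C[0, :] /= np.sqrt(2.0)

# Orthonormal 8x8 Hadamard matrix (Sylvester construction).
H = np.array([[1.0]])
while H.shape[0] < N:
    H = np.block([[H, H], [H, -H]])
H /= np.sqrt(N)

signal = np.linspace(0.0, 1.0, N)  # a smooth "spectrum"

# Fraction of total energy captured by the first two coefficients.
dct_frac = np.sum((C @ signal)[:2] ** 2) / np.sum(signal ** 2)
ht_frac = np.sum((H @ signal)[:2] ** 2) / np.sum(signal ** 2)

print(dct_frac > ht_frac)  # True: the DCT compacts the smooth signal better
```

Better energy compaction means fewer significant coefficients to code, which is one plausible reason the DCT-based network outperforms the HT-based one here.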

Conclusion
In this paper, we propose a low-complexity but efficient compression approach based on convolution neural networks in conjunction with NTD for multispectral images. First, we use two convolution neural networks to construct a new spectral transform, which converts a large-scale three-dimensional spectral tensor into a small-scale three-dimensional spectral tensor. We obtain the optimized small-scale spectral tensor by minimizing the difference between the original and reconstructed three-order spectral tensors in the CNNs. The new spectral transform is able to remove both spatial and spectral correlations. Second, the NTD is applied only to the small-scale three-dimensional spectral tensor in the DCT domain to improve the calculation efficiency; the NTD and DCT remove the residual spatial and spectral correlations. Finally, the resultant core tensor and factor matrices are encoded by an entropy encoder (removing statistical redundancies by exploiting the probabilities of symbols [48]) to complete the final compression task. The experimental results show that the proposed method improved the computation efficiency by 49.66% while sacrificing only 0.3369 dB compared to the conventional direct NTD in the DCT domain.
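To make the final encoding step concrete, the following sketch reconstructs a tensor from a Tucker core and factor matrices and counts the values the entropy encoder would receive. The ranks and sizes are illustrative assumptions, and the sketch omits the nonnegativity constraints that distinguish NTD from plain Tucker decomposition.

```python
import numpy as np

rng = np.random.default_rng(1)
shape, ranks = (8, 64, 64), (4, 8, 8)  # assumed tensor size and Tucker ranks

core = rng.random(ranks)
factors = [rng.random((s, r)) for s, r in zip(shape, ranks)]

# Reconstruct X = core x1 A1 x2 A2 x3 A3 via mode-n products.
x = core
for mode, a in enumerate(factors):
    x = np.moveaxis(np.tensordot(a, x, axes=([1], [mode])), 0, mode)

# The entropy encoder sees only the core and the factor matrices.
stored = core.size + sum(a.size for a in factors)
print(x.shape, stored, int(np.prod(shape)))  # (8, 64, 64) 1312 32768
```

At these assumed ranks the core plus factors hold 1312 values versus 32,768 in the reconstructed tensor, which is the structural source of the compression gain before entropy coding.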
The proposed method has the potential for use in high-resolution remote sensing multispectral cameras or other remote sensing cameras [49][50][51].
Our research suggests six further directions: (1) the use of other bases, such as DWT and PCA, to improve the spectral or spatial representation capability; (2) optimization of the compression algorithm regarding the tradeoff between compression performance and complexity; (3) construction of a more complex learning network consisting of dictionary learning and tensor decomposition to implement an end-to-end compression scheme; (4) combination with the distributed source coding (DSC) scheme to construct a DSC-CNN; (5) integration of compressive sensing [52] into the proposed scheme to construct a high-performance compressor; and (6) optimization of our method for hardware design. These aforementioned issues will be investigated in future research.

Figure 1. Schematic of multispectral compression based on NTD.

Figure 2. A graphic representation of Tucker Decomposition on a three-dimensional tensor.

[Fragment of the NTD algorithm listing] 21: Until the convergence condition is achieved. 22: End 23: Return G and A(n).

We use X ∈ ℝ^(N1×N2×N3) to express a multispectral three-dimensional tensor, which is the input of the forward CNN. The overall scheme is shown in Figure 4. The overall architecture of the whole algorithm includes three parts: an encoding CNN channel, a decoding CNN channel, and an NTD channel. The encoding CNN channel and decoding CNN channel form a closed-loop link to train the filter parameters of the two CNNs. First, multispectral training images are input to the closed-loop link, and the two-step algorithm (i.e., Algorithm 2) is performed to obtain the optimized filter parameters of the two networks. Second, a multispectral image passes through the encoding CNN channel, the NTD channel, and the entropy coding unit to complete the compression task. In the encoding CNN channel, a CNN first transforms the large-scale image representation into small-scale features (i.e., compact image tensors). After that, a 3D-DCT is applied to the small-scale image tensors to obtain the DCT tensors. The NTD channel performs the NTD process and completes the final encoding in connection with entropy encoders.
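The 3D-DCT step in the encoding CNN channel can be sketched as a separable, orthonormal transform applied along each mode of the small-scale tensor; because each mode's DCT matrix is orthonormal, the transform is exactly invertible, so no information is lost before the NTD stage. The 8x8x8 block size below is an illustrative assumption.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c

rng = np.random.default_rng(2)
x = rng.random((8, 8, 8))  # assumed small-scale block

# Forward 3D-DCT: mode-n product with the DCT matrix along every axis.
y = x
for mode in range(3):
    c = dct_matrix(x.shape[mode])
    y = np.moveaxis(np.tensordot(c, y, axes=([1], [mode])), 0, mode)

# Inverse 3D-DCT: the transposes of the orthonormal matrices.
z = y
for mode in range(3):
    c = dct_matrix(x.shape[mode])
    z = np.moveaxis(np.tensordot(c.T, z, axes=([1], [mode])), 0, mode)

print(np.allclose(x, z))  # True: the 3D-DCT is lossless before quantization
```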

Figure 4. Overall architecture of the compression scheme using the CNNs in conjunction with NTD.


Figure 5. The multilevel NTD method when the band number is 32.


Figure 8. PSNR of three methods at different bit rates (a); VIF of three methods at different bit rates (b); and MSSIM of three methods at different bit rates (c).

Figure 9. Compression time occupied by two methods.

Figure 10. Comparison of compression performance and compression time: (a) decreased PSNR of the proposed method compared to conventional NTD; and (b) the improvement of the proposed method compared to conventional NTD.

Figure 11. Comparison of compression performance using the different learning networks.
compress hyperspectral images. The experimental results show that this approach could reach 49.5-58.1 dB for Cuprite at 0.05-1 bpp, and 40.1-55.3 dB for Moffett Field at 0.05-1 bpp. Moreover, this method gains higher PSNR compared with Principal Components Analysis (PCA) + JPEG2000 and the 2D Set Partitioned Embedded Block Coder (SPECK). These two investigations indicate that the TD-based approach is able to gain better compression performance for multispectral image compression. Unfortunately, this method suffers from high calculation complexity (see Table

Table 4. Comparison of the compression performance with different compression methods.

Table 5. Comparison of the computation efficiency and compression performance.


Table 6. Evaluation of the proposed compression method.