Noise Removal Based on Tensor Modelling for Hyperspectral Image Classiﬁcation

: With the current state-of-the-art computer aided manufacturing tools, the spatial resolution of hyperspectral sensors is becoming increasingly higher thus making it easy to obtain much more detailed information of the scene captured. However, the improvement of the spatial resolution also brings new challenging problems to address with signal dependent photon noise being one of them. Unlike the signal independent thermal noise, the variance of photon noise is dependent on the signal, therefore many denoising methods developed for the stationary noise cannot be applied directly to the photon noise. To make things worse, both photon and thermal noise coexist in the captured hyperspectral image (HSI), thus making it more difﬁcult to whiten noise. In this paper, we propose a new denoising framework to cope with signal dependent nonwhite noise (SDNW), Pre-estimate—Whitening—Post-estimate (PWP) loop, to reduce both photon and thermal noise in HSI. Previously, we proposed a method based on multidimensional wavelet packet transform and multi-way Wiener ﬁlter which performs both white noise and spectral dimensionality reduction, referred to as MWPT-MWF, which was restricted to white noise. We get inspired from this MWPT-MWF to develop a new iterative method for reducing photon and thermal noise. Firstly, the hyperspectral noise parameters estimation (HYNPE) algorithm is used to estimate the noise parameters, the SD noise is converted to an additive white Gaussian noise by pre-whitening procedure and then the whitened HSI is denoised by the proposed method SDNW-MWPT-MWF. As comparative experiments, the Multiple Linear Regression (MLR) based denoising method and tensor-based Multiway Wiener Filter (MWF) are also used in the denoising framework. An HSI captured by Reﬂective Optics System Imaging Spectrometer (ROSIS) is used in the experiments and the denoising performances are assessed from various aspects: the noise whitening performance, the Signal-to-Noise Ratio (SNR), and the classiﬁcation performance. The results on the real-world airborne hyperspectral image HYDICE (Hyperspectral Digital Imagery Collection Experiment) are also presented and analyzed. These experiments show that it is worth taking into account noise signal-dependency hypothesis for processing HYDICE and ROSIS HSIs.


Introduction
Hyperspectral images consist of a considerably large number of narrow spectral bands which are uniformly distributed over a wide spectral range. For each pixel, the almost continuous spectral signature adds the third orthogonal spectral dimension to the two-dimensional spatial domain and constitutes the well-known 3D data cube. Thus, hyperspectral signatures offer the capability to identify and discriminate ground cover types. To benefit from the additional spectral dimension, specific processing methods of hyperspectral images, for instance spectral signatures unmixing, target detection The generalized SD noise model was also used in [8], where the HYperspectral Noise Parameter Estimation (HYNPE) algorithm was proposed. Unlike [3], HYNPE utilized the assumption that the noise in each band after being whitened followed a standard Gaussian distribution. The joint probability density function (PDF) of the whitened noise values was used to form the ML criterion. Nonetheless, the signal and noise values were assumed to be known in the ML criterion, which was not true in practical situations. Hence, the MLR-theory based approach was exploited to estimate them. However, MLR calculates the parameters by minimizing the Least Square Error (LSE), which is a biased estimator when the noise is not white. Thus, the estimates of MLR are not accurate in HYNPE, leading to the inaccuracy of the final parameter estimates. In this paper, we investigate the relevance of a new denoising framework for reducing simultaneously the SD photon noise and the SI thermal noise in HSIs on classification [21]. Considering that the noise variance is entangled with the signal, a denoising loop is proposed to remove the noise from HSIs. Each iteration of the loop consists three steps: firstly, we do the pre-estimate, i.e., use MWPT-MWF to directly denoise the HSI containing the photon and thermal noise. Secondly, use the pre-estimate to estimate the noise parameters by employing the ML criterion, and then whiten the noise. Finally, apply MWPT-MWF to the whitened HSIs to obtain the post-estimate. Then, in the next iteration, use the post-estimate of the last iteration as the pre-estimate of current iteration. After several iterations, the HSIs will be well denoised and the resulting classification results are improved.
The remainder of this paper is organized as follows: Section 2 gives the signal model used in this paper. Appendix A presents the Multilinear Algebra Tools. Appendix B overviews the HYNPE algorithm. Appendix C presents the MWPT-MWF method. Section 3 presents the proposed iterative denoising method: considering that the noise variance is dependent on the signal, a denoising loop is proposed to remove the noise from HSIs. Each iteration of the loop consists of three steps: (i) we perform a pre-estimation, i.e., use MWPT-MWF [22,23] to directly denoise the HSI containing the photon and thermal noise, (ii) we use the pre-estimation process to estimate the noise parameters by employing the ML criterion, and then whiten the noise and (iii) we apply MWPT-MWF to the whitened HSIs to obtain the post-estimate results. Then in the next iteration, we use the post-estimate of the last iteration as the pre-estimate of current iteration. After several iterations, the HSIs will be well denoised. Section 4 gives some comparative experimental results. The real-world HSI reflective optics system imaging spectrometer (ROSIS) is used in the experiments to evaluate the performances of denoising and classification processes. Finally, Section 6 provides details of the conclusions of the research undertaken.
In this paper, we denote by x ∈ R a scalar x ∈ R I 1 a vector X ∈ R I 1 ×I 2 a matrix X ∈ R I 1 ×I 2 ×...I N a N-order tensor X n n-mode unfolding matrix X X T a tensor obtained by calculating the square root of each element of X I n n-mode dimension X the Frobenius norm of X • the vector outer product the Khatri-Rao product the Hadamard product < X , Y > the inner product between X and Y E[·] the mathematical expectation (.) T the transposition.

Signal Modeling with Thermal and Photon Noise
An HSI is a three-dimensional data cube and can be modeled as a tensor R ∈ R I 1 ×I 2 ×I 3 , with the first two dimensions being the spatial domain and with the third dimension being the spectral domain.
In fact, a hyperspectral sensor captures a HSI as a series of two-dimensional images of the spatial domain by a sensor array. By extending the data model in [7] to the 3D representation, a noisy HSI can be expressed as a third order tensor R ∈ R I 1 ×I 2 ×I 3 composed of a multidimensional signal X ∈ R I 1 ×I 2 ×I 3 impaired by an additive random noise N ∈ R I 1 ×I 2 ×I 3 : where N accounts for both thermal and photon noise and its variance depends on the pixel x i 1 ,i 2 ,i 3 in the useful signal X . The photon noise is caused by the random fluctuation of photon flux arriving at the CCD sensor, and it follows a Poisson model [24]. As the pixel size becomes smaller in the new-generation hyperspectral sensor, the number of photons that reach a pixel per unit time becomes smaller as well. Hence, the photon noise cannot be neglected anymore [20]. For a given entry x i 1 ,i 2 ,i 3 of the pure signal HSI tensor X ∈ R I 1 ×I 2 ×I 3 , the corresponding photon noise element p i 1 i 2 i 3 of tensor P ∈ R I 1 ×I 2 ×I 3 can be expressed as [25]: where u i 1 ,i 2 ,i 3 is a stationary, zero-mean uncorrelated random process independent on x i 1 ,i 2 ,i 3 with variance σ 2 u,i 3 . The thermal noise component in each sensor is electronics noise, denoted by t i 1 ,i 2 ,i 3 which can be modeled as an additive zero-mean white Gaussian noise in each band with variance σ 2 t,i 3 , while the noise variance changes from sensor to sensor due to different states of the electronic components in the sensors. Elementwise, the data model is [7]: Then, we can define N = P + T , and Equation (1) can be correspondingly rewritten as The unfolding matrix R 3 ∈ R I 3 ×M 3 of the HSI data tensor R ∈ R I 1 ×I 2 ×I 3 (with M 3 = I 1 I 2 ) can be expressed as : where X 3 is the mode-3 unfolding matrix of the multidimensional signal tensor X and with P 3 and T 3 being the mode-3 unfolding matrices of P and T , respectively.

Proposed Method
The aim of this paper is to obtain the pure signal estimateX , which is necessary to determine the noise variance. Nonetheless, since the noise is signal-dependent, the noise variances of the entries of R are different from each other and are related to the signal entries x i 1 i 2 i 3 . It is worth noting that, for a given entry x i 1 i 2 i 3 , the noise n i 1 i 2 i 3 is a summation of two Gaussian-distributed variables u i 1 i 2 i 3 and where N (·) denotes the normal distribution, and σ 2 is the noise variance, which can be expressed as [3,7,8]: This is to say that a precise noise variance estimateσ 2 i 1 i 2 i 3 needs the precise signal estimatex i 1 i 2 i 3 . Hence, the signal estimate and noise variance estimate problems are inter-related, thus making the signal estimate problem difficult to solve. In HYNPE, the noise parameters are estimated by using the signal estimate generated by the MLR theory based method, which estimates the signal by minimizing LSE. However, the LSE estimator requires that the signal and noise should be statistically independent, which is not satisfied in the SD photon noise situation. Thus, the estimates of the signal and noise are not precise, which makes the parameter estimate result unreliable as well. Moreover, since there is only one step for estimating the signal, the imprecise estimate degrades the performances of noise parameter estimation.
To use the classical parameter estimation algorithms, such as LSE and LMMSE, it is necessary to make the noise "independent of" the signal.
From Equation (8), it is evident that the noise variance σ 2 To cut off this relation, we need to whiten the noise: where the underlined is used to distinguish the whitened data from the original data. After the whitening operation, we can consider that the noise n i 1 i 2 i 3 is independent from the whitened signal It is worth noting that, in the likelihood function Equation (A13), the signal value x i 1 i 2 i 3 is assumed to be known. However, we cannot get this prior information in realistic situations; therefore, the signal value x i 1 i 2 i 3 should be replaced by its estimatex i 1 i 2 i 3 , as is presented in Appendix B.
Referring to [22], the wavelet-tensor-based algorithm MWPT-MWF yields the most accurate signal estimatex i 1 i 2 i 3 . In addition, the well-known HYNPE algorithm permits obtaining the ML estimates ofσ 2 u,i 3 andσ 2 t,i 3 . Hence, in this paper, we choose to combine these two methods to get more accurate estimation of noise variance for each element of N , which can be calculated by: When the noise variance of each entry of R is obtained, the noise can be whitened by: and the whitened hyperspectral image can be written as However, MWPT-MWF was proposed for the white noise situation, therefore, when we use it to estimatex i 1 i 2 i 3 directly without noise whitening, the estimate result is not accurate. To distinguish it with the signal estimate after noise whitening,x i 1 i 2 i 3 is named as pre-estimate. Correspondingly, the estimatex i 1 i 2 i 3 obtained after the noise whitening procedure is more accurate thanx i 1 i 2 i 3 , so we name it as post-estimate. The PWP process needs to be repeated several times to improve the performance of estimation. In fact, HYNPE only takes the first pre-estimation step in the PWP procedure, which degrades its performance when estimating the parameters. However, we utilize the more accurate post-estimate of current PWP iteration as the pre-estimate of the next PWP iteration. Therefore, the estimate accuracy can be improved in the PWP loop.
To adaptively stop the PWP loop according to the processed HSI, we need to find a stop criterion. The RMSE between the pre-estimateX and the post-estimateX is given as follows: whereX andX are the tensor forms of the pre-estimate and post-estimate, respectively. With the iteration times increasing, the RMSE X becomes asymptotically stable. Hence, we can use the relative error of RMSE X between two adjacent iterations as the stop criterion: where RMSE 0 X is the RMSE of last iteration. If e is less than a given value , the loop should be terminated. This newly proposed method is called Signal-Dependent-Noise-Whitening MWPT-MWF(SDNW-MWPT-MWF), and, to make it easy to understand, its pseudo-code and flowchart are also supplied in Algorithm 1 and Figure 1, respectively.
Compute the signal pre-estimateX by performing the MWPT-MWF to the data tensor R. 5. for j = 1; j <= J; j++ do 6. Compute the SD and SI noise variance estimatesσ 2 u,i 3 andσ 2 t,i 3 by using Equation (A12). 7. Compute the noise varianceσ 2 9. Compute the whitened signal post-estimateX by performing the MWPT-MWF to the whitened tensor R. 10. Compute the signal post-estimateX by performing the inverse whitening operation toX : Compute RMSE X by using Equation (13). 12. Compute e by using Equation (14). 13. if e < then 14.
Break. 15. end if 16. Use the post-estimateX in this iteration as the pre-estimateX in next iteration:X ←X . 17. Refresh the value of RMSE 0 X : RMSE 0 X ← RMSE X . 18. end for 19. return tensorX . 20. end procedure

Experimental Results
The data set used in the experiments is the HSI captured by the ROSIS during a flight campaign over Pavia University, Northern Italy.
The ROSIS owns 103 spectral bands and 610 × 340 pixels with the geometric resolution being 1.3 m. In this paper, only a part of 250 × 250 pixels of this image is used. Hence, it is modeled as a 250 × 250 × 103 tensor in the experiments. The SD photon noise P and the SI thermal noise T are both taken into account. In order to reproduce different noise scenarios, the SNR ranged from 20 dB to 40 dB with a step of 5 dB. As the power of the SD photon noise and that of the SI thermal noise are of the same level, in this paper, only the case E[ X X T P 2 ] = E[ T 2 ] is taken into account. The random noise is generated with a variance depending on the value of the useful signal according to Equation (8) and added into the signal X as Equation (3) to create the noisy HSI data R. The raw HSI has SNR between 35 and 40 dB [26,27]. This high-SNR HSI could be viewed as a noise-free data cube, so, in this experiment, the raw image can be taken as a reference data cube X .
The RGB composites of X and R are shown in Figure 2. Correspondingly, Figure 3 presents the curve of the mean noise variance versus the band number.  In the experiments, the wavelet db3 and transform level [1 1 0] are employed in the SDNW-MWPT-MWF. It is worth noting that various denoising methods can be used to replace the MWPT-MWF in the proposed SDNW-MWPT-MWF method. MWF is a classical tensor-based denoising method and MLR was used in the HYNPE method. Therefore, MWF and MLR have been considered to replace the MWPT-MWF as comparative experiments and are named as SDNW-MWF and SDNW-MLR, respectively. In this paper, the value of is set to 10 −3 for these three methods.

Noise-Whitening Performance Evaluation and Comparison
According to the relationship given in Equation (10), the noise variance estimateσ 2 relies on the SD noise variance estimateσ 2 u,i 3 and the SI noise variance estimateσ 2 t,i 3 . Hence, the performance of estimatingσ 2 u,i 3 andσ 2 t,i 3 influences directly the noise variance estimation result. Thus, we firstly consider the evolution ofσ 2 u,i 3 andσ 2 t,i 3 in the estimation loop. The RMSE is employed to analyze the accuracy ofσ 2 u,i 3 andσ 2 t,i 3 . The RMSE of the SD photon noise variance and the SI thermal noise variance are calculated by Notice that low values for RMSE SD and RMSE SI denote good estimation accuracy. A comparative analysis of SDNW-MWF, SDNW-MLR and SDNW-MWPT-MWF have been carried out by analyzing RMSE SD and RMSE SI with the maximum iteration times being set as 10. Figures 4 and 5 present the evolution of RMSE SD and RMSE SI (in logarithmic scale) with the iteration times. From these two figures, it can be seen that, in the case where SNR I NPUT = 20 dB, SDNW-MLR performs better than SDNW-MWF in estimating both σ 2 u,i 3 and σ 2 t,i 3 . However, in the case where SNR I NPUT = 40 dB, SDNW-MWF outperforms SDNW-MLR. Nonetheless, in both cases, the proposed SDNW-MWPT-MWF can improve the estimation performance significantly according to the lowest RMSE SD and RMSE SI it obtains. Moreover, RMSE SD and RMSE SI are greater than the initial error in SDNW-MLR and SDNW-MWF, whereas RMSE SD and RMSE SI are well constrained in SDNW-MWPT-MWF.  Apart from σ 2 u,i 3 and σ 2 t,i 3 , the estimate of the signal also influences the accuracy of the noise variance estimate of a pixel (see Equation (10)). Since the post-estimatex i 1 i 2 i 3 of current PWP iteration is used as the pre-estimatex i 1 i 2 i 3 of the next PWP iteration, we only analyze the estimation performance of the post-estimatex i 1 i 2 i 3 . To assess the performance of the signal estimatorx i 1 i 2 i 3 , we resort to the SNR I NPUT , which will be presented in Section 4.2.
To intuitively present the noise whitening results, Figure 6 shows the mean noise variance of each band after the noise whitening operation. It is evident that the mean noise variance generated by SDNW-MWPT-MWF changes slightly around 1 and is quite constant with respect to the band number. However, the mean noise variance generated by SDNW-MLR and SDNW-MWF is not very satisfactory. In Figure 6a, it can be seen that the trend of the mean noise variances in the lower bands is similar to that in Figure 3 thus implying that the noise in the lower bands (from 1 to 20) is not well whitened. On the other hand, in the bands from 20 to 100, the noise variances are relatively constant, the mean noise variance value is not 1. In Figure 6b, the noise whitening results by SDNW-MLR and SDNW-MWF are worse than that in Figure 6a. Hence, the whitening results in Figure 6 are strongly in favor of the proposed method SDNW-MWPT-MWF. The normal probability plot is usually used to visually ascertain whether or not a dataset is approximately normally distributed, which means that the values in the dataset have the same noise variance. Hence, we supply the normal probability plots of the noise before and after whitening in Figure 7. It is obvious that the noise before whitening (Figure 7a) is not normally distributed. After whitening by SDNW-MWF (Figure 7b

Denoising Performance
In Section 4.1, we have discussed in detail the SD and SI noise variance estimates RMSE SD and RMSE SI . In this subsection, we show some results about the denoising performance. Figure 9 presents the noise removed in band 10 from the ROSIS HSI by SDNW-MLR, SDNW-MWF and SDNW-MWPT-MWF in the noise environment where SNR I NPUT = 20 dB. It is obvious that the removed noise is SD noise by comparing visually Figure 9 with the original image Figure 2a.
The experimental results in Figure 9 imply that the proposed denoising framework is efficient in removing the SD photon noise in the HSI. Additionally, we have assessed the denoising performance of various methods by analyzing the criterion SNR OUTPUT . Figure 10 presents the evolution of the SNR OUTPUT in various noisy environments. It is obvious that the SNR OUTPUT generated by SDNW-MWPT-MWF reaches a higher stable value after several iterations. Conversely, the SNR OUTPUT of SDNW-MLR and SDNW-MWF are relatively lower than that of SDNW-MWPT-MWF due to the highest estimation errors of RMSE SD and RMSE SI as shown in Figures 4 and 5, respectively. When SNR I NPUT = 20 dB, the SNR OUTPUT generated by SDNW-MWF is only improved marginally compared to the SNR I NPUT . On the other hand, when SNR I NPUT = 40 dB SDNW-MWF improves the SNR significantly but the SNR OUTPUT is not stable in the evolution when the iterations are increased. The SDNW-MLR performs well when SNR I NPUT = 20 dB, but when SNR I NPUT = 40 dB the denoising performance of SDNW-MLR is not good compared to that of SDNW-MWF and SDNW-MWPT-MWF. This trend becomes worse as the SNR OUTPUT of SDNW-MLR is lower than the SNR I NPUT . Moreover, Figure 11 compares the SNR OUTPUT of each method from 20 dB to 40 dB with a step of 5 dB. It shows that SDNW-MLR can improve the SNR from 20 to 30 dB, while degrading the SNR from 35 to 40 dB. The SDNW-MWF is even worse as there is only a marginal improvement of the SNR in 20 dB. From 25 to 40 dB, it degrades the SNR. Nonetheless, from the results presented in Figures 10 and 11, we can conclude that SDNW-MWPT-MWF is a stable and reliable denoising method in various noisy environments.

Classification after Denoising
In Section 4.2, we have mainly compared various methods with the capability of improving the SNR. However, some methods might also modify the useful signal severely in the denoising process and which cannot be reflected by the SNR. The classification is employed to distinguish different materials in HSIs [21], and it is sensitive to the signal distortion. Hence, in this paper, we take into account the classification improvement ability of the various methods considered.
Two real-world images are considered for this investigation. The first one, referred to as ROSIS HSI, is described in the beginning of this section.
In the ROSIS HSI, there are nine classes: bitumen, self-blocking bricks, trees, shadows, gravel, bare soil, asphalt, painted metal sheets, and meadows, which are shown in Figure 2b and in the ground truth in Figure 12 with different colors. A proportion of 10% of the reference data of each class is randomly selected as the training samples. The numbers of training and testing samples are shown in Table 1. The SVM classifier is employed to do the classification and its kernel function is RBF with γ = 1 and the penalty parameter C = 100.  The second one, referred to as HYDICE HSI, was acquired by the HYperspectral Digital Imagery Collection Experiment (HYDICE) and has 148 spectral bands (from 435 to 2326 nm), 310 rows, and 220 columns. The scene is shown in Figure 13a. This HSI is modeled as a tensor R ∈ R 310×220×148 and its ground truth is shown in Figure 13b. According to the ground truth, there are seven land cover classes in HYDICE HSI: field, trees, road, shadow and three different targets. A proportion of 30% of the reference data of each class is randomly selected as the training samples. The numbers of training and testing samples are shown in Table 2.  The classification is applied to the denoised HSIs using various methods in order to compare their abilities of improving the classification performance. Figure 14 presents the classification results obtained in the noisy environment where SNR I NPUT = 30 dB. It can be seen from the "bare soil" class that there are less misclassified pixels in the results obtained by SDNW-MWPT-MWF compared to the other results. To make it easy to compare, we give the Overall Accuracy (OA) and Kappa coefficient (K) results for the various methods used in Table 3. It can be seen that if the classification is applied directly to the noisy HSI, the OA is only 91.33% and K = 0.71. After denoising by SDNW-MLR, there is a marginal improvement of OA resulting in 91.88% and K = 0.81. SDNW-MWF performs better than SDNW-MLR, and its OA is increased to 94.20% and K to 0.94. The proposed SDNW-MWPT-MWF makes the most significant improvement among the denoising methods. Indeed, its OA is 98.51% and its K is 0.97. The classification result clearly shows that the proposed SDNW-MWT-MWF is efficient in improving the classification performance.
To investigate how the number of training samples affects the performance of our method and other comparison methods, we considered a proportion of 50% of the reference data of each class is randomly selected as the training samples. Table 4 shows the OA and Kappa coefficient results. As shown from Tables 3 and 4, our proposed method outperforms all other methods. In particular, with the number of training samples reducing (10%), our method could obtain bigger accuracy gains than other comparison methods.    Figure 15 shows the OA values obtained from the denoised HYDICE HSI and shows that the SDNW-MWPT-MWF method permits reduction of the noise, which is of great interest for SVM classifier. The comparison of the OA and Kappa values (see Table 5) calculated for each preprocessing of denoising shows that the multilinear algebra-based method SDNW-MWPT-MWF leads to better classification results than the considered methods in this experiment.  To make the experimental results more convincing, we have compared the classification improvement performance of SDNW-MLR, SDNW-MWF and SDNW-MWPT-MWF in varying noisy environments, i.e., SNR I NPUT varies from 20 dB to 40 dB with a step of 5 dB. The classification result of the noisy HSI without denoising is also supplied as a benchmark. Figure 16 presents the curves of OA versus SNR I NPUT . In the cases where the HSI is impaired severely (SNR I NPUT = 20 and 25 dB), the classification result can be improved significantly after denoising, which implies that the denoising is a necessary preprocessing procedure prior to the classification. Above 30 dB, SDNW-MLR can only improve the classification result marginally and the same is valid for SDNW-MWF with more than 35 dB. Nonetheless, SDNW-MWPT-MWF performs better than SDNW-MLR and SDNW-MWF and it improves the OA significantly for SNR I NPUT values from 20 dB to 40 dB.

Discussion
Hyperspectral sensors collect data in hundreds of narrow contiguous spectral bands, providing a powerful means to discriminate different materials by their spectral features. In the last decade, for improving the classification, several denoising algorithms for hyperspectral images were proposed. Most were derived assuming the spatial stationarity of the noise that affects hyperspectral images, meaning that the noise characteristics are assumed to be the same in each hyperspectral image region. The existing algorithms are proposed to remove the signal independent noise for enhancing hyperspectral image applications. In this study, we have proved that assumption is not valid for new-generation hyperspectral sensors, where photon noise, which depends on the spatially varying signal level, is not negligible. We are thus studying the possible impacts of signal-dependent noise on the performance of existing classification algorithms. By assuming a signal-dependent noise model, we have proposed a novel multilinear algebra-based algorithm to remove simultaneously the signal-independent noise and signal-dependent noise. This result has inspired a new iterative procedure noise-whitening (SDNW-MWPT-MWF) which improves the SVM's robustness in the presence of signal-dependent noise. The noise-whitening is achieved by exploiting estimates of the noise variance for each band and for each pixel. The performance of the proposed SDNW-MWPT-MWF method are validated on the simulated HSIs disturbed by both SD and SI noise and on the real-world ROSIS and HYDICE HSIs. From the analysis and the comparative study against other similar methods in the experiments, it can be concluded that SDNW-MWPT-MWF method can effectively reduce both SD and SI noise from HSIs. It is also necessary to take into account the signal-dependent noise in the denoising when dealing with HSIs that were collected by a new-generation airborne hyperspectral sensors. Indeed, this study demonstrated that the signal-dependent noise may affect the properties of existing classification algorithms, thus encouraging future work about the impact of SD noise. Our ongoing activity is aimed at demonstrating the benefits arising from using the SDNW-MWPT-MWF algorithm in mitigating the impact of the SD noise in different algorithms for hyperspectral data exploitation.
Since in this study the optimal parameter combination is found by time-consuming brute force searching, future works will be focused on the reduction of the computational load. A heuristic algorithm can be used to search for the optimal (sub-optimal) parameter combination such as a genetic algorithm.

Conclusions
In this paper, a denoising method SDNW-MWPT-MWF has been proposed under the PWP denoising framework for the purpose of reducing the SD photon and SI thermal noise in HSIs. Two other adapted denoising methods SDNW-MLR, SDNW-MWF are also used as comparative methods. The performances of these methods are compared in the case of photon noise whitening, denoising and improvement of classification. The photon noise whitening experiment, the denoising experiment and the classification experiment are designed to assess the parameter estimation performance, the improvement of the SNR and the distortion of the spectra in the HSI, respectively. From the experimental results obtained, it can be concluded that the SDNW-MLR performs well in removing the noise, though it also changes the signal severely in the denoising process as shown from the lower classification result OA. In contrast, SDNW-MWF performs well in preserving the signal though it cannot remove the noise well from the images. However, the proposed SDNW-MWPT-MWF is able to obtain high SNR OUTPUT as well as the high OA, thus implying that it is able to remove noise well while still preserving the signal.
These promising results encourage us to extend our experiments on other hyperspectral data such as Indian Pines and Salinas HSIs.
In this study, we employed the HYNPE algorithm to estimate the noise parameters, and then filtered noise by the proposed method SDNW-MWPT-MWF. Nonetheless, SDNW-MWPT-MWF converts SD noise to additive white Gaussian noise by a pre-whitening procedure and then applied MWPT-MWF. Due to the parameter estimate errors, the whitened noise is only approximately white, therefore the performance of filters developed for the white noise might degrade. Thus, it is interesting to develop a filter that can directly process the SD noise without the noise whitening procedure. For example, the tensor-based filtering method using a PARAFAC tensor decomposition could be carried out in the future.  The MLR theory estimates the signal by exploiting the strong spectral correlation of the useful signal and the weak between-band correlation of the random noise in the HSI. It assumes that the signal estimate row vectorx T i 3 in band i 3 can be expressed as a linear combination of the noisy data row vectors r T j 3 (j 3 = 1, . . . , i 3 − 1, i 3 + 1, . . . , I 3 ) in the other I 3 − 1 bands: where Θ i 3 = r 1 , . . . , r i 3 −1 , r i 3 +1 , . . . , r I 3 and w i 3 ∈ R (I 3 −1) is the combination weight vector. Then, the optimal weight vector can be estimated by minimizing the LSE: The solution of the LSE problem is well known and can be expressed as: and the reconstruction can be written as Extracting the components of each frequency C R l,m , C X l,m and C N l,m from C R l , C X l and C N l respectively by using Equation (A16), we obtain: C R l,m = C X l,m + C N l,m .
From Parseval's theorem, the following expression can be obtained: which means that minimizing the MSE between X and its estimateX is equivalent to minimizing the MSE between C X l,m andĈ X l,m for each m. IfĈ X l,m is estimated by Tucker3 decomposition of C R l,m : then H 1,m , H 2,m , H 3,m are the mode-n filters of the multiway Wiener filter [33]. After estimatingĈ X l,m for each m, we obtainĈ X l by concatenatingĈ X l,m . Furthermore, the estimateX can be obtained by inverse MWPT, i.e.,X =Ĉ X l × 1 (W l 1 1 ) T × 2 (W l 2 2 ) T × 3 (W l 3 3 ) T . (A27)

Appendix C.3. Best Transform Level and Basis Selection
In MWPT-MWF algorithm, several parameters should be determined: 1. Level of transform: the performance of the algorithm is affected by the level of transform, which depends on the size of tensor R. The maximum level can be calculated by: where · rounds a number upward to its nearest integer, and the constant 5 is reduced from log 2 I k to make sure there are enough elements in each mode so that the transform is meaningful. Then, the set of possible transform levels can be expressed as: L k = {0, 1, · · · , N L k }, k = 1, 2, 3, where {·} denotes a set. 2. Basis of transform: there are many wavelet bases designed for different cases. For the simplicity of expression, we define: W = {w 1 , w 2 , · · · , w N W } (A30) to denote the set of possible wavelet bases, where N W is the number of wavelets in this set.
The best transform level and basis should minimize the MSE or risk R c (X ,X ) = E X −X 2 [34], whose equivalent form using the coefficients can be expressed as As the selection of the optimal l, w depends on X , which is generally unknown, to overcome this drawback, an alternative solution should be found. Denoting