Variational Low-Rank Matrix Factorization with Multi-Patch Collaborative Learning for Hyperspectral Imagery Mixed Denoising

Abstract: In this study, multi-patch collaborative learning is introduced into variational low-rank matrix factorization to suppress mixed noise in hyperspectral images (HSIs). Firstly, based on spatial consistency and nonlocal self-similarity, the HSI is partitioned into overlapping full-band patches. A similarity metric with fused features is exploited to select the most similar patches and construct the corresponding collaborative patches. Secondly, considering that the latent clean HSI holds the low-rank property across the spectra, whereas the noise component does not, variational low-rank matrix factorization is proposed in the Bayesian framework for each collaborative patch. Using a Gaussian distribution adaptively adjusted by a gamma distribution, the noise-free data can be learned by exploring the low-rank properties of collaborative patches in the spatial/spectral domain. Additionally, the Dirichlet process Gaussian mixture model is utilized to approximate the statistical characteristics of mixed noise, which is constructed by exploiting the Gaussian distribution, the inverse Wishart distribution, and the Dirichlet process. Finally, variational inference is utilized to estimate all variables and solve the proposed model with closed-form equations. Widely used datasets with different settings are adopted to conduct experiments. The quantitative and qualitative results indicate the effectiveness and superiority of the proposed method in reducing mixed noise in HSIs.


Introduction
Hyperspectral images (HSIs) are acquired by hyperspectral sensors and represented as a 3D data-cube containing both rich spectral and spatial information. Due to the limitations of the acquisition and transmission process, HSIs unavoidably suffer from various degradations, such as noise contamination, stripe corruption, and missing data relating to voxels in the data-cube or to entire spectral bands [1][2][3][4][5]. These degradations severely limit the quality of the images and affect the precision of subsequent processing, including unmixing, target detection, and classification [6][7][8][9]. Therefore, image restoration is a critically important and challenging step in the preprocessing stage of HSI analysis.
Previously, the traditional 2D or 1D denoising models have been applied for reducing noises in HSIs pixel-by-pixel [10] or band-by-band [11]; however, these methods ignore the correlations between different spectral bands or adjacent pixels and often result in relatively low-quality results. To further enhance the denoising performance, more efficient methods have been proposed, of which the key point is to elaborately encode the prior knowledge on the structure underlying a natural HSI, especially the characteristic across the spatial and spectral dimensionality of the image. Othman and Qian [12] made an initial attempt to resolve this issue by designing a hybrid spatial-spectral derivative-domain wavelet shrinkage model, which was constructed by exploring the dissimilarity of the signal regularity existing along the space and spectrum of a natural HSI. Fu et al. [13] proposed an effective restoration model by considering the underlying sparsity across the spatial-spectral domain, high correlation across spectra, and non-local self-similarity over space. Meanwhile, a series of methods expanding the wavelet-based method from 2D to 3D has been proposed, for example, the so-called "non-local means" filtering approach has become popular in image processing and extensions have been developed in order to denoise structural 3D images [14]. Letexier et al. [15] proposed a generalized multi-dimensional Wiener filter for denoising hyperspectral images. Similarly, Chen et al. [16] extended Sendur and Selesnick's bivariate wavelet thresholding from 2D image denoising to 3D data cube denoising. To get better denoising results, as an extension of the BM3D [11] method, Maggioni et al. presented a BM4D model [17]. Utilizing highly correlated spectral information and highly similar spatial information, the spectral-spatial adaptive sparse representation model was proposed for reducing the noise in HSIs [18]. 
By explicitly treating HSI data as a 3D cube, denoising models based on tensor decomposition have appeared. In Reference [19], a novel coupled spectral-spatial tensor representation framework was proposed for noise reduction of hyperspectral images. Chen et al. proposed a low-rank tensor decomposition model for HSI restoration [20]; however, most of the above-mentioned approaches are limited due to their insufficient usage of the correlations in the spectral domain, which results in suboptimal performance while suppressing mixed noises.
By efficiently exploring the latent knowledge across spectral bands for HSIs, low-rank models have been proposed and widely utilized to restore the pure datasets from the degraded images, with competitive performances [21][22][23][24][25][26]. The classical low-rank matrix factorization (LRMF) model was presented by K. Mitra et al. and T. Okatani et al. [24]. Subsequently, using the low-rank matrix recovery (LRMR) framework, an HSI restoration technique was explored to simultaneously remove various noises in an HSI [25]. The global and non-local low-rank factorization (GLF) was proposed to suppress the noises in HSIs by utilizing the low dimensional sub-spaces and the self-similarity of the real HSI [26]. These approaches obtained satisfactory results by effectively exploiting the spectral information.
To sufficiently enhance the denoising performance, it is necessary to well integrate the spatial characteristics of HSIs into the low-rank-based models [27][28][29][30][31][32]. Wang et al. [29] proposed a novel low-rank constraint and spatial-spectral total variation regularization model by jointly utilizing global low-rank and local spatial-spectral smooth properties of HSIs. Wang et al. [30] developed a total variation regularized low-rank tensor decomposition (LRTDTV) method, in which HSI was regarded as a third-order tensor rather than being divided into the patches or being unfolded into the matrix. In [31], a novel robust principal component analysis approach was introduced into the spatial-spectral low-rank model for mixed noise removal by fully identifying the intrinsic structures of the mixed noise and clean HSIs. Based on the global correlation along the spectra and nonlocal self-similarity across space, a low-rank tensor dictionary learning (LTDL) approach was explored with satisfactory performance in [32]. In the spatial domain, HSI has latent consistency. Using this, patch learning has been widely applied to depict spatial information and has achieved good performance [33][34][35][36][37]. When the HSIs are heavily polluted by noise, the patches with little effective information are usually not directly used to recover the noise-free data. Therefore, these methods could not efficiently learn and represent the intrinsic spatial consistency and nonlocal similarities of HSIs and thus limit their denoising performance.
Deep learning has also been widely used for HSIs [38][39][40][41][42][43]; the success of these methods demonstrates the effectiveness of learned features for HSI denoising. Additionally, hyperspectral images are usually polluted by various noises with different statistical properties [44,45], such as signal-dependent noise, spatially or spectrally correlated noise, and their mixtures. Therefore, it is necessary to construct a model that suppresses complex mixed noise in order to deal with real HSI scenarios.
To alleviate the above limitations, a variational low-rank matrix factorization model, combined with multi-patch collaborative learning (VLRMFmcl), is proposed in the Bayesian framework to suppress various noises in HSIs. The main contributions of this work are summarized as follows.
(1) Multi-patch collaborative learning is exploited to effectively depict and learn the spatial consistency and the nonlocal self-similarity in the HSI. The pixels within the same collaborative patches share similar spatial-spectral characteristics, which are utilized to effectively improve the performance of denoising the patches polluted by heavy noises.
(2) Variational low-rank matrix factorization is proposed to learn and characterize the collaborative patch data by exploring latent characteristics across the spatial-spectral domain, in which the latent clean image in degraded HSI has the property of low rank and the mixed noises do not. The Gaussian distribution with zero mean and the variance adjusted by gamma distribution is explored to represent the latent clean image. The Dirichlet process Gaussian mixture is exploited to depict the inherent statistical features of different noises in the HSI, which are adapted and learned by exploring the Gaussian distribution, the inverse Wishart distribution, and the Dirichlet process. Through this process, the underlying mixed noise of the HSI can be fit adaptively without needing to know specific noise types or intensity.
(3) Considering the uncertainty of the latent variables, the posteriors of the latent clean image and the mixed noises are both explicitly parameterized and updated in closed form by utilizing variational inference. The feasibility and validity of the VLRMFmcl method are evaluated under different experimental conditions. Compared with several popular denoising methods, VLRMFmcl can reduce the noises of hyperspectral images while preserving their structural information.
The paper is organized as follows. Section 2 gives a detailed description of the proposed restoration model, which is performed using variational inference. In Section 3, several experimental results are presented by utilizing the real-world HSI datasets. The conclusions are given in Section 4.

Proposed Model
To effectively suppress the various noises in HSIs, multi-patch collaborative learning is explored to represent the intrinsic spatial consistency and non-local self-similarity of HSIs. Then, the learned patches are input into the variational low-rank matrix factorization model, which is developed to suppress the mixed noises of each patch in the Bayesian framework. Figure 1 presents the framework of the proposed VLRMFmcl.

Multi-Patch Collaborative Learning
In HSIs, adjacent pixels have high consistency in the spatial domain [26][27][28][29]. Based on this fact, HSIs are often divided into overlapping three-dimensional patches for analysis. However, the effective information in a patch is very scarce when most of its pixels are polluted by a large amount of noise, so it is very difficult to recover the noiseless data from such a patch directly. It therefore becomes important to exploit the patches in HSIs effectively. Figure 2 presents some patches from the Pavia Centre data (presented in Section 3.2), in which the area marked by the red box represents a test patch and the areas marked by three green boxes represent its neighboring patches. The similarities between the patch marked by the red box and those marked by the green boxes are very different. Inspired by this basic characteristic of HSIs, heavily polluted patches can be restored with the help of patches that are highly similar to them. Additionally, it has been argued that the "collaborative" nature of the approximation can improve classification accuracy [46]. Considering that HSI denoising aims to facilitate subsequent applications (e.g., classification), a similar "collaborative" [46] nature is introduced. Based on these observations, a multi-patch collaborative learning strategy is proposed that explores the similarities between different image patches to learn the HSI effectively.
Given the hyperspectral image X, it is segmented into overlapping three-dimensional patches of size d × d × λ, where d represents the spatial size of the patches and λ is the total number of bands in the HSI. For a pixel x_i from X, the collection N(x_i), 1 ≤ i ≤ d², consists of all pixels of a patch centered at the sample x_i. The pixels in N(x_i) can be considered to share similar characteristics. The vector y_i is formed by stacking all pixels of N(x_i), and can be regarded as the fused feature of x_i. The similarity between different patch data is formulated in Equation (1), where I_{1×d²λ} is the row vector of dimension d²λ whose elements are all equal to one. Obviously, the larger the value of SimilarIndex(y_i, y_j), the higher the similarity between N(x_i) and N(x_j). According to Equation (1), we select the (P-1) patches most similar to the patch centered at x_i and construct the non-local patch data Y_i. When d is large enough, this can be considered as searching for the most similar data in the whole hyperspectral image.
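As an illustration of the grouping step, the following Python sketch extracts overlapping full-band patches, stacks each into a feature vector, and groups every patch with its most similar neighbors. The Euclidean distance used here is only a stand-in for the fused-feature SimilarIndex of Equation (1), and the patch size, stride, and P are arbitrary illustrative values.

```python
import numpy as np

def collaborative_patches(X, d=5, P=10):
    """Sketch of multi-patch collaborative grouping.

    X : (H, W, B) hyperspectral cube.
    Each full-band d x d patch is stacked into a feature vector y_i;
    the P-1 most similar patches (here: smallest Euclidean distance,
    a stand-in for the paper's Equation (1)) are grouped with it.
    """
    H, W, B = X.shape
    feats, centers = [], []
    for r in range(0, H - d + 1, d // 2):        # stride d//2 -> overlapping
        for c in range(0, W - d + 1, d // 2):
            feats.append(X[r:r + d, c:c + d, :].reshape(-1))
            centers.append((r, c))
    Y = np.stack(feats)                          # (n_patches, d*d*B)
    # Pairwise distances; each row's P nearest entries form a group
    # (the patch itself is always its own nearest neighbor).
    dist = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=2)
    groups = np.argsort(dist, axis=1)[:, :P]
    return Y, groups, centers

rng = np.random.default_rng(0)
X = rng.random((20, 20, 8))
Y, groups, centers = collaborative_patches(X, d=5, P=4)
print(Y.shape, groups.shape)  # (64, 200) (64, 4)
```

In a full implementation the groups Y_i would then be denoised independently and the overlapping patches averaged back into the cube.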

Variational Low-Rank Matrix Decomposition
In the existing literature, many computer vision, machine learning, and statistical problems can be approached by solving and learning a low-dimensional linear model. For this reason, low-rank matrix decomposition has attracted wide attention and been applied in many fields [3][4][5][6], since it can effectively explore the low-dimensional properties of the observed data. Assuming X = [y_1, ..., y_H] ∈ R^{M×H} represents the observed data, where M and H represent the size of X, the general low-rank matrix decomposition model is formulated as

X = UV^T + n, (2)

where U = [u_1, ..., u_L] ∈ R^{M×L} and V = [v_1, ..., v_H]^T ∈ R^{H×L} are the decomposition matrices, L ≪ min(M, H), and n is the noise, which is depicted by a Gaussian distribution, Laplace distribution, or polynomial distribution.
Obviously, the pixels of the same collaborative patch have similar characteristics in the spatial and spectral domains; in other words, they have the low-rank property and can be effectively learned and expressed by low-rank matrix decomposition. Additionally, hyperspectral images are usually polluted by various noises with different statistical properties. A Gaussian mixture model can effectively learn and depict these different noises, including Gaussian noise, sparse noise, and so on. Accordingly, the noise model is explored to depict the complex noises in real HSIs, in which the Dirichlet process is utilized to adaptively achieve the selection of Gaussian components and the determination of their number. The symbol Y = {y_i}_{i=1}^{P} represents the collaborative patch data, and M = d²λ represents the dimension of the sample y_i. According to (2), the proposed Bayesian low-rank matrix decomposition model for denoising the collaborative patch data can be written as

Y = UV^T + N. (3)

The first term is the low-rank decomposition term, in which u_i ∈ R^M and v_j ∈ R^L are defined as u_i ~ N(0, τ_ui^{-1} I) and v_j ~ N(0, τ_vj^{-1} I), respectively. That is, u_i and v_j are drawn from Gaussian distributions with zero mean and precisions τ_ui and τ_vj, individually. In order to improve the robustness of the model and reduce its sensitivity to parameters, gamma distributions are introduced to adaptively adjust the parameters τ_ui and τ_vj. The first term can therefore be formulated as

u_i ~ N(0, τ_ui^{-1} I), τ_ui ~ Gamma(a_0, b_0),
v_j ~ N(0, τ_vj^{-1} I), τ_vj ~ Gamma(c_0, d_0), (4)

where I denotes the identity matrix and a_0, b_0, c_0, and d_0 represent the hyper-parameters of the gamma distributions. The second term in Equation (3) represents the mixed noises in the real HSI.
Considering the complex statistical properties of the noises in the HSI, a Gaussian mixture model is utilized to depict the different noises:

n_ij ~ Σ_{k=1}^{∞} π_k N(µ_k, Σ_k), µ_k ~ N(µ_0, Σ_0), Σ_k ~ IW(e_0, f_0). (5)

In Equation (5), µ_k and Σ_k are the mean and covariance of the k-th Gaussian component, which are learned and represented by the Gaussian distribution and the inverse Wishart distribution; these two distributions are conjugate. µ_0 and Σ_0 represent the mean and covariance of the prior on µ_k, and e_0 and f_0 are the degrees of freedom and the scale matrix of the inverse Wishart distribution. To depict the various noises in a data-driven way, the indicator variable z_ij ∈ {0, 1}^K, K → ∞, with Σ_k z_ijk = 1, is introduced to determine and learn the number and form of the Gaussian components. z_ij is drawn from a multinomial distribution with parameter π, which is learned through the Dirichlet process. Figure 3 shows the graphical representation of the Bayesian low-rank matrix decomposition model.
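The Dirichlet-process part of this noise model can be illustrated by the stick-breaking construction of the mixture weights π, truncated at K components; the concentration value α below is an arbitrary illustrative choice.

```python
import numpy as np

def stick_breaking(alpha, K, rng):
    """Truncated stick-breaking sample of Dirichlet-process mixture weights.

    v_t ~ Beta(1, alpha); pi_k = v_k * prod_{l<k}(1 - v_l).
    """
    v = rng.beta(1.0, alpha, size=K)
    v[-1] = 1.0                          # close the stick at truncation K
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining

rng = np.random.default_rng(2)
pi = stick_breaking(alpha=2.0, K=20, rng=rng)
print(pi.sum())   # the truncated weights sum to 1
```

Smaller α concentrates the mass on a few components, which is how the process adaptively determines how many Gaussian components the noise actually needs.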
Figure 3. Graphical representation of the variational low-rank matrix decomposition model.

Variational Bayesian Inference
When recovering missing pixels, the sampling matrix ∆ = {0, 1}^{M×P}, whose elements are equal to 0 or 1, is introduced into Equation (3): ∆_fi = 0 represents the loss of the f-th element of y_i during acquisition, while ∆_fi = 1 (f = 1, ..., M) means that the f-th element of y_i is effectively collected.
According to Equations (3) and (4) and Figure 3, all the variables in the proposed Bayesian low-rank matrix decomposition model satisfy conjugacy. Therefore, variational Bayesian inference can be used to solve the model. Let Ψ = {U, V, τ_u, τ_v, µ, Σ, z, v} denote the set of latent variables and Θ the hyper-parameters. Then, the marginal log-likelihood can be decomposed as

ln p(Y|Θ) = L(q) + KL(q(Ψ)‖p(Ψ|Y, Θ)),

where KL(q(Ψ)‖p(Ψ|Y, Θ)) represents the KL divergence between the variational approximation q(Ψ) and the true posterior p(Ψ|Y, Θ). It can easily be seen that ln p(Y|Θ) has a strict lower bound because KL(q(Ψ)‖p(Ψ|Y, Θ)) ≥ 0. Therefore, the optimal solution of the proposed model can be obtained by minimizing KL(q(Ψ)‖p(Ψ|Y, Θ)). Algorithm 1 presents the pseudocode of the VLRMFmcl method.
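The bound-and-gap relation just described can be checked numerically on a toy conjugate model (a Gaussian mean with a Gaussian prior, not the full VLRMFmcl model): the exact log-evidence minus the ELBO equals the KL divergence, which vanishes exactly when q matches the true posterior.

```python
import numpy as np

def elbo(y, m, s):
    """ELBO for q(mu) = N(m, s^2) in the model mu ~ N(0,1), y_i ~ N(mu, 1)."""
    n = len(y)
    e_logprior = -0.5 * np.log(2 * np.pi) - 0.5 * (m**2 + s**2)
    e_loglik = (-0.5 * n * np.log(2 * np.pi)
                - 0.5 * np.sum((y - m) ** 2) - 0.5 * n * s**2)
    entropy = 0.5 * np.log(2 * np.pi * np.e * s**2)
    return e_logprior + e_loglik + entropy

def log_evidence(y):
    """Exact ln p(y): marginally y ~ N(0, I + 11^T)."""
    n = len(y)
    quad = np.sum(y**2) - np.sum(y) ** 2 / (n + 1)
    return -0.5 * (n * np.log(2 * np.pi) + np.log(n + 1) + quad)

rng = np.random.default_rng(4)
y = rng.standard_normal(10) + 0.7
# Exact posterior of mu is N(sum(y)/(n+1), 1/(n+1)).
m_post, s_post = y.sum() / (len(y) + 1), (len(y) + 1) ** -0.5
gap_opt = log_evidence(y) - elbo(y, m_post, s_post)   # KL at the optimum: 0
gap_off = log_evidence(y) - elbo(y, 0.0, 1.0)         # KL for a poor q: > 0
```

The gap (the KL term) is nonnegative for any q and zero only at the exact posterior, which is why minimizing it is equivalent to maximizing the lower bound L(q).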

Algorithm 1. The VLRMFmcl Method
Input: the noisy HSI image X; the spatial size d of patches; the total number λ of bands;
Divide X into overlapping patches with the size of d × d × λ;
for each pixel x_i in X do
    Obtain the collection N(x_i) and y_i, where 1 ≤ i ≤ d²λ;
    Calculate the similarities SimilarIndex(y_i, y_j) between N(x_i) and N(x_j);
    Select the most similar (P-1) patches to the patch centered at x_i, and construct the non-local patch data Y_i;
end for

The updating equations of the model variables are listed as follows.
(1) Updating z_ij and v_t: The posterior of v_t is still drawn from the beta process. Supposing v_t ~ Beta(g_t, h_t), g_t and h_t can be calculated in closed form. For the variable z_ij, supposing Φ is the digamma function, the quantities γ^{ik}_{l,1} and γ^{ik}_{l,2} are expressed through the corresponding closed-form updates.
(2) Updating µ_k: The posterior of µ_k is still drawn from the Gaussian distribution.
(3) Updating Σ_k: The posterior of Σ_k is still drawn from the inverse Wishart distribution; the expression for its parameters follows from Equation (12).
(4) Updating u_i: The posterior of u_i is still drawn from the Gaussian distribution, with mean µ_{u_i} and covariance Ω_{u_i}.
(5) Updating τ_ui: The posterior of τ_ui is still drawn from the gamma distribution, with parameters a and b_i.
(6) Updating v_j: The posterior of v_j is still drawn from the Gaussian distribution, with mean µ_{v_j} and covariance Ω_{v_j}.
(7) Updating τ_vj: The posterior of τ_vj is still drawn from the gamma distribution, with parameters c and d_j.
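Step (5) is a standard conjugate gamma update; the following sketch shows its form for a zero-mean Gaussian with unknown precision. The hyper-parameters a0 and b0 are illustrative, and the setting is simplified relative to the paper's τ_ui update.

```python
import numpy as np

def gamma_posterior(x, a0=1e-3, b0=1e-3):
    """Conjugate update for a precision: x_i ~ N(0, 1/tau), tau ~ Gamma(a0, b0).

    The posterior is Gamma(a0 + n/2, b0 + sum(x^2)/2); returns its mean E[tau].
    """
    n = len(x)
    a = a0 + 0.5 * n
    b = b0 + 0.5 * np.sum(x**2)
    return a / b

rng = np.random.default_rng(5)
tau_true = 4.0                               # i.e. standard deviation 0.5
x = rng.standard_normal(5000) / np.sqrt(tau_true)
est = gamma_posterior(x)                     # posterior mean, close to 4.0
```

With enough samples the posterior mean concentrates on the true precision, which is what makes the adaptive adjustment of τ_ui and τ_vj data-driven.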
In addition, five metrics were chosen to numerically evaluate the denoising performance of the different algorithms: the peak signal-to-noise ratio (PSNR), feature similarity (FSIM) [47], the mean spectral angle (MSA), noise reduction (NR) [48,49], and the mean relative deviation (MRD) [48,49]. At the same time, the visual effect was utilized as an intuitive way to judge the denoising performance. Suppose I_den and I_ref represent the denoised and reference images, respectively, and I_1 and I_2 represent the spatial size of the image.
(a) The greater the PSNR value, the better the quality of the denoised image. PSNR (in dB) is formulated as

PSNR = 10 log10( 255^2 · I_1 I_2 / Σ_{i,j} (I_den(i,j) − I_ref(i,j))^2 ).

(b) The greater the FSIM value, the better the quality of the denoised image. FSIM is formulated as

FSIM = Σ_{x∈Ω} S_L(x) · PC_m(x) / Σ_{x∈Ω} PC_m(x),

where S_L(x) is derived from the phase congruency and the image gradient magnitude of I_den and I_ref; PC_m(x) is the maximum phase congruency of PC_den (for I_den) and PC_ref (for I_ref); and Ω represents the entire spatial domain of the image.

(c) MSA was used to estimate the spectral fidelity between the denoised and reference images in the spectral domain. The smaller the MSA value, the better the spectral fidelity of the restoration algorithm. MSA is calculated by

MSA = (1 / (I_1 I_2)) Σ_{i,j} arccos( ⟨z_den(i,j), z_ref(i,j)⟩ / (‖z_den(i,j)‖ ‖z_ref(i,j)‖) ),

where z_den(i,j) and z_ref(i,j) denote the spectral vectors of I_den and I_ref at pixel (i,j).

(d) NR was used to evaluate the noise reduction of the different restoration methods in the frequency domain. The greater the NR value, the better the performance of the denoising algorithm. NR is formulated as

NR = N_0 / N_1,

where N_1 is the power of the frequency components generated by stripes in the restored image and N_0 is that of the reference image. N_1 and N_0 are obtained by integrating P_c(D) over ℘, where P_c(D) is the averaged power spectrum down the columns of an image, D is the distance from its reference image in Fourier space, and ℘ is the stripe-noise region of the spectrum.

(e) MRD was utilized to compare the degree of distortion between selected noiseless regions of the restored and reference images. The smaller the MRD value, the smaller the image distortion. In the experiments, a 10 × 10 window was selected to calculate the MRD value. MRD is formulated as

MRD = (1/M) Σ_{m=1}^{M} |I_den(m) − I_ref(m)| / I_ref(m),

where M is the number of pixels in the selected window.
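For reference, the PSNR and MSA metrics can be computed directly from their definitions; the following is a minimal NumPy sketch (function and variable names are ours, not the paper's):

```python
import numpy as np

def psnr(den, ref):
    """Peak signal-to-noise ratio (dB) for 8-bit-range images."""
    mse = np.mean((den.astype(float) - ref.astype(float)) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse)

def msa(den, ref):
    """Mean spectral angle (radians) between two HSI cubes of shape (H, W, B)."""
    d = den.reshape(-1, den.shape[-1]).astype(float)
    r = ref.reshape(-1, ref.shape[-1]).astype(float)
    cos = np.sum(d * r, axis=1) / (np.linalg.norm(d, axis=1)
                                   * np.linalg.norm(r, axis=1) + 1e-12)
    return np.mean(np.arccos(np.clip(cos, -1.0, 1.0)))
```

A higher PSNR and a lower MSA indicate better spatial and spectral fidelity, respectively, matching the conventions in (a) and (c) above.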

Experiment on the Beads Data Set
The Beads data set, acquired from Columbia University, has a spectral resolution of 10 nm and a spectral range from 400 nm to 700 nm. It contains 31 consecutive spectral bands, and the spatial size of each band is 512 × 512 pixels.
Three kinds of noises were considered in the simulation experiment. The detailed descriptions are listed as follows.
(1) Gaussian white noise with zero mean and a fixed variance.
(2) Poisson noise is added by adjusting the ratio between the maximum brightness and the initial image brightness, which can be written as X_poisson = X · peak, where X_poisson represents the image polluted by Poisson noise, X is the initial image data, and peak refers to the intensity of the Poisson noise. To reduce the Poisson noise, we utilized settings similar to those in [50]: the variance-stabilizing transformation (VST) was applied to convert the Poisson noise into Gaussian noise before running a denoising approach, and the final denoised images were obtained by the inverse VST.
(3) Sparse noise is added to randomly selected pixels by utilizing a uniform distribution over the interval [−10, 10].
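The three noise settings above, together with the VST used for the Poisson component, can be simulated roughly as follows. This is a hedged sketch: the parameter names (`sparse_ratio`, `amp`) and the simple algebraic inverse of the Anscombe transform are our assumptions; in practice a more accurate unbiased inverse is often preferred.

```python
import numpy as np

rng = np.random.default_rng(1)

def add_mixed_noise(X, sigma=0.1, peak=10, sparse_ratio=0.1, amp=10.0):
    """Simulate the three noise types on a cube X (assumed scaled to [0, 1])."""
    noisy = X + rng.normal(0.0, sigma, X.shape)              # (1) Gaussian
    counts = rng.poisson(np.clip(noisy, 0, None) * peak)     # (2) Poisson, X * peak
    noisy = counts / peak
    mask = rng.random(X.shape) < sparse_ratio                # (3) sparse impulses
    return noisy + mask * rng.uniform(-amp, amp, X.shape)

def anscombe(x):
    """Variance-stabilizing (Anscombe) transform: Poisson counts -> ~Gaussian.
    Applied to the count-domain data (i.e., X * peak) before denoising."""
    return 2.0 * np.sqrt(np.clip(x, 0, None) + 3.0 / 8.0)

def inverse_anscombe(y):
    """Simple algebraic inverse of the Anscombe transform."""
    return (y / 2.0) ** 2 - 3.0 / 8.0
```

For the Pavia Centre setting described later, `amp` would be 5 instead of 10.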
Mixed noise, consisting of zero-mean Gaussian noise with variance σ = 0.1 and Poisson noise with peak = {5, 10, 20, 30, 50, 70, 100, 130, 160}, was added to the Beads data. Then, the nine compared methods and the proposed one were utilized to restore the noisy Beads data. The performance curves of the simulation experiments are shown in Figure 4, where the vertical coordinates represent the values of PSNR, FSIM, and MSA, respectively, and the horizontal coordinates represent the value of the parameter peak. Comparing the curves of PSNR, FSIM, and MSA, it can be clearly observed that both the PSNR and FSIM values of the VLRMFmcl method were higher than those of the BM3D, ANLM3D, BM4D, LRMR, GLF, LRTDTV, DnCNN, and HSID-CNN methods. At the same time, the MSA of VLRMFmcl was lower than that of these eight compared algorithms. Compared with LTDL, the proposed model is superior in PSNR; for FSIM and MSA, it showed better values than LTDL in most cases. These facts indicate that VLRMFmcl can effectively improve the quality of the noisy HSI by better maintaining the image feature information and restoring the spectral information in the HSI. In addition, the performance curve of the VLRMFmcl method is smoother than those of the nine comparison algorithms, which means VLRMFmcl is more stable when denoising the HSI. Figure 5 shows the restored images of band 27, polluted by Gaussian noise, sparse noise, and missing pixels, obtained by the different models. Compared with the noisy image in Figure 5b, the quality of Figure 5c-l is significantly improved. According to Figure 5, it can be seen that VLRMFmcl can effectively reduce the various forms of noise in the HSI with a large difference in brightness. The denoising results preserve the structural information and the edges of the homogeneous regions. Using patch-matching three-dimensional filtering, BM3D smoothed out some feature structures and blurred the visual effect while suppressing the different noises and restoring the missing pixels.
ANLM3D showed better visual performance than BM3D and DnCNN; however, ANLM3D had a weaker ability to recover the detailed information in the HSI. The restored images obtained by BM4D were too smooth and lost some information. As shown in Figure 5f, the results of the LRMR method were significantly better than BM3D, ANLM3D, and DnCNN, but the results of LRMR still showed obvious sparse noise. Utilizing low-rank factorization of tensors constructed by nonlocal similar 3D patches, GLF was able to recover the basic shapes of the Beads dataset, but its result lacked sharpness. As shown in Figure 5, the results of LRTDTV, LTDL, and VLRMFmcl were much better than those of BM3D, ANLM3D, BM4D, LRMR, GLF, DnCNN, and HSID-CNN. In general, the VLRMFmcl method can remove the mixed noises and restore the missing pixels of the Beads data.
To make a more intuitive comparison of the different algorithms, Figure 6 shows the pseudo-color images of the restored images (R: 3, G: 12, B: 25). As can be seen in Figure 6, the denoising results of VLRMFmcl were better than those of BM3D, ANLM3D, BM4D, LRMR, GLF, LRTDTV, LTDL, DnCNN, and HSID-CNN. Additionally, the restored results of VLRMFmcl were very similar to the reference images, which can be easily observed by comparing Figure 5a,l.

Experiment on the Pavia Centre Dataset
The Pavia Centre dataset was acquired from the Reflective Optics System Imaging Spectrometer. It contains 115 bands, and each band consists of 1096 × 715 pixels. The spectral range of the Pavia Centre dataset is from 0.43 micrometers to 0.86 micrometers. After removing 13 noisy bands, the remaining 102 bands were used in the following analysis. In the experiment, a subset of the Pavia Centre with the size of 400 × 400 × 102 was used.
Three kinds of noises were considered for the Pavia Centre dataset: (1) Gaussian white noise with the mean 0 and the noise variance σ = 0.1. (2) Sparse noise was added to the randomly selected pixels by utilizing the uniform distribution with the interval [−5, 5].
(3) Deadlines were added at the same positions in the selected bands of the HSI; their width varied from one to three lines. Table 1 shows the evaluation results for PSNR, FSIM, and MSA calculated by the different approaches. All the bold numbers in Table 1 indicate the best results. By utilizing the nonlocal self-similarity and adaptively learning the noises in the HSI, the VLRMFmcl method showed the best PSNR/MSA values and the second-best FSIM value among the compared methods. Compared with the noisy image, the PSNR and FSIM values obtained by VLRMFmcl were increased by 21.66 dB and 0.1443, respectively, and the MSA was reduced by 0.264. HSID-CNN simultaneously fed the spatial information and adjacent correlated bands to the network, where multiscale feature extraction was employed to capture both the multiscale spatial features and the spectral features; its FSIM value was optimal. Instead of learning the noise variance, BM3D and DnCNN denoised the HSI band by band with a predefined fixed noise variance, which could not efficiently utilize the spectral correlations of the HSI. As shown in Table 1, the PSNR, FSIM, and MSA values of BM3D and DnCNN were significantly worse than those of the other methods when reducing the mixed noises. Note that ANLM3D denoised the HSI by using the high nonlocal self-similarity and striking a balance between smoothing and detail preservation; BM4D adopted the 3D nonlocal self-similar data cube to exploit the local correlations between neighboring bands; and GLF reduced the mixed noise by utilizing low-rank factorization of tensors constructed from nonlocal similar 3D patches. In Table 1, the ANLM3D, BM4D, and GLF methods presented relatively good results by exploring the spatial-spectral information of the HSI. LRMR, LRTDTV, and LTDL took advantage of the low-rank property of the HSI, and their PSNR and FSIM values were better than those of BM3D, ANLM3D, GLF, and DnCNN.
Figure 7 shows the results of band 90 obtained by the different denoising approaches. To enable a better visual evaluation, Figure 8 shows the comparison of the pseudo-color images (R: 60, G: 30, B: 2). It can be seen that the image qualities of the BM3D, ANLM3D, BM4D, LRMR, GLF, LRTDTV, LTDL, DnCNN, HSID-CNN, and VLRMFmcl methods were significantly improved compared to the noisy images shown in Figures 7b and 8b. As can be seen in Figures 7c and 8c, the denoising results obtained by the BM3D method were relatively fuzzy, and the method could not effectively suppress the stripe noise. ANLM3D, BM4D, LRMR, GLF, LRTDTV, LTDL, DnCNN, and HSID-CNN could only suppress some of the noise. It can be easily seen in Figures 7l and 8l that the proposed VLRMFmcl model was able to effectively suppress the Gaussian noise, sparse noise, and deadlines, and its results were better than those of the compared methods.

Experiment on the Urban Dataset
Urban data, with a size of 307 × 307 × 210, was acquired from the HYDICE sensor. Due to detector-to-detector differences, it contains different stripes and mixed noises across the bands. The image contains 210 bands, and each band consists of 307 × 307 pixels. Its spectral range is from 0.4 to 2.5 micrometers. Table 2 gives the NR and MRD values of band 109 of the Urban dataset, in which all the bold numbers indicate the best results. In Table 2, it can be seen that the proposed approach was able to effectively reduce the noise of the Urban data and could retain the detailed information well, which means that VLRMFmcl can effectively reduce the noise while keeping the distortion low. As shown in Figures 9a and 10a, these two images were heavily polluted with stripes and mixed noise. As shown in the blue rectangles in Figure 9, obvious stripes can be observed in the results obtained by BM3D, ANLM3D, BM4D, LRMR, LRTDTV, and LTDL. Their structure and edge information was also fuzzy. This indicates that BM3D, ANLM3D, BM4D, LRMR, LRTDTV, and LTDL showed weaker performance in denoising the severely polluted bands of the Urban dataset. The LRMR method performed better in target and detail recovery, but its denoising results still showed obvious stripes and mixed noise. As shown in Figure 9i, the result of DnCNN smoothed out some structures and blurred the visual effect. As shown in Figures 9 and 10, GLF, HSID-CNN, and VLRMFmcl could effectively restore the edges and textures of the image while suppressing the mixed noise.
To facilitate the visual comparison, Figure 11 presents the pseudo-color images of the restored results calculated by different approaches (R: 55, G: 103, B: 207). Comparing the white oval regions, it can be easily seen that the proposed VLRMFmcl method was able to effectively suppress the noises in the smooth area. Meanwhile, it could effectively restore the edge and structure information. Therefore, VLRMFmcl was superior to BM3D, ANLM3D, BM4D, LRMR, GLF, LRTDTV, LTDL, DnCNN, and HSID-CNN when denoising the Urban data.

Conclusions
Introducing multi-patch collaborative learning into low-rank matrix decomposition, a variational model was proposed under the Bayesian framework to reduce three kinds of noise in HSIs. The non-local self-similarity of HSIs was explored by developing multi-patch collaborative learning. Through this process, the pixels from edges and heterogeneous regions could be effectively depicted. Then, the variational low-rank matrix decomposition model was constructed to separate the latent noise-free data and the mixed noises for the collaborative patches. A Gaussian distribution with zero mean and a variance adaptively regulated by a gamma distribution was exploited to learn and represent the low-rank property of the collaborative patches in the spatial-spectral domain and obtain the related clean data. To sufficiently suppress the mixed noise, its statistical characteristics were effectively depicted by the Dirichlet process Gaussian mixture model, which was constructed using the Gaussian distribution, the inverse Wishart distribution, and the Dirichlet process. Variational Bayesian inference was used to solve the model, which has the advantages of simple computation and high stability. Simulation experiments with different combinations of Gaussian noise, Poisson noise, deadlines, and stripe noise demonstrated the effectiveness of the proposed method. Compared with the BM3D, ANLM3D, BM4D, LRMR, GLF, LRTDTV, LTDL, DnCNN, and HSID-CNN methods, the proposed VLRMFmcl method showed superior performance in both the quantitative and qualitative evaluations.