Enhancement of Component Images of Multispectral Data by Denoising with Reference

Multispectral remote sensing data may contain component images that are heavily corrupted by noise, and the pre-filtering (denoising) procedure is often applied to enhance these component images. To do this, one can use reference images: component images having relatively high quality and that are similar to the image subject to pre-filtering. Here, we study the following problems: how to select component images that can be used as references (e


Introduction
Remote sensing (RS) is widely used in many applications [1,2]. It provides images with high information content, the possibility of fast data collection over large territories, the availability of different sensors, both airborne and spaceborne, and so on. Modern remote sensing tends to improve the spatial resolution of sensors and to make them multichannel, for example, multi-polarization radar, hyperspectral, and multispectral [1][2][3][4]. Recently, the multispectral sensor Sentinel-2 has been launched and has already produced valuable and interesting data [5].
Multichannel data contain more information about a sensed terrain than single-channel data. However, multichannel sensing has the following problem: images in one or a few components are corrupted by noise [4,6,7]. Noise is actually present in all images, but, as will be shown later, its influence in some components is negligible. If the noise is intense (the input peak signal-to-noise ratio (PSNR) is low), it is worth applying pre-filtering. This enhances the RS data and improves the performance of subsequent RS data processing, such as classification, segmentation, parameter estimation, and so on [4,8].
There are many approaches to filtering multichannel images. They can be classified into component-wise, vectorial (three-dimensional, 3D), and hybrid. Component-wise denoising is the simplest among them and allows parallel processing of component images [7][8][9][10][11]. However, similar to filtering of color images [9][10][11], component-wise filtering cannot exploit the inter-channel correlation of component images. This correlation is inherent to practically all types of multichannel images [11][12][13][14]. Meanwhile, exploiting it often leads to considerably more efficient denoising [7][8][9]. The question is how large the inter-channel correlation is and how to exploit it properly and efficiently.
Among the first filters exploiting inter-channel correlation are vector filters based on order statistics (see [9][10][11] and references therein). Originally, they were oriented toward the removal of impulsive and mixed noise. However, impulse noise is rarely met in RS data produced by modern multispectral sensors.
Later, denoising methods based on orthogonal transforms appeared, mainly applied to color images [15][16][17], where components are processed jointly. Some of these methods have been modified to work with multichannel RS images [15,18]. Such modifications are necessary because the noise can be of different intensity, and even of different type, in component images of multichannel RS data [3,4,6,15,19]. This has one of two consequences for filtering techniques designed for identical noise characteristics in all components. Either they become inapplicable [10,19], or their performance is reduced.
There are several approaches to dealing with the aforementioned non-identical characteristics. The most typical ones are to carry out proper variance stabilizing transforms [8], normalize component images in channels [15], perform pre-filtering [15], modify the algorithm [18], and so on. One problem is that this makes filtering more complex. It also becomes necessary either to have a priori information on noise characteristics or to estimate them accurately in a blind manner.
An important and positive feature of this group of methods is that the largest positive effect of filtering usually occurs for the "noisiest" component images [15,18,20]. Joint processing of more component images might provide more efficient denoising [20], but this does not necessarily happen. Meanwhile, joint use of more component images makes processing more difficult, since more memory and time are needed. Thus, the number of jointly processed component images has to be either optimized or chosen in a reasonable way. Unfortunately, such an optimization has not been done yet.
A new group of multichannel data filtering methods has recently appeared that can be treated as hybrid. They exploit inter-channel correlation in different ways. The main idea is that multichannel RS data can contain so-called low quality or "junk" channels (component images). They can also contain high quality component images, in the sense of high input PSNR and absence of other distortions. There has been a discussion on whether it is worth keeping junk channels for further processing and analysis [4,21]. Currently, many researchers consider it worth keeping them, provided that images in junk channels are pre-filtered with high efficiency [4,[22][23][24]. The question is how such filtering can be done.
Many solutions employing different principles have been proposed. The method by Yuan et al. [25] uses the total variation algorithm applied in both spatial and spectral views. The problem is that the possible signal-dependent nature of the noise is not taken into account. A denoising method based on the parallel factor analysis (PARAFAC) approach has been proposed in [26], but it also assumes an additive noise model. Anisotropic diffusion is applied to hyperspectral imagery enhancement in [27], also demonstrating improved classification, but the noise model is not specified. Chen and Qian have proposed filtering hyperspectral data using principal component analysis and wavelet shrinkage [28], but, again, an additive noise model was considered. Meanwhile, as will be shown in the next section, a signal-dependent noise component can be present.
Recently, non-local approaches to denoising multichannel images have become popular [2,24,29]. The main progress and benefits result from the fact that similar patches usable in collaborative denoising can be found not only in a given component image, but also in other component images. Further benefits come from the fact that multichannel RS data can contain almost noise-free component images (called references). These can be quite similar to a noisy component image that needs enhancement [22][23][24]30,31]. The main ideas are either to retrieve and exploit some information from the reference (for example, about positions of edges [22]) or to incorporate the reference image(s) into processing directly. Important items here are to find a proper reference and to make it as "close" to the noisy image as possible (e.g., by an appropriate nonlinear transformation [31]). The approach [24,30,31] allows using both DCT [15] and BM3D [32] filters. It also easily copes with signal-dependent noise in the component image to be denoised, by applying a proper variance stabilizing transform (VST) to that image before filtering.
These properties can be very useful in denoising junk components of multispectral data. An example is data from Sentinel-2, recently put into operation, for which noise has been shown to be signal-dependent [33] and to have quite different characteristics in different component images. One more specific property of multispectral data acquired by Sentinel-2 is that different component images have different spatial resolutions. Three component images (##1, 9, and 10) have a resolution of 60 × 60 m². Six component images (##5, 6, 7, 8A, 11, and 12) have a resolution of 20 × 20 m². The remaining four (##2, 3, 4, and 8) possess the best resolution of 10 × 10 m². This feature distinguishes Sentinel-2 multispectral data from the hyperspectral data partly discussed above, which have approximately the same spatial resolution in all sub-band images. Hence, methods of joint processing of two or more component images with different resolutions have to take this fact into consideration.
Sentinel-2 multispectral data have the peculiarities described above: signal-dependent noise, and substantially different input PSNR and resolution in different components. These peculiarities determine the novelty of the problem statement: to design methods for noise removal in component images that originally have low input PSNR. Recall that recent studies [34,35] show that high efficiency of any kind of image denoising is hard to expect if the input PSNR is high and/or the image is textural or contains a lot of fine details. This is just the case for many RS images. So, we focus on noise removal in particular component images of Sentinel-2 data, supposing that filtering of other component images is not needed (this saves time and resources in data pre-processing).
The novelty of our proposed approach consists of the following two aspects. First, we show that component images with a resolution better than that of the component image to be denoised can be used as references. Second, by analyzing component image similarity, we propose a method to select component images that can be used as references.

Image/Noise Model and Basic Principles of Image Denoising with Reference
A general image/noise model considered below is as follows:
$$I^n_{ij} = I^t_{ij} + n_{ij}\left(I^t_{ij}\right), \quad i = 1,\dots,I_{Im},\; j = 1,\dots,J_{Im}, \qquad (1)$$
where $I^t_{ij}$ denotes the true image value in the ij-th pixel. The term $n_{ij}(I^t_{ij})$ is the noise, whose statistical properties depend on $I^t_{ij}$. $I_{Im}$ and $J_{Im}$ define the processed image size. If one deals with a multichannel image, index q can be added to all components in (1). If a multispectral image is considered, even $I_{Im}$ and $J_{Im}$ should have index q. This is because the spatial resolution, and hence the number of pixels, is individual for each component image.
Let us explain from the very beginning why we rely on the signal-dependent noise model (1). Consider the model that assumes the noise variance in the ij-th pixel to be
$$\sigma^2_{ij} = \sigma^2_0 + k I^t_{ij}, \qquad (2)$$
where $\sigma^2_0$ is the variance of the signal-independent (SI) noise component and $k$ is the parameter that determines the properties of the signal-dependent (SD) component. This model has been tested in [33] for Sentinel-2 multispectral images, which are provided after light compression by JPEG2000. Even more complicated models of signal dependence have been considered in [33], where specific effects appear as a result of lossy compression. Below, however, we accept the general model (2).
For this model, one can use the so-called equivalent noise variance
$$\sigma^2_{eq} = \sum_{i=1}^{I_{Im}} \sum_{j=1}^{J_{Im}} \sigma^2_{ij} \Big/ \left(I_{Im} \times J_{Im}\right)$$
if the true image is available. Since the true image is not available in practice, the equivalent variance can instead be estimated as
$$\hat{\sigma}^2_{eq} = \hat{\sigma}^2_0 + \hat{k} I_{mean},$$
where $I_{mean}$ is the image mean. Quite accurate estimates $\hat{\sigma}^2_0$ and $\hat{k}$ were obtained in a blind manner from the images at hand [6,33,36]. Hence, if the equivalent noise variance $\sigma^2_{eq}$ in an image is considerably larger than $\sigma^2_0$, the noise should be considered signal-dependent. This feature then has to be taken into account in image processing. Gaussianity tests carried out for manually chosen homogeneous regions show that the noise is practically Gaussian.
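The following is a minimal sketch of model (1)-(2) and of the equivalent variance, in plain Python with images as lists of rows. The function names and the parameter values ($\sigma^2_0 = 4$, $k = 0.5$) are hypothetical and chosen only for illustration. Noise is drawn as zero-mean Gaussian, in line with the Gaussianity tests mentioned above. For the blind estimate, the estimated parameters are simply passed in; estimating them as in [6,33,36] is not shown.

```python
import random
from statistics import fmean


def noise_variance(true_value, sigma0_sq, k):
    """Per-pixel noise variance of model (2): sigma0^2 + k * I^t_ij."""
    return sigma0_sq + k * true_value


def add_signal_dependent_noise(true_img, sigma0_sq, k, seed=0):
    """Simulate model (1): I^n_ij = I^t_ij + n_ij(I^t_ij), with zero-mean
    Gaussian noise whose variance follows model (2)."""
    rng = random.Random(seed)
    return [[v + rng.gauss(0.0, noise_variance(v, sigma0_sq, k) ** 0.5) for v in row]
            for row in true_img]


def equivalent_variance(true_img, sigma0_sq, k):
    """sigma_eq^2 as the average of per-pixel variances (needs the true image)."""
    return fmean([noise_variance(v, sigma0_sq, k) for row in true_img for v in row])


def equivalent_variance_blind(noisy_img, sigma0_sq_hat, k_hat):
    """Estimate sigma0_hat^2 + k_hat * I_mean, using the mean of the noisy image."""
    return sigma0_sq_hat + k_hat * fmean([v for row in noisy_img for v in row])


# Hypothetical parameters for illustration only (not Sentinel-2 estimates)
true_img = [[100.0] * 64 for _ in range(64)]
noisy_img = add_signal_dependent_noise(true_img, sigma0_sq=4.0, k=0.5)
print(equivalent_variance(true_img, 4.0, 0.5))  # 54.0
print(equivalent_variance_blind(noisy_img, 4.0, 0.5))
```

For a uniform true image, every pixel has the same variance of 4 + 0.5 · 100 = 54. The blind estimate stays close to this value because the mean of the noisy image is close to the true mean.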
Let us analyze multispectral data from Sentinel-2 using the estimates of $\sigma^2_0$ and $k$ provided by the method [33]. The noise parameter estimates for two granules (sets of multispectral data) are presented in Table 1. In practically all component images, the equivalent variance is considerably larger than $\sigma^2_0$, although the contribution of the SD component is always smaller than that of the SI component. The only exception is the component image in channel #10, where $\sigma^2_0$ is practically the same as the corresponding equivalent variance. This means that the signal-dependent nature of the noise has to be taken into account.
Note that the equivalent noise variance is the smallest in component image #10. One might therefore think that this image is the least noisy. However, this conclusion is not correct, because the range of image representation has not been taken into account. Let us therefore also analyze the peak signal-to-noise ratio.

To reduce the influence of possible hot pixels and bright points in the data, we use below the so-called robust estimate of input PSNR, $\mathrm{PSNR}^{rob}_{inp} = 10\log_{10}\left(D^2_{rob}/\sigma^2_{eq}\right)$. Here, $D_{rob} = I_{(p)} - I_{(r)}$, with $p = 0.99 I_{Im} J_{Im}$ and $r = 0.01 I_{Im} J_{Im}$, and $I_{(p)}$ and $I_{(r)}$ are the p-th and r-th order statistics of the image values, respectively. The obtained values of $\mathrm{PSNR}^{rob}_{inp}$ are presented in Table 1. The values of this metric exceed 45 dB for 12 out of 13 component images. This means that these images are of high quality and the noise cannot be noticed in visualized component images [37] (one example is shown in Figure 1a). Meanwhile, for the image in sub-band 10, $\mathrm{PSNR}^{rob}_{inp}$ is only 11.6 dB, and therefore the noise is visible (one example is given in Figure 1b).

As can be seen, the noise is not white, because specific diagonal structures are observed. Such artifacts can most probably be suppressed in the frequency domain by special pre- or post-processing. However, their removal is beyond the scope of this paper.
One more observation is that these component images are similar to each other: the cross-correlation factor $R_{\#10}$ for them equals 0.57.
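The robust input PSNR estimate defined above can be sketched as follows (plain Python, hypothetical function name). The text does not specify how the ranks $p = 0.99 I_{Im} J_{Im}$ and $r = 0.01 I_{Im} J_{Im}$ are rounded to integers. Rounding to the nearest integer and clamping to the valid range is our assumption. The example also shows why order statistics help: a single hot pixel leaves $D_{rob}$ unchanged.

```python
import math


def robust_input_psnr(img, sigma_eq_sq, low=0.01, high=0.99):
    """PSNR_inp^rob = 10*log10(D_rob^2 / sigma_eq^2), with D_rob = I_(p) - I_(r).

    I_(p) and I_(r) are order statistics with 1-based ranks p = high*N and
    r = low*N, where N = I_Im * J_Im. Rounding the ranks to the nearest integer
    (clamped to 1..N) is an assumption of this sketch.
    """
    values = sorted(v for row in img for v in row)
    n = len(values)
    p = min(n, max(1, round(high * n)))
    r = min(n, max(1, round(low * n)))
    d_rob = values[p - 1] - values[r - 1]  # convert 1-based ranks to 0-based indices
    return 10.0 * math.log10(d_rob ** 2 / sigma_eq_sq)


# A 10 x 10 ramp 0..99: p = 99 and r = 1, so D_rob = 98 - 0 = 98
ramp = [[float(10 * i + j) for j in range(10)] for i in range(10)]
print(round(robust_input_psnr(ramp, 98.0 ** 2 / 100.0), 6))  # 20.0

# A single "hot pixel" does not change the estimate
hot = [row[:] for row in ramp]
hot[9][9] = 1.0e6
print(round(robust_input_psnr(hot, 98.0 ** 2 / 100.0), 6))  # 20.0
```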
Table 1 gives the cross-correlation factors $R_{\#10}$ between the component image in channel #10 and the other component images. The correlation is low for the component images (##1…4) that relate to the visible range. It increases and exceeds 0.77 for components #11 and #12. If the resolutions of channel #10 and another channel are different, the corresponding downsampling is applied before the correlation factor is calculated.
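Below is a sketch of the correlation factor between band #10 (60 m) and a finer band such as #11 or #12 (20 m), computed after downsampling by a factor of 3. The text only says that downsampling is applied. Block averaging as the downsampling method, and all function names, are assumptions made for illustration.

```python
import math
from statistics import fmean


def downsample_by_averaging(img, factor):
    """Average non-overlapping factor x factor blocks (e.g., factor 3 for 20 m -> 60 m).
    Block averaging is an assumed choice of downsampling method."""
    rows, cols = len(img) // factor, len(img[0]) // factor
    return [[fmean([img[i * factor + a][j * factor + b]
                    for a in range(factor) for b in range(factor)])
             for j in range(cols)] for i in range(rows)]


def cross_correlation(img_a, img_b):
    """Pearson cross-correlation factor between two equally sized images."""
    a = [v for row in img_a for v in row]
    b = [v for row in img_b for v in row]
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    ma, mb = fmean(a), fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def correlation_to_coarse_band(coarse_img, fine_img, factor):
    """R between a coarse band and a finer band brought onto the coarse grid."""
    return cross_correlation(coarse_img, downsample_by_averaging(fine_img, factor))


# Toy example: a 2 x 2 "60 m" band and a 6 x 6 "20 m" band that is a linear
# function of it, so R is 1 up to rounding
coarse = [[1.0, 2.0], [3.0, 5.0]]
fine = [[2.0 * coarse[i // 3][j // 3] + 10.0 for j in range(6)] for i in range(6)]
print(correlation_to_coarse_band(coarse, fine, 3))
```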
One question concerns the stability of noise properties. To check it, we estimated noise and image parameters for another granule; they are given in the lower part of Table 1. There are certain differences, but the main tendencies are the same. The signal-independent and signal-dependent components make comparable contributions. The "noisiest" image is the one in channel #10, and the images most similar to it are those in channels ##11 and 12.
We processed the image in Figure 1b with the 2D (component-wise) DCT-based filter [38] using standard settings. The output is presented in Figure 2. The noise has been partly removed, but the image quality remains poor: details and edges are smeared, and strip-like interferences remain. This means that more efficient denoising is required.
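For reference, here is a simplified sketch of a component-wise 2D DCT-based filter with hard thresholding over fully overlapping blocks. It is not claimed to be the exact filter [38]. The 8 × 8 block size and the threshold $2.7\sigma$ are assumed typical settings for additive white noise with a known $\sigma$. For signal-dependent noise, the image would first be subjected to a VST. Pure Python is slow, so the example uses a small image.

```python
import math
import random


def dct_matrix(n):
    """Orthonormal DCT-II matrix C; an n x n block X is transformed as C X C^T."""
    return [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
             * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)]
            for k in range(n)]


def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def dct_filter(img, sigma, beta=2.7, n=8):
    """Hard-thresholding DCT filter over fully overlapping n x n blocks.

    beta = 2.7 and n = 8 are assumed settings; the image must be at least n x n.
    """
    h, w = len(img), len(img[0])
    c = dct_matrix(n)
    ct = transpose(c)
    threshold = beta * sigma
    acc = [[0.0] * w for _ in range(h)]
    cnt = [[0] * w for _ in range(h)]
    for y in range(h - n + 1):
        for x in range(w - n + 1):
            block = [row[x:x + n] for row in img[y:y + n]]
            coef = matmul(matmul(c, block), ct)
            for u in range(n):
                for v in range(n):
                    # Keep the DC coefficient; zero AC coefficients below the threshold
                    if (u or v) and abs(coef[u][v]) < threshold:
                        coef[u][v] = 0.0
            rec = matmul(matmul(ct, coef), c)  # inverse transform: C^T X C
            for a in range(n):
                for b in range(n):
                    acc[y + a][x + b] += rec[a][b]
                    cnt[y + a][x + b] += 1
    # Average the overlapping block estimates at each pixel
    return [[acc[i][j] / cnt[i][j] for j in range(w)] for i in range(h)]


# Usage: a flat 24 x 24 image with additive white Gaussian noise (sigma = 10)
rng = random.Random(1)
noisy = [[100.0 + rng.gauss(0.0, 10.0) for _ in range(24)] for _ in range(24)]
denoised = dct_filter(noisy, sigma=10.0)
```

Fully overlapping blocks with averaging reduce blocking artifacts at the cost of more computation than non-overlapping blocks.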
We also analyzed other fragments and other granules of multispectral data produced by Sentinel-2. Their image and noise properties are similar in terms of noise nature and characteristics, as well as values of $\mathrm{PSNR}^{rob}_{inp}$ and inter-channel correlation.
For image denoising with a reference, it is assumed that a reference image or a set of reference images $I^{ref}_{ijs}$, $i = 1,\dots,I_{Im}$, $j = 1,\dots,J_{Im}$, $s = 1,\dots,S$, is available, where $S \ge 1$ is the number of potential reference images. All candidate reference images are supposed to be noise-free, or at least to have input PSNRs 10 dB or more larger than the input PSNR of the image to be denoised. It is also supposed that downsampling is applied if the reference image has a different resolution than the noisy one.
Another assumption is that potential reference images are in some sense similar to $I^n_{ij}$, $i = 1,\dots,I_{Im}$, $j = 1,\dots,J_{Im}$. Similarity of images can be measured in different ways, such as the mean square error (MSE) between images, the cross-correlation factor, and so on. One can also apply a linear or nonlinear transform to the reference image(s) before calculating measures of closeness. In this work, we assume that such a transform has been applied to the reference image to make it as close as possible, in the MSE sense, to the noisy image subject to denoising. There are several possible cases. Let us consider them in more detail, discussing when and why each of them takes place.
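In the first practical case, the noise in $I^n_{ij}$, $i = 1,\dots,I_{Im}$, $j = 1,\dots,J_{Im}$, is additive. The main metric that describes similarity is then
$$MSE_{nrmod} = \frac{1}{I_{Im} J_{Im}} \sum_{i=1}^{I_{Im}} \sum_{j=1}^{J_{Im}} \left(I^n_{ij} - I^{ref\,mod}_{ij}\right)^2. \qquad (3)$$
Here, $I^{ref\,mod}_{ij}$, $i = 1,\dots,I_{Im}$, $j = 1,\dots,J_{Im}$, is the modified reference image. It can be either linearly transformed as
$$I^{ref\,mod}_{ij} = S_0 I^{ref}_{ij} + \Delta_0 \qquad (4)$$
or nonlinearly transformed as
$$I^{ref\,mod}_{ij} = \Psi\left(I^{ref}_{ij}\right), \qquad (5)$$
where $S_0$ and $\Delta_0$ are the parameters of the linear least-MSE regression (4) (case 1). $\Psi(I^{ref}_{ij})$ is the nonlinear transformation (case 2) that minimizes $MSE_{nrmod}$.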
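The following sketches how the modified reference image can be fitted. Case 1 uses the closed-form least-MSE line for $S_0$ and $\Delta_0$ in (4). For case 2, we assume that $\Psi$ in (5) is a polynomial; a second-order polynomial is the choice reported below for real data. It is fitted by solving the normal equations, which form a small system of linear equations. Function names are hypothetical. Standardizing the reference values before building the normal equations is our own addition, meant to avoid ill-conditioning when data values are large.

```python
from statistics import fmean


def flatten(img):
    return [v for row in img for v in row]


def mse(img_a, img_b):
    """Mean square error between two equally sized images, as in (3)."""
    return fmean([(p - q) ** 2 for p, q in zip(flatten(img_a), flatten(img_b))])


def linear_fit(ref, noisy):
    """Case 1, eq. (4): S0, Delta0 of the least-MSE regression noisy ~ S0*ref + Delta0."""
    x, y = flatten(ref), flatten(noisy)
    mx, my = fmean(x), fmean(y)
    s0 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return s0, my - s0 * mx


def polynomial_fit(ref, noisy, degree=2):
    """Case 2 sketch, assuming Psi in (5) is a least-MSE polynomial of the
    reference value. Returns Psi as a function of one pixel value."""
    x, y = flatten(ref), flatten(noisy)
    mu = fmean(x)
    scale = fmean([(a - mu) ** 2 for a in x]) ** 0.5 or 1.0
    z = [(a - mu) / scale for a in x]  # standardized reference values
    m = degree + 1
    # Normal equations A c = b with A[p][q] = sum z^(p+q) and b[p] = sum y * z^p
    a = [[sum(t ** (p + q) for t in z) for q in range(m)] for p in range(m)]
    b = [sum(v * t ** p for t, v in zip(z, y)) for p in range(m)]
    for col in range(m):  # Gaussian elimination with partial pivoting
        piv = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, m):
            f = a[r][col] / a[col][col]
            a[r] = [ar - f * ac for ar, ac in zip(a[r], a[col])]
            b[r] -= f * b[col]
    c = [0.0] * m
    for r in reversed(range(m)):  # back substitution
        c[r] = (b[r] - sum(a[r][q] * c[q] for q in range(r + 1, m))) / a[r][r]
    return lambda v: sum(cp * ((v - mu) / scale) ** p for p, cp in enumerate(c))


def apply_map(ref, func):
    return [[func(v) for v in row] for row in ref]


# Usage on toy data with exactly linear and exactly quadratic relations
ref = [[float(4 * i + j) for j in range(4)] for i in range(4)]
noisy_lin = [[3.0 * v + 7.0 for v in row] for row in ref]
s0, d0 = linear_fit(ref, noisy_lin)
noisy_quad = [[0.01 * v * v - v + 5.0 for v in row] for row in ref]
psi = polynomial_fit(ref, noisy_quad, degree=2)
print(s0, d0, mse(apply_map(ref, psi), noisy_quad))
```

In the VST cases (6) and (7), the same fits would be applied with the transformed noisy image in place of the noisy image.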
Two other cases relate to the noise model described by (1) and (2). If the noise is signal-dependent, it is usually recommended to apply a proper homomorphic or variance stabilizing transform (VST). This yields additive (although often non-Gaussian) noise before filtering [8,39]. An advantage of this approach is that the additive nature of the noise in the image to be denoised allows a wider set of efficient filters to be applied [34]. As a VST, the generalized Anscombe transform [8] or the logarithmic transform [39] can be used, depending on the type of signal-dependent noise in each particular case.
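If a VST is applied, one has $I^{nVST}_{ij}$, $i = 1,\dots,I_{Im}$, $j = 1,\dots,J_{Im}$. One may then use either a linear transform (case 3) or a nonlinear transform (case 4) and minimize either
$$\frac{1}{I_{Im} J_{Im}} \sum_{i=1}^{I_{Im}} \sum_{j=1}^{J_{Im}} \left(I^{nVST}_{ij} - S_0 I^{ref}_{ij} - \Delta_0\right)^2 \qquad (6)$$
or
$$\frac{1}{I_{Im} J_{Im}} \sum_{i=1}^{I_{Im}} \sum_{j=1}^{J_{Im}} \left(I^{nVST}_{ij} - \Psi\left(I^{ref}_{ij}\right)\right)^2, \qquad (7)$$
respectively.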
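One common form of the generalized Anscombe transform for the variance model (2) treats $k$ as the gain of a Poisson-like component and $\sigma^2_0$ as the variance of zero-mean Gaussian noise. After the transform, the noise variance is approximately 1. The sketch below assumes that $\sigma^2_0$ and $k$ are known or estimated blindly. The function names and numbers are hypothetical, and the exact variant used in [8] may differ. The simple algebraic inverse is shown for brevity; an exact unbiased inverse is generally preferable when the signal is low.

```python
import math
import random
from statistics import pvariance


def gat_forward(x, sigma0_sq, k):
    """Generalized Anscombe transform for noise variance sigma0^2 + k*I (model (2)):
    f(x) = (2/k) * sqrt(k*x + 3/8*k^2 + sigma0^2)."""
    arg = k * x + 0.375 * k * k + sigma0_sq
    return (2.0 / k) * math.sqrt(max(arg, 0.0))  # negative arguments are clipped to 0


def gat_inverse_algebraic(y, sigma0_sq, k):
    """Direct algebraic inverse of gat_forward (biased at low signal levels)."""
    return ((k * y / 2.0) ** 2 - 0.375 * k * k - sigma0_sq) / k


# Hypothetical parameters: sigma0^2 = 4, k = 0.5, true value 100 -> noise variance 54
rng = random.Random(2)
samples = [100.0 + rng.gauss(0.0, math.sqrt(54.0)) for _ in range(20000)]
stabilized = [gat_forward(s, 4.0, 0.5) for s in samples]
print(pvariance(stabilized))  # should be close to 1
```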
Let us now recall how denoising with a reference is carried out in the simplest case, given $I^n_{ij}$ and a properly chosen $I^{ref\,mod}_{ij}$, $i = 1,\dots,I_{Im}$, $j = 1,\dots,J_{Im}$. The noisy and the reference images are denoised jointly. First, a two-point DCT is applied in the "vertical direction" (across the two images, pixel by pixel), producing "sum" and "difference" images. The obtained images are filtered by the 2D DCT-based filter or by BM3D with properly selected hard thresholds. After this, the inverse two-point DCT is applied, and the obtained first component is taken as the filtered image.
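The two-point DCT step can be sketched as follows. The names are hypothetical, and the 2D filters are placeholders for a DCT-based filter or BM3D with suitably chosen thresholds. With the orthonormal scaling used here, suppose the reference is noise-free and the noisy image has noise variance $\sigma^2$. That noise then appears with variance $\sigma^2/2$ in both the sum and the difference images.

```python
import math

SQRT2 = math.sqrt(2.0)


def two_point_dct(noisy, ref_mod):
    """Pixel-wise orthonormal 2-point DCT across the two images ("vertical" direction)."""
    s = [[(a + b) / SQRT2 for a, b in zip(ra, rb)] for ra, rb in zip(noisy, ref_mod)]
    d = [[(a - b) / SQRT2 for a, b in zip(ra, rb)] for ra, rb in zip(noisy, ref_mod)]
    return s, d


def inverse_first_component(s, d):
    """Inverse 2-point DCT, keeping only the first output (the noisy-image channel)."""
    return [[(a + b) / SQRT2 for a, b in zip(rs, rd)] for rs, rd in zip(s, d)]


def denoise_with_reference(noisy, ref_mod, filter_sum, filter_diff):
    """filter_sum and filter_diff stand for any 2D denoisers (DCT-based, BM3D, ...)
    with thresholds tuned for each image; they are placeholders in this sketch."""
    s, d = two_point_dct(noisy, ref_mod)
    return inverse_first_component(filter_sum(s), filter_diff(d))


def keep_all(img):
    """Placeholder 'filter' that changes nothing."""
    return img


def drop_all(img):
    """Placeholder 'filter' that removes everything."""
    return [[0.0 for _ in row] for row in img]


unchanged = denoise_with_reference([[2.0, 5.0]], [[4.0, 1.0]], keep_all, keep_all)
averaged = denoise_with_reference([[2.0]], [[4.0]], keep_all, drop_all)
```

With identity filters, the noisy image is returned unchanged. If the difference image is suppressed completely, the result is the pixel-wise average of the noisy image and the modified reference. Real filters sit between these two extremes.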
If a VST is used, the same operations are applied to $I^{nVST}_{ij}$, $i = 1,\dots,I_{Im}$, $j = 1,\dots,J_{Im}$, and to the modified reference image obtained by minimizing (6) or (7). The only difference is that the denoised image has to be subjected to the inverse VST. The described operations are illustrated in Figure 3.
It is also possible to have two or more modified reference images and to apply a three-point (or longer) DCT in the vertical direction to decorrelate the data. In [31], we considered two reference images instead of one, and filtering turned out to be more efficient in terms of standard metrics such as output PSNR. It was also more efficient in terms of visual quality metrics such as PSNR-HVS-M. This metric takes into account two important properties of the human visual system (HVS): lower sensitivity to distortions in high frequency components, and the masking (M) effect of image texture and other heterogeneities [40]. Besides, after the two- or three-point DCT, different component-wise filters can be applied, including the standard DCT-based filter, BM3D, or others. Usually, if a given filter is more efficient in component-wise (single-channel) denoising, its use is also beneficial in the considered denoising with a reference [30]. It is also worth stressing that optimal (recommended) thresholds for DCT coefficient thresholding have been determined in the literature [24,30,31]. These thresholds differ from those usually recommended when these filters are employed for noise removal in single-channel images. Thus, in our further studies, we use just these optimal thresholds.
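Extending the two-point scheme described above to several references, the following sketch applies an orthonormal DCT of length $1 + S$ across the stack formed by the noisy image and $S$ modified references. Each transformed image is filtered with its own placeholder filter, and the first image of the inverse transform is kept. This illustrates the idea under these assumptions; it is not the implementation from [31], and the names are hypothetical.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix; row k holds the k-th basis vector."""
    return [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
             * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)]
            for k in range(n)]


def denoise_with_references(noisy, refs_mod, filters):
    """Joint denoising of the noisy image with len(refs_mod) modified references.

    An n-point DCT (n = 1 + len(refs_mod)) is applied pixel by pixel across the
    stack, filters[k] processes the k-th transformed image, and only the first
    image of the inverse transform is returned.
    """
    stack = [noisy] + list(refs_mod)
    n = len(stack)
    if len(filters) != n:
        raise ValueError("one filter per transformed image is required")
    c = dct_matrix(n)
    h, w = len(noisy), len(noisy[0])
    coeffs = [[[sum(c[k][m] * stack[m][i][j] for m in range(n)) for j in range(w)]
               for i in range(h)] for k in range(n)]
    filtered = [f(img) for f, img in zip(filters, coeffs)]
    # The inverse DCT uses the transpose of c; column 0 yields the first image
    return [[sum(c[k][0] * filtered[k][i][j] for k in range(n)) for j in range(w)]
            for i in range(h)]


def keep_all(img):
    """Placeholder 'filter' that changes nothing."""
    return img


def drop_all(img):
    """Placeholder 'filter' that removes everything."""
    return [[0.0 for _ in row] for row in img]


# Three-point DCT: one noisy 1 x 1 image and two 1 x 1 references
same = denoise_with_references([[3.0]], [[[6.0]], [[9.0]]], [keep_all, keep_all, keep_all])
mean_only = denoise_with_references([[3.0]], [[[6.0]], [[9.0]]], [keep_all, drop_all, drop_all])
```

Keeping only the first (DC-like) transformed image reduces the output to the pixel-wise mean of all stacked images. This extreme case shows that the references contribute through the low-order DCT components.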

Performance Criteria
We start analyzing the performance of methods of image filtering with reference(s) on simulated data [24,30,31]. The simulations used four test images typical for remote sensing, presented in Figure 4 and denoted FR01, FR02, FR03, and FR04. They also used two high quality component images of an AVIRIS hyperspectral data cube, denoted RS1 and RS2. Additive white Gaussian noise (AWGN) with variance $\sigma^2$ was artificially added to these images (the noise in the original component images of the hyperspectral data was considered negligible).
To simulate reference images for all these test images, we need to ensure that they are similar to the test ones according to certain similarity measures, i.e., that they have a sufficient but not too high cross-correlation factor. At the same time, they also have to differ in several senses: they should have a different dynamic range and contain some additional content not present in the image to be denoised (see the example in Figure 1). We cannot simply distort the original test image randomly, because this would be equivalent to adding noise and decreasing $\mathrm{PSNR}_{inp}$. A more complex simulation requires knowledge of how image information content is formed, which is a priori unknown. For this reason, and based on a thorough empirical study of multichannel images, we simulated the reference image as I ref ij = 32 I t ij + 0.5I t180 ij , i = 1, . . ., I Im , j = 1, . . ., J Im , where $I^{t180}_{ij}$ denotes the same noise-free test image rotated by 180°. Thus, the reference image is a weighted sum of the original image and its copy rotated by 180°. This provides a correlation factor of the same level as for real-life multispectral data in channels 10-12.

The obtained reference images for the test images FR01 and FR02 in Figure 4 are visualized in Figure 5. Note that the dynamic range of the reference images differs considerably from the original range 0…255.
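Below is a sketch of the reference simulation: the noise-free test image plus a weighted copy of itself rotated by 180°. The weights are left as parameters rather than hard-coded; the values used in the study are those in the formula in the text. The function names are hypothetical.

```python
def rotate_180(img):
    """Rotate an image (list of rows) by 180 degrees: reverse rows and columns."""
    return [list(reversed(row)) for row in reversed(img)]


def simulate_reference(true_img, weight_true, weight_rotated):
    """Weighted sum of the noise-free image and its 180-degree rotated copy.
    The weights are parameters in this sketch, not values taken from the study."""
    rot = rotate_180(true_img)
    return [[weight_true * a + weight_rotated * b for a, b in zip(ra, rb)]
            for ra, rb in zip(true_img, rot)]


example = simulate_reference([[1, 2], [3, 4]], 1.0, 0.5)
```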
To characterize the efficiency of filtering, we used the following metrics. First, input PSNR is defined as
$$\mathrm{PSNR}_{inp} = 10\log_{10}\left(DR^2/\sigma^2\right),$$
where DR denotes the range of image representation and $\sigma^2$ is the noise variance (the equivalent variance if the noise is signal-dependent). Output PSNR is expressed as
$$\mathrm{PSNR}_{out} = 10\log_{10}\left(DR^2/MSE_{out}\right),$$
where $MSE_{out}$ is the output mean square error (MSE). Effectiveness is then characterized by the difference $\mathrm{PSNR}_{out} - \mathrm{PSNR}_{inp}$. Alongside PSNR, we would like to analyze the visual quality of the original (noisy) and filtered images. To do this, we use the metric PSNR-HVS-M (denoted below as PHVSM) [40]. Then, one has
$$\mathrm{PHVSM}_{inp} = 10\log_{10}\left(DR^2/MSE^{HVS}_{inp}\right), \qquad \mathrm{PHVSM}_{out} = 10\log_{10}\left(DR^2/MSE^{HVS}_{out}\right),$$
where $MSE^{HVS}_{inp}$ and $MSE^{HVS}_{out}$ are the input and output MSEs, respectively. Both are calculated taking into account the aforementioned peculiarities of the HVS.
A filtering method can be considered good if it performs better than others for a wide set of test images and a wide range of noise variances (input PSNRs).

Analysis of Simulation Data
For comparison purposes, the results for component-wise processing by the BM3D filter are presented in Table 2. BM3D is one of the best image filters that can be applied component-wise. The comparisons in Table 2 show that BM3D slightly outperforms the 2D DCT-based filter. However, the improvements due to denoising with a reference are far more significant.
An analysis of the data shows the following. Denoising with a reference is always beneficial compared with 2D DCT-based filtering. For AWGN variance $\sigma^2 = 10$, the gain in PSNR is about 3 dB even if the reference image is transformed linearly. Nonlinear transformation of the reference image provides an additional 2 dB improvement. The benefits according to PHVSM are considerable too. Component-wise filtering improves this metric by only about 1 dB. Filtering with a linearly transformed reference provides about 3.5 dB improvement, and a nonlinearly transformed reference produces an additional improvement of about 2.5 dB. Thus, the total improvement due to denoising with a nonlinearly transformed reference reaches about 5 dB according to PSNR and about 5.5 dB according to PHVSM.
For noise variances $\sigma^2 = 25$ and $\sigma^2 = 100$, the situations and conclusions are similar. 2D DCT-based filtering improves image quality according to both metrics. However, this improvement is not large for the test images FR01, FR02, FR03, and FR04, which contain fine details and textures. Effectiveness is better for the images RS1 and RS2. Meanwhile, denoising with references performs considerably better, although the benefits of nonlinear transformation of reference images are less pronounced than for $\sigma^2 = 10$.
This means that denoising with reference performs well for different noise intensities (values of input PSNR) and for different test images typical for remote sensing. Nonlinear transformation is preferable because it provides better performance. Determining the parameters of the transformations, either linear or nonlinear, takes little computation time, because it only requires solving a system of linear equations. This operation takes considerably less time than the filtering itself, even though DCT-based denoising is also simple and fast.
The noisy test image FR04 (AWGN, $\sigma^2 = 100$) is presented in Figure 6. Noise is visible in homogeneous image regions. The output image for the 2D DCT-based filter is represented in Figure 7. Noise is suppressed, but edges and fine details are partly smeared. Improvement of visual quality is not obvious. The results of image denoising using linearly and nonlinearly processed reference images are shown in Figures 8 and 9, respectively. The main difference compared with the image in Figure 7 is that edges and details are preserved better and, because of this, better visual quality is provided. For comparison purposes, we also give values of metrics for the original and denoised images. The denoising efficiency can be additionally improved if one uses two reference images and/or a BM3D filter instead of a DCT-based filter in the denoising with reference.

Application to Real Life Images
Let us see how good the filtering result is when denoising with reference is applied to a real-life image. The output of the 2D DCT-based filter has already been shown in Figure 2, and that image was partly smeared. Figure 10a shows the output of the proposed denoising technique with one nonlinearly transformed reference from channel #11 (a second-order polynomial was used). Figure 10b shows the output obtained with two nonlinearly transformed references from channels #11 and #12 (again, a second-order polynomial was applied).
Both images are considerably "sharper" than the image in Figure 2 and more details are visible.Comparing the images in Figure 10 and the enlarged fragments in Figure 11, it is possible to state that the use of two reference images produces better visual quality of the processed image.The denoising efficiency can be additionally improved if one uses two reference images and/or a BM3D filter instead of a DCT-based filter in the denoising with reference.

The magnified difference image is shown in Figure 12. Comparing it with the image in Figure 1b, one can see that noise (including strip-like artifacts) has been efficiently removed. The absence of visible regular structures in this image indicates that filtering introduced almost no structural distortions into the output image.
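A magnified difference image such as the one in Figure 12 can be formed by subtracting the filtered output from the noisy input and stretching the residual for display. The sketch below is a minimal version of that idea; the gain of 4 and the mid-gray offset of 128 are hypothetical choices, since the magnification actually used for Figure 12 is not given here.

```python
import numpy as np


def magnified_difference(noisy: np.ndarray, filtered: np.ndarray,
                         gain: float = 4.0, offset: float = 128.0) -> np.ndarray:
    """Residual (noisy - filtered), scaled by `gain`, shifted to mid-gray `offset`
    and clipped to the 8-bit display range. Ideally it looks like pure noise;
    visible edges or textures indicate detail removed by the filter."""
    residual = noisy.astype(np.float64) - filtered.astype(np.float64)
    return np.clip(gain * residual + offset, 0.0, 255.0).astype(np.uint8)


# Usage with an ideal "filter" that returns the clean image exactly.
rng = np.random.default_rng(1)
clean = np.full((64, 64), 100.0)
noisy = clean + rng.normal(0.0, 10.0, clean.shape)
vis = magnified_difference(noisy, clean)
print(vis.dtype, vis.shape)  # uint8 (64, 64)
```

With an ideal filter the residual is pure noise, so the visualization shows no regular structures, which is the property the text checks in Figure 12.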
In practice, one might be interested in how to decide which component image to choose among possible candidates. The strictly theoretical answer is that one should use the component image that produces the smallest MSE (3) if the noisy image is not subject to VST, or the smallest MSE (6) or (7) if VST is applied. This means that all possible candidates have to be tried and the best one(s) retained for use in the proposed denoising method. In practice, however, this approach can be simplified. For example, if one knows that component images in a given channel are usually the most similar to the component images to be filtered, then it is possible to skip the search over candidates and to fix the reference channel. For the considered case of multispectral Sentinel-2 data, the component images in channel #10 are worth denoising. The component images in channel #11 are worth using if one reference is applied. If two references are used, then one can employ the component images in channels #11 and #12, as shown in the example above.
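The exhaustive choice of a reference can be sketched as a loop over candidate channels that keeps the one with the smallest MSE after mapping it onto the noisy image. This is a minimal sketch assuming the linear case without VST (a least-squares fit via `numpy.polyfit`); the channel names and synthetic data are hypothetical.

```python
import numpy as np


def fitted_mse(noisy: np.ndarray, reference: np.ndarray) -> float:
    """MSE between the noisy image and the reference after a least-squares
    linear mapping a * reference + b (linear case, no VST)."""
    x = reference.ravel().astype(np.float64)
    y = noisy.ravel().astype(np.float64)
    a, b = np.polyfit(x, y, 1)
    return float(np.mean((y - (a * x + b)) ** 2))


def select_reference(noisy: np.ndarray, candidates: dict) -> tuple:
    """Exhaustive search: return (channel_id, mse) with the smallest fitted MSE."""
    scores = {ch: fitted_mse(noisy, img) for ch, img in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]


# Synthetic example; channel names and inter-channel relations are hypothetical.
rng = np.random.default_rng(0)
truth = rng.uniform(0.0, 255.0, (128, 128))
candidates = {
    "11": 0.8 * truth + 20.0,                                   # strongly similar
    "12": 0.5 * truth + rng.uniform(0.0, 100.0, truth.shape),   # partly similar
    "09": rng.uniform(0.0, 255.0, truth.shape),                 # unrelated
}
noisy = truth + rng.normal(0.0, 10.0, truth.shape)
best, mse = select_reference(noisy, candidates)
print(best)  # 11
```

Fixing the reference channel in advance, as suggested for Sentinel-2, simply replaces this search with a constant choice.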

Conclusions
We considered the properties of component images acquired by the Sentinel-2 multispectral sensor. It has been shown that there are component images in channel #10 for which denoising is expedient. We demonstrated that denoising with reference can be a good solution in terms of both noise-suppression efficiency and filtering simplicity. Results for both simulated and real-life data supporting this are presented.
The method has several modifications, among which those using a nonlinear transformation of the reference image(s) are preferable. Moreover, the use of two references instead of one provides additional benefits. Recommendations concerning the selection of proper references for multispectral data are given.

Figure 2. The output of the DCT-based filter.
…, $i = 1,\dots,I$, $j = 1,\dots,J$. It is known that the similarity of images can be measured in different ways: the mean square error (MSE) between images, the cross-correlation factor, and so on. One can also apply a linear or nonlinear transform to the reference image(s) before calculating measures of closeness. In this work, we assume that a linear or nonlinear transform has been applied to the reference image in order to make it as close as possible, in the MSE sense, to the noisy image subject to denoising. There are several possible cases. Let us consider them in more detail, discussing when and why each of them takes place. The first practical case is that the noise in $\{I^{n}_{ij},\ i = 1,\dots,I,\ j = 1,\dots,J\}$ is additive; then the main metric that describes similarity is

$$MSE^{nr}_{mod} = \frac{1}{IJ}\sum_{i=1}^{I}\sum_{j=1}^{J}\left(I^{n}_{ij} - I^{ref\,mod}_{ij}\right)^{2}, \qquad (3)$$

where $\{I^{ref\,mod}_{ij},\ i = 1,\dots,I,\ j = 1,\dots,J\}$ defines the modified reference image, which can be either linearly transformed as …, where … denote the parameters of the linear least-MSE regression (4) (case 1), or nonlinearly transformed as $\Psi(I^{ref}_{ij})$, where $\Psi$ defines the nonlinear transformation (case 2) that leads to minimizing $MSE^{nr}_{mod}$.
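The two ways of forming the modified reference can be sketched with least-squares polynomial fits: degree 1 gives the linear transform of case 1, and degree 2 gives a second-order polynomial, which is one possible choice of Ψ for case 2 (a second-order polynomial is also what is used for the real-life data). The use of `numpy.polyfit` and the synthetic square-root relation between channels are assumptions of this sketch, not the authors' implementation.

```python
import numpy as np


def modified_reference(noisy: np.ndarray, reference: np.ndarray, degree: int) -> np.ndarray:
    """Map the reference onto the noisy image by a least-squares polynomial fit.
    degree=1: linear transform (case 1); degree=2: second-order polynomial
    as one possible nonlinear transform Psi (case 2)."""
    x = reference.ravel().astype(np.float64)
    y = noisy.ravel().astype(np.float64)
    coeffs = np.polyfit(x, y, degree)  # coefficients, highest power first
    return np.polyval(coeffs, x).reshape(reference.shape)


def mse(a: np.ndarray, b: np.ndarray) -> float:
    """Mean square error, the similarity measure of Eq. (3)."""
    return float(np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2))


# Hypothetical case: the reference is a square-root function of the noise-free
# image, so a linear map cannot fully match it but a quadratic one can.
rng = np.random.default_rng(0)
truth = rng.uniform(0.0, 255.0, (128, 128))
reference = 10.0 * np.sqrt(truth)
noisy = truth + rng.normal(0.0, 10.0, truth.shape)

mse_linear = mse(noisy, modified_reference(noisy, reference, 1))
mse_quadratic = mse(noisy, modified_reference(noisy, reference, 2))
print(f"linear: {mse_linear:.1f}, second-order: {mse_quadratic:.1f}")
```

Here the second-order fit gets close to the noise variance (100), while the linear fit leaves an additional model error; on real data the gain depends on how nonlinear the inter-channel relation actually is.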

…the noisy and the reference images are denoised jointly. A two-point DCT is first applied in the "vertical direction", producing "sum" and "difference" images. The obtained images are filtered by the 2D DCT-based filter or by BM3D with properly selected hard thresholds. After this, the inverse two-point DCT is applied, and the obtained first component is taken as the filtered image. If VST is used, then the same operations are applied to $\{I^{nVST}_{ij},\ i = 1,\dots,I,\ j = 1,\dots,J\}$ and $\{I^{ref\,mod}_{ij},\ i = 1,\dots,I,\ j = 1,\dots,J\}$ obtained by minimizing (…
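The joint filtering step can be sketched as follows. This is a simplified, hypothetical version rather than the authors' filter: it uses non-overlapping 8×8 blocks, keeps each block's DC coefficient, and sets the hard threshold to β·σ/√2 with an assumed β = 2.7. The σ/√2 factor assumes that noise comes only from the noisy image, so the orthonormal two-point DCT splits its variance equally between the "sum" and "difference" images. BM3D and VST are not included.

```python
import numpy as np


def dct_matrix(n: int = 8) -> np.ndarray:
    """Orthonormal DCT-II matrix: dct_matrix(n) @ x is the 1D DCT of x."""
    k = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    m[0, :] /= np.sqrt(2.0)
    return m


def dct_hard_threshold(img: np.ndarray, thr: float, block: int = 8) -> np.ndarray:
    """Simplified 2D DCT-based filter on non-overlapping blocks: zero AC
    coefficients below thr in magnitude, keep DC, invert. Pixels of an
    incomplete border block are left unchanged."""
    c = dct_matrix(block)
    out = img.astype(np.float64).copy()
    h, w = out.shape
    for i in range(0, h - block + 1, block):
        for j in range(0, w - block + 1, block):
            coef = c @ out[i:i + block, j:j + block] @ c.T  # forward 2D DCT
            dc = coef[0, 0]
            coef[np.abs(coef) < thr] = 0.0                  # hard thresholding
            coef[0, 0] = dc
            out[i:i + block, j:j + block] = c.T @ coef @ c  # inverse 2D DCT
    return out


def denoise_with_reference(noisy, ref_mod, sigma, beta=2.7, block=8):
    """Two-point DCT across the image pair, filter both outputs, invert,
    and return the first component (the estimate of the noisy channel)."""
    noisy = noisy.astype(np.float64)
    ref_mod = ref_mod.astype(np.float64)
    s = (noisy + ref_mod) / np.sqrt(2.0)   # "sum" image
    d = (noisy - ref_mod) / np.sqrt(2.0)   # "difference" image
    thr = beta * sigma / np.sqrt(2.0)      # assumed noise std in s and d
    s_f = dct_hard_threshold(s, thr, block)
    d_f = dct_hard_threshold(d, thr, block)
    return (s_f + d_f) / np.sqrt(2.0)      # inverse two-point DCT, first output


# Usage: smooth synthetic image and an ideal, already matched reference.
rng = np.random.default_rng(0)
truth = np.tile(np.linspace(0.0, 255.0, 128), (128, 1))
noisy = truth + rng.normal(0.0, 10.0, truth.shape)
out = denoise_with_reference(noisy, truth, sigma=10.0)
print(np.mean((noisy - truth) ** 2), np.mean((out - truth) ** 2))
```

Because the two-point transform is orthonormal, setting the threshold to zero returns the noisy image unchanged; all of the denoising comes from thresholding the "sum" and "difference" images.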

Figure 3. Block diagram of the proposed processing approach. VST: variance-stabilizing transform; MSE: mean square error.

…$PSNR_{inp}$ decreasing. The use of a more complex simulation requires knowledge of how the image information content is formed, which is a priori unknown. Because of this, and based on a thorough empirical study of multichannel images, we simulated the reference image from the noise-free test image and the same noise-free test image rotated by 180°. Thus, as a reference image, we use a weighted sum of the original image and its copy rotated by 180°. This allows us to obtain a correlation factor of the same level as for real-life multispectral data in channels 10–12.

Remote Sens. 2019, 11, x FOR PEER REVIEW 9 of 17
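The simulated reference can be sketched as a weighted sum of an image and its 180° rotation. The exact weights are not recoverable from this section, so the form w·I + (1 − w)·rot180(I) and the weight values below are hypothetical; the sketch only shows how the weight controls the correlation factor with the original.

```python
import numpy as np


def simulate_reference(clean: np.ndarray, weight: float) -> np.ndarray:
    """Hypothetical form: weight * clean + (1 - weight) * clean rotated by 180 degrees."""
    rotated = np.rot90(clean, 2)  # two quarter turns = 180-degree rotation
    return weight * clean + (1.0 - weight) * rotated


def correlation(a: np.ndarray, b: np.ndarray) -> float:
    """Pearson correlation factor between two images."""
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])


# The weight controls how similar the reference is to the image being denoised.
rng = np.random.default_rng(0)
clean = rng.uniform(0.0, 255.0, (128, 128))
for w in (0.6, 0.8, 0.9, 1.0):
    print(w, round(correlation(clean, simulate_reference(clean, w)), 3))
```

For this random texture the rotated copy is almost uncorrelated with the original, so the correlation is roughly w/√(w² + (1 − w)²). For structured test images the rotated copy may be partly correlated with the original, so w would have to be tuned on the image itself to reach a target level such as that observed for channels 10–12.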

Figure 5. Visualized reference images for the test images FR01 (a) and FR02 (b).


Figure 10. Output image for filtering with one (a) and two (b) nonlinearly transformed references (component image in channel #11 and component images in channels #11 and #12).

Figure 11.Enlarged fragment of output image for filtering with one (a) and two (b) nonlinearly transformed references (full images are presented in Figure 10).


Figure 12. Magnified difference image for filtering with two nonlinearly transformed references (component images in channels #11 and #12).


Table 1. Noise parameters in component images of Sentinel-2. PSNR: peak signal-to-noise ratio.
