Boosting of Denoising Effect with Fusion Strategy

Abstract: Image denoising, a fundamental step in image processing, has been widely studied for several decades. Denoising methods can be classified as internal or external depending on whether they exploit the internal prior or the external noisy-clean image priors to reconstruct a latent image. Typically, these two kinds of methods have their respective merits and demerits. Using a single denoising model to improve existing methods remains a challenge. In this paper, we propose a method for boosting the denoising effect via an image fusion strategy. This study aims to boost the performance of two typical denoising methods, the nonlocally centralized sparse representation (NCSR) and residual learning of deep CNN (DnCNN). These two methods have complementary strengths and can be chosen to represent internal and external denoising methods, respectively. The boosting process is formulated as an adaptive weight-based image fusion problem that preserves the details of the initial denoised images output by the NCSR and the DnCNN. Specifically, we design two kinds of weights to adaptively reflect the influence of the pixel intensity changes and the global gradient of the initial denoised images. A linear combination of these two kinds of weights determines the final weight. The initial denoised images are integrated into the fusion framework to achieve our denoising results. Extensive experiments show that the proposed method significantly outperforms the NCSR and the DnCNN both quantitatively and visually when they are considered as individual methods; similarly, it outperforms several other state-of-the-art denoising methods.


Introduction
Digital images are often corrupted by noise during acquisition or transmission [1], rendering them unsuitable for vision applications such as remote sensing and object recognition. Therefore, image denoising is a fundamental preprocessing step that aims at suppressing noise and reproducing the latent high-quality image with fine image edges, textures, and rich details. A corrupted noisy image can generally be described as

$$y = x + v, \tag{1}$$

where the column vector $y$ denotes the observed noisy image, $x$ denotes the original clean image, and $v$ denotes the additive noise. Because the noise $v$ is unknown, there are many possible solutions for $x$ given a noisy image $y$. This encourages researchers to keep seeking new methods that achieve better denoising results.
Various image denoising studies assume $v$ to be additive white Gaussian noise (AWGN). Considering that AWGN is stationary and uncorrelated among pixels, we make the same assumption in this study. Denoising methods can be classified into two types [2], internal methods and external ones. The internal methods denoise an image patch using other noisy image patches within the noisy image, whereas the external methods denoise a patch using external clean image patches. In the past several years, the internal sparsity and the self-similarity of images have usually been utilized to achieve better denoising performance. Non-local Means (NLM), proposed by Buades et al. [3,4], is the first filter that utilizes the non-local self-similarity in images. NLM obtains a denoised patch by first finding similar patches and then computing their weighted average. Because searching for similar patches at various noise levels may be computationally impractical, typically only a small neighborhood of the patch is searched for possible matches. BM3D [5], a characteristic benchmark method, builds on the strategy of NLM by grouping similar patches together, and suggests a two-step denoising algorithm. First, the input image is roughly denoised. Then, the denoising is refined by collecting similar patches to accomplish a collaborative filtering in the transform domain. This two-step process contributes to the effectiveness of the BM3D, making it a benchmark denoising algorithm. The nuclear norm minimization (NNM) method was proposed in [6] for video denoising; nevertheless, it was greatly restricted in its capability and flexibility in handling many practical denoising problems. In [7], Gu et al. presented weighted nuclear norm minimization (WNNM). This was a low rank image denoising approach based on non-local self-similarity; however, it suppressed the low rank parts and shrank the reconstructed data.
The K-SVD [8] denoising utilizes the sparse and redundant representations of an over-complete learned dictionary to produce a high-quality denoised image. Such a dictionary was initially learned from a large number of clean images. Later, it was directly learned from the noisy image patches [9]. Motivated by the idea of similar image patches sharing similar subdictionaries, Chatterjee et al. [10] proposed the K-LLD. Instead of learning a single over-complete dictionary for an entire image, K-LLD first performs a clustering step based on the patches using the local weight function presented in [11]. Then, it separately finds the most optimal dictionary for each cluster to denoise the patches from each cluster. Similarly, the authors of learned simultaneous sparse coding (LSSC) [12,13] exploit self-similarities of image patches combined with sparse coding to further improve the performance of image denoising methods based on dictionary learning using a single dictionary. Taking advantage of the noise properties of local patches and different channels, a scheme called trilateral weight sparse coding (TWSC) was proposed in [14]. In this model, the noise statistics and sparsity priors of images are adaptively characterized by two weight matrices. Based on the idea of nonlocal similarity and sparse representation of image patches, Dong et al. introduced the nonlocally centralized sparse representation (NCSR) model [15] and the concept of sparse coding noise, thereby changing the objective of image denoising to suppressing the sparse coding noise. K-means clustering is applied to cluster the patches obtained from the given image into K clusters; then, a PCA sub-dictionary is adaptively learned for each cluster, leading to a more stable sparse representation. It is a fact that NCSR is efficient in capturing image details and adaptively representing them with a sparse description. 
However, since each image patch is considered as an independent unit of the sparse representation in the dictionary learning and sparse coding stages, ignoring the relationships among the patches can result in inaccurate sparse coding coefficients.
There was a major leap in denoising performance with the revival of neural networks, which are trained on large collections of external noisy-clean image priors. Zoran and Weiss [16] presented gaussian mixture models (GMMs) using a gaussian mixture prior learned from a database of clean natural image patches to reproduce the latent image. PG prior based denoising (PGPD), a method developed based on GMMs, was proposed in [17] to exploit the non-local self-similarity of clean natural images. A convolutional neural network (CNN) for denoising was proposed in [18], where a five-layer convolutional network was specifically designed to synthesize training samples from abundantly available clean natural images. Subsequently, fully connected denoising auto-encoders [19] were suggested for image denoising. Nevertheless, the early CNN-based methods and the auto-encoders cannot compete with the benchmark BM3D [5] method. In [20], the plain multi-layer perceptron method is used to tackle image denoising with a multi-layer perceptron trained using training examples. This achieves a performance that is comparable with that of the BM3D method. Schmidt and Roth introduced the cascade of shrinkage fields (CSF) [21], which combines a random field-based model and half-quadratic optimization into a single learning framework to efficiently perform the denoising. Chen et al. [22,23] further presented the trainable nonlinear reaction diffusion (TNRD) method for image denoising problems. It learns the parameters from training data through a gradient descent inference approach. Both the CSF and TNRD show promise in narrowing the gap between denoising performance and computational efficiency. However, the specified forms of the priors adopted by these methods are limited with regard to capturing all the features related to image structure. Inspired by combining learning-based approaches with the traditional methods, Yang et al. 
[24] defined a network known as the BM3D-Net by unrolling the computational pipeline of the classical BM3D algorithm into a CNN structure. It achieves competitive denoising results and significantly outperforms the traditional BM3D method. With the development of deep CNNs, several prevalent deep CNN-based approaches compare favorably with many other state-of-the-art methods both quantitatively and visually, e.g., the recursively branched deconvolutional network (RBDN) [25], the fast and flexible denoising convolutional neural network (FFDNet) [26], and residual learning of deep CNN (DnCNN) [27]. Santhanam et al. [25] developed the RBDN for denoising as well as general image-to-image regression. The FFDNet, proposed by Zhang et al. [26], takes an adjustable noise level map as input and is able to achieve visually convincing results on the trade-off between detail preservation and noise reduction with a single network model. Rather than outputting the denoised image $\hat{x}$ directly, the DnCNN employs a residual mapping $\hat{v}$ to estimate the noise present in the input image, and the denoising result is $\hat{x} = y - \hat{v}$. Taking advantage of batch normalization [28] and residual learning [29], the DnCNN can handle several prevailing denoising tasks with high efficiency and performance.
Various image denoising algorithms have produced highly promising results; however, the experimental results and bound calculations in [30] showed that there is still room for improvement for a wide range of denoising tasks. Some image patches inherently require external denoising; however, external image patch prior-based methods do not make good use of the internal self-similarity. Further improvement of the existing methods or the development of a more effective one using a single denoising model remains a valid challenge. Therefore, we are interested in combining both internal and external information to achieve better denoising results. To this end, we choose NCSR and DnCNN as the initial denoisers by considering their performance and complementary strengths. NCSR, a powerful internal denoising method that combines nonlocal similarity and sparse representation, demonstrates exceptionally high performance in terms of denoising regular and repeated images. The DnCNN possesses an external prior modeling capacity with a deep architecture. This is better for denoising irregular and smooth regions and is complementary to the internal prior employed by the NCSR. In other words, the combination of NCSR and DnCNN can strongly explore both the internal and external information of a given region in the initially denoised images.
In this study, we introduce a denoising effect boosting method based on an image fusion strategy. The objective is to further improve performance by fusing images that are originally denoised by NCSR and DnCNN. These methods have complementary strengths and can be chosen to represent the internal and external denoising methods. Note that, the proposed denoising effect boosting method is simpler than the deep learning-based one introduced in [31]. In the latter method, a CNN is leveraged to iteratively learn the denoising model in each stage in the deep boosting method; this requires massive images for training to achieve an appropriate final result. In contrast, our method boosts the denoising effect using the image fusion strategy. Without using any training samples, we compute the weight map along each image pixel to fuse two initially denoised images for an enhanced denoising effect. In summary, the novelty of our method lies in three aspects. First, our method combines complementary information from images denoised using two state-of-the-art methods via a fusion strategy. Second, the strategy is excellent in terms of the preservation of details via a simple fusion structure. Third, it does not involve a computationally expensive training step. The DnCNN model used in this study was trained by its original developers, and the parameters are set using the source code of the model. Furthermore, NCSR is based on the nonlocal self-similarity and sparse representation of image patches, which need not be learned from external samples. Therefore, our method does not involve any loop iterations for processing images. The effectiveness of the proposed denoising booster can be seen in Figure 1, where some test images and the corresponding denoised images are shown. The proposed booster performs well with regard to the preservation of the image details. In the Lena image, the NCSR can recover the eyelashes; however, it produces artifacts on the eyeball. 
Though DnCNN produces fewer artifacts, it tends to create an over-smooth region, with the eyelashes being almost invisible. However, by combining the strengths of these two methods, our method can preserve more details without generating many artifacts in the same region. We can also observe that the line in the House image has a gray intensity in the result obtained using the NCSR. Nevertheless, it becomes brighter after boosting is performed by combining the denoising performance of the DnCNN with that of the NCSR. The boosting process is formulated as an adaptive weight-based image fusion problem to enhance the contrast and preserve the image details of the initially denoised images. Specifically, unlike many existing conventional pixel-wise image fusion methods that employ one weight to reflect the pixel value in the image sequence, our method applies a weight map to adaptively reflect the relative pixel intensity and the global gradient of the initially denoised images obtained using the NCSR and the DnCNN, respectively. Taking the overall brightness and neighboring pixels into consideration, two kinds of weights are designed as follows:
1. The relative pixel intensity based weight is designed to reflect the importance of the processed pixel value relative to the neighboring pixel intensity and the overall brightness.
2. The global gradient based weight is designed to reflect the importance of the regions with largely variational pixel values and to suppress the saturated pixels in the initial denoised images.
A linear combination of these two kinds of weights determines the final weight. Two initially denoised images are incorporated into the fusion framework, and the boosting method can significantly combine the complementary strengths of the two aforementioned methods to achieve better denoising results. Several extensive experimental results demonstrate that the proposed method visually and quantitatively outperforms many other state-of-the-art denoising methods. The key contributions of this study are summarized as follows:

• Optimal combination. We introduce a denoising effect boosting method to improve the denoising performance of a single method, NCSR or DnCNN. Each denoiser has its own characteristics. The NCSR performs well on images with abundant texture regions and repeated patterns. Owing to the strategies of residual learning [29] and batch normalization [28], the DnCNN is better for denoising irregular and smooth regions. A linear combination of NCSR and DnCNN is better than either of the individual methods as well as a number of other state-of-the-art denoising methods. To the best of our knowledge, the proposed denoising effect boosting method is the first of its kind in image denoising.
• Weight design. We introduce two adaptive weights to reflect the relative pixel intensity and global gradient. One is to emphasize the processed pixel value according to the surrounding pixel intensities and the overall brightness. The other is to emphasize the areas where pixel values vary significantly and to suppress saturated pixels in the initial denoised images. Therefore, the weight design is powerful in preserving image details and enhancing the contrast when denoising.
In Section 2, we first review two denoising methods, NCSR and DnCNN, and highlight their contributions to our study. In Section 3, we describe the proposed method in detail and present the proposed adaptive combination algorithm. In Section 4, the experimental results obtained using the proposed method are compared with those of other state-of-the-art methods. In Section 5, we discuss the results in detail. Finally, we conclude the study and discuss the directions for future research in Section 6.

Nonlocally Centralized Sparse Representation (NCSR) for Image Denoising
The NCSR algorithm involves decomposing a noisy image into a set of overlapping patches, learning the sub-dictionaries to sparsely code the image patches, making an estimate of the sparse coding vectors, and combining the estimated patches to form the denoised image. It can be equivalently introduced as below.
Using the notation employed in [9], for an image $x \in \mathbb{R}^{N}$, we denote by $x_i = R_i x$ an image patch of size $\sqrt{n} \times \sqrt{n}$ at pixel $i$, where $R_i$ represents a matrix for extracting the patch $x_i$ from $x$. For a given dictionary $D \in \mathbb{R}^{n \times M}$, $n \le M$, $x_i$ can be sparsely coded as $x_i \approx D\alpha_{x,i}$ by solving an $\ell_1$-minimization problem written as:

$$\alpha_x = \arg\min_{\alpha} \sum_i \left\{ \|x_i - D\alpha_i\|_2^2 + \lambda \|\alpha_i\|_1 \right\},$$

where $\alpha_i$ represents the sparse coding coefficient of $x_i$, the sum runs over the sparse codes of all patches of image $x$, and $\lambda$ is the regularization parameter. The redundant patch-based representation is obtained by overlapping the image patches, which aims at suppressing the boundary artifacts. The entire image $x$ is represented by the set of sparse codes $\{\alpha_{x,i}\}$. For the convenience of expression, let

$$x \approx D \circ \alpha_x = \left( \sum_i R_i^{T} R_i \right)^{-1} \sum_i R_i^{T} D \alpha_{x,i};$$

here, $\alpha_x$ is the concatenation of all the sparse codes. Given the model of the noisy image in Equation (1), the sparse coding denoising model recovers $x$ from $y$ by solving a minimization problem:

$$\alpha_y = \arg\min_{\alpha} \left\{ \|y - D \circ \alpha\|_2^2 + \lambda \|\alpha\|_1 \right\}.$$

Then, the image $x$ is estimated as $\hat{x} = D \circ \alpha_y$; that is, $\hat{x}$ is obtained by averaging the reconstructed patches $\hat{x}_i$. The reconstruction of $x$ from $y$ in the NCSR algorithm is defined as the following minimization problem:

$$\alpha_y = \arg\min_{\alpha} \left\{ \|y - D \circ \alpha\|_2^2 + \lambda \sum_i \|\alpha_i - \beta_i\|_p \right\}, \tag{6}$$

where the regularization parameter $\lambda$ balances the centralized sparsity and the fidelity terms for better performance and should be adaptively determined, and $\beta_i$ represents the nonlocal estimate of the unknown sparse code $\alpha_i$. The term $\|\alpha_i - \beta_i\|_p$ is the only regularization term in the aforementioned model. In the case of $p = 1$, the estimate $\beta_i$ can be computed from the nonlocal redundancy of natural images, which is why the model is called the nonlocally centralized sparse representation (NCSR). An iterative shrinkage strategy is employed to calculate $\beta_i$ in Equation (6). Let $\Omega_i$ denote the set of patches similar to patch $x_i$, and let $\alpha_{i,q}$ be the sparse code of patch $x_{i,q}$ within the set $\Omega_i$. Thereafter, $\beta_i$ can be computed as:

$$\beta_i = \sum_{q \in \Omega_i} w_{i,q}\, \alpha_{i,q},$$

where $w_{i,q}$ is the corresponding weight, which decreases with the distance between patches $x_i$ and $x_{i,q}$:

$$w_{i,q} = \frac{1}{W} \exp\left( -\frac{\|\hat{x}_i - \hat{x}_{i,q}\|_2^2}{h} \right),$$

where $\hat{x}_i = D\hat{\alpha}_i$ and $\hat{x}_{i,q} = D\hat{\alpha}_{i,q}$ are the current patch estimates, $h$ is a pre-determined scalar, and $W$ is a normalization factor. Specifically, with the nonlocal estimate $\beta_i$ taking full advantage of the nonlocal redundancy of images, the NCSR algorithm naturally integrates the nonlocal self-similarity prior into the sparse representation framework and shows a promising performance in terms of denoising natural images with many repetitive structures.
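The nonlocal estimate described above can be illustrated with a minimal NumPy sketch. It assumes that the weights decay exponentially with the squared Euclidean distance between patch estimates and that they are normalized to sum to one; the function name `nonlocal_estimate`, the array shapes, and the random test data are hypothetical and are not taken from the NCSR source code.

```python
import numpy as np

def nonlocal_estimate(ref_patch, similar_patches, similar_codes, h):
    """Weighted average of the sparse codes of patches similar to ref_patch.

    ref_patch:       (n,)   current estimate of patch x_i
    similar_patches: (Q, n) current estimates of the patches in Omega_i
    similar_codes:   (Q, M) sparse codes alpha_{i,q} of those patches
    h:               positive scalar controlling how fast the weights decay
    """
    # Squared Euclidean distance between the reference patch and each neighbour.
    dist2 = np.sum((similar_patches - ref_patch) ** 2, axis=1)
    # Unnormalized weights decay exponentially with the distance.
    w = np.exp(-dist2 / h)
    # Dividing by the sum plays the role of the normalization factor W.
    w = w / w.sum()
    # beta_i is the weighted sum of the neighbours' sparse codes.
    beta = w @ similar_codes
    return beta, w

# Illustrative data only: a flattened 7 x 7 patch and five noisy copies of it.
rng = np.random.default_rng(0)
ref = rng.random(49)
neighbours = ref + 0.05 * rng.standard_normal((5, 49))
codes = rng.standard_normal((5, 128))  # hypothetical sparse codes
beta, weights = nonlocal_estimate(ref, neighbours, codes, h=0.5)
```

When all neighbours coincide with the reference patch, every weight equals 1/Q and the estimate reduces to the plain average of the sparse codes.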

Residual Learning of Deep CNN-Based Image Denoising Method (DnCNN)
The DnCNN method has been successfully used in image denoising mainly for the following three reasons [27]. First, it has a very deep architecture that can increase its own capacity and flexibility. Second, some advances in training CNN-based models have been achieved; these include the rectified linear unit (ReLU) [32], the tradeoff between depth and width [33,34], gradient-based optimization algorithms [35][36][37], parameter initialization [38], batch normalization [28], and residual learning [29]. Third, the DnCNN can efficiently perform parallel calculations on modern powerful GPUs; thus, it has the potential to exhibit an improved run-time performance.
The input of the DnCNN is the noisy observation y in Equation (1). Three types of network layers are introduced in the DnCNN denoiser; the architecture is illustrated in Figure 2, where "Conv" stands for convolution, "BN" stands for batch normalization, and "ReLU" stands for the rectified linear unit. After removing all pooling layers, the size of the convolution filters is 3 × 3. For a certain noise level in Gaussian denoising, it is more appropriate to set the size of the receptive field of the DnCNN denoiser to 35 × 35 with a corresponding depth of 17. Some explanations of the architecture of the DnCNN denoiser are given below:
1. Conv+ReLU: In the first layer, 64 feature maps are generated by 64 filters with the size of 3 × 3 × c; subsequently, rectified linear units (ReLU, max(0, ·)) are utilized for nonlinearity. c denotes the number of image channels; for a gray image, c = 1, and for a color image, c = 3.
2. Conv+BN+ReLU: 64 filters of size 3 × 3 × 64 are used, and batch normalization is added between the convolution and ReLU for layers 2 to (D − 1). Here, D represents the depth of the DnCNN.
3. Conv: In the last layer, there are c filters with the size of 3 × 3 × 64 that are used to reconstruct the final residual image.
With regard to model training, DnCNN adopts the residual learning strategy and trains a residual mapping R(y) ≈ v to predict the residual image; furthermore, it uses batch normalization [28] to accelerate training and reduce the internal covariate shift [28]. Then, the output is obtained using x = y − R(y). It has been pointed out in [27] that integrating residual learning and batch normalization is particularly helpful for fast and stable training as well as better denoising performance.
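The link between the depth of 17 and the 35 × 35 receptive field follows from stacking stride-1 3 × 3 convolutions without pooling: each layer widens the receptive field by two pixels. The short sketch below checks this arithmetic and lists the layer types described above; the helper names `receptive_field` and `dncnn_layer_plan` are hypothetical, and the code only enumerates layer shapes rather than building or training a network.

```python
def receptive_field(depth, kernel_size=3):
    """Receptive field of `depth` stacked stride-1 convolutions without pooling."""
    # Each layer widens the receptive field by (kernel_size - 1) pixels.
    return depth * (kernel_size - 1) + 1

def dncnn_layer_plan(depth=17, channels=1, features=64, kernel_size=3):
    """List (layer type, filter shape, number of filters) for each layer."""
    # First layer: convolution followed by ReLU.
    plan = [("Conv+ReLU", (kernel_size, kernel_size, channels), features)]
    # Layers 2 to (depth - 1): convolution, batch normalization, ReLU.
    for _ in range(depth - 2):
        plan.append(("Conv+BN+ReLU", (kernel_size, kernel_size, features), features))
    # Last layer: plain convolution that outputs the residual image.
    plan.append(("Conv", (kernel_size, kernel_size, features), channels))
    return plan

plan = dncnn_layer_plan()
rf = receptive_field(len(plan))  # 17 * 2 + 1 = 35
```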

Combination of the NCSR and the DnCNN
In this section, we present an image fusion algorithm to optimize the denoising performance of NCSR and DnCNN using two adaptive weights that reflect the relative pixel intensity and the global gradient, respectively.

Fusion of Images Denoised by NCSR and DnCNN
The proposed denoising effect boosting method is a linear combination of NCSR and DnCNN. That is, we apply two denoisers $D_1$ and $D_2$ to yield two denoised images $\hat{x}_1 = D_1(y)$ and $\hat{x}_2 = D_2(y)$. We compute the desired image $\hat{x}$ by retaining only the "optimal" parts in images $\hat{x}_1$ and $\hat{x}_2$. This process is guided by the relative pixel intensity and the global gradient, which are consolidated into a scalar-valued weight map. The final image $\hat{x}$ is obtained by fusing $\hat{x}_1$ and $\hat{x}_2$ using weighted blending. The processes involved in the proposed method are shown in Figure 3. To optimally fuse the initial denoised images, we compute a weight map for the n-th input image as

$$W_n(i,j) = \frac{W_{1,n}(i,j)^{p_1}\, W_{2,n}(i,j)^{p_2}}{\sum_{n'=1}^{2} W_{1,n'}(i,j)^{p_1}\, W_{2,n'}(i,j)^{p_2} + \varepsilon}, \tag{9}$$

where $(i, j)$ represents the image pixels, and $\varepsilon$ denotes a very small positive value (e.g., $10^{-2}$) to avoid the denominator being zero. The parameters $p_1, p_2 > 0$ are set to determine the extent to which each weight should be emphasized. The number 2 indicates that there are two input images, which are denoised by NCSR and DnCNN, respectively. $W_{1,n}(i,j)$ and $W_{2,n}(i,j)$ are two adaptive weights designed to reflect the relative pixel intensity and the global gradient of an input image. A detailed introduction to the two weights will be given in the following subsections.
Using the weight obtained in Equation (9), the resulting denoised image $\hat{x}$ can be obtained via a weighted sum of the initial denoised images:

$$\hat{x}(i,j) = \sum_{n=1}^{2} W_n(i,j)\, \hat{x}_n(i,j), \tag{10}$$

where $\hat{x}_n$ is the input image denoised by NCSR or DnCNN and $\hat{x}_n(i,j)$ is the image pixel intensity. In this study, the pixel intensity is normalized to the range of [0, 1]. Unfortunately, applying only Equation (10) will yield an image with several artifacts. This is because the values of the weights are usually noisy and discontinuous. Therefore, we apply Equation (10) at multiple resolutions using a pyramidal image decomposition, described in [39], to avoid sharp weight map transitions. The fusion is carried out at each pyramid level separately. Specifically, we set the decomposition level $l$ to 7 based on [39]. For level $l$, $L\{\hat{x}_n(i,j)\}^{l}$ is the Laplacian pyramid of image $\hat{x}_n(i,j)$ and $G\{W_n(i,j)\}^{l}$ is the Gaussian pyramid of the weight map $W_n(i,j)$. Note that the value of $\hat{x}_n(i,j)$ determines the value of $W_n(i,j)$. Then, we blend the pixel intensities in different pyramid levels in Equation (11):

$$L\{\hat{x}(i,j)\}^{l} = \sum_{n=1}^{2} G\{W_n(i,j)\}^{l}\, L\{\hat{x}_n(i,j)\}^{l}. \tag{11}$$

The fused pyramid $L\{\hat{x}(i,j)\}^{l}$ is collapsed to obtain the resulting denoised image $\hat{x}$. The pyramid approach can weaken the local unnatural transition by dispersing the gray-level mutations of the whole image, which are caused by the differences in the denoising effects.
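The weight normalization and the single-scale weighted sum described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the two weights are assumed to be combined as a product of powers with exponents p1 and p2, the small constant is assumed to sit in the denominator, and the multi-resolution pyramid blending is omitted. The function name `fuse_single_scale` and the toy inputs are hypothetical.

```python
import numpy as np

def fuse_single_scale(images, w1, w2, p1=1.0, p2=1.0, eps=1e-2):
    """Blend initial denoised images with normalized per-pixel weights.

    images, w1, w2: arrays of shape (2, H, W); intensities in [0, 1],
    weights non-negative. Pyramid blending is deliberately left out.
    """
    images = np.asarray(images, dtype=float)
    # Assumed combination rule: product of the two weights raised to p1 and p2.
    raw = np.asarray(w1, dtype=float) ** p1 * np.asarray(w2, dtype=float) ** p2
    # Normalize across the two images; eps keeps the denominator non-zero.
    weights = raw / (raw.sum(axis=0) + eps)
    # Per-pixel weighted sum of the two images.
    fused = (weights * images).sum(axis=0)
    return fused, weights

# Toy inputs: two constant images with equal weights everywhere.
imgs = np.stack([np.full((4, 4), 0.2), np.full((4, 4), 0.6)])
ones = np.ones_like(imgs)
fused, weights = fuse_single_scale(imgs, ones, ones, eps=1e-12)
```

With equal weights and a negligible eps the result is the pixel-wise mean of the two inputs. With a larger eps the weights sum to slightly less than one, which is one reason an implementation might renormalize them before blending.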

Pixel Intensity Based Weight Design
In this section, we introduce a weight design $W_{1,n}(i,j)$ that reflects the pixel intensity. A fundamental aspect of the image fusion algorithm is to design $W_n(i,j)$, which reflects the importance of the corresponding pixel; furthermore, it needs to reflect the influence of luminance changes, i.e., to emphasize bright regions and vice versa. Mertens et al. [39] presented an image quality measure known as well-exposedness to design a weight in this regard:

$$W_n(i,j) = \exp\left( -\frac{(\hat{x}_n(i,j) - 0.5)^2}{2\lambda^2} \right), \tag{12}$$

where $\lambda$ equals 0.2. Similar to several intuitive weight designs, the measure uses a Gauss curve and assigns a weight to each pixel intensity $\hat{x}_n(i,j)$ based on the proximity of the intensity value to 0.5. It can also be observed that the n-th image is the only variable used in this function. Based on this, we present our observations regarding the weight design. First, a weight design that employs Equation (12) cannot assign a large weight to a well-denoised pixel with an intensity value far from 0.5, that is, in bright or dark regions. Therefore, it cannot properly emphasize a bright pixel that is well-denoised in an overall dark image or a well-denoised dark pixel in an overall bright image. Hence, we propose a weight design that is relative to the overall image brightness. The proposed weight design assigns a relatively large weight to a dark pixel in a bright image and vice versa. We define $m_n$ as the mean of the pixel intensities of the n-th initial denoised image, and the weight should emphasize the pixel intensities close to $1 - m_n$. In the same form as that of Equation (12), this can be written as $\exp\left(-(\hat{x}_n(i,j) - (1 - m_n))^2\right)$. In addition, we note that several well-denoised pixels should be considered when the brightness values $m_n$ and $m_{n+1}$ of the input initial denoised images differ greatly. Therefore, we assign a large $\lambda_n$ when the brightness of the two images differs substantially.
Finally, the first weight $W_{1,n}(i,j)$ that reflects the relative pixel intensity can be represented as

$$W_{1,n}(i,j) = \exp\left( -\frac{(\hat{x}_n(i,j) - (1 - m_n))^2}{2\lambda_n^2} \right), \tag{13}$$

where $\lambda_n$ controls the weight as $\lambda_n = 2\alpha(m_{n+1} - m_n)$ based on the difference between the two input images. From Equation (13), it can be seen that when the input image is bright ($m_n$ is close to 1), dark pixels ($\hat{x}_n(i,j)$ with a relatively low value) will be assigned a larger weight and vice versa. Moreover, a large weight is assigned when there is a large difference in the mean brightness of the two input initial denoised images.
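A minimal NumPy sketch of the relative pixel intensity weight follows. It assumes a Gaussian curve centred at 1 − m_n with spread λ_n in the same form as the well-exposedness measure. Because only two images are fused, it further assumes that both images share one spread computed from the absolute difference of their mean intensities, and it floors that spread at a small value (`lam_floor`) to avoid division by zero when the two means nearly coincide. These choices, the function name `relative_intensity_weights`, and the toy images are illustrative assumptions and are not stated in the text.

```python
import numpy as np

def relative_intensity_weights(images, alpha=0.75, lam_floor=1e-3):
    """Gaussian weight centred at (1 - mean intensity) for each image."""
    images = np.asarray(images, dtype=float)   # shape (2, H, W), values in [0, 1]
    means = images.mean(axis=(1, 2))           # m_n for each image
    # Assumption: one shared spread derived from |m_2 - m_1|, floored for safety.
    lam = max(2.0 * alpha * abs(means[1] - means[0]), lam_floor)
    # Pixels close to 1 - m_n receive the largest weight.
    target = (1.0 - means)[:, None, None]
    return np.exp(-((images - target) ** 2) / (2.0 * lam ** 2))

# Toy inputs: a uniformly dark image and a uniformly brighter one.
dark = np.full((4, 4), 0.3)
bright = np.full((4, 4), 0.6)
w1 = relative_intensity_weights([dark, bright])
```

In this toy case the brighter image lies closer to its target intensity (0.4) than the darker image does to its own (0.7), so it receives the larger weight.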

Global Gradient Based Weight Design
The image gradient has been widely studied because it conveys rich information regarding image edges and structures. To explore the complementary information provided by the gradient of the image pixels, and further understand how to design an efficient weight function, we study the gradient between the pixel intensity and its frequency. In this subsection, we will discuss how image gradient information can be exploited to compute the weight map for the initial denoised images.
In a bright image, the pixel values in bright regions are saturated close to 1, whereas they have a small gradient in the dark regions. The opposite relation holds in the case of a dark image. Some methods assign large weights to pixels with large gradient values [39][40][41]. However, the pixel gradient value is small in smooth regions regardless of the degree of luminance; thus, emphasizing only the pixel values in regions with large gradients will fail to stress the pixels with a small gradient that are in well-denoised regions.
In this regard, we design another weight that is based on the gradient of the pixel intensity and its frequency to emphasize the well-denoised regions regardless of their local contrast. As the proposed gradient is not a local one (that is, relative to surrounding pixels) but relative to other remote pixels in a similar frequency range, we refer to the proposed gradient as the global gradient. The global gradient of a dark image is large because many saturated pixel intensities are close to zero. Therefore, we posit that an image pixel is in a well-denoised region when it is in a region with a small global gradient. In other words, pixel values are relatively scarce in this region; thus, the pixels have a large variation in value compared to that of the surrounding pixels. In contrast with dark images, bright images show smaller global gradients at lower pixel values. This also indicates that the pixels with a smaller global gradient are in well-denoised or high-variation regions. Therefore, we give a pixel a larger weight when it has a smaller global gradient. Considering these observations, we design the second weight: where G n (x n (i, j)) is the global gradient for pixel intensityx n (i, j).
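The text does not spell out how the global gradient is computed, so the following NumPy sketch rests on an explicit assumption: the global gradient of a pixel is taken to be the slope of the image's cumulative intensity histogram at that pixel's intensity, which equals the relative frequency of that intensity. Under this reading, scarce intensities have a small global gradient and receive a larger weight. The normalization across the two images, the bin count, the constant `eps`, and the function name `global_gradient_weights` are likewise hypothetical.

```python
import numpy as np

def global_gradient_weights(images, bins=256, eps=1e-2):
    """Weight each pixel by the inverse of its (assumed) global gradient."""
    images = np.asarray(images, dtype=float)   # shape (2, H, W), values in [0, 1]
    inverse = []
    for img in images:
        # Map every intensity to a histogram bin index.
        idx = np.clip((img * bins).astype(int), 0, bins - 1)
        # Relative frequency per bin = slope of the cumulative histogram.
        freq = np.bincount(idx.ravel(), minlength=bins) / img.size
        # Occupied bins always have freq > 0, so the inverse is finite.
        inverse.append(1.0 / freq[idx])
    inverse = np.stack(inverse)
    # Assumed normalization across the two input images.
    return inverse / (inverse.sum(axis=0) + eps)

# Toy inputs: image a has one rare bright pixel; image b is uniform.
a = np.full((4, 4), 0.2)
a[0, 0] = 0.8
b = np.full((4, 4), 0.5)
w2 = global_gradient_weights([a, b])
```

In this toy case the single rare pixel of the first image gets a much larger weight than its common pixels, matching the stated intent of emphasizing regions where pixel values are scarce.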

Experiments
We have conducted extensive experiments to validate the effectiveness of our approach and compared it to recently proposed powerful denoising methods. In this section, we first discuss the datasets and the experimental setup. Then, we evaluate the proposed image fusion denoising method and its competing methods on the test images.

Datasets and Experimental Setup
We drew our test images from two widely used test datasets and the ESPL synthetic image database [42] to evaluate the denoising performance of the proposed method and that of several competing methods. The first dataset contains ten natural images that are commonly used to study image denoising, including four images with a size of 256 × 256 (Cameraman, House, Monarch and Peppers), and six images with a size of 512 × 512 (Barbara, Boat, Couple, Hill, Lena, and Man), as shown in Figure 4. The second one is a set of 50 natural images selected from the Berkeley segmentation dataset (BSD) [43]. The third dataset contains 25 high quality synthetic color images obtained from the Internet, each generally comprising 1920 × 1080 pixels. The images are primarily selected from some popular animation movies and video games. All of the images contain both repetitive patterns and irregular textures. Some examples can be seen in Figure 5. We compare the denoising performance of our proposed method with that of seven state-of-the-art and representative denoising methods, including BM3D [5], NCSR [15], WNNM [7], PGPD [17], DnCNN [27], TWSC [14], and FFDNet [26]. The denoising results of all the competing algorithms are generated using the source codes released by their original authors, and we use the default parameters. To quantitatively evaluate the quality of the images denoised via the different methods, we use the peak signal-to-noise ratio (PSNR). This is defined as follows:

$$\mathrm{PSNR} = 10 \log_{10} \left( \frac{255^2}{\frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} (\mu_{i,j} - x_{i,j})^2} \right),$$

where $\mu_{i,j}$ and $x_{i,j}$ represent the pixel values of the restored image and the original image respectively, and the size of the input image is $M \times N$. We also calculated the structural similarity index measurement (SSIM) [44], the feature similarity index measurement (FSIM) [45], the visual information fidelity (VIF) [46] and the information content weighting SSIM (IW-SSIM) [47] of the competing methods.
These metrics provide quality measurements closer to the characteristics of human vision, enabling further evaluation of the denoising performance. For all of these indexes, larger values indicate that the denoised images appear more similar to the original ones in terms of human vision. The basic parameter setting is as follows: the number of images N is two, and the pixel intensity $I_n(x, y)$ is normalized to the range [0, 1]. We conducted experiments to determine the best PSNR value with respect to changes in α in the range [0.25, 1.25]. The experimental results show that a larger α leads to a higher PSNR; however, the improvement is comparatively minor. For the stability and robustness of the experimental results, α is set to the middle value 0.75. The exponents $p_1$ and $p_2$ in Equation (9) determine which of the two weights has a greater influence on the final weight map. As these two weights play the same role in our weight combination, we set $p_1 = p_2 = 1$ so that the two weights are equally important. We carried out our experiments in the MATLAB (R2018a) environment on a PC with a 4.00 GHz Intel Core i7-6700K CPU, 16 GB of RAM, and an Nvidia Quadro M4000 GPU.
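As an illustration of the PSNR metric used throughout the evaluation, the following minimal Python sketch computes it for two small images stored as lists of rows. It assumes the standard definition with a peak value of 255 for 8-bit images; the function name `psnr`, the `peak` parameter, and the toy pixel values are illustrative choices and do not come from the experiments reported here, which were run in MATLAB.

```python
import math


def psnr(restored, original, peak=255.0):
    """Return the PSNR in dB between two equally sized 2-D images (lists of rows)."""
    if len(restored) != len(original) or any(
        len(r) != len(o) for r, o in zip(restored, original)
    ):
        raise ValueError("images must have the same size")
    m, n = len(original), len(original[0])
    # Sum of squared differences between restored and original pixel values.
    squared_error = sum(
        (mu - x) ** 2
        for row_r, row_o in zip(restored, original)
        for mu, x in zip(row_r, row_o)
    )
    mse = squared_error / (m * n)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


# Toy 2 x 3 images (made-up values, for illustration only).
original = [[52, 55, 61], [59, 79, 61]]
restored = [[54, 55, 60], [59, 76, 63]]
value = psnr(restored, original)
print(f"PSNR = {value:.2f} dB")
```

If the images are normalized to [0, 1] rather than stored as 8-bit values, `peak=1.0` would be passed instead.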

Quantitative Comparison with Other State-of-the-Art Algorithms
In this subsection, we first test the proposed method and its competing methods on the ten commonly used test images. AWGN with the noise levels σ = 10, 20, 30, 40, 50, and 60 is added to these test images. The highest values obtained for each noise level are highlighted in bold in each of the tables. Table 1 lists the PSNR values for the test images Boat, Couple, Man, Monarch, and Peppers for the noise levels σ = 10, 30, and 50. It can be observed that the best PSNR values are obtained by our method for all these images. From the average PSNR values shown in Table 2, the following observations can be made. First, the proposed method surpasses NCSR, PGPD, and BM3D by a substantial margin, and it also outperforms WNNM, DnCNN, TWSC, and FFDNet by an average of approximately 0.31∼0.53 dB for a wide range of noise levels. Second, the proposed method has higher PSNR values than BM3D, NCSR, PGPD, WNNM, DnCNN, and TWSC, and it is only slightly inferior to FFDNet when the noise level σ is set to 60. However, it outperforms FFDNet when σ < 60, and it performs especially well in low-noise-level denoising. Table 3 presents the average SSIM and FSIM values obtained for the eight methods under six different noise levels. It can be seen that the proposed method and FFDNet have comparable performance with regard to the SSIM. In terms of the FSIM, the best result is achieved by our method. This validates the excellent denoising performance of the proposed method, which considers both local structural preservation and global luminance consistency. Table 4 lists the average VIF and IW-SSIM values obtained for the competing methods at six different noise levels. The proposed method outperforms TWSC, PGPD, and BM3D by a substantial margin. 
It demonstrates a noticeable denoising effect in low-noise-level denoising tasks; particularly, in terms of VIF, when the noise level is set to 40 and 50, our method surpasses the benchmark method BM3D by 0.085 and 0.060, respectively. Regarding images with a low noise level, many details are intact in the final image obtained using our method; thus, our method is able to eliminate the inaccuracy and uncertainty in denoised images obtained using the individual methods, thereby preserving the image details to the maximum extent. To further demonstrate the general applicability of our method, we employed 50 images from the BSD dataset. The PSNR performance of the eight competing denoising methods is reported in Table 5. An overall impression, obtained from Table 5, is that the proposed method achieves the highest PSNR in all cases. In the case of low noise levels (σ = 10), the improvement is strikingly noticeable (e.g., an average improvement of 1.14 dB over the second-best method, FFDNet). Subsequently, even as the noise level increases to 50 and 60, the improvements exhibited by the proposed method over the PSNR of FFDNet are notable, with the average values of 0.13 dB and 0.03 dB, respectively. It is also observed that the proposed method outperforms the benchmark BM3D method by 0.65 dB∼1.33 dB. Such a gain in the PSNR is remarkable because only a few methods can exceed the PSNR of the BM3D method by an average of more than 0.3 dB [48,49]. In addition, we calculated the metrics VIF and IW-SSIM to further assess the performance of our method. From Table 6, it is clear that the result obtained using the proposed boosting method is more pleasing than the denoised image obtained using either DnCNN or NCSR. A majority of the best metric values are also achieved by our method.   In addition to considering traditional datasets, we also evaluated the performance of our method on synthetic images. The PSNR results are reported in Table 7. 
The experiments conducted at low (σ = 10) and high (σ = 50) noise levels show that the proposed method outperforms all seven competing methods. Moreover, in terms of the average PSNR results, our method is the best among all the competitors. The proposed boosting method improves the PSNR by an average of approximately 0.68 dB and 0.17 dB compared with NCSR and DnCNN, respectively. The experimental results demonstrate that the proposed method can achieve state-of-the-art denoising performance on different datasets. Thus, our method generalizes well and is widely applicable.
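The noisy test inputs described in this subsection can be simulated along the following lines. This is a minimal sketch, assuming pixel values on the 8-bit scale [0, 255] and zero-mean Gaussian noise with standard deviation σ; the function name `add_awgn`, the fixed seed, and the constant gray test image are hypothetical, and no clipping or quantization is applied (whether the original experiments clipped the noisy images is not stated).

```python
import random


def add_awgn(image, sigma, seed=None):
    """Return a copy of a 2-D image (list of rows) with AWGN of std sigma added."""
    rng = random.Random(seed)  # local generator so results are reproducible
    return [[pixel + rng.gauss(0.0, sigma) for pixel in row] for row in image]


# Constant gray 64 x 64 image used as a stand-in for a clean test image.
clean = [[128.0] * 64 for _ in range(64)]

# One noisy version per noise level considered in the experiments.
noise_levels = (10, 20, 30, 40, 50, 60)
noisy_by_level = {sigma: add_awgn(clean, sigma, seed=0) for sigma in noise_levels}
```

Each entry of `noisy_by_level` would then be passed to the denoisers under comparison, and the outputs scored against `clean`.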

Comparison of Statistical Significance with Other State-of-the-Art Algorithms
Although the proposed method demonstrates performance improvements over the existing methods considered in this study (see Tables 1-7), these improvements may not be statistically significant. Therefore, we performed a two-way analysis of variance (ANOVA), followed by multiple comparison tests [50], on the PSNR results shown in Table 2 to determine the statistical significance of the results obtained using the proposed method. The corresponding results are tabulated in Table 8. ANOVA is a statistical analysis method for interpreting and analyzing observations made from several populations. It decomposes the observed variation into contributions from different sources and then determines whether there is a significant difference between these sources. Furthermore, it gives a value indicating the amount of variation. In our experiments, a criterion based on the p-value obtained from the ANOVA is used to evaluate the statistical significance. From Table 8, it can be seen that the p-values of the paired ANOVA test for evaluating the difference between our method and the comparison methods are all less than 0.05. This demonstrates that the improvements obtained using the proposed boosting method are statistically significant.
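To make the variance decomposition concrete, the sketch below computes the F-statistics of a two-way ANOVA without replication, where each table cell holds one observation (for example, an average PSNR for one noise level and one method). It is a simplified illustration under stated assumptions: the function name `two_way_anova_f` and the toy table are made up, the values are not taken from Table 2, and the conversion of each F-statistic into a p-value through the F distribution, as well as the subsequent multiple comparison tests, is left to a statistics package.

```python
def two_way_anova_f(table):
    """F-statistics for a two-way ANOVA without replication.

    table[i][j] is the single observation for row level i and column level j.
    Returns (f_rows, f_cols, df_rows, df_cols, df_error).
    """
    r, c = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (r * c)
    row_means = [sum(row) / c for row in table]
    col_means = [sum(table[i][j] for i in range(r)) / r for j in range(c)]

    # Decompose the total sum of squares into row, column, and error parts.
    ss_rows = c * sum((m - grand) ** 2 for m in row_means)
    ss_cols = r * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    df_rows, df_cols = r - 1, c - 1
    df_error = df_rows * df_cols
    ms_error = ss_error / df_error  # assumes the residual variation is nonzero
    f_rows = (ss_rows / df_rows) / ms_error
    f_cols = (ss_cols / df_cols) / ms_error
    return f_rows, f_cols, df_rows, df_cols, df_error


# Made-up table: rows are noise levels, columns are denoising methods.
toy_psnr = [
    [34.0, 34.3, 34.9],
    [29.0, 29.4, 29.8],
    [26.0, 26.2, 26.9],
]
f_noise, f_method, df_noise, df_method, df_error = two_way_anova_f(toy_psnr)
print(f"F(method) = {f_method:.2f} with ({df_method}, {df_error}) degrees of freedom")
```

A large F-statistic for the method factor relative to the F distribution with the printed degrees of freedom corresponds to a small p-value, which is the criterion applied above.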

Visual Comparison with Other State-of-the-Art Algorithms
As the ultimate judges of image quality are human subjects, visual quality is also critical in evaluating a denoising method. Therefore, we visually compare the images denoised by the eight competing methods in this study. The results of the experiment at the noise level σ = 20 for the test image Boat, shown in Figure 6, illustrate that the proposed method can preserve the contrast and structural details almost entirely. Comparing our method with the other methods, it can be observed that the results of NCSR and PGPD have lost several image details, whereas BM3D, WNNM, DnCNN, and TWSC produce over-smoothed results in the highlighted red window. Furthermore, FFDNet tends to generate several artifacts on the sign of the boat, where the proposed method obtains a smooth result. In particular, the proposed method recovers the thin masts of the boat well. These masts are almost absent in the recovered images obtained by the other methods.
Subsequently, we increased the noise level to 50. It can be observed from Figure 7 that PGPD, BM3D, NCSR, and FFDNet tend to smooth the edges and textures, which leads to image blurring. Although DnCNN, TWSC, and WNNM better balance the contrast, they generate substantial artifacts on the flower in the image Monarch. In contrast, the proposed method reconstructs well the vein-like patterns in the butterfly's wing that are shown in the magnified view; furthermore, it better preserves the edge structures of the test image. Overall, the proposed method produces denoised images of the best visual quality while maintaining high PSNR indices.
In addition, we test our method on the BSD dataset. Visual comparisons of the results obtained using the various denoising methods are shown in Figure 8. It is clear from the results that the proposed method exhibits a visual performance that is superior to that of the other denoising methods. It can be seen that NCSR generates substantial artifacts between the zebra's stripes. DnCNN balances the contrast well; nevertheless, it tends to distort the lines and generate blurred edges. It is not surprising that our method preserves many more sharp edges and fine details, because it combines NCSR and DnCNN via the proposed fusion strategy. For visual comparison on synthetic images, Figure 9 shows the denoised images, corresponding to an image in the ESPL synthetic image database, that were obtained using the various methods evaluated in this study. A magnified view is also provided for each image. It can be seen that a number of noise pixels have not been removed in the images denoised by NCSR, PGPD, BM3D, and TWSC; moreover, details have been extensively lost in the lower right corner of the image. In the denoised image obtained using FFDNet, many undesirable bright pixels are generated on the wings of the cartoon girl. Furthermore, WNNM and DnCNN produce over-smoothed textures and edges. By comparison, the result obtained using the proposed method retains the information in the original image to the greatest extent and suppresses almost all the noise, even at a high noise level. One of the reasons for this is that our weight map can incorporate the well-denoised pixels into the final result.

Discussion
There are two important indicators of denoising performance: the denoising effect and computational complexity. Unfortunately, a high denoising performance is often obtained at the cost of computational complexity; therefore, the development of denoising methods is a spiraling process. The current denoising models must seek a reasonable trade-off between denoising performance and run time. This encourages researchers to continue to focus attention on improving the current state-of-the-art models. The computation time of our method comprises the fusion time of the initially denoised images and the running time of NCSR and DnCNN; therefore, it is longer than the running time of the single denoiser. However, unlike several deep learning-based boosting methods, the fusion step in the proposed method does not involve the training stage, which is time-consuming. The fusion times of our method for processing six images selected from the ten commonly used test images employed in this study, with sizes of 256 × 256 and 512 × 512, are listed in Table 9. We evaluate the fusion time by denoising the ten images with noise levels of 10, 30, and 60. It can be seen that the fusion process takes very little time; therefore, the computational complexity mainly depends on the two algorithms to be fused. Our goal is to introduce a novel method for boosting the denoising effect using an image fusion strategy. With the evolution of the denoising methods to be fused, the efficiency of our method will increase. The proposed method allows the combination of the initial denoised images generated by any two image denoisers; thus, one can train two complementary algorithms that are different from the ones employed in this study and use our method to boost the denoising effect. In summary, the proposed method achieves optimal results at a reasonable computational cost; furthermore, it allows for an effective performance/complexity trade-off in the future. 
Although image denoising algorithms have produced highly promising results over the past decade, it has become increasingly difficult for denoising methods to achieve even minor performance improvements. According to Levin et al. [49], when compared over the BSD dataset, for σ = 50, the predicted maximal possible improvement (over the performance of BM3D) for external denoising tasks is bounded by 0.7 dB. However, the proposed method exceeds the performance of BM3D by 0.77 dB, as shown in Table 5, which is a substantial improvement. Through the image fusion strategy, our method offers a solution to further improve individual internal or external denoising algorithms. The fused image can provide a visually better output image that contains more information. Therefore, the more specific and accurate result obtained using our method is worth its reasonable computational cost. In fact, there are abundant real-world applications (e.g., machine vision, remote sensing, and medical diagnoses) that can benefit from the proposed method. Specifically, in digital medical treatment, the detailed features in images may be ignored by the NCSR algorithm, which is based on the non-local self-similarity of images; however, such features can be preserved by the external denoising method DnCNN. Thus, the proposed method can output better and more comprehensive images by combining the complementary information of the medical images denoised by the two methods, thereby providing more accurate data for clinical diagnosis and treatment. This will be crucial for feature extraction from images of lesions, three-dimensional reconstruction, multi-source medical image fusion, and other technologies that assist in diagnosis. Thus, the proposed method could be of immense value as an alternative for boosting the denoising effect.
The boosting algorithm developed in this study can be interpreted as an algorithm for the fusion of two initially denoised images. Thus, it is not limited to noise models such as AWGN, and it can be adapted to other types of noise if the constituent denoising algorithms support them. In addition, good discrimination between noise and image texture information can significantly improve the noise reduction effect, which is also the goal of many traditional denoising algorithms. Currently, researchers are continuing to improve the performance of the state-of-the-art denoising methods. In the future, we will identify complementary algorithms with better performance to deal with various denoising tasks by using our fusion strategy.
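The interpretation of the boosting step as a fusion of two initially denoised images can be sketched as a per-pixel normalized weighted average. The code below is only an illustration of that general idea: the gradient-magnitude weight in `gradient_weight` is a hypothetical placeholder and is not the pair of weights designed in this study, the function names are invented, and the two 2 × 2 inputs are made-up values normalized to [0, 1].

```python
def gradient_weight(image, eps=1e-6):
    """Hypothetical weight map: forward-difference gradient magnitude plus eps."""
    rows, cols = len(image), len(image[0])
    weights = []
    for i in range(rows):
        row = []
        for j in range(cols):
            # Replicate the border pixel so the difference is zero at the edge.
            dx = image[i][min(j + 1, cols - 1)] - image[i][j]
            dy = image[min(i + 1, rows - 1)][j] - image[i][j]
            row.append((dx * dx + dy * dy) ** 0.5 + eps)
        weights.append(row)
    return weights


def fuse(images, weight_maps):
    """Fuse images pixel by pixel using weights normalized to sum to one."""
    rows, cols = len(images[0]), len(images[0][0])
    fused = []
    for i in range(rows):
        row = []
        for j in range(cols):
            total = sum(w[i][j] for w in weight_maps)
            weighted = sum(w[i][j] * img[i][j] for img, w in zip(images, weight_maps))
            row.append(weighted / total)
        fused.append(row)
    return fused


# Stand-ins for the outputs of two denoisers (made-up values in [0, 1]).
denoised_a = [[0.2, 0.4], [0.6, 0.8]]
denoised_b = [[0.3, 0.3], [0.5, 0.9]]
fused = fuse(
    [denoised_a, denoised_b],
    [gradient_weight(denoised_a), gradient_weight(denoised_b)],
)
```

Because the weights are normalized at every pixel, each fused value lies between the corresponding values of the two inputs, and any pair of denoisers can supply `denoised_a` and `denoised_b`.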

Conclusions and Future Studies
In this study, a denoising effect boosting method based on an image fusion strategy has been presented to combine two image denoising methods (i.e., NCSR and DnCNN) for a better denoising performance. It is based on two weight designs. The first weight design measures the importance of the pixel values according to the overall luminance, and it increases the weight when the neighboring pixel intensity changes substantially. The second weight design reflects the importance of the regions with substantial variations in pixel values and suppresses the saturated pixels in the initial denoised images. By integrating the images denoised via NCSR and DnCNN into an optimally fused image, the final denoised output is produced. The experimental results confirm that the proposed method exhibits substantial quantitative improvements over the other state-of-the-art methods, in addition to producing high-quality fused denoised images with much better image structures and fewer visual artifacts.
The proposed method is based on a general image fusion strategy. This indicates that it is not limited to image denoising problems. In future research, it is reasonable to extend the proposed boosting method to image de-blurring or image super-resolution problems. Future work could also involve choosing more efficient complementary algorithms or parallel implementations to further improve the computational efficiency of the proposed method. There is no single method that always performs better than others in complex imaging scenarios. Our method offers a solution to integrate individual methods that have complementary strengths into a stronger combined method. We also expect that a number of computer vision applications can benefit from the proposed denoising effect boosting method.