Article

RDASNet: Image Denoising via a Residual Dense Attention Similarity Network

Haowu Tao, Wenhua Guo, Rui Han, Qi Yang and Jiyuan Zhao
1 School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an 710049, China
2 State Key Laboratory for Manufacturing Systems Engineering, Xi’an Jiaotong University, Xi’an 710049, China
* Author to whom correspondence should be addressed.
Sensors 2023, 23(3), 1486; https://doi.org/10.3390/s23031486
Submission received: 28 October 2022 / Revised: 14 January 2023 / Accepted: 19 January 2023 / Published: 29 January 2023
(This article belongs to the Section Sensing and Imaging)

Abstract

In recent years, thanks to their performance advantages, convolutional neural networks (CNNs) have been widely used in image denoising. However, most CNN-based image-denoising models cannot make full use of the redundancy of image data, which limits their expressiveness. We propose a new image-denoising model that extracts the local features of the image through a CNN and focuses on the global information of the image, especially its globally similar details, through an attention similarity module (ASM). Furthermore, dilated convolution is used to enlarge the receptive field so as to better capture global features, and avg-pooling is used in the ASM to smooth and suppress noise, further improving model performance. In addition, global residual learning enhances the flow of information from shallow to deep layers. Extensive experiments show that our proposed model achieves better denoising results, both quantitatively and visually, and is better suited to complex blind noise and real images.

1. Introduction

As an important information carrier, images are widely used in remote sensing, medicine, aerospace, and other fields. However, due to interference from imaging equipment and external factors, images are easily affected by various kinds of noise and become blurred [1], so image denoising is particularly important. In fact, image denoising has always been a fundamental problem in computer vision [2]. Its purpose is to recover a clean image from a noisy one [3]. In general, for a noisy image $y$, the image-denoising problem can be expressed as $y = x + v$, where $x$ is the original image and $v$ represents additive white Gaussian noise (AWGN) with standard deviation $\sigma$.
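As a concrete illustration of this degradation model, the following sketch synthesizes a noisy observation $y = x + v$ by adding AWGN with standard deviation σ to a clean image; the synthetic gradient image and the σ value are arbitrary placeholders chosen only for illustration.

```python
import numpy as np

def add_awgn(x: np.ndarray, sigma: float, seed: int = 0) -> np.ndarray:
    """Synthesize y = x + v, where v ~ N(0, sigma^2) is AWGN.

    x is expected in the [0, 255] range; the result is clipped back to that
    range, as is common when simulating noisy 8-bit images.
    """
    rng = np.random.default_rng(seed)
    v = rng.normal(loc=0.0, scale=sigma, size=x.shape)
    return np.clip(x + v, 0.0, 255.0)

# Example with a synthetic "clean" image (a smooth horizontal gradient).
x = np.tile(np.linspace(0, 255, 256), (256, 1))
y = add_awgn(x, sigma=25.0)
```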
From a Bayesian point of view, when the likelihood is known, image prior modeling is a good approach to image denoising. Early on, most models were built on image priors. For example, the non-local means (NLM) algorithm [4] estimated the center pixel of a reference block using a weighted average of self-similar blocks to reduce noise. Block-matching and 3D filtering (BM3D) [5] enhanced sparsity through collaborative transform-domain filtering to achieve image denoising. Weighted nuclear norm minimization (WNNM) [6] used prior information to weight the nuclear norm for image denoising. Indeed, image-prior-based methods can achieve a good denoising effect. However, they all face two problems [7]: (1) the optimization problem in the test stage is complex, making the denoising process time-consuming; (2) parameters need to be manually tuned to obtain a better denoising effect.
As is evident in models such as AlexNet [8], VGG [9], and ResNet [10], deep learning has flourished, and convolutional neural networks (CNNs) have been widely applied to image denoising with good results. For example, residual learning and batch normalization were applied to a deep convolutional neural network for image denoising (DnCNN) [7] to enhance the learning ability of the model. FFDNet [11] used downsampled sub-images and a noise level map as model inputs to improve the adaptability of the network to different noise levels. The residual dense network (RDN) [12,13] incorporated residual dense layers into a CNN to improve performance for image super-resolution. Batch renormalization was applied to a CNN (BRDNet) [14] to enhance the expressiveness of the model, and ADNet [15] made full use of the influence of shallow layers on the deep layers of the network. Although these models do improve image denoising, some problems remain: (1) Image data are redundant, in that some parts of an image contain similar details, yet CNN-based models do not make full use of this global similarity information, which limits their expressive ability. (2) Some lightweight networks, such as ADNet and BRDNet, leave room for improvement in denoising quality, while other methods, such as BM3D, are complex. (3) Some methods do not adequately capture key information in complex environments, so their denoising effect is poor on complex noise and real-world images.
In this paper, we propose a new network for image denoising. Its core is the residual dense attention similarity module (RDASM). First, shallow features are extracted through a preprocessing module, which contains only two convolution layers. Then, the local features of the image are captured using the residual dense module (RDM), which combines residual learning [10] with dense layers (DenseNet [16] and RDN [12]); leaky ReLU (LReLU [17]) is used to avoid zero gradients. In addition, inspired by BM3D and NLM, an attention mechanism is used to mine the global similarity information of the image, so that similar details receive similar weights. On this basis, avg-pooling is used to smooth noise points and suppress the noise, and dilated convolution (Dilated Conv [18]) is used to expand the receptive field, so as to better capture globally similar features. Furthermore, the deep features of the image are mined by cascading residual dense attention similarity modules (RDASMs). Finally, global residual learning enhances the flow of information from the shallow layers to the deep layers.
The contributions of this paper are as follows:
  • This paper proposes a new image-denoising framework: the residual dense attention similarity network (RDASNet). Different from existing CNN denoising models, the core of the proposed model is the residual dense attention similarity module (RDASM), which extracts the local features of the image through a CNN and captures the global similarity features of the image through the attention similarity module. This is very effective for images with complex noise. The proposed model achieves a better denoising effect, both qualitatively and quantitatively, and is thus more suitable for complex noise.
  • Weights are used to represent the similarity of image details. Image data are redundant; that is, textural details recur across the whole image. The attention similarity module (ASM) can make full use of the global information of the image, so similar details have similar weights, and the ASM gives larger weights to key features. Ablation studies also show the effectiveness of the ASM.
  • In the attention mechanism, dilated convolution is used to enlarge the receptive field so as to better extract the global similarity information, and it has fewer parameters than standard convolution. Compared with RDN, the number of parameters in our proposed model increased by only 0.07 M, while PSNR increased by 0.10–0.22 dB.
  • For global pooling in the attention mechanism, we found that the color of the image is dimmer after avg-pooling, which makes the noise in the image look less obvious than before; that is to say, avg-pooling helps to smooth and suppress the noise in the image. Moreover, ablation studies also show that avg-pooling further improves model performance for image denoising.

2. Related Work

2.1. Deep CNNs for Image Denoising

As mentioned above, image denoising based on image prior modeling suffers from two main defects. In contrast, a convolutional neural network can automatically extract features, thus reducing computational costs [19,20]. Therefore, deep CNNs are widely used in image denoising.
Zhang et al. [7] first designed a deep CNN for image denoising (DnCNN), improving model performance by stacking multiple convolutional layers with residual learning [10] and batch normalization [21]. DnCNN outperformed the traditional BM3D and is therefore a successful application of CNNs to image denoising. However, DnCNN works well only at specific noise levels, and its performance on blind noise is not ideal. To address this, Zhang et al. [11] designed a fast and flexible denoising network (FFDNet), which takes a trainable noise level map as an additional input so that a single model can handle different noise levels. Furthermore, to make full use of the abundant features of all layers, Zhang et al. [12] proposed a very deep residual dense network (RDN) for image super-resolution, in which cascaded residual dense blocks form a contiguous memory mechanism. To reduce computational costs, Tian et al. [14] proposed a batch renormalization denoising network (BRDNet), which uses batch renormalization [22] to accelerate the convergence of network training and is suitable for denoising on low-configuration hardware devices. Using dilated convolution instead of ordinary convolution can also reduce the number of model parameters; for example, Tian et al. [15] designed ADNet, which uses sparse blocks composed of dilated and ordinary convolutions to improve performance and efficiency, together with an attention mechanism to extract hidden information. Motivated by these works and by the strong performance of deep CNNs for image denoising, we also adopt CNNs for image denoising.

2.2. Attention Mechanism and Similarity

Extracting key information in complex environments is very important for image denoising. Furthermore, there is redundant information in the image; specifically, there is a global similarity in image details. A better use of image data redundancy can improve the performance of the model in a complex environment.
The attention mechanism originated in studies of the human brain [23]; it was later introduced into natural language processing [24] and then applied to computer vision [25]. From a mathematical point of view, the attention mechanism provides a pattern of weights with which to operate on features. In a neural network, a few layers compute weight values for the feature maps and then reweight those feature maps accordingly. The attention mechanism can thus be understood as giving more attention (more weight) to the most meaningful parts [26], which is very useful for obtaining key information about an image in complex environments. Jaderberg et al. [27] proposed a spatial transformer network (STNet) that focuses on the spatial information of the image. Hu et al. [28] proposed a squeeze-and-excitation network (SENet) that adaptively recalibrates key features along the channel dimension. ADNet [15] uses only one convolution to focus on channel information and guide the training of the CNN. In addition, Woo et al. [29] proposed a plug-and-play convolutional block attention module (CBAM) that extracts global image features in the channel and spatial dimensions, respectively, through max-pooling and avg-pooling. Inspired by these methods, our attention mechanism also covers both the channel and spatial dimensions.
Classical image-denoising methods, such as NLM [4] and BM3D [5], make full use of image similarity information: NLM gives large weights to neighborhoods with similar pixels, and BM3D searches for patches similar to a given patch. In CNN-based image denoising, few studies focus on the global similarity of images. Motivated by this, we use an attention mechanism to represent the similarity of image details.

3. Proposed RDASNet Denoising Method

In this section, we describe a new image-denoising network, the residual dense attention similarity network (RDASNet), shown in Figure 1. First, the shallow information of the image is extracted by the preprocessing module (PM), which contains only two convolution layers. The core of our model is the residual dense attention similarity module (RDASM), which consists of the residual dense module (RDM, motivated by RDN [12]) and the attention similarity module (ASM, motivated by CBAM [29], NLM [4], and BM3D [5]). The RDM captures the local features of the image through residual learning and dense layers, while the ASM uses attention to assign similar weights to areas with similar image details (pixels) and gives large weights to the key features of the image; this is useful for image denoising against a complex background. Finally, global residual learning is used to enhance the effect from the shallow layers to the deep layers of the network.

3.1. Network Structure

Assume that $I_{noise}$ and $I_{denoising}$ represent the input noisy image and the output denoised image, as shown in Figure 1. Specifically, in the preprocessing module (PM), we use two convolution and activation layers, each with 64 3 × 3 convolution kernels, to extract the shallow feature map $F_{pre}$ as follows:

$F_{pre} = H_{pre2}(H_{pre1}(I_{noise}))$

where $H_{pre1}$ and $H_{pre2}$ denote convolution and activation operations. We use the leaky rectified linear unit (LReLU [17]) activation function.

Then, $F_{pre}$ is sent to N stacked residual dense attention similarity modules to capture the image features. Through the N RDASMs, we obtain $F_{B_N}$:

$F_{B_N} = H_{B_N}(F_{B_{N-1}}) = H_{B_N}(\cdots H_{B_d}(\cdots H_{B_1}(F_{pre})\cdots)\cdots)$

where $H_{B_d}$ denotes the operations of the d-th RDASM; $H_{B_d}$ is a non-linear transformation composed of a series of operations, such as convolution and LReLU. More details on the RDASM are given in Section 3.2. Then, the output features of all RDASMs are fused to obtain the output feature $F_{RDASMs}$:

$F_{RDASMs} = H_{F2}(H_{F1}([F_{B_1}, \ldots, F_{B_d}, \ldots, F_{B_N}]))$

where $[F_{B_1}, \ldots, F_{B_d}, \ldots, F_{B_N}]$ indicates that the feature maps of RDASMs 1 to N are concatenated, $H_{F1}$ is a 1 × 1 convolution that controls the number of output channels, and $H_{F2}$ is a 3 × 3 convolution that improves the expressive ability of the model.

Finally, we use global residual learning to enhance the effect from the shallow layers to the deep layers and obtain the output feature map $F_{out}$:

$F_{out} = F_{RDASMs} + F_0$

where $F_0$ denotes the shallow feature extracted in the preprocessing stage. Then, a 3 × 3 convolution converts $F_{out}$ to three channels or one channel (depending on whether the input is a color image or a gray image):

$I_{denoising} = H_{out}(F_{out})$

where $I_{denoising}$ is the final output of our model, that is, the image denoised by RDASNet.

In addition, the L1 loss function is used to optimize the difference between the denoised output image $I_{denoising}$ of our model and the ground-truth image $I_{GT}$. Assuming that the training set has N pairs of images $\{I_{noise}^{i}, I_{GT}^{i}\}$ ($i = 1, 2, \ldots, N$), the loss function of RDASNet can be calculated by

$L = \frac{1}{N}\sum_{i=1}^{N}\left| F_{RDASNet}(I_{noise}^{i}) - I_{GT}^{i} \right|$

where $F_{RDASNet}(\cdot)$ denotes the predicted output of the model.
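To make the data flow concrete, the following PyTorch sketch summarizes this pipeline (preprocessing, N stacked RDASMs, feature fusion, global residual, output convolution, and the L1 loss). The class and variable names are ours, the LReLU slope is an assumption, and the RDASM block itself is taken as a factory argument; a corresponding sketch of the RDASM follows Section 3.3.

```python
import torch
import torch.nn as nn

class RDASNet(nn.Module):
    """Minimal sketch of the RDASNet pipeline described above; layer names are
    illustrative, and the RDASM block is assumed to be supplied by the caller."""

    def __init__(self, rdasm_block, in_ch=3, feats=64, num_blocks=16):
        super().__init__()
        # Preprocessing module (PM): two 3x3 Conv + LReLU layers -> F_pre.
        self.pm = nn.Sequential(
            nn.Conv2d(in_ch, feats, 3, padding=1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(feats, feats, 3, padding=1), nn.LeakyReLU(0.2, inplace=True),
        )
        # N stacked RDASMs.
        self.blocks = nn.ModuleList([rdasm_block(feats) for _ in range(num_blocks)])
        # Feature fusion: 1x1 conv over the concatenated block outputs, then 3x3 conv.
        self.fuse = nn.Sequential(
            nn.Conv2d(feats * num_blocks, feats, 1),
            nn.Conv2d(feats, feats, 3, padding=1),
        )
        # Final 3x3 conv back to image space (3 or 1 channels).
        self.out = nn.Conv2d(feats, in_ch, 3, padding=1)

    def forward(self, noisy):
        f_pre = self.pm(noisy)
        block_outs, x = [], f_pre
        for block in self.blocks:
            x = block(x)                  # F_B1, ..., F_BN
            block_outs.append(x)
        f_rdasms = self.fuse(torch.cat(block_outs, dim=1))
        f_out = f_rdasms + f_pre          # global residual learning (F_0 taken here as the PM output)
        return self.out(f_out)

# L1 loss between the denoised output and the ground truth:
# loss = nn.L1Loss()(model(noisy), ground_truth)
```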

3.2. Residual Dense Attention Similarity Module (RDASM)

Our proposed residual dense attention similarity module (RDASM) is the core of RDASNet; it includes the residual dense module (RDM) and the attention similarity module (ASM), as shown in Figure 2. The RDM obtains the local features of the image, forms dense layers through a series of convolutions, and enhances the representation of the model through residual learning. The ASM obtains key similarity features against a complex background and consists of channel attention similarity (CASM) and spatial attention similarity (SASM): CASM focuses on the global similarity information of the image in the channel dimension, while SASM focuses on it in the spatial dimension.
As shown in Figure 2, we obtain the channel attention similarity map $M_C \in \mathbb{R}^{C \times 1 \times 1}$ and the spatial attention similarity map $M_S \in \mathbb{R}^{1 \times H \times W}$. The attention similarity module can then be described as

$F_{CASM} = CAS_C \otimes F_{B_d}, \quad F_{SASM} = SAS_S \otimes F_{CASM}$

where $F_{B_d}$ denotes the input feature map of the d-th RDASM and $\otimes$ denotes element-wise multiplication. More details on CASM and SASM are given in Section 3.2.2 and Section 3.2.3, respectively.

Then, the output $F_{SASM}$ of the attention similarity module and the serial outputs $F_{B_d,i}$ of the residual dense module are concatenated (more details are provided in Section 3.2.1):

$F'_{B_d} = H_{B_d}([F_{B_d,0}, \ldots, F_{B_d,i}, \ldots, F_{B_d,c-1}, F_{SASM}])$

where $H_{B_d}$ denotes the concatenation of $[F_{B_d,0}, \ldots, F_{B_d,i}, \ldots, F_{B_d,c-1}, F_{SASM}]$, followed by a 1 × 1 convolution that controls the number of output channels. Thus, the input feature map of the (d + 1)-th RDASM can be obtained by

$F_{B_{d+1}} = F_{B_d} + F'_{B_d}$

where $F_{B_{d+1}}$, the output of the d-th RDASM, serves as the input of the (d + 1)-th RDASM, and the feature map of the current block is carried forward through local residual learning.

3.2.1. Residual Dense Module (RDM)

The residual dense module is designed with reference to RDN. Each RDM block adopts eight convolution and activation layers and achieves contiguous memory by passing the feature $F_{B_d}$ of the current block to each subsequent layer, so as to make full use of the features of every layer, as shown in Figure 2. The output feature map $F_{B_d,c}$ of the c-th Conv layer of the d-th RDASM can be expressed as

$F_{B_d,c} = \sigma(W_{B_d,c}[F_{B_d,0}, \ldots, F_{B_d,i}, \ldots, F_{B_d,c-1}])$

where $F_{B_d} = F_{B_d,0}$, $F_{B_d,i}$ is the feature map extracted from the i-th (i = 0, 1, ⋯, c − 1) Conv layer of the d-th RDASM, $W_{B_d,c}$ is the weight of the c-th Conv layer, and $\sigma$ denotes the LReLU activation function.
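As a concrete reading of this dense-layer recursion, the following sketch stacks eight 3 × 3 Conv + LReLU layers, each of which sees the concatenation of the block input and all previous outputs. The growth rate of 64 channels per layer and the LReLU slope are assumptions; the paper only states that there are 8 conv layers per RDASM.

```python
import torch
import torch.nn as nn

class RDM(nn.Module):
    """Sketch of the residual dense module's dense convolutions: each
    3x3 Conv + LReLU layer takes [F_{Bd,0}, ..., F_{Bd,c-1}] as input."""

    def __init__(self, feats=64, growth=64, num_layers=8):
        super().__init__()
        self.layers = nn.ModuleList()
        ch = feats
        for _ in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.Conv2d(ch, growth, 3, padding=1),
                nn.LeakyReLU(0.2, inplace=True),
            ))
            ch += growth  # the next layer sees all previous feature maps

    def forward(self, f_bd):
        outputs = [f_bd]                              # F_{Bd,0} = F_Bd
        for layer in self.layers:
            outputs.append(layer(torch.cat(outputs, dim=1)))
        return outputs                                # [F_{Bd,0}, ..., F_{Bd,c}]
```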

3.2.2. Channel Attention Similarity Module (CASM)

CASM uses global avg-pooling on each channel to compress the feature map from C × H × W to C × 1 × 1, so that one value represents one channel and global information embedding is achieved [28], as shown in Figure 3. We use $F_{B_d} \in \mathbb{R}^{C \times H \times W}$ to represent the feature map input to the CASM. The global spatial information is compressed into a channel descriptor $z \in \mathbb{R}^{C \times 1 \times 1}$ through global avg-pooling, and the c-th element $z_c$ is calculated by

$z_c = H_{GAP}(f_c) = \frac{1}{H \times W}\sum_{i=1}^{H}\sum_{j=1}^{W} f_c(i, j)$

where $H_{GAP}$ represents global avg-pooling and $f_c(i, j)$ is the value at position $(i, j)$ of the c-th channel of the feature map $F_{B_d}$. Compared with max-pooling, avg-pooling helps to smooth and suppress noise and achieves a better denoising effect (more details are provided in Section 4.4.2).

Then, through Dilated + LReLU + Dilated, a gating mechanism with a sigmoid activation is used to learn the interrelationships between the channels, yielding the channel attention similarity map $CAS_C \in \mathbb{R}^{C \times 1 \times 1}$. It should be noted that we use dilated convolution [18] instead of standard convolution, with a dilation rate of 2, for two reasons: (1) dilated convolution enlarges the receptive field, which helps to obtain better global similarity information; (2) dilated convolution has fewer parameters.

$CAS_C = \sigma_2(H_{DConv2}(\sigma_1(H_{DConv1}(z))))$

where $\sigma_2$ denotes the sigmoid activation function, $H_{DConv1} \in \mathbb{R}^{\frac{C}{r} \times C}$ and $H_{DConv2} \in \mathbb{R}^{C \times \frac{C}{r}}$ represent the two dilated convolution layers, $r$ is the reduction ratio (set to 16 [28] to reduce the number of parameters), and $\sigma_1$ represents the LReLU activation after the first dilated convolution layer. $CAS_C$ is the channel attention similarity map obtained through the CASM, as shown in Figure 3. Channels with similar pixels have similar weights: the two dark red channels in Figure 3 have similar weights $w_1$. Furthermore, CASM gives larger weights to key channel features.
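A minimal sketch of this CASM is given below: global avg-pooling, two dilated convolutions (dilation rate 2, reduction ratio r = 16) separated by LReLU, and a sigmoid gate. The kernel size, padding, and LReLU slope are assumptions, since the paper does not state them.

```python
import torch
import torch.nn as nn

class CASM(nn.Module):
    """Sketch of the channel attention similarity module: GAP to C x 1 x 1,
    Dilated Conv + LReLU + Dilated Conv + sigmoid, then channel reweighting."""

    def __init__(self, channels=64, reduction=16, dilation=2):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)     # z: C x H x W -> C x 1 x 1
        pad = dilation                          # keeps the 1 x 1 map size with a 3x3 kernel (assumed)
        self.gate = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 3, padding=pad, dilation=dilation),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(channels // reduction, channels, 3, padding=pad, dilation=dilation),
            nn.Sigmoid(),
        )

    def forward(self, f_bd):
        cas_c = self.gate(self.gap(f_bd))      # channel attention similarity map CAS_C
        return cas_c * f_bd                     # F_CASM = CAS_C ⊗ F_Bd
```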

3.2.3. Spatial Attention Similarity Module (SASM)

SASM also uses global average pooling, compressing the channels in the spatial dimension from C × H × W to 1 × H × W, as shown in Figure 4. We use $F_{CASM} \in \mathbb{R}^{C \times H \times W}$ to represent the input feature map of the spatial attention module. It is compressed to $\mathbb{R}^{1 \times H \times W}$ using global average pooling. Then, through a set of non-linear transformations, we obtain the spatial attention similarity map $SAS_S \in \mathbb{R}^{1 \times H \times W}$:

$SAS_S = \sigma_2(H_{DConv}(H_{GAP}(F_{CASM})))$

where $H_{DConv}$ denotes a dilated convolution with a kernel size of 3 × 3 and a dilation rate of 2, and $\sigma_2$ denotes the sigmoid activation function.
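A corresponding sketch of the SASM follows: channel-wise average pooling to a 1 × H × W map, a 3 × 3 dilated convolution with dilation rate 2, and a sigmoid gate. The padding choice is an assumption made to preserve the spatial size.

```python
import torch
import torch.nn as nn

class SASM(nn.Module):
    """Sketch of the spatial attention similarity module: pool over channels,
    apply a 3x3 dilated convolution and a sigmoid, then reweight spatially."""

    def __init__(self, dilation=2):
        super().__init__()
        self.conv = nn.Conv2d(1, 1, kernel_size=3, padding=dilation, dilation=dilation)
        self.sigmoid = nn.Sigmoid()

    def forward(self, f_casm):
        # Channel-wise global average pooling: C x H x W -> 1 x H x W.
        pooled = f_casm.mean(dim=1, keepdim=True)
        sas_s = self.sigmoid(self.conv(pooled))   # spatial attention similarity map SAS_S
        return sas_s * f_casm                      # F_SASM = SAS_S ⊗ F_CASM
```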

3.3. Implementation Details

In the model, 3 × 3 convolution kernels are used everywhere unless otherwise specified, with a zero-padding strategy to keep the size of the feature maps constant. A 1 × 1 convolution kernel is used after each concatenation layer to control the number of output channels. In addition, in the attention similarity module, dilated convolution is used instead of standard convolution to enlarge the receptive field and reduce the number of parameters. In total, the network contains 16 residual dense attention similarity modules, with 8 convolution layers in each RDASM.
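Putting the pieces together, the sketch below wires the RDM, CASM, and SASM sketches from Section 3.2 into one RDASM with feature fusion and local residual learning; the channel bookkeeping and the exact set of concatenated features are our reading of Section 3.2, not the authors' reference implementation, and the sketch assumes the RDM, CASM, and SASM classes above are in scope.

```python
import torch
import torch.nn as nn

class RDASM(nn.Module):
    """Sketch of one residual dense attention similarity module: the dense-layer
    outputs and F_SASM are concatenated, fused by a 1x1 convolution, and the
    block input is added back through local residual learning."""

    def __init__(self, feats=64, growth=64, num_layers=8):
        super().__init__()
        self.rdm = RDM(feats, growth, num_layers)
        self.casm = CASM(channels=feats)
        self.sasm = SASM()
        # 1x1 conv controls the channel count of the fused features.
        fused_ch = feats + growth * num_layers + feats   # [F_{Bd,0..c}] plus F_SASM
        self.fuse = nn.Conv2d(fused_ch, feats, kernel_size=1)

    def forward(self, f_bd):
        dense_outputs = self.rdm(f_bd)                # [F_{Bd,0}, ..., F_{Bd,c}]
        f_sasm = self.sasm(self.casm(f_bd))           # attention similarity branch
        fused = self.fuse(torch.cat(dense_outputs + [f_sasm], dim=1))
        return f_bd + fused                           # local residual learning

# A full model with 16 RDASMs, matching Section 3.3, could then be assembled as
# RDASNet(rdasm_block=lambda c: RDASM(feats=c), feats=64, num_blocks=16).
```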

4. Experiment Results

In this section, we present the experimental setup of the model, the experimental results, and the corresponding ablation experiments.

4.1. Datasets

4.1.1. Training Datasets

The training datasets were of three types: gray image datasets, color image datasets, and real noisy image datasets. The gray image datasets, used for blind-noise training, consisted of two public datasets, namely the Waterloo Exploration Database [30] and the BSD400 dataset [11,19]. The BSD400 dataset was randomly selected from ImageNet's [31] validation set and stored in PNG format, while the Waterloo Exploration Database consists of 4744 natural images in PNG format. The color image datasets included the Waterloo Exploration Database and BSD432 [7]; the BSD432 dataset is derived from the Berkeley Segmentation dataset and contains 432 color images. For real noisy images, the PolyU-Real-World-Noisy-Images dataset [32] was used to train the model; it consists of 100 color images with real noise captured by five cameras: Sony A7 II, Nikon D800, Canon 80D, Canon 600D, and Canon 5D Mark II.

4.1.2. Test Datasets

Similarly, the test datasets also included gray image datasets, color image datasets, and real noisy image datasets. The gray image datasets were Set12 and BSD68 [7]; Set12 has 12 gray images, while BSD68 has 68. The color image test datasets were CBSD68, McMaster [33], and Kodak24 [34]; McMaster and Kodak24 contain 18 and 24 color images, respectively. The real noisy image test dataset was cc [35], which contains 15 real noisy images captured at different ISO settings (1600, 3200, and 6400).

4.2. Experimental Settings

The main parameters of model training are shown in Table 1. For gray and color images, the patch size was set to 80 × 80; for real noisy images, it was set to 64 × 64. On the gray image, color image, and real noisy image datasets, we trained for 400, 400, and 65 epochs, respectively. In addition, we set the initial learning rate to $1 \times 10^{-4}$; the learning rate remained unchanged for the first 80% of the epochs and was subsequently multiplied by 0.1 with each epoch. Furthermore, in each epoch, we obtained a blind noisy patch by adding AWGN with a noise level in the range $\sigma = [0, 75]$ to the clean patch.
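As an illustration of this blind-noise setting, the sketch below builds one noisy/clean training pair by sampling σ from [0, 75]; the uniform sampling of σ and the [0, 1] intensity convention are assumptions, since the paper only states the blind range.

```python
import torch

def random_blind_noise_pair(clean_patch: torch.Tensor, sigma_max: float = 75.0):
    """Given a clean patch in [0, 1], sample sigma from [0, sigma_max]
    (on the 0-255 scale) and return the (noisy, clean) pair."""
    sigma = torch.rand(1).item() * sigma_max / 255.0
    noisy = clean_patch + sigma * torch.randn_like(clean_patch)
    return noisy, clean_patch

# Example: an 80 x 80 color patch, as in Table 1.
clean = torch.rand(3, 80, 80)
noisy, target = random_blind_noise_pair(clean)
```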

4.3. Quantitative and Qualitative Evaluation

4.3.1. RDASNet for Gray Image Denoising

For gray image denoising, we chose several state-of-the-art denoising methods evaluated on the same test datasets, including BM3D [5], DnCNN [7], FFDNet [11], BRDNet [14], ADNet [15], and RDN [13]. BM3D is a denoising method based on image prior knowledge; DnCNN, BRDNet, and ADNet are non-blind CNN-based denoising methods, while FFDNet performs blind image denoising. It should be noted that the design of our residual dense module was inspired by RDN, and the noise levels of the RDN test datasets differ from those of the other methods, so we retrained RDN. Moreover, the PSNR values of BM3D, DnCNN, FFDNet, BRDNet, and ADNet are quoted directly from the original works, while the SSIM values were recalculated.
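For reference, PSNR and SSIM of the kind reported in the following tables can be computed with scikit-image as sketched below; this is a generic illustration of the standard metric definitions, not the authors' evaluation script.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(clean: np.ndarray, denoised: np.ndarray):
    """Compute PSNR (dB) and SSIM for a pair of 8-bit grayscale images."""
    psnr = peak_signal_noise_ratio(clean, denoised, data_range=255)
    ssim = structural_similarity(clean, denoised, data_range=255)
    return psnr, ssim
```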
Table 2 and Table 3 report the PSNR and SSIM results on the Set12 and BSD68 datasets, respectively. In terms of the quantitative results, our RDASNet achieved the same or better results than all the other methods in most cases, and its results were mostly optimal or suboptimal. In particular, for complex noise, our model was superior to all the state-of-the-art image-denoising methods, mainly because the proposed model pays more attention to the global similarity information of the image.
Figure 5 and Figure 6 show the visual results. It can be seen that the proposed RDASNet produces clearer results with finer details. Taking "Starfish.png" in Figure 5 as an example, the restoration by BM3D was the least satisfactory, and the other methods, such as FFDNet, BRDNet, and ADNet, all showed different degrees of distortion. In contrast, our RDASNet better alleviates blur and restores more image details.

4.3.2. RDASNet for Color Image Denoising

For color image denoising, we compared RDASNet with CBM3D [5], DnCNN, FFDNet, BRDNet, ADNet, and RDN [13].
Table 4 reports the PSNR and SSIM results on the CBSD68, Kodak24, and McMaster datasets. From the quantitative results, it can be seen that the proposed RDASNet outperformed all the other image-denoising methods on color images. This is mainly because our model pays more attention to the global information of the image. Furthermore, Figure 7 and Figure 8 show the visual results.

4.3.3. RDASNet for Real Noisy Image Denoising

For real noisy images, we chose several commonly used image-denoising methods evaluated on the same test dataset, namely CBM3D, WNNM [6], DnCNN, BRDNet, ADNet, and RDN. Table 5 reports the PSNR results on the cc dataset. From Table 5, we can observe that the proposed model still achieved the best denoising effect on real images in terms of the overall mean value. Furthermore, in contrast to the other methods, our RDASNet denoised images taken by different camera devices well and adapts better to different devices.

4.4. Ablation Studies

4.4.1. RDASM Design

On the one hand, a CNN extracts image features within a fixed receptive field through convolution and pays more attention to the local information of the image. There are two common ways to enlarge the receptive field of a convolution [36]: (1) using a larger convolution kernel, for example, a 5 × 5 or 7 × 7 kernel instead of a 3 × 3 kernel; (2) deepening the network. Both lead to a surge in the number of model parameters. On the other hand, there is redundant information in the image; that is, some details recur across the whole image. Classical image-denoising methods, such as NLM [4] and BM3D [5], obtain better performance by exploiting image similarity: the core idea of NLM is that the estimate of the current pixel is obtained as a weighted average of pixels with similar structures in the image, while BM3D is a block-matching and 3D-filtering method in which similar blocks are found and then filtered jointly. However, in image denoising, few CNN models make full use of the global similarity information of the image, which limits their representation ability. Inspired by this, we designed the RDASM.
The RDASM includes the RDM and the ASM. The RDM is formed by residual learning and dense convolution layers, as shown in Figure 2; it focuses on the local information of the image and extracts its features. The ASM consists of CASM and SASM and focuses on the global similarity information of the image in the channel and spatial dimensions, respectively, as shown in Figure 3 and Figure 4. The ASM we designed has several distinctive features: (1) It uses an attention mechanism to mine global similarity information; through CASM or SASM, the channel or spatial attention similarity map can be obtained, similar image details receive similar weights, and key features are given larger weights. (2) Dilated convolution is used to enlarge the receptive field, so as to better focus on the global information of the image, with fewer parameters than standard convolution. (3) Avg-pooling helps to smooth and suppress the noise; more details can be found in Section 4.4.2.
In order to verify the effectiveness of the RDASM, especially the effectiveness of the attention similarity module (ASM), we designed a set of comparative experiments to compare the merits and demerits of the RDM and RDASM in image denoising on multiple datasets. The results in Table 6 show that the proposed RDASM has a better effect on image denoising, which proves the effectiveness of the proposed structure.
In addition, we visualized the proposed RDASM, as shown in Figure 9. In Figure 9, label (*1) denotes the noisy image, (*2) the heatmap, and (*3) the denoised image. In heatmap (a2), the lower-left region a1, the middle region a2, and the upper-right region a3 of the image have similar details and therefore similar weights. In the denoised image (a3), it can be seen that the details of a1, a2, and a3 are indeed very similar. In addition, the attention mechanism gives more weight to the key features (red indicates large weight), and therefore a2's weight w2 is larger than a4's weight w4. This shows that, through the RDASM, the model can make full use of the redundant information in the image; the global similarity information, in particular, is very useful for images with complex noisy backgrounds.

4.4.2. Global Pooling Design

As mentioned in the section describing the RDASM design, it is global pooling that enables the attention mechanism to pay attention to the global information of the image, so the choice of global pooling is very important.
Here, we made a notable observation: during an experiment, we found by chance that after avg-pooling, the color of an image becomes dimmer and the noise appears less obvious, whereas max-pooling has the opposite effect and highlights the noise, as shown in Figure 10.
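The following sketch reproduces this qualitative comparison on a synthetic noisy image: local average pooling smooths the AWGN, while local max pooling picks out bright noise points. The 2 × 2 kernel and the synthetic gradient image are arbitrary choices made only for illustration.

```python
import torch
import torch.nn.functional as F

# Synthetic clean image (a smooth gradient) corrupted with AWGN (sigma = 50 on the 0-255 scale).
clean = torch.linspace(0, 1, 128).repeat(128, 1).unsqueeze(0).unsqueeze(0)
noisy = (clean + (50.0 / 255.0) * torch.randn_like(clean)).clamp(0, 1)

avg_pooled = F.avg_pool2d(noisy, kernel_size=2)   # local averages smooth the noise
max_pooled = F.max_pool2d(noisy, kernel_size=2)   # local maxima highlight noise points and brighten the image

print(f"mean after avg-pooling: {avg_pooled.mean().item():.3f}, "
      f"after max-pooling: {max_pooled.mean().item():.3f}")
```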
In real life, if the light is dim at night, it is difficult to notice a small spot on someone's face, but in a well-lit situation, the same spot is easy to see. We therefore wondered whether average pooling would have a positive effect on the final performance of the model and carried out experiments to verify this conjecture.
We then changed only the global pooling used in CBAM and trained models with average pooling only, maximum pooling only, and the original CBAM, respectively. To save time, each model was trained for only 200 epochs; the results are shown in Table 7. They confirm once again that a better image-denoising effect can be achieved with avg-pooling; that is, avg-pooling can suppress the noise.

4.5. Complexity Analysis

The testing speed of a model is also an important evaluation index. Table 8 therefore shows the running times of BM3D, WNNM, DnCNN, FFDNet, BRDNet, RDN, and RDASNet for gray image denoising on images of size 256 × 256 and 512 × 512 with a noise level of $\sigma = 50$. Compared with RDN, our model is faster. In addition, we compared the number of parameters against the PSNR on McMaster ($\sigma = 50$), as shown in Figure 11. ADNet and BRDNet are lightweight models proposed for resource-constrained situations, so their parameter counts are smaller, but our model performs better. Our model has a high parameter count; however, compared with RDN, it has only 0.07 M more parameters, and from Table 2, Table 3 and Table 4, it is evident that the PSNR increased by 0.10–0.20 dB for gray images and by 0.11–0.22 dB for color images. The evaluation was conducted in a PyCharm Community (2021) environment with an Nvidia GeForce RTX 3090 Ti GPU.
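For context, parameter counts and GPU run times of the kind reported in Table 8 can be measured as sketched below; this is a generic benchmarking sketch, not the authors' timing code, and the warm-up and repetition counts are arbitrary.

```python
import time
import torch

def count_parameters(model: torch.nn.Module) -> int:
    """Total number of trainable parameters (e.g., roughly 22.05 M for RDASNet)."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

@torch.no_grad()
def time_inference(model, size=256, device="cuda", warmup=5, runs=20):
    """Average forward-pass time on a size x size grayscale input."""
    model = model.to(device).eval()
    x = torch.randn(1, 1, size, size, device=device)
    for _ in range(warmup):
        model(x)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(runs):
        model(x)
    torch.cuda.synchronize()
    return (time.time() - start) / runs
```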

5. Conclusions

In this paper, we proposed a residual dense attention similarity network (RDASNet) for image denoising. Shallow features are obtained through a preprocessing module; a CNN is then used to extract the local information of the image, and the attention similarity module is used to capture the global similarity information, so as to fully exploit the redundant information of the image. Similar details receive similar weights, and more weight is given to key features, making the model more suitable for complex noise. Furthermore, global residual learning is used to enhance the effect from the shallow layers to the deep layers. Our proposed RDASNet is better suited to blind noise, complex environments, and real noise.
In the future, we hope to deploy RDASNet on mobile devices. In general, deployment on mobile platforms requires a smaller model. One possible solution is to use depthwise separable convolution instead of traditional convolution to reduce the number of model parameters, or to compress the model through knowledge distillation, so as to design a lightweight RDASNet.

Author Contributions

Conceptualization, H.T. and R.H.; methodology, H.T.; software, H.T.; validation, H.T. and Q.Y.; formal analysis, H.T.; investigation, H.T. and R.H.; resources, H.T.; data curation, W.G.; writing—original draft preparation, H.T. and R.H.; writing—review and editing, H.T., W.G., R.H., Q.Y. and J.Z.; visualization, H.T.; supervision, W.G. and Q.Y.; project administration, W.G.; funding acquisition, W.G. and J.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (Grant No. 51975452).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Publicly available datasets were analyzed in this study. These data can be found here: [https://ece.uwaterloo.ca/~k29ma/exploration/ (accessed 18 November 2021); https://image-net.org/ (accessed 16 November 2021); https://github.com/csjunxu/PolyU-Real-World-Noisy-Images-Dataset (accessed 19 January 2022); https://ieeexplore.ieee.org/abstract/document/8365806/ (accessed 18 November 2021); http://r0k.us/graphics/kodak (accessed 22 November 2021); http://snam.ml/research/ccnoise/ (accessed 19 January 2022)].

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Gao, X.; Zhang, M.; Luo, J. Low-Light Image Enhancement via Retinex-Style Decomposition of Denoised Deep Image Prior. Sensors 2022, 22, 5593. [Google Scholar] [CrossRef] [PubMed]
  2. Fahim, M.A.N.I.; Saqib, N.; Siam, S.K.; Jung, H.Y. Denoising Single Images by Feature Ensemble Revisited. Sensors 2022, 22, 7080. [Google Scholar] [CrossRef] [PubMed]
  3. Xu, S.; Chen, X.; Tang, Y.; Jiang, S.; Cheng, X.; Xiao, N. Learning from Multiple Instances: A Two-Stage Unsupervised Image Denoising Framework Based on Deep Image Prior. Appl. Sci. 2022, 21, 10767. [Google Scholar] [CrossRef]
  4. Buades, A.; Coll, B.; Morel, J.M. A non-local algorithm for image denoising. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–25 June 2005; pp. 60–65. [Google Scholar]
  5. Dabov, K.; Foi, A.; Katkovnik, V.; Egiazarian, K. Image denoising by sparse 3-D transform-domain collaborative filtering. IEEE Trans. Image Process. 2007, 16, 2080–2095. [Google Scholar] [CrossRef] [PubMed]
  6. Gu, S.; Zhang, L.; Zuo, W.; Feng, X. Weighted nuclear norm minimization with application to image denoising. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 2862–2869. [Google Scholar]
  7. Zhang, K.; Zuo, W.; Chen, Y.; Meng, D.; Zhang, L. Beyond a gaussian denoiser: Residual learning of deep cnn for image denoising. IEEE Trans. Image Process. 2017, 26, 3142–3155. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  8. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef] [Green Version]
  9. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. In Proceedings of the 3rd International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
  10. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778. [Google Scholar]
  11. Zhang, K.; Zuo, W.; Zhang, L. FFDNet: Toward a fast and flexible solution for CNN-based image denoising. IEEE Trans. Image Process. 2018, 27, 4608–4622. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  12. Zhang, Y.; Tian, Y.; Kong, Y.; Zhong, B.; Fu, Y. Residual dense network for image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 2472–2481. [Google Scholar]
  13. Zhang, Y.; Tian, Y.; Kong, Y.; Zhong, B.; Fu, Y. Residual dense network for image restoration. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 43, 2480–2495. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  14. Tian, C.; Xu, Y.; Zuo, W. Image denoising using deep CNN with batch renormalization. Neural Netw. 2020, 121, 461–473. [Google Scholar] [CrossRef] [PubMed]
  15. Tian, C.; Xu, Y.; Li, Z.; Zuo, W.; Fei, L.; Liu, H. Attention-guided CNN for image denoising. Neural Netw. 2020, 124, 117–129. [Google Scholar] [CrossRef] [PubMed]
  16. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708. [Google Scholar]
  17. Gu, J.; Wang, Z.; Kuen, J.; Ma, L.; Shahroudy, A.; Shuai, B.; Chen, T. Recent advances in convolutional neural networks. Pattern Recognit. 2018, 77, 354–377. [Google Scholar] [CrossRef] [Green Version]
  18. Yu, F.; Koltun, V. Multi-scale context aggregation by dilated convolutions. In Proceedings of the 4th International Conference on Learning Representations, San Juan, Puerto Rico, 2–4 May 2016. [Google Scholar]
  19. Tian, C.; Fei, L.; Zheng, W.; Xu, Y.; Zuo, W.; Lin, C.W. Deep learning on image denoising: An overview. Neural Netw. 2020, 131, 251–275. [Google Scholar] [CrossRef] [PubMed]
  20. Ren, W.; Liu, S.; Ma, L.; Xu, Q.; Xu, X.; Cao, X.; Yang, M.H. Low-light image enhancement via a deep hybrid network. IEEE Trans. Image Process. 2019, 28, 4364–4375. [Google Scholar] [CrossRef] [PubMed]
  21. Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the International Conference on Machine Learning, Lile, France, 6–11 July 2015; pp. 448–456. [Google Scholar]
  22. Ioffe, S. Batch renormalization: Towards reducing minibatch dependence in batch-normalized models. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 1946–1954. [Google Scholar]
  23. Guo, M.H.; Xu, T.X.; Liu, J.J.; Liu, Z.N.; Jiang, P.T.; Mu, T.J.; Hu, S.M. Attention mechanisms in computer vision: A survey. Comput. Vis. Media 2022, 8, 331–368. [Google Scholar] [CrossRef]
  24. Bahdanau, D.; Cho, K.; Bengio, Y. Neural machine translation by jointly learning to align and translate. In Proceedings of the 3rd International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
  25. Mnih, V.; Heess, N.; Graves, A. Recurrent models of visual attention. In Proceedings of the 28th Annual Conference on Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; pp. 2204–2212. [Google Scholar]
  26. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 5999–6009. [Google Scholar]
  27. Jaderberg, M.; Simonyan, K.; Zisserman, A. Spatial transformer networks. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; pp. 2017–2025. [Google Scholar]
  28. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7132–7141. [Google Scholar]
  29. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
  30. Ma, K.; Duanmu, Z.; Wu, Q.; Wang, Z.; Yong, H.; Li, H.; Zhang, L. Waterloo Exploration Database: New Challenges for Image Quality Assessment Models. IEEE Trans. Image Process. 2017, 26, 1004–1016. [Google Scholar] [CrossRef] [PubMed]
  31. Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. Imagenet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255. [Google Scholar]
  32. Xu, J.; Li, H.; Liang, Z.; Zhang, D.; Zhang, L. Real-world noisy image denoising: A new benchmark. arXiv 2018, arXiv:1804.02603. [Google Scholar]
  33. Zhang, L.; Wu, X.; Buades, A.; Li, X. Color demosaicking by local directional interpolation and nonlocal adaptive thresholding. J. Electron. Imaging 2011, 20, 023016. [Google Scholar]
  34. Franzen, R. Kodak Lossless True Color Image Suite. Available online: http://r0k.us/graphics/kodak (accessed on 22 November 2021).
  35. Nam, S.; Hwang, Y.; Matsushita, Y.; Kim, S.J. A holistic approach to cross-channel image noise modeling and its application to image denoising. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 1683–1691. [Google Scholar]
  36. Zhang, H.; Goodfellow, I.; Metaxas, D.; Odena, A. Self-attention generative adversarial networks. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 7354–7363. [Google Scholar]
Figure 1. Network architecture of the proposed RDASNet.
Figure 2. Residual dense attention similarity module. CASM and SASM represent the channel attention similarity module and spatial attention similarity module, respectively.
Figure 3. Channel attention similarity module. Channels with similar pixels have similar weights, and key channel features have larger weights. The two dark red channels in the figure are similar and have a similar weight $w_1$.
Figure 4. Spatial attention similarity module. Spatially similar regions have similar weights, and spatial key features have larger weights.
Figure 5. Denoising results of different methods on one image from Set12 with noise level σ = 50: (a) original image, (b) noisy image, (c) BM3D/25.04 dB, (d) DnCNN/25.70 dB, (e) FFDNet/25.75 dB, (f) BRDNet/25.77 dB, (g) ADNet/25.70 dB, (h) RDN/25.65 dB, and (i) RDASNet/25.94 dB.
Figure 6. Denoising results of different methods on one image from BSD68 with noise level σ = 25: (a) original image, (b) noisy image, (c) BM3D/28.42 dB, (d) DnCNN/29.11 dB, (e) FFDNet/29.16 dB, (f) BRDNet/29.26 dB, (g) ADNet/29.11 dB, (h) RDN/29.17 dB, and (i) RDASNet/29.21 dB.
Figure 7. Denoising results of different methods on one image from Kodak24 with noise level σ = 35: (a) original image, (b) noisy image, (c) CBM3D/27.33 dB, (d) DnCNN/28.43 dB, (e) FFDNet/28.60 dB, (f) BRDNet/28.88 dB, (g) ADNet/28.68 dB, (h) RDN/28.94 dB, and (i) RDASNet/29.11 dB.
Figure 8. Denoising results of different methods on one image from McMaster with noise level σ = 15: (a) original image, (b) noisy image, (c) CBM3D/35.49 dB, (d) DnCNN/35.63 dB, (e) FFDNet/36.64 dB, (f) BRDNet/37.20 dB, (g) ADNet/36.59 dB, (h) RDN/37.20 dB, and (i) RDASNet/37.39 dB.
Figure 9. Heatmaps of the proposed RDASM. (a1–a3) are the noisy images, (b1–b3) are the corresponding heatmaps, and (c1–c3) are the corresponding denoised images. In (a2), the details of (a1–a3) are similar, and the weights w1, w2, and w3 are similar. Red indicates high weight, and attention gives more weight to key features.
Figure 10. A comparison of avg-pooling and max-pooling (noise level σ = 50).
Figure 11. PSNR results on McMaster (σ = 50) vs. the number of parameters of different methods.
Table 1. Main parameters of RDASNet.

| RDASNet | Blind Range | Patch Size | Batch Size | Epochs |
|---|---|---|---|---|
| gray | [0, 75] | 80 × 80 | 4 | 400 |
| color | [0, 75] | 80 × 80 | 4 | 400 |
| real | - | 64 × 64 | 4 | 65 |
Table 2. PSNR (dB) and SSIM of different methods on Set12 for different noise levels (i.e., 15, 25, and 50). The best two PSNR results are shown in bold and underlined, respectively.

| Images | C.man | House | Peppers | Starfish | Monarch | Airplane | Parrot | Lena | Barbara | Boat | Man | Couple | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Noise level σ = 15 | | | | | | | | | | | | | |
| BM3D | 31.91 | 34.93 | 32.69 | 31.14 | 31.85 | 31.07 | 31.37 | 34.26 | 33.10 | 32.13 | 31.92 | 32.10 | 32.37/- |
| DnCNN | 32.61 | 34.97 | 33.30 | 32.20 | 33.09 | 31.70 | 31.83 | 34.62 | 32.64 | 32.42 | 32.46 | 32.47 | 32.86/0.9020 |
| FFDNet | 32.43 | 35.07 | 33.25 | 31.99 | 32.66 | 31.57 | 31.81 | 34.62 | 32.54 | 32.38 | 32.41 | 32.46 | 32.77/0.9055 |
| BRDNet | 32.80 | 35.27 | 33.47 | 32.24 | 33.35 | 31.85 | 32.00 | 34.75 | 32.93 | 32.55 | 32.50 | 32.62 | 33.03/0.9076 |
| ADNet | 32.81 | 35.22 | 33.49 | 32.17 | 33.17 | 31.86 | 31.96 | 34.71 | 32.80 | 32.57 | 32.47 | 32.58 | 32.98/0.9031 |
| RDN | 32.33 | 35.18 | 33.41 | 32.14 | 33.00 | 31.70 | 31.87 | 34.69 | 32.34 | 32.42 | 32.41 | 32.58 | 32.84/0.9045 |
| RDASNet | 32.70 | 35.28 | 33.41 | 32.26 | 33.18 | 31.85 | 31.96 | 34.75 | 32.76 | 32.48 | 32.49 | 32.63 | 32.99/0.9069 |
| Noise level σ = 25 | | | | | | | | | | | | | |
| BM3D | 29.45 | 32.85 | 30.16 | 28.56 | 29.25 | 28.42 | 28.93 | 32.07 | 30.71 | 29.90 | 29.61 | 29.71 | 29.97/- |
| DnCNN | 30.18 | 33.06 | 30.87 | 29.41 | 30.28 | 29.13 | 29.43 | 32.44 | 30.00 | 30.21 | 30.10 | 30.12 | 30.43/0.8605 |
| FFDNet | 30.10 | 33.28 | 30.93 | 29.32 | 30.08 | 29.04 | 29.44 | 32.57 | 30.01 | 30.25 | 30.11 | 30.20 | 30.44/0.8662 |
| BRDNet | 31.39 | 33.41 | 31.04 | 29.46 | 30.50 | 29.20 | 29.55 | 32.65 | 30.34 | 30.33 | 30.14 | 30.28 | 30.61/0.8669 |
| ADNet | 30.34 | 33.41 | 31.14 | 29.41 | 30.39 | 29.17 | 29.49 | 32.61 | 30.25 | 30.37 | 30.08 | 30.24 | 30.58/0.8634 |
| RDN | 29.97 | 33.53 | 31.11 | 29.25 | 30.23 | 29.14 | 29.43 | 32.62 | 29.72 | 30.21 | 30.09 | 30.31 | 30.47/0.8644 |
| RDASNet | 30.29 | 33.86 | 31.12 | 29.42 | 30.51 | 29.21 | 29.61 | 32.80 | 30.31 | 30.35 | 30.18 | 30.36 | 30.67/0.8679 |
| Noise level σ = 50 | | | | | | | | | | | | | |
| BM3D | 26.13 | 29.69 | 26.68 | 25.04 | 25.82 | 25.10 | 25.90 | 29.05 | 27.22 | 26.78 | 26.81 | 26.46 | 26.72/- |
| DnCNN | 27.03 | 30.00 | 27.32 | 25.70 | 26.78 | 25.87 | 26.48 | 29.39 | 26.22 | 27.20 | 27.24 | 26.90 | 27.18/0.7810 |
| FFDNet | 27.05 | 30.37 | 27.54 | 25.75 | 26.81 | 25.89 | 26.57 | 29.66 | 26.45 | 27.33 | 27.29 | 27.08 | 27.32/0.7893 |
| BRDNet | 27.44 | 30.53 | 27.67 | 25.77 | 26.97 | 25.93 | 26.66 | 29.73 | 26.85 | 27.38 | 27.27 | 27.17 | 27.45/0.7915 |
| ADNet | 27.31 | 30.59 | 27.69 | 25.70 | 26.90 | 25.88 | 25.56 | 29.59 | 26.64 | 27.35 | 27.17 | 27.07 | 27.37/0.7862 |
| RDN | 27.16 | 30.77 | 27.51 | 25.65 | 26.93 | 25.80 | 26.53 | 29.75 | 26.08 | 27.37 | 27.25 | 27.14 | 27.33/0.7876 |
| RDASNet | 27.34 | 31.03 | 27.69 | 25.94 | 26.99 | 26.12 | 26.53 | 29.80 | 26.89 | 27.41 | 27.28 | 27.29 | 27.53/0.7940 |
Table 3. PSNR (dB) and SSIM of different methods on BSD68 for different noise levels (i.e., 15, 25, and 50). The best two PSNR results are shown in bold and underlined, respectively.

| Methods | σ = 15 | σ = 25 | σ = 50 |
|---|---|---|---|
| BM3D | 31.07/- | 28.57/- | 25.62/- |
| DnCNN | 31.72/0.8901 | 29.23/0.8276 | 26.23/0.7170 |
| FFDNet | 31.62/0.8952 | 29.19/0.8345 | 26.30/0.7278 |
| BRDNet | 31.79/0.8966 | 29.29/0.8346 | 26.26/0.7284 |
| ADNet | 31.74/0.8882 | 29.25/0.8260 | 29.29/0.7169 |
| RDN | 31.62/0.8943 | 29.16/0.8314 | 26.27/0.7223 |
| RDASNet | 31.77/0.8960 | 29.30/0.8347 | 26.38/0.7294 |
Table 4. PSNR (dB) and SSIM of different methods on CBSD68, Kodak24, and McMaster for different noise levels (i.e., 15, 25, 35, 50, and 75). The best two PSNR results are shown in bold and underlined, respectively.

| Datasets | Methods | σ = 15 | σ = 25 | σ = 35 | σ = 50 | σ = 75 |
|---|---|---|---|---|---|---|
| CBSD68 | CBM3D | 33.52/- | 30.71/- | 28.89/- | 27.38/- | 25.74/- |
| | DnCNN | 33.98/0.9290 | 31.31/0.8830 | 29.65/0.8421 | 28.01/0.7896 | - |
| | FFDNet | 33.80/0.9310 | 31.18/0.8864 | 29.58/0.8473 | - | 26.57/0.7285 |
| | BRDNet | 34.10/- | 31.43/0.8912 | 29.77/0.8500 | 28.16/0.8005 | 26.43/0.7342 |
| | ADNet | 33.99/0.9325 | 31.31/0.8878 | 29.66/0.8479 | 28.04/0.7961 | 26.33/0.7288 |
| | RDN | 34.06/0.9342 | 31.42/0.8907 | 29.79/0.8522 | 28.18/0.8016 | 26.50/0.7376 |
| | RDASNet | 34.17/0.9354 | 31.53/0.8925 | 29.91/0.8547 | 28.31/0.8055 | 26.63/0.7421 |
| Kodak24 | CBM3D | 34.28/- | 31.68/- | 29.90/- | 28.46/- | 26.82/- |
| | DnCNN | 34.73/0.9209 | 32.23/0.8775 | 30.64/0.8398 | 29.02/0.7917 | - |
| | FFDNet | 34.55/0.9234 | 32.11/0.8818 | 30.56/0.8455 | 28.99/0.7993 | 27.25/0.7373 |
| | BRDNet | 34.88/- | 32.41/0.8869 | 30.80/0.8485 | 29.22/0.8029 | 27.49/0.7442 |
| | ADNet | 34.76/0.9250 | 32.26/0.8830 | 30.68/0.8461 | 29.10/0.7990 | 27.40/0.7637 |
| | RDN | 35.02/0.9272 | 32.55/0.8869 | 31.00/0.8515 | 29.45/0.8061 | 27.80/0.7498 |
| | RDASNet | 35.16/0.9290 | 32.69/0.8893 | 31.16/0.8548 | 29.60/0.8111 | 27.95/0.7546 |
| McMaster | CBM3D | 34.06/- | 31.66/- | 29.92/- | 28.51/- | 26.79/- |
| | DnCNN | 34.80/- | 32.47/- | 30.91/- | 29.21/- | - |
| | FFDNet | 34.47/0.9224 | 32.25/0.8894 | 30.76/0.8599 | 29.14/0.8201 | 27.29/0.7635 |
| | BRDNet | 35.08/- | 32.75/0.8965 | 31.15/0.8647 | 29.52/0.8260 | 27.72/0.7734 |
| | ADNet | 34.93/0.9256 | 32.56/0.8899 | 31.00/0.8598 | 29.36/0.8190 | 27.53/0.7637 |
| | RDN | 35.04/0.9285 | 32.74/0.8961 | 31.22/0.8672 | 29.60/0.8300 | 27.82/0.7800 |
| | RDASNet | 35.21/0.9301 | 32.91/0.8988 | 31.41/0.8710 | 29.81/0.8360 | 28.04/0.7884 |
Table 5. PSNR (dB) of different methods using the cc dataset. The best two results are shown in bold and underlined, respectively.

| Camera Setting | CBM3D | WNNM | DnCNN | BRDNet | ADNet | RDN | RDASNet |
|---|---|---|---|---|---|---|---|
| Canon 5D, ISO = 3200 | 39.76 | 37.51 | 37.26 | 37.63 | 35.96 | 35.85 | 37.05 |
| | 36.40 | 33.86 | 34.13 | 37.28 | 36.11 | 36.07 | 37.14 |
| | 36.37 | 31.43 | 34.09 | 37.75 | 34.49 | 36.42 | 36.85 |
| Nikon D600, ISO = 3200 | 34.18 | 33.46 | 33.62 | 34.55 | 33.94 | 38.00 | 38.09 |
| | 35.07 | 36.09 | 34.48 | 35.99 | 34.33 | 37.41 | 38.14 |
| | 37.13 | 39.86 | 35.41 | 38.62 | 38.87 | 35.87 | 34.47 |
| Nikon D800, ISO = 1600 | 36.81 | 36.35 | 35.79 | 39.22 | 37.61 | 39.66 | 39.96 |
| | 37.76 | 39.99 | 36.08 | 39.67 | 38.24 | 39.06 | 39.36 |
| | 37.51 | 37.15 | 35.48 | 39.04 | 36.89 | 37.91 | 38.40 |
| Nikon D800, ISO = 3200 | 35.05 | 38.60 | 34.08 | 38.28 | 37.20 | 37.93 | 38.33 |
| | 34.07 | 36.04 | 33.70 | 37.18 | 35.67 | 37.73 | 37.86 |
| | 34.42 | 39.73 | 33.31 | 38.85 | 38.09 | 37.31 | 38.30 |
| Nikon D800, ISO = 6400 | 31.13 | 33.29 | 29.83 | 32.75 | 32.24 | 35.47 | 36.25 |
| | 31.22 | 31.16 | 30.55 | 33.24 | 32.59 | 30.61 | 28.68 |
| | 30.97 | 31.98 | 30.09 | 32.89 | 33.14 | 36.49 | 37.27 |
| Average | 35.19 | 35.77 | 33.86 | 36.73 | 35.69 | 36.79 | 37.01 |
Table 6. PSNR (dB) of RDASM and RDM using the Set12 and BSD68 datasets for different noise levels (i.e., 15, 25, 35, 50, and 75).

| Dataset | Methods | σ = 15 | σ = 25 | σ = 35 | σ = 50 | σ = 75 |
|---|---|---|---|---|---|---|
| Set12 | RDM | 32.84 | 30.47 | 28.96 | 27.33 | 25.54 |
| | RDASM | 32.99 | 30.67 | 29.17 | 27.53 | 25.73 |
| BSD68 | RDM | 31.62 | 29.16 | 27.70 | 26.27 | 24.79 |
| | RDASM | 31.76 | 29.30 | 29.82 | 26.37 | 24.90 |
Table 7. PSNR (dB) on Set12, BSD68, CBSD68, Kodak24, and McMaster using different pooling methods (noise level σ = 50).

| Datasets | Avg-Pooling | Max-Pooling | CBAM |
|---|---|---|---|
| Set12 | 27.4941 | 27.4028 | 27.4138 |
| BSD68 | 26.3455 | 26.2740 | 26.2614 |
| CBSD68 | 28.3153 | 28.2431 | 28.2083 |
| Kodak24 | 29.5174 | 29.5124 | 29.4744 |
| McMaster | 29.6908 | 29.6906 | 29.6834 |
Table 8. Comparison of parameters and run times. Results are reported as PSNR (dB)/run time for 256 × 256 and 512 × 512 gray images.

| Methods | Device | 256 × 256 (PSNR/Run Time) | 512 × 512 (PSNR/Run Time) | Parameters |
|---|---|---|---|---|
| BM3D | CPU | 25.04/0.59 | 29.05/2.52 | - |
| WNNM | CPU | 25.44/203.1 | 29.25/773.2 | - |
| DnCNN | GPU | 25.70/0.0061 | 29.39/0.0089 | 556 K |
| FFDNet | GPU | 25.75/0.0113 | 29.66/0.0168 | 490 K |
| BRDNet | GPU | 25.77/0.0631 | 29.73/0.2018 | 1115 K |
| ADNet | GPU | 25.70/0.0086 | 29.59/0.0109 | 519 K |
| RDN | GPU | 25.65/0.0203 | 29.75/0.0335 | 21.97 M |
| RDASNet | GPU | 25.94/0.0194 | 25.80/0.0201 | 22.05 M |