A Multi-Scale Feature Extraction-Based Normalized Attention Neural Network for Image Denoising

Wang, Yi; Song, Xiao; Gong, Guanghong; Li, Ni

doi:10.3390/electronics10030319


¹ School of Automation Science and Electrical Engineering, Beihang University, Beijing 100191, China

² School of Cyber Science and Technology, Beihang University, Beijing 100191, China

* Author to whom correspondence should be addressed.

Electronics 2021, 10(3), 319; https://doi.org/10.3390/electronics10030319

Submission received: 28 December 2020 / Revised: 24 January 2021 / Accepted: 25 January 2021 / Published: 29 January 2021

(This article belongs to the Section Computer Science & Engineering)


Abstract

Owing to the rapid development of deep learning and artificial intelligence techniques, denoising via neural networks has drawn great attention for its flexibility and excellent performance. However, most convolutional network denoising methods use convolution kernels of a single size, so features at distinct scales are neglected. Moreover, the convolution operation treats all channels equally; the relationships between channels are not considered. In this paper, we propose a multi-scale feature extraction-based normalized attention neural network (MFENANN) for image denoising. In MFENANN, we define a multi-scale feature extraction block to extract and combine features at distinct scales of the noisy image. In addition, we propose a normalized attention network (NAN) to learn the relationships between channels, which smooths the optimization landscape and speeds up convergence when training an attention model. We then introduce the NAN into convolutional network denoising, in which each channel is assigned a gain so that channels can play different roles in the subsequent convolution. To verify the effectiveness of the proposed MFENANN, we ran experiments on both grayscale and color image sets with noise levels ranging from 0 to 75. The experimental results show that, compared with some state-of-the-art denoising methods, the images restored by MFENANN have larger peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) values and a better overall appearance.

Keywords: image denoising; attention neural network; multi-scale feature extraction; PSNR; SSIM

1. Introduction

Image denoising is a fundamental and classic topic in image processing. Owing to varying environments and sensor noise, a captured image usually contains noise, and transmission and storage may degrade it further [1]. Image denoising is therefore an important and indispensable step for many high-level vision tasks [2,3,4]. Additive white Gaussian noise (AWGN) is the most representative kind of noise, and we make the common assumption that images are degraded by AWGN. An image degraded by AWGN can be modeled as $y = x + v$, where $y$ is the observed degraded image, $x$ is the noiseless clean image and $v$ is AWGN with zero mean and standard deviation $\sigma$. The image denoising problem is to restore the clean image $x$ from the observed image $y$.

Recently, a large number of methods have been proposed for image denoising [5,6,7,8,9]. A direct way to restore the image is to estimate the noise $v$ and obtain the clean image as $y - v$. For a long time, however, accurately estimating the noise was difficult, if not impossible. Only after convolutional neural networks became popular was a deep convolutional residual network proposed to learn the noise [10], and it outperformed many typical denoising methods.

The bilateral filter [11] is widely used for its adaptability and good performance, but its performance decreases rapidly at high noise levels. An improvement is the non-local means (NLM) denoising method [12], which assumes that natural scenes tend to repeat themselves at the same and at different scales. NLM performs better than the bilateral filter, but its hyper-parameters depend on the standard deviation of the noise and are difficult to tune; an improper choice causes it to lose edges or leave noise. Ville [13] used Stein’s unbiased risk estimate to monitor the mean square error and avoid tuning the hyper-parameters. BM3D [14] represents the peak of the improved NLM methods and is a benchmark for image denoising.

Transform domain denoising is another popular family of methods [15,16]. It maps a noisy image to a transform domain and removes noise by adjusting the coefficients. Fourier transform denoising (FTD) is a typical example [17]: it transforms a noisy image to the frequency domain, removes the frequencies associated with noise and recovers the image by the inverse Fourier transform. The difficulty for FTD is determining whether high-frequency information is noise or image features. Wavelet domain denoising is a development of the Fourier approach that maps an image to the wavelet domain; coefficients of higher amplitude are treated as information, and noise is removed by clipping the coefficients of smaller amplitude [18,19]. Rajwade [20] used singular value decomposition for image denoising: noise is assumed to relate to the smaller singular values and is removed by dropping them. Sparse and redundant representation is another popular transform domain method, which trains a redundant dictionary from the noisy image and obtains the restored image by optimizing an objective function with sparse coefficient priors [9]. Protter [21] generalized sparse and redundant representation methods to image sequence denoising. Later, these methods were combined with non-local means to get better performance [22].

There are also many other denoising methods, such as total variation [23,24] and statistical neighborhood approaches [25]. Most of the above methods are model-based: they rely on prior knowledge and are realized by optimization. Their three main difficulties are balancing noise removal against detail preservation, choosing the prior knowledge and searching for the optimal solution.

An alternative is discriminative learning, which learns the mapping from noisy images to the corresponding noiseless clean ones. Burger [26] proposed a plain neural network for image denoising that achieves performance comparable to BM3D. Zhang et al. [10] proposed a fully convolutional network for image denoising. By learning the residue of an image, it can not only remove AWGN but also handle other image processing tasks, such as image super-resolution and image deblocking. However, these discriminative learning methods require a separate model to be trained for each noise level, which is very inconvenient. To tackle this problem, Isogawa [3] proposed a novel activation function with a varying threshold, so that noisy images with different noise levels are restored by a single network. Zhang et al. [27] trained the model on down-sampled sub-images and took the noisy image together with a noise level map as the input, so that noisy images at different noise levels can be handled by a single network. Lefkimmiatis [28] integrated non-local self-similarity into a convolutional neural network (CNN) and obtained results competitive with many state-of-the-art methods.

Although CNNs achieve excellent denoising results, most CNN-based methods use convolution kernels of a single size, so features at distinct scales are neglected. In addition, all feature channels are treated equally; the relationships between channels are not considered. In this paper, we propose a multi-scale feature extraction-based normalized attention neural network (MFENANN) for image denoising. In MFENANN, we define a feature extraction block that extracts and combines features of the noisy image at scales of $1 \times 1$, $3 \times 3$ and $5 \times 5$. Moreover, we introduce 1D normalization techniques into the normalized attention network (NAN), which smooths the optimization landscape in training and refines the relationships between channels. Furthermore, we introduce the NAN into MFENANN for denoising, in which every channel is augmented by assigning it a gain. We train the network on down-sampled images, which enlarges the receptive field and reduces the number of calculations in training. A residual net is used to avoid losing shallow features. Moreover, the network learns the residue of the noisy image, and the noiseless clean image is obtained as the difference between the noisy image and the residue.

In general, the contributions of the paper are summarized as follows: (1) We define a feature extraction block to extract and combine different scale features of the noisy image, which causes the feature maps to contain more detailed information of the original image, and enhances the ability of the network to maintain details. (2) We propose a normalized attention network to learn the relationship between channels, which smooths the optimization landscape and speeds up the convergence process for training an attention model. (3) We introduce NAN to image denoising, in which each channel gets an amount of gain, and channels play different roles in the subsequent convolution, which improves the performance of image denoising.

2. Related Works

2.1. Residual Network

The residual network (ResNet) [29] was proposed by He. The underlying mapping to be fitted by a few stacked layers is denoted $H(X)$, and the stacked layers are instead made to learn the residual mapping $F(X)$, where $X$ is the input. Learning $F(X)$ is called residual learning. The concrete model is defined as:

$$H(X) = F(W, X) + X \tag{1}$$

where $F(\cdot, \cdot)$ is the residual mapping to be learned and $F(W, X) = F(X)$. ResNet helps to avoid vanishing gradients when the network is deep and greatly improves recognition accuracy. Later, He [30] analyzed the theory behind ResNet and improved it. ResNet soon drew great interest and many variants appeared [31,32]. Huang [33] proposed DenseNet, in which each layer takes the outputs of all preceding layers as its input; it is widely used for its flexibility and good performance. ResNet has since been widely used in computer vision tasks, such as image super-resolution [34] and pedestrian trajectory prediction [35].

2.2. Batch Normalization and SENet

With the rapid development of deep learning, many techniques have been proposed to raise efficiency and improve performance. The rectified linear unit (Relu) [36,37] is a widely used non-saturating activation function, which relieves the vanishing gradient problem and accelerates convergence. Convolution [38] greatly reduces the number of calculations by sharing weights. Dropout [39] reduces overfitting. Inception [40] extracts features at different scales and concatenates them for the subsequent convolution.

(1) Batch normalization (BN): BN [41] was proposed to increase classification accuracy; it decreases the number of calculations and simplifies parameter adjustment. Santurkar [42] analyzed why batch normalization improves performance. They found no evidence that the benefit of BN is related to internal covariate shift and proposed instead that BN improves performance by smoothing the optimization landscape in training. They also showed that networks can achieve similar, or even better, performance with other normalization techniques. Recently, BN has been widely used in networks for image denoising [3,10,27].

(2) SENet: SENet [43] is an attention network that learns the relationships between channels, in which every channel is augmented with a gain. SENet squeezes every channel to a single point, its average value; these values are passed through a two-layer fully connected (FC) network whose outputs are the gains, and each channel is then scaled by the corresponding gain. Wang [44] used 1D convolution instead of an FC layer, which improved computational efficiency. Li [45] proposed a selective kernel network, which learns to choose the kernel size for each channel.

3. Proposed MFENANN for Image Denoising

In this section, we detail the MFENANN proposed for image denoising. In MFENANN, we define a simple multi-scale feature extraction block that extracts features at different scales from the noisy image using convolution kernels of different sizes. Moreover, we propose a normalized attention network to learn the relationships between channels, which improves SENet by adding 1D normalization. In addition, we introduce the normalized attention network into CNN denoising, in which each channel is assigned a gain so that channels play different roles in the subsequent convolution. We also define ResNet blocks for MFENANN, which effectively integrate shallow and deep features and avoid the vanishing gradient problem. In the training phase, let the batch size be $N$; we randomly generate $N$ values within the noise level range as the noise standard deviations. We expand each standard deviation into a tensor with the same height and width as the input image to form the noise level map, and add noise with that standard deviation to the corresponding image in the batch. In the testing phase, if the noise level is known, we expand it in the same way into a noise level map.
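The training-batch construction described above can be sketched in plain Python as follows. This is a minimal illustration, not the authors' code: the function name `make_training_batch`, the uniform sampling of the standard deviation and the nested-list image layout are all assumptions made for the example.

```python
import random


def make_training_batch(clean_batch, sigma_range=(0.0, 75.0), rng=None):
    """Draw one noise level per image, add AWGN and build the noise level maps.

    clean_batch: list of N grayscale images, each an H x W list of lists.
    Uniform sampling over sigma_range is an assumption; the text only says
    the values are generated randomly within the noise level range.
    """
    rng = rng or random.Random()
    noisy_batch, level_maps, sigmas = [], [], []
    for image in clean_batch:
        sigma = rng.uniform(*sigma_range)
        # y = x + v, with v ~ N(0, sigma^2) drawn independently per pixel.
        noisy = [[pixel + rng.gauss(0.0, sigma) for pixel in row] for row in image]
        # Noise level map: sigma repeated over the same height and width.
        level_map = [[sigma] * len(row) for row in image]
        noisy_batch.append(noisy)
        level_maps.append(level_map)
        sigmas.append(sigma)
    return noisy_batch, level_maps, sigmas
```

At test time with a known noise level, only the `level_map` construction would be needed.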

3.1. Network Architecture

Figure 1 shows the architecture of the proposed MFENANN. We down-sample the noisy image by interlaced sampling and concatenate the down-sampled sub-images with the noise level map as the input to the network. Suppose the size of the noisy image is $W \times H \times C$, where $W$ is the width of the image, $H$ is the height and $C$ is the number of channels. The size of the input to MFENANN is then $\frac{W}{2} \times \frac{H}{2} \times (4C + 1)$ for a grayscale image and $\frac{W}{2} \times \frac{H}{2} \times (4C + 3)$ for a color image. The multi-scale feature extraction block (MFEBlk) extracts and combines features at distinct scales for the subsequent convolution. The Relu function has the form $\max(0, \cdot)$. MFEBlk is detailed in Figure 2. ResNetBlock is the ResNet block we define, detailed in Figure 3. NAN is the normalized attention block, detailed in Figure 4. MFENANN contains five ResNetBlocks and four NAN blocks, and the number of layers is 21. The output of the network is the residue rather than the clean image. The clean image is obtained as follows:

$$\hat{X} = Y - \mathrm{residue} \tag{2}$$

where $\hat{X}$ is the restored image, $Y$ is the observed noisy image and $\mathrm{residue}$ is the learned residue.

Figure 2 shows the architecture of MFEBlk. $X_{M}$ is the input to MFEBlk; it has 5 channels for a grayscale image and 15 for a color image. In MFEBlk, we define three kinds of convolution: $5 \times 5$, $3 \times 3$ and $1 \times 1$, with 10, 76 and 10 kernels, respectively. Zero-padding is used to keep the size of each feature map unchanged. $Y_{M}$ is the concatenation of the feature maps of the different-scale convolutions. The mathematical model of MFEBlk is defined as follows:

$$Y_{M} = \mathrm{cat}(\mathrm{conv5}_{10}(X_{M}), \mathrm{conv3}_{76}(X_{M}), \mathrm{conv1}_{10}(X_{M})) \tag{3}$$

where $\mathrm{conv5}_{10}(\cdot)$, $\mathrm{conv3}_{76}(\cdot)$ and $\mathrm{conv1}_{10}(\cdot)$ are convolutions with kernel sizes of $5 \times 5$, $3 \times 3$ and $1 \times 1$, respectively, and the subscripts 10, 76 and 10 are the numbers of convolution kernels. We chose these kernel numbers experimentally as a balance between computational cost and performance. $\mathrm{cat}(\cdot, \cdot, \cdot)$ concatenates the channels of the feature maps. The output $Y_{M}$ has 96 channels.

Figure 3 shows the architecture of ResNetBlock. ResNetBlock has four “Conv+BN+Relu” blocks, each of which includes a $3 \times 3$ convolution layer, a BN layer and a Relu activation function. $Z$ is the output of the fourth “Conv+BN+Relu” block and has 96 channels. The output $Y_{R}$ is the sum of the input $X_{R}$ and $Z$:

$$Y_{R} = X_{R} + Z \tag{4}$$

Equation (4) can be rewritten as $Z = Y_{R} - X_{R}$, so learning $Z$ is residual learning. Learning the residue reduces the computational cost of training and avoids the vanishing gradient problem. A NAN block is applied between adjacent ResNetBlocks, so that each channel of a ResNetBlock output is assigned a gain. The architecture of the NAN block is detailed in the next section.
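The input construction at the start of this section (interlaced down-sampling followed by concatenation with the noise level map) can be sketched as below for a grayscale image. This is a hedged illustration only: the function names and the order in which the four sub-images are stacked are assumptions, and the image height and width are assumed to be even.

```python
def interlaced_downsample(image):
    """Split an H x W image (H, W even) into four H/2 x W/2 sub-images.

    Sub-image k takes the pixels at row offset dy and column offset dx,
    with (dy, dx) running over (0,0), (0,1), (1,0), (1,1). This ordering
    is an assumption made for the example.
    """
    return [[row[dx::2] for row in image[dy::2]] for dy in (0, 1) for dx in (0, 1)]


def interlaced_upsample(sub_images):
    """Inverse of interlaced_downsample: interleave four sub-images."""
    h, w = len(sub_images[0]), len(sub_images[0][0])
    out = [[0] * (2 * w) for _ in range(2 * h)]
    for k, sub in enumerate(sub_images):
        dy, dx = divmod(k, 2)
        for i in range(h):
            for j in range(w):
                out[2 * i + dy][2 * j + dx] = sub[i][j]
    return out


def build_network_input(noisy_image, sigma):
    """Stack the four sub-images and one noise level map: 4C + 1 = 5 channels for C = 1."""
    subs = interlaced_downsample(noisy_image)
    h, w = len(subs[0]), len(subs[0][0])
    level_map = [[sigma] * w for _ in range(h)]
    return subs + [level_map]
```

For a color image the same split would be applied to each of the three channels, giving the 15 input channels stated in the text. The text does not spell out how the network output is mapped back to full resolution before Equation (2) is applied; an inverse such as `interlaced_upsample` is one plausible way to do it.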

3.2. NAN

Figure 4 shows the architecture of the NAN block. Assume the input $X_{N}$ has 96 channels. The squeeze operation is an average pooling function, which squeezes each channel to a single point. For each channel $X_{N}^{l}$, the squeeze operation is described as follows:

$$x_{N}^{l} = \frac{1}{H \times W} \sum_{i = 1}^{H} \sum_{j = 1}^{W} X_{N}^{l}(i, j) \tag{5}$$

where $X_{N}^{l}(i, j)$ is the amplitude at position $(i, j)$ and $x_{N}^{l}$ is the mean value of channel $X_{N}^{l}$. FC is a fully connected layer with 96 neurons. The 1DBN and Relu layers avoid the vanishing gradient problem and accelerate convergence. Suppose the batch size is $k$, so that each training step contains $k$ samples. The 1DBN is described as follows:

$$\tilde{x}_{bn} = \frac{x_{bn} - E(x_{bn})}{\sqrt{\mathrm{var}(x_{bn})}} \tag{6}$$

$$y_{bn} = \gamma \tilde{x}_{bn} + \beta \tag{7}$$

where $x_{bn}$ and $y_{bn}$ are the input and output vectors of the BN block, the mean $E(\cdot)$ and variance $\mathrm{var}(\cdot)$ are taken over the $k$ samples of the batch, and $\gamma$ and $\beta$ are parameters that are updated during back propagation. In the last layer, the sigmoid function maps its input to the range 0–1. We denote the output of the sigmoid function by $s$, a 96-dimensional vector $(s_{1}, s_{2}, \dots, s_{n-1}, s_{n})$ with $n = 96$. For each channel, the scale operation is described as follows:

$$Y_{N}^{m} = X_{N}^{m} \cdot s_{m}, \quad m = 1, 2, \dots, n \tag{8}$$

where $Y_{N}^{m}$ and $X_{N}^{m}$ are the $m$th channels of $Y_{N}$ and $X_{N}$, respectively, and $s_{m}$ is the $m$th element of $s$. Equation (8) shows that every channel of $X_{N}$ is scaled by the corresponding element of $s$.
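Equations (5) to (8) can be strung together in a small plain-Python sketch of a NAN-style block. Several things here are assumptions rather than details taken from the text: the layer order (squeeze, FC, 1DBN, Relu, FC, sigmoid, scale) and the use of two FC layers follow SENet, because the exact layout of Figure 4 is not reproduced in the text; the small `eps` in the normalization is added for numerical stability and does not appear in Equation (6); all function names are illustrative.

```python
import math


def squeeze(x):
    """Eq. (5): global average pooling. x is [C][H][W]; returns C channel means."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]


def fully_connected(v, weights, bias):
    """Plain FC layer: out[j] = sum_i weights[j][i] * v[i] + bias[j]."""
    return [sum(w * a for w, a in zip(row, v)) + b for row, b in zip(weights, bias)]


def batch_norm_1d(batch, gamma, beta, eps=1e-5):
    """Eqs. (6)-(7), applied per feature over a batch of k vectors."""
    k, n = len(batch), len(batch[0])
    out = [[0.0] * n for _ in range(k)]
    for j in range(n):
        col = [batch[i][j] for i in range(k)]
        mean = sum(col) / k
        var = sum((c - mean) ** 2 for c in col) / k
        for i in range(k):
            out[i][j] = gamma[j] * (col[i] - mean) / math.sqrt(var + eps) + beta[j]
    return out


def relu(v):
    return [max(0.0, a) for a in v]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def scale(x, s):
    """Eq. (8): multiply every pixel of channel m by the gain s[m]."""
    return [[[p * g for p in row] for row in ch] for ch, g in zip(x, s)]


def nan_block(batch_x, w1, b1, gamma, beta, w2, b2):
    """Apply the assumed NAN pipeline to a batch of [C][H][W] feature maps."""
    squeezed = [squeeze(x) for x in batch_x]
    hidden = [fully_connected(v, w1, b1) for v in squeezed]
    hidden = [relu(v) for v in batch_norm_1d(hidden, gamma, beta)]
    gains = [sigmoid(fully_connected(v, w2, b2)) for v in hidden]
    return [scale(x, s) for x, s in zip(batch_x, gains)], gains
```

Because the sigmoid output lies strictly between 0 and 1 for finite inputs, each channel is attenuated by a learned factor rather than amplified; the relative sizes of the gains are what let channels play different roles downstream.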

3.3. Role of MFEBlk

Inspired by inception [40], we propose MFEBlk to extract features at distinct scales using kernels of different sizes. In image denoising, we want the restored image to retain as many features as possible, so in the first layer we use $5 \times 5$, $3 \times 3$ and $1 \times 1$ convolution kernels to extract features at distinct scales from the noisy image. A larger kernel generally requires more calculations; therefore, we use fewer $5 \times 5$ and more $3 \times 3$ convolution kernels to balance feature extraction against the number of calculations. The $1 \times 1$ convolution kernels are used to increase nonlinearity and reduce the number of parameters and calculations [46,47].
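The trade-off between kernel size and cost can be made concrete by counting the parameters of the three branches with the kernel numbers given in Section 3.1 (10, 76 and 10). The sketch below is illustrative arithmetic only; the helper names are made up for the example, and including a bias term per kernel is an assumption, since the text does not say whether the convolutions use biases.

```python
# (kernel size, number of kernels) for the three MFEBlk branches.
BRANCHES = ((5, 10), (3, 76), (1, 10))


def conv_params(kernel, in_channels, out_channels, bias=True):
    """Parameters of one convolution layer: k*k*C_in*C_out weights plus optional biases."""
    weights = kernel * kernel * in_channels * out_channels
    return weights + (out_channels if bias else 0)


def same_padding(kernel):
    """Zero-padding per side that keeps the feature-map size for an odd kernel, stride 1."""
    return (kernel - 1) // 2


def mfeblk_summary(in_channels):
    """Return (output channels, params per branch, padding per branch)."""
    out_channels = sum(n for _, n in BRANCHES)
    params = {k: conv_params(k, in_channels, n) for k, n in BRANCHES}
    padding = {k: same_padding(k) for k, _ in BRANCHES}
    return out_channels, params, padding
```

For the 5-channel grayscale input, a single $5 \times 5$ kernel carries 125 weights against 45 for a $3 \times 3$ kernel, which is consistent with the choice of few $5 \times 5$ and many $3 \times 3$ kernels.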

3.4. Complexity Analysis

In MFENANN, the introduction of MFEBlk, the skip connections in the ResNetBlocks and the NANs increase the network complexity. Because the number of floating-point operations (FLOPs) depends on the input image size, we use a grayscale image of size $256 \times 256$ and a color image of size $768 \times 512$ to compute the increase in parameters and calculations caused by introducing MFEBlk, skip connections and NANs. Compared with a plain neural network with the same number of layers, the number of model parameters increases by 0.113 M for both grayscale and color images, and the number of calculations increases by 0.013 GFLOPs and 0.099 GFLOPs, respectively. Thus MFEBlk, skip connections and NANs add only a small number of parameters and calculations. In addition, for a grayscale image of size $256 \times 256$, MFENANN requires 9.06 GFLOPs fewer than DnCNN and 19.42 GFLOPs more than FFDNet.

4. Experiments

4.1. Dataset Generation and Experimental Settings

For AWGN removal, we train the proposed network on 4744 images from the Waterloo Exploration Database [48], from which we extract image patches of size $44 \times 44$ with a stride of 30. Approximately $577 \times 10^{3}$ image patches are chosen for training the network. To every patch $x_{i}$ we add AWGN and denote the resulting noisy patch by $y_{i}$. The added noise has noise levels ranging from 0 to 75. We use the image sets “BSD68” [49] and “Set12” to evaluate the proposed network on grayscale image denoising, and “CBSD68” [49], “Kodak24” [50] and “McMaster” [51] to evaluate it on color image denoising. For MFENANN, the input has 5 channels for grayscale images and 15 channels for color images.

The experiments were performed in a PyTorch 1.1 environment on a PC with the Ubuntu 16.04 operating system, an Intel(R) Core(TM) i7-8700 CPU, 16 GB of RAM and an NVIDIA RTX 2070 GPU. We choose a loss function similar to that of [10]:

$$L(\Theta) = \frac{1}{2N} \sum_{i = 1}^{N} \left\| R(y_{i}; \Psi; \Theta) - (y_{i} - x_{i}) \right\|_{F}^{2} \tag{9}$$

where $\Theta$ denotes the network parameters to be learned, $L(\cdot)$ is the loss function, $\Psi$ is the noise level map, $R(\cdot; \cdot; \cdot)$ is the output residue and $\{x_{i}, y_{i}\}$ are the clean-noisy image patch pairs used for training. The loss function drives the residue toward the noise. We use PSNR [52] and SSIM [53] to measure the quality of the restored images. We adopt the Adam optimizer [54] with its default settings and train MFENANN for 60 epochs. The initial learning rate is $1 \times 10^{-3}$, and it decays to $1 \times 10^{-5}$ and $1 \times 10^{-6}$ at epochs 40 and 50, respectively. The batch size is 128. Training takes approximately 21 h.
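As a rough illustration of Equation (9) and the learning-rate schedule, the following plain-Python sketch evaluates the loss on nested-list patches and returns the learning rate for a given epoch. The function names are illustrative, and the schedule assumes zero-based epochs with each decay taking effect at the start of epochs 40 and 50; the text does not state the indexing convention.

```python
def residual_loss(residues, noisy, clean):
    """Eq. (9): 1/(2N) * sum_i || R_i - (y_i - x_i) ||_F^2.

    residues, noisy, clean: lists of N grayscale patches (H x W lists of lists),
    where residues[i] is the network output for noisy[i].
    """
    n = len(residues)
    total = 0.0
    for r, y, x in zip(residues, noisy, clean):
        for r_row, y_row, x_row in zip(r, y, x):
            for rp, yp, xp in zip(r_row, y_row, x_row):
                # The target of the residue is the noise y - x.
                total += (rp - (yp - xp)) ** 2
    return total / (2 * n)


def learning_rate(epoch):
    """Piecewise-constant schedule: 1e-3, then 1e-5 from epoch 40, then 1e-6 from epoch 50."""
    if epoch < 40:
        return 1e-3
    if epoch < 50:
        return 1e-5
    return 1e-6
```

The loss is zero exactly when the predicted residue equals the added noise, which is why subtracting the residue from the noisy image in Equation (2) recovers the clean image.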

4.2. Comparison Methods

To measure the performance, we compare our algorithm with some state-of-the-art denoising methods, including conventional methods (i.e., BM3D [14] and WNNM [55]), a sparse and redundant representation method (i.e., SRR [9]) and discriminative learning methods (i.e., MLP [26], DnCNN [10], FFDNet [27] and BDMGIN [56]). For SRR, there are three ways to build the dictionary (discrete cosine transform, global training and adaptive training), and the corresponding denoising methods are referred to as SRR-DCT, SRR-G and SRR-A, respectively. For DnCNN, two training schemes are used to remove noise with known and unknown noise levels, referred to as DnCNN-S and DnCNN-B, respectively. DnCNN-S needs a separate network for each noise level, whereas DnCNN-B trains a single network to remove noise at all noise levels. BDMGIN is designed to remove mixed Gaussian-impulse noise; in this section, we set the impulse noise density to 0 so that it removes Gaussian noise only.

4.3. Ablation Experiment

To verify the roles of MFEBlk and NAN in MFENANN, we train networks with MFEBlk and NAN removed, respectively. We refer to the network without MFEBlk as NANN and the network without NAN as MFEN. We also train a plain convolutional network with the same number of channels and layers as MFENANN for image denoising and refer to it as plainNet. Table 1 shows the average PSNR values of the images in “Set12” restored by plainNet, NANN, MFEN and MFENANN. NANN and MFEN have larger PSNR values than plainNet, which indicates that each of MFEBlk and NAN improves performance on its own. MFENANN has larger PSNR values than NANN and MFEN, which shows that combining MFEBlk and NAN gives the network better denoising performance.

4.4. ResNet vs. DenseNet

Densely connected convolutional networks (DenseNet) are effective at improving the performance of object recognition [33]. In this section, we use DenseNet for image denoising and increase the number of convolution layers to 31. The settings of DenseNet are the same as in [33]. We refer to the proposed network with DenseNet blocks in place of ResNet blocks as DenseMFENANN. Table 2 shows that DenseMFENANN has higher PSNR values for the images “Peppers” and “Starfish” at a noise level of 25 and for the image “Peppers” at a noise level of 75. MFENANN has higher PSNR values in the other cases and higher average PSNR values at all noise levels. This means that, within the proposed network, ResNet performs better than DenseNet for image denoising. Therefore, in this paper, we use ResNet instead of DenseNet.

4.5. Experimental Results and Analysis

We refer to the networks trained with $577 \times 10^{3}$ and $1 \times 10^{6}$ patches as MFENANN-5 and MFENANN-10, respectively. Table 3 shows the average PSNR values of the images in “BSD68” restored by MFENANN-5 and MFENANN-10. MFENANN-5 has higher PSNR values at noise levels of 15 and 45, the two models have the same value at a noise level of 25, and MFENANN-10 has higher PSNR values at noise levels of 35, 55, 65 and 75. In general, the images restored by the two models have similar average PSNR values at all of these noise levels; increasing the number of training samples does not bring a performance improvement, but it costs more than twice the time. Therefore, in the following experiments, we use $577 \times 10^{3}$ patches to train the networks.

Table 4 shows the PSNR values of several state-of-the-art methods at noise levels of 15, 25, 35, 50 and 75. At a noise level of 15, WNNM achieved the largest PSNR value for the image “Barbara”, followed by BM3D. This is because “Barbara” contains many stripe textures, whereas the MSE loss function tends to represent smooth and prominent structural information. MFENANN has the largest PSNR values for the other images and achieved the largest average PSNR value among all methods at this noise level. At noise levels of 25, 35, 50 and 75, the PSNR values follow the same pattern: WNNM got the largest PSNR values for “Barbara”, followed by BM3D, while MFENANN got the largest PSNR values for the other images and the largest average PSNR values. For each method except BDMGIN, the PSNR value decreased as the noise level increased. BDMGIN has the smallest average PSNR values at noise levels of 15, 25, 35 and 50, because it is designed to remove mixed Gaussian-impulse noise and is not good at dealing with Gaussian noise alone. Compared with the other methods, the advantage of MFENANN grows as the noise level increases. As the noise level grows, the effective information decreases and traditional prior-based methods no longer work well, whereas MFENANN restores images by learning the correlations between noisy-clean image patch pairs. Moreover, MFENANN extracts and integrates features of the noisy image at different scales and attends to the correlations between channels, which reduces the influence of increasing noise levels.

Table 5 shows the SSIM values of the images restored by several state-of-the-art algorithms at noise levels of 15, 25, 35, 50 and 75. At a noise level of 15, BM3D got the largest SSIM value for the image “Barbara”, followed by the proposed MFENANN. At noise levels of 25, 35, 50 and 75, MFENANN got the largest SSIM values for all images. BDMGIN has the smallest and MFENANN the largest average SSIM values at all noise levels, which shows the superiority of MFENANN.

Table 6 shows the average PSNR values of the images in “BSD68” for several state-of-the-art methods at noise levels of 15, 25, 35, 50 and 75. In general, the discriminative deep learning methods other than BDMGIN (i.e., MLP, DnCNN, FFDNet and MFENANN) got larger PSNR values than the traditional denoising methods (i.e., BM3D and WNNM). At a noise level of 15, MFENANN got the largest PSNR value, which is numerically close to that of DnCNN. This is because DnCNN trains a specific model for this low noise level, which offsets the advantage of the network design. At noise levels of 25, 35, 50 and 75, MFENANN got the largest PSNR values, which shows its superiority. BDMGIN has the smallest PSNR values at all noise levels.

Figure 5 shows the images restored by several state-of-the-art methods for “test033” in “BSD68” at a noise level of 50. At such a high noise level, the noisy image has lost many details; it looks very poor visually and cannot be used directly for many high-level image processing tasks. BM3D removes noise effectively, but the restored image is over-smoothed and many details are lost. SRR-A introduces too many artifacts while removing noise. In general, the deep learning methods give a better overall appearance than BM3D and SRR-A, but DnCNN-S, DnCNN-B and FFDNet also over-smoothed the restored image and introduced some artifacts. The image restored by MFENANN has the most details and the best overall appearance. Quantitatively, the discriminative learning methods get larger PSNR values than BM3D and SRR-A, and MFENANN got the largest PSNR value among all discriminative learning methods.

Figure 6 shows the images restored by some state-of-the-art methods for “test045” in “BSD68” at a noise level of 25. BM3D heavily over-smoothed the image, and many details were lost. SRR-A also over-smoothed the image and introduced many artifacts. The images restored by DnCNN-S and DnCNN-B have more details than those of BM3D and SRR-A, but the two methods also lost many texture features. FFDNet achieved a better overall appearance than the previous methods, but it still lost many details. The image restored by MFENANN has the most details and the best overall appearance, and it has the largest PSNR value among all methods.

Figure 7 shows the restored images of “test064” in “BSD68” at a noise level of 75. At such a high noise level, the observed image is heavily degraded and many significant details are lost. BM3D removes noise effectively, but the restored image is over-smoothed and many significant details are lost. SRR (including SRR-DCT, SRR-G and SRR-A) could not effectively eliminate noise at such a high noise level and caused too many artifacts. The image restored by FFDNet has a better overall appearance than the images restored by the previous methods, but it is still over-smoothed and many details are lost. The image restored by MFENANN has the most details and the best overall appearance. Quantitatively, the deep learning methods (i.e., FFDNet and MFENANN) have larger PSNR values than BM3D and SRR, and MFENANN has a larger PSNR value than FFDNet.

In practice, the noise level is usually unknown. In this paper, instead of estimating the noise level, we traversed the entire noise level range with a stride of 1, computed the PSNR values of all restored images and chose the one with the largest value as the output. We used this setting for BM3D, FFDNet and MFENANN; the blind denoising methods BDMGIN and DnCNN-B do not need the noise level as an input. Table 7 shows the PSNR values of the images restored by several state-of-the-art methods under unknown noise levels. At a noise level of 26.2, BM3D has the largest PSNR value for the image “Barbara”, followed by MFENANN. MFENANN has the largest PSNR values for the other images and the largest average PSNR value at this noise level. At a noise level of 37.4, the PSNR values follow a similar pattern: BM3D has the largest value for “Barbara”, and MFENANN has the largest PSNR values for the other images. At noise levels of 48.6 and 55.8, MFENANN got the largest PSNR values for all images. MFENANN got the largest average PSNR values at all noise levels. These results indicate that, at noise levels of 26.2, 37.4, 48.6 and 55.8, the proposed MFENANN performs better than the compared methods when the noise level is unknown.
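The traversal described above can be sketched as follows. This is an illustration under stated assumptions, not the evaluation code of the paper: `denoise` is a hypothetical callable standing in for any non-blind denoiser that takes a noisy image and a candidate noise level, images are nested lists of 8-bit intensities (peak value 255), and the PSNR is computed with its standard definition against the clean reference image.

```python
import math


def psnr(restored, reference, peak=255.0):
    """Standard PSNR in dB: 10 * log10(peak^2 / MSE); infinite for identical images."""
    count = sum(len(row) for row in reference)
    squared_error = sum(
        (a - b) ** 2
        for row_a, row_b in zip(restored, reference)
        for a, b in zip(row_a, row_b)
    )
    mse = squared_error / count
    return float("inf") if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)


def best_over_noise_levels(noisy, reference, denoise, levels=range(0, 76)):
    """Try every candidate noise level (stride 1) and keep the highest-PSNR result.

    Returns (best_psnr, best_level, best_restored). Ties keep the first level found.
    """
    best = None
    for level in levels:
        restored = denoise(noisy, level)
        score = psnr(restored, reference)
        if best is None or score > best[0]:
            best = (score, level, restored)
    return best
```

Since PSNR is measured against the clean image, this selection is only available in a benchmark setting where the ground truth is known.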

Table 8 shows the average PSNR values for color images in the image sets “McMaster”, “Kodak24” and “CBSD68” restored by several state-of-the-art methods. For “McMaster”, at noise levels of 15, 25 and 75, CBM3D has larger PSNR values than CDnCNN, which indicates that CBM3D is superior to CDnCNN at these noise levels. FFDNet has larger PSNR values than CBM3D and CDnCNN at all noise levels. MFENANN has the largest average PSNR values among all methods at all noise levels, which shows that MFENANN performs best for “McMaster”. For “Kodak24”, CBM3D has a larger PSNR value than CDnCNN at a noise level of 75 and smaller PSNR values at the other noise levels. FFDNet has larger PSNR values than CBM3D and CDnCNN, and MFENANN has the largest PSNR values among all methods. For “CBSD68”, as for “McMaster” and “Kodak24”, MFENANN has the largest PSNR values, which shows that MFENANN performs best among all methods at all noise levels.

Figure 8 shows the images restored by several state-of-the-art methods for image “kodim05” in “Kodak24” at a noise level of 50. CBM3D over-smooths the restored image while removing noise, so the image looks slightly blurry. The image restored by FFDNet has a better overall appearance than that of CBM3D, but many details are still lost. The image restored by MFENANN retains the most detail and has the best overall appearance among all methods. Quantitatively, MFENANN has the largest PSNR value among all methods (26.61 dB, versus 26.32 dB for FFDNet and 24.29 dB for CBM3D). These results show that MFENANN performs better than CBM3D and FFDNet for image “kodim05” at a noise level of 50.

Table 9 shows the average runtimes of several state-of-the-art methods for images in “Set12” at noise levels of 15, 25, 35, 50 and 75. For a fair comparison, all algorithms were run on the CPU. SRR-A took more time than the other methods because it constructs a dictionary adaptively, which is time-consuming. The deep learning methods (i.e., DnCNN-B, FFDNet and MFENANN) took less time than the traditional methods. MFENANN took slightly more time than DnCNN-B and FFDNet (about 0.48 s per image, versus about 0.21 s and 0.37 s, respectively) because it contains some improved SENet blocks. In general, the time used by MFENANN is comparable to that of FFDNet, and the extra time is negligible for image denoising.
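
An average runtime of the kind reported in Table 9 can be measured with a small harness like the one below. This is a generic sketch: the paper does not describe its timing protocol, so the number of repeats and the use of `time.perf_counter` are assumptions, and `moving_average` is a hypothetical stand-in for a real denoiser.

```python
import time


def average_runtime(fn, inputs, repeats=3):
    """Mean wall-clock seconds per call of fn, averaged over inputs and repeats."""
    total = 0.0
    for x in inputs:
        for _ in range(repeats):
            start = time.perf_counter()
            fn(x)
            total += time.perf_counter() - start
    return total / (len(inputs) * repeats)


def moving_average(signal):
    """Hypothetical stand-in for a denoiser: 3-tap moving average."""
    padded = [signal[0]] + list(signal) + [signal[-1]]
    return [(padded[i] + padded[i + 1] + padded[i + 2]) / 3.0
            for i in range(len(signal))]


# Twelve synthetic inputs, mirroring the twelve images in "Set12".
images = [[float((i * 7 + k) % 256) for i in range(10_000)] for k in range(12)]
seconds = average_runtime(moving_average, images)
print(f"{seconds:.4f} s per image")
```

Running every method on the same device, as the text describes, matters because wall-clock measurements are only comparable across methods under identical hardware conditions.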

5. Conclusions

This paper proposes a novel multi-scale feature extraction-based normalized attention neural network for image denoising. The MFEBlk extracts features at distinct scales from the noisy image and integrates them. The NAN blocks learn the relationships between channels, so that each channel acquires a gain and channels can play different roles in the subsequent convolution. The residual unit effectively avoids vanishing gradients and the loss of shallow features. Experimental results showed that the proposed MFENANN can effectively eliminate noise for noise levels ranging from 0 to 75. Moreover, compared to some state-of-the-art denoising methods, MFENANN has larger PSNR values and a better overall appearance.
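
The channel-gain idea summarized above can be illustrated with a generic squeeze-and-excitation-style gate. This is a minimal sketch and not the authors' NAN block: the placement of the normalization step (standardizing the squeezed channel descriptor), the layer sizes and the weights `w1` and `w2` are all assumptions made for illustration.

```python
import math


def channel_gains(feature_maps, w1, w2, eps=1e-5):
    """SE-style gains: squeeze, standardize, FC + ReLU, FC + sigmoid."""
    # Squeeze: global average of each channel (feature_maps[c] is a 2-D list).
    z = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_maps]
    # Assumed normalization step: zero-mean, unit-variance channel descriptor.
    mean = sum(z) / len(z)
    var = sum((v - mean) ** 2 for v in z) / len(z)
    z = [(v - mean) / math.sqrt(var + eps) for v in z]
    # Excitation: bottleneck layer with ReLU, then a layer with sigmoid.
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in w2]
    return [1.0 / (1.0 + math.exp(-x)) for x in logits]


def apply_gains(feature_maps, gains):
    """Scale every pixel of channel c by gains[c]."""
    return [[[g * px for px in row] for row in ch]
            for ch, g in zip(feature_maps, gains)]


# Three 2x2 channels, a 3 -> 2 -> 3 bottleneck with arbitrary weights.
features = [[[1.0, 2.0], [3.0, 4.0]],
            [[0.0, 0.0], [0.0, 0.0]],
            [[5.0, 5.0], [5.0, 5.0]]]
w1 = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
w2 = [[1.0, -1.0], [0.5, 0.5], [-0.3, 0.2]]
gains = channel_gains(features, w1, w2)
scaled = apply_gains(features, gains)
print([round(g, 3) for g in gains])
```

Each gain lies strictly between 0 and 1, so the gate can only attenuate a channel relative to the others; this is how different channels come to contribute differently to the next convolution.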

Applications and future research: The proposed MFENANN can be embedded in imaging equipment and integrated into application software to improve image quality. In addition, a noisy image sequence contains different characteristics of the image. Designing a deep neural network that extracts features from a noisy image sequence and fuses them to obtain a restored high-quality image is worthy of further study.

Author Contributions

Y.W. and X.S. proposed the method; Y.W. analyzed the data and drafted the paper; G.G. and N.L. guided students to do the experiments and revised the paper. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Key Research and Development Program of China under Grant 2018YFB1702703.

Conflicts of Interest

The authors declare no conflict of interest.

Figure 1. The architecture of MFENANN.

Figure 2. The architecture of MFEBlk.

Figure 3. The architecture of ResNetBlock.

Figure 4. The architecture of NAN block.

Figure 5. Experimental results of several state-of-the-art algorithms for image “test033” in “BSD68” at a noise level of 50. The PSNR values are: (a) noisy image, 14.69 dB; (b) BM3D, 23.64 dB; (c) SRR-A, 23.78 dB; (d) DnCNN-S, 24.81 dB; (e) DnCNN-B, 24.77 dB; (f) FFDNet, 24.86 dB; (g) MFENANN, 24.93 dB.

Figure 6. Experimental results of several state-of-the-art algorithms for image “test045” in “BSD68” at a noise level of 25. The PSNR values are: (a) noisy image, 20.14 dB; (b) BM3D, 31.75 dB; (c) SRR-A, 32.19 dB; (d) DnCNN-S, 33.62 dB; (e) DnCNN-B, 33.58 dB; (f) FFDNet, 33.78 dB; (g) MFENANN, 33.95 dB.

Figure 7. Experimental results of several state-of-the-art algorithms for image “test064” in “BSD68” at a noise level of 75. The PSNR values are: (a) noisy image, 10.63 dB; (b) BM3D, 22.11 dB; (c) SRR-DCT, 22.26 dB; (d) SRR-G, 22.45 dB; (e) SRR-A, 22.42 dB; (f) FFDNet, 23.16 dB; (g) MFENANN, 23.21 dB.

Figure 8. Experimental results of several state-of-the-art algorithms for image “kodim05” in “Kodak24” at a noise level of 50. The PSNR values are: (a) noisy image, 14.14 dB; (b) CBM3D, 24.29 dB; (c) FFDNet, 26.32 dB; (d) MFENANN, 26.61 dB.

Table 1. Average PSNR (dB) values for images in “Set12” restored by plainNet, NANN, MFEN and MFENANN at noise levels of 15, 25, 35, 50 and 75. The bold numbers are the largest ones at each corresponding noise level.

Methods	$σ = 15$	$σ = 25$	$σ = 35$	$σ = 50$	$σ = 75$
plainNet	32.75	30.41	28.99	27.28	25.49
NANN	32.88	30.58	29.07	27.48	25.66
MFEN	32.90	30.59	29.05	27.48	25.62
MFENANN	32.95	30.63	29.14	27.55	25.75

Table 2. The PSNR (dB) values for restored images from “Set12” using DenseMFENANN (DMFENANN) and MFENANN at noise levels of 15, 25, 35, 50 and 75. The bold numbers are the largest from the corresponding noise levels.

Images	C.man	House	Pepp.	Starf.	Mona.	Airpl.	Parrot	Lena	Barb.	Boat	Man	Couple	Aver.
$σ$ = 15
DMFENANN	32.47	35.22	33.35	32.13	32.89	31.79	31.95	34.66	32.46	32.42	32.45	32.50	32.86
MFENANN	32.62	35.30	33.49	32.23	33.11	31.80	31.99	34.76	32.61	32.47	32.48	32.61	32.95
$σ$ = 25
DMFENANN	30.13	33.46	31.39	29.60	30.22	29.15	29.47	32.61	30.10	30.29	30.06	30.24	30.56
MFENANN	30.26	33.51	31.03	29.58	30.57	29.31	29.54	32.76	30.22	30.33	30.12	30.34	30.63
$σ$ = 35
DMFENANN	28.68	32.15	29.34	27.68	28.69	27.66	28.08	31.26	28.45	28.83	28.71	28.72	29.02
MFENANN	28.81	32.30	29.57	27.85	28.79	27.68	28.13	31.36	28.60	28.90	28.76	28.89	29.14
$σ$ = 50
DMFENANN	27.29	30.78	27.72	25.99	26.93	25.88	26.57	29.65	26.66	27.37	27.32	27.23	27.45
MFENANN	27.31	30.84	27.74	25.99	27.04	26.05	26.76	29.83	26.93	27.43	27.33	27.30	27.55
$σ$ = 75
DMFENANN	25.46	28.59	25.87	23.68	24.84	24.26	25.05	27.85	24.29	25.64	25.74	25.34	25.55
MFENANN	25.71	29.22	25.82	23.89	25.08	24.36	25.13	28.02	24.61	25.74	25.90	25.48	25.75

Table 3. Average PSNR (dB) values for images in “BSD68” restored by MFENANN trained with $577 \times 10^{3}$ and $1 \times 10^{6}$ patches. The bold numbers are the largest ones at each corresponding noise level.

Number of Patches	$σ = 15$	$σ = 25$	$σ = 35$	$σ = 45$	$σ = 55$	$σ = 65$	$σ = 75$
MFENANN-5	31.73	29.29	27.82	26.80	26.00	25.37	24.86
MFENANN-10	31.72	29.29	27.83	26.79	26.01	25.40	24.89

Table 4. The PSNR (dB) values for restored images of “Set12” using several state-of-the-art algorithms at noise levels of 15, 25, 35, 50 and 75. The bold numbers are the largest from the corresponding noise levels.

Images	C.man	House	Pepp.	Starf.	Mona.	Airpl.	Parrot	Lena	Barb.	Boat	Man	Couple	Aver.
$σ$ = 15
BM3D	31.91	34.93	32.69	31.14	31.85	31.07	31.37	34.26	33.10	32.13	31.92	32.10	32.37
WNNM	32.17	35.13	32.99	31.82	32.71	31.39	31.62	34.27	33.60	32.27	32.11	32.17	32.70
BDMGIN	23.96	33.22	28.85	30.36	31.18	30.57	24.48	32.84	30.10	30.61	31.12	29.30	29.72
DnCNN	32.61	34.97	33.30	32.20	33.09	31.70	31.83	34.62	32.64	32.42	32.46	32.47	32.86
FFDNet	32.42	35.01	33.10	32.02	32.77	31.58	31.77	34.63	32.50	32.35	32.40	32.45	32.75
MFENANN	32.62	35.30	33.49	32.23	33.11	31.80	31.99	34.76	32.61	32.47	32.48	32.61	32.95
$σ$ = 25
BM3D	29.45	32.85	30.16	28.56	29.25	28.42	28.93	32.07	30.71	29.90	29.61	29.71	29.97
WNNM	29.64	33.22	30.42	29.03	29.84	28.69	29.15	32.24	31.24	30.03	29.76	29.82	30.26
BDMGIN	25.36	30.45	27.88	27.21	28.26	27.83	26.42	29.94	27.20	28.21	28.27	27.78	27.90
MLP	29.61	32.56	30.30	28.82	29.61	28.82	29.25	32.25	29.54	29.97	29.88	29.73	30.03
DnCNN	30.18	33.06	30.87	29.41	30.28	29.13	29.43	32.44	30.00	30.21	30.10	30.12	30.43
FFDNet	30.06	33.27	30.79	29.33	30.14	29.05	29.43	32.59	29.98	30.23	30.10	30.18	30.43
MFENANN	30.26	33.51	31.03	29.58	30.57	29.31	29.54	32.76	30.22	30.33	30.12	30.34	30.63
$σ$ = 35
BM3D	27.92	31.36	28.51	26.86	27.58	26.83	27.40	30.56	28.98	28.43	28.22	28.15	28.40
WNNM	28.08	31.92	28.75	27.27	28.13	27.10	27.69	30.73	29.48	28.54	28.33	28.24	28.69
BDMGIN	26.33	29.47	27.32	25.64	26.67	25.81	26.32	28.86	25.84	27.21	26.98	26.94	26.95
MLP	28.08	31.18	28.54	27.12	27.97	27.22	27.72	30.82	27.62	28.53	28.47	28.24	28.46
DnCNN	28.61	31.61	29.14	27.53	28.51	27.52	27.94	30.91	28.09	28.72	28.66	28.52	28.82
FFDNet	28.54	31.99	29.18	27.58	28.54	27.47	28.02	31.20	28.29	28.82	28.70	28.68	28.92
MFENANN	28.81	32.30	29.57	27.85	28.79	27.68	28.13	31.36	28.60	28.90	28.76	28.89	29.14
$σ$ = 50
BM3D	26.13	29.69	26.68	25.04	25.82	25.10	25.90	29.05	27.22	26.78	26.81	26.46	26.72
WNNM	26.45	30.33	26.95	25.44	26.32	25.42	26.14	29.25	27.79	26.97	26.94	26.64	27.05
BDMGIN	25.82	28.45	25.96	24.26	25.34	24.62	25.49	27.80	24.61	26.16	26.13	25.93	25.88
MLP	26.37	29.64	26.68	25.43	26.26	25.56	26.12	29.32	25.24	27.03	27.06	26.67	26.78
DnCNN	27.03	30.00	27.32	25.70	26.78	25.87	26.48	29.39	26.22	27.20	27.24	26.90	27.18
FFDNet	27.03	30.43	27.43	25.77	26.88	25.90	26.58	29.68	26.48	27.32	27.30	27.07	27.32
MFENANN	27.31	30.84	27.74	25.99	27.04	26.05	26.76	29.83	26.93	27.43	27.33	27.30	27.55
$σ$ = 75
BM3D	24.32	27.51	24.73	23.27	23.91	23.48	24.18	27.25	25.12	25.12	25.32	24.70	24.91
WNNM	24.60	28.24	24.96	23.49	24.31	23.74	24.43	27.54	25.81	25.29	25.42	24.86	25.23
MLP	24.63	27.78	24.88	23.57	24.40	23.87	24.55	27.68	23.39	25.44	25.59	25.02	25.07
DnCNN	25.07	27.85	25.17	23.64	24.71	24.03	24.71	27.54	23.63	25.47	25.64	24.97	25.20
FFDNet	25.29	28.43	25.39	23.82	24.99	24.18	24.94	27.97	24.24	25.64	25.75	25.29	25.49
MFENANN	25.71	29.22	25.82	23.89	25.08	24.36	25.13	28.02	24.61	25.74	25.90	25.48	25.75

Table 5. The SSIM values for restored images from “Set12” using several state-of-the-art algorithms at noise levels of 15, 25, 35, 50 and 75. The bold numbers are the largest from the corresponding noise levels.

Images	C.man	House	Pepp.	Starf.	Mona.	Airpl.	Parrot	Lena	Barb.	Boat	Man	Couple	Aver.
$σ$ = 15
BM3D	0.897	0.890	0.906	0.800	0.939	0.899	0.896	0.946	0.965	0.946	0.939	0.947	0.914
SRR-A	0.894	0.879	0.899	0.895	0.928	0.894	0.888	0.948	0.957	0.937	0.930	0.934	0.915
BDMGIN	0.706	0.859	0.850	0.884	0.900	0.885	0.757	0.936	0.940	0.925	0.928	0.917	0.874
DnCNN	0.911	0.888	0.913	0.914	0.948	0.908	0.903	0.958	0.963	0.951	0.945	0.950	0.929
DnCNN-B	0.906	0.887	0.911	0.912	0.946	0.907	0.901	0.958	0.962	0.950	0.944	0.951	0.928
FFDNet	0.911	0.889	0.912	0.913	0.947	0.909	0.904	0.960	0.964	0.952	0.946	0.953	0.930
MFENANN	0.913	0.890	0.915	0.916	0.951	0.911	0.906	0.961	0.965	0.953	0.946	0.954	0.932
$σ$ = 25
BM3D	0.851	0.859	0.868	0.850	0.901	0.855	0.854	0.925	0.941	0.905	0.893	0.909	0.884
SRR-A	0.835	0.846	0.856	0.835	0.888	0.845	0.840	0.913	0.917	0.880	0.872	0.881	0.867
BDMGIN	0.657	0.780	0.779	0.798	0.824	0.805	0.723	0.884	0.886	0.869	0.867	0.877	0.812
DnCNN	0.873	0.862	0.880	0.869	0.919	0.868	0.859	0.931	0.934	0.913	0.903	0.913	0.894
DnCNN-B	0.866	0.861	0.879	0.867	0.914	0.870	0.853	0.931	0.935	0.913	0.903	0.916	0.892
FFDNet	0.873	0.861	0.885	0.865	0.923	0.872	0.864	0.939	0.940	0.917	0.907	0.921	0.897
MFENANN	0.877	0.864	0.889	0.871	0.925	0.876	0.866	0.941	0.942	0.918	0.906	0.924	0.900
$σ$ = 35
BM3D	0.818	0.836	0.832	0.806	0.869	0.820	0.817	0.895	0.911	0.867	0.849	0.870	0.849
SRR-A	0.794	0.814	0.824	0.780	0.852	0.804	0.796	0.878	0.875	0.832	0.822	0.826	0.825
BDMGIN	0.658	0.753	0.757	0.746	0.799	0.750	0.730	0.865	0.855	0.836	0.824	0.843	0.785
DnCNN-B	0.833	0.840	0.847	0.824	0.878	0.836	0.827	0.906	0.904	0.876	0.861	0.879	0.859
FFDNet	0.847	0.847	0.854	0.826	0.894	0.843	0.833	0.917	0.913	0.886	0.864	0.890	0.868
MFENANN	0.846	0.851	0.861	0.833	0.895	0.846	0.835	0.921	0.917	0.887	0.870	0.892	0.871
$σ$ = 50
BM3D	0.778	0.816	0.792	0.737	0.821	0.767	0.779	0.861	0.870	0.811	0.802	0.813	0.804
SRR-A	0.749	0.759	0.777	0.713	0.800	0.749	0.749	0.822	0.810	0.769	0.764	0.747	0.767
BDMGIN	0.690	0.731	0.721	0.694	0.740	0.682	0.731	0.837	0.813	0.796	0.773	0.801	0.751
DnCNN	0.803	0.820	0.810	0.771	0.847	0.794	0.794	0.876	0.854	0.826	0.807	0.827	0.819
DnCNN-B	0.800	0.820	0.806	0.769	0.845	0.792	0.788	0.872	0.857	0.828	0.807	0.828	0.818
FFDNet	0.809	0.832	0.818	0.773	0.856	0.801	0.799	0.892	0.870	0.835	0.818	0.843	0.829
MFENANN	0.815	0.837	0.825	0.783	0.866	0.812	0.802	0.894	0.875	0.836	0.819	0.851	0.835
$σ$ = 75
BM3D	0.733	0.762	0.728	0.660	0.766	0.712	0.738	0.813	0.804	0.737	0.729	0.729	0.743
SRR-A	0.654	0.682	0.689	0.601	0.724	0.659	0.685	0.743	0.712	0.678	0.683	0.649	0.680
FFDNet	0.768	0.787	0.767	0.704	0.795	0.748	0.758	0.846	0.801	0.768	0.756	0.765	0.772
MFENANN	0.777	0.811	0.780	0.712	0.813	0.754	0.764	0.852	0.806	0.773	0.759	0.777	0.781

Table 6. Average PSNR (dB) for images in “BSD68” restored by several state-of-the-art methods at noise levels of 15, 25, 35, 50 and 75. The bold numbers are the largest ones at each corresponding noise level.

Methods	$σ = 15$	$σ = 25$	$σ = 35$	$σ = 50$	$σ = 75$
BM3D	31.07	28.57	27.08	25.62	24.21
WNNM	31.37	28.83	27.30	25.87	24.40
MLP	–	28.96	27.50	26.03	24.59
BDMGIN	29.30	27.08	26.03	25.10	19.10
DnCNN	31.72	29.23	27.69	26.23	24.64
FFDNet	31.63	29.19	27.73	26.29	24.79
MFENANN	31.73	29.29	27.82	26.38	24.87

Table 7. The PSNR (dB) values for restored images from “Set12” using several methods with unknown noise levels. The bold numbers are the largest from the corresponding noise levels.

Images	C.man	House	Pepp.	Starf.	Mona.	Airpl.	Parrot	Lena	Barb.	Boat	Man	Couple	Aver.
$σ$ = 26.2
BM3D	28.87	32.78	29.89	28.08	29.06	28.13	28.46	31.87	30.41	29.52	29.31	29.37	29.65
BDMGIN	25.30	30.31	27.80	26.92	27.95	27.56	26.57	29.71	26.97	28.02	28.00	27.55	27.72
DnCNN-B	29.79	32.91	30.58	28.91	30.04	28.99	29.08	32.20	29.49	29.91	29.79	29.81	30.13
FFDNet	29.78	33.12	30.72	28.93	29.85	28.82	29.14	32.34	29.84	30.04	29.86	29.98	30.20
MFENANN	29.84	33.32	30.87	29.27	30.06	29.12	29.25	32.51	29.93	30.12	29.96	30.03	30.36
$σ$ = 37.4
BM3D	26.98	31.22	27.96	26.20	27.38	26.34	26.53	30.19	28.59	27.97	27.87	27.56	27.90
BDMGIN	26.46	29.44	27.20	25.34	26.46	25.76	26.22	28.73	25.79	26.93	26.93	26.74	26.83
DnCNN-B	28.30	31.16	28.63	27.15	28.26	27.19	27.67	30.63	27.64	28.39	28.28	28.15	28.45
FFDNet	28.39	31.88	29.02	27.23	28.19	27.19	27.70	30.95	28.17	28.54	28.42	28.42	28.68
MFENANN	28.45	32.19	29.18	27.48	28.26	27.34	27.76	31.06	28.28	28.60	28.44	28.51	28.80
$σ$ = 48.6
BM3D	25.16	29.91	26.31	24.52	25.71	24.84	24.93	28.95	26.89	26.51	26.73	26.19	26.39
BDMGIN	26.10	28.83	26.19	24.43	25.48	24.85	25.56	28.04	24.76	26.31	26.32	26.03	26.08
DnCNN-B	27.13	30.01	27.60	25.81	26.96	25.86	26.58	29.37	26.41	27.26	27.32	26.98	27.27
FFDNet	27.27	30.80	27.72	25.68	27.00	26.16	26.62	29.85	26.87	27.44	27.33	27.27	27.50
MFENANN	27.52	31.03	27.87	26.03	27.04	26.25	26.78	29.97	27.05	27.46	27.41	27.42	27.65
$σ$ = 55.8
BM3D	24.28	28.86	25.52	23.68	24.78	24.01	24.14	28.16	26.21	25.76	26.11	25.49	25.58
BDMGIN	25.03	27.01	25.13	23.07	24.07	23.40	24.68	26.61	23.66	25.28	25.29	25.08	24.86
DnCNN-B	26.59	29.24	26.43	24.85	26.11	25.15	25.87	28.57	25.46	26.58	26.59	26.21	26.47
FFDNet	26.66	30.02	27.08	25.21	26.33	25.50	26.20	29.20	26.12	26.83	26.87	26.67	26.89
MFENANN	26.75	30.50	27.11	25.30	26.50	25.65	26.25	29.26	26.37	26.87	26.93	26.80	27.02

Table 8. Average PSNR (dB) values for images in image sets “McMaster”, “Kodak24” and “CBSD68” which were restored by several state-of-the-art methods at noise levels of 15, 25, 35, 50 and 75. The bold numbers are the largest ones at each corresponding noise level.

Imagesets	Methods	$σ = 15$	$σ = 25$	$σ = 35$	$σ = 50$	$σ = 75$
McMaster	CBM3D	34.06	31.66	29.92	28.51	26.79
	CDnCNN	33.44	31.51	30.14	28.61	25.10
	FFDNet	34.66	32.35	30.81	29.18	27.33
	MFENANN	34.88	32.62	31.12	29.51	27.67
Kodak24	CBM3D	34.28	31.68	29.90	28.46	26.82
	CDnCNN	34.48	32.03	30.46	28.85	25.04
	FFDNet	34.63	32.13	30.57	28.98	27.27
	MFENANN	34.78	32.31	30.77	29.19	27.50
CBSD68	CBM3D	33.52	30.71	28.89	27.38	25.74
	CDnCNN	33.89	31.23	29.58	27.92	24.47
	FFDNet	33.87	31.21	29.58	27.96	26.24
	MFENANN	34.01	31.36	29.74	28.13	26.44

Table 9. The average runtimes (s) of several state-of-the-art methods for images in “Set12” at noise levels of 15, 25, 35, 50 and 75. The bold numbers are the largest ones at each corresponding noise level.

Methods	$σ = 15$	$σ = 25$	$σ = 35$	$σ = 50$	$σ = 75$
BM3D	1.04	0.93	1.01	1.5	1.47
SRR-A	157	60	37	21	13
DnCNN-B	0.22	0.22	0.21	0.21	0.21
FFDNet	0.37	0.39	0.35	0.38	0.37
MFENANN	0.48	0.48	0.49	0.47	0.49

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
