Learning Medical Image Denoising with Deep Dynamic Residual Attention Network

Abstract: Image denoising plays a prominent role in medical image analysis. In many cases, it can drastically accelerate the diagnostic process by enhancing the perceptual quality of noisy image samples. However, despite the extensive practicability of medical image denoising, existing denoising methods show deficiencies in addressing the diverse range of noise that appears in multidisciplinary medical images. This study addresses this challenging denoising task by learning residual noise from a substantial number of data samples. Additionally, the proposed method accelerates the learning process by introducing a novel deep network, whose architecture exploits feature correlation, known as the attention mechanism, and combines it with spatially refined residual features. The experimental results illustrate that the proposed method can outperform existing works by a substantial margin in both quantitative and qualitative comparisons. Also, the proposed method can handle real-world image noise and can improve the performance of different medical image analysis tasks without producing any visually disturbing artefacts.


Introduction
Medical image denoising (MID) is the process of improving the perceptual quality of degraded noisy images captured with specialized medical image acquisition devices. Regrettably, such imaging devices remain susceptible to capturing noise despite the advances in imaging technologies [1]. The presence of noise in the images has a startling impact on medical image analysis and can complicate the decision-making of an expert [1][2][3]. Hence, denoising is considered a classical yet strenuous medical image analysis task.
Typically, MID applications model image noise as a Gaussian distribution [2], which essentially depends on the capturing conditions as well as the hardware configuration of the capturing devices [4]. Therefore, such sensor noise in MID remains blind and varies substantially depending on the image retrieval technique (e.g., images captured with radiological devices exhibit a distinct noise factor compared to microscopic modalities) [5]. At the same time, medical image analysis has to leverage multidisciplinary modalities in the visualization of biological molecules for treatment purposes [6]. As a consequence, MID over a large and diverse range of data is a more challenging task than conventional image denoising.
In the recent past, MID has received a substantial push from novel approaches such as non-local self-similarity (NSS) [7], sparse coding [8], and filter-based methods [9][10][11]. Also, considering the massive success in various vision tasks, many recent studies [12][13][14][15] have adopted deep learning as a viable alternative to the aforementioned MID methods. However, most of these recent studies have focused on a limited range of noise deviations and a narrow range of data diversity rather than striving to generalize their methods to multidisciplinary modalities. As a consequence, existing MID methods show deficiencies in large-scale noise removal from medical images and fail badly in numerous cases, as shown in Figure 1.
Figure 1. (b) Result obtained by BM3D [11]. (c) Result obtained by DnCNN [16]. (d) Result obtained by Residual MID [12]. (e) Result obtained by DRAN (proposed). (f) Reference sharp image. Source: https://www.kaggle.com/mateuszbuda/lgg-mri-segmentation.
To alleviate the deficiencies of existing works, this study proposes a novel denoising method that learns the blind residual noise from a large, diverse set of medical images. Additionally, this study introduces a deep network for MID applications, which utilizes feature correlation, known as the attention mechanism [17][18][19][20], and combines it with refined residual learning [21,22] to outperform existing methods. Here, the attention mechanism is leveraged in such a manner that it utilizes the depth-wise feature correlation to aggregate a dynamic kernel throughout the convolution operation [23]. Also, this study proposes to refine the residual learning with a spatial gating mechanism denoted as a noise gate [19,20], which learns to control the propagation of low-level features towards the top layers. To the best of our knowledge, this is the first work in the open literature that comprehensively combines feature correlation with refined residual feature propagation, particularly for MID applications. This study denotes the proposed deep model as the dynamic residual attention network (DRAN) in the rest of the sections. The feasibility of the proposed method has been verified with real-world noisy medical images and by combining it with different medical image analysis tasks. The main contributions of the proposed method are summarized as follows:
• Large-scale denoising: Introduces a novel method for learning multidisciplinary medical image denoising through a single deep network. Thus, large-scale noise can be handled without producing any artefacts.
• Dynamic residual attention network: Proposes a novel deep network that combines feature correlation and refines the residual feature propagation for MID. The model is intended to improve the denoising performance on a diverse dataset by recovering details. Code available: https:
• Dense experiments: Conducts dense experiments with a substantial number of data samples. Therefore, the feasibility of the proposed method can be assessed on a diverse range of noisy images.
• Real-world applications: Illustrates the denoising performance on noisy medical images collected with actual hardware. Also, the proposed method has been combined with different medical image analysis tasks to demonstrate its practicability in real-world applications.
The rest of the paper is structured as follows: Section 2 reviews the related works, Section 3 details the proposed method, Section 4 compares and analyses the experimental results, and Section 5 concludes this work.

Related Works
This section briefly reviews the works related to the proposed method.

Medical Image Denoising
A substantial number of novel methods have been proposed in the recent past. According to their optimization strategies, MID works can be divided into two major categories: (i) classical approaches and (ii) learning-based approaches.
Classical approaches: Filter-based denoising has long dominated classical medical image denoising. Most of the recent filter-based works focused on developing a low-pass filter, where the filter aims to eliminate abrupt peaks from a noisy image based on a local estimation [1]. Amongst the numerous variants of filter-based approaches, Gaussian averaging filters [24], median filters [25], mean filters [26], and diffusion filters [27] are widely used for removing noise from specific types of medical imaging modalities such as ultrasound images, magnetic resonance images (MRI), computed tomography (CT) images, etc. [25][26][27][28]. Despite their widespread usage, such filter-based methods tend to smooth the given images while removing the noise.
Many recent works extended MID by leveraging adaptive filters to address the deficiencies of the previously mentioned static filter-based approaches. In these works, the authors emphasized estimating weighted coefficients of an image by employing its statistical properties. A non-local denoising method such as Block Matching 3D (BM3D) [11] is a representative example of adaptive filter-based techniques. Similarly, non-local means filter-based methods [29,30] for MR image denoising, Optimised Bayesian Non Local Mean (OBNLM) [31] and modified non local-based (MNL) [32] for ultrasound image denoising, and the bio-inspired bilateral filter [10] for CT image denoising are also representative of adaptive filter-based techniques. However, such adaptive filter-based denoising methods are computationally expensive and unable to deliver real-time results.
Another well-known classical MID genre comprises multi-scale analysis based methods [33][34][35], where the representative techniques process the noisy images at different image resolutions. In recent times, a notable amount of work has exploited such denoising techniques and utilized time-frequency analysis [1]. Nevertheless, in numerous cases, multi-scale medical image denoising shows deficiencies in specifying the distribution of noisy inputs at different scales. The drawbacks of the multi-scale methods have been partly addressed with nonlinear estimators [36]. However, the performance of this classical image denoising category still falls short of expectations.
Learning-based methods: Learning-based image denoising has only recently started to draw attention in the MID domain. In a recent work, a feedforward autoencoder [13] was used to learn medical image denoising. A later study [12] on MID took inspiration from [16] and improved the performance by using residual learning. In contrast, another study on MID [14] followed a distinctive network design strategy and used a genetic algorithm (GA) to search the hyperparameters of its deep network. Despite their satisfactory performance at specific noise levels and on specific datasets, none of the existing works generalizes its method to multidisciplinary modalities.

Attention Guided Learning
The concept of the attention mechanism has been taken from the human visual system. In deep learning, the attention mechanism was first introduced in the natural language processing domain to adaptively focus on salient areas of a given input. Given the success of this adaptive feature correlation approach, many works quickly adopted similar concepts in computer vision applications such as super-resolution [37], in-painting [20], deblurring [38], etc. Among the recent works, a non-local spatial attention [39] was formulated for video classification. Also, channel-wise interdependencies were utilized to obtain a significant performance gain over existing image classification methods [17].
Recently, a few novel works have leveraged residual learning along with attention mechanisms. In [40], the authors stacked attention modules in a feed-forward structure and combined them with residual connections to train a very deep network that outperforms its counterparts. Later, Ref. [41] also exploited a similar residual-attention strategy in a multi-scale network structure with static convolution operations to improve the accuracy of a classification task. Apart from classification tasks, residual attention strategies have also given a substantial push to image super-resolution. In recent work, Refs. [42,43] utilized feature attention after convolutional layers in a sequential manner to perform image super-resolution. In a later study, Ref. [44] combined spatial and temporal feature attention with residual connections after the convolutional operation to improve multi-image super-resolution.
Despite the success of residual-attention strategies in different computer vision tasks, contemporary residual-attention approaches suffer from two limitations. Firstly, existing residual-attention methods utilize the attention mechanism with static convolutional operations in a sequential manner. Such a stacked architecture tends to make the network relatively deeper, which can lead to vanishing gradients along with a larger number of trainable parameters [23]. Secondly, most of the existing residual-attention networks utilize simple skip connections to accelerate residual learning. Notably, such straightforward skip connections can harm denoising by propagating unpruned low-level features. This study addresses these limitations of existing residual-attention strategies by incorporating an attention-guided dynamic kernel convolutional operation known as dynamic convolution. Also, this study proposes to accelerate residual learning by utilizing a noise gate, which essentially aims to prune the low-level features through a spatial gating mechanism. Table 1 shows a concise comparison between different methods. Here, each category has been reviewed by distinguishing its strengths and weaknesses.

Proposed Method
This study presents a novel method for medical image denoising that learns from large-scale data samples. This section details the methodology of the proposed work.

Network Design
The proposed method is intended to recover the clean image c from a given noisy medical image v by learning the residual noise image n through the mapping function F : v → n. Hence, learning denoising for medical images can be expressed for this study as follows:
$$F(v) \approx n, \qquad c = v - n$$
As Figure 2 shows, the proposed network is an end-to-end convolutional neural network (CNN) [45], which utilizes traditional convolutional operations for the input and output layers as well as a novel dynamic residual attention block (DRAB) as the backbone of the main network. The input layer takes a normalized noisy image $v \in [0, 1]^{M \times N \times 3}$, and the network generates the normalized residual noise [16] extracted from the input as $n \in [0, 1]^{M \times N \times 3}$. Here, M and N represent the height and width of the input as well as the output of the proposed DRAN.
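The residual formulation above can be illustrated with a minimal Python sketch. It assumes the clean estimate is obtained by subtracting the predicted noise from the noisy input; the clipping to [0, 1] and the `toy_predictor` stand-in for the trained network F are hypothetical and only serve the illustration.

```python
def denoise(noisy, predict_noise):
    """Recover a clean estimate c = v - F(v) for a flat list of pixel values."""
    residual = predict_noise(noisy)
    # Clipping keeps the estimate in the normalized range (an assumption here).
    return [min(1.0, max(0.0, v - n)) for v, n in zip(noisy, residual)]


def toy_predictor(noisy):
    """Hypothetical stand-in for the trained network F: predicts a constant offset."""
    return [0.1 for _ in noisy]


noisy_pixels = [0.6, 0.3, 0.05]
clean_estimate = denoise(noisy_pixels, toy_predictor)
```

In practice `predict_noise` would be the trained DRAN; the sketch only shows how its output is used at inference time.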

Dynamic Residual Attention Block
The proposed dynamic residual attention block (DRAB) comprises l = 5 dynamic convolution [23] layers stacked consecutively. Here, the dynamic convolution aims to improve the performance of traditional convolution by aggregating d ∈ Z dynamic kernels (each with equal dimensions) through an attention mechanism [17]. Therefore, a static convolution $y_c = \psi(W^T x + b)$, which comprises a weight matrix W and a bias term b and is activated with ψ(·), is replaced with an aggregation of the linear models $\{W_k^T x + b\}$. The aggregation of the k = 1, ..., d linear functions is obtained as:
$$y_d = \psi\left(\sum_{k=1}^{d} \pi_k(x)\,\left(W_k^T x + b\right)\right)$$
Here, ψ(·) is the ReLU activation, which can be expressed as ψ(x) = max(0, x).
π_k(x) denotes the attention mechanism, which aggregates over the linear models {W_k^T x + b} for a given input x. The attention weights π_k(x) are obtained from a global feature descriptor by applying global average pooling [17,37], where the depth-wise squeezed descriptor $z \in R^C$ of an input feature map is calculated as:
$$z_c = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} x_c(i, j)$$
Here, z_c, H × W, and x_c denote the global average pooling output for channel c, the spatial dimension, and the c-th channel of the input feature map.
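As a rough illustration of how attention weights π_k(x) can aggregate the linear models {W_k^T x + b}, the following Python sketch computes a single output unit. It assumes the aggregation is a π-weighted sum of the linear models followed by ReLU; the kernel values and attention weights are made-up numbers, not learned ones.

```python
def relu(value):
    return max(0.0, value)


def dynamic_linear(x, kernels, bias, attention):
    """Compute ReLU(sum_k pi_k * (W_k . x + b)) for one output unit."""
    total = 0.0
    for weights, pi_k in zip(kernels, attention):
        linear = sum(w * xi for w, xi in zip(weights, x)) + bias
        total += pi_k * linear
    return relu(total)


x = [1.0, 2.0]
kernels = [[1.0, 0.0], [0.0, 1.0]]  # d = 2 hypothetical kernels
attention = [0.25, 0.75]            # hypothetical attention weights pi_k(x)
y = dynamic_linear(x, kernels, bias=0.0, attention=attention)
```

With a single kernel and an attention weight of 1, the function reduces to the static case ψ(W^T x + b).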
The global dependencies are then aggregated through a gating mechanism as follows:
$$\pi(x) = \sigma\left(W_E\,\delta\left(W_S\, z\right)\right)$$
Here, σ denotes the sigmoid activation, σ(x) = 1/(1 + e^{−x}), and δ denotes the ReLU activation, δ(x) = max(0, x), which are applied after the W_E and W_S convolutional operations, respectively.
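The squeeze and gating steps can be sketched in plain Python as follows. The sketch assumes that the ReLU follows W_S and the sigmoid follows W_E, and it treats both convolutions on the pooled descriptor as small matrix products; the weight values and the names `w_s` and `w_e` are hypothetical.

```python
import math


def global_average_pool(feature_map):
    """Squeeze each H x W channel to its mean value z_c."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


def matvec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def attention_weights(z, w_s, w_e):
    """Gate the pooled descriptor: sigmoid(W_E relu(W_S z))."""
    hidden = [max(0.0, h) for h in matvec(w_s, z)]
    return [1.0 / (1.0 + math.exp(-a)) for a in matvec(w_e, hidden)]


feature_map = [
    [[1.0, 3.0], [5.0, 7.0]],  # channel 0
    [[0.0, 0.0], [0.0, 2.0]],  # channel 1
]
z = global_average_pool(feature_map)
w_s = [[0.5, -1.0]]    # hypothetical squeeze weights (2 -> 1)
w_e = [[1.0], [-1.0]]  # hypothetical excitation weights (1 -> 2)
pi = attention_weights(z, w_s, w_e)
```

Because of the sigmoid, every attention weight lies strictly between 0 and 1.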
The final depth attention map is obtained by rescaling the feature map with these attention weights. To achieve faster convergence, each dynamic convolutional layer used in this paper is normalized with a batch normalization function [46] as follows:
$$BN(x_b) = \gamma_b\, \frac{x_b - E(x_b)}{\sqrt{Var[x_b]}} + \eta_b$$
where E(x_b) and Var[x_b] denote the expectation of the input x_b and its variance, while γ_b and η_b denote learnable parameters intended to improve the model performance.
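A minimal sketch of the batch normalization step on a flat list of activations is given below. The small constant `eps` is an assumption added for numerical stability, as is common in practice, and is not part of the description above; `gamma` and `eta` play the role of the learnable parameters γ_b and η_b.

```python
import math


def batch_norm(values, gamma=1.0, eta=0.0, eps=1e-5):
    """Normalize to zero mean and unit variance, then scale and shift."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    # eps is an assumed stabilizer that avoids division by zero.
    return [gamma * (v - mean) / math.sqrt(var + eps) + eta for v in values]


activations = [1.0, 2.0, 3.0, 4.0]
normalized = batch_norm(activations)
shifted = batch_norm(activations, gamma=2.0, eta=0.5)
```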
Apart from the dynamic convolution layers, the proposed network also leverages residual learning through skip connections [21]. However, in denoising, a skip connection can backfire by delivering superfluous lower-level features towards the top levels. Therefore, this study leverages a spatial attention mechanism [19,20] denoted as a noise gate, which controls the propagation of trivial features by learning spatial feature correlation. The noise gating mechanism is obtained as follows:
$$G = \phi\left(W_f(x)\right) \odot \tau\left(W_g(x)\right)$$
Here, φ and τ denote the LeakyReLU and sigmoid activation functions, φ(x) = max(0.1x, x) and τ(x) = 1/(1 + e^{−x}), ⊙ denotes element-wise multiplication, and W_g and W_f represent convolutional operations.
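The gating idea can be sketched element-wise in Python. The sketch assumes a gated-convolution form in which LeakyReLU-activated features from W_f are multiplied by a sigmoid mask from W_g; both convolutions are reduced to hypothetical scalar multipliers `w_f` and `w_g` to keep the example short.

```python
import math


def leaky_relu(value, slope=0.1):
    return max(slope * value, value)


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def noise_gate(features, w_f, w_g):
    """Element-wise gate: LeakyReLU(w_f * x) * sigmoid(w_g * x)."""
    return [leaky_relu(w_f * x) * sigmoid(w_g * x) for x in features]


skip_features = [-2.0, 0.0, 3.0]
gated = noise_gate(skip_features, w_f=1.0, w_g=2.0)
```

Features with a strongly negative gate response are scaled towards zero, while strongly positive ones pass almost unchanged, which is the pruning behaviour the noise gate is meant to learn.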

Optimization
For a given training set $\{v_t, n_t\}_{t=1}^{P}$ consisting of P pairs of images, the proposed DRAN learns the parameterized weights W_R by minimizing the objective function:
$$W^{*} = \arg\min_{W_R} \frac{1}{P} \sum_{t=1}^{P} L\left(F(v_t), n_t\right)$$
Here, L denotes a pixel-wise loss, which can be calculated in Euclidean space in the form of an L1 or an L2 norm [47]. However, due to its direct relation with PSNR values, the L2 norm tends to produce smoother images [48]. As a consequence, this study employs the L1 norm as its objective function, which can be written as follows:
$$L_1 = \lVert n_g - n_r \rVert_1$$
Here, n_r and n_g represent the output obtained through F(v) and the simulated reference noise.
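To make the difference between the two pixel-wise losses concrete, the following Python sketch evaluates both on a toy noise prediction. Averaging over pixels is an assumption of this sketch, and the sample values are invented.

```python
def l1_loss(predicted, reference):
    """Mean absolute error between predicted and reference noise."""
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(reference)


def l2_loss(predicted, reference):
    """Mean squared error between predicted and reference noise."""
    return sum((p - r) ** 2 for p, r in zip(predicted, reference)) / len(reference)


predicted_noise = [0.10, -0.05, 0.40]
reference_noise = [0.00, -0.05, 0.00]
l1 = l1_loss(predicted_noise, reference_noise)
l2 = l2_loss(predicted_noise, reference_noise)
```

The L2 loss squares each error, so the single large deviation dominates it, whereas the L1 loss weights all deviations linearly.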

Data Preparation
Data preparation plays a crucial role in learning-based denoising methods [49]. For training purposes, it is mandatory to obtain a sufficient number of data samples. Therefore, this study has collected a substantial number of medical images captured through different modalities. Also, the collected data samples have been processed carefully (i.e., by adding noise for training purposes) for further study.

Data Collection
Collecting a diverse range of data samples always remains a challenging task in medical image analysis [13,50]. Also, none of the existing datasets offers a collection of images acquired with different medical imaging technologies. However, to generalize the performance of any deep method over a wide data space, a substantial number of training data samples is mandatory [51]. To resolve this conflict, this study collected a large number of image samples from different sources and divided them into three categories. Brief details of each data category are given below:
• Radiology: This category comprises radiological images collected from four different modalities, including X-ray [52], MRI [53], CT scans [54], and ultrasound [55] images.
• Microscopy: This category includes microscopic images collected by histopathologic scans [56] and protein atlas scans [57].
• Dermatology: This category contains dermatoscopic images collected by different image acquisition methods [58].
A total of 711,223 medical images were collected in this study, of which 585,198 samples were used for model training and the remainder (roughly 20 percent of the data) was used for performance evaluation. Figure 3 illustrates sample images from each category.

Noise Modeling
Despite having a significant number of sample images, the collected dataset does not provide training pairs of reference and noise-contaminated input images. Therefore, reference-noisy image pairs have to be formulated by adding artificial noise. Here, a noise image n_s is generated for a given noise-free image c as:
$$n_s \sim N(\mu, \sigma^2)$$
where µ and σ² represent the mean and the variance of a Gaussian distribution (N). This Gaussian noise is added to the given clean image c, so the final noisy image is formed as:
$$v = c + n_s$$
Figure 4 illustrates a sample noisy-clean image pair along with the corresponding (generated) noise.
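The pair generation can be sketched with the Python standard library as follows. The sketch assumes additive Gaussian noise on images normalized to [0, 1] and assumes that σ is specified on the conventional 0 to 255 intensity scale, hence the division by 255; the function name and the `seed` parameter are hypothetical.

```python
import random


def add_gaussian_noise(clean, sigma, mean=0.0, seed=None):
    """Return (noisy, noise) for a flat list of pixel values in [0, 1]."""
    rng = random.Random(seed)
    # sigma is assumed to be given on a 0-255 scale and is rescaled to [0, 1].
    noise = [rng.gauss(mean, sigma / 255.0) for _ in clean]
    noisy = [c + n for c, n in zip(clean, noise)]
    return noisy, noise


clean_pixels = [0.2, 0.5, 0.8]
noisy_pixels, reference_noise = add_gaussian_noise(clean_pixels, sigma=25, seed=0)
```

The returned noise doubles as the training target for the residual mapping, since the network is trained to predict the noise rather than the clean image.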

Implementation Details
The proposed DRAN was designed as an end-to-end convolutional network and implemented using the PyTorch framework [59]. This study utilized three consecutive DRABs as a trade-off between performance and the number of trainable parameters. Each layer of a DRAB has a depth of 64, a kernel size of 3, a padding of 1, and a stride of 1. Thus, the network keeps the output dimension identical to the input. The network was optimized with the Adam optimizer [60] with β1 = 0.9, β2 = 0.99, and a learning rate of 1e-4. The network was trained on images resized to 128 × 128 × 3 and contaminated with random noise (σ ∈ [0, 50]). The training process was carried out for 100,000 steps with a batch size of 24. All experiments were conducted on hardware incorporating an AMD Ryzen 3200G central processing unit (CPU) clocked at 3.60 GHz and 16 GB of random-access memory. Also, an Nvidia GeForce GTX 1060 (6 GB) graphics processing unit (GPU) was used to accelerate the training process.
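The claim that a kernel size of 3 with padding 1 and stride 1 preserves the spatial dimension can be checked with the standard convolution output-size formula. The helper below is an illustrative sketch, not part of the published implementation.

```python
def conv_output_size(size, kernel=3, padding=1, stride=1):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


training_size = conv_output_size(128)             # 128 in, 128 out
unpadded_size = conv_output_size(128, padding=0)  # shrinks without padding
```

Because the size is preserved for any input, the same layer settings are compatible with inference on images of other dimensions.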

Result and Analysis
The performance of the proposed method has been studied and compared with state-of-the-art denoising methods. Also, the feasibility of the proposed method in different medical image analysis tasks has been verified with sophisticated experiments.

Comparison with State-of-the-Art Methods
In this study, three existing works have been selected for the comparison: (i) BM3D [11], (ii) DnCNN [16], and (iii) Residual MID [12]. DnCNN [16] and Residual MID [12] both utilize residual learning for image denoising, while BM3D [11] is known to be one of the pioneering image denoising works. It is worth noting that none of these works was developed to address a diverse range of noise removal from medical images as intended in this work. Nevertheless, to make the comparison as fair as possible, both learning-based methods [12,16] have been trained and tested with the hyperparameters suggested in their original implementations. Additionally, each model was trained for 4∼5 days until it converged on the collected dataset described in Section 3.2. In contrast, the optimization-based method [11] studied for the comparison does not require any additional training, unlike its counterparts. Therefore, this study used its publicly available official implementation for a fair comparison.

Quantitative Comparison
This study evaluates the feasibility of the compared denoising methods in different noisy environments using two widely used image assessment metrics: peak signal-to-noise ratio (PSNR) [61] and structural similarity index (SSIM) [62]. Such evaluation metrics are meant to evaluate the reconstructed image quality by comparing it with the reference image, similar to the human visual system. Therefore, a higher value of such a metric indicates better performance of the target method [2,63]. Notably, this work leverages the PSNR metric to calculate the noise ratio between the reference and the denoised image, while SSIM evaluates the structural similarity, luminance, and contrast distortions. Overall, the performance of the MID methods has been summarized with the mean scores obtained from the evaluation metrics over individual categories as well as different noise deviations. Table 2 shows the quantitative comparison between different denoising methods for medical images. As Table 2 depicts, the proposed method outperforms the existing MID methods by a distinguished margin in all compared combinations. Most notably, depending on the noise deviation, the proposed method can exceed its counterparts on the dermatology images by up to 13.75 dB in PSNR and 0.0992 in SSIM. Similarly, it outperforms existing denoising methods by up to 10.91 dB in PSNR and 0.1137 in SSIM on radiology images, as well as by 11.17 dB in PSNR and 0.3065 in SSIM on microscopy images. It is worth noting that an increase of noise in the images can deteriorate the performance of MID methods. However, the proposed method remains consistent in all categories. Overall, Table 2 reveals a new dimension of multidisciplinary MID. Also, it demonstrates the practicability of a sophisticated denoising method for medical image analysis.
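For reference, PSNR can be computed from the mean squared error as in the sketch below. It assumes pixel values normalized to [0, 1], so the peak value is 1.0; the sample values are invented and unrelated to Table 2.

```python
import math


def psnr(reference, denoised, peak=1.0):
    """Peak signal-to-noise ratio in dB for flat lists of pixel values."""
    mse = sum((r - d) ** 2 for r, d in zip(reference, denoised)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


reference_pixels = [0.5, 0.5]
denoised_pixels = [0.6, 0.4]
score = psnr(reference_pixels, denoised_pixels)
```

Since PSNR is logarithmic in the mean squared error, a gain of 10 dB corresponds to a tenfold reduction in that error.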

Qualitative Comparison
Qualitative evaluation plays an important role in medical image analysis [1,64]. Therefore, this study focused on qualitative comparison along with the quantitative comparisons. Figures 5-7 illustrate the visual comparison between the proposed method and existing MID methods. As can be seen, the proposed method is proficient in dramatically improving the perceptual quality of degraded noisy images. It can remove a substantial amount of noise while maintaining the details of a degraded input image. Most notably, the method shows its consistency over the existing MID techniques in all image categories without producing any visually disturbing artefacts.

Real-World Noise Removal
In real-world scenarios, the noise that appears in medical images can differ from the synthesized data. Therefore, to push MID a step further, the feasibility of the proposed method has been studied with real-world noisy medical images. As Figure 8 illustrates, the proposed method can handle real-world noise notably well and substantially improves the perceptual image quality of noisy medical images by removing blind noise.

Applications
A sophisticated denoising method can drastically improve the performance of computer-aided detection (CAD) and medical image analysis tasks by improving the perceptual quality of target images. To study this further, the proposed DRAN has been combined with existing state-of-the-art medical image analysis methods to investigate the consequences.

Abnormalities Detection
Computer-aided detection has gained momentum in medical image analysis by detecting what may otherwise be overlooked in given images [65,66]. However, the presence of sensor noise in a given image can misguide the detection system, as shown in Figure 9. Here, the effect of image noise has been studied on tumour detection and localization in brain MRIs using the state-of-the-art Mask R-CNN [67]. It is apparent that even this well-known learning-based method struggles to localize the abnormalities in a noisy image. In contrast, adding the proposed DRAN drastically improves the localization performance of the respective detection method by performing denoising.

Medical Image Segmentation
Image noise can also startlingly affect the medical image segmentation process, similar to the detection methods, as shown in Figure 10. Here, segmentation has been performed on brain MRIs using the well-known U-Net architecture [68]. It has been observed that image noise makes the segmentation result substantially unsatisfactory. However, a sophisticated MID method like the proposed DRAN can assist the segmentation method by mitigating image noise.

Network Analysis
Despite being deeper, the proposed network comprises only 2,458,944 parameters. It is worth noting that the number of parameters can be altered by changing the number of DRABs. Reducing the number of blocks reduces the inference time but can also deteriorate the performance of the proposed network. However, for any number of DRABs, the proposed denoising network can take images of any dimension for inference.
The DRAB also has a significant impact on the learning process. Specifically, the noise gate introduced in the DRAB plays a crucial role in network stability. Figure 11 illustrates the impact of the noise gate on the training phase. Notably, the noise gate allowed the proposed DRAN to achieve faster convergence even with a very complex set of medical images.
Figure 11. Graph of training loss. It is visible that the noise gate used in the DRAB provides far more training stability. Also, it helps the proposed method to converge faster.
Apart from the training stability, the noise gate also has a clear impact on the performance gain. Table 3 demonstrates that the noise gate drastically improves the performance of the proposed DRAN across all medical image categories. Here, the performance metrics were calculated using random noise. Both models were evaluated on the same noisy images during their training phases. Also, the evaluation was repeated every 5000 steps for consistency.

Discussion
Despite the extensive experiments, it is undeniable that the proposed study has a few limitations. At the same time, the experimental results of this study reveal a new dimension of medical image denoising.
Similar to the existing works, one limitation of this study is the lack of real-world training data. The data samples used in this study are synthesized with artificial noise, and in numerous instances the simulated data can differ from real-world noise. Also, due to the lack of reference images, the quantitative performance of the proposed method remains underexplored, particularly on real-world noisy medical images.
The limitations and observations of the proposed study reveal interesting future directions for MID methods. Despite the sophisticated preprocessing, it has been found that the reference images can contain noise. Therefore, it would be interesting to extend this work in an unsupervised manner. Also, for generalization, the proposed study conducted all training and testing on three-channel RGB images. However, depending on the application, the proposed DRAN can be optimized by incorporating a single-channel dataset. In the foreseeable future, it is planned to study the feasibility of the proposed method on single-channel data as well as by exploiting unsupervised learning.

Conclusions
This work presented a novel end-to-end learning-based denoising method for medical image analysis. Additionally, it has illustrated that MID can be generalized by utilizing large-scale multidisciplinary images rather than learning from a small range of homogeneous data samples. The proposed method also incorporates a novel deep network, which combines the attention mechanism and spatially refined residual learning in a feed-forward manner. Notably, such a comprehensive learning strategy allowed this study to drastically improve the denoising performance, particularly for medical images. The experimental results illustrate that the proposed method can outperform the existing works by a distinguishable margin while maintaining details. Also, the practicability of the proposed denoising method has been explicitly inspected through sophisticated experiments. It is planned to extend the proposed method by exploiting unsupervised learning in the foreseeable future.
Funding: This research is an independent work and did not receive any additional funding.