Article

Channel-Wise Attention Mechanism in the 3D Convolutional Network for Lung Nodule Detection

Xiaoyu Zhu, Xiaohua Wang, Yueting Shi, Shiwei Ren and Weijiang Wang

1 School of Integrated Circuits and Electronics, Beijing Institute of Technology, Beijing 100081, China
2 School of Integrated Circuits and Electronics, Beijing Institute of Technology, Chongqing Center for Microelectronics and Microsystems, Chongqing 401332, China
3 Micro Nano Device and System Innovation Research Center, Yangtze Delta Region Academy of Beijing Institute of Technology, Jiaxing 314019, China
* Author to whom correspondence should be addressed.
Electronics 2022, 11(10), 1600; https://doi.org/10.3390/electronics11101600
Submission received: 22 April 2022 / Revised: 12 May 2022 / Accepted: 13 May 2022 / Published: 17 May 2022
(This article belongs to the Section Artificial Intelligence)

Abstract: Pulmonary nodule detection is essential to reduce the mortality of lung cancer. One-stage detection methods have recently emerged as high-performance, lower-power alternatives to two-stage lung nodule detection methods. However, existing one-stage detection networks struggle to balance sensitivity and specificity. In this paper, we propose an end-to-end detection mechanism combined with a channel-wise attention mechanism based on a 3D U-shaped residual network. First, an improved attention gate (AG) is introduced to reduce the false-positive rate by selecting critical feature dimensions at the skip connections for feature propagation. Second, a channel interaction unit (CIU) is designed before the detection head to further improve detection sensitivity. Furthermore, the gradient harmonizing mechanism (GHM) loss function is adopted to address the imbalance of positive and negative samples. We conducted experiments on the LUNA16 dataset and achieved a competition performance metric (CPM) score of 89.5% and a sensitivity of 95%. The proposed method outperforms existing models in terms of sensitivity and specificity while remaining lightweight, making it suitable for automatic lung nodule detection.

1. Introduction

Lung cancer, which originates from epithelial tissue, is one of the tumors most dangerous to human health and life; its morbidity and mortality rank first among cancers worldwide [1]. Most lung cancers show no obvious symptoms in the early stage and are easily overlooked. Therefore, only through early detection, correct diagnosis, and precise treatment can the survival of patients be fundamentally extended. Computer-aided diagnosis (CAD) systems that accurately detect lung nodules in low-dose computed tomography (CT) images of the lungs are therefore essential for reducing lung cancer mortality [2].
At present, lung nodule detection methods based on convolutional neural networks (CNNs) continue to emerge. To achieve high sensitivity while minimizing false-positive predictions, most of them adopt a two-stage detection strategy: (1) candidate nodule detection, which aims to detect as many suspected nodules as possible; and (2) false-positive reduction (FPR), which aims to distinguish true nodules from nodule-like structures, such as blood vessels, shades, bronchioles, bifurcation points, and ribs, obtained in the first step. Ding et al. [3] first introduced a deconvolution structure into Faster R-CNN to reduce the incidence of false-positive nodules. To reduce the computational cost, they used two-dimensional axial slices as the input of the candidate nodule detection network; 3D patches centered on the obtained candidate nodules are then sent to a 3D convolutional neural network to reduce false positives. Setio et al. [4] proposed a computer-aided detection (CAD) system based on multi-view convolutional networks (ConvNets). Three detection networks are designed for solid, subsolid, and large nodules in the candidate nodule detection stage. Then, a set of two-dimensional patches extracted from different directions for each candidate nodule is fed into two-dimensional ConvNets, and the final classification result is obtained by a dedicated fusion method. Cao et al. [5] proposed a new FPR method: 87 features extracted from the candidate nodules are fed into an SVM, and the most discriminative features are selected as the inputs of FPR. Dou et al. [6], aiming to reduce false positives, developed three independent three-dimensional convolutional networks, each of which encodes a specific level of contextual information; the final classification result is obtained by combining the three probability predictions. Kang et al. [7] employed a multi-view one-network strategy using 3D multi-view convolutional neural networks (MV-CNN), including 3D Inception and 3D Inception-ResNet. Liu et al. [8] proposed the 3DFPN-HS2 framework, based on a three-dimensional feature pyramid network, for high-sensitivity nodule detection and high-specificity false-positive reduction. Saradhi et al. [9] proposed a new convolutional neural network called multiscale CNN with compound fusions (MCNN-CF), which uses multi-scale 3D patches as inputs and fuses intermediate features in different ways at two different network depths.
Although these two-stage lung nodule detection methods achieve high sensitivity and specificity, they usually consume significant computing resources. Therefore, many researchers have recently devoted considerable effort to one-stage lung nodule detection.
In object detection, integrating feature extraction, proposal extraction, bounding box regression, and classification into a single network has been proven to improve the overall performance of a detector. Based on this, Zhu et al. [10] proposed a fully automatic lung cancer diagnosis system: a U-shaped dual-path detection network based on 3D Faster R-CNN that combines the advantages of residual learning and dense connections. Khosravan et al. [11] proposed S4ND, a new deep-learning-based lung nodule detection method, designed as a densely connected 3D convolutional neural network (CNN) trained in an end-to-end manner. Li et al. [12] proposed a 3D squeeze-and-excitation encoder-decoder structure and adopted focal loss as the loss function to address the imbalance of positive and negative samples. Shi et al. [13] proposed a brain-inspired LIF-Net that exploits the spatiotemporal properties of leaky integrate-and-fire (LIF) neurons and achieved comparable detection accuracy while reducing computational complexity by over 60%. While the one-stage lung nodule detection methods described above are fast and consume fewer resources, they usually have a higher false-positive rate. It is difficult for an end-to-end pulmonary nodule detection model to achieve high sensitivity while minimizing false-positive predictions.
In recent years, attention mechanisms have been widely used in various deep learning tasks, such as medical image processing [12,14], natural language processing [15,16], object detection [17], image segmentation [18,19], and human–object interaction recognition [20]. It is one of the deep learning techniques that most deserves attention and in-depth study. In essence, the attention mechanism in deep learning resembles human selective visual attention: its core goal is to select the information most relevant to the current task from numerous details.
To address the above problems, and inspired by the attention mechanism, we first propose a U-shaped encoder-decoder structure to extract image features at multiple scales. Second, in the decoding stage, the feature maps extracted at various scales are merged through skip connections, and improved attention gates (AGs) are adopted at the skip connections to eliminate noise and merge only strongly correlated features. Third, we add a post-processing module, the channel interaction unit (CIU), before the final detection head to estimate the importance of all channels so that high-dimensional features are fully exploited; the module constructs a feature reward-and-punishment strategy and realizes adaptive channel calibration. To address the imbalance between the numbers of positive samples (true nodules) and negative samples (false-positive nodules), the GHM loss function is used to reduce the negative impact of uneven sample distribution on detection accuracy. We evaluate the proposed method on the LUNA16 dataset; the experimental results show that our approach achieves better nodule detection performance than other methods. In addition, ablation experiments demonstrate that the improved AG and the CIU effectively improve sensitivity and reduce false-positive rates.
The contributions of this paper are summarized as follows:
  • An end-to-end lung nodule detection network with a U-shaped encoder-decoder structure is proposed to improve model sensitivity and specificity.
  • Improved attention gates (AGs) are designed at the skip connections, which help reduce false positives in a one-stage pulmonary nodule detector.
  • A post-processing module, the channel interaction unit (CIU), is introduced to obtain the importance of each feature channel, so that more targeted image features are extracted and network performance is fully optimized.
  • We validate our proposed framework on the LUNA16 dataset. GHM loss is used as the loss function to solve the imbalance between positive and negative samples. Experimental results show that our proposed method can achieve high sensitivity and specificity.

2. Materials and Methods

The proposed channel-wise attention nodule detection network (CANN) is illustrated in Figure 1. Specifically, the network is built upon a 3D encoder-decoder structure with a post-processing module, the CIU, and is enhanced by the improved AG to reduce false-positive rates.

2.1. Network Structure

The CANN adopts an encoder-decoder structure similar to U-Net. A detailed network structure for the CANN is shown in Table 1. Please note that we do not show skip connections integrated with the improved AGs in the table.
Encoder Network. The encoder network is directly adapted from the 2D ResNet-18 [21] by extending 2D residual blocks to 3D and changing 7 × 7 filters into 3 × 3 [10,22]. It starts with a pre-block consisting of two 3 × 3 × 3 convolutional layers, each followed by a 3D batch normalization (BN) layer [23]. This is followed by four 3D residual blocks interleaved with four 3D max-pooling layers.
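The paper does not publish code, so as an illustration only, the following is a minimal PyTorch sketch of one such 3D residual block; the class name and exact layer ordering are our assumptions:

```python
import torch
import torch.nn as nn

class ResBlock3D(nn.Module):
    """Basic 3D residual block: two 3x3x3 convolutions with BN and ReLU."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv1 = nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm3d(out_ch)
        self.conv2 = nn.Conv3d(out_ch, out_ch, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm3d(out_ch)
        # 1x1x1 projection on the shortcut when the channel count changes,
        # as in the original ResNet-18 design.
        self.shortcut = (nn.Conv3d(in_ch, out_ch, kernel_size=1)
                         if in_ch != out_ch else nn.Identity())
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + self.shortcut(x))
```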
Decoder Network. The decoder network is composed of two 2 × 2 × 2 deconvolutional layers (ConvTranspose3d), each followed by a 3D batch normalization (BN) layer, which upsample the feature maps to twice their original scale. The "concat" operation refers to the skip connections integrated with the improved AGs, which selectively merge encoder features into decoder features. The output of each merge is then sent to a 3D residual block. Next, a post-processing attention extraction module, the CIU, obtains the importance of different feature channels and achieves adaptive calibration without changing the feature size. Finally, the detection head follows.
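If it helps to make the wiring concrete, here is a matching sketch of one decoder stage under the same assumptions; it reuses ResBlock3D from the sketch above, and the skip features are assumed to have already been filtered by the improved AG of Section 2.2:

```python
import torch
import torch.nn as nn

class DecoderStage(nn.Module):
    """One decoder stage: 2x upsampling via a 2x2x2 transposed convolution
    with BN, concatenation with the AG-gated encoder features ("concat"),
    then a 3D residual block. Names and wiring are our reading of the text."""
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.up = nn.Sequential(
            nn.ConvTranspose3d(in_ch, out_ch, kernel_size=2, stride=2),
            nn.BatchNorm3d(out_ch),
            nn.ReLU(inplace=True))
        self.res = ResBlock3D(out_ch + skip_ch, out_ch)

    def forward(self, x, gated_skip):
        x = self.up(x)                          # doubles D, H, W
        x = torch.cat([x, gated_skip], dim=1)   # the "concat" step
        return self.res(x)
```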
Data Flow Description. The inputs are 4D patches with a size of 96 × 96 × 96 × 24, and intermediate features with a size of 6 × 6 × 6 × 64 are obtained through the encoder. After that, they are passed through a deconvolution layer and a skip connection integrated with an AG to obtain features of size 12 × 12 × 12 × 128 that fuse low-scale and high-scale information; this operation is repeated once. The CIU does not change the size of the feature maps. Finally, the detection head produces a 4D output tensor of size 24 × 24 × 24 × 3 × 5, where the last two dimensions correspond to the number of anchors and the number of regression-box parameters, respectively. Following the RPN design, the network places three anchors of different scales (5 mm, 10 mm, and 20 mm) at each position. The model outputs five values per anchor: the predicted nodule probability, the 3D spatial coordinates of the nodule, and its diameter.
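For illustration, a hypothetical decoding of the head output described above; the exact offset parameterisation is not specified in the paper, so the RPN-style decoding in the comments is an assumption:

```python
import torch

# Dummy head output: (24, 24, 24, 3, 5) = grid x 3 anchors x 5 values.
out = torch.randn(24, 24, 24, 3, 5)

prob = torch.sigmoid(out[..., 0])                 # nodule probability per anchor
dz, dy, dx = out[..., 1], out[..., 2], out[..., 3]  # offsets to the anchor centre
dd = out[..., 4]                                  # offset to the anchor diameter
# In a common RPN-style parameterisation, each anchor (5/10/20 mm) would be
# refined as, e.g., z = z_anchor + dz * d_anchor and d = d_anchor * exp(dd).
```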

2.2. The Improved Attention Gate

The fusion of features at different scales is an essential means of improving lung nodule detection performance. Low-level features have higher resolution and contain more location and detail information but, owing to fewer convolutions, have weaker semantics and more noise. High-level features carry more robust semantic information, but their resolution is low and their perception of detail is poor. Our encoder-decoder model combines feature maps extracted at multiple scales through skip connections, merging coarse-grained and fine-grained features, which is beneficial for detecting nodules of different sizes.
The success of U-Net and FPN has shown that cascaded frameworks are beneficial for integrating richer semantic information and spatial features, thereby making dense predictions on particular regions of interest (ROIs). In medical image segmentation, some frameworks divide the task into separate localization and subsequent segmentation steps. However, in cascaded frameworks, low-level features from the bottom layers of the network are repeatedly extracted and used, which leads to feature redundancy and wasted computing resources. Attention U-Net [14] proves that the same goal can be achieved by integrating attention gates into a standard CNN model. Inspired by this, we propose an improved attention gate (AG) with dynamic ReLU (DY-ReLU) in our detection network. The AG automatically learns to focus on the target area of the feature maps without additional supervision and without introducing a significant number of additional model parameters. As shown in Figure 2, this module generates a gate signal that controls the importance of features at different spatial locations. Before the features at each resolution of the encoder are spliced with the corresponding features from the decoder, the AG readjusts the encoder's output features to eliminate the ambiguity caused by irrelevant and noisy responses in the skip connection; thus, only strongly correlated feature values are merged. Compared with two-stage lung nodule detection models, a one-stage detector fused with the improved AGs can gradually suppress the feature responses of irrelevant background regions, which plays an essential role in improving detection sensitivity and reducing the false-positive rate.
We assume that the feature output from the encoder is $x_{i,\mathrm{encoder}}^{l} \in \mathbb{R}^{F_l}$, where $F_l$ is the number of feature maps of layer $l$ and $i$ indexes the pixel. The feature to be spliced with it from the decoder (the gate signal) is $x_{i,\mathrm{decoder}}$. First, $x_{i,\mathrm{encoder}}^{l}$ and $x_{i,\mathrm{decoder}}$ pass through two independent linear mapping units; here, we use channel-wise 1 × 1 × 1 convolutions to map them into the same $F_{int}$-dimensional intermediate space, and the mapped features are then added to integrate the information. To increase nonlinearity, the dynamic ReLU (DY-ReLU) activation function [24] is applied; the parameters of DY-ReLU are produced by a hyperfunction conditioned on the input. The hyperfunction synthesizes the context of each input dimension to adapt the activation function, which can significantly improve the expressive ability of the network with only a small number of additional computations.
Then, a sigmoid function normalizes the attention coefficients. Finally, the attention coefficients are multiplied by the encoder feature $x_{\mathrm{encoder}}$ to obtain the salient feature $\hat{x}_{\mathrm{encoder}}$ that is passed through the skip connection for the subsequent decoding operation.
The improved AG is formulated as follows:
$$\alpha_i^l = \psi^T \, \mathrm{DyReLU}\!\left( W_e^T x_{i,\mathrm{encoder}}^l + W_d^T x_{i,\mathrm{decoder}} + b_e \right) + b_\psi$$
$$\hat{x}_{i,\mathrm{encoder}}^l = \mathrm{Sigmoid}\!\left( \alpha_i^l \right) \cdot x_{i,\mathrm{encoder}}^l$$
where the linear transformations are $W_e^T \in \mathbb{R}^{F_l \times F_{int}}$, $W_d^T \in \mathbb{R}^{F_d \times F_{int}}$, and $\psi^T \in \mathbb{R}^{F_{int} \times 1}$, with bias terms $b_e \in \mathbb{R}^{F_{int}}$ and $b_\psi \in \mathbb{R}$.
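A minimal PyTorch sketch of this gate, directly following the formula above; the class name is ours, and the activation defaults to a plain ReLU stand-in since the DY-ReLU used in the paper is sketched in the next subsection:

```python
import torch
import torch.nn as nn

class AttentionGate3D(nn.Module):
    """Sketch of the improved AG: 1x1x1 mappings of encoder and decoder
    features into an F_int-dim space, a nonlinearity, then a sigmoid-
    normalised coefficient that rescales the encoder features."""
    def __init__(self, f_enc, f_dec, f_int, act=None):
        super().__init__()
        self.w_e = nn.Conv3d(f_enc, f_int, kernel_size=1, bias=True)   # W_e, b_e
        self.w_d = nn.Conv3d(f_dec, f_int, kernel_size=1, bias=False)  # W_d
        self.psi = nn.Conv3d(f_int, 1, kernel_size=1, bias=True)       # psi, b_psi
        # The paper uses DY-ReLU here; ReLU is only a runnable placeholder.
        self.act = act if act is not None else nn.ReLU(inplace=True)

    def forward(self, x_enc, x_dec):
        # x_dec is the gate signal, already at x_enc's spatial size.
        q = self.act(self.w_e(x_enc) + self.w_d(x_dec))
        alpha = torch.sigmoid(self.psi(q))   # attention coefficients in (0, 1)
        return x_enc * alpha                 # salient encoder features
```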
DY-ReLU. As shown in Figure 3, DY-ReLU is a dynamic piecewise function whose parameters (slope and intercept) depend on the input [24]. For a given input vector $x$, the dynamic activation consists of two parts: $\theta(x)$ computes the parameters of the activation function, and $f_{\theta(x)}(x)$ uses those parameters to produce the corresponding activation value.
In essence, the DY-ReLU activation function integrates the PReLU [25] activation with SE-Net [26]. SE-Net is a classical channel attention mechanism in which each feature channel receives a weight learned by fully connected layers. The weights of DY-ReLU are likewise obtained through a fully connected network; they serve as the coefficients of a PReLU-style activation, that is, the slopes and intercepts of the positive and negative parts. On the one hand, the use of DY-ReLU therefore echoes the theme of this paper: channel-wise attention. On the other hand, DY-ReLU gives each sample its own set of nonlinear transformations, providing a more flexible dynamic activation that has the potential to improve pulmonary nodule detection accuracy.
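A minimal sketch of a DY-ReLU-B-style module in this spirit; the scaling constants follow the defaults reported in [24] (residual slopes around (1, 0) with lambda_a = 1.0 and lambda_b = 0.5), while the reduction ratio and exact layout are our assumptions:

```python
import torch
import torch.nn as nn

class DyReLU(nn.Module):
    """DY-ReLU-B-style sketch: a small hyper-network on the globally pooled
    input predicts per-channel coefficients, and the activation is
    max(a1*x + b1, a2*x + b2)."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.hyper = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, 4 * channels),
            nn.Sigmoid())

    def forward(self, x):                        # x: (B, C, D, H, W)
        b, c = x.shape[:2]
        ctx = x.mean(dim=(2, 3, 4))              # global context per channel
        theta = 2.0 * self.hyper(ctx) - 1.0      # residual coefficients in [-1, 1]
        theta = theta.view(b, c, 4, 1, 1, 1)
        x = x.unsqueeze(2)                       # (B, C, 1, D, H, W)
        a = torch.stack([1.0 + theta[:, :, 0], theta[:, :, 1]], dim=2)  # slopes
        bias = 0.5 * theta[:, :, 2:4]                                   # intercepts
        return (a * x + bias).max(dim=2).values
```

This module can be passed to AttentionGate3D above via its `act` argument.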

2.3. Channel Interaction Unit

For lung nodule detection, each channel of the feature maps extracted by a series of convolutions acts as a specialized detector, so fusing targeted feature information across channels helps to improve diagnostic accuracy. In the classic channel attention module SE-Net [26], the input features first undergo channel-wise global average pooling (GAP) and then pass through two fully connected layers, and the weight of each channel is finally generated by a nonlinear sigmoid activation. The fully connected layers capture nonlinear cross-channel interactions while reducing dimensionality. Although this strategy is widely used, experience shows that dimensionality reduction is not conducive to capturing the dependencies between channels, and the fully connected layers inevitably increase the computational cost. Inspired by ECA-Net [27], we extend SE-Net to 3D detection and improve it by replacing the fully connected layers with a convolution operation. To balance performance and complexity, the channel interaction unit (CIU) further enhances the effectiveness of nodule detection by capturing local cross-channel interaction information among adjacent channels.
The structure of the CIU is shown in Figure 4. Adaptive global average pooling (GAP) is first applied to the input feature map to obtain features of size 1 × 1 × 1 × C. Then, a 3 × 3 convolution realizes the excitation operation through a direct correspondence between channels and weights, and a sigmoid activation produces values between 0 and 1 that represent the interaction attention scores. The final output of the module is the product of the attention scores and the original input.
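The text and Table 1 describe the excitation convolution slightly differently, so the sketch below follows the ECA-Net idea the module is based on: a local convolution across adjacent channels with no dimensionality reduction. The kernel size and class name are our assumptions:

```python
import torch
import torch.nn as nn

class CIU(nn.Module):
    """Channel interaction unit sketch: GAP, a local cross-channel
    convolution (replacing SE-Net's fully connected layers), and a
    sigmoid gate that rescales the input channels."""
    def __init__(self, k=3):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x):                          # x: (B, C, D, H, W)
        b, c = x.shape[:2]
        y = x.mean(dim=(2, 3, 4))                  # GAP -> (B, C)
        y = self.conv(y.unsqueeze(1)).squeeze(1)   # interaction among neighbours
        score = torch.sigmoid(y).view(b, c, 1, 1, 1)
        return x * score                           # adaptive channel calibration
```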

2.4. Designed Loss Functions for Sample Imbalance

The loss function consists of a classification loss and a regression loss. To address the problem of sample imbalance, we adopt the gradient harmonizing mechanism (GHM) [28] for both:
$$L_{GHM\text{-}R} = \sum_{i=1}^{N} \frac{ASL_1(d_i)}{GD(gr_i)}$$

$$L_{GHM\text{-}C} = \sum_{i=1}^{N} \frac{L_{CE}(p_i, p_i^{*})}{GD(g_i)}$$

where $ASL_1$ denotes the modified smooth L1 loss, $L_{CE}$ denotes the cross-entropy loss, and $GD(\cdot)$ is the gradient density:

$$GD(g) = \frac{1}{l_\varepsilon(g)} \sum_{k=1}^{N} \delta_\varepsilon(g_k, g)$$
where $\delta_\varepsilon(g_k, g)$ indicates whether the gradient norm $g_k$ of the $k$-th sample ($k = 1, \dots, N$) falls within the interval $\left[g - \frac{\varepsilon}{2},\ g + \frac{\varepsilon}{2}\right]$, so the sum counts the samples in that range, and $l_\varepsilon(g)$ is the length of the interval.
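To make the mechanism concrete, here is a minimal sketch of the classification branch (GHM-C) without the momentum variant, approximating the gradient density with a unit-region histogram as in [28]; the function name and bin count are our choices:

```python
import torch
import torch.nn.functional as F

def ghm_c_loss(logits, targets, bins=10):
    """GHM-C sketch: cross-entropy weighted by the inverse gradient density,
    estimated with a histogram over g = |sigmoid(logit) - target|.
    `targets` are float labels in {0, 1}."""
    g = (torch.sigmoid(logits).detach() - targets).abs()  # gradient norm
    tot = logits.numel()
    weights = torch.zeros_like(g)
    edges = torch.linspace(0, 1, bins + 1, device=g.device)
    edges[-1] += 1e-6                          # include g == 1 in the last bin
    nonempty = 0
    for i in range(bins):
        mask = (g >= edges[i]) & (g < edges[i + 1])
        num = mask.sum().item()
        if num > 0:
            weights[mask] = tot / num          # ~ N / GD(g); bin widths are equal
            nonempty += 1
    if nonempty > 0:
        weights /= nonempty
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    return (ce * weights).sum() / tot
```

The regression branch (GHM-R) applies the same density weighting to the modified smooth L1 loss.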

3. Experimental Results

3.1. Datasets

We evaluate the proposed lung nodule detection framework on the LUNA16 dataset [29]. LUNA16 (LUng Nodule Analysis 2016) is a lung nodule detection dataset launched in 2016 that serves as a benchmark for evaluating computer-aided detection (CAD) systems. LUNA16 is a subset of the larger LIDC-IDRI dataset, which contains 1018 CT scans. In LUNA16, CT scans with slice thicknesses greater than 3 mm were removed, as were scans with inconsistent slice spacing or missing slices. This left 888 CT scans with a total of 1186 annotated nodules.

3.2. Implementation Details

During training, a data block of size 96 × 96 × 96 is cropped as the input to the lung nodule detection model. All patches are randomly cropped, flipped, and scaled for data augmentation. We train the network on 2 NVIDIA 2080Ti GPUs with a batch size of 16, using the Adam optimizer with a learning rate of 0.0001, and use 10-fold cross-validation to test the accuracy of the algorithm. The evaluation metrics are the free-response receiver operating characteristic (FROC) and the competition performance metric (CPM) [29]. FROC comprehensively evaluates the sensitivity and specificity of a detection model: the FROC curve plots sensitivity as a function of the number of false positives per scan (FPs/scan). The detection metric of LUNA16 is the CPM, which is the average sensitivity when the number of false positives per scan is 0.125, 0.25, 0.5, 1, 2, 4, and 8, as shown in the sketch below.
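A small sketch of the CPM computation; the example sensitivities are the proposed model's values from Table 2, and the interpolation step is only needed when the FROC curve is not sampled exactly at the seven operating points:

```python
import numpy as np

FP_LEVELS = [0.125, 0.25, 0.5, 1, 2, 4, 8]  # official LUNA16 operating points

def cpm(froc_fps, froc_sens):
    """froc_fps: increasing FPs/scan values; froc_sens: sensitivities."""
    sens_at_levels = np.interp(FP_LEVELS, froc_fps, froc_sens)
    return sens_at_levels.mean()

# Example with the proposed model's sensitivities (Table 2):
print(cpm(FP_LEVELS, [0.782, 0.834, 0.893, 0.917, 0.932, 0.952, 0.956]))
# -> ~0.895, matching the reported CPM
```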

3.3. Experiments on LUNA2016

3.3.1. Performance of Different Methods under Comparison

To verify the effectiveness of the proposed network, we conduct experiments on the LUNA16 dataset and compare FROC values with classical end-to-end pulmonary nodule detection models (3D RPN [22], DeepLung [10], and DeepSeed [12]). As shown in Table 2, the proposed network achieves a CPM score of 89.5% on LUNA16, which outperforms the results obtained by the above end-to-end pulmonary nodule detection models. The FROC curve obtained by CANN is shown in Figure 5. Figure 6 displays detection results for three instances, with each column representing one instance; the rows, from top to bottom, show DeepLung, DeepSeed, the proposed CANN, and the ground truth. The first column shows that CANN better detects tiny nodules located at the pulmonary margins. As shown in the second column, CANN frames the positions of nodules more accurately, and the size of the predicted box is closer to the ground-truth box. The third column shows that CANN reduces false-positive predictions. In summary, CANN has a clear advantage in predicting precise bounding-box sizes and locations, achieving high detection confidence for both small and large nodules, and reducing false positives.
We compare the number of parameters and sensitivity of our proposed model with some 2D and 3D two-stage lung nodule detection models. The results are shown in Table 3. It can be observed that our proposed lung nodule detection network has fewer parameters and higher sensitivity.

3.3.2. Ablation Studies

To verify the contribution of each module of the proposed detection network to sensitivity and specificity, we randomly select one fold of the 10-fold cross-validation as a test set and compare the baseline with its upgraded versions with GHM, the improved AG, and the CIU. Table 4 shows the results of the ablation study. When FPs/scan is set to 0.125, the improved AG increases the detection sensitivity from 0.775 to 0.798, and the CIU further increases it to 0.899. When FPs/scan is set to 0.25, the improved AG increases the detection sensitivity from 0.837 to 0.899, and the CIU further increases it to 0.922. These results indicate that the improved AG and the CIU significantly reduce the false-positive rates of an end-to-end lung nodule detector. The vertical coordinate of the FROC curve is sensitivity, the proportion of positives that are correctly identified, and the horizontal coordinate is FPs/scan, the number of negatives incorrectly predicted as positives per scan; a detector that maintains high sensitivity at low FPs/scan levels therefore performs better. The FROC curves in Figure 7 show the results of the ablation experiment more intuitively.
One issue can be observed from Table 4 and Figure 7: the sensitivities at low FP levels are boosted considerably, with improvements of 10.1% at the 1/8 FP level and 3.7% at the 1/4 FP level, whereas the sensitivity at the high FP level drops slightly (96.1% vs. 97.7% at the 8 FP level). Similar behavior is reported in [3,31]. Through analysis and comparison, we attribute this to overly strict classification: some true-positive samples are wrongly identified as false positives and filtered out, so the sensitivities drop slightly. Our proposed method achieves low false positives, but the detection rates need further improvement, especially at high FP levels; this will be a key direction of our future work.
We design an experiment, using one fold of the 10-fold cross-validation as the test set, to verify that applying two AGs at the two respective skip connections enables finer attention inference. In Figure 1, the first AG is denoted AG1 and the second AG2. We compare four cases: baseline, baseline + AG1, baseline + AG2, and baseline + AG1 + AG2. The experimental results in Table 5 show that applying the improved AG at both skip connections produces a better CPM score.

3.3.3. Grad-CAM Visualizations

For qualitative analysis, and to demonstrate the effect of the CIU module on improving attention to lung nodules, we apply Grad-CAM [32] to our detection network using a 3D data block from LUNA16 as the input. Grad-CAM is a visualization method for explaining the representations of neural networks: it uses gradients to compute the importance of different spatial positions in the output feature maps of a target convolutional layer. Because the gradients are computed with respect to only one class (nodules), we can judge the feature extraction ability of the target layer by observing the highlighted areas that the heatmaps consider essential for prediction.
We conduct Grad-CAM visualization experiments on the outputs of the CIU module and of its preceding layer (back2). Figure 8 illustrates the results. The Grad-CAM masks of the CIU module focus on the target area, whereas the focus of back2 is less apparent; in some cases it even attends more to background information. That is, the CIU module helps the network leverage information and aggregate features from the target layer well.
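A minimal sketch of how such 3D Grad-CAM heatmaps can be produced with forward/backward hooks; the model, target layer, and the way the scalar nodule score is obtained are placeholders, not the paper's actual interfaces:

```python
import torch

def grad_cam_3d(model, layer, volume, class_index=0):
    """3D Grad-CAM sketch: hook a target layer, backprop the nodule score,
    and weight the layer's activations by channel-wise mean gradients."""
    feats, grads = {}, {}
    h1 = layer.register_forward_hook(lambda m, i, o: feats.update(a=o))
    h2 = layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))
    # Assumed: the last output dimension holds per-anchor scores, with the
    # nodule probability at `class_index`; summing gives a scalar to backprop.
    score = model(volume)[..., class_index].sum()
    model.zero_grad()
    score.backward()
    h1.remove(); h2.remove()
    w = grads["g"].mean(dim=(2, 3, 4), keepdim=True)  # channel importance
    cam = torch.relu((w * feats["a"]).sum(dim=1))     # weighted activation map
    # cam has the target layer's spatial resolution; upsample to the input
    # size before overlaying it on the CT slices.
    return cam / (cam.max() + 1e-8)
```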

4. Conclusions

In this paper, we construct an end-to-end lung nodule detection network based on a U-shaped residual network combined with a channel-wise attention mechanism. To achieve high sensitivity while minimizing false-positive predictions, we adopt improved AGs at the skip connections and the CIU as a feature post-processing module. To overcome the imbalance between positive and negative samples, we adopt the GHM loss during training. We conduct comparative experiments against classical end-to-end lung nodule detection models, along with ablation studies, and present visualization results for nodules. All of these demonstrate that the proposed method improves lung nodule detection performance. In future work, we will further improve diagnostic accuracy through more efficient data augmentation and apply the proposed method to other deep-CNN-based medical imaging analyses.

Author Contributions

Conceptualization, X.Z. and Y.S.; methodology, X.Z.; investigation and validation, X.Z., X.W. and Y.S.; data curation and formal analysis, X.Z., X.W., and W.W.; writing—original draft preparation, X.Z. and Y.S.; writing—review and editing, S.R., X.W. and W.W.; funding acquisition, S.R. and W.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Chongqing Natural Science Foundation (Grant No. cstc2021jcyj-msxmX1096).

Data Availability Statement

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Jemal, A.; Siegel, R.; Xu, J.Q.; Ward, E. Cancer Statistics. CA Cancer J. Clin. 2017, 67, 7–30.
  2. Firmino, M.; Angelo, G.; Morais, H.; Dantas, M.R.; Valentim, R. Computer-aided detection (CADe) and diagnosis (CADx) system for lung cancer with likelihood of malignancy. Biomed. Eng. Online 2016, 15, 15–17.
  3. Ding, J.; Li, A.; Hu, Z.; Wang, L. Accurate pulmonary nodule detection in computed tomography images using deep convolutional neural networks. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Quebec City, QC, Canada, 10–14 September 2017; Springer: Berlin/Heidelberg, Germany, 2017; pp. 559–567.
  4. Setio, A.A.; Ciompi, F.; Litjens, G.; Gerke, P.; Jacobs, C.; Van Riel, S.J.; Wille, M.M.; Naqibullah, M.; Sánchez, C.I.; Van Ginneken, B. Pulmonary nodule detection in CT images: False positive reduction using multi-view convolutional networks. IEEE Trans. Med. Imaging 2016, 35, 1160–1169.
  5. Cao, P.; Yang, J.; Li, W.; Zhao, D.; Zaiane, O. Ensemble-based hybrid probabilistic sampling for imbalanced data learning in lung nodule CAD. Comput. Med. Imaging Graph. 2014, 38, 137–150.
  6. Dou, Q.; Chen, H.; Yu, L.; Qin, J.; Heng, P.A. Multilevel contextual 3-D CNNs for false positive reduction in pulmonary nodule detection. IEEE Trans. Biomed. Eng. 2017, 64, 1558–1567.
  7. Kang, G.; Liu, K.; Hou, B.; Zhang, N. 3D multi-view convolutional neural networks for lung nodule classification. PLoS ONE 2017, 12, e0188290.
  8. Liu, J.; Cao, L.; Akin, O.; Tian, Y. 3DFPN-HS2: 3D Feature Pyramid Network Based High Sensitivity and Specificity Pulmonary Nodule Detection; Springer: Cham, Switzerland, 2019.
  9. Mps, A.; Tv, B. Multiscale CNN with compound fusions for false positive reduction in lung nodule detection. Artif. Intell. Med. 2021, 113, 102017.
  10. Zhu, W.; Liu, C.; Fan, W.; Xie, X. DeepLung: Deep 3D dual path nets for automated pulmonary nodule detection and classification. In Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, NV, USA, 12–15 March 2018; pp. 673–681.
  11. Khosravan, N.; Bagci, U. S4ND: Single-shot single-scale lung nodule detection. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Granada, Spain, 16–20 September 2018; pp. 794–802.
  12. Li, Y.; Fan, Y. DeepSEED: 3D Squeeze-and-Excitation Encoder-Decoder Convolutional Neural Networks for Pulmonary Nodule Detection. In Proceedings of the 2020 IEEE 17th International Symposium on Biomedical Imaging, Iowa City, IA, USA, 3–7 April 2020; pp. 1866–1869.
  13. Shi, Y.; Li, H.; Zhang, H.; Wu, Z.; Ren, S. Accurate and Efficient LIF-Nets for 3D Detection and Recognition. IEEE Access 2020, 8, 98562–98571.
  14. Oktay, O.; Schlemper, J.; Folgoc, L.L.; Lee, M.; Heinrich, M.; Misawa, K.; Mori, K.; McDonagh, S.; Hammerla, N.Y.; Kainz, B.; et al. Attention U-Net: Learning Where to Look for the Pancreas. arXiv 2018, arXiv:1804.03999.
  15. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 5998–6008.
  16. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv 2018, arXiv:1810.04805.
  17. Zhu, X.; Su, W.; Lu, L.; Li, B.; Dai, J. Deformable DETR: Deformable Transformers for End-to-End Object Detection. arXiv 2020, arXiv:2010.04159.
  18. Li, X.; Zhou, T.; Li, J.; Zhou, Y.; Zhang, Z. Group-Wise Semantic Mining for Weakly Supervised Semantic Segmentation. arXiv 2020, arXiv:2012.05007.
  19. Zhou, T.; Wang, S.; Zhou, Y.; Yao, Y.; Li, J.; Shao, L. Motion-Attentive Transition for Zero-Shot Video Object Segmentation. IEEE Trans. Image Process. 2020, 34, 13066–13073.
  20. Zhou, T.; Wang, W.; Qi, S.; Ling, H.; Shen, J. Cascaded Human-Object Interaction Recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020.
  21. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
  22. Liao, F.; Liang, M.; Li, Z.; Hu, X.; Song, S. Evaluate the Malignancy of Pulmonary Nodules Using the 3D Deep Leaky Noisy-or Network. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 3484–3495.
  23. Ioffe, S.; Szegedy, C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In Proceedings of the International Conference on Machine Learning, Lille, France, 6–11 July 2015.
  24. Chen, Y.; Dai, X.; Liu, M.; Chen, D.; Yuan, L.; Liu, Z. Dynamic ReLU. arXiv 2020, arXiv:2003.10027.
  25. He, K.; Zhang, X.; Ren, S.; Sun, J. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. In Proceedings of the IEEE International Conference on Computer Vision, Washington, DC, USA, 7–13 December 2015.
  26. Jie, H.; Li, S.; Gang, S.; Albanie, S. Squeeze-and-excitation networks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141.
  27. Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Hu, Q. ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 11531–11539.
  28. Li, B.; Liu, Y.; Wang, X. Gradient Harmonized Single-stage Detector. arXiv 2018, arXiv:1811.05181.
  29. Setio, A.A.; Traverso, A.; De Bel, T.; Berens, M.S.; Van Den Bogaard, C.; Cerello, P.; Chen, H.; Dou, Q.; Fantacci, M.E.; Geurts, B.; et al. Validation, comparison, and combination of algorithms for automatic detection of pulmonary nodules in computed tomography images: The LUNA16 challenge. Med. Image Anal. 2017, 42, 1–13.
  30. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 21–37.
  31. Gong, Z.; Li, D.; Lin, J.; Zhang, Y.; Lam, K.M. Towards Accurate Pulmonary Nodule Detection by Representing Nodules as Points With High-Resolution Network. IEEE Access 2020, 8, 157391–157402.
  32. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 618–626.
Figure 1. Scheme of the proposed channel-wise attention nodule detection network (CANN).
Figure 2. Scheme of the improved attention gate (AG).
Figure 3. Scheme of dynamic ReLU (DY-ReLU).
Figure 4. Scheme of the proposed channel interaction unit (CIU).
Figure 5. FROC curve obtained by the proposed CANN. The two dotted lines respectively represent the upper and lower boundaries after bootstrapping.
Figure 6. Visualization of detection results and nodule ground truths.
Figure 7. FROC curves obtained by the proposed lung nodule detection network and its degraded versions.
Figure 8. Grad-CAM visualization results. The first row represents the input images, the second row denotes the output heatmaps of back2, and the third row represents the output heatmaps of the CIU.
Table 1. Detailed network structure for CANN.

| Encoder | Weights | Output |
|---|---|---|
| preblock | [3 × 3 × 3, 24] × 2 | 96 × 96 × 96 × 24 |
| resblock1 | [3 × 3 × 3, 32; 3 × 3 × 3, 32] × 2 | 96 × 96 × 96 × 32 |
| maxpooling | 2 × 2 × 2, stride = 2 | 48 × 48 × 48 × 32 |
| resblock2 | [3 × 3 × 3, 64; 3 × 3 × 3, 64] × 2 | 48 × 48 × 48 × 64 |
| maxpooling | 2 × 2 × 2, stride = 2 | 24 × 24 × 24 × 64 |
| resblock3 | [3 × 3 × 3, 64; 3 × 3 × 3, 64] × 3 | 24 × 24 × 24 × 64 |
| maxpooling | 2 × 2 × 2, stride = 2 | 12 × 12 × 12 × 64 |
| resblock4 | [3 × 3 × 3, 64; 3 × 3 × 3, 64] × 3 | 12 × 12 × 12 × 64 |
| maxpooling | 2 × 2 × 2, stride = 2 | 6 × 6 × 6 × 64 |

| Decoder | Weights | Output |
|---|---|---|
| deconvolution1 + concat | 2 × 2 × 2, 64 | 12 × 12 × 12 × 128 |
| resblock5 | [3 × 3 × 3, 64; 3 × 3 × 3, 64] × 3 | 12 × 12 × 12 × 64 |
| deconvolution2 + concat | 2 × 2 × 2, 64 | 24 × 24 × 24 × 128 |
| resblock6 | [3 × 3 × 3, 64; 3 × 3 × 3, 64] × 3 | 24 × 24 × 24 × 64 |
| CIU | GlobalAveragePooling; 1 × 1 × 1, 128; Sigmoid | 24 × 24 × 24 × 128 |
| detection head | 1 × 1 × 1, 64; 1 × 1 × 1, 15 | 24 × 24 × 24 × 3 × 5 |
Table 2. FROC of different numbers of false positives per scan per exam on LUNA16.

| FPs/Scan | 0.125 | 0.25 | 0.5 | 1 | 2 | 4 | 8 | CPM |
|---|---|---|---|---|---|---|---|---|
| 3D RPN [22] | 0.662 | 0.746 | 0.815 | 0.864 | 0.902 | 0.918 | 0.932 | 0.834 |
| DeepLung [10] | 0.692 | 0.769 | 0.824 | 0.865 | 0.893 | 0.917 | 0.933 | 0.842 |
| DeepSeed [12] | 0.739 | 0.803 | 0.858 | 0.888 | 0.907 | 0.916 | 0.920 | 0.862 |
| Proposed | 0.782 | 0.834 | 0.893 | 0.917 | 0.932 | 0.952 | 0.956 | 0.895 |
Table 3. Comparison results of the number of parameters and sensitivity in different models.

| Model | Number of Parameters | Sensitivity (%) |
|---|---|---|
| 2D SSD [30] | 59,790,787 | 77.8 |
| 2D Dense Avepool [11] | 67,525,635 | 84.8 |
| 2D Dense Maxpool [11] | 67,525,635 | 87.5 |
| 3D DCNN [3] | 11,720,032 | 94.6 |
| Proposed | 6,282,970 | 95.0 |
Table 4. Results of the ablation study on LUNA2016.

| FPs/Scan | 0.125 | 0.25 | 0.5 | 1 | 2 | 4 | 8 | CPM |
|---|---|---|---|---|---|---|---|---|
| Baseline | 0.751 | 0.816 | 0.863 | 0.885 | 0.922 | 0.967 | 0.968 | 0.882 |
| Baseline + GHM | 0.775 | 0.837 | 0.899 | 0.938 | 0.969 | 0.977 | 0.977 | 0.910 |
| Baseline + GHM + AG | 0.798 | 0.899 | 0.907 | 0.946 | 0.961 | 0.977 | 0.977 | 0.924 |
| Baseline + GHM + AG + CIU | 0.899 | 0.922 | 0.930 | 0.953 | 0.953 | 0.961 | 0.961 | 0.940 |
Table 5. Comparison of different methods with the improved AG.

| FPs/Scan | 0.125 | 0.25 | 0.5 | 1 | 2 | 4 | 8 | CPM |
|---|---|---|---|---|---|---|---|---|
| Baseline | 0.761 | 0.780 | 0.828 | 0.866 | 0.895 | 0.924 | 0.924 | 0.855 |
| Baseline + AG1 | 0.676 | 0.781 | 0.857 | 0.905 | 0.914 | 0.924 | 0.924 | 0.860 |
| Baseline + AG2 | 0.752 | 0.800 | 0.838 | 0.886 | 0.914 | 0.943 | 0.943 | 0.869 |
| Baseline + AG1 + AG2 | 0.762 | 0.790 | 0.895 | 0.914 | 0.914 | 0.933 | 0.943 | 0.879 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
