Review

Pulmonary Nodule Detection, Segmentation and Classification Using Deep Learning: A Comprehensive Literature Review

by
Ioannis Marinakis
,
Konstantinos Karampidis
* and
Giorgos Papadourakis
Department of Electrical and Computer Engineering, Hellenic Mediterranean University, 71410 Heraklion, Greece
*
Author to whom correspondence should be addressed.
BioMedInformatics 2024, 4(3), 2043-2106; https://doi.org/10.3390/biomedinformatics4030111
Submission received: 1 July 2024 / Revised: 10 August 2024 / Accepted: 22 August 2024 / Published: 13 September 2024

Abstract
Lung cancer is a leading cause of cancer-related deaths worldwide, emphasizing the significance of early detection. Computer-aided diagnostic systems have emerged as valuable tools for aiding radiologists in the analysis of medical images, particularly in the context of lung cancer screening. A typical pipeline for lung cancer diagnosis involves pulmonary nodule detection, segmentation, and classification. Although traditional machine learning methods have been deployed in previous years with great success, this literature review focuses on state-of-the-art deep learning methods. The objective is to extract key insights and methodologies from deep learning studies that exhibit high experimental results in this domain. This paper delves into the databases utilized, preprocessing steps applied, data augmentation techniques employed, and proposed methods deployed in studies with exceptional outcomes. The reviewed studies predominantly harness cutting-edge deep learning methodologies, encompassing traditional convolutional neural networks (CNNs) and advanced variants such as 3D CNNs, alongside other innovative approaches such as Capsule networks and transformers. The methods examined in these studies reflect the continuous evolution of deep learning techniques for pulmonary nodule detection, segmentation, and classification. The methodologies, datasets, and techniques discussed here collectively contribute to the development of more efficient computer-aided diagnostic systems, empowering radiologists and healthcare professionals in the fight against this deadly disease.

1. Introduction

Lung cancer is by far the leading cause of cancer death among both men and women, accounting for almost 25% of all cancer deaths [1]. Each year, more people die of lung cancer than of colon, breast, and prostate cancers combined [2]. Lung cancer ranks as the most prevalent cancer among men and is the second most common among women [3]. The World Health Organization (WHO) classifies lung cancer as the deadliest cancer, with 1.8 million deaths and 2.21 million new cases in 2020 [4]. Early detection of lung cancer can significantly increase the survival chances of patients. If lung cancer is diagnosed at an earlier stage, before it has spread, it is more likely to be successfully treated [5]. Typically, the symptoms of lung cancer do not appear until the disease is already at an advanced stage. Even when lung cancer causes symptoms, many people may mistake them for other problems, such as a viral infection or the long-term effects of smoking, which may delay the diagnosis. Current and former smokers are at a higher risk of developing lung cancer [5]. Another factor contributing to the high mortality rate is the large delay in lung cancer diagnosis [6]. Lung cancer screening procedures may be classified as invasive or non-invasive (Figure 1).
Invasive procedures are concerned with physically entering the body, like using a scope to look inside the lungs or taking a piece of lung tissue for testing. If a suspicious nodule is detected through imaging, doctors may perform a biopsy to obtain tissue samples for a definitive diagnosis. This invasive procedure involves the extraction of a small piece of tissue for pathological examination [7]. Some methods for achieving this are as follows:
  • Bronchoscopy: A thin, flexible tube with a camera (bronchoscope) is inserted through the nose or mouth and into the airways to examine the lungs and collect tissue samples for biopsy [8].
  • Needle Biopsy: A needle is used to extract a tissue sample from a suspicious lung nodule or lymph node for examination under a microscope. There are different types of needle biopsies, including transthoracic needle biopsy and endobronchial ultrasound-guided biopsy [9].
  • Thoracoscopy or Video-Assisted Thoracoscopic Surgery (VATS): These minimally invasive surgical procedures involve making small incisions in the chest to access and biopsy lung tissue or remove a suspicious nodule [10].
  • Mediastinoscopy: This procedure involves making a small incision in the neck and inserting a scope to examine and sample lymph nodes in the area between the lungs (mediastinum) [11].
These methods can provide a clear diagnosis and help plan treatment. However, these procedures pose numerous risks to patients, including pain, discomfort, potential blood loss, and an elevated risk of infection or pneumonia. Moreover, these procedures can be emotionally and physically taxing for the patient and require more recovery time [7].
These adverse effects underscore the urgent need for alternative, less invasive approaches to lung cancer screening, in which deep-learning technologies offer a promising avenue for improvement.
Nowadays, modern medical imaging techniques and tools employed by healthcare professionals have revolutionized patient screening, minimizing the need for invasive procedures and discomfort. Non-invasive procedures, like low-dose CT or X-ray imaging, do not require physical entry into the body. These procedures are generally less uncomfortable and risky, but they might not always provide as much detailed information or accuracy in diagnosing lung cancer. Non-invasive procedures include the following:
  • Chest X-rays: Historically, chest X-rays have been the primary tool for detecting lung abnormalities. They provide two-dimensional images of the chest, and can reveal the presence of lung nodules or other suspicious lesions. However, their sensitivity in detecting early-stage lung cancer is limited [12,13].
  • Low-dose Computed Tomography (LDCT) Scans: Computed Tomography (CT) has become a more advanced and widely adopted method for lung cancer screening. These scans use a series of X-rays to create detailed cross-sectional images of the chest. Low-dose CT (LDCT) scans, in particular, have gained prominence in recent years due to their ability to detect smaller nodules and early-stage cancers [14,15].
  • Lung Cancer Risk Assessment Models: Doctors often employ risk assessment models to identify individuals at a higher risk of developing lung cancer. These models take into account factors such as age, smoking history, and family history to stratify patients into different risk categories [16].
The choice between invasive and non-invasive procedures depends on the patient’s situation and what the doctors need to identify. It is essential to carefully consider both options for lung cancer detection. The use of low-dose computed tomography (CT) scans for lung cancer screening has become increasingly popular due to their ability to detect pulmonary nodules at an early stage. However, interpreting these images requires expertise and time, which can lead to delays in diagnosis and treatment, and such interpretation often lacks the sensitivity and objectivity required for optimal results [17]. These limitations underscore the urgency of exploring innovative, non-invasive, and more efficient approaches to lung cancer screening. To address this challenge, computer-aided diagnostic (CAD) systems have been developed to assist radiologists and other medical professionals in identifying and classifying pulmonary nodules.
These computer-assisted methods have significantly enhanced the capabilities of healthcare professionals in lung cancer screening and diagnosis. They not only improve the accuracy of detection but also streamline the workflow, leading to more efficient patient care and timely interventions when necessary. These methods are illustrated in Figure 2.
Deep learning, a subset of artificial intelligence, has emerged as a promising technology for enhancing the accuracy and efficiency of lung cancer screening [18]. Deep learning algorithms have shown great promise in improving the accuracy and efficiency of CAD systems, enabling them to automatically detect, segment, and classify pulmonary nodules on low-dose CT scans.
However, the choice of image modality plays a pivotal role. Researchers commonly use two primary modalities: 2D and 3D low-dose CT scans. The selection depends on the available data and the specific objectives of the screening program. Typically, a CT scan contains multiple slices; therefore, both 2D and 3D options are available. Once the modality is determined, preprocessing steps are crucial to prepare the data for deep learning models. This includes candidate nodule generation to identify potential cancerous regions, resampling for uniformity, and lung masking to isolate relevant structures. Finally, the model architecture is a critical component (Figure 3). It involves selecting and configuring deep learning architectures, such as convolutional neural networks (CNNs), 3D CNNs, Autoencoders, and Deep Auto Encoders, to effectively analyze the pre-processed images and accurately detect lung cancer nodules or anomalies. Moreover, the utilization of synthetic data generation and data augmentation techniques holds significant importance due to the typically limited size of the datasets. The taxonomy presented in Figure 3 will drive this research and comprehensively organize the presentation of the various deep-learning methods used for lung cancer screening.
In this literature review, our focus is on investigating prior research efforts dedicated to the detection, segmentation, and classification of pulmonary nodules within low-dose CT scans, leveraging the capabilities of deep learning models.
Several review papers (Table 1) have been published over the past decade on the subject of computer-assisted lung cancer screening using machine learning. Older reviews focused on both traditional machine learning and deep learning methods [19], while more recent ones [20] have shifted their focus primarily to deep learning. Only a few reviews [20] provide extensive details about the datasets, preprocessing methods, and architectures; the same review [20] also reports advanced data augmentation methods that involve Generative Adversarial Networks (GANs) for synthetic data generation. Another recent review [21] covers state-of-the-art deep learning methods but overlooks current research trends in transformer techniques.
In our research, we comprehensively gathered information from the literature on the datasets used, preprocessing procedures, data augmentation techniques, architectural designs, and the reported performance metrics in the three tasks of interest—namely, pulmonary nodule detection, segmentation, and classification. Our analysis encompasses state-of-the-art deep learning approaches (CNN and autoencoders) and fast-growing and promising approaches such as transformers. Furthermore, we assessed the credibility of each study by examining whether the authors presented lucid and comprehensive explanations of their methodologies and adhered to machine learning best practices. Through these efforts, we provide a current and in-depth viewpoint on this dynamic and rapidly expanding field of study.
Given the vast number of studies published in recent years, our methodology aimed to ensure a rigorous and meticulous filtering process to isolate the most relevant and high-quality research. In our literature research, we employed the web scraping tool “Publish or Perish” [28] to enhance the efficiency of our literature search. This tool, driven by input parameters such as keywords and a specified year range, retrieves the most pertinent publications from Google Scholar. Upon initial investigation, it became evident that the field of lung nodule detection encompasses a multitude of diverse methodologies. In pursuit of a more comprehensive understanding of this subject, we opted to employ multiple keywords in our literature exploration. Table 2 describes the queries used during our literature search:
Our initial step involved sifting through a substantial population of 1200 studies. During the first filtering phase, we applied preliminary screening by examining the titles of these publications. This preliminary step allowed us to eliminate studies that were not directly aligned with the scope and objectives of our review. Through this process, we successfully narrowed down our selection to 560 studies that showed potential for inclusion in our analysis.
To further refine our selection, we conducted a second filtering phase. Here, we delved deeper into these studies by assessing their abstracts and methodologies. This phase was crucial in ensuring that the selected studies not only exhibited relevance but also demonstrated a sound research design. By applying this stringent criterion, we were able to eliminate studies that lacked the necessary depth and rigor in their approach. This led us to a more focused set of 216 studies, which provided a strong foundation for our review.
In the third and final filtering phase, we aimed to identify studies with exceptional performance metrics. We focused on publications that reported high-performance indicators, such as accuracy, sensitivity, specificity, and area under the curve (AUC), all exceeding the threshold of approximately 90%. Additionally, we prioritized studies that exhibited a high DICE score, further emphasizing the quality and accuracy of their results. This approach allowed us to spotlight studies that showcased the most promising outcomes in terms of lung nodule detection, segmentation, and classification, resulting in 110 studies for analysis (Figure 4).
Throughout our filtering process, we also took into account the impact factor of each publication. Factors such as the publication year, number of citations, and reputation of the journal were considered indicators of the study’s influence within the field. This consideration provided us with an additional layer of insight into the significance and relevance of these studies. By employing this meticulous approach to study selection, we were able to curate the refined and high-quality collection of research articles that formed the basis of our review.
The remainder of this paper is structured as follows. Section 2 refers to publicly available datasets. In Section 3, we describe the most common preprocessing steps and data augmentation techniques used in previous works. In Section 4, we describe the designs and architectures used by the authors. In Section 5, we summarize our findings and discuss notable results and future research directions. Finally, a conclusion is presented.

2. Datasets

In all the studies in this review, CT databases were used for the training and evaluation of the proposed methods. In the past few years, most studies have used the publicly available LIDC-IDRI database [29] or the processed subset of LIDC-IDRI, which is the LUNA16 challenge database [30]. The most utilized CT databases for lung cancer diagnosis are briefly described below and summarized in Table 3.

2.1. Lung Image Database Consortium Image Collection

The Lung Image Database Consortium image collection (LIDC-IDRI) consists of diagnostic and lung cancer screening thoracic computed tomography (CT) scans with marked-up annotated lesions [29]. It is a web-accessible international resource for the development, training, and evaluation of computer-assisted diagnostic (CAD) methods for lung cancer detection and diagnosis. Initiated by the National Cancer Institute (NCI), further advanced by the Foundation for the National Institutes of Health (FNIH), and accompanied by the Food and Drug Administration (FDA) through active participation, this public-private partnership demonstrates the success of a consortium founded on a consensus-based process.
Seven academic centers and eight medical imaging companies collaborated to create this dataset, which contains 1018 cases. Each subject includes images from a clinical thoracic CT scan and an associated XML file that records the results of a two-phase image annotation process performed by four experienced thoracic radiologists. In the initial blinded-read phase, each radiologist independently reviewed each CT scan and marked lesions belonging to one of three categories (“nodule > or =3 mm,” “nodule < 3 mm,” and “non-nodule > or =3 mm”). In the subsequent unblinded-read phase, each radiologist independently reviewed their marks along with the anonymized marks of the other three radiologists to render a final opinion. The goal of this process was to identify all lung nodules in each CT scan as completely as possible without requiring forced consensus.

2.2. Lung Nodule Analysis 2016

LUng Nodule Analysis 2016 (LUNA16) is a subset of the LIDC-IDRI database [29]. In this subset, scans with a slice thickness greater than 2.5 mm were excluded. In total, 888 CT scans are included, and the dataset is also split into 10 subsets for training algorithms with 10-fold cross-validation. It was generated and provided to the participants of the LUNA16 challenge [30] to tackle the major tasks in pulmonary nodule analysis. The first task was to identify the locations of possible nodules and assign a probability of being a nodule to each location. In the second task, participants were given a set of candidate locations, and the goal was to assign the probability of being a nodule to each candidate location. Hence, this can be considered a binary classification task: nodule or not a nodule.
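To illustrate how the ten official subsets are commonly used, the following minimal Python sketch iterates over the subset folders and holds one out for validation in each fold. The directory layout (subset0 … subset9 folders of .mhd scans) reflects the usual LUNA16 distribution, but the paths and the train_model/evaluate calls are placeholders rather than part of the challenge code.

```python
from pathlib import Path

# Assumed layout: LUNA16 ships as ten folders subset0 ... subset9 containing .mhd scans.
DATA_ROOT = Path("LUNA16")
SUBSETS = [DATA_ROOT / f"subset{i}" for i in range(10)]

def ten_fold_splits(subsets=SUBSETS):
    """Yield (fold, train_scans, val_scans), holding out one subset per fold."""
    for fold, val_subset in enumerate(subsets):
        train_scans = [p for s in subsets if s != val_subset
                       for p in sorted(s.glob("*.mhd"))]
        val_scans = sorted(val_subset.glob("*.mhd"))
        yield fold, train_scans, val_scans

# for fold, train_scans, val_scans in ten_fold_splits():
#     train_model(train_scans)   # hypothetical training routine
#     evaluate(val_scans)        # hypothetical evaluation routine
```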

2.3. ELCAP Public Lung Image Database

This database was created through collaboration between the ELCAP and VIA research groups [31]. It was created to make a common dataset available that could be used for the performance evaluation of different computer-aided detection systems. This database was first released in December 2003 and is a prototype of a web-based image data archive. The database contents currently consist of an image set of 50 low-dose documented whole-lung CT scans for detection. The CT scans were obtained in a single breath-hold with a 1.25 mm slice thickness. The locations of the nodules detected by the radiologist are also provided.

2.4. Alibaba Tianchi Competition Dataset

This competition dataset consists of low-dose lung CT images in the mhd format from high-risk patients, with each image containing multiple axial slices of the chest [32]. The number of slices varies depending on factors like the scanning machine and patient characteristics. The data is authorized by a partner hospital, desensitized to meet medical information standards, and includes patient ID and slice thickness information. It consists of 1000 patients with nodules in the preliminary round, with nodule size distribution split between 5–10 mm and 10–30 mm. Nodules were marked by three doctors, and the slice thickness for all images was less than 2 mm.

2.5. SPIE-AAPM Lung CT Challenge

This dataset consists of 70 CT scans and was used in the LUNGx Challenge on quantitative image analysis methods for the diagnostic classification of malignant and benign lung nodules conducted at the 2015 SPIE Medical Imaging Conference, SPIE, with the support of the American Association of Physicists in Medicine (AAPM) and the National Cancer Institute (NCI) [33]. Pixel coordinates of the nodule locations and diagnoses are provided in a spreadsheet.

3. Preprocessing

Data preprocessing is a crucial step that can affect the performance of a deep learning model. The purpose of this step is to process the raw data of a dataset and prepare them for the training process. Common preprocessing steps for lung CT databases are resampling for isotropy, conversion of pixel values to Hounsfield units, lung parenchyma segmentation, normalization, resizing, and cropping. Depending on the input of the deep learning model, nodule samples or patches have to be extracted based on the radiologists’ annotations. Depending on the network architecture and its input size, these patches can be 2D or 3D (DICOM carries volumetric information). Next, we briefly describe the most common preprocessing steps that the authors followed in their experiments.

3.1. Conversion of Pixel Value to Hounsfield Units and Thresholding

The Hounsfield scale, named after Sir Godfrey Hounsfield, is a quantitative scale used to describe radiodensity. The Hounsfield unit (HU) scale is a linear transformation of the original linear attenuation coefficient measurement in which the radiodensity of distilled water at standard temperature and pressure (STP) is defined as zero Hounsfield units (HU), while the radiodensity of air at STP is defined as −1000 HU [34]. Table 4 lists the Hounsfield values of various substances. Figure 5 shows a histogram of the Hounsfield units extracted from a randomly selected CT scan in the LIDC-IDRI database [29].
Hounsfield units are standardized across all CT scans regardless of the absolute number of photons the scanner detector captures. Thus, in a CT scan expressed on the HU scale, we can identify the values of different body tissues and materials, and these values can be used to remove unnecessary substances (e.g., air) from the scan. This is easily achieved by setting HU thresholds (Table 4).
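As a minimal sketch (assuming pydicom is available; the file path is hypothetical), the conversion from raw pixel values to HU uses the standard DICOM rescale slope and intercept, after which an air threshold can be applied:

```python
import numpy as np
import pydicom

def slice_to_hu(dicom_path):
    """Convert a single CT slice from raw pixel values to Hounsfield units (HU)."""
    ds = pydicom.dcmread(dicom_path)
    pixels = ds.pixel_array.astype(np.float32)
    # Standard DICOM linear rescale: HU = pixel * RescaleSlope + RescaleIntercept
    return pixels * float(ds.RescaleSlope) + float(ds.RescaleIntercept)

def threshold_air(hu_slice, air_hu=-1000.0):
    """Clip everything at or below the radiodensity of air, removing background."""
    return np.where(hu_slice <= air_hu, air_hu, hu_slice)

# hu = threshold_air(slice_to_hu("scan/slice_001.dcm"))  # hypothetical file path
```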

3.2. Resampling for Isotropy

CT scans in a dataset usually have different voxel spacings, i.e., the in-plane pixel spacing and the distance between slices vary from scan to scan. Different slice thicknesses can be problematic for the development of CADx systems using CNN or other deep learning methods. A commonly used method to alleviate this problem is to resample the entire dataset to a fixed isotropic resolution. LIDC-IDRI is usually resampled to 1 mm × 1 mm × 1 mm voxels, and the rest of the preprocessing steps are then applied to the resampled CT slices [35].
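A minimal sketch of isotropic resampling with SciPy is shown below; the spacing values in the usage comment are illustrative, and real pipelines typically read them from the scan metadata:

```python
import numpy as np
from scipy import ndimage

def resample_isotropic(volume, spacing, new_spacing=(1.0, 1.0, 1.0)):
    """Resample a CT volume (z, y, x) of HU values to isotropic voxel spacing.

    `spacing` is the original (slice_thickness, pixel_spacing_y, pixel_spacing_x) in mm.
    """
    spacing = np.asarray(spacing, dtype=np.float32)
    new_spacing = np.asarray(new_spacing, dtype=np.float32)
    zoom_factors = spacing / new_spacing                      # stretch factor per axis
    return ndimage.zoom(volume, zoom_factors, order=1)        # trilinear interpolation

# volume: np.ndarray of HU values with shape (num_slices, 512, 512)
# resampled = resample_isotropic(volume, spacing=(2.5, 0.7, 0.7))
```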

3.3. Lung Segmentation

To narrow down the problem space, a good practice is to remove unnecessary regions from the scans, such as the surrounding air, the body wall, bones, organs, and other tissue outside the lungs. The purpose of this step is to generate a lung segmentation mask that contains the inner area of the lungs without any other structures such as organs and bones [36]. It is also important to keep some tissue around the lungs, because juxta-pleural nodules may be present at the lung parenchyma boundary.
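The sketch below outlines one simple, threshold-based way to obtain such a mask with SciPy; the −320 HU threshold, the number of retained components, and the dilation amount are illustrative choices rather than values taken from the reviewed studies:

```python
import numpy as np
from scipy import ndimage

def lung_mask(hu_volume, threshold=-320, dilation_iter=3):
    """Rough lung parenchyma mask from an HU volume with shape (z, y, x).

    Keeps the largest air-like regions inside the body and dilates them slightly
    so that juxta-pleural nodules at the lung boundary are not cut off.
    """
    binary = hu_volume < threshold                             # air-like voxels
    labels, _ = ndimage.label(binary)
    # Drop components touching the volume border (air outside the body).
    border = np.unique(np.concatenate([labels[0].ravel(), labels[-1].ravel(),
                                       labels[:, 0].ravel(), labels[:, -1].ravel(),
                                       labels[:, :, 0].ravel(), labels[:, :, -1].ravel()]))
    binary[np.isin(labels, border)] = False
    # Keep the two largest remaining components (the lungs).
    labels, n = ndimage.label(binary)
    if n > 2:
        sizes = ndimage.sum(binary, labels, index=range(1, n + 1))
        keep = np.argsort(sizes)[-2:] + 1                      # labels of the two largest
        binary = np.isin(labels, keep)
    binary = ndimage.binary_fill_holes(binary)                 # fill vessels/airways inside
    return ndimage.binary_dilation(binary, iterations=dilation_iter)
```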

3.4. Normalization

Normalization (Equation (1)) is a data preprocessing technique that scales and transforms features in a dataset to have similar ranges or distributions, ensuring that no single feature dominates the learning process and helping models converge faster and perform better. The HU values of the CT scans can be normalized [37]. HU values in a lung CT scan usually range from −1024 to around 2000. Any value above 400 HU can be clipped, since such values mostly correspond to bone of varying radiodensity. The values are usually normalized between 0 and 1, and a commonly used set of HU thresholds in the LUNA16 competition is between −1000 and 400.
Normalized Value = (X − Xmin)/(Xmax − Xmin),  (1)
where X denotes the current value, Xmin is the minimum value, and Xmax is the maximum value.
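A minimal implementation of Equation (1) using the LUNA16-style window of −1000 to 400 HU mentioned above might look as follows:

```python
import numpy as np

def normalize_hu(volume, hu_min=-1000.0, hu_max=400.0):
    """Clip HU values to the [-1000, 400] window and rescale linearly to [0, 1]."""
    volume = np.clip(volume, hu_min, hu_max)
    return (volume - hu_min) / (hu_max - hu_min)
```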

3.5. Zero Centering

Another advisable preprocessing step is to zero-center the data so that the mean value is 0. To achieve this, the mean pixel value has to be subtracted from all pixels. The mean pixel value can be calculated by averaging all images in a dataset; for example, the typical mean pixel value of the LUNA16 dataset is around 0.25 [38]. Once this value is found, it is subtracted from every pixel. CT scanners are calibrated to return accurate Hounsfield unit (HU) measurements, which means that CT datasets do not contain images with systematically lower contrast or brightness. It would therefore be incorrect to zero-center using the mean value of a single image; the mean pixel value has to be calculated from all the images in the dataset.
Zero_centered_image = image_pixel_values − PIXEL_MEAN
where image_pixel_values denote the image array values, and PIXEL_MEAN is the average value of the dataset.
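A short sketch of this step is given below; PIXEL_MEAN = 0.25 is the dataset-wide mean reported for LUNA16 [38], and the dataset_mean helper simply illustrates how such a value would be computed over all normalized volumes rather than a single image:

```python
PIXEL_MEAN = 0.25  # dataset-wide mean of the normalized LUNA16 voxels reported in [38]

def zero_center(normalized_volume, pixel_mean=PIXEL_MEAN):
    """Subtract the dataset-wide mean so the input is centered at zero."""
    return normalized_volume - pixel_mean

def dataset_mean(volumes):
    """Compute the dataset-wide mean from an iterable of normalized volumes."""
    totals = [(v.sum(), v.size) for v in volumes]
    return sum(t for t, _ in totals) / sum(n for _, n in totals)
```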

3.6. Patch Extraction

To train a deep learning model for pulmonary nodule detection, segmentation, and classification, samples of nodules have to be extracted from the nodule databases. Based on the radiologists’ pixel annotations, the authors extracted samples of nodules to train their networks. Thus, the nodule regions of interest were cropped, and 2D or 3D patches of nodule samples were generated. Three-dimensional patches can also be considered multi-view patches, as they contain volumetric information from the axial, coronal, and sagittal views of a nodule. As for the malignancy label, the LIDC-IDRI database has five classes annotated by experienced radiologists. Authors usually combine classes 1 and 2 (Highly Unlikely, Moderately Unlikely) and label the combined class as benign nodules. Classes 4 and 5 (Moderately Suspicious, Highly Suspicious) are labeled as malignant (Figure 6). Class 3, which represents an intermediate class, is usually ignored due to the uncertainty of whether a nodule is benign or malignant [39].
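As an illustration (not any particular study’s code), the sketch below crops a fixed-size 3D patch around an annotated nodule centre and maps the LIDC-IDRI 1–5 malignancy rating to the binary label described above:

```python
import numpy as np

def extract_patch_3d(volume, center_zyx, size=32):
    """Crop a cubic size^3 patch around a nodule centre (z, y, x) in voxel
    coordinates, zero-padding when the nodule lies near the volume border."""
    half = size // 2
    padded = np.pad(volume, half, mode="constant", constant_values=0)
    z, y, x = (int(round(c)) + half for c in center_zyx)   # shift for the padding
    return padded[z - half:z + half, y - half:y + half, x - half:x + half]

def malignancy_to_label(score):
    """Map the LIDC-IDRI 1-5 malignancy rating to a binary label.

    1-2 -> benign (0), 4-5 -> malignant (1), 3 (indeterminate) -> None (discarded).
    """
    if score <= 2:
        return 0
    if score >= 4:
        return 1
    return None
```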

3.7. Data Augmentation

Data augmentation constitutes a crucial stage that significantly impacts the performance of a deep learning model. Its aim is to generate synthetic samples by altering the original data, mitigating the risk of overfitting. In many of the selected studies in our review, the authors addressed the dataset imbalance issue of the LIDC-IDRI dataset [29]. A dataset is characterized as imbalanced when one or more classes are underrepresented or overrepresented. To alleviate the imbalance issue, the authors generated synthetic training data using traditional data augmentation techniques such as rotation, translation, scaling, shearing, flipping, cropping, and duplicating nodule samples (Figure 7). Another technique is to adjust the sampling rate of the positive and negative nodule samples to achieve class balance. There are also modern data generation techniques such as Generative Adversarial Networks (GANs) [40].
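A minimal sketch of the traditional augmentations listed above, applied to a 3D nodule patch with NumPy/SciPy, is shown below; the rotation range and shift magnitude are illustrative hyperparameters:

```python
import numpy as np
from scipy import ndimage

def augment_patch(patch, rng=np.random.default_rng()):
    """Apply a random combination of traditional augmentations
    (rotation, flips, translation) to a 3D nodule patch."""
    # Random rotation in the axial plane (keeps the through-plane axis fixed).
    angle = rng.uniform(-15, 15)
    patch = ndimage.rotate(patch, angle, axes=(1, 2), reshape=False, order=1)
    # Random flips along each spatial axis.
    for axis in range(patch.ndim):
        if rng.random() < 0.5:
            patch = np.flip(patch, axis=axis)
    # Small random translation (in voxels).
    shift = rng.integers(-2, 3, size=patch.ndim)
    return ndimage.shift(patch, shift, order=1, mode="nearest")
```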

4. Architectures

Two-dimensional convolutional neural networks (2D CNNs) are often used for their simplicity and computational efficiency, while three-dimensional convolutional neural networks (3D CNNs) excel in capturing volumetric information essential for accurate nodule analysis (Figure 8). Hybrid methods combine the strengths of deep learning and traditional machine learning techniques to enhance performance. Capsule networks, which preserve spatial hierarchies and relationships, offer a robust alternative to traditional CNNs for tasks that require detailed spatial understanding. Innovative architectures, such as Vision Transformers, leverage hierarchical resolution features, and autoencoders focus on efficient data representation and reconstruction. The following paragraphs provide a concise overview of these methods.
  • Convolutional Neural Networks (CNNs)
Convolutional Neural Networks (CNNs) [41] are a class of deep learning models particularly effective for processing data with a grid-like topology, such as images. They utilize convolutional layers that apply a series of filters to input data and extract hierarchical features from simple edges to complex textures and shapes. CNNs reduce the dimensionality of images while preserving spatial relationships, making them efficient for tasks such as image recognition, classification, and object detection. The key components of CNNs include convolutional, pooling, and fully connected layers, which together enable the model to learn spatial hierarchies and patterns in the data. CNNs are not limited to image processing; they have a wide range of applications across various domains, including security [42,43], autonomous driving [44], and natural language processing [45].
  • U-Net
U-Net is a type of convolutional network specifically designed for biomedical image segmentation [46]. It has a U-shaped architecture consisting of a contracting path to capture context and a symmetric expanding path that enables precise localization. The contracting path follows the typical architecture of a convolutional network, while the expanding path involves up-convolutions and concatenations with high-resolution features from the contracting path. This structure allows U-Net to effectively segment images with limited training data, making it widely used in medical imaging tasks, such as tumor detection and organ segmentation.
  • Autoencoders
Autoencoders are unsupervised neural networks used for learning efficient coding of input data [47]. They consist of two main parts: an encoder that compresses the input into a latent-space representation and a decoder that reconstructs the input from this representation. Autoencoders are primarily used for dimensionality reduction, feature learning, and anomaly detection. Variants like Variational Autoencoders (VAEs) [48] introduce probabilistic approaches to model the latent space, allowing for more meaningful and interpretable representations. These models are essential in applications like image denoising, compression, and generative tasks.
  • Capsule Networks
Capsule Networks [49], introduced by Geoffrey Hinton and his team, aim to overcome some limitations of CNNs, particularly their inability to handle spatial hierarchies effectively. Capsules are groups of neurons that encode different properties of an object or part of an object, and their activity represents the probability of the presence of an object along with its pose information. Capsule Networks use dynamic routing algorithms to ensure that the output of one capsule is sent to appropriate higher-level capsules. This architecture helps maintain spatial hierarchies and enables better generalization for tasks like image recognition and part-whole relationships.
  • Transformers
Transformers are a type of model architecture that has revolutionized natural language processing (NLP) [50]. They rely entirely on self-attention mechanisms to draw global dependencies between inputs and outputs without using sequence-aligned RNNs or CNNs. The transformers consist of an encoder and decoder, both built from layers of self-attention and feedforward neural networks. The attention mechanism allows the model to focus on different parts of the input sequence when making predictions, enabling it to capture long-range dependencies more effectively. Transformers are the foundation for many state-of-the-art models in NLP, such as BERT [51] and GPT [52], and have been adapted for image processing tasks with Vision Transformers (ViTs) [53].
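To make the self-attention operation at the heart of these models concrete, the following minimal PyTorch sketch implements a single-head scaled dot-product self-attention layer over a sequence of patch embeddings, the core computation in ViT-style architectures; the class name and dimensions are illustrative and are not taken from any of the reviewed models.

```python
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    """Single-head scaled dot-product self-attention over patch embeddings."""

    def __init__(self, dim):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)   # joint query/key/value projection
        self.out = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x):                    # x: (batch, num_patches, dim)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        attn = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        return self.out(attn @ v)

# tokens = torch.randn(2, 196, 128)          # e.g. 14 x 14 image patches, 128-d each
# out = SelfAttention(128)(tokens)           # output has the same shape as the input
```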
Each of these methods has unique advantages and addresses specific challenges in the quest to improve early lung cancer detection and diagnosis. We have organized this section into three primary categories, each addressing a specific challenge. These categories include the detection, segmentation, and classification of pulmonary nodules. Within each category, we have gathered and summarized the most noteworthy studies, presenting their approaches and key findings.

4.1. Nodule Detection

The objective of a nodule detection model is to specify potential regions of interest (ROIs) of the nodules. The nodule detection task has also been described in many studies as candidate generation. These nodule candidates can be 2D (x, y) or 3D (x, y, z). Some authors report the presence of False Positive predictions (FPs) and propose methods for reducing the FPs and boosting the sensitivity performance. Table 5 provides an overview of the nodule detection works, and detailed information on the preprocessing steps and performance metrics can be found in Appendix A, Table A1, and Table A2, respectively. In the following table, we report sensitivity, a key metric for nodule detection that measures the model’s ability to correctly identify actual nodules, thereby minimizing the risk of missed diagnoses. High sensitivity is crucial to ensure that potentially malignant nodules are not overlooked. However, high false positive (FP) rates, where the model incorrectly identifies normal tissue as nodules, create ‘noise’ in the predictions that radiologists must review, thereby increasing the review time.
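For reference, the competition performance metric (CPM) quoted for many of the following studies is the FROC sensitivity averaged over seven predefined false-positive rates (0.125, 0.25, 0.5, 1, 2, 4, and 8 FPs per scan). A minimal sketch, assuming the FROC operating points of a detector have already been computed, is:

```python
import numpy as np

def cpm_score(fp_per_scan, sensitivities,
              operating_points=(0.125, 0.25, 0.5, 1, 2, 4, 8)):
    """Interpolate a FROC curve at the seven predefined operating points and
    return the per-point sensitivities together with their mean (the CPM)."""
    fp_per_scan = np.asarray(fp_per_scan, dtype=float)
    sensitivities = np.asarray(sensitivities, dtype=float)
    order = np.argsort(fp_per_scan)                       # np.interp needs increasing x
    sens_at_ops = np.interp(operating_points, fp_per_scan[order], sensitivities[order])
    return sens_at_ops, float(sens_at_ops.mean())

# Example: a detector reaching 0.90 sensitivity at 1 FP/scan and 0.95 at 4 FPs/scan
# contributes those values directly; the CPM is the average over all seven points.
```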

4.1.1. Two-Dimensional Convolutional Neural Networks (2D CNN)

Convolutional Neural Networks (CNNs) have been widely adopted for nodule detection due to their robust feature extraction capabilities and efficiency in handling image data. CNNs are particularly effective in reducing the dimensionality of images while preserving spatial relationships, which is crucial for identifying nodules in CT scans [58,60,72].
However, CNNs typically require large amounts of labeled data for training, which can be a limitation in medical imaging, where annotated datasets are scarce [86]. In addition, some CNNs are computationally intensive and demand substantial resources for both training and inference. Despite these drawbacks, the ability of CNNs to effectively process and analyze visual data makes them a cornerstone for the development of nodule-detection systems.
Tan et al. [58] developed a CADx system to detect and classify juxta-pleural nodules. In their study, they segmented the lung region using HU thresholds, generated candidates, and classified them using a CNN. Their study aimed to relieve the challenge of overfitting under dataset limitations and to reduce the running-time complexity of self-learning; to this end, they used readily available engineered features to keep the CNN from overfitting. Their reported sensitivity was 94.01%, with four FPs per scan, on the juxta-pleural nodules of the LIDC-IDRI dataset.
Zuo et al. [64] developed a multi-resolution convolutional neural network (CNN) to extract features of various levels and resolutions from different depth layers in the network for the classification of lung nodule candidates. Knowledge transferred from a source CNN model applied to edge detection was used to refine the model into a new multi-resolution model suitable for the image classification task. Their multi-resolution network accepted multiple input sizes of N × N (N = 26, 36, 48), and the classification performance (Se: 97.26%, Sp: 97.38%, AUC: 0.9954, Acc: 97.33%, Acc(multi-res): 92.81%) on the LUNA16 challenge dataset was very promising, but their 0.742 CPM score is relatively low compared to other works.
Q. Wang et al. [60] aimed to develop a simple and computationally effective approach to the nodule detection task. In their study, a raw CT image was divided into 64 × 64 patches, which were then used as 2D inputs to a CNN without any preprocessing. This approach was computationally very efficient and produced a good sensitivity of 92.8%, but the false positive rate was very high, with eight FPs per scan. The paper [69] presents DFD-Net, a lung cancer detection model trained on denoised computed tomography (CT) images. This two-path convolutional network was designed to characterize and learn different morphological features of lung nodules, integrating denoising and detection tasks, which is a novel contribution to the field. Additionally, a discriminant correlation analysis strategy is introduced to enhance model accuracy by concatenating more representative features. The proposed two-stage training addresses the critical class difference problem, fine-tuning all model layer parameters after the first training stage, with retraining focused on the output layer. Utilizing the U-Net architecture with dimensions of 128 × 128, DFD-Net achieved a specificity of 0.874, recall of 0.891, and accuracy of 0.878, highlighting its effectiveness in lung cancer detection with promising performance metrics.
Zheng et al. [59] proposed a pipeline consisting of two stages: nodule candidate detection and false positive reduction. As a preprocessing step, they segmented the lung parenchyma using Hounsfield unit (HU) thresholds to exclude irrelevant regions. A novelty of this study over the literature is the extraction of maximum intensity projection (MIP) images with slab thicknesses of 5 mm, 10 mm, and 15 mm, together with 1 mm axial section slices, used as input to the nodule candidate generation model. Four 2D U-Net-based models were trained to generate candidates, one for each MIP projection. Candidate nodules were generated using a 2D U-Net network, and the 2D masks were then merged into a 3D patch. This generated 3D patch was classified using two different 3D CNNs (VGG-net based), with sizes of 16 × 16 × 16 and 32 × 32 × 32. Their multi-MIP approach achieved a sensitivity of 95.4% with 19.1 false positives (FPs) per scan for the nodule detection task, and the 3D CNN classifier achieved fusion sensitivity results of 89.9% and 94.8% at 0.25 and 4 FPs per scan, respectively. They obtained a very high CPM score of 0.952 (FROC) in the LUNA16 challenge. Their method was highly effective, and by combining multiple 2D MIPs with 3D patches, the classification model was able to extract rich spatial information.
Nguyen et al., in their work [72], proposed an innovative approach to early pulmonary nodule detection using a Faster R-CNN model with an adaptive anchor box, addressing challenges posed by varying nodule sizes in training datasets. The system employs ground-truth nodule sizes to dynamically generate adaptive anchor box configurations, optimizing the Faster R-CNN’s detection performance. Additionally, a residual convolutional neural network based on the ResNet architecture is introduced to reduce false positives from the Faster R-CNN’s output, striking a balance between sensitivity and prediction quality. The proposed method is evaluated on the LUNA16 dataset, achieving a high sensitivity of 95.64% at 1.72 false positives per scan and a competitive competition performance metric (CPM) score of 88.2%. The false positive reduction network demonstrates strong performance with 93.8% sensitivity, 97.6% specificity, 95.7% accuracy, 95.5% F1-score, and a notable 0.957 AUC. These results outperform those of recent state-of-the-art detection methods, emphasizing the efficacy and generalizability of the proposed model in pulmonary nodule detection.

4.1.2. Three-Dimensional Convolutional Neural Networks (3D CNN)

3D CNNs have advanced the field by leveraging the full spatial context of volumetric data, which is critical for accurate nodule detection in CT scans. For instance, Ding et al. [54] integrated a 3D DCNN to reduce false positives and achieved a high sensitivity for the LUNA16 challenge. Similarly, Gu et al. [56] utilized a multiscale 3D CNN to capture rich contextual information from 3D nodule samples, thereby improving the detection accuracy. Despite their superior performance in capturing volumetric features, 3D CNNs are computationally demanding and complex to train, as seen in the work by Gruetzemacher et al. [35], which requires extensive computational resources for cross-validation and training. These methods significantly enhance nodule detection accuracy but are often constrained by their high computational requirements and complexity.
Ding et al. [54] introduced a deconvolutional structure into Faster R-CNN to detect candidate nodules from the axial slices and a three-dimensional DCNN to reduce false positives. Their candidate detection network consists of two modules: a region proposal network (RPN) that aims to propose potential regions of nodules and an ROI classifier that recognizes whether ROIs are nodules or not. Their approach achieved a high sensitivity of 0.946 for the candidate detection task with 15 candidates per scan, and their nodule classification network achieved sensitivities of 92.2%/1 FP and 94.4%/4 FPs per scan in the LUNA16 challenge dataset with an average FROC score of 0.864.
Y. Gu et al. [56] proposed a novel CADx system based on a 3D CNN with multiscale prediction to detect lung nodules after the lungs were segmented from chest CT scans with a comprehensive (Otsu threshold-based) method. Their multiscale lung nodule prediction strategy, including multiscale cube prediction and cube clustering, aimed to detect extremely small nodules using rich contextual information from 3D nodule samples. Their method achieved a sensitivity of 92.93% with four FPs per scan and a CPM score of 0.7967.
Gruetzemacher et al. [35], in their work, proposed two 3D deep learning models, one for each of the essential tasks of computer-aided nodule detection: candidate generation and false positive reduction. To train their detection module, 3D patches of size 64 × 64 × 64 were extracted from the LUNA16 dataset, and this U-Net-based model is used for volume-to-volume prediction of nodules, i.e., segmentation, to identify potential pulmonary nodules within the input CT scan. The segmentation model generates a large number of candidate nodules, many of which are false positives; as a result, a second false-positive-reduction DNN is required to improve the performance of the binary nodule/non-nodule classification. They performed 9-fold cross-validation to evaluate their system and repeated it 10 times. The candidate detection model achieved 94.77% sensitivity with 30.39 FPs per scan, and the false positive reduction model achieved 94.21% sensitivity with 1.789 FPs per scan, both evaluated on the test set. The overall performance of their complete system in terms of ROC AUC and sensitivity was 0.9324 and 89.29%, respectively, with 1.789 FPs per scan.
J. Zhang, Xia, Zeng et al. [57] performed lung segmentation using HU thresholds and then generated nodule candidates to be classified by a 3D DCNN. Multiscale Laplacian of Gaussian (LoG) filters and prior shape and size constraints were used to detect nodule candidates and then construct a densely dilated 3D DCNN, which combines dilated convolutional layers and dense blocks, for simultaneous identification of genuine nodules and estimation of nodule diameters. The achieved CPM score was 0.947, combined with a sensitivity of 94.9% at a low rate of 1 FP per scan.
Nasrullah et al. [61], in their implementation, proposed a deep learning approach that combines multiple strategies. The detection of pulmonary nodules was performed with a 3D Faster R-CNN on efficiently learned features from CMixNet and a U-Net-like encoder–decoder. Then, based on the output of the detection model, 3D candidate nodule patches (32 × 32 × 32) were generated to be classified by a Gradient Boosting Machine (GBM) on the learned features from the designed 3D CMixNet structure. To reduce false positives, the final decision was performed in connection with physiological symptoms and clinical biomarkers. Their system was evaluated on the LIDC-IDRI dataset and obtained a sensitivity of 94% and specificity of 91% [60].
H. Tang et al. [62], in their work, presented an end-to-end 3D DCNN to perform nodule detection, false positive reduction, and nodule segmentation jointly in a multi-task fashion. They implemented two design tricks. Firstly, they decoupled feature maps for nodule detection and false positive reduction, and then a segmentation refinement subnet was used to increase the precision of nodule segmentation. Their end-to-end pulmonary nodule analysis was evaluated using the LIDC dataset, and they achieved a final CPM score of 87.27% for the nodule detection task and a DSC score of 83.10% for the nodule segmentation task.
S. Tang et al. [63] proposed a CADx system with a 3D U-Net convolutional neural network based on multiscale features and transfer learning to automatically detect pulmonary nodules from the thoracic region containing background and noise. In their approach, they segmented the lung parenchyma using various methods. After that, the 3D U-Net nodule detection model was built with a multiscale feature structure and trained with multiscale samples. Transfer learning and fine-tuning were introduced to boost the performance. Their experiments were conducted on two publicly available datasets, LUNA16 and TIANCHI17, and the reported performance metrics were very high, achieving an AUC of 0.941, sensitivity of 92.4%, specificity of 94.6%, and accuracy of 96.8% (Figure 9).
Tong et al. [67] used a 34-layer 3D ResNet for feature extraction from the spatial information contained in CT scans. They also extracted heterogeneous features, combining deep features extracted from the images with patient information (age, smoking history, cancer history within five years, hypertension, heart disease, diabetes, tuberculosis, hepatitis, and alcohol consumption) to generate a fused description of the nodule object. Finally, an SVM with MKL was used to perform the binary classification of the nodules. For the nodule classification task, they obtained an accuracy of 91.29%, a sensitivity of 91.01%, and a specificity of 91.40% on the LIDC-IDRI database, and an accuracy of 84.70%, a sensitivity of 83.33%, and a specificity of 86.65% on their private dataset.
The work of Peng et al. [68] introduced a 3D multiscale deep convolutional neural network designed for pulmonary nodule detection, featuring a structure with Bottle2SEneck modules. The network is divided into two parts: a nodule candidate detection network and a false positive reduction network. With dimensions of 128 × 128 × 128, the 3D convolutional neural network (3DCNN) achieves a Free Response Operating Characteristic (FROC) average sensitivity of 0.923, demonstrating its effectiveness in detecting pulmonary nodules with high sensitivity across various scales. The integration of multiscale features enhances the model’s capability for robust nodule detection in three-dimensional medical imaging data.
The authors in [70] presented a pioneering Multiscale CNN with compound fusions designed for false positive reduction in lung nodule detection from 3D data extracted from lung CT scans. The newly devised 3D CNN architecture processes thoracic information across axial slices, and an innovative approach incorporates 3D patches of varying sizes, allowing the model to handle nodule candidates of diverse scales while extracting complementary contextual features. A unique fusion procedure integrates these complementary features at two depths, progressively enhancing their class discrimination power. To address the class imbalance, a novel iterative training strategy is implemented, maintaining equal proportions of true positives (TPs) in each iteration while replacing false positives (FPs) with new instances. Employing a 3D CNN with dimensions of 64 × 64 × 64, 32 × 32 × 32, and 16 × 16 × 16, the model achieves a competitive competition performance metric (CPM) score of 0.948, demonstrating its efficacy in false positive reduction for lung nodule detection across a spectrum of sizes.
Yuan et al. [71] introduced an efficient Multi-path 3D Convolutional Neural Network (CNN) tailored for false positive reduction in pulmonary nodule detection. Firstly, their proposal emphasizes the superiority of 3D CNN over 2D CNN in volumetric medical image processing, leveraging its capacity to fully extract 3D contextual information for more discriminative features. Secondly, the network responds to different receptive field sizes, enabling the learning of corresponding expression features for a given pulmonary nodule. Thirdly, the multi-path network, surpassing its single-path counterpart, enhances performance through feature map concatenation, facilitating expression feature fusion for comprehensive information supplementation. With dimensions of 48 × 48 × 48, the Multi-path 3D CNN achieves a competitive performance metric score of 0.881. Notably, it exhibits excellent sensitivity at 4 and 8 False Positives per scan, with scores of 0.952 and 0.962, respectively, underscoring its effectiveness in false positive reduction for pulmonary nodule detection.
Suzuki et al. [74] developed and validated a modified three-dimensional U-Net deep-learning model with an input size of 64 × 96 × 96 for the automated detection of lung nodules on chest CT images. The model underwent training using the Lung Image Database Consortium and Image Database Resource Initiative dataset. Internal validation involved 89 chest CT scans that were not part of the model training, and external validation involved 450 chest CT scans from a Japanese urban university hospital. Each case included at least one nodule > 5 mm identified by an experienced radiologist. Model accuracy was assessed using the competition performance metric (CPM) at various false positive rates. In internal validation, the CPM achieved 94.7% (95% CI: 89.1–98.6%), while in external validation, the CPM was 83.3% (95% CI: 79.4–86.1%). These results highlight the high performance of the modified 3D U-Net deep-learning model, emphasizing its efficacy for automated lung nodule detection across diverse clinical scenarios.
The work of Akila Agnes et al. [76] introduced a two-stage lung nodule detection framework employing an enhanced UNet and Convolutional LSTM networks in CT images. The proposed CADe system begins by scanning axial slices of CT images to identify suspicious nodules. Subsequently, it refines the output by eliminating false nodules, maintaining high sensitivity with a low false positive rate (FPR). Key contributions include the utilization of dilated convolution for a larger receptive field, an ensemble mechanism integrating shallow and deeper stream features, and a novel deep learning-based classification model utilizing Long Short-Term Memory (LSTM) networks for enhanced discriminative power. The Pyramid Dilated ConvLSTM (PD-CLSTM) with a pyramid dilation mechanism captures multi-resolution spatial features without increasing model complexity. The proposed CADe system achieves the detection of small nodules (5 mm–9 mm) at 91.84% and larger nodules (>10 mm) with a sensitivity of 92.70%, significantly improving the detection rate of small pulmonary nodules. The system attains the best average Free Response Operating Characteristic (FROC) score of 0.930, outperforming state-of-the-art methods, and demonstrates a 0.959 sensitivity with 1 false positive per scan, suggesting its potential as a valuable tool for automatic nodule detection in early lung cancer diagnosis.
X. Zhu et al. [77] introduced an end-to-end lung nodule detection network with a U-shaped encoder–decoder structure, aiming to enhance model sensitivity and specificity. Notably, improved attention gates (AGs) have been strategically integrated into skip connections, effectively reducing false positives in one-stage pulmonary nodule detectors. To further refine the model, a post-processing module known as the channel interaction unit (CIU) is introduced. The CIU evaluates the importance of each feature channel, enabling the extraction of more targeted image features and fully optimizing the network performance. Employing a 3D U-shaped residual network with dimensions of 96 × 96 × 96, the proposed model achieves a competitive competition performance metric (CPM) score of 89.5% and a sensitivity of 95%, showcasing its effectiveness in lung nodule detection with improved accuracy and reduced false positives.
Jian et al. [78] introduced 3DAGNet, a 3D convolutional neural network for automatic lung nodule detection. It incorporates a two-branch attention module to simulate physician diagnosis behavior, focusing on CT image depth. Additionally, a three-branch multiscale feature fusion module replaces traditional decoding up-sampling, facilitating the fusion of high and low-level semantic features. The model achieves a sensitivity of 88.08% in average FROC, showcasing its effectiveness in detecting lung nodules across scales and depths in 128 × 128 × 128 CT images.
This recent work of L. Liu et al. [79] presents a federated learning approach for training a lung nodule detection model on horizontally distributed data across different clients. Employing the federated averaging algorithm, the method utilizes a 3D ResNet18 Dual Path Faster R-CNN model for nodule detection. To address the impact of data quality on model training, a sampling-based content diversity algorithm is introduced and validated on LUNA16 data. This mitigates overfitting, improves generalization, and reduces the training time. Comparative experiments with other federated learning algorithms demonstrate that the proposed 3D ResNet18 Dual Path Faster R-CNN federated learning algorithm achieves superior results. The model attained an overall detection accuracy of 83.417%, AUC of 88.382%, sensitivity of 83.388%, precision of 83.412%, and F1 score of 83.401% on 128 × 128 × 128 data.

4.1.3. Auto Encoders

Autoencoders provide an unsupervised approach to learning efficient coding of input data, making them valuable for feature extraction and anomaly detection in nodule detection tasks [55,83]. Although autoencoders are effective in reducing data dimensionality and highlighting anomalies, they can suffer from overfitting and may require careful regularization and tuning to avoid reconstruction errors [87].
Eun et al. [55] developed a candidate generation network and a novel false positive reduction framework. They proposed an ensemble of single-view 2D CNNs with fully automatic non-nodule categorization for pulmonary nodule detection. Unlike 3D CNN-based frameworks, they utilized 2D CNNs operating on 2D single views to improve computational efficiency, and they obtained a good CPM score of 0.922.
Shrey et al. [66] proposed a cascaded network to segment nodules and classify them as benign or malignant. Their method consists of a U-Net segmentation network with a discriminator to segment suspicious nodules. The classification network is based on an encoder followed by fully connected layers to classify the output of the segmentation network. They trained their U-Net segmentation network on the public LUNA16 dataset and then used the trained weights to test on their private CT and PET set of 204 patients. Their method achieved very high performance metrics, with precision and recall of 98% and accuracy of 97.96%.
Zheng et al. [65] proposed a two-stage method for multiplanar lung nodule detection, focusing on small nodule identification. The first stage involves multiplanar nodule candidate detection and false positive (FP) reduction. Utilizing a convolutional neural network model, U-net++, with the Efficient-Net classification model as its backbone pre-trained on ImageNet, potential nodule candidates are identified on the axial, coronal, and sagittal planes. The predictions from these planes are merged to enhance sensitivity. In the second stage, multiscale dense convolutional neural networks are employed to exclude FP candidates. This network leveraged the transfer learning method to boost performance. Their approach was one of the best and most accurate nodule detection pipelines, achieving 94.2%/1 FP, 96%/2 FPs per scan, and an extremely high CPM score of 0.9403 while a detection sensitivity of 91.1% at a 1 mm slice thickness was reported.
K. Cao [83] introduced a three-dimensional multifaceted attention encoder–decoder network designed for pulmonary nodule detection. This model combines a self-attention module with an LRA block to create a multifaceted attention block, allowing the model to establish irregular long-range dependencies based on nodule features. By integrating this block into the encoder–decoder structure of a convolutional neural network (CNN), the model adeptly captures both local texture features of nodules and establishes effective long-range dependencies, enhancing its ability to identify fine nodules with irregular shapes. The model’s performance is further augmented through the incorporation of multiscale modules and focal loss functions. With an input size of 128 × 128 × 128, the 3D multifaceted attention encoder–decoder achieves an FROC of 0.891, demonstrating an average sensitivity of 89.1% at seven predefined false positives per scan (Figure 10).

4.1.4. Transformers

Transformers, initially developed for natural language processing, have shown great promise in image processing tasks due to their ability to capture long-range dependencies and contextual information [88], including nodule detection [80,81,82], achieving high-performance metrics for 2D and 3D inputs.
Despite their advantages, transformers are resource-intensive and require substantial training data [89], which can be a limitation in medical imaging applications. However, their ability to handle hierarchical resolution features and long-range dependencies makes them valuable tools for the evolution of nodule detection architectures.
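As an illustration of the basic mechanism behind such detectors, the following sketch embeds a CT sub-volume into voxel patches and applies a standard transformer encoder, so that every patch can attend to every other patch. It is a toy example under assumed sizes (a 64 × 64 × 64 input split into 8 × 8 × 8 patches), not a reproduction of any reviewed architecture.

```python
import torch
import torch.nn as nn

class TinyViT3D(nn.Module):
    """Toy 3D patch embedding + transformer encoder (illustrative, not a reviewed model)."""
    def __init__(self, vol=64, patch=8, dim=192, depth=4, heads=6):
        super().__init__()
        n_patches = (vol // patch) ** 3                        # 8^3 = 512 tokens
        self.to_patches = nn.Conv3d(1, dim, kernel_size=patch, stride=patch)
        self.pos_embed = nn.Parameter(torch.zeros(1, n_patches, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           dim_feedforward=4 * dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, 1)                          # e.g. per-token nodule score

    def forward(self, x):                                      # x: (B, 1, D, H, W)
        tokens = self.to_patches(x).flatten(2).transpose(1, 2)   # (B, N, dim)
        tokens = self.encoder(tokens + self.pos_embed)           # global self-attention
        return self.head(tokens).squeeze(-1)                     # (B, N) patch-level logits

if __name__ == "__main__":
    model = TinyViT3D()
    ct = torch.randn(2, 1, 64, 64, 64)   # stand-in for preprocessed CT sub-volumes
    print(model(ct).shape)               # torch.Size([2, 512])
```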
Mkindu et al. [81] presented a novel 3D multiscale vision transformer designed for lung nodule detection in chest CT images, leveraging the strengths of the original 2D Swin Transformer while extending it to a 3D version. The proposed model incorporates a local–global transformer structure, where transformer encoders extract patches individually at each scale in the local stage. At the global stage, patches from different scales are merged and processed by transformer encoders, enhancing the global receptive field. To fully utilize the 3D nature of CT scans, the model employs voxel-wise embedded CT patches as transformer inputs, allowing for the comprehensive utilization of hierarchical resolution features. The volumetric predictions are then converted to sequence-to-sequence predictions. The 3D ViT model, with dimensions 64 × 64 × 64, 32 × 32 × 32, and 16 × 16 × 16, achieves a notable sensitivity of 97.81% and demonstrates competitive performance metrics with a score of 0.911 in lung nodule detection, underscoring its effectiveness across different scales in chest CT images. Later the same year, Mkindu et al. [80] introduced another novel approach for lung nodule detection, demonstrating superior performance with fewer resources compared to deep convolutional neural network (CNN)-based architectures. The proposed 3D-NodViT architecture integrates a shifted window vision transformer with a 3D Region Proposal Network (RPN) to generate lung nodule candidates. The study employs Bayesian Optimization to determine the optimal 3D-NodViT architecture for detecting candidate nodules in chest CT images. The 3D ViT model with dimensions of 128 × 128 × 128 achieves a remarkable detection sensitivity of 98.39% and a CPM score of 0.909, showcasing its effectiveness in lung nodule detection with high sensitivity and precision.
ETAM (Ensemble Transformer with Attention Modules), tailored for small object detection, was introduced by J. Zhang et al. [82]. ETAM employs an ensemble Transformer encoder to address the challenge of limited features in small objects and mitigate interference from background features. Notably, the model introduces a Magnifying Glass (MG) module designed specifically for small object detection, enhancing feature extraction and precision in detecting small objects while minimizing background interference. Additionally, a Quadruple Attention (QA) module extends the attention to the height and width dimensions, improving the overall feature extraction. To balance the small and large object detection accuracy, ETAM adopts a two-branch ensemble learning approach, with the ETAM-S branch focusing on small objects and the ETAM-N branch handling larger objects. The 2D Ensemble Transformer with Attention Modules achieves commendable metrics, including an accuracy of 96.14%, sensitivity of 94.58%, specificity of 97.10%, and an AUC of 0.9896, affirming the model’s efficacy in small object detection without compromising overall accuracy.

4.1.5. Capsule Networks

Capsule Networks are a recent class of machine learning architectures proposed to overcome the shortcomings of CNNs by maintaining the spatial hierarchies and relationships between features, which enhances the robustness of image recognition tasks [90]. Capsule Networks use dynamic routing algorithms to ensure that the output of one capsule is sent to the appropriate higher-level capsules, preserving spatial hierarchies and enabling better generalization [91].
Despite their advantages in maintaining spatial hierarchies and better generalization, Capsule Networks are computationally intensive and complex to implement, necessitating significant computational resources and training time [92].
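The dynamic routing algorithm referred to above can be condensed into the routing-by-agreement sketch below, which shows how coupling coefficients are iteratively shifted toward higher-level capsules that agree with the lower-level predictions. The capsule counts and dimensions are purely illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def squash(s, dim=-1, eps=1e-8):
    """Capsule non-linearity: shrinks short vectors toward 0, long vectors to length < 1."""
    norm2 = (s ** 2).sum(dim=dim, keepdim=True)
    return (norm2 / (1.0 + norm2)) * s / torch.sqrt(norm2 + eps)

def dynamic_routing(u_hat, num_iters=3):
    """Routing-by-agreement (Sabour et al., 2017).
    u_hat: prediction vectors of shape (B, n_in, n_out, d_out)."""
    B, n_in, n_out, _ = u_hat.shape
    b = torch.zeros(B, n_in, n_out, device=u_hat.device)    # routing logits
    for _ in range(num_iters):
        c = F.softmax(b, dim=2)                              # coupling coefficients
        s = (c.unsqueeze(-1) * u_hat).sum(dim=1)             # weighted sum -> (B, n_out, d_out)
        v = squash(s)                                        # higher-level capsule outputs
        # increase logits where predictions agree with the output capsule
        b = b + (u_hat * v.unsqueeze(1)).sum(dim=-1)
    return v

if __name__ == "__main__":
    u_hat = torch.randn(4, 32, 10, 16)   # 32 primary capsules -> 10 output capsules, 16-D
    v = dynamic_routing(u_hat)
    print(v.shape)                        # torch.Size([4, 10, 16])
```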
Song et al. [84] introduced a novel architecture for low-dose computed tomography pulmonary nodule detection, combining 3D convolutional neural networks (CNN) with Capsule Networks (CapsNet) to enhance model robustness. Leveraging convolution kernels of varying scales enables the extraction of richer contextual information from lung nodules of different sizes. The Capsule Network layer was employed to extract more representative features, contributing to a more accurate classification. With dimensions of 32 × 32 × 8, the 3D CNN-CapsNet achieves a notable nodule detection rate of 95.19%, sensitivity of 92.31%, specificity of 98.08%, and an F1-score of 0.95.

4.1.6. Others

A novel two-stage deep learning framework for cancer risk assessment in CT lung screening was introduced in [73], emphasizing the independence of nodule detection and malignancy assessment as distinct processes. In the first stage, established nodule detectors are used to identify nodules within a scan. In the second stage, a neural network inspired by the ResNet architecture, regularized using dropout, performs a cancer risk assessment for the entire CT lung screening scan. The framework undergoes a thorough large-scale evaluation and comparison with state-of-the-art models across three datasets. The results demonstrate that the proposed method (a) achieves an area under the curve (AUC) between 86% and 94%, with the external test set (LHMC) being at least twice as large as in other works, (b) outperforms the widely accepted PanCan Risk Model, achieving 6% and 9% better AUC scores in two test sets, (c) exhibits improved performance compared to the state-of-the-art represented by the winners of the Kaggle Data Science Bowl 2017 competition on lung cancer screening, and (d) demonstrates comparable performance to radiologists in estimating cancer risk at a patient level. This underscores the potential of the proposed framework to achieve radiologist-level cancer risk assessment in CT lung screening using deep learning.
A groundbreaking approach to lung nodule detection through a 3D sphere representation-based center-point-matching detection network (SCPM-Net) was proposed by X. Luo et al. [75]. To overcome the limitations of current anchor-based detectors, the authors discard pre-determined anchor boxes and instead predict a center-point map directly using a point-matching strategy. The proposed CPM-Net incorporates novel attentive modules, online hard example mining, and refocal loss, addressing the ineffectiveness of anchor-based methods. Notably, this study pioneers the representation of pulmonary nodules as bounding spheres in 3D space, introducing an effective Sphere-based Intersection-over-Union loss function to train CPM-Net and create SCPM-Net. Evaluation on the LUNA16 dataset demonstrates SCPM-Net’s superior performance compared to both anchor-based and existing anchor-free methods for lung nodule detection from 3D CT scans. With an input dimension of 96 × 96 × 96, SCPM-Net achieves an average sensitivity of 89.2% over seven predefined false positive rates per scan, highlighting its efficacy in advancing lung nodule detection capabilities.
Y. Zhu et al. [85] proposed a groundbreaking Multiscale Self-Calibrated Pulmonary Nodule Detection Network featuring a dual attention mechanism. The MSC module, replacing traditional convolution, employs a novel self-calibration operation to establish interchannel and remote spatial dependencies, enhancing the detection sensitivity and expanding the receptive field through multiscale information extraction. In attention mechanism comparisons, the Efficient Pyramid Split Attention (EPSA) module, derived from the PSA module, is introduced, demonstrating enhanced pulmonary nodule detection metrics without additional parameters. Additionally, the Dual-Path Spatial Attention Module (DSAM) fuses information from various receptive fields, maximizing spatial details in CT images, strengthening semantic content in low-level feature maps, and improving location-related information, ultimately enhancing specificity. Operating on 128 × 128 × 128 inputs, the network achieves an impressive sensitivity of 0.988 and a competition performance metric (CPM) of 0.963, highlighting its efficacy in pulmonary nodule detection with heightened sensitivity and overall competitive performance.
Each architecture discussed addresses specific challenges in nodule detection, offering unique advantages and facing distinct limitations. CNNs [60,72] and 3D CNNs [62,67] excel in feature extraction and handling volumetric data, respectively; however, they are computationally demanding. Autoencoders [55,83] provide unsupervised learning capabilities that are useful in data-scarce environments, although they risk overfitting. Transformers [81] capture long-range dependencies and contextual information, enhancing detection accuracy, although they require substantial computational resources and large datasets. Capsule Networks [84] enhance spatial relationship handling at the cost of increased complexity and training time.

4.2. Nodule Segmentation

Nodule segmentation aims to delineate the nodule region within a scan. In this task, a mask is computed around the nodule, separating it from the surrounding tissue. Segmentation of a nodule can be performed in 2D or 3D space. CT scans contain rich volumetric information; therefore, a 3D segmented nodule can provide useful information regarding the shape and morphology of a nodule. Table 6 provides an overview of the nodule segmentation works, and detailed information on preprocessing steps and performance metrics can be found in Appendix A, Tables A3 and A4. In the following table, we report the Dice Similarity Coefficient (DSC), which is a key metric for nodule segmentation because it measures the overlap between the predicted segmentation and the ground truth, ensuring precise delineation of nodule boundaries, which is critical for accurate diagnosis and subsequent treatment planning.
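For reference, the DSC between a predicted mask P and a ground-truth mask G is 2|P ∩ G|/(|P| + |G|); the short snippet below computes it for binary masks and is provided purely for illustration, not as code from any reviewed study.

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """DSC = 2*|P ∩ G| / (|P| + |G|) for binary masks (2D or 3D)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

if __name__ == "__main__":
    pred = np.zeros((64, 64), dtype=bool); pred[20:40, 20:40] = True
    gt = np.zeros((64, 64), dtype=bool);   gt[25:45, 25:45] = True
    print(dice_coefficient(pred, gt))   # 2*225/800 ≈ 0.56 for two partially overlapping squares
```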

4.2.1. Two-Dimensional Convolutional Neural Networks (2D CNN)

2D CNNs have traditionally been used for image segmentation tasks, including nodule segmentation in CT images [93,96]. Although 2D CNNs are less computationally demanding and easier to train compared to their 3D counterparts, they may miss out on the volumetric context provided by 3D data, potentially impacting segmentation accuracy [118].
The Central Focused Convolutional Neural Network (CF-CNN) of S. Wang et al. [93] stands out for its ability to effectively segment lung nodules from heterogeneous CT images. By capturing nodule-sensitive features from both 3D and 2D CT images simultaneously, their model achieved average Dice scores of 82.15% ± 10.76 on the LIDC-IDRI dataset and 80.02% ± 11.09 on an independent private dataset. Notably, their approach excelled in juxta-pleural nodule segmentation, a challenging task, demonstrating a promising average Dice score difference of only 1.98% when compared with inter-radiologist consistency on the LIDC dataset.
Roy et al. [95] introduced a synergistic combination of deep learning and shape-driven level sets for accurate lung nodule segmentation. Their approach involved a deep, fully convolutional network for coarse segmentation and shape-driven level sets for fine segmentation, achieving an impressive average Dice score of 93% ± 0.11 on isolated nodules in the LIDC-IDRI dataset and 90% ± 0.08 on nodules with pleural adhesion.
Singadkar et al. [96] presented a groundbreaking approach to lung nodule segmentation with their novel deep residual deconvolutional network. This end-to-end architecture incorporates multi-level contextual information, automatically learning nodule-sensitive features from 2D CT images to enhance segmentation performance. Despite achieving an impressive DICE score of 94.97% on the LIDC-IDRI dataset, it is crucial to note the lack of cross-validation and detailed preprocessing steps in their study.

4.2.2. Three-Dimensional Convolutional Neural Networks (3D CNN)

3D CNNs extend traditional CNN capabilities by processing volumetric data, which is crucial for capturing the spatial context of nodules [62,108]. The advantage of 3D CNNs lies in their ability to utilize volumetric information for more accurate segmentation [118]. However, similar to the 3D CNNs developed for nodule detection, these models are computationally intensive and require substantial resources for training and inference.
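As a minimal illustration of how volumetric data are processed, the sketch below stacks 3D convolutions into a tiny fully convolutional network that outputs a voxel-wise prediction; the channel counts and input size are arbitrary assumptions, not those of any reviewed model.

```python
import torch
import torch.nn as nn

def conv3d_block(c_in, c_out):
    return nn.Sequential(
        nn.Conv3d(c_in, c_out, kernel_size=3, padding=1),
        nn.BatchNorm3d(c_out),
        nn.ReLU(inplace=True),
    )

class Tiny3DSegNet(nn.Module):
    """Minimal fully convolutional 3D network producing a voxel-wise nodule mask."""
    def __init__(self):
        super().__init__()
        self.down = nn.Sequential(conv3d_block(1, 16), nn.MaxPool3d(2), conv3d_block(16, 32))
        self.up = nn.Sequential(
            nn.ConvTranspose3d(32, 16, kernel_size=2, stride=2),  # restore resolution
            conv3d_block(16, 16),
            nn.Conv3d(16, 1, kernel_size=1),                      # per-voxel logit
        )

    def forward(self, x):                         # x: (B, 1, D, H, W)
        return self.up(self.down(x))

if __name__ == "__main__":
    net = Tiny3DSegNet()
    vol = torch.randn(1, 1, 32, 64, 64)           # stand-in for a resampled CT sub-volume
    print(net(vol).shape)                         # torch.Size([1, 1, 32, 64, 64])
```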
H. Tang et al. [62] proposed a unified model for pulmonary nodule detection, false positive reduction, and segmentation, which improved nodule detection accuracy by 10.27% compared to baseline models. The model achieved a state-of-the-art Dice–Sørensen coefficient (DSC) of 83.10%, emphasizing its robust performance and contributions to enhancing pulmonary nodule detection and segmentation tasks. The shared underlying feature extraction backbone and end-to-end training demonstrated the effectiveness of their comprehensive approach.
Dutande et al. [100] employed a 2D–3D cascaded convolutional neural network (CNN) for comprehensive lung nodule analysis, integrating segmentation, detection, and classification. The SquExUNet segmentation framework, combined with a 2D–3D cascaded CNN approach for detection, demonstrated a sensitivity of 90% and a Dice coefficient of 80%. This study’s holistic strategy showcases the potential of combining multiple tasks within a unified framework.
Kido et al. [108] introduced a Nested Three-Dimensional Fully Connected Convolutional Network for lung nodule segmentation. Leveraging a single encoder, the model achieved promising results with a Dice Similarity Coefficient (DSC) of 0.845 ± 0.007 and an Intersection over Union (IoU) of 0.738 ± 0.011 on a 128 × 128 × 64 input size. The nested structure and encoder–decoder connections through concatenation demonstrated the network’s efficacy.

4.2.3. U-Net

The U-Net architecture has become a cornerstone in medical image segmentation due to its ability to capture both contextual and localization information through its distinctive U-shaped design [46]. This architecture is particularly effective for nodule segmentation, where precise boundary delineation is crucial. For instance, Usman et al. [97] utilized a two-stage Residual U-Net, achieving high DICE scores by leveraging both axial and sagittal views. However, U-Nets can struggle with segmenting fine details and may require extensive data augmentation to enhance robustness. The slice spacing of the CT image is also a factor that can affect the performance of these models.
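Before turning to the individual studies, the following minimal sketch shows the defining ingredient of the U-Net family, namely skip connections that concatenate high-resolution encoder features into the decoder for precise boundary localization. The depth and channel widths are illustrative assumptions only, far smaller than in the reviewed models.

```python
import torch
import torch.nn as nn

def double_conv(c_in, c_out):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True),
    )

class MiniUNet(nn.Module):
    """Two-level U-Net: the skip connection carries high-resolution localization cues."""
    def __init__(self):
        super().__init__()
        self.enc1 = double_conv(1, 16)
        self.enc2 = double_conv(16, 32)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(32, 16, kernel_size=2, stride=2)
        self.dec = double_conv(32, 16)           # 32 = 16 (upsampled) + 16 (skip)
        self.out = nn.Conv2d(16, 1, kernel_size=1)

    def forward(self, x):
        e1 = self.enc1(x)                         # high-resolution features
        e2 = self.enc2(self.pool(e1))             # contextual features at half resolution
        d = self.up(e2)
        d = self.dec(torch.cat([d, e1], dim=1))   # skip connection: concatenate encoder features
        return self.out(d)                        # per-pixel nodule logit

if __name__ == "__main__":
    net = MiniUNet()
    patch = torch.randn(4, 1, 64, 64)             # stand-in for 2D CT patches
    print(net(patch).shape)                       # torch.Size([4, 1, 64, 64])
```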
Usman et al. [97] proposed a semi-automated 3D segmentation method for lung nodules, employing a two-stage process. In the first stage, a 2D ROI containing the nodule was used for patch-wise exploration along the axial axis using an adaptive ROI algorithm. In the second stage, the VOI was further explored along the coronal and sagittal axes using Residual U-Nets. The method achieved promising results with DICE scores of 85.29% ± 9.78 in axial, 84.76% ± 12.45 in coronal, and 83.58% ± 8.93 in sagittal views, averaging 87.5% ± 10.58.
Pezzano et al. [99] took a unique approach by using a U-net-based network to learn the context of nodules through two masks representing background and secondary-important elements in CT scans. The subtraction of these masks successfully extracted the nodule area, yielding an impressive IoU score of 76.6 ± 12.3, comparable to human performance. The masks generated by Pezzano’s network closely resembled those produced by radiologists, as validated by the Kolmogorov-Smirnov test [119].
A novel approach introduced in [101] leverages synthetic CT images with distinctive color patterns to represent evolving nodule features for detection and segmentation. Using a modified U-Net architecture, the method achieved a Dice Similarity Coefficient of 93.14%, a Recall of 91.76%, a Precision of 93.3%, a True Positive Rate of 2.3%, and a false positive rate of 0.21%. This innovative approach enhances the accuracy and efficiency of lung nodule segmentation by effectively incorporating inter-slice information through synthetic images.
Wu et al. [103], utilizing image enhancement and a Dual-Branch U-Net (DB U-Net), demonstrated effective segmentation. The DB U-Net achieved Dice coefficients of 83.16% and 81.97% on the LIDC dataset and an additional dataset, showcasing its accuracy in lung nodule segmentation.
X. Zhang et al. [105] proposed an accurate segmentation approach with an improved U-Net convolutional network incorporating Batch Normalization (BN) for enhanced performance. Operating at a resolution of 32 × 32, the improved U-Net achieved a Dice Similarity Coefficient of 0.8623, demonstrating its accuracy in segmenting various types of lung nodules.
Dodia et al. [102] introduced a novel deep-learning architecture designed for lung cancer nodule detection with a focus on reducing false positives. It incorporates receptive regularization in both the convolution and deconvolution layers of the V-Net model. Nodule classification is carried out using a combination of SqueezeNet and ResNet, termed the nodule classification network (NCNet). Post-processing involves image enhancement of 2D slices through increased intensity using pseudo-color or fluorescence contrast. The RFR V-Net demonstrates impressive segmentation performance, achieving a Dice Similarity Coefficient of 95.01% and an Intersection over Union of 0.83. The NCNet achieves a sensitivity of 98.38% and a low rate of false positives per scan (FPs/Scan) of 2.3 for 3D representations. The proposed approach, which combines RFR V-Net and NCNet, exhibits substantial improvements over existing computer-aided diagnosis (CAD) systems for lung nodule detection.
Two studies, RAD-UNet [113] and SMR-UNet [114], focused on enhancing traditional U-Net models. RAD-UNet incorporated a residual network module and a pyramid pooling module, achieving mIoU values of 87.76% and 88.13% and demonstrating improved lung nodule semantic segmentation performance. SMR-UNet introduced self-attention, multiscale features, and residual structures, achieving a Dice index of 0.9187 and an IoU of 0.8688, surpassing the traditional U-Net model by 4.22% and 4%, respectively.
Evo-GUNet3++ is a novel approach by Ardiemento et al. [117] using evolutionary algorithms to optimize UNet-based architectures for efficient 3D lung cancer detection. Achieving a Dice coefficient of 0.972, sensitivity of 0.977, and positive predictive value (PPV) of 0.923, Evo-GUNet3++ demonstrated effectiveness in accurate semantic segmentation for lung cancer detection, outperforming baseline performances, and showcasing the potential of evolutionary algorithms in UNet-based architecture optimization.

4.2.4. Residual Networks

Residual Networks (ResNets) address the degradation problem in deep neural networks by introducing shortcut connections that allow gradients to bypass one or more layers, improving training efficiency [120]. ResNets are advantageous for training deeper networks without the risk of vanishing gradients, which is crucial for capturing intricate details in medical images. The complexity of these deeper networks can lead to increased computational requirements and longer training times.
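The sketch below shows the basic residual block on which such networks are built, with the identity shortcut added to the output of the convolutional branch so that gradients can bypass it; the channel count is an illustrative assumption.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic residual block: the identity shortcut lets gradients bypass the convolutions."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(x + self.body(x))   # identity shortcut added before the final activation

if __name__ == "__main__":
    block = ResidualBlock(32)
    feats = torch.randn(2, 32, 64, 64)
    print(block(feats).shape)                # torch.Size([2, 32, 64, 64])
```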
H. Liu et al. [94] proposed an innovative approach utilizing a residual block-based dual-path network to extract both local features and rich contextual information from lung nodules. The method takes advantage of multi-view and multiscale features from CT images, showcasing effectiveness in segmenting small and juxta-pleural nodules. The implemented CDP-ResNet achieved an impressive average Dice–Sørensen coefficient of 81.58% ± 11.05 on the LIDC dataset. Notably, the model’s segmentation accuracy was reported to be slightly superior to that of human experts, emphasizing its potential for accurate nodule segmentation.
Building upon H. Liu’s [94] work, H. Cao [98] proposed a model designed to simultaneously capture the multi-view and multiscale features of different nodules in CT images. This model combines intensity and convolutional neural network (CNN) features using a unique pooling method called the central intensity pooling layer (CIP). This layer extracts intensity features from the center voxel of the block, and a CNN is employed to obtain convolutional features from the same center voxel. To address overfitting, the authors introduced a weighted sampling strategy for training sample selection. The method achieved a remarkable Dice Sørensen Coefficient of 82.74% ± 10.19 on the LIDC-IDRI dataset, highlighting its potential for robust lung nodule segmentation.
These studies collectively demonstrate the effectiveness of residual networks in lung nodule segmentation, showcasing advancements in leveraging multi-view and multiscale features for improved accuracy and performance.

4.2.5. Generative Adversarial Networks (GANs)

Generative Adversarial Networks (GANs) [40] are utilized for generating high-quality synthetic data, such as medical images [121], but they have applications in other domains, such as security [122]. These synthetic data can augment the training datasets and improve the segmentation accuracy. For instance, GANs have been used to generate realistic CT scans with nodule annotations, enhancing the training of segmentation models [123]. They are particularly useful in addressing the data scarcity problem, which is a common issue in medical imaging. However, GANs suffer from training challenges, such as mode collapse, non-convergence, and instability problems [124]. With these limitations, GANs can generate unrealistic, blurry, and less diverse images. The mode collapse problem occurs when the generator produces similar output images while taking different input features. These issues necessitate careful tuning of the generator and discriminator networks to achieve a stable equilibrium.
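The adversarial training principle can be summarized by the toy loop below, in which a generator learns to produce patches that a discriminator cannot distinguish from real ones. The tiny fully connected networks and the random stand-in data are illustrative assumptions, far simpler than the GANs used in the reviewed studies.

```python
import torch
import torch.nn as nn

# Illustrative generator/discriminator for 32 x 32 synthetic nodule patches.
G = nn.Sequential(
    nn.Linear(100, 256), nn.ReLU(inplace=True),
    nn.Linear(256, 32 * 32), nn.Tanh(),           # fake patch with values in [-1, 1]
)
D = nn.Sequential(
    nn.Linear(32 * 32, 256), nn.LeakyReLU(0.2, inplace=True),
    nn.Linear(256, 1),                             # real/fake logit
)

bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

real_patches = torch.rand(64, 32 * 32) * 2 - 1     # stand-in for real nodule patches

for step in range(100):                            # toy training loop
    # discriminator step: push real patches toward label 1, generated patches toward 0
    fake = G(torch.randn(64, 100)).detach()
    d_loss = bce(D(real_patches), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # generator step: try to fool the discriminator into predicting "real"
    fake = G(torch.randn(64, 100))
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

print(float(d_loss), float(g_loss))
```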
This work [109] introduced CSE-GAN, a 3D conditional generative adversarial network tailored specifically for lung nodule segmentation. In this novel approach, the generator functions as a segmentation network that is responsible for producing accurate segmentation mask images. Simultaneously, the discriminator operates as a classification network, distinguishing between authentic and generated segmented nodule masks. This adversarial setup encourages the generator to improve its segmentation capabilities continually. A key innovation in CSE-GAN lies in the integration of a concurrent spatial and channel squeeze and excitation module within both the generator and discriminator. This module enhances the segmentation performance by enabling the network to focus on critical spatial and channel-wise features during the segmentation process (Figure 11). This dual application of the squeeze and excitation module contributes to improving classification accuracy as well. To validate the proposed model, an Indian lung nodule dataset collected from a local hospital is introduced. This dataset serves both validation and generalizability testing, providing a diverse set of cases from which the model can learn. CSE-GAN demonstrates noteworthy results on two distinct datasets: the LUNA test set and the local dataset. Specifically, the model achieves Dice coefficients of 80.74% and 76.36% on these datasets, indicating a high level of accuracy in lung nodule segmentation. Additionally, sensitivities of 85.46% and 82.56% further underscore the effectiveness of CSE-GAN in accurately identifying and segmenting lung nodules.

4.2.6. Transformers

Transformers have shown significant potential for handling complex dependencies and contextual information in image segmentation tasks [111,112,115]. The adaptation of transformers to image processing, particularly for nodule segmentation, allows for capturing long-range dependencies and contextual information, enhancing segmentation accuracy. However, similar to transformers developed for nodule detection, they are resource-intensive and often require large datasets.
DPBET, a unique Cascade-Axial-Prune transformer (CAP-Trans) model for lung nodule segmentation, was introduced by S. Wang et al. [111]. DPBET utilizes a hybrid CNN-Transformer architecture to maximize the benefits of both convolutional and transformer blocks, allowing for the effective capture of local details and global semantic representations. To enhance the boundary information, the authors employ edge detection operators, constructing boundary enhancement datasets for the edge path to provide additional edge prior knowledge. The encoding method, termed “Down-Attention Sample (DASample)”, is introduced in the edge path, leveraging channel and spatial attention to increase the perceptual field for multiscale lung nodule information. Evaluated on a 64 × 64 dataset, the CAP-Trans model achieves impressive segmentation results with a Dice Similarity Coefficient (DSC) of 89.86% and an average sensitivity of 90.50%. This highlights the model’s effectiveness in accurately delineating lung nodules and showcases its potential for advancing lung nodule segmentation tasks by incorporating a hybrid CNN-Transformer architecture and innovative boundary enhancement strategies.
X. Li et al. [115] introduced TPFR-Net, a U-shaped model designed for lung nodule segmentation, specifically addressing challenges related to global dependencies, computation efficiency, and feature loss during up-sampling. TPFR-Net employs a unique encoder that integrates convolution and transformer modules for both local and global feature extraction. This allows for multiscale and channel attention convolutions to be incorporated effectively. To manage the computational demands of the transformer’s self-attention layer, the authors ingeniously fuse it with the pooling layer. Additionally, a novel feature reorganization strategy utilizing dual attention mechanisms is implemented to maximize feature retention during up-sampling. The loss function is enhanced by combining Binary Cross-Entropy (BCE), Dice, and Hausdorff distance (HD Loss) to give due emphasis to segmentation boundaries. The deep supervision strategy further ensures the authenticity and reliability of the image information post-feature reorganization. Evaluated on datasets of varying dimensions (64 × 64, 96 × 96, and 128 × 128), TPFR-Net showcases remarkable results with a Dice Similarity Coefficient of 91.84% and a sensitivity of 92.66%, highlighting its effectiveness in lung nodule segmentation. This model proves to be a promising advancement in addressing the challenges associated with global dependencies and feature loss during up-sampling in lung nodule segmentation tasks.
The DEHA-Net [112] introduced a novel lung nodule segmentation framework incorporating a dual-encoder-based hard attention network and an adaptive ROI mechanism. The framework operates in two stages: firstly, a 2D ROI is generated along the axial axis using DEHA-Net, leveraging the initial input from a radiologist or CADe system. An adaptive ROI algorithm then extends the generation of ROIs to the surrounding slices, facilitating 3D mask reconstruction. In the second stage, ROIs are generated along sagittal and coronal views using the 3D mask obtained in the first stage, with DEHA-Net applied for segmentation along these views. The final 3D segmentation mask is produced by a consensus module. Notably, this pipeline avoids resizing and mitigates the issues associated with input and output rescaling. The dual-encoder-based CNN achieves promising results with a Dice Similarity Coefficient of 87.91%, sensitivity of 90.84%, and positive predictive value of 89.56%, demonstrating the effectiveness of DEHA-Net for accurate lung nodule segmentation (Figure 12).

4.2.7. Others

S. Luo et al. [110] introduced DAS-Net, a lung nodule segmentation method that leverages an adaptive dual attention module and a novel 3D shadow mapping layer. The adaptive dual attention module enhances the model’s ability to perceive detailed information on the 3D nodule surface, improving segmentation accuracy. A 3D shadow mapping layer was introduced to construct the basic structure of the network, ensuring feature-rich extraction with a reduced number of parameters and computational efficiency. DAS-Net, with a dimension of 16 × 128 × 128, outperforms state-of-the-art methods in lung nodule segmentation tasks. The evaluation metrics demonstrate its superiority, with a Dice score of 92.05% ± 3.08, sensitivity of 90.81% ± 6.35, and Hausdorff distance of 3.93 ± 1.87 mm, showcasing the effectiveness of DAS-Net in accurate 3D lung nodule segmentation with a compact parameter configuration.
Qiu et al. [116] proposed a novel Dual-Task Region-Boundary Aware Neural Network for pulmonary nodule segmentation that incorporates a hierarchical feature module for capturing multiscale information, a boundary-guided module to model boundaries explicitly, and a feature aggregation module for fusing boundary and multiscale features. The proposed region-boundary aware loss function enhances the relationship between regions and boundaries, resulting in improved segmentation performance. The method achieves superior segmentation results on the LIDC-IDRI and LUNA16 datasets, particularly for small and non-solid nodules. Utilizing a 3D U-Net architecture with a 64 × 64 × 32 input size, the proposed method demonstrates promising performance metrics, including a Dice Similarity Coefficient of 82.48%, an Intersection over Union of 70.86%, a sensitivity of 82.74%, a Precision of 84.10%, and an Average Surface Distance of 0.310 mm, highlighting its suitability for accurate pulmonary nodule segmentation.
We reported various architectures for lung nodule segmentation, each addressing specific challenges with unique strengths and limitations. 2D CNNs, such as those in [93,95], are effective for image segmentation and are less computationally demanding but lack volumetric context. In contrast, 3D CNNs [62,108] utilize volumetric data for better spatial accuracy, although they require substantial computational resources. The U-Net architecture [97] excels in capturing contextual and localization information with its distinctive U-shaped design, but struggles with fine details, often necessitating extensive data augmentation. Residual Networks (ResNets) [94] mitigate degradation in deep networks, improving training efficiency and capturing intricate details, albeit with increased computational demands. Generative Adversarial Networks (GANs) [109] generate high-quality synthetic data to augment training datasets, addressing data scarcity, but face training challenges like mode collapse and instability. Transformers [111] handle complex dependencies and contextual information, enhancing segmentation accuracy, although they are resource-intensive and require large datasets. Additionally, novel architectures like DAS-Net [110] and DEHA-Net [112] incorporate dual attention modules and adaptive ROI mechanisms, demonstrating superior performance and computational efficiency in 3D lung nodule segmentation tasks. Collectively, these advancements highlight significant progress in lung nodule segmentation by leveraging diverse methodologies to improve accuracy and performance.

4.3. Nodule Classification

A pulmonary nodule classification model operates to distinguish between benign and malignant nodules by analyzing their features. From our literature review, we identified works that used single-view input, where each image is analyzed independently; multi-view inputs, where multiple perspectives of the same nodule are considered together; and 3D inputs that capture volumetric information, providing a more comprehensive analysis of the nodule by considering its structure in three dimensions. Classification works follow preprocessing and augmentation steps similar to those outlined above for the segmentation works. Table 7 provides an overview of the nodule classification works, while detailed information on preprocessing steps and performance metrics can be found in Appendix A, Tables A5 and A6, respectively. In the following table, we report Accuracy, which is a key metric for nodule classification because it directly reflects the model’s ability to correctly identify both benign and malignant nodules, which is crucial for effective diagnosis and treatment planning.

4.3.1. Single-View

Single-view classifiers focus on analyzing individual slices or views from 3D medical images, thereby simplifying data and computational requirements. These classifiers are advantageous due to their lower computational cost and faster processing times, making them suitable for quick diagnostic applications. However, they inherently lack the ability to capture the full spatial context of nodules, potentially leading to less accurate classification. For instance, single-view approaches may miss subtle 3D characteristics of nodules that are crucial for accurate diagnosis.
G. L. F. da Silva et al. [131] employed a convolutional neural network (CNN) alongside Particle Swarm Optimization (PSO) to optimize hyperparameters, achieving a remarkable accuracy of 97.62%, a sensitivity of 92.20%, a specificity of 98.21%, and an AUC of 0.955 on the LIDC-IDRI database.
S. Zhang et al. [169] implemented the transfer learning technique on the LeNet-5 model to classify pulmonary nodules of thoracic CT images, including benign and malignant pulmonary nodules and different malignancies of the malignant nodules. Their method was evaluated with 10-fold cross-validation and achieved an accuracy of 97.041% and an AUC of 0.977 in malignant-nodule–non-nodule classification. They achieved an accuracy of 96.685% and an AUC of 0.979 for the classification of Serious-Malignant and Mild-Malignant tumors in the LIDC-IDRI database.
Bhandary et al. [138] explored a Modified AlexNet (MAN) architecture with Ensemble-Feature-Technique (EFT), showcasing the model’s superiority with a classification accuracy of 97.27%, sensitivity of 98.09%, specificity of 95.63%, and an AUC of 0.995.
Tran et al. [134] introduced a novel 15-layer 2D CNN, incorporating focal loss, resulting in a highly accurate model with 97.2% accuracy, 96.0% sensitivity, 97.3% specificity, and an AUC of 0.982 on the LUNA16 challenge database. Al-Shabi M. [52] proposed Gated-Dilated (GD) networks, achieving an accuracy of 92.57%, a sensitivity of 92.67%, and an AUC of 0.9514, outperforming traditional CNNs. Suresh and Mohan [139] developed a CNN with Generative Adversarial Networks (GAN), achieving a classification accuracy of 93.9%, specificity of 93%, sensitivity of 93.4%, and AUC of 0.934, demonstrating the impact of GAN-generated synthetic samples.
Huang et al.’s [170] novel approach combined a Deep Transfer Learning CNN (DTCNN) and an Extreme Learning Machine (ELM), achieving 94.57% accuracy, 93.69% sensitivity, and 95.15% specificity on LIDC-IDRI. Zhao X. [136] explored modified CNN strategies with transfer learning, attaining an AUC of 0.94, a sensitivity of 91%, and an overall accuracy of 88%. Ali I. [141] introduced transferable texture CNN networks, achieving high performance on the LIDC-IDRI dataset and demonstrating the efficacy of transfer learning on smaller datasets. Naik et al. [145] combined FractalNet and CNN, yielding a model with high sensitivity (97.52%) and AUC (0.98), addressing the overfitting problem with drop-path in the FractalNet architecture.
LDNNET, proposed by Chen et al. [148] for robust classification, exhibited excellent performance without extensive preprocessing, emphasizing its versatility and robustness. The reported metrics for a voxel resolution of 80 × 80 pixels include a sensitivity of 0.982072, an accuracy of 0.988396, and a specificity of 0.994584. A Bilinear convolutional neural network (BCNN) [153] introduced for lung nodule classification achieved an accuracy rate of 91.99% and an AUC rate of 95.9%, highlighting the efficacy of the proposed architecture.
Additionally, transfer learning techniques for the automatic characterization of Solitary Pulmonary Nodules (SPNs) by Apostolopoulos, Pintelas et al. [155] demonstrated a peak accuracy of 94%, addressing the challenges of small datasets and showcasing the benefits of transfer learning and data augmentation. Zhai et al. [144] implemented a 2-D Multi-Task-CNN model with nine views, each having a nodule classification branch and an image reconstruction branch. The weighted fusion of the predictions from these models resulted in an AUC of 0.9559 for the LIDC-IDRI dataset and 0.973 for the LUNA16 challenge dataset. However, the sensitivity performance of 87.74% on LIDC-IDRI and 84.00% on the LUNA16 challenge dataset was reported to be lower than that of other classification approaches, possibly due to limited information communication between different views before fusion, leading to the potential loss of rich contextual information and features.
In another work by Suresh and Mohan [160], NROI-based feature learning using a DCNN demonstrated outstanding performance with a classification accuracy of 97.8%, specificity of 97.2%, sensitivity of 97.1%, and an AUC of 0.9956. Another study [161] proposed Residual-Transformer networks, showcasing improved performance (AUC of 0.9628) and overall effectiveness in classifying lung nodules.
Qiao et al. [167] explored computational intelligence techniques, achieving improvements in accuracy, precision, sensitivity, and specificity using a U-Net-based network. Their ensemble framework, F-LSTM-CNN, proposed for benign–malignant classification, integrated nodule attributes and images and achieved an accuracy of 0.955, a sensitivity of 1.0, a specificity of 0.937, and an AUC of 0.995.
These advancements collectively underscore the evolving landscape of deep learning in lung nodule classification, emphasizing improved model architectures, optimization strategies, and computational intelligence integration for enhanced clinical accuracy and reliability.

4.3.2. Multi-View

Multi-view classifiers address the limitations of single-view models by incorporating multiple perspectives of the nodule, often from different angles or slices. This approach enhances the model’s ability to capture more comprehensive spatial features, leading to improved classification accuracy. Multi-view classifiers can integrate features from various views to make a more informed decision. However, this increased complexity demands more computational power and sophisticated data-handling techniques. The benefit of improved accuracy must be weighed against higher resource requirements.
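A common way to realize this is to apply a shared 2D backbone to each planar view and fuse the resulting feature vectors before classification, as in the illustrative sketch below; the backbone, number of views, and feature sizes are assumptions, not those of any specific reviewed model.

```python
import torch
import torch.nn as nn

class MultiViewClassifier(nn.Module):
    """Shared 2D backbone applied to several planar views, fused before classification."""
    def __init__(self, n_views: int = 3, feat_dim: int = 64):
        super().__init__()
        self.backbone = nn.Sequential(                 # one backbone shared across views
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        self.classifier = nn.Linear(n_views * feat_dim, 2)   # benign vs. malignant logits

    def forward(self, views):                          # views: (B, n_views, 1, H, W)
        feats = [self.backbone(views[:, i]) for i in range(views.shape[1])]
        return self.classifier(torch.cat(feats, dim=1))      # late fusion of per-view features

if __name__ == "__main__":
    model = MultiViewClassifier()
    views = torch.randn(4, 3, 1, 64, 64)   # e.g. axial, coronal, and sagittal crops
    print(model(views).shape)              # torch.Size([4, 2])
```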
Nibali et al. [125] adopted a three-column ResNet architecture to assess the impact of curriculum learning, transfer learning, and varying network depth on malignancy classification. By using three 2D planar views instead of a full 3D volume, their model achieved a sensitivity of 91.07%, specificity of 88.64%, AUC of 0.9459, and accuracy of 89.90%, showcasing computational efficiency compared to other multi-view approaches.
Kang et al. [126] experimented with 3D multi-view convolutional neural networks (MV-CNN) for lung nodule classification, exploring binary and ternary classifications. Achieving a lower error rate and higher performance metrics (Se: 95.60%, Sp: 93.94%, AUC: 0.99) than similar works, this study highlighted the efficacy of 3D MV-CNN in diverse classification tasks.
Meanwhile, in another work, Xie et al. [127] proposed the Transferable Multi-Model Ensemble (TMME) algorithm, leveraging pre-trained ResNet-50 models for nodule characterization. In experiments on the LIDC-IDRI database, TMME achieved a classification accuracy of 93.4%, sensitivity of 91.43%, and specificity of 94.09%, underscoring the effectiveness of transfer learning for accurate nodule classification.
On a different front, Xie Y. et al. [128] introduced a multi-view knowledge-based collaborative (MV-KBC) deep neural network model, utilizing nine views for nodule classification. Despite competitive performance metrics with an accuracy of 91.60%, specificity of 94%, and AUC of 95.70%, the model’s drawback lay in its large size (nine sub-models) and significant computational requirements.
This retrospective study [147] developed a deep learning (DL) algorithm for lung nodule classification, showcasing superior performance with an AUC of 0.93 and clinical relevance compared to existing models.
This work [149] introduced a novel model architecture, MV-CRecNet, which leverages Multi-view Convolutional Recurrent Neural Networks (MV-CNN). The comparative performance of MV-CRecNet is assessed alongside various other model architectures, employing 2D MV-CNN and 3D MV-CNN with different input sizes (20 × 20), (30 × 30), and (40 × 40). The reported metrics for a voxel resolution of (20 × 20) include an accuracy of 0.97, sensitivity of 0.98, specificity of 0.97, and an AUC of 0.99. These results underscore the effectiveness of the proposed MV-CRecNet in pulmonary nodule identification, showcasing superior performance in terms of accuracy, sensitivity, specificity, and AUC compared to alternative model architectures.

4.3.3. Three-Dimensional Classifiers

Three-dimensional classifiers process entire 3D patches of CT scans, enabling the capture of detailed spatial relationships and volumetric features of the nodules. This holistic approach significantly enhances classification performance, as evidenced by models like [130,157]. The main drawback of 3D classifiers is their high computational demand, which necessitates extensive resources for both training and inference. This can restrict their practical use in settings with limited computational capabilities. However, the significant benefit of using the entire nodule for prediction, rather than a single view, potentially outweighs this limitation, as radiologists review multiple slices in practice.
Several 3D classifiers for lung nodule classification have been developed over the past few years. For example, Causey J. [129] proposed a predictive model employing a deep learning CNN (Figure 13). Despite achieving a high sensitivity of 94.80%, specificity of 94.30%, accuracy of 94.60%, and an AUC of 0.984, the model’s major limitation lies in its reliance on manual specification of the nodule region of interest, making it less automated. Dey et al. [130] evaluated multiple 3D CNNs, introducing a 3D multi-output DenseNet (MoDenseNet) with transfer learning. The model achieved an accuracy of 90.40%, specificity of 90.33%, sensitivity of 90.47%, and AUC of 0.948, showcasing robust performance, especially with transfer learning on a challenging private dataset.
Xu et al. [140] developed lightweight 3D CNN models, demonstrating that shallower networks outperformed deeper ones. Despite achieving a high accuracy of 92.65% and a specificity of 95.87%, the sensitivity of 85.58% was relatively low. Ren Y. [142] proposed a novel manifold regularized classification deep neural network (MRC-DNN), achieving a classification accuracy of 0.90 on the LIDC-IDRI validation set, with sensitivity and specificity of 0.81 and 0.95, respectively. Jung H. [132] introduced a 3D DCNN with shortcut and dense layer connections and achieved a CPM score of 0.910, with a high sensitivity of 95.4% and a notably low false positive rate.
This study [157] proposed a lung nodule classification algorithm that integrates CT imagery with biomarker annotation and volumetric radiomic features. The algorithm employs a 3D convolutional neural network (CNN) and a Random Forest. The performance is analyzed and compared across different combinations: using only imagery, only biomarkers, combined imagery + biomarkers, combined imagery + volumetric radiomic features, and the combination of imagery + biomarkers + volumetric features. The reported results for a voxel resolution of 32 pixels × 32 pixels × 16 slices include an ROC AUC of 0.8674. This comprehensive approach leverages multiple data modalities to enhance the classification of lung nodule malignancy suspicion levels.
The Multi-View Coupled Self-Attention Network (MVCS) [164] addressed depth dimension relations and achieved notable performance metrics on a 64 × 64 × 32 voxel resolution, including an accuracy of 91.25%, sensitivity of 89.10%, specificity of 93.39%, precision of 91.59%, AUC of 91.25%, and F1-score of 90.19%. This study [154] focused on the early detection of lung cancer and utilized a 3D convolutional neural network (CNN) with a Multiview-one-network strategy, achieving an AUC of 0.97, accuracy of 97.17%, F-score of 0.92, precision of 0.87, and recall of 0.94. These advancements collectively contribute to the evolving landscape of 3D lung nodule classification, showcasing innovative approaches and promising outcomes in terms of accuracy and clinical applicability.
In addition, several other innovative approaches have been introduced. The Self-supervised Transfer Learning Framework driven by Visual Attention (STLF-VA) [165] demonstrated competitive performance with an accuracy of 92.36%, sensitivity of 91.62%, specificity of 93.08%, AUC of 97.17%, and precision of 92.99%. The ProCAN network [159], incorporating non-local network enhancement and curriculum learning, outperformed state-of-the-art methods with an AUC of 98.05% and an accuracy of 95.28%.

4.3.4. Autoencoders

Autoencoders are utilized for their ability to learn efficient representations of input data, which can be leveraged for nodule classification tasks. By reducing the dimensionality of the data and extracting salient features, autoencoders can enhance the performance of classifiers. However, as described in previous sections, autoencoders can suffer from overfitting, particularly when trained on limited data, and require careful tuning to ensure robust performance [87].
Silva F. [143] proposed a transfer learning approach utilizing a Convolutional Autoencoder (CAE) as a feature extractor, followed by a Multi-layer Perceptron classifier. The model achieved a sensitivity of 78.9% and an AUC of 0.928 when trained from scratch, while with transfer learning its performance improved to a sensitivity of 84.8% and an AUC of 0.936 on the LIDC-IDRI dataset. Although these metrics are lower than those in other studies, the approach showcases the potential of leveraging pre-trained autoencoder features for malignancy binary classification.
Another study [168] introduced an unsupervised feature extraction method employing multiple convolutional autoencoders for various 2.5-dimensional medical images, achieving high performance in detecting cerebral aneurysms and lung nodules with AUCs exceeding 0.96. This highlights the versatility and effectiveness of autoencoders for unsupervised feature extraction in different medical imaging applications.
In the CKAK pipeline [166], a method aimed at reliable lung nodule diagnosis, the authors effectively fused clinical and AI knowledge at both the feature and decision-making levels. The proposed scale-aware feature extraction block (SAFE) integrates multiscale contextual features using a lightweight Transformer, demonstrating superior accuracy in benign-malignant classification (92.82%), AUC (97.40%), sensitivity (87.69%), and specificity (95.38%) at a voxel resolution of 80 × 80 × 60. The CKAK pipeline showcases the potential of combining clinical expertise with AI knowledge for robust lung nodule diagnosis, meeting clinical requirements for a reliable computer-aided diagnosis (CAD) system.
Mao et al. [133] proposed a novel lung nodule classification model using deep autoencoders for feature representation. They segmented lung nodule images into local patches with Superpixel, transformed these into fixed-length vectors using deep autoencoders, and constructed a global representation using the Bag of Visual Words (BOVW). The model was evaluated on the ELCAP dataset, achieving a high classification rate of 0.939. This method outperformed other techniques by effectively combining local and global feature representations and enhancing model generalization through unsupervised learning.

4.3.5. Multi-Task Learning

Multi-task learning models are designed to perform multiple related tasks simultaneously, such as detection, segmentation, and classification of nodules. This approach can improve the overall performance of each task by leveraging shared features and representations. The main challenge with multi-task learning is the increased complexity of model design and training, which can require more sophisticated strategies to balance the learning of each task effectively.
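A typical realization is a shared encoder feeding several task-specific heads trained with a weighted sum of the task losses, as in the illustrative sketch below; the layer sizes and loss weighting are assumptions rather than the configuration of any reviewed study.

```python
import torch
import torch.nn as nn

class MultiTaskNoduleNet(nn.Module):
    """Shared encoder with a segmentation head and a malignancy classification head."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.seg_head = nn.Conv2d(32, 1, kernel_size=1)           # per-pixel mask logit
        self.cls_head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                      nn.Linear(32, 2))           # benign/malignant logits

    def forward(self, x):
        feats = self.encoder(x)                                   # features shared by both tasks
        return self.seg_head(feats), self.cls_head(feats)

if __name__ == "__main__":
    net = MultiTaskNoduleNet()
    patch = torch.randn(4, 1, 64, 64)                             # stand-in for CT patches
    mask_logits, cls_logits = net(patch)
    # joint objective: weighted sum of the two task losses over the shared features
    seg_target = torch.rand(4, 1, 64, 64).round()                 # dummy binary masks
    cls_target = torch.randint(0, 2, (4,))                        # dummy benign/malignant labels
    loss = (nn.functional.binary_cross_entropy_with_logits(mask_logits, seg_target)
            + 0.5 * nn.functional.cross_entropy(cls_logits, cls_target))
    print(mask_logits.shape, cls_logits.shape, float(loss))
```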
Zhai P. [144] extracted nine 2-D views from a candidate nodule cube from different projection angles. Then, for each view, they constructed a 2-D Multi-Task-CNN model, which consisted of a nodule classification branch and an image reconstruction branch. Finally, they obtained the classification results by weighted fusion of the prediction results of the nine 2-D MT-CNN models. The implementation of transfer learning improved their metrics slightly, with an AUC of 0.9559 on the LIDC-IDRI dataset and 0.973 on the LUNA16 challenge dataset, but the reported sensitivity performance of 87.74% on LIDC-IDRI and 84.00% on the LUNA16 challenge dataset is lower than that of other classification approaches. The authors state that there is no information communication between different views before fusion; thus, rich contextual information and features are probably lost.
Zia M. [146] addressed intra-class variation and inter-class similarity using a multi-deep model (MD model) for lung nodule classification. The model includes multiscale dilated convolutional blocks, dual deep convolutional neural networks, and multi-task learning components. With reported performance metrics of almost 91% sensitivity, specificity, and accuracy, this approach aims to overcome the challenges related to diverse imaging modalities.
Fu et al. [162] introduced a deep learning-based multi-task learning (MTL) model with attention modules for analyzing lung nodule attributes in CT images. This model, which processes entire image volumes and simultaneously scores multiple nodule attributes, achieved notable performance on the LIDC-IDRI dataset with an accuracy of 94.7%, a sensitivity of 96.2%, a specificity of 82.9%, a precision of 97.8%, and an AUC of 95.9%. The incorporation of attention modules enhances interpretability and clinical relevance.

4.3.6. Transformers

Transformers are highly effective in capturing complex relationships within the data and enhancing classification accuracy, as the self-attention mechanism allows them to capture long-range dependencies and contextual information. However, as mentioned in the previous sections, they are resource-intensive and often require large datasets and substantial computational power, which can be a limitation in data-scarce environments.
The Multi-Granularity Dilated Transformer (MGDFormer) [163] model aims to learn pixel-wise global attention for robust long-range global representation and employs a Local Focus Scheme (LFS) to enhance the focus on local discriminative features. With competitive performance metrics, including an AUC of 98.5%, accuracy of 96.1%, precision of 95.9%, sensitivity of 94.4%, and F1-score of 95.2%, MGDFormer demonstrates effectiveness in addressing challenges related to local and global information, making it a promising model for lung nodule classification.

4.3.7. Capsule Networks

Capsule networks (CapsNets) are employed for the first time as individual experts within a mixture of experts (MoE) framework in the proposed MIXCAPS [150] model for lung nodule malignancy prediction. This study reveals how CapsNet can be considered an MoE framework, making MIXCAPS a hierarchical MoE technique. The output of the gating model is explored for potential correlations with hand-crafted nodule features, enhancing the interpretability of MIXCAPS (Figure 14). The analysis demonstrates that individual CapsNet experts specialize in different subsets of the dataset, and the gating model determines their contributions, showcasing how experts’ activations change with different data subsets. MIXCAPS achieves robust generalizability, as illustrated through the extension and evaluation of a separate dataset associated with a different prediction task. The reported metrics for an input size of 80 × 80 × 3 slices include a sensitivity of 89.5%, specificity of 93.4%, accuracy of 90.7%, and an AUC of 0.956. These findings highlight the effectiveness of the proposed MIXCAPS model for lung nodule malignancy prediction and its potential for interpretability and generalization.
Afshar et al. [137] followed a Capsule Network design that is capable of dealing with a small number of training samples. They proposed three independent capsule networks that used 3D nodule crops as inputs. Each CapsNet has a different input scale of 80 px × 80 px × 3 slices (+10 px for the next two scales). The output vectors are masked and concatenated into a single vector. This vector goes through a fusion module consisting of a set of fully connected layers to form the probability associated with each class (benign or malignant). This method achieved highly accurate 3D classification with a sensitivity of 94.94%, specificity of 90%, accuracy of 93.12%, and AUC of 0.964 on the LIDC-IDRI dataset. The multiscale approach provided more context for the end decision.

4.3.8. Others

DC-GAN [152] proves effective in generating realistic SPNs, increasing FF-VGG19’s classification accuracy by 7% (to 92.07%) on the LIDC-IDRI dataset and by 5% (to 84.3%) on the CT dataset. The reported metrics, with a voxel resolution of 32 × 32, include an accuracy (ACC) of 92.1%, sensitivity (SEN) of 89.3%, specificity (SPE) of 94.8%, and an area under the curve (AUC) of 92.1%. These results underscore the efficacy of the proposed methodology, which combines DC-GAN for realistic nodule generation and FF-VGG19 for improved classification accuracy in lung nodule malignancy assessment.
A groundbreaking study [151] introduced a novel approach utilizing neural architecture search (NAS) for the automatic exploration of 3D network architectures. This method achieves an exceptional accuracy/speed trade-off by integrating the convolutional block attention module (CBAM) into networks to enhance the reasoning process. The use of the A-Softmax loss function during training fosters learning of angularly discriminative representations. Notably, this study marks the first attempt to use NAS for pulmonary nodule classification. The model’s reasoning process aligns with physicians’ diagnoses, contributing to explainability, and the proposed ensemble strategy achieves high comparability with previous state-of-the-art methods while utilizing significantly fewer parameters. With reported metrics for a voxel resolution of 32 × 32 × 32, including an accuracy of 90.77%, sensitivity of 85.37%, specificity of 95.04%, and an F1 score of 89.29, these results underscore the efficacy of the proposed 3D NAS method, CBAM module, A-Softmax loss, and ensemble strategy for achieving efficient, explainable, and discriminative representations in pulmonary nodule classification.
Another innovative study by Xia et al. [156] adopted a multi-step approach for pulmonary nodule classification. The MIXUP method is employed to construct virtual training data for enhanced data diversity, followed by the use of a 3D dual-path network (3D DPN) to extract nodule features. The Gradient Boosting Machine (GBM) algorithm is then applied to differentiate pulmonary nodules using both deep features and raw nodule pixels. Spatial and contextual features are captured using the RAN and SE modules, with a novel multiscale attention module proposed to capture multiscale attentive features. This comprehensive approach, incorporating various techniques such as data augmentation, advanced network architectures, and attention mechanisms, yields robust pulmonary nodule classification with reported results, including an accuracy of 91.9%, sensitivity of 91.3%, a false positive rate of 8.0%, and an F1-score of 91.0%.
Additionally, Al-Shabi et al. [158] introduced a novel approach, 3D Axial-Attention, for lung nodule classification, aiming to improve efficiency compared to regular non-local networks. The 3D Axial-Attention operates on each axis independently, requiring less computing power. To address the position-invariance problem, 3D positional encoding is added to the shared embeddings. The proposed method outperforms state-of-the-art approaches on the LIDC-IDRI dataset, achieving an AUC of 96.17%, an accuracy of 92.81%, a precision of 92.59%, and a sensitivity of 92.36%. This highlights the effectiveness of the 3D Axial-Attention approach in enhancing lung nodule classification performance.
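A minimal sketch of attention applied independently along one axis of a 3D feature map is shown below; the learned positional encoding, head count, and reshaping strategy are assumptions for illustration, not the exact 3D Axial-Attention implementation of [158].

```python
import torch
import torch.nn as nn

class AxialAttention3D(nn.Module):
    """Sketch: self-attention along a single spatial axis of a (B, C, D, H, W) volume."""
    def __init__(self, channels: int, axis_len: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        # learned positional encoding along the chosen axis (restores positional information)
        self.pos = nn.Parameter(torch.zeros(1, axis_len, channels))

    def forward(self, x: torch.Tensor, axis: int) -> torch.Tensor:
        b, c = x.shape[:2]
        # move the chosen axis (2, 3, or 4) last; the other two spatial axes join the batch
        perm = [0, 1] + [a for a in (2, 3, 4) if a != axis] + [axis]
        xp = x.permute(*perm)                              # (B, C, S1, S2, L)
        s1, s2, L = xp.shape[2], xp.shape[3], xp.shape[4]
        seq = xp.reshape(b, c, s1 * s2, L).permute(0, 2, 3, 1).reshape(b * s1 * s2, L, c)
        seq = seq + self.pos[:, :L]
        out, _ = self.attn(seq, seq, seq)                  # attention restricted to one axis
        out = out.reshape(b, s1 * s2, L, c).permute(0, 3, 1, 2).reshape(b, c, s1, s2, L)
        inv = [0] * 5                                      # invert the permutation
        for i, p in enumerate(perm):
            inv[p] = i
        return out.permute(*inv)
```

Applying the module once per axis approximates full 3D self-attention while keeping the sequence length, and hence the cost, proportional to a single dimension rather than the whole volume.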
In this section, various approaches to nodule classification are discussed, each with distinct strengths and limitations. Single-view classifiers, which analyze individual slices of 3D images, are computationally efficient and quick but may miss the crucial spatial context, leading to less accurate classification. These models, such as the CNNs optimized by da Silva et al. [131] and Zhang et al. [169], achieve high accuracy and AUC, but often fall short in capturing subtle 3D characteristics. Multi-view classifiers address these limitations by incorporating multiple perspectives, enhancing spatial feature capture, and improving classification accuracy. Notable examples include Nibali et al.’s [125] three-column ResNet and Kang et al.’s [126] 3D multi-view CNN, both demonstrating significant improvements in sensitivity and specificity but requiring higher computational power. Three-dimensional classifiers process entire 3D patches, capturing detailed spatial relationships and volumetric features, as seen in the works by Causey et al. [129] and Dey et al. [130], which achieve robust performance metrics but are computationally demanding. Autoencoders offer efficient data representation and enhance classifier performance, but they risk overfitting and require careful tuning. For instance, Silva et al.’s [143] transfer learning approach with convolutional autoencoders shows promise in leveraging pre-trained features for classification.
Multi-task learning models, such as those proposed by Zhai et al. [144] and Zia et al. [146], perform multiple related tasks simultaneously, improving the overall performance by leveraging shared features. These models demonstrate high accuracy and sensitivity but add complexity to model design and training. Transformers, with their self-attention mechanisms, effectively capture long-range dependencies and contextual information, enhancing classification accuracy but necessitating large datasets and substantial computational power. An example is the MGDFormer model [163], which excels at addressing local and global information challenges. Capsule Networks, used in models like MIXCAPS by Afshar et al. [150], effectively manage small training samples and enhance interpretability, demonstrating high sensitivity and accuracy. Other innovative approaches, such as the use of GANs for generating realistic nodule samples [152], neural architecture search (NAS) for optimizing 3D network architectures [151], and multiscale attention modules [156], have further advanced the field of lung nodule classification. These methods collectively contribute to improved accuracy, sensitivity, and clinical applicability, showcasing significant progress in leveraging deep learning for reliable and efficient lung nodule classification.

5. Discussion

In this literature review, we investigated prior research efforts dedicated to the detection, segmentation, and classification of pulmonary nodules in low-dose CT scans, each of which presents its own set of difficulties. The literature review identified significant hurdles in accurately locating nodules, precisely outlining their boundaries, and reliably classifying them as benign or malignant using deep learning methods. We comprehensively gathered information from the datasets used, preprocessing procedures, data augmentation techniques, architectural designs, and the reported performance metrics. Our analysis encompassed state-of-the-art deep learning approaches like 2D and 3D CNNs, autoencoders, and novel, fast-growing, and promising approaches, such as transformers. Furthermore, we assessed the credibility of each study by examining whether the authors presented lucid and comprehensive explanations of their methodologies and adhered to machine learning best practices. Through these efforts, we provide a current and in-depth viewpoint on this dynamic and rapidly expanding field of study.
The modeling approaches can be categorized into two high-level categories: single-view and multi-view. Single-view approaches utilize 2D images from CT scans to identify and analyze nodules, offering computational efficiency and simpler implementation. However, these methods often lack the depth information necessary for precise nodule analysis, leading to inaccuracies. In contrast, multi-view approaches, particularly those involving 3D imaging, leverage the volumetric data of CT scans for a more comprehensive analysis. This method allows for better spatial contextualization and improved accuracy in detection and segmentation. The advantages of 2D imaging include lower computational costs, simpler algorithms, and faster processing times, while its disadvantages include limited spatial information, higher false positives and negatives, and less accurate segmentation. On the other hand, 3D imaging offers enhanced spatial context and improved accuracy in nodule detection and segmentation, although it requires higher computational power, longer processing times, and more complex algorithms.
When comparing studies using 2D, 3D, and multi-view methods, a clear trade-off emerges between computational efficiency and detection accuracy. 2D methods, such as those used in [55], provide a quicker, less resource-intensive solution but at the cost of potentially missing critical volumetric information. Multi-view methods, like those employed by [100], enhance accuracy by integrating information from multiple perspectives, balancing the trade-offs between 2D and 3D approaches. In contrast, 3D methods, such as those in [137], offer higher accuracy by leveraging complete spatial data, although they are significantly more demanding in terms of computational power and data requirements.
Researchers must consider the specific needs of their applications when selecting a method. For example, in settings where rapid screening is essential and computational resources are limited, 2D methods may be preferable. Multi-view methods can be a suitable middle ground, offering improved accuracy without the full computational demands of 3D approaches. Conversely, for detailed diagnostic tasks where accuracy is paramount, and resources are available, 3D methods provide a superior solution.
In our review, we provide a comprehensive comparison of the accuracy of various models (Table A2, Table A4, and Table A6). It is crucial to acknowledge that the reported accuracy of these models is indeed heavily influenced by the datasets on which they were trained and evaluated. The vast majority of studies in this field utilize the LIDC-IDRI dataset or its curated subset, the LUNA16 challenge dataset. This widespread use of standardized datasets allows for more meaningful comparisons between different models, as the variability in dataset quality and characteristics is minimized.
The choice of dataset impacts not only the reported accuracy but also the generalizability of the models. However, given that LIDC-IDRI and LUNA16 are the most commonly used datasets, summarizing the accuracy across different models in the context of these datasets remains valid and informative. LIDC-IDRI and LUNA16 have become benchmark datasets for pulmonary nodule detection and classification, allowing for consistent comparisons across studies. The use of these datasets ensures that differences in model performance are more likely attributable to the models themselves than to variations in the data.
Moreover, these datasets include a wide range of nodule sizes, types, and locations, which are representative of cases encountered in clinical practice. Therefore, the models trained and tested on these datasets are likely to perform well in real-world scenarios. The widespread adoption of these datasets in recent influential studies has established a precedent that future research typically follows, reinforcing their role as standard references for model performance.
While other datasets [31,32,33] are occasionally used in the literature, their impact is typically less significant due to their smaller size, limited availability, or lack of comprehensive annotation. We briefly mentioned these alternative datasets where relevant, but they did not detract from the overall findings derived from the LIDC-IDRI and LUNA16 datasets.

5.1. Preprocessing

Data preprocessing is a critical step in pulmonary nodule detection, segmentation, and classification, as it significantly impacts the performance of deep learning models. Techniques such as conversion to Hounsfield units (HU) and thresholding standardize radiodensity values, facilitate the removal of irrelevant structures, and ensure consistency across CT scans. HU conversion is considered best practice because it provides a standardized scale for radiodensity, which is essential for differentiating between tissue types.
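A typical HU conversion and thresholding step might look like the following sketch, which assumes DICOM input read with pydicom and a lung-oriented clipping window of [−1000, 400] HU; the exact window and the per-slice handling of rescale parameters vary between studies.

```python
import numpy as np
import pydicom

def load_hu_volume(dicom_files):
    """Stack DICOM slices and convert raw pixel values to Hounsfield units (sketch)."""
    slices = sorted((pydicom.dcmread(f) for f in dicom_files),
                    key=lambda s: float(s.ImagePositionPatient[2]))  # sort along z
    volume = np.stack([s.pixel_array.astype(np.int16) for s in slices])
    slope = float(slices[0].RescaleSlope)          # typically 1.0 for CT
    intercept = float(slices[0].RescaleIntercept)  # typically -1024 for CT
    hu = volume * slope + intercept
    return np.clip(hu, -1000, 400)                 # threshold to a lung-relevant HU window
```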
Resampling for isotropy is another crucial technique, especially important for 3D convolutional neural networks (CNNs). It ensures uniform voxel dimensions, aligning the spacing between slices to a consistent 1 mm × 1 mm × 1 mm resolution, which is vital for accurate 3D analysis. This uniformity allows the model to effectively process volumetric data and capture the full spatial context of nodules.
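Isotropic resampling is commonly implemented with a simple interpolation step, as in the sketch below; using scipy.ndimage.zoom with trilinear interpolation is one reasonable choice rather than the method prescribed by any particular study.

```python
import numpy as np
from scipy.ndimage import zoom

def resample_isotropic(volume: np.ndarray, spacing, new_spacing=(1.0, 1.0, 1.0)):
    """Resample a CT volume to isotropic voxel spacing (default 1 mm x 1 mm x 1 mm)."""
    spacing = np.asarray(spacing, dtype=float)        # (slice thickness, row, col) in mm
    factors = spacing / np.asarray(new_spacing)       # per-axis zoom factors
    return zoom(volume, factors, order=1)             # trilinear interpolation
```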
Lung segmentation reduces the problem space by isolating the lungs from other anatomical structures, thereby focusing the analysis on relevant areas and improving the model performance. This step is particularly important for reducing the influence of surrounding tissues and artifacts; however, it requires precise algorithms to avoid segmentation errors that could exclude relevant nodule areas or include irrelevant regions.
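A rough, threshold-based lung segmentation can be sketched as follows; the −320 HU threshold, connected-component filtering, and morphological closing are illustrative assumptions, and published pipelines often add further refinement to recover juxta-pleural nodules.

```python
import numpy as np
from scipy import ndimage
from skimage.segmentation import clear_border

def lung_mask(hu_volume: np.ndarray) -> np.ndarray:
    """Rough lung segmentation by HU thresholding and connected-component filtering (sketch)."""
    binary = hu_volume < -320                       # air and lung parenchyma are low density
    binary = clear_border(binary)                   # discard air connected to the image border
    labels, n = ndimage.label(binary)
    if n == 0:
        return binary
    sizes = ndimage.sum(binary, labels, index=range(1, n + 1))
    keep = np.argsort(sizes)[-2:] + 1               # the two largest components ~ the lungs
    mask = np.isin(labels, keep)
    return ndimage.binary_closing(mask, structure=np.ones((3, 3, 3)))  # fill small holes
```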
Normalization scales features to a similar range, ensuring that no single feature dominates the learning process, and helps models converge faster and perform better. This technique, alongside zero centering, which improves numerical stability by centering the data around zero, is considered the best practice for enhancing the learning process and reducing input biases.
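In practice this often reduces to a couple of lines, as in the sketch below; the HU window and the dataset mean of 0.25 are assumed values frequently quoted for LUNA16-style preprocessing rather than universal constants.

```python
import numpy as np

def normalize(hu_volume: np.ndarray, hu_min=-1000.0, hu_max=400.0, pixel_mean=0.25):
    """Scale HU values to [0, 1] and zero-center around an assumed dataset mean."""
    x = (hu_volume - hu_min) / (hu_max - hu_min)
    x = np.clip(x, 0.0, 1.0)
    return x - pixel_mean          # zero-centering improves numerical stability
```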
Patch extraction allows models to focus on smaller relevant regions of an image, which is essential for both 2D and 3D architectures. For instance, works employing 3D architectures need 3D patches to leverage the full volumetric information from CT scans. This enhances the detection, segmentation, and classification accuracy by providing a comprehensive view of the nodules from multiple perspectives; however, it demands precise annotation based on the radiologist input.
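A simple 3D patch extractor around an annotated nodule centre could be sketched as follows; the 32 × 32 × 32 patch size and the −1000 HU padding value are assumptions, since patch sizes vary widely across the reviewed works.

```python
import numpy as np

def extract_patch(volume: np.ndarray, center, size=(32, 32, 32), pad_value=-1000):
    """Cut a fixed-size 3D patch around a nodule centre (voxel coordinates), padding at borders."""
    size = np.asarray(size)
    lo = np.asarray(center, dtype=int) - size // 2
    hi = lo + size
    patch = np.full(size, pad_value, dtype=volume.dtype)
    src_lo = np.maximum(lo, 0)                      # clip the source window to the volume
    src_hi = np.minimum(hi, volume.shape)
    dst_lo = src_lo - lo                            # destination offsets inside the patch
    dst_hi = dst_lo + (src_hi - src_lo)
    patch[dst_lo[0]:dst_hi[0], dst_lo[1]:dst_hi[1], dst_lo[2]:dst_hi[2]] = \
        volume[src_lo[0]:src_hi[0], src_lo[1]:src_hi[1], src_lo[2]:src_hi[2]]
    return patch
```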
Data augmentation addresses the challenges of overfitting and class imbalance by generating synthetic samples and increasing the diversity of the training data. This step is crucial for small or imbalanced datasets to improve the model robustness and generalization. However, careful implementation is required to ensure that the augmented data are realistic and representative, avoiding the introduction of noise or distortions that could affect model performance.
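Label-preserving geometric augmentation for 3D patches can be as simple as the following sketch; random flips and axial 90-degree rotations are shown as examples, and individual studies add scaling, translation, or intensity perturbations on top.

```python
import numpy as np

def augment_patch(patch: np.ndarray, rng=None) -> np.ndarray:
    """Random flips and 90-degree in-plane rotations for a 3D nodule patch (label-preserving)."""
    rng = rng or np.random.default_rng()
    for axis in range(3):                              # random flip along each axis
        if rng.random() < 0.5:
            patch = np.flip(patch, axis=axis)
    k = int(rng.integers(0, 4))                        # random rotation in the axial plane
    patch = np.rot90(patch, k=k, axes=(1, 2))
    return patch.copy()
```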
Each of these preprocessing steps plays a vital role in optimizing the learning process, ensuring that the models can accurately and efficiently analyze pulmonary nodules. Which of these techniques should be applied ultimately remains a judgment call for the authors of each specific implementation, but adhering to good practices such as conversion to Hounsfield units (HU) and resampling for isotropy is advisable across all works to maintain data consistency.

5.2. Nodule Detection

Pulmonary nodule detection works have introduced significant advancements through the application of various deep-learning methods. Each method has unique strengths and limitations, reflecting the diverse approaches explored in the literature.
Two-dimensional convolutional neural networks (2D CNNs) have been used for their simplicity and efficiency. For instance, Zuo et al. [64] employed a multi-resolution CNN using the LUNA16 dataset to extract features at various resolutions, achieving a sensitivity of 97.26% and a specificity of 97.38%. Despite these high accuracy rates, the method’s computational demand and relatively low CPM score of 0.742 highlight the trade-off between resolution and computational efficiency. In another study, Wang et al. [60] introduced a simple yet effective approach by dividing raw CT images into patches and using them as inputs for a 2D CNN. This method achieved a sensitivity of 92.8% but with a high false positive rate of eight FPs per scan, indicating a need for improved specificity.
Three-dimensional convolutional neural networks (3D CNNs) have shown promise due to their ability to process volumetric data, capturing spatial information essential for nodule detection. Ding et al. [54] introduced a deconvolutional structure into Faster R-CNN for candidate nodule detection, followed by a 3D CNN for false positive reduction, using the LUNA16 dataset. This dual-stage approach achieved a high sensitivity of 94.4% with four false positives per scan, demonstrating the benefit of utilizing 3D spatial information to enhance detection accuracy. Similarly, Zheng et al. [59] combined multiple 2D maximum intensity projection (MIP) projections with 3D CNN classification on the LUNA16 dataset, achieving a sensitivity of 95.4% and a CPM score of 0.952. This approach effectively leverages the advantages of both 2D and 3D techniques, enhancing the overall accuracy.
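The MIP preprocessing used in such hybrid 2D/3D pipelines can be expressed compactly; the sketch below assumes an axial slab of configurable thickness and illustrates the general idea rather than the exact projection settings of [59].

```python
import numpy as np

def axial_mip(volume: np.ndarray, center_slice: int, slab_thickness: int = 10) -> np.ndarray:
    """Maximum intensity projection over an axial slab, as used to feed 2D detectors."""
    lo = max(center_slice - slab_thickness // 2, 0)
    hi = min(center_slice + slab_thickness // 2 + 1, volume.shape[0])
    return volume[lo:hi].max(axis=0)   # keep the brightest voxel along the slab
```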
Hybrid methods that combine deep learning with traditional machine learning techniques have also been explored. Nasrullah et al. [61] used a 3D Faster R-CNN in conjunction with a U-Net-like encoder–decoder and a Gradient Boosting Machine (GBM) for nodule classification on the LIDC-IDRI dataset. Their system achieved a sensitivity of 98% and specificity of 94.35%, highlighting the strength of combining deep learning models with traditional machine learning techniques to handle various nodule sizes and types. Nguyen et al. [72] introduced a Faster R-CNN model with an adaptive anchor box and a residual CNN for false positive reduction using the LUNA16 dataset. Their system achieved a high sensitivity of 95.64% and a CPM score of 88.2%, emphasizing the importance of adaptive mechanisms to handle varying nodule sizes.
Multiscale feature extraction has been particularly effective in improving the detection performance. Gu et al. [56] utilized a 3D CNN with a multiscale prediction on the LUNA16 dataset, achieving a sensitivity of 92.93% with four false positives per scan. This approach effectively captures contextual information from 3D nodule samples, demonstrating the benefits of multiscale feature extraction.
Innovative approaches have also emerged that leverage novel architectures to achieve high performance with fewer computational resources. Mkindu et al. [81] introduced a 3D multiscale vision transformer using the LUNA16 dataset, achieving a sensitivity of 97.81% and a CPM score of 0.911. This method leverages the strengths of transformers in handling hierarchical resolution features, thereby demonstrating its potential to improve detection accuracy. Peng et al. [68] developed a 3D multiscale deep CNN with Bottle2SEneck modules, achieving an FROC average sensitivity of 0.923 on the LUNA16 dataset. The integration of multiscale features in three-dimensional data improved the model’s capability to detect pulmonary nodules across various scales.
In addition, some studies have explored alternative datasets to further validate the generalizability of their methods. For example, Song et al. [84] introduced a novel architecture combining 3D Convolutional Neural Networks (CNN) with Capsule Networks (CapsNet) using the ELCAP dataset. This approach leverages convolution kernels of varying scales to extract richer contextual information from lung nodules of different sizes. The method achieved a sensitivity of 92.31%, demonstrating its effectiveness in detecting pulmonary nodules under different imaging conditions provided by the ELCAP dataset.

5.3. Nodule Segmentation

Pulmonary nodule segmentation is a critical task in lung cancer screening, and various deep-learning methods have been developed to enhance the accuracy and efficiency of this process. Each method offers distinct advantages and disadvantages that reflect the diverse strategies employed in the literature.
Two-dimensional convolutional neural networks (2D CNNs) have been used for their efficiency and simplicity. Singadkar et al. [96] presented a deep residual deconvolutional network for lung nodule segmentation, achieving a remarkable DICE score of 94.97% on the LIDC-IDRI dataset. This method incorporates multi-level contextual information, which significantly improves segmentation performance. However, the lack of cross-validation and detailed preprocessing steps in their study raises concerns regarding the generalizability of the results. Wang et al. [93] developed Central Focused Convolutional Neural Networks (CF-CNN) to effectively segment lung nodules from heterogeneous CT images. Their approach achieved average DICE scores of 82.15% on the LIDC-IDRI dataset and 80.02% on an independent private dataset. The ability to capture nodule-sensitive features from both 3D and 2D CT images simultaneously was a key strength, although the drop in performance on the private dataset indicates potential issues with model generalizability.
Three-dimensional convolutional neural networks (3D CNNs) have shown great promise due to their ability to process volumetric data. Tang et al. [62] proposed a unified model that performed nodule detection, false positive reduction, and segmentation jointly using the LIDC-IDRI dataset. Their approach achieved a state-of-the-art DSC of 83.10%, emphasizing the robustness of combining multiple tasks into a single model. However, the complexity of implementing such a unified model is a limitation. Similarly, Dutande et al. [100] employed a 2D–3D cascaded CNN for comprehensive lung nodule analysis on the LIDC-IDRI dataset, integrating segmentation, detection, and classification. Their method achieved a sensitivity of 90% and a DICE coefficient of 80%, demonstrating the potential of combining multiple tasks within a unified framework. However, such approaches require extensive computational resources to process 3D data.
U-Net-based approaches have been extensively explored for lung nodule segmentation. Usman et al. [112], using the LIDC-IDRI dataset, proposed a semi-automated 3D segmentation method that employed a two-stage process. In the first stage, a 2D region of interest (ROI) was used for patch-wise exploration along the axial axis, followed by further exploration along the coronal and sagittal axes using Residual U-Nets. This method achieved promising DICE scores of 85.29% in axial, 84.76% in coronal, and 83.58% in sagittal views, with an average of 87.5%. The use of multi-view and multiscale features significantly improves segmentation performance. Pezzano et al. [99] used a U-net-based network on the LUNA16 dataset to learn the context of nodules through two masks representing background and secondary-important elements in CT scans. Their method yielded an IoU score of 76.6%, which is comparable to human performance. This approach effectively captured the nodule area by subtracting the masks, although the reliance on background and secondary-important elements can be seen as a limitation.
Innovative approaches that leverage novel architectures have also emerged. CSE-GAN, introduced by Tyagi et al. [109], is a 3D conditional generative adversarial network tailored for lung nodule segmentation using the LUNA16 dataset. This method achieved DICE coefficients of 80.74% on the LUNA test set and 76.36% on a local dataset, indicating high accuracy in segmenting lung nodules. The integration of a concurrent spatial and channel squeeze and excitation module within both the generator and discriminator enhanced the segmentation performance. However, the complexity of implementing GANs and the need for substantial computational resources are limitations. Wang et al. [111] introduced DPBET, a Cascade-Axial-Prune transformer model for lung nodule segmentation, using the LIDC-IDRI dataset. Their model achieved a DSC of 89.86% and an average sensitivity of 90.50%, highlighting the effectiveness of incorporating a hybrid CNN-Transformer architecture and innovative boundary enhancement strategies. The ability to capture both local details and global semantic representations was a key strength, although the model’s complexity and computational demands can be challenging.

5.4. Nodule Classification

Pulmonary nodule classification is a crucial step in lung cancer diagnosis and involves differentiation between benign and malignant nodules. Various deep-learning methods have been developed to enhance the accuracy and efficiency of this task, each with distinct advantages and limitations.
Convolutional neural networks (CNNs) have been widely used for their effectiveness in image classification tasks. A high-performing example is the work by Naeem Abid et al. [149], who developed a deep learning model that integrates a 3D convolutional neural network (3D CNN) with a novel attention mechanism specifically designed to enhance feature representation for nodule classification. Their approach, tested on the LUNA16 dataset, achieved a remarkable sensitivity of 98.80% and specificity of 97.45%, significantly improving the model’s ability to accurately classify nodules as benign or malignant. The attention mechanism in their model allowed for more precise localization of important features, leading to enhanced classification performance without a substantial increase in computational demands.
Shrey et al. [66] proposed a cascaded network for segmentation and classification using the LUNA16 dataset, achieving precision and recall rates of 98%. This method consists of a U-Net segmentation network followed by an encoder for classification. By integrating segmentation and classification tasks, the model achieved high-performance metrics, demonstrating the effectiveness of this combined approach. However, the reliance on precise segmentation quality means that any errors in the segmentation stage can negatively impact the classification performance.
The integration of clinical and imaging data has shown significant promise. Tong et al. [67] used a 34-layer 3D-ResNet combined with patient clinical data and achieved an accuracy of 91.29% on the LIDC-IDRI dataset. This approach highlights the advantage of incorporating clinical data, such as age, smoking history, and family history, to enhance context and improve classification accuracy. By combining heterogeneous features, the model can provide a more comprehensive analysis of nodules. However, the need for high-quality and complete clinical data can be a limitation, as incomplete or inaccurate clinical data can reduce the model’s effectiveness.
Transfer learning and multiscale feature extraction have also been explored to improve classification performance. Cao et al. [83] utilized a three-dimensional multifaceted attention encoder–decoder network, combining self-attention modules with multiscale features to classify nodules on the LUNA16 dataset. Their approach achieved a sensitivity of 89.1% at seven predefined false positives per scan, highlighting the effectiveness of multiscale feature extraction in capturing detailed nodule characteristics. The use of transfer learning also allows the model to leverage pre-trained networks, improving the performance with limited training data. However, implementing multiscale feature extraction adds complexity and requires extensive computational resources.
Innovative approaches that leverage novel architectures have emerged, offering promising advancements. Mkindu et al. [81] introduced a 3D multiscale vision transformer for lung nodule detection on the LUNA16 dataset, achieving a sensitivity of 97.81% and a CPM score of 0.911. This method leverages the strengths of transformers in handling hierarchical resolution features, thereby demonstrating its potential to improve classification accuracy. Transformers excel at capturing long-range dependencies and contextual information, making them suitable for complex classification tasks. However, their high computational demands and the need for substantial training data can be challenging.
Autoencoders have also been explored for their feature extraction capabilities. Mao et al. [133] proposed a novel model for lung nodule image feature representation using deep autoencoders on the LIDC-IDRI dataset, achieving a CPM score of 0.939. The use of autoencoders allows unsupervised learning of feature representations, which can be advantageous when labeled data are scarce. However, the effectiveness of autoencoders relies heavily on the quality of the learned features, and their performance can vary significantly with different data distributions [171].

5.5. Radiologist vs. AI

Several noteworthy studies have compared the performance of deep learning models with that of radiologists, highlighting the potential of artificial intelligence (AI) to match or even surpass human expertise in certain aspects of lung nodule analysis. Y. Gu [56] and S. Tang [63] utilized prediction probability maps to visualize model predictions, demonstrating that their AI models could achieve performance comparable to radiologists in detecting pulmonary nodules. H. Eun et al. [55] and J. Zhang, Xia, Zeng, et al. [57] provided feature map visualizations, showcasing the areas of interest identified by the models, which aligned closely with radiologists’ assessments.
J.L. Causey et al. [129] employed both feature maps and probability/attention maps to offer a detailed comparison, with their models achieving similar or improved diagnostic accuracy compared to radiologists, while R. Dey [130] highlighted the use of prediction probability maps, further supporting the potential of AI in clinical settings.
In studies by S. Wang [93], R. Roy [95], M. Usman [112], and G. Pezzano [99], the output masks generated by the models were compared directly to radiologists’ annotations. These studies found that AI models could provide consistent and accurate segmentations, often matching the precision of human experts. H. Liu [94] and H. Cao [98] also incorporated probability maps, emphasizing the AI’s ability to highlight suspicious regions with high accuracy.
These studies collectively demonstrate that deep learning models, through advanced visualization techniques and rigorous performance comparisons, have shown significant promise in achieving parity with radiologists in the detection, segmentation, and classification of pulmonary nodules. This underscores the potential of AI to serve as a valuable tool in lung cancer screening, enhancing diagnostic accuracy and efficiency in clinical practice.

5.6. Future Extensions and Research Directions

Future research on CADx systems for lung cancer screening should address several key areas to enhance accuracy and reliability. First, external validation on diverse datasets is crucial to ensure the generalizability of the models beyond their initial training environments. Enhancing datasets, both in size and quality, will help capture a broader spectrum of nodule types and reduce the risk of false identification, as noted by Zuo et al. [64]. Incorporating more preprocessing steps can aid in improving the classification accuracy by preparing the data more effectively for analysis. Zhang et al. [57] emphasized the importance of improving the detection of ground-glass opacity (GGO) and integrating clinical records into the nodule detection process, which could provide additional context and enhance diagnostic accuracy. Moving from 2D CNN methods to 3D CNN models, as suggested by Zuo et al. [64], can better capture contextual information between slices, thereby addressing the limitations of 2D approaches.
Furthermore, Dey et al. [130] highlighted the need to understand and visualize the features extracted by networks to ensure that they align with the diagnostic criteria used by radiologists. This could improve model interpretability and clinical trust. Automatic pulmonary nodule detection, which reduces the dependence on manual annotations, is another promising direction for future research. Xie et al. [128] proposed a semi-supervised learning framework to utilize nodules with uncertain malignancy levels and unlabeled nodules as training samples, making the model training process more efficient.
Additionally, future studies could compare new methods with solutions based on hand-engineered features, as recommended by Nibali et al. [125], to benchmark improvements and innovations accurately. Lastly, as Pinheiro suggested, applying these approaches to other types of tumors could help generalize the results and broaden the applicability of CADx systems in oncology.
Therefore, reducing false positives without compromising sensitivity is a crucial area of focus, as achieving a balance between accurately detecting malignant nodules and minimizing false alarms is essential for improving the effectiveness and efficiency of lung cancer screening programs.
The research direction in lung cancer screening using deep learning methods is poised toward several key advancements. Advanced data augmentation techniques, such as Generative Adversarial Networks (GANs), are being explored to generate synthetic data, thereby addressing the issue of data scarcity and enhancing training datasets. There is also a significant focus on improving the explainability of models to ensure that they provide interpretable results, which can enhance clinician trust and facilitate smoother integration into clinical workflows. Additionally, integrating imaging data with other clinical information, such as patient history and biomarkers, is seen as a crucial step toward improving diagnostic accuracy and patient stratification. These directions aim to create more robust, reliable, and comprehensive diagnostic tools that can be seamlessly integrated into clinical practice and improve patient outcomes.

5.7. Practical Implications

The deployment of deep learning methods for pulmonary nodule detection, segmentation, and classification in real-world clinical settings has immense potential to revolutionize lung cancer diagnosis and treatment by assisting medical experts in their daily workflows. However, integrating these advanced algorithms into clinical workflows presents several practical challenges. A significant issue is that many 2D methods require highly curated inputs, demanding that doctors manually select the most relevant slices from the CT scans; this process can be time-consuming, introduce variability, and potentially delay diagnosis, creating a bottleneck in the clinical setting. While 3D methods offer more robust performance by leveraging full volumetric data, they require substantial computational resources, which might not be readily available in all healthcare environments. These practical challenges must be carefully managed to ensure that such technologies enhance patient care without disrupting routine operations: the computational demands of sophisticated models such as 3D CNNs and transformers require robust infrastructure and can be a barrier in resource-limited settings, and the need for extensive annotated data to train these models poses another significant hurdle, as acquiring high-quality labeled datasets is often labor-intensive and costly. Despite these challenges, the potential impact of deep learning on lung cancer diagnosis is profound. Deep learning models can enhance early detection rates, improve the accuracy of diagnoses, and assist in personalized treatment planning by providing precise and consistent assessments of nodules. By reducing reliance on subjective interpretation, these technologies can lead to more standardized care and potentially better patient outcomes. As research progresses, it will be crucial to address these practical challenges to ensure that the benefits of deep learning can be fully realized in clinical practice.
The importance of interpretability and explainability in medical applications, particularly in the context of deep learning models for lung cancer screening, cannot be overstated. In healthcare, where decisions directly impact patient outcomes, understanding the reasoning behind an algorithm’s predictions is crucial for gaining the trust of healthcare professionals and ensuring the responsible integration of artificial intelligence into clinical practice. Interpretability refers to the ability to understand the internal mechanisms and decision-making processes of a model, while explainability involves providing clear and understandable reasons for a model’s predictions. In the realm of medical applications, these aspects are paramount for several reasons. Firstly, interpretability and explainability contribute to the validation and assessment of the reliability of deep learning models. Healthcare professionals need to be confident in the accuracy and relevance of the predictions made by these models, especially when dealing with critical decisions, such as disease diagnosis and treatment planning. Secondly, these aspects are essential for fostering collaboration between machine learning systems and human practitioners. Clinicians must comprehend how a model arrives at a particular diagnosis or recommendation to make informed decisions and provide the best possible care to their patients. This collaborative approach, in which AI systems augment human expertise, can lead to more accurate and timely diagnoses.
Additionally, in the context of regulatory compliance and ethical considerations, interpretability and explainability are crucial. Regulatory bodies often require transparency in the functioning of medical AI systems, and patients have the right to understand the basis of decisions affecting their health. Transparent models not only facilitate regulatory approval but also contribute to building public trust in the application of AI in healthcare. In the specific domain of lung cancer screening, where the consequences of misdiagnosis can be severe, having interpretable and explainable models ensures that clinicians can confidently rely on AI systems as valuable tools, ultimately improving patient outcomes. Therefore, as the field progresses, emphasis on interpretability and explainability remains integral to the responsible deployment of deep learning models in medical applications.

6. Conclusions

This literature review explored the three main challenges of lung cancer screening, and the reviewed studies underscored the significant advancements made in lung cancer screening through deep learning methods, illustrating a broad spectrum of methodologies.
For pulmonary nodule detection, 3D convolutional neural networks (3D CNNs) generally achieve higher sensitivity and specificity due to their ability to capture volumetric information, although they are computationally intensive. Hybrid approaches that combine deep learning with traditional machine learning techniques enhance performance by leveraging the strengths of both paradigms. Multiscale feature extraction methods improve accuracy by capturing comprehensive contextual information, although they may increase computational demands. Novel architectures, such as transformers, offer promising advancements by balancing accuracy with computational efficiency.
In the domain of pulmonary nodule segmentation, 3D CNNs and U-Net-based approaches generally achieve higher accuracy due to their ability to process volumetric data and capture multiscale features, albeit at the cost of increased computational resources. Methods such as those proposed by Singadkar et al. [96] and Wang et al. [93] demonstrate the potential of 2D CNNs for efficiently segmenting nodules, although they may lack volumetric information. Innovative approaches, such as CSE-GAN and DPBET, offer promising advancements in segmentation accuracy by leveraging novel architectures and strategies. However, the complexity and computational demands of these methods highlight the need for further research to balance accuracy with efficiency. The continuous evolution of deep learning techniques holds significant promise for improving pulmonary nodule segmentation in lung cancer screening.
For pulmonary nodule classification, CNNs and their variants, such as Capsule Networks and ResNet, have demonstrated high accuracy and robustness. The integration of clinical data with imaging data provides a more comprehensive approach, although it requires high-quality data. Transfer learning and multiscale feature extraction offer significant improvements in performance but come with increased complexity and computational demands. Novel architectures, such as transformers and autoencoders, present promising advancements by leveraging their unique strengths in handling complex data. However, their implementation can be challenging due to high computational requirements and the need for substantial training data. The continuous evolution of deep learning techniques holds significant promise for improving pulmonary nodule classification in lung cancer screening.
In conclusion, the continuous evolution of deep learning techniques across detection, segmentation, and classification tasks in pulmonary nodule analysis presents significant opportunities for advancements in lung cancer screening. Future research should focus on optimizing the balance between accuracy and computational efficiency, integrating diverse data types, and developing robust models that can generalize across different datasets and clinical environments. These efforts will be crucial for enhancing early detection and diagnosis, ultimately improving patient outcomes in the fight against lung cancer.

Author Contributions

Conceptualization, I.M. and K.K.; methodology, I.M. and K.K.; validation, K.K. and G.P.; formal analysis, investigation, resources, I.M.; writing—original draft preparation, I.M.; writing—review and editing, K.K.; supervision, G.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data were created.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Literature review tables are provided to detail the preprocessing and performance metrics associated with pulmonary nodule detection, segmentation, and classification. These tables systematically present the preprocessing and data augmentation methodologies used for data preparation, highlighting the techniques employed. Additionally, performance tables are reported for each of the three tasks.
Table A1. Nodule detection—Preprocessing steps per work.
ReferenceYearDatasetPreprocessingData AugmentationDeep Architecture
[54]2017LUNA16No, TH, Cr, 3D-PACr, Fl, DuFaster R-CNN, 3D DCNN
[55]2018LUNA16No, 2D-PATr, Ro, Fl2D CNNs, AE, DAE
[56]2018LUNA163D-PA, ReCr, Sc, 3D Fl, 3D Ro3D CNN
[35]2018LUNA163D-PA, Re, Rz, THTr, Ro, FlModified 3D U-Net
[57]2018LUNA16TH, Cr, 3D-PARo, Fl3D DCNN
[58]2019LIDC-IDRILS, Re, 2D-PA (Only juxta-pleural nodules)-2D CNN
[59]2019LUNA16TH, No, 2D-PATr, Ro, Fl2D U-net, 3D CNN
[60]2019LIDC-IDRICr, 2D-PARo, Tr, SC, ReCustom ResNet
[61]2019LIDC-IDRIRe, No, Cr, 2D-PATr, Ro3D Faster R-CNN and CMixNet with U-Net-like encoder–decoder architecture
[62]2019LUNA163D-PARo3D DCNN
[63]2020LIDC-IDRINo, 3D-PAFl3D U-net
[64]2020LUNA16, TIANCHI17Re, No, 3D-PA-CNN, TL
[65]2020LUNA16Cr, MV, 2D-PA-3D multiscale DCNN, AE, TL
[66]2020LUNA16, private setTH, No, Cr, 3D-PARo, Fl, TrU-net, AE, TL
[67]2021LIDC-IDRI, private data set2D slice-3D-ResNet and MKL
[68]2021LUNA16TH, Re, PSEGRo, Fl, Shift3DCNN
[69]2021Kaggle Data Science Bowl 2017 challenge (KDSB) and LUNA 16TH, No, Re, PSEG,3D-PARo, Fl, CrU-Net
[70]2021LUNA16PSEG, NoBalancing so that lower-target patches (those labeled as one) are equivalent in number to higher-target patches (those labeled as zero)3DCNN
[71]2021LUNA16Re, PSEG, 3D-PASampling, Ro, TrMulti-path 3D CNN
[72]2021LUNA16TH, NoRo, Shift, Fl, ScFaster R-CNN with adaptive anchor box
[73]2021NLST (NLST, 2011), LHMC, KaggleTH, No, PSEGFl, Ro, Tr2D and 3D DNN
[74]2022LIDC-IDRI + Japan Chest CT Dataset3D-PAn/a3D unet
[75]2022LUNA16Re, TH, No, 3D-PARo, Fl3D sphere representation-based center-points matching detection network (SCPM-Net)
[76]2022LUNA16Re, No, 3D-PAn/aAtrous UNet+
[77]2022LUNA16TH, No, 3D-PARo, Fl3D U-shaped residual network
[78]2023LUNA163D-PACr, Fl, Zoom3D CNN
[79]2023LUNA16HU, No, 3D-PAn/a3D ResNet18 dual path Faster R-CNN and a federated learning algorithm
[80]2023LUNA16Re, TH, PSEG, 3D-PASc, Cr, Fl3D ViT
[81]2023LUNA16TH, 3D-PAn/a3D ViT
[82]2023LIDC-IDRITH, No, Masking gt segbalancing classes by up-sampling cancer cases2D Ensemble Transformer with Attention Modules
[83]2023LUNA16TH, HU, Masking gt, Non/a3D Multifaceted Attention Encoder–Decoder
[84]2023ELCAPn/aRo, Re, Cr3D CNN-CapsNet
[85]2023LUNA16TH, No, masking gt seg, Re, ScCr, Fl, ScA multiscale self-calibrated network (DEPMSCNet) with a dual attention mechanism
Table A2. Nodule detection—Performance per work.
ReferenceDatasetDeep ArchitectureInput SizeSensitivitySpecificityPrecisionAUCAccuracyCPM (FROC)
[54]LUNA16Faster R-CNN, 3D DCNN32 × 32 × 3, 36 × 36 × 2092.2/1 FP, 94.4/4 FPs----0.893
[55]LUNA162D CNNs, AE, DAE64 × 64-----0.922
[56]LUNA163D CNN32 × 32 × 3287.94/1 FP, 92.93/4 FPs----0.7967
[35]LUNA16Modified 3D U-Net64 × 64 × 6495.16/30.39 FPs--93.72-0.8135
[57]LUNA163D DCNN32 × 32 × 3294.9/1 FP - 0.947
[58]LIDC-IDRI2D CNN24 × 2488/1 FP, 94.01/4 FPs--94.3--
[59]LUNA162D U-net, 3D CNN512 × 512 (CG), 16 × 16 × 16, 32 × 32 × 3289.9/0.25 FP, 94.8/4 FPs----0.952
[60]LIDC-IDRICustom ResNet64 × 6492.8/8 FPs-----
[61]LIDC-IDRI3D Faster R-CNN and CMixNet with U-Net-like encoder–decoder architecture36 × 36 × 3693.97, 98.0089.83, 94.35--88.79, 94.17-
[62]LUNA163D DCNNFull CT - 0.8727
[63]LIDC-IDRI3D U-net40 × 40 × 2692.494.6-94.196.8-
[64]LUNA16, TIANCHI17CNN, TLN [26, 36, 48] N × N97.2697.38-99.5497.33, 92.81 (multi-res)0.742
[65]LUNA163D multiscale DCNN, AE, TL32 × 32 × 3294.2/1 FP, 96/2 FPs----0.9403
[66]LUNA16, private setU-net, AE, TL2D CT slice (512 × 512)98--95.6797.96-
[67]LIDC-IDRI, private data set3D-ResNet and MKL40 × 40 × 2091.0191.40--91.29
[68]LUNA163DCNN128 × 128 × 12887.2/22 FP----0.923
[69]Kaggle Data Science Bowl 2017 challenge (KDSB) and LUNA 16U-Net128 × 12889.187.4--87.8-
[70]LUNA163DCNN64 × 64 × 64, 32 × 32 × 32, 16 × 16 × 16-----0.948
[71]LUNA16Multi-path 3D CNN48 × 48 × 480.952/0.962 to 4, 8 FP/Scans----0.881
[72]LUNA16Faster R-CNN with adaptive anchor box64 × 64 FP 512 × 512 DET93.897.6-95.795.7-
[73]NLST (NLST, 2011), LHMC, Kaggle2D and 3D DNN32 × 32 × 32- 86/94
[74]LIDC-IDRI + Japan Chest CT Dataset3D unet64 × 96 × 96-----0.947/0.833
[75]LUNA163D sphere representation-based center-points matching detection network (SCPM-Net)96 × 96 × 9689.2/7 FP-----
[76]LUNA16Atrous UNet+3 × 64 × 64, 8 × 64 × 64, 16 × 64 × 6492.8-77.2--0.93
[77]LUNA163D U-shaped residual network96 × 96 × 9695----0.895
[78]LUNA163D CNN128 × 128 × 128 0.8808
[79]LUNA163D ResNet18 dual path Faster R-CNN and a federated learning algorithm128 × 128 × 12883.388-83.41288.38283.417-
[80]LUNA163D ViT128 × 128 × 12898.39- --0.909
[81]LUNA163D ViT64 × 64 × 64, 32 × 32 × 32, 16 × 16 × 1697.81- --0.911
[82]LIDC-IDRI2D Ensemble Transformer with Attention Modules512 × 51294.5897.10 98.9696.14-
[83]LUNA163D Multifaceted Attention Encoder–Decoder128 × 128 × 12889.1/7 FPs----0.891
[84]ELCAP3D CNN-CapsNet32 × 32 × 8 92.3198.08 9595.19-
[85]LUNA16A multiscale self-calibrated network (DEPMSCNet) with a dual attention mechanism128 × 128 × 12898.80----0.963
Table A3. Nodule segmentation—Preprocessing steps per work.
ReferenceYearDatasetPreprocessingData AugmentationDeep Architecture
[93]2017LIDC-IDRI, private set3D-PA, 2D-PA,n/aCentral Focused Convolutional Neural Networks (CF-CNN)
[94]2019LIDC-IDRI3D-PAn/aCascaded Dual-Pathway Residual Network
[95]2019LIDC-IDRI2D-PA, PSEG, Ren/aSegNet, a deep, fully convolutional network
[62]2019LIDC-IDRI3D-PAn/a3D DCNN
[96]2020LIDC-IDRI2D-PAn/aDeep residual deconvolutional network, TL
[97]2020LIDC-IDRI2D-PA, No Deep Residual U-Net
[98]2020LIDC-IDRI3D-PAn/aDB-ResNet, CF-CNN
[99]2020LIDC-IDRI2D-PARo, Zoom, PaddingU-net
[100]2021LIDC-IDRI, LNDb, ILCIDTH, PSEG, Maximum intensity projectionRo, Blur, No, Rand pixels to zero2D CNN
[101]2021LIDC-IDRI2D-PA, synthetic pseudo-color imageIntensive augmentationsU-Net
[102]2022LUNA16TH, No, 3D-PaRo, Transpose, Affine Transform, Fl, Br, ContrastV-net
[103]2021LIDC-IDRI, SHCHTH, Contrast ench, PSEG, Lesion Localization with Region GrowingFl, Ro, Cr, deformation2D–3D U-net
[104]2021LIDC-IDRI, LUNA16Cr, 2D-PA, UpscaleRo, Fl, elastic transformFaster R-CNN
[105]2021LIDC-IDRIPSEG, 2D-PARo, Fl, Sh, Zoom, CrU-net
[106]2021LIDC-IDRIPSEG, TH, Re, 3D-PAn/a3D res U-net
[107]2021LIDC-IDRI2D-PA, Ren/aVGG-SegNet
[108]2022hospital data3D-PARo, Mirroring3D FCN
[109]2022LUNA16, ILNDTH, 3D-PA, Sampling for balancepatch-based augmentation3D GAN
[110]2022LIDC-IDRITH, No, 3D-PARo, Sc, Fl3D Dual Attention Shadow Network (DAS-Net)
[111]2022LIDC-IDRI2D-PARo, Tr, FlTransformer
[112]2023LIDC-IDRIgrayscale thresholding, No, Ren/aDual-encoder-based CNN
[113]2023LIDC-IDRI, AHAMU-LCwindow selection, NoFlRAD—U-net
[114]2023LIDC-IDRI, private set2D-PACr, Sc, Br, Contrast, Sat, Random noiseSMR—U-net 2D
[115]2023LIDC-IDRI2D-PARo, random luminance, random gamma rays, Gaussian noise, hue/satU-shaped hybrid transformer
[116]2023LIDC-IDRI, LUNA16TH, No, Re3D-PAn/a3D U-net based
[117]2023LIDC-IDRImask generationn/aGUNet3++
Abbreviations: Rs: Resampling, Rz: Resize, PA: Patch (2D or 3D), Cr: Crop, No: Normalization, Ro: Rotate, Sc: Scale, Tr: Translate, TH: Threshold (HU), Fl: Flip, Du: Duplicate.
Table A4. Nodule segmentation—Performance per work.
ReferenceDatasetDeep ArchitectureInput SizeInput ShapeDSC (%)IoU (%)Sensitivity (%)
[93]LIDC-IDRI, private setCentral Focused Convolutional Neural Networks (CF-CNN)572 × 572, 3 × 35 × 353D, 2D82.15 ± 10.76 (LIDC), 80.02 ± 11.09 (private set)--
[94]LIDC-IDRICascaded Dual-Pathway Residual Network65 × 65 × 32D, 3D (mask)81.58 ± 11.05--
[95]LIDC-IDRISegNet, a deep, fully convolutional network128 × 1282D93 ± 0.11--
[62]LIDC-IDRI3D DCNNn/a3D83.10 ± 8.8571.85 ± 10.48-
[96]LIDC-IDRIDeep residual deconvolutional network, TL512 × 5122D94.9788.68-
[97]LIDC-IDRIDeep Residual U-Net128 × 1283D87.5 ± 10.58--
[98]LIDC-IDRIDB-ResNet, CF-CNN3 × 35 × 353D82.74 ± 10.19--
[99]LIDC-IDRIU-net64 × 642D-76.6 ± 12.3-
[100]LIDC-IDRI, LNDb, ILCID2D CNN96 × 962D80-
[101]LIDC-IDRIU-Net256 × 2562D93.14-91.76
[102]LUNA16V-net96 × 96 × 163D95.0183
[103]LIDC-IDRI, SHCH2D–3D U-net2D–3D (3-slices)2D–3D83.16/81.97--
[104]LIDC-IDRI, LUNA16Faster R-CNN224 × 2242D89.79/90.3582.34/83.21-
[105]LIDC-IDRIU-net32 × 322D86.23--
[106]LIDC-IDRI3D res U-net48 × 192 × 1923D80.5-80.5
[107]LIDC-IDRIVGG-SegNet224 × 224 × 3 channels2D90.4982.64-
[108]hospital data3D FCN128 × 128 × 643D84.573.8
[109]LUNA16, ILND3D GAN64 × 64 × 323D80.74/76.36-85.46/82.56
[110]LIDC-IDRI3D Dual Attention Shadow Network (DAS-Net)16 × 128 × 1283D92.05-90.81
[111]LIDC-IDRITransformer64 × 642D89.86-90.50
[112]LIDC-IDRIDual-encoder-based CNN512 × 5122D87.91-90.84
[113]LIDC-IDRI, AHAMU-LCRAD—U-net512 × 5122D-87.76/88.13-
[114]LIDC-IDRI, private setSMR—U-net 2D128 × 1282D91.8786.88-
[115]LIDC-IDRIU-shaped hybrid transformer64 × 64, 96 × 96, 128 × 1282D91.84 92.66
[116]LIDC-IDRI, LUNA163D U-net based64 × 64 × 323D82.4870.8682.74
[117]LIDC-IDRIGUNet3++n/a2D97.2-97.7
Table A5. Nodule classification—Preprocessing steps per work.
ReferenceYearDatasetPreprocessingData AugmentationDeep Architecture
[125]2017LIDC-IDRIRs, 2D-PA, NoRo, Sc (only in the test set)ResNet
[126]2017LIDC-IDRI3D-PA, MVRo3D MV-CNN + SoftMax
[127]2017LIDC-IDRI2D-PA, Rs, RzRo, Sh, Fl, TrResNet-50, TL
[128]2018LIDC-IDRIRs, Rz, MV, Cr, 2D-PaTr, Ro, FlMV-KBC
[129]2018LIDC-IDRI3D-PA, QIF extractionRo, Sc, shifted up to 30%CNN + Random Forest
[130]2018LIDC-IDRI + private setRs, 3D-PA, No-3D DenseNet, TL
[131]2018LIDC-IDRIRs-CNN + PSO
[132]2018LUNA16Re, 3D-PATr, Ro3D DCNN
[133]2018ELCAP2D-PA, Cr, ReRo, Cr, perturbation (brightness, saturation, hue, and contrast)DAE
[134]2019LUNA162D-PARo, FlNovel 2D CNN
[135]2019LIDC-IDRINo, Cr, 2D-PARo, Sc, Gaussian BlurringNovel 2D CNN
[136]2019LIDC-IDRIRe, 2D-PARo, FlCNN, TL
[138]2020LIDC-IDRIRs, Thyes, but no infoMAN (modified AlexNet), TL
[139]2020LIDC-IDRI2D-PATr, Ro, Sc, GAN CNN, TL
[58]2020LIDC-IDRI, private dataset (FAH-GMU)Cr, 2D-PA-DTCNN, TL
[140]2020LIDC-IDRI, DeepLNDatasetNo, Cr, 3D-PATr, Fl3D CNN
[141]2020LIDC-IDRI, LUNGx Challenge databaseRe, No, 2D-PATr, Ro, Fl2D CNN, TL
[142]2020LIDC-IDRICr, 3D-PAAdjust sampling rateMRC-DNN
[143]2020LIDC-IDRIRe, No, Th, Cr, 3D-PASampling different slices from the same nodule to achieve a better class balancingCAE, TL
[144]2020LIDC-IDRI, LUNA16Re, No, Cr, 3D-PA, 2D-PATr, Ro, FlMulti-Task CNN
[145]2020LUNA16Cr, 2D-PAdown sampling the negative samples, RoFractalnet and CNN
[146]2020LIDC-IDRIReSc(zoom), FL, RoDCNN
[137]2020LIDC-IDRIFull CT-PA-multiscale 3D-CNN, CapsNets
[147]2021NLST, DLCST3D-PA, 2D-PA (9 views)n/a2D CNN 9 views, 3D CNN
[148]2021LUNA16/Kaggle DSB 2017 datasetraw data to pngRo, H-Fl, clip, blurryDense Convolutional Network (DenseNet)
[149]2021LIDC-IDRI/ELCAP2D-PA, 3D-PAn/a2D MV-CNN 3D MV-CNN
[150]2021LIDC-IDRI3D-PA, zero-paddingn/aCapsule networks (CapsNets)
[151]2021LIDC-IDRI3D-PA, padding, CrFl3D NAS method, CBAM module, A-Softmax loss, and ensemble strategy to learn efficient
[152]2021LIDC-IDRI2D-PAn/aDeep Convolutional Generative Adversarial Network (DC-GAN)/FF-VGG19
[153]2021LUNA16TH, 2D-PA, balancing samplesRo, FlBCNN [VGG16, VGG19] combination with and without SVM
[154]2021LUNA16n/an/a3D CNN
[155]2021PET-CT private, LIDC-IDRIn/aRo, Fl, Shift2D CNN
[156]2021LIDC-IDRI3D-PA, ReFl, Pad3D DPN + attention mechanism
[157]2021LIDC-IDRI3D-PAn/a3D CNN + biomarkers
[158]2021LIDC-IDRIRe, 3D-PA, NoRo3D attention
[159]2022LIDC-IDRI and LUNGxRe, 3D-PA, TH, NoRoProCAN
[160]2022LIDC-IDRI2D-PARo, TrDCNN
[161]2022LIDC-IDRIRe, 2D-PARo, overlays on the axial, coronal, and sagittal slicesTransformers
[162]2022LIDC-IDRIinterpolation, THFl, RoCNN-based MTL model that incorporates multiple attention-based learning modules
[163]2022LIDC-IDRIRe, Rz, NoRoTransformers
[164]2022LUNA163D-PA, NoFl, Gaussian noise3D ResNet + attention
[165]2023LIDC-IDRI/TC-LND Dataset/CQUCH-LNDNo, 3D-PA, ScRo, FlSTLF-VA
[166]2023LIDC-IDRIRe, Non/aTransformer
[167]2023LIDC-IDRIRe, 2D-PA, ReFl, Brightness, contrast, ScF-LSTM-CNN
[168]2023privateTH, 3D-PAn/aCAE
Abbreviations: Rs: Resampling, Rz: Resize, PA: Patch (2D or 3D), Cr: Crop, No: Normalization, Ro: Rotate, Sc: Scale, Tr: Translate, TH: Threshold (HU), Fl: Flip, Du: Duplicate.
Table A6. Nodule classification—Performance per work.
ReferenceDatasetDeep ArchitectureInput SizeSensitivity/Recall (%)Specificity (%)Precision (%)AUC (%)Accuracy (%)
[125]LIDC-IDRIResNet64 × 6491.0788.64-94.5989.90
[126]LIDC-IDRI3D MV-CNN + SoftMaxN [40, 50, 60] N × N × 6 slices95.6093.94-99-
[127]LIDC-IDRIResNet-50, TL200 × 20091.4394.09-97.7893.40
[128]LIDC-IDRIMV-KBC224 × 22486.5294 97.5091.60
[129]LIDC-IDRICNN + Random ForestN × N × S, N [47, 21, 31], S [5, 3]94.8094.30-98.494.60
[130]LIDC-IDRI + private set3D DenseNet, TLN [50, 10], S [5, 10], N × N × S90.4790.33-95.4890.40
[131]LIDC-IDRICNN + PSO28 × 2892.2098.64-95.597.62
[132]LUNA163D DCNN64 × 64 × 64, 48 × 48 × 4895.4/1 FP--0.910 (FROC)-
[133]ELCAPDAE180 × 180---0.939 (FROC)-
[134]LUNA16Novel 2D CNN64 × 6496.097.3-98.297.2
[135]LIDC-IDRINovel 2D CNN32 × 3292.67--95.1492.57
[136]LIDC-IDRICNN, TL53 × 5391--9488
[138]LIDC-IDRIMAN (modified AlexNet) + SVM, TL32 × 32 × 32- 95.7091.60
[139]LIDC-IDRICNN227 × 22798.0995.63-99.597.27
[58]LIDC-IDRI, private dataset (FAH-GMU)DTCNN52 × 5293.493-93.493.9
[140]LIDC-IDRI, DeepLNDataset3D CNN64 × 6493.69/10095.15/100-94.994.57/100
[141]LIDC-IDRI, LUNGx Challenge databaseCNN, TLN [32, 48, 64] N × N × N85.5895.87-9492.65
[142]LIDC-IDRIMRC-DNN64 × 6497.19--99.196.69
[143]LIDC-IDRICAE, TL32 × 32 × 328195--90
[144]LIDC-IDRI, LUNA16Multi-Task CNN80 × 80 × 8084.8- 93.6-
[145]LUNA16Fractalnet and CNN64 × 6487.74, 84.0088.87, 96.80-0.955 (LIDC), 0.973 (LUNA)-
[146]LIDC-IDRIDCNN50 × 5097.5286.76-9894.06
[147]NLST, DLCST2D CNN 9 views, 3D CNN224 × 22490.6790.80--90.73
[137]LIDC-IDRImultiscale 3D-CNN, CapsNets80 × 80 × 3 (+10 px for the next 2 scales)94.9490-96.493.12
[148]LUNA16/Kaggle DSB 2017 datasetDense Convolutional Network (DenseNet)64 × 64, 64 × 64 × 64---93-
[149]LIDC-IDRI/ELCAP2D MV-CNN 3D MV-CNN80 × 80, 64 × 64, 48 × 48, 32 × 32 and 16 × 1698.299.45- 98.83
[150]LIDCCapsule networks (CapsNets)(20 × 20), (30 × 30) and (40 × 40)9897-9997
[151]LIDC-IDRI3D NAS method, CBAM module, A-Softmax loss, and ensemble strategy80 × 80 × 3 slices89.593.4-95.690.7
[152]LIDC-IDRIDeep Convolutional Generative Adversarial Network (DC-GAN)/FF-VGG1932 × 32 × 3285.3795.04--90.77
[153]LUNA16BCNN [VGG16, VGG19] combination with and without SVM32 × 3289.394.8-92.192.1
[154]LUNA163D CNN50 × 50---95.991.99
[155]PET-CT private, LIDC-IDRI2D CNN64 × 64 × 24094-879797.17
[156]LIDC-IDRI3D DPN + attention mechanism32 × 3292.795.2-9494
[157]LIDC-IDRI3D CNN + biomarkers32 × 32 × 3291.3 (FP rate of 8.0%)---91.9
[158]LIDC-IDRI3D attention32 × 32 × 16 slices---86.74-
[159]LIDC-IDRI and LUNGxProCAN32 × 32 × 3292.36-92.5996.1792.81
[160]LIDC-IDRIDCNN32 × 32 × 32---98.0595.28
[161]LIDC-IDRITransformers52 × 5297.197.2 99.5697.8
[162]LIDC-IDRICNN-based MTL model that incorporates multiple attention-based learning modules32 × 32---96.2892.92
[163]LIDC-IDRITransformers64 × 6496.282.997.895.994.7
[164]LUNA163D ResNet + attention32 × 3294.4 95.998.596.1
[165]LIDC-IDRI/TC-LND Dataset/CQUCH-LNDSTLF-VA32 × 32 × 3289.1093.3991.5991.2591.25
[166]LIDC-IDRITransformer64 × 64 × 3291.6293.0892.9997.1792.36
[167]LIDC-IDRIF-LSTM-CNN80 × 80 × 6087.6995.38-97.4092.82
[168]privateCAE224 × 224 × 310093.7-99.595.5

References

  1. Siegel, R.L.; Miller, K.D.; Jemal, A. Cancer Statistics, 2020. CA Cancer J. Clin. 2020, 70, 7–30. [Google Scholar] [CrossRef] [PubMed]
  2. The American Cancer Society Medical and Editorial Content Team. Key Statistics for Lung Cancer. Available online: https://www.cancer.org/cancer/types/lung-cancer/about/key-statistics.html (accessed on 21 September 2023).
  3. Ferlay, J.; Ervik, M.; Lam, F.; Colombet, M.; Mery, L.; Piñeros, M.; Znaor, A.; Soerjomataram, I.; Bray, F. Global Cancer Observatory: Cancer Today; International Agency for Research on Cancer: Lyon, France, 2020. [Google Scholar]
  4. World Health Organization. Cancer. Available online: https://www.who.int/news-room/fact-sheets/detail/cancer (accessed on 21 September 2023).
  5. The American Cancer Society Medical and Editorial Content Team. Lung Cancer Early Detection, Diagnosis, and Staging. Available online: https://www.cancer.org/content/dam/CRC/PDF/Public/8705.00.pdf (accessed on 21 September 2023).
  6. Ellis, P.M.; Vandermeer, R. Delays in the diagnosis of lung cancer. J. Thorac. Dis. 2011, 3, 183–188. [Google Scholar] [CrossRef] [PubMed]
  7. Johns Hopkins Medicine. Lung Biopsy. Available online: https://www.hopkinsmedicine.org/health/treatment-tests-and-therapies/lung-biopsy (accessed on 21 September 2023).
  8. Mahmoud, N.; Vashisht, R.; Sanghavi, D.K.; Kalanjeri, S. Bronchoscopy. In StatPearls [Internet]; StatPearls Publishing: Treasure Island, FL, USA, 2024. [Google Scholar] [PubMed]
  9. Sigmon, D.F.; Fatima, S. Fine Needle Aspiration. In StatPearls [Internet]; StatPearls Publishing: Treasure Island, FL, USA, 2024. [Google Scholar] [PubMed]
  10. Mehrotra, M.; D’Cruz, J.R.; Bishop, M.A.; Arthur, M.E. Video-Assisted Thoracoscopy. In StatPearls [Internet]; StatPearls Publishing: Treasure Island, FL, USA, 2024. [Google Scholar] [PubMed]
  11. McNally, P.A.; Sharma, S.; Arthur, M.E. Mediastinoscopy. In StatPearls [Internet]; StatPearls Publishing: Treasure Island, FL, USA, 2024. [Google Scholar] [PubMed]
  12. Kim, J.; Kim, K.H. Role of chest radiographs in early lung cancer detection. Transl. Lung Cancer Res. 2020, 9, 522–531. [Google Scholar] [CrossRef] [PubMed]
  13. Winkler, M.H.; Touw, H.R.; van de Ven, P.M.; Twisk, J.; Tuinman, P.R. Diagnostic Accuracy of Chest Radiograph, and When Concomitantly Studied Lung Ultrasound, in Critically Ill Patients with Respiratory Symptoms: A Systematic Review and Meta-Analysis. Crit. Care Med. 2018, 46, e707–e714. [Google Scholar] [CrossRef]
  14. Tylski, E.; Goyal, M. Low Dose CT for Lung Cancer Screening: The Background, the Guidelines, and a Tailored Approach to Patient Care. Mo. Med. 2019, 116, 414–419. [Google Scholar]
  15. Vonder, M.; Dorrius, M.D.; Vliegenthart, R. Latest CT technologies in lung cancer screening: Protocols and radiation dose reduction. Transl. Lung Cancer Res. 2021, 10, 1154–1164. [Google Scholar] [CrossRef]
  16. Rubin, K.H.; Haastrup, P.F.; Nicolaisen, A.; Möller, S.; Wehberg, S.; Rasmussen, S.; Balasubramaniam, K.; Søndergaard, J.; Jarbøl, D.E. Developing and Validating a Lung Cancer Risk Prediction Model: A Nationwide Population-Based Study. Cancers 2023, 15, 487. [Google Scholar] [CrossRef]
  17. Yu, K.-H.; Lee, T.-L.M.; Yen, M.-H.; Kou, S.C.; Rosen, B.; Chiang, J.-H.; Kohane, I.S. Reproducible Machine Learning Methods for Lung Cancer Detection Using Computed Tomography Images: Algorithm Development and Validation. J. Med. Internet Res. 2020, 22, e16709. [Google Scholar] [CrossRef]
  18. Cellina, M.; Cacioppa, L.M.; Cè, M.; Chiarpenello, V.; Costa, M.; Vincenzo, Z.; Pais, D.; Bausano, M.V.; Rossini, N.; Bruno, A.; et al. Artificial Intelligence in Lung Cancer Screening: The Future Is Now. Cancers 2023, 15, 4344. [Google Scholar] [CrossRef]
  19. Zhang, J.; Xia, Y.; Cui, H.; Zhang, Y. Pulmonary nodule detection in medical images: A survey. Biomed. Signal Process. Control 2018, 43, 138–147. [Google Scholar] [CrossRef]
  20. Gu, Y.; Chi, J.; Liu, J.; Yang, L.; Zhang, B.; Yu, D.; Zhao, Y.; Lu, X. A survey of computer-aided diagnosis of lung nodules from CT scans using deep learning. Comput. Biol. Med. 2021, 137, 104806. [Google Scholar] [CrossRef] [PubMed]
  21. Thanoon, M.A.; Zulkifley, M.A.; Zainuri, M.A.A.M.; Abdani, S.R. A Review of Deep Learning Techniques for Lung Cancer Screening and Diagnosis Based on CT Images. Diagnostics 2023, 13, 2617. [Google Scholar] [CrossRef] [PubMed]
  22. Zhang, G.; Jiang, S.; Yang, Z.; Gong, L.; Ma, X.; Zhou, Z.; Bao, C.; Liu, Q. Automatic nodule detection for lung cancer in CT images: A review. Comput. Biol. Med. 2018, 103, 287–300. [Google Scholar] [CrossRef]
  23. Liu, B.; Chi, W.; Li, X.; Li, P.; Liang, W.; Liu, H.; Wang, W.; He, J. Evolving the pulmonary nodules diagnosis from classical approaches to deep learning-aided decision support: Three decades’ development course and future prospect. J. Cancer Res. Clin. Oncol. 2020, 146, 153–185. [Google Scholar] [CrossRef]
  24. Halder, A.; Dey, D.; Sadhu, A.K. Lung Nodule Detection from Feature Engineering to Deep Learning in Thoracic CT Images: A Comprehensive Review. J. Digit. Imaging 2020, 33, 655–677. [Google Scholar] [CrossRef]
  25. Gu, D.; Liu, G.; Xue, Z. On the performance of lung nodule detection, segmentation and classification. Comput. Med. Imaging Graph. 2021, 89, 101886. [Google Scholar] [CrossRef]
  26. Li, R.; Xiao, C.; Huang, Y.; Hassan, H.; Huang, B. Deep Learning Applications in Computed Tomography Images for Pulmonary Nodule Detection and Diagnosis: A Review. Diagnostics 2022, 12, 298. [Google Scholar] [CrossRef] [PubMed]
  27. Silva, F.; Pereira, T.; Neves, I.; Morgado, J.; Freitas, C.; Malafaia, M.; Sousa, J.; Fonseca, J.; Negrão, E.; de Lima, B.F.; et al. Towards Machine Learning-Aided Lung Cancer Clinical Routines: Approaches and Open Challenges. J. Pers. Med. 2022, 12, 480. [Google Scholar] [CrossRef]
  28. Harzing, A.W. Publish or Perish. 2007. Available online: https://harzing.com/resources/publish-or-perish (accessed on 11 October 2023).
  29. Armato, S.G.; McLennan, G.; Bidaut, L.; McNitt-Gray, M.F.; Meyer, C.R.; Reeves, A.P.; Zhao, B.; Aberle, D.R.; Henschke, C.I.; Hoffman, E.A.; et al. The Lung Image Database Consortium (LIDC) and Image Database Resource Initiative (IDRI): A completed reference database of lung nodules on CT scans. Med. Phys. 2011, 38, 915–931. [Google Scholar] [CrossRef]
  30. Setio, A.A.A.; Traverso, A.; de Bel, T.; Berens, M.S.; Bogaard, C.v.D.; Cerello, P.; Chen, H.; Dou, Q.; Fantacci, M.E.; Geurts, B.; et al. Validation, comparison, and combination of algorithms for automatic detection of pulmonary nodules in computed tomography images: The LUNA16 challenge. Med. Image Anal. 2017, 42, 1–13. [Google Scholar] [CrossRef]
  31. ELCAP and VIA Research Groups. ELCAP Public Lung Image Database. Available online: https://www.via.cornell.edu/databases/lungdb.html (accessed on 18 October 2023).
  32. Alibaba Tianchi Competition Organizers. Tianchi Medical AI Competition Dataset. 2017. Available online: https://tianchi.aliyun.com/competition/entrance/231601/information (accessed on 18 October 2023).
  33. Armato, S.G.; Hadjiisk, L.; Tourassi, G.; Drukker, K.; Giger, M.; Li, F. SPIE-AAPM-NCI Lung Nodule Classification Challenge Dataset (SPIE-AAPM Lung CT Challenge). Available online: https://wiki.cancerimagingarchive.net/display/Public/LUNGx+SPIE-AAPM-NCI+Lung+Nodule+Classification+Challenge (accessed on 18 October 2023).
  34. Wikipedia Contributors. Hounsfield Scale. Available online: https://en.wikipedia.org/w/index.php?title=Hounsfield_scale&oldid=1167604704 (accessed on 18 October 2023).
  35. Gruetzemacher, R.; Gupta, A.; Paradice, D. 3D deep learning for detecting pulmonary nodules in CT scans. J. Am. Med. Inform. Assoc. 2018, 25, 1301–1310. [Google Scholar] [CrossRef] [PubMed]
  36. Nam, K.; Lee, D.; Kang, S.; Lee, S. Performance evaluation of mask R-CNN for lung segmentation using computed tomographic images. J. Korean Phys. Soc. 2022, 81, 346–353. [Google Scholar] [CrossRef]
  37. Moragheb, M.A.; Badie, A.; Noshad, A. An Effective Approach for Automated Lung Node Detection using CT Scans. J. Biomed. Phys. Eng. 2022, 12, 377–386. [Google Scholar] [CrossRef] [PubMed]
  38. Guo, F.-M.; Fan, Y. Zero-Shot and Few-Shot Learning for Lung Cancer Multi-Label Classification using Vision Transformer. arXiv 2022, arXiv:2205.15290. [Google Scholar]
  39. Zhang, H.; Gu, X.; Zhang, M.; Yu, W.; Chen, L.; Wang, Z.; Yao, F.; Gu, Y.; Yang, G.Z. Re-thinking and Re-labeling LIDC-IDRI for Robust Pulmonary Cancer Prediction. In Workshop on Medical Image Learning with Limited and Noisy Data; Springer: Berlin/Heidelberg, Germany, 2022. [Google Scholar]
  40. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Networks. In Proceedings of the Advances in Neural Information Processing Systems 27, Montreal, QC, Canada, 8–13 December 2014. [Google Scholar]
  41. O’Shea, K.; Nash, R. An Introduction to Convolutional Neural Networks. arXiv 2015, arXiv:1511.08458. [Google Scholar]
  42. Karampidis, K.; Kavallieratou, E.; Papadourakis, G. A review of image steganalysis techniques for digital forensics. J. Inf. Secur. Appl. 2018, 40, 217–235. [Google Scholar] [CrossRef]
  43. Karampidis, K.; Kavallieratou, E.; Papadourakis, G. A Dilated Convolutional Neural Network as Feature Selector for Spatial Image Steganalysis—A Hybrid Classification Scheme. Pattern Recognit. Image Anal. 2020, 30, 342–358. [Google Scholar] [CrossRef]
  44. Liu, J.; Liu, Y.; Li, D.; Wang, H.; Huang, X.; Song, L. DSDCLA: Driving style detection via hybrid CNN-LSTM with multi-level attention fusion. Appl. Intell. 2023, 53, 19237–19254. [Google Scholar] [CrossRef]
  45. Varshitha, K.S.; Kumari, C.G.; Hasvitha, M.; Fiza, S.; Amarendra, K.; Rachapudi, V. Natural Language Processing using Convolutional Neural Network. In Proceedings of the 2023 7th International Conference on Computing Methodologies and Communication (ICCMC), Erode, India, 23–25 February 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 362–367. [Google Scholar] [CrossRef]
  46. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the 18th International Conference, Munich, Germany, 5–9 October 2015. [Google Scholar]
  47. Chen, S.; Guo, W. Auto-Encoders in Deep Learning—A Review with New Perspectives. Mathematics 2023, 11, 1777. [Google Scholar] [CrossRef]
  48. Girin, L.; Leglaive, S.; Bie, X.; Diard, J.; Hueber, T.; Alameda-Pineda, X. Dynamical Variational Autoencoders: A Comprehensive Review. Found. Trends® Mach. Learn. 2021, 15, 1–175. [Google Scholar] [CrossRef]
  49. Pawan, S.J.; Rajan, J. Capsule networks for image classification: A review. Neurocomputing 2022, 509, 102–120. [Google Scholar] [CrossRef]
  50. Islam, S.; Elmekki, H.; Elsebai, A.; Bentahar, J.; Drawel, N.; Rjoub, G.; Pedrycz, W. A comprehensive survey on applications of transformers for deep learning tasks. Expert. Syst. Appl. 2024, 241, 122666. [Google Scholar] [CrossRef]
  51. Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv 2018, arXiv:1810.04805. [Google Scholar]
  52. Yenduri, G.; Ramalingam, M.; Selvi, G.C.; Supriya, Y.; Srivastava, G.; Maddikunta, P.K.R.; Raj, G.D.; Jhaveri, R.H.; Prabadevi, B.; Wang, W.; et al. GPT (Generative Pre-Trained Transformer)—A Comprehensive Review on Enabling Technologies, Potential Applications, Emerging Challenges, and Future Directions. IEEE Access 2024, 12, 54608–54649. [Google Scholar] [CrossRef]
  53. Park, N.; Kim, S. How Do Vision Transformers Work? arXiv 2022, arXiv:2202.06709. [Google Scholar]
  54. Ding, J.; Li, A.; Hu, Z.; Wang, L. Accurate Pulmonary Nodule Detection in Computed Tomography Images Using Deep Convolutional Neural Networks. In Proceedings of the Medical Image Computing and Computer Assisted Intervention—MICCAI 2017, Quebec City, QC, Canada, 11–13 September 2017; pp. 559–567. [Google Scholar] [CrossRef]
  55. Eun, H.; Kim, D.; Jung, C.; Kim, C. Single-view 2D CNNs with fully automatic non-nodule categorization for false positive reduction in pulmonary nodule detection. Comput. Methods Programs Biomed. 2018, 165, 215–224. [Google Scholar] [CrossRef]
  56. Gu, Y.; Lu, X.; Yang, L.; Zhang, B.; Yu, D.; Zhao, Y.; Gao, L.; Wu, L.; Zhou, T. Automatic lung nodule detection using a 3D deep convolutional neural network combined with a multi-scale prediction strategy in chest CTs. Comput. Biol. Med. 2018, 103, 220–231. [Google Scholar] [CrossRef]
  57. Zhang, J.; Xia, Y.; Zeng, H.; Zhang, Y. NODULe: Combining constrained multi-scale LoG filters with densely dilated 3D deep convolutional neural network for pulmonary nodule detection. Neurocomputing 2018, 317, 159–167. [Google Scholar] [CrossRef]
  58. Tan, J.; Huo, Y.; Liang, Z.; Li, L. Expert knowledge-infused deep learning for automatic lung nodule detection. J. Xray Sci. Technol. 2019, 27, 17–35. [Google Scholar] [CrossRef]
  59. Zheng, S.; Guo, J.; Cui, X.; Veldhuis, R.N.J.; Oudkerk, M.; van Ooijen, P.M.A. Automatic Pulmonary Nodule Detection in CT Scans Using Convolutional Neural Networks Based on Maximum Intensity Projection. IEEE Trans Med. Imaging 2020, 39, 797–805. [Google Scholar] [CrossRef]
  60. Wang, Q.; Shen, F.; Shen, L.; Huang, J.; Sheng, W. Lung Nodule Detection in CT Images Using a Raw Patch-Based Convolutional Neural Network. J. Digit. Imaging 2019, 32, 971–979. [Google Scholar] [CrossRef] [PubMed]
  61. Nasrullah, N.; Sang, J.; Alam, M.S.; Mateen, M.; Cai, B.; Hu, H. Automated Lung Nodule Detection and Classification Using Deep Learning Combined with Multiple Strategies. Sensors 2019, 19, 3722. [Google Scholar] [CrossRef]
  62. Tang, H.; Zhang, C.; Xie, X. NoduleNet: Decoupled False Positive Reduction for Pulmonary Nodule Detection and Segmentation. In Proceedings of the Medical Image Computing and Computer Assisted Intervention—MICCAI 2019, Shenzhen, China, 13–17 October 2019; pp. 266–274. [Google Scholar] [CrossRef]
  63. Tang, S.; Yang, M.; Bai, J. Detection of pulmonary nodules based on a multiscale feature 3D U-Net convolutional neural network of transfer learning. PLoS ONE 2020, 15, e0235672. [Google Scholar] [CrossRef]
  64. Zuo, W.; Zhou, F.; Li, Z.; Wang, L. Multi-Resolution CNN and Knowledge Transfer for Candidate Classification in Lung Nodule Detection. IEEE Access 2019, 7, 32510–32521. [Google Scholar] [CrossRef]
  65. Zheng, S.; Cornelissen, L.J.; Cui, X.; Jing, X.; Veldhuis, R.N.J.; Oudkerk, M.; van Ooijen, P.M.A. Deep convolutional neural networks for multiplanar lung nodule detection: Improvement in small nodule identification. Med. Phys. 2021, 48, 733–744. [Google Scholar] [CrossRef] [PubMed]
  66. Shrey, S.B.; Hakim, L.; Kavitha, M.; Kim, H.W.; Kurita, T. Transfer Learning by Cascaded Network to Identify and Classify Lung Nodules for Cancer Detection. In Proceedings of the Frontiers of Computer Vision—IW-FCV 2020, Ibusuki, Japan, 20–22 February 2020; pp. 262–273. [Google Scholar] [CrossRef]
  67. Tong, C.; Liang, B.; Su, Q.; Yu, M.; Hu, J.; Bashir, A.K.; Zheng, Z. Pulmonary Nodule Classification Based on Heterogeneous Features Learning. IEEE J. Sel. Areas Commun. 2021, 39, 574–581. [Google Scholar] [CrossRef]
  68. Peng, H.; Sun, H.; Guo, Y. 3D multi-scale deep convolutional neural networks for pulmonary nodule detection. PLoS ONE 2021, 16, e0244406. [Google Scholar] [CrossRef]
  69. Sori, W.J.; Feng, J.; Godana, A.W.; Liu, S.; Gelmecha, D.J. DFD-Net: Lung cancer detection from denoised CT scan image using deep learning. Front. Comput. Sci. 2021, 15, 152701. [Google Scholar] [CrossRef]
  70. Mittapalli, P.S.; Thanikaiselvan, V. Multiscale CNN with compound fusions for false positive reduction in lung nodule detection. Artif. Intell. Med. 2021, 113, 102017. [Google Scholar] [CrossRef]
  71. Yuan, H.; Fan, Z.; Wu, Y.; Cheng, J. An efficient multi-path 3D convolutional neural network for false-positive reduction of pulmonary nodule detection. Int. J. Comput. Assist. Radiol. Surg. 2021, 16, 2269–2277. [Google Scholar] [CrossRef]
  72. Nguyen, C.C.; Tran, G.S.; Nguyen, V.T.; Burie, J.-C.; Nghiem, T.P. Pulmonary Nodule Detection Based on Faster R-CNN With Adaptive Anchor Box. IEEE Access 2021, 9, 154740–154751. [Google Scholar] [CrossRef]
  73. Trajanovski, S.; Mavroeidis, D.; Swisher, C.L.; Gebre, B.G.; Veeling, B.S.; Wiemker, R.; Klinder, T.; Tahmasebi, A.; Regis, S.M.; Wald, C.; et al. Towards radiologist-level cancer risk assessment in CT lung screening using deep learning. Comput. Med. Imaging Graph. 2021, 90, 101883. [Google Scholar] [CrossRef] [PubMed]
  74. Suzuki, K.; Otsuka, Y.; Nomura, Y.; Kumamaru, K.K.; Kuwatsuru, R.; Aoki, S. Development and Validation of a Modified Three-Dimensional U-Net Deep-Learning Model for Automated Detection of Lung Nodules on Chest CT Images From the Lung Image Database Consortium and Japanese Datasets. Acad. Radiol. 2022, 29, S11–S17. [Google Scholar] [CrossRef]
  75. Luo, X.; Song, T.; Wang, G.; Chen, J.; Chen, Y.; Li, K.; Metaxas, D.N.; Zhang, S. SCPM-Net: An anchor-free 3D lung nodule detection network using sphere representation and center points matching. Med. Image Anal. 2022, 75, 102287. [Google Scholar] [CrossRef]
  76. Agnes, S.A.; Anitha, J.; Solomon, A.A. Two-stage lung nodule detection framework using enhanced UNet and convolutional LSTM networks in CT images. Comput. Biol. Med. 2022, 149, 106059. [Google Scholar] [CrossRef]
  77. Zhu, X.; Wang, X.; Shi, Y.; Ren, S.; Wang, W. Channel-Wise Attention Mechanism in the 3D Convolutional Network for Lung Nodule Detection. Electronics 2022, 11, 1600. [Google Scholar] [CrossRef]
  78. Jian, M.; Zhang, L.; Jin, H.; Li, X. 3DAGNet: 3D Deep Attention and Global Search Network for Pulmonary Nodule Detection. Electronics 2023, 12, 2333. [Google Scholar] [CrossRef]
  79. Liu, L.; Fan, K.; Yang, M. Federated learning: A deep learning model based on resnet18 dual path for lung nodule detection. Multimed. Tools Appl. 2023, 82, 17437–17450. [Google Scholar] [CrossRef]
  80. Mkindu, H.; Wu, L.; Zhao, Y. Lung nodule detection in chest CT images based on vision transformer network with Bayesian optimization. Biomed. Signal Process. Control 2023, 85, 104866. [Google Scholar] [CrossRef]
  81. Mkindu, H.; Wu, L.; Zhao, Y. 3D multi-scale vision transformer for lung nodule detection in chest CT images. Signal Image Video Process 2023, 17, 2473–2480. [Google Scholar] [CrossRef]
  82. Zhang, J.; Xia, K.; Huang, Z.; Wang, S.; Akindele, R.G. ETAM: Ensemble transformer with attention modules for detection of small objects. Expert. Syst. Appl. 2023, 224, 119997. [Google Scholar] [CrossRef]
  83. Cao, K.; Tao, H.; Wang, Z. Three-Dimensional Multifaceted Attention Encoder–Decoder Networks for Pulmonary Nodule Detection. Appl. Sci. 2023, 13, 10822. [Google Scholar] [CrossRef]
  84. Song, L.; Zhang, M.; Wu, L. Detection of low-dose computed tomography pulmonary nodules based on 3D CNN-CapsNet. Electron. Lett. 2023, 59, e12952. [Google Scholar] [CrossRef]
  85. Zhu, Y.; Xu, L.; Liu, Y.; Guo, P.; Zhang, J. Multiscale self-calibrated pulmonary nodule detection network fusing dual attention mechanism. Phys. Med. Biol. 2023, 68, 165007. [Google Scholar] [CrossRef]
  86. Tajbakhsh, N.; Shin, J.Y.; Gurudu, S.R.; Hurst, R.T.; Kendall, C.B.; Gotway, M.B.; Liang, J. Convolutional Neural Networks for Medical Image Analysis: Full Training or Fine Tuning? IEEE Trans. Med. Imaging 2017, 35, 1299–1312. [Google Scholar] [CrossRef]
  87. Wu, H.; Flierl, M. Vector Quantization-Based Regularization for Autoencoders. Proc. AAAI Conf. Artif. Intell. 2020, 34, 6380–6387. [Google Scholar] [CrossRef]
  88. Bechar, A.; Elmir, Y.; Medjoudj, R.; Himeur, Y.; Amira, A. Harnessing Transformers: A Leap Forward in Lung Cancer Image Detection. In Proceedings of the 2023 6th International Conference on Signal Processing and Information Security (ICSPIS), Dubai, United Arab Emirates, 8–9 November 2023. [Google Scholar]
  89. Shao, H.; Lu, J.; Wang, M.; Wang, Z. An Efficient Training Accelerator for Transformers With Hardware-Algorithm Co-Optimization. IEEE Trans. Very Large Scale Integr. VLSI Syst. 2023, 31, 1788–1801. [Google Scholar] [CrossRef]
  90. Choudhary, S.; Saurav, S.; Saini, R.; Singh, S. Capsule networks for computer vision applications: A comprehensive review. Appl. Intell. 2023, 53, 21799–21826. [Google Scholar] [CrossRef]
  91. Marchisio, A.; De Marco, A.; Colucci, A.; Martina, M.; Shafique, M. RobCaps: Evaluating the Robustness of Capsule Networks against Affine Transformations and Adversarial Attacks. In Proceedings of the 2023 International Joint Conference on Neural Networks (IJCNN), Gold Coast, Australia, 18–23 June 2023. [Google Scholar]
  92. Renzulli, R.; Grangetto, M. Towards Efficient Capsule Networks. In Proceedings of the 2022 IEEE International Conference on Image Processing (ICIP), Bordeaux, France, 16–19 October 2022. [Google Scholar]
  93. Wang, S.; Zhou, M.; Liu, Z.; Liu, Z.; Gu, D.; Zang, Y.; Dong, D.; Gevaert, O.; Tian, J. Central focused convolutional neural networks: Developing a data-driven model for lung nodule segmentation. Med. Image Anal. 2017, 40, 172–183. [Google Scholar] [CrossRef]
  94. Liu, H.; Cao, H.; Song, E.; Ma, G.; Xu, X.; Jin, R.; Jin, Y.; Hung, C.-C. A cascaded dual-pathway residual network for lung nodule segmentation in CT images. Phys. Med. 2019, 63, 112–121. [Google Scholar] [CrossRef]
  95. Roy, R.; Chakraborti, T.; Chowdhury, A.S. A deep learning-shape driven level set synergism for pulmonary nodule segmentation. Pattern Recognit. Lett. 2019, 123, 31–38. [Google Scholar] [CrossRef]
  96. Singadkar, G.; Mahajan, A.; Thakur, M.; Talbar, S. Deep Deconvolutional Residual Network Based Automatic Lung Nodule Segmentation. J. Digit. Imaging 2020, 33, 678–684. [Google Scholar] [CrossRef] [PubMed]
  97. Usman, M.; Lee, B.-D.; Byon, S.-S.; Kim, S.-H.; Lee, B.; Shin, Y.-G. Volumetric lung nodule segmentation using adaptive ROI with multi-view residual learning. Sci. Rep. 2020, 10, 12839. [Google Scholar] [CrossRef]
  98. Cao, H.; Liu, H.; Song, E.; Hung, C.-C.; Ma, G.; Xu, X.; Jin, R.; Lu, J. Dual-branch residual network for lung nodule segmentation. Appl. Soft Comput. 2020, 86, 105934. [Google Scholar] [CrossRef]
  99. Pezzano, G.; Ripoll, V.R.; Radeva, P. CoLe-CNN: Context-learning convolutional neural network with adaptive loss function for lung nodule segmentation. Comput. Methods Programs Biomed. 2021, 198, 105792. [Google Scholar] [CrossRef]
  100. Dutande, P.; Baid, U.; Talbar, S. LNCDS: A 2D-3D cascaded CNN approach for lung nodule classification, detection and segmentation. Biomed. Signal Process. Control 2021, 67, 102527. [Google Scholar] [CrossRef]
  101. Hesamian, M.H.; Jia, W.; He, X.; Wang, Q.; Kennedy, P.J. Synthetic CT images for semi-sequential detection and segmentation of lung nodules. Appl. Intell. 2021, 51, 1616–1628. [Google Scholar] [CrossRef]
  102. Dodia, S.; Basava, A.; Anand, M.P. A novel receptive field-regularized V-net and nodule classification network for lung nodule detection. Int. J. Imaging Syst. Technol. 2022, 32, 88–101. [Google Scholar] [CrossRef]
  103. Wu, Z.; Zhou, Q.; Wang, F. Coarse-to-Fine Lung Nodule Segmentation in CT Images with Image Enhancement and Dual-Branch Network. IEEE Access 2021, 9, 7255–7262. [Google Scholar] [CrossRef]
  104. Banu, S.F.; Sarker, M.M.K.; Abdel-Nasser, M.; Puig, D.; Raswan, H.A. AWEU-Net: An Attention-Aware Weight Excitation U-Net for Lung Nodule Segmentation. Appl. Sci. 2021, 11, 10132. [Google Scholar] [CrossRef]
  105. Zhang, X.; Liu, X.; Zhang, B.; Dong, J.; Zhao, S.; Li, S. Accurate segmentation for different types of lung nodules on CT images using improved U-Net convolutional network. Medicine 2021, 100, e27491. [Google Scholar] [CrossRef] [PubMed]
  106. Yu, H.; Li, J.; Zhang, L.; Cao, Y.; Yu, X.; Sun, J. Design of lung nodules segmentation and recognition algorithm based on deep learning. BMC Bioinform. 2021, 22, 314. [Google Scholar] [CrossRef]
  107. Khan, M.A.; Rajinikanth, V.; Satapathy, S.C.; Taniar, D.; Mohanty, J.R.; Tariq, U.; Damaševičius, R. VGG19 Network Assisted Joint Segmentation and Classification of Lung Nodules in CT Images. Diagnostics 2021, 11, 2208. [Google Scholar] [CrossRef]
  108. Kido, S.; Kidera, S.; Hirano, Y.; Mabu, S.; Kamiya, T.; Tanaka, N.; Suzuki, Y.; Yanagawa, M.; Tomiyama, N. Segmentation of Lung Nodules on CT Images Using a Nested Three-Dimensional Fully Connected Convolutional Network. Front. Artif. Intell. 2022, 5, 782225. [Google Scholar] [CrossRef]
  109. Tyagi, S.; Talbar, S.N. CSE-GAN: A 3D conditional generative adversarial network with concurrent squeeze-and-excitation blocks for lung nodule segmentation. Comput. Biol. Med. 2022, 147, 105781. [Google Scholar] [CrossRef] [PubMed]
  110. Luo, S.; Zhang, J.; Xiao, N.; Qiang, Y.; Li, K.; Zhao, J.; Meng, L.; Song, P. DAS-Net: A lung nodule segmentation method based on adaptive dual-branch attention and shadow mapping. Appl. Intell. 2022, 52, 15617–15631. [Google Scholar] [CrossRef]
  111. Wang, S.; Jiang, A.; Li, X.; Qiu, Y.; Li, M.; Li, F. DPBET: A dual-path lung nodules segmentation model based on boundary enhancement and hybrid transformer. Comput. Biol. Med. 2022, 151, 106330. [Google Scholar] [CrossRef] [PubMed]
  112. Usman, M.; Shin, Y.-G. DEHA-Net: A Dual-Encoder-Based Hard Attention Network with an Adaptive ROI Mechanism for Lung Nodule Segmentation. Sensors 2023, 23, 1989. [Google Scholar] [CrossRef]
  113. Wu, Z.; Li, X.; Zuo, J. RAD-UNet: Research on an improved lung nodule semantic segmentation algorithm based on deep learning. Front. Oncol. 2023, 13, 1084096. [Google Scholar] [CrossRef]
  114. Hou, J.; Yan, C.; Li, R.; Huang, Q.; Fan, X.; Lin, F. Lung Nodule Segmentation Algorithm With SMR-UNet. IEEE Access 2023, 11, 34319–34331. [Google Scholar] [CrossRef]
  115. Li, X.; Jiang, A.; Qiu, Y.; Li, M.; Zhang, X.; Yan, S. TPFR-Net: U-shaped model for lung nodule segmentation based on transformer pooling and dual-attention feature reorganization. Med. Biol. Eng. Comput. 2023, 61, 1929–1946. [Google Scholar] [CrossRef] [PubMed]
  116. Qiu, J.; Li, B.; Liao, R.; Mo, H.; Tian, L. A dual-task region-boundary aware neural network for accurate pulmonary nodule segmentation. J. Vis. Commun. Image Represent. 2023, 96, 103909. [Google Scholar] [CrossRef]
  117. Ardimento, P.; Aversano, L.; Bernardi, M.L.; Cimitile, M.; Iammarino, M.; Verdone, C. Evo-GUNet3++: Using evolutionary algorithms to train UNet-based architectures for efficient 3D lung cancer detection. Appl. Soft Comput. 2023, 144, 110465. [Google Scholar] [CrossRef]
  118. Crespi, L.; Loiacono, D.; Sartori, P. Are 3D better than 2D Convolutional Neural Networks for Medical Imaging Semantic Segmentation? In Proceedings of the 2022 International Joint Conference on Neural Networks (IJCNN), Padua, Italy, 18–23 July 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1–8. [Google Scholar] [CrossRef]
  119. Kolmogorov–Smirnov Test. In Encyclopedia of Research Design; SAGE Publications, Inc.: Thousand Oaks, CA, USA, 2010. [CrossRef]
  120. Gao, W.; McDonnell, M.D. Analysis of Gradient Degradation and Feature Map Quality in Deep All-Convolutional Neural Networks Compared to Deep Residual Networks. In Proceedings of the Neural Information Processing: 24th International Conference, ICONIP 2017, Guangzhou, China, 14–18 November 2017; pp. 612–621. [Google Scholar] [CrossRef]
  121. AbdulRazek, M.; Khoriba, G.; Belal, M. GAN-GA: A Generative Model based on Genetic Algorithm for Medical Image Generation. In Proceedings of the 27th Conference on Medical Image Understanding and Analysis, Aberdeen, UK, 19–21 July 2023. [Google Scholar] [CrossRef]
  122. Karampidis, K.; Linardos, E.; Kavallieratou, E. StegoPass—Utilization of steganography to produce a novel unbreakable biometric based password authentication scheme. In Proceedings of the 14th International Conference on Computational Intelligence in Security for Information Systems and 12th International Conference on European Transnational Educational (CISIS 2021 and ICEUTE 2021), Bilbao, Spain, 21 September 2021; Springer International Publishing: Berlin/Heidelberg, Germany, 2021; pp. 146–155. [Google Scholar] [CrossRef]
  123. Crespi, L.; Camnasio, S.; Dei, D.; Lambri, N.; Mancosu, P.; Scorsetti, M.; Loiacono, D. Leveraging Multimodal CycleGAN for the Generation of Anatomically Accurate Synthetic CT Scans from MRIs. arXiv 2024, arXiv:2407.10888. [Google Scholar]
  124. Saad, M.M.; O’Reilly, R.; Rehmani, M.H. A survey on training challenges in generative adversarial networks for biomedical image analysis. Artif. Intell. Rev. 2024, 57, 19. [Google Scholar] [CrossRef]
  125. Nibali, A.; He, Z.; Wollersheim, D. Pulmonary nodule classification with deep residual networks. Int. J. Comput. Assist. Radiol. Surg. 2017, 12, 1799–1808. [Google Scholar] [CrossRef] [PubMed]
  126. Kang, G.; Liu, K.; Hou, B.; Zhang, N. 3D multi-view convolutional neural networks for lung nodule classification. PLoS ONE 2017, 12, e0188290. [Google Scholar] [CrossRef] [PubMed]
  127. Xie, Y.; Xia, Y.; Zhang, J.; Feng, D.D.; Fulham, M.; Cai, W. Transferable Multi-model Ensemble for Benign-Malignant Lung Nodule Classification on Chest CT. In Proceedings of the Medical Image Computing and Computer Assisted Intervention—MICCAI 2017, Quebec City, QC, Canada, 11–13 September 2017; pp. 656–664. [Google Scholar] [CrossRef]
  128. Xie, Y.; Xia, Y.; Zhang, J.; Song, Y.; Feng, D.; Fulham, M.; Cai, W. Knowledge-based Collaborative Deep Learning for Benign-Malignant Lung Nodule Classification on Chest CT. IEEE Trans. Med. Imaging 2019, 38, 991–1004. [Google Scholar] [CrossRef]
  129. Causey, J.L.; Zhang, J.; Ma, S.; Jiang, B.; Qualls, J.A.; Politte, D.G.; Prior, F.; Zhang, S.; Huang, X. Highly accurate model for prediction of lung nodule malignancy with CT scans. Sci. Rep. 2018, 8, 9286. [Google Scholar] [CrossRef]
  130. Dey, R.; Lu, Z.; Hong, Y. Diagnostic classification of lung nodules using 3D neural networks. In Proceedings of the 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), Washington, DC, USA, 4–7 April 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 774–778. [Google Scholar] [CrossRef]
  131. da Silva, G.L.F.; Valente, T.L.A.; Silva, A.C.; de Paiva, A.C.; Gattass, M. Convolutional neural network-based PSO for lung nodule false positive reduction on CT images. Comput. Methods Programs Biomed. 2018, 162, 109–118. [Google Scholar] [CrossRef]
  132. Jung, H.; Kim, B.; Lee, I.; Lee, J.; Kang, J. Classification of lung nodules in CT scans using three-dimensional deep convolutional neural networks with a checkpoint ensemble method. BMC Med. Imaging 2018, 18, 48. [Google Scholar] [CrossRef] [PubMed]
  133. Mao, K.; Tang, R.; Wang, X.; Zhang, W.; Wu, H. Feature Representation Using Deep Autoencoder for Lung Nodule Image Classification. Complexity 2018, 2018, 3078374. [Google Scholar] [CrossRef]
  134. Tran, G.S.; Nghiem, T.P.; Nguyen, V.T.; Luong, C.M.; Burie, J.-C. Improving Accuracy of Lung Nodule Classification Using Deep Learning with Focal Loss. J. Healthc. Eng. 2019, 2019, 5156416. [Google Scholar] [CrossRef] [PubMed]
  135. Al-Shabi, M.; Lee, H.K.; Tan, M. Gated-Dilated Networks for Lung Nodule Classification in CT Scans. IEEE Access 2019, 7, 178827–178838. [Google Scholar] [CrossRef]
  136. Zhao, X.; Qi, S.; Zhang, B.; Ma, H.; Qian, W.; Yao, Y.; Sun, J. Deep CNN models for pulmonary nodule classification: Model modification, model integration, and transfer learning. J. Xray Sci. Technol. 2019, 27, 615–629. [Google Scholar] [CrossRef]
  137. Afshar, P.; Oikonomou, A.; Naderkhani, F.; Tyrrell, P.N.; Plataniotis, K.N.; Farahani, K.; Mohammadi, A. 3D-MCN: A 3D Multi-scale Capsule Network for Lung Nodule Malignancy Prediction. Sci. Rep. 2020, 10, 7948. [Google Scholar] [CrossRef]
  138. Bhandary, A.; Prabhu, G.A.; Rajinikanth, V.; Thanaraj, K.P.; Satapathy, S.C.; Robbins, D.E.; Shasky, C.; Zhang, Y.-D.; Tavares, J.M.R.; Raja, N.S.M. Deep-learning framework to detect lung abnormality—A study with chest X-Ray and lung CT scan images. Pattern Recognit. Lett. 2020, 129, 271–278. [Google Scholar] [CrossRef]
  139. Suresh, S.; Mohan, S. ROI-based feature learning for efficient true positive prediction using convolutional neural network for lung cancer diagnosis. Neural Comput. Appl. 2020, 32, 15989–16009. [Google Scholar] [CrossRef]
  140. Xu, X.; Wang, C.; Guo, J.; Gan, Y.; Wang, J.; Bai, H.; Zhang, L.; Li, W.; Yi, Z. MSCS-DeepLN: Evaluating lung nodule malignancy using multi-scale cost-sensitive neural networks. Med. Image Anal. 2020, 65, 101772. [Google Scholar] [CrossRef]
  141. Ali, I.; Muzammil, M.; Haq, I.U.; Khaliq, A.A.; Abdullah, S. Efficient Lung Nodule Classification Using Transferable Texture Convolutional Neural Network. IEEE Access 2020, 8, 175859–175870. [Google Scholar] [CrossRef]
  142. Ren, Y.; Tsai, M.-Y.; Chen, L.; Wang, J.; Li, S.; Liu, Y.; Jia, X.; Shen, C. A manifold learning regularization approach to enhance 3D CT image-based lung nodule classification. Int. J. Comput. Assist. Radiol. Surg. 2020, 15, 287–295. [Google Scholar] [CrossRef] [PubMed]
  143. Silva, F.; Pereira, T.; Frade, J.; Mendes, J.; Freitas, C.; Hespanhol, V.; Costa, J.L.; Cunha, A.; Oliveira, H.P. Pre-Training Autoencoder for Lung Nodule Malignancy Assessment Using CT Images. Appl. Sci. 2020, 10, 7837. [Google Scholar] [CrossRef]
  144. Zhai, P.; Tao, Y.; Chen, H.; Cai, T.; Li, J. Multi-Task Learning for Lung Nodule Classification on Chest CT. IEEE Access 2020, 8, 180317–180327. [Google Scholar] [CrossRef]
  145. Naik, A.; Edla, D.R.; Kuppili, V. A combination of FractalNet and CNN for Lung Nodule Classification. In Proceedings of the 2020 11th International Conference on Computing, Communication and Networking Technologies (ICCCNT), Kharagpur, India, 1–3 July 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–7. [Google Scholar] [CrossRef]
  146. Zia, M.B.; Juan, Z.J.; Xiao, N.; Wang, J.; Khan, A.; Zhou, X. Classification of malignant and benign lung nodule and prediction of image label class using multi-deep model. Int. J. Adv. Comput. Sci. Appl. 2020, 11, 35–41. [Google Scholar] [CrossRef]
  147. Venkadesh, K.V.; Setio, A.A.A.; Schreuder, A.; Scholten, E.T.; Chung, K.; Wille, M.M.W.; Saghir, Z.; van Ginneken, B.; Prokop, M.; Jacobs, C. Deep Learning for Malignancy Risk Estimation of Pulmonary Nodules Detected at Low-Dose Screening CT. Radiology 2021, 300, 438–447. [Google Scholar] [CrossRef]
  148. Chen, Y.; Wang, Y.; Hu, F.; Feng, L.; Zhou, T.; Zheng, C. LDNNET: Towards Robust Classification of Lung Nodule and Cancer Using Lung Dense Neural Network. IEEE Access 2021, 9, 50301–50320. [Google Scholar] [CrossRef]
  149. Abid, M.M.N.; Zia, T.; Ghafoor, M.; Windridge, D. Multi-view Convolutional Recurrent Neural Networks for Lung Cancer Nodule Identification. Neurocomputing 2021, 453, 299–311. [Google Scholar] [CrossRef]
  150. Afshar, P.; Naderkhani, F.; Oikonomou, A.; Rafiee, M.J.; Mohammadi, A.; Plataniotis, K.N. MIXCAPS: A capsule network-based mixture of experts for lung nodule malignancy prediction. Pattern Recognit. 2021, 116, 107942. [Google Scholar] [CrossRef]
  151. Jiang, H.; Shen, F.; Gao, F.; Han, W. Learning efficient, explainable and discriminative representations for pulmonary nodules classification. Pattern Recognit. 2021, 113, 107825. [Google Scholar] [CrossRef]
  152. Apostolopoulos, I.D.; Papathanasiou, N.D.; Panayiotakis, G.S. Classification of lung nodule malignancy in computed tomography imaging utilising generative adversarial networks and semi-supervised transfer learning. Biocybern. Biomed. Eng. 2021, 41, 1243–1257. [Google Scholar] [CrossRef]
  153. Mastouri, R.; Khlifa, N.; Neji, H.; Hantous-Zannad, S. A bilinear convolutional neural network for lung nodules classification on CT images. Int. J. Comput. Assist. Radiol. Surg. 2021, 16, 91–101. [Google Scholar] [CrossRef] [PubMed]
  154. Joshua, E.S.N.; Bhattacharyya, D.; Chakkravarthy, M.; Byun, Y.-C. 3D CNN with Visual Insights for Early Detection of Lung Cancer Using Gradient-Weighted Class Activation. J. Healthc. Eng. 2021, 2021, 6695518. [Google Scholar] [CrossRef]
  155. Apostolopoulos, I.D.; Pintelas, E.G.; Livieris, I.E.; Apostolopoulos, D.J.; Papathanasiou, N.D.; Pintelas, P.E.; Panayiotakis, G.S. Automatic classification of solitary pulmonary nodules in PET/CT imaging employing transfer learning techniques. Med. Biol. Eng. Comput. 2021, 59, 1299–1310. [Google Scholar] [CrossRef]
  156. Xia, K.; Chi, J.; Gao, Y.; Jiang, Y.; Wu, C. Adaptive Aggregated Attention Network for Pulmonary Nodule Classification. Applied Sciences 2021, 11, 610. [Google Scholar] [CrossRef]
  157. Mehta, K.; Jain, A.; Mangalagiri, J.; Menon, S.; Nguyen, P.; Chapman, D.R. Lung Nodule Classification Using Biomarkers, Volumetric Radiomics, and 3D CNNs. J. Digit. Imaging 2021, 34, 647–666. [Google Scholar] [CrossRef] [PubMed]
  158. Al-Shabi, M.; Shak, K.; Tan, M. 3D axial-attention for lung nodule classification. Int. J. Comput. Assist. Radiol. Surg. 2021, 16, 1319–1324. [Google Scholar] [CrossRef]
  159. Al-Shabi, M.; Shak, K.; Tan, M. ProCAN: Progressive growing channel attentive non-local network for lung nodule classification. Pattern Recognit. 2022, 122, 108309. [Google Scholar] [CrossRef]
  160. Suresh, S.; Mohan, S. NROI based feature learning for automated tumor stage classification of pulmonary lung nodules using deep convolutional neural networks. J. King Saud. Univ. Comput. Inf. Sci. 2022, 34, 1706–1717. [Google Scholar] [CrossRef]
  161. Liu, D.; Liu, F.; Tie, Y.; Qi, L.; Wang, F. Res-trans networks for lung nodule classification. Int. J. Comput. Assist. Radiol. Surg. 2022, 17, 1059–1068. [Google Scholar] [CrossRef]
  162. Fu, X.; Bi, L.; Kumar, A.; Fulham, M.; Kim, J. An attention-enhanced cross-task network to analyse lung nodule attributes in CT images. Pattern Recognit. 2022, 126, 108576. [Google Scholar] [CrossRef]
  163. Wu, K.; Peng, B.; Zhai, D. Multi-Granularity Dilated Transformer for Lung Nodule Classification via Local Focus Scheme. Appl. Sci. 2022, 13, 377. [Google Scholar] [CrossRef]
  164. Zhu, Q.; Wang, Y.; Chu, X.; Yang, X.; Zhong, W. Multi-View Coupled Self-Attention Network for Pulmonary Nodules Classification. 2022. Available online: https://github.com/ahukui/MVCS (accessed on 21 September 2023).
  165. Wu, R.; Liang, C.; Li, Y.; Shi, X.; Zhang, J.; Huang, H. Self-supervised transfer learning framework driven by visual attention for benign–malignant lung nodule classification on chest CT. Expert. Syst. Appl. 2023, 215, 119339. [Google Scholar] [CrossRef]
  166. Dai, D.; Sun, Y.; Dong, C.; Yan, Q.; Li, Z.; Xu, S. Effectively fusing clinical knowledge and AI knowledge for reliable lung nodule diagnosis. Expert. Syst. Appl. 2023, 230, 120634. [Google Scholar] [CrossRef]
  167. Qiao, J.; Fan, Y.; Zhang, M.; Fang, K.; Li, D.; Wang, Z. Ensemble framework based on attributes and deep features for benign-malignant classification of lung nodule. Biomed. Signal Process. Control 2023, 79, 104217. [Google Scholar] [CrossRef]
  168. Nemoto, M.; Ushifusa, K.; Kimura, Y.; Nagaoka, T.; Yamada, T.; Yoshikawa, T. Unsupervised Feature Extraction for Various Computer-Aided Diagnosis Using Multiple Convolutional Autoencoders and 2.5-Dimensional Local Image Analysis. Appl. Sci. 2023, 13, 8330. [Google Scholar] [CrossRef]
  169. Zhang, S.; Sun, F.; Wang, N.; Zhang, C.; Yu, Q.; Zhang, M.; Babyn, P.; Zhong, H. Computer-Aided Diagnosis (CAD) of Pulmonary Nodule of Thoracic CT Image Using Transfer Learning. J. Digit. Imaging 2019, 32, 995–1007. [Google Scholar] [CrossRef]
  170. Huang, X.; Lei, Q.; Xie, T.; Zhang, Y.; Hu, Z.; Zhou, Q. Deep Transfer Convolutional Neural Network and Extreme Learning Machine for lung nodule diagnosis on CT images. Knowl. Based Syst. 2020, 204, 106230. [Google Scholar] [CrossRef]
  171. Rheey, J.; Choi, D.; Park, H. Adaptive Loss Function Design Algorithm for Input Data Distribution in Autoencoder. In Proceedings of the 2022 13th International Conference on Information and Communication Technology Convergence (ICTC), Jeju Island, Republic of Korea, 19–21 October 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 489–491. [Google Scholar] [CrossRef]
Figure 1. Taxonomy of invasive and non-invasive lung cancer screening and diagnostic procedures.
Figure 2. Common computer-assisted methods for lung cancer screening.
Figure 3. Most common deep learning methods for lung cancer screening.
Figure 4. Literature screening phases.
Figure 5. Hounsfield unit histogram of low-dose CT scan.
Figure 6. Malignant nodule sample from the 3D patch (LIDC-IDRI).
Figure 7. Data augmentation sample.
Figure 8. 3D and 2D representations of a pulmonary nodule from LIDC-IDRI.
Figure 9. Detection of pulmonary nodules based on a multiscale feature 3D U-Net convolutional neural network of transfer learning, adapted from [63]. Red arrows point to the detected nodules; d is the diameter of the predicted nodules and p is the accuracy of detection.
Figure 10. Visualization of the results of LRA-generated masks, adapted from [83].
Figure 11. Block diagram of the proposed CSE-GAN for lung nodule segmentation adapted from [109] with permission from Elsevier, 2024.
Figure 12. Visual comparison of the previous cropped-slice-based approach using Res-UNet with the full-slice input-based approach using DEHA-Net [112].
Figure 13. The layout of the two CNN networks. CNN21 is a network with an input size of 21 px. × 21 px. × 5 slices, and CNN47 is the network whose input size is 47 px. × 47 px. × 5 slices. Both networks produce a final classification probability for the two classes. The same network layout was used for the S1vS45, S12vS45, and Nodule vs. Non-Nodule classifiers, although separate models were trained for each. The legend (bottom box) defines the symbols used to represent each major component of the network, adapted from [129].
Figure 14. Proposed MIXCAPS, adapted from [150] with permission from Elsevier, 2024.
Table 1. Previous reviews.
Review | Publication Date | Date Range | Datasets | Preprocessing | Methods
[22] | 2018 | 2009–2018 | yes | yes | Traditional ML, DL
[19] | 2018 | 2006–2017 | few (2) | briefly | Traditional ML, DL
[23] | 2019 | 1990–2020 | no | no | Traditional ML, DL
[24] | 2020 | 2009–2018 | yes | yes | Traditional ML, DL
[25] | 2021 | 2005–2020 | yes | no | Non-DL, DL
[20] | 2021 | 2015–2020 | yes | yes | DL, GAN
[26] | 2022 | 2015–2021 | yes | briefly | CNN
[27] | 2022 | 2020–2021 | no | no | Traditional ML, DL
[21] | 2023 | 2018–2023 | yes | no | DL
Ours | 2024 | 2015–2023 | yes | yes | CNN, DL, RNN, AE, GAN, Transformers
Table 2. Query keywords and descriptions.
Query Keywords | Description
Lung nodule detection deep learning | Research on lung nodule detection using deep learning techniques.
Lung nodule convolutional neural networks | Studies involving convolutional neural networks (CNNs) for lung nodule detection.
Lung nodule segmentation | Research on the segmentation of lung nodules, often a critical step in detection.
Lung nodule transfer learning | Investigations into the use of transfer learning for lung nodule detection.
Lung nodule Generative Adversarial Networks synthetic data | Utilizing GANs to generate synthetic data for lung nodule detection.
Lung nodule convolutional autoencoders | Studies involving convolutional autoencoders for lung nodule analysis.
Table 3. Summary of CT databases.
Reference | Dataset Name | Modalities | #Patients | Annotations | Image Format
[29] | LIDC-IDRI | CT *, DX *, CR * | 1018 | pixel-based, patient info | DICOM
[30] | LUNA16 | CT, DX, CR | 888 | pixel-based, candidate nodules | MetaImage
[31] | ELCAP | CT | 50 | pixel-based | DICOM
[32] | TIANCHI17 | CT | 1000 | pixel-based | MetaImage
[33] | SPIE-AAPM-NCI LUNGx | CT | 70 | pixel-based | DICOM
* CT (computed tomography), DX (digital radiography), CR (computed radiography).
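Because the datasets above are distributed either as DICOM series (e.g., LIDC-IDRI, ELCAP) or as MetaImage volumes (e.g., LUNA16, TIANCHI17), a common first step in the reviewed pipelines is loading every scan into a single array representation before any preprocessing. A minimal sketch using SimpleITK is shown below; the library choice and the placeholder paths are illustrative assumptions rather than the tooling of any particular study.

```python
import SimpleITK as sitk

def load_metaimage(mhd_path):
    """Load a MetaImage (.mhd/.raw) volume, e.g. a LUNA16 scan, as a NumPy array."""
    image = sitk.ReadImage(mhd_path)
    return sitk.GetArrayFromImage(image), image.GetSpacing()  # array shape is (z, y, x)

def load_dicom_series(dicom_dir):
    """Load a DICOM series, e.g. an LIDC-IDRI scan, as a NumPy array."""
    reader = sitk.ImageSeriesReader()
    reader.SetFileNames(reader.GetGDCMSeriesFileNames(dicom_dir))
    return sitk.GetArrayFromImage(reader.Execute())

# The paths below are placeholders; point them at the downloaded datasets.
# volume, spacing = load_metaimage("LUNA16/subset0/some_scan.mhd")
# volume = load_dicom_series("LIDC-IDRI/LIDC-IDRI-0001/some_series/")
```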
Table 4. Hounsfield values of the substances.
Substance | HU
Air | −1000
Lung | −500
Fat | −100 to −50
Water | 0
Blood | +30 to +70
Muscle | +10 to +40
Liver | +40 to +60
Bone | +700 (cancellous bone) to +3000 (cortical bone)
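These reference values explain the intensity windows used in most of the reviewed preprocessing pipelines: voxel intensities are clipped to a range that preserves lung parenchyma and soft tissue while discarding air and dense bone, and are then rescaled before being fed to a network. A minimal sketch is given below; the [−1000, 400] HU window is an assumption reflecting common practice, and individual studies may use different bounds.

```python
import numpy as np

def normalize_hu(volume_hu, hu_min=-1000.0, hu_max=400.0):
    """Clip a CT volume given in Hounsfield units to a lung window and scale it to [0, 1]."""
    clipped = np.clip(volume_hu, hu_min, hu_max)
    return (clipped - hu_min) / (hu_max - hu_min)

# Example with a random stand-in for a real CT volume.
dummy_scan = np.random.randint(-1024, 3000, size=(64, 128, 128)).astype(np.float32)
normalized = normalize_hu(dummy_scan)
print(normalized.min(), normalized.max())  # approximately 0.0 and 1.0
```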
Table 5. Nodule detection works.
Reference | Year | Dataset | Deep Architecture | Sensitivity
[54] | 2017 | LUNA16 | Faster R-CNN, 3D DCNN | 92.2/1 FP, 94.4/4 FPs
[55] | 2018 | LUNA16 | 2D CNNs, AE, DAE | -
[56] | 2018 | LUNA16 | 3D CNN | 87.94/1 FP, 92.93/4 FPs
[35] | 2018 | LUNA16 | Modified 3D U-Net | 95.16/30.39 FPs
[57] | 2018 | LUNA16 | 3D DCNN | 94.9/1 FP
[58] | 2019 | LIDC-IDRI | 2D CNN | 88/1 FP, 94.01/4 FPs
[59] | 2019 | LUNA16 | 2D U-net, 3D CNN | 89.9/0.25 FP, 94.8/4 FPs
[60] | 2019 | LIDC-IDRI | Custom ResNet | 92.8/8 FPs
[61] | 2019 | LIDC-IDRI | 3D Faster R-CNN and CMixNet with U-Net-like encoder–decoder architecture | 93.97, 98.00
[62] | 2019 | LUNA16 | 3D DCNN | -
[63] | 2020 | LIDC-IDRI | 3D U-net | 92.4
[64] | 2020 | LUNA16, TIANCHI17 | CNN, TL | 97.26
[65] | 2020 | LUNA16 | 3D multiscale DCNN, AE, TL | 94.2/1 FP, 96/2 FPs
[66] | 2020 | LUNA16, private set | U-net, AE, TL | 98
[67] | 2021 | LIDC-IDRI, private data set | 3D-ResNet and MKL | 91.01
[68] | 2021 | LUNA16 | 3D CNN | 87.2/22 FP
[69] | 2021 | Kaggle Data Science Bowl 2017 challenge (KDSB) and LUNA16 | U-Net | 0.891
[70] | 2021 | LUNA16 | 3D CNN | -
[71] | 2021 | LUNA16 | Multi-path 3D CNN | 0.952/0.962 at 4, 8 FPs/scan
[72] | 2021 | LUNA16 | Faster R-CNN with adaptive anchor box | 93.8
[73] | 2021 | NLST (NLST, 2011), LHMC, Kaggle | 2D and 3D DNN | -
[74] | 2022 | LIDC-IDRI + Japan Chest CT Dataset | 3D U-Net | -
[75] | 2022 | LUNA16 | 3D sphere representation-based center-points matching detection network (SCPM-Net) | 89.2/7 FP
[76] | 2022 | LUNA16 | Atrous UNet+ | 92.8
[77] | 2022 | LUNA16 | 3D U-shaped residual network | 95
[78] | 2023 | LUNA16 | 3D CNN | -
[79] | 2023 | LUNA16 | 3D ResNet18 dual path Faster R-CNN and a federated learning algorithm | 83.388
[80] | 2023 | LUNA16 | 3D ViT | 98.39
[81] | 2023 | LUNA16 | 3D ViT | 97.81
[82] | 2023 | LIDC-IDRI | 2D Ensemble Transformer with Attention Modules | 94.58
[83] | 2023 | LUNA16 | 3D Multifaceted Attention Encoder–Decoder | 89.1/7 FPs
[84] | 2023 | ELCAP | 3D CNN-CapsNet | 92.31
[85] | 2023 | LUNA16 | A multiscale self-calibrated network (DEPMSCNet) with a dual attention mechanism | 98.80
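Several of the sensitivities in Table 5 are reported at a fixed number of false positives per scan (e.g., 92.2/1 FP), i.e., as operating points of an FROC analysis. The sketch below shows, in simplified form, how such an operating point can be computed from scored candidates; the centre-distance hit criterion and the data structures are assumptions for illustration, and challenge evaluations such as LUNA16 apply stricter matching rules.

```python
import numpy as np

def sensitivity_at_fp_rate(candidates, ground_truth, n_scans, fp_per_scan=1.0):
    """One FROC operating point: detection sensitivity at a fixed number of FPs per scan.

    candidates:   list of (scan_id, (z, y, x), score) tuples produced by a detector
    ground_truth: list of (scan_id, (z, y, x), radius_in_voxels) annotated nodules
    A candidate counts as a hit when it lies within the radius of an annotated nodule.
    """
    hits, false_positives = set(), 0
    for scan_id, pos, _ in sorted(candidates, key=lambda c: -c[2]):  # best scores first
        matched = False
        for idx, (gt_scan, gt_pos, radius) in enumerate(ground_truth):
            if gt_scan == scan_id and np.linalg.norm(np.subtract(pos, gt_pos)) <= radius:
                hits.add(idx)
                matched = True
                break
        if not matched:
            false_positives += 1
            if false_positives > fp_per_scan * n_scans:
                break
    return len(hits) / len(ground_truth)

# Toy example: one scan, one annotated nodule, two candidates.
gt = [("scan1", (10, 20, 20), 3.0)]
cands = [("scan1", (10, 21, 20), 0.9), ("scan1", (40, 40, 40), 0.4)]
print(sensitivity_at_fp_rate(cands, gt, n_scans=1, fp_per_scan=1.0))  # 1.0
```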
Table 6. Nodule segmentation works.
Reference | Year | Dataset | Deep Architecture | DSC (%)
[93] | 2017 | LIDC-IDRI, private set | Central Focused Convolutional Neural Networks (CF-CNN) | 82.15 ± 10.76 (LIDC), 80.02 ± 11.09 (private set)
[94] | 2019 | LIDC-IDRI | Cascaded Dual-Pathway Residual Network | 81.58 ± 11.05
[95] | 2019 | LIDC-IDRI | SegNet, a deep, fully convolutional network | 93 ± 0.11
[62] | 2019 | LIDC-IDRI | 3D DCNN | 83.10 ± 8.85
[96] | 2020 | LIDC-IDRI | Deep residual deconvolutional network, TL | 94.97
[97] | 2020 | LIDC-IDRI | Deep Residual U-Net | 87.5 ± 10.58
[98] | 2020 | LIDC-IDRI | DB-ResNet, CF-CNN | 82.74 ± 10.19
[99] | 2020 | LIDC-IDRI | U-net | -
[100] | 2021 | LIDC-IDRI, LNDb, ILCID | 2D CNN | 80
[101] | 2021 | LIDC-IDRI | U-Net | 93.14
[102] | 2022 | LUNA16 | V-net | 95.01
[103] | 2021 | LIDC-IDRI, SHCH | 2D–3D U-net | 83.16/81.97
[104] | 2021 | LIDC-IDRI, LUNA16 | Faster R-CNN | 89.79/90.35
[105] | 2021 | LIDC-IDRI | U-net | 86.23
[106] | 2021 | LIDC-IDRI | 3D Res U-Net | 80.5
[107] | 2021 | LIDC-IDRI | VGG-SegNet | 90.49
[108] | 2022 | hospital data | 3D FCN | 84.5
[109] | 2022 | LUNA16, ILND | 3D GAN | 80.74/76.36
[110] | 2022 | LIDC-IDRI | 3D Dual Attention Shadow Network (DAS-Net) | 92.05
[111] | 2022 | LIDC-IDRI | Transformer | 89.86
[112] | 2023 | LIDC-IDRI | Dual-encoder-based CNN | 87.91
[113] | 2023 | LIDC-IDRI, AHAMU-LC | RAD-UNet | -
[114] | 2023 | LIDC-IDRI, private set | SMR-UNet 2D | 91.87
[115] | 2023 | LIDC-IDRI | U-shaped hybrid transformer | 91.84
[116] | 2023 | LIDC-IDRI, LUNA16 | 3D U-net based | 82.48
[117] | 2023 | LIDC-IDRI | GUNet3++ | 97.2
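The segmentation studies in Table 6 are compared by the Dice similarity coefficient (DSC), which measures the overlap between a predicted nodule mask A and the reference annotation B as DSC = 2|A ∩ B| / (|A| + |B|). A minimal sketch of this computation on binary masks is given below; the toy masks are illustrative only.

```python
import numpy as np

def dice_coefficient(pred_mask, gt_mask, eps=1e-7):
    """Dice similarity coefficient between two binary masks: 2|A ∩ B| / (|A| + |B|)."""
    pred = np.asarray(pred_mask, dtype=bool)
    gt = np.asarray(gt_mask, dtype=bool)
    intersection = np.logical_and(pred, gt).sum()
    return (2.0 * intersection + eps) / (pred.sum() + gt.sum() + eps)

# Two toy 2D masks; in practice these would be nodule segmentations.
a = np.zeros((8, 8), dtype=bool); a[2:5, 2:5] = True
b = np.zeros((8, 8), dtype=bool); b[3:6, 3:6] = True
print(round(float(dice_coefficient(a, b)), 3))  # 0.444
```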
Table 7. Nodule classification works.
Reference | Year | Dataset | Deep Architecture | Accuracy (%)
[125] | 2017 | LIDC-IDRI | ResNet | 89.90
[126] | 2017 | LIDC-IDRI | 3D MV-CNN + SoftMax | -
[127] | 2017 | LIDC-IDRI | ResNet-50, TL | 93.40
[128] | 2018 | LIDC-IDRI | MV-KBC | 91.60
[129] | 2018 | LIDC-IDRI | CNN + Random Forest | 94.60
[130] | 2018 | LIDC-IDRI + private set | 3D DenseNet, TL | 90.40
[131] | 2018 | LIDC-IDRI | CNN + PSO | 97.62
[132] | 2018 | LUNA16 | 3D DCNN | -
[133] | 2018 | ELCAP | DAE | -
[134] | 2019 | LUNA16 | Novel 2D CNN | 97.2
[135] | 2019 | LIDC-IDRI | Novel 2D CNN | 92.57
[136] | 2019 | LIDC-IDRI | CNN, TL | 88
[137] | 2020 | LIDC-IDRI | multiscale 3D-CNN, CapsNets | 94.94
[138] | 2020 | LIDC-IDRI | MAN (modified AlexNet), TL | 91.60
[139] | 2020 | LIDC-IDRI | CNN, TL | 97.27
[58] | 2020 | LIDC-IDRI, private dataset (FAH-GMU) | DTCNN, TL | 93.9
[140] | 2020 | LIDC-IDRI, DeepLNDataset | 3D CNN | 94.57/100
[141] | 2020 | LIDC-IDRI, LUNGx Challenge database | 2D CNN, TL | 92.65
[142] | 2020 | LIDC-IDRI | MRC-DNN | 96.69
[143] | 2020 | LIDC-IDRI | CAE, TL | 90
[144] | 2020 | LIDC-IDRI, LUNA16 | Multi-Task CNN | -
[145] | 2020 | LUNA16 | FractalNet and CNN | -
[146] | 2020 | LIDC-IDRI | DCNN | 94.06
[147] | 2021 | NLST, DLCST | 2D CNN 9 views, 3D CNN | 90.73
[137] | 2020 | LIDC-IDRI | multiscale 3D-CNN, CapsNets | 93.12
[148] | 2021 | LUNA16/Kaggle DSB 2017 dataset | Dense Convolutional Network (DenseNet) | -
[149] | 2021 | LIDC-IDRI/ELCAP | 2D MV-CNN, 3D MV-CNN | 98.83
[150] | 2021 | LIDC | Capsule networks (CapsNets) | -
[151] | 2021 | LIDC-IDRI | 3D NAS method, CBAM module, A-Softmax loss, and ensemble strategy | 89.90
[152] | 2021 | LIDC-IDRI | Deep Convolutional Generative Adversarial Network (DC-GAN)/FF-VGG19 | -
[153] | 2021 | LUNA16 | BCNN [VGG16, VGG19] combination with and without SVM | 93.40
[154] | 2021 | LUNA16 | 3D CNN | 91.60
[155] | 2021 | PET-CT private, LIDC-IDRI | 2D CNN | 94.60
[156] | 2021 | LIDC-IDRI | 3D DPN + attention mechanism | 90.40
[157] | 2021 | LIDC-IDRI | 3D CNN + biomarkers | 97.62
[158] | 2021 | LIDC-IDRI | 3D attention | -
[159] | 2022 | LIDC-IDRI and LUNGx | ProCAN | 97.2
[160] | 2022 | LIDC-IDRI | DCNN | 92.57
[161] | 2022 | LIDC-IDRI | Transformers | 88
[162] | 2022 | LIDC-IDRI | CNN-based MTL model that incorporates multiple attention-based learning modules | 91.60
[163] | 2022 | LIDC-IDRI | Transformers | 97.27
[164] | 2022 | LUNA16 | 3D ResNet + attention | 93.9
[165] | 2023 | LIDC-IDRI/TC-LND Dataset/CQUCH-LND | STLF-VA | 94.57/100
[166] | 2023 | LIDC-IDRI | Transformer | 92.65
[167] | 2023 | LIDC-IDRI | F-LSTM-CNN | 96.69
[168] | 2023 | private | CAE | 90
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
