Lung Cancer Prediction with Machine Learning, Deep Learning and Hybrid Techniques: A Survey

Zahid, Abdullah Bin; Nisa, Fakhar Un; Malik, Ahmad Kamran; Qamar, Nafees

doi:10.3390/labmed3010007

Open Access Review

¹ Barani Institute of Management Sciences (BIMS), Rawalpindi 43600, Pakistan

² Department of Computer Science, COMSATS University Islamabad, Islamabad 45550, Pakistan

³ School of Health and Behavioral Sciences, Bryant University, Smithfield, RI 02917, USA

* Author to whom correspondence should be addressed.

LabMed 2026, 3(1), 7; https://doi.org/10.3390/labmed3010007

Submission received: 25 August 2025 / Revised: 22 November 2025 / Accepted: 19 December 2025 / Published: 28 February 2026

Abstract

Lung cancer remains one of the most formidable health challenges globally, with significant morbidity and mortality rates. Despite advancements in diagnostic and treatment technologies, the disease’s high prevalence, late-stage detection, and complex variations continue to hinder effective management. Early detection and accurate diagnosis play a pivotal role in improving survival rates. Crucially, the clinical and translational relevance of AI-based prediction lies in its potential to significantly reduce the incidence of late-stage diagnoses, thus increasing the chance of successful intervention. Lung cancer was first identified by medical professionals in the mid-19th century. Today, cancer remains a significant global health challenge, affecting an estimated 14 million individuals annually and causing 8.2 million fatalities worldwide. Lung cancer ranks among the leading causes of death associated with cancer. This research aims to bridge gaps in lung cancer diagnosis by exploring various learning methodologies. By focusing on studies from the last 10 years, this survey provides a contemporary understanding of the field, emphasizing the importance of automated diagnostic systems in reducing human error and improving efficiency. The selection of relevant research is based on a rigorous methodology, including specific inclusion and exclusion criteria, which are later discussed in detail with supporting figures and comparative data. Ultimately, this work underscores the critical need for innovative diagnostic solutions and comprehensive screening programs to combat lung cancer, save lives, and advance the field of medical research.

Keywords: lung cancer prediction; machine learning; deep learning; classification method; feature extraction; hybrid approaches

1. Introduction

Lung cancer was first identified by medical professionals in the mid-19th century. Today, lung cancer ranks among the leading causes of death associated with cancer. Although overall survival for lung cancer is lower than for many other cancers, the survival rate rises to 54.4% when the disease is detected at an early stage [1]. As estimated by the WHO, cancer was responsible for the loss of 8.8 million lives in 2015. Lung cancer was the cause of 1.69 million (or nearly 20%) of these deaths [2]. The Malaysia National Cancer Registry Report 2012–2016 identifies lung cancer as the third most frequently detected cancer among Malaysians, with a higher incidence in males than in females [3]. Annually, 1.2 million people are diagnosed with lung cancer, accounting for 12.3% of total cancer diagnoses, and about 1.1 million die from it, accounting for 17.8% of total cancer deaths. Because of unchecked cell growth in the lungs, lung cancer causes severe difficulty in inhaling and exhaling [4,5]. The average lung cancer survival levels in 2017 were estimated at 65% [6]. Projections of lung cancer in multiple countries are described in [7]. According to a study by the WHO, lung-related diseases were the second primary contributor to mortality in 2015 and ranked fifth in 2017 based on contributing factors. These conditions are especially prevalent among smokers, who account for 85% of all cases [8,9]. Non-small cell lung cancer (NSCLC) accounts for approximately 80–90% of all lung cancer cases [10,11,12,13], while 15–20% of cases are small cell lung cancer (SCLC). Lung and breast cancer account for much of the tumor-associated morbidity and mortality among women in the United States. Additionally, NDDS can be designed for regulated release, which could help to overcome resistance and enable greater drug accumulation in tumors [14].
In the last thirty years, early screening techniques and improvements in clinical diagnostic and treatment have led to a decrease in mortality and increased survival time for breast cancer patients [15]. It was discovered that about 26–30% of such patients have quit smoking, while 70–74% are still smokers [16]. Numerous traffic-related issues commonly found in urban areas contribute to extreme air pollution. As urban air pollution has such negative impacts, investigations of people’s lung health conditions are often carried out with the goal of lowering the death rate [17]. Many individuals are diagnosed with lung cancer in the later stages, resulting in a poor prognosis. The challenge for healthcare professionals lies in identifying the most effective treatment options, as lung cancer presents a range of imaging features and histological variations, compounded by the advanced stage at which it is often diagnosed [18]. Cigarette smoking is the leading factor contributing to lung cancer, which ranks among the most prevalent forms of cancer. This type of cancer represents over 25% of all deaths attributed to cancer and affects both men and women. It has been estimated that 80–85% of lung cancer fatalities are directly associated with smoking [19,20]. The authors of [21] proposed that only about half were smokers now or in the past, compared to the Mayo Clinic, Johns Hopkins, and Memorial Sloan–Kettering trials. Consistent with earlier research, there was no mortality advantage associated with CXR screening. Nonetheless, there were some encouraging findings to consider: CXR screening identified 60% of lung tumors that emerged during the screening period (as opposed to “interval” malignancies), half of which were stage I illnesses. It was estimated that the UK and EU’s cancer death rates are still favorable. In the EU, rates decreased by 3.72% for women and 6% for males [22]. This research centers on using different classification algorithms to diagnose lung cancer. 
Following diagnosis, patients with lung cancer typically have a five-year survival rate of 10% to 20%. Imaging methods for early detection, such as MRI and CT, are commonly used in clinical practice and can significantly improve patient survival. Cigarette smoking accounts for about 85% of lung cancer cases, while roughly 10–15% of cases occur in people who have never smoked. Lung cancer is classified into two types based on growth patterns: SCLC and NSCLC [23,24]. How a patient is treated is determined by the kinds and categorization of their data [25]. A CT scan, X-ray, biopsy, blood test, and patient assessment are performed in diagnosing cancer [26]. Lung cancer, a serious form of malignant growth, poses significant challenges in terms of diagnosis and treatment. However, for non-smokers, early detection can lead to effective prevention or treatment. Non-smokers may develop lung cancer as a result of radon radiation, secondhand smoke, air pollution, or other causes. This cancer is one of the most common and aggressive varieties encountered in patients [27]. Computed tomography (CT) scans, produced by merging many X-ray images obtained from various angles around the body, can reveal lung cancer [28]. Lung cancer has a growth rate of 13%. In clinical practice, analyzing and interpreting lung CT images is a delicate procedure that requires significant time and expertise [29]. A substantial percentage of patients have a higher chance of survival if lung nodules are correctly diagnosed early [30]. Because cancer can respond well to therapy when caught early, early detection can considerably lower the disease’s death rate, thus saving many lives. Automated tools that assess and categorize this illness help to expedite diagnosis and lower the likelihood of human error [31]. Active and passive smokers together make up about 90% of lung cancer patients.
Since most lung cancer patients do not exhibit symptoms in their early stages, most of these people are diagnosed with stage 3 or stage 4 lung cancer. Early screening is therefore highly useful. Sputum cytology, biopsy, and computed tomography (CT) scans can all be used for lung cancer screening [32]. The tumor size and location are considered to characterize symptoms. In certain circumstances, no pain or symptoms emerge in the early stages, making analysis and detection challenging. Individuals diagnosed with lung cancer might face various symptoms, including coughing up blood (hemoptysis), shoulder pain associated with Pancoast syndrome, hoarseness resulting from vocal cord involvement, as well as significant weight loss, fatigue, and general weakness [33]. Early identification of lung cancer is mandatory for successful treatment and recovery. The most prevalent techniques for diagnosing lung cancer in its initial stages include chest X-rays, CT scans, MRI, isotope scans, bronchoscopy, and various other diagnostic tests [34]. A key technique, known as “pathological diagnosis,” involves analyzing needle biopsy samples obtained from patients to establish a diagnosis [35]. Early detection using ML techniques is key to increasing the survival rate. If such an approach can be used to increase the efficiency and effectiveness of radiology diagnostics, it will be a significant step towards improving early detection. The lung cancer datasets utilized in this study were sourced from Data World and the UCI ML Repository. Initially, k-fold cross-validation was employed to divide the datasets into training and testing subsets. Next, several classification models were constructed using the training data, utilizing approaches like SVM, Logistic Regression, Naive Bayes, and Decision Trees. The training data serve to build the classification models, while the testing data are used to evaluate these models and calculate their accuracy [36]. 
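
The k-fold splitting step mentioned above can be illustrated with a short sketch. This is not the code used in [36]; the function name `k_fold_indices`, the fold count, and the sample count are hypothetical choices made only to show how each sample ends up in the testing subset exactly once.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    # Spread samples as evenly as possible: the first n_samples % k folds
    # receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


# Illustrative values only: 10 samples split into 5 folds.
folds = list(k_fold_indices(n_samples=10, k=5))
```

Each classifier (for example SVM or Naive Bayes) would be fitted on `train_idx` and scored on `test_idx` for every fold, and the per-fold accuracies averaged.
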
Lung cancer continues to be one of the most daunting health challenges worldwide, with considerable rates of morbidity and mortality. Even with progress in diagnostic and treatment technologies, the disease’s high prevalence, late-stage detection, and complex variations remain obstacles to effective management. Early detection and precise diagnosis are crucial for enhancing survival rates, especially with the aid of advanced imaging techniques and machine learning tools. This research aims to address deficiencies in lung cancer diagnosis by investigating different methodologies. Focusing on studies from the past 12 years, the survey offers a contemporary perspective on the field, highlighting the significance of automated diagnostic systems in minimizing human error and enhancing efficiency. In the end, this work highlights the crucial necessity for creative diagnostic approaches and thorough screening initiatives in order to fight lung cancer, preserve lives, and promote progress in medical research. Here, we discuss existing surveys focused on lung cancer prediction. The survey in [37] explored various methodologies for predicting lung cancer, including the evaluation of imaging techniques used to detect lung cancer, with a focus on their accuracy in identifying neoplasms. The study also discussed the use of classification systems to differentiate between types of lung cancer based on radiographic features, analyzed the correlation between radiographic findings and patient prognosis to aid outcome prediction, and examined patient demographics and clinical data to identify patterns that may predict lung cancer occurrence. Relevant European organizations [38] suggested that the process could be achieved through the implementation of well-designed, targeted demonstration programs across multiple countries. 
These programs would emphasize methodology, standardization, tobacco cessation, emotional impacts, cost-effectiveness, risk–benefit analyses, and education on healthy living. A ten-year hospital data study in the USA [39] noted that certain trends in lung cancer patterns align with previous reports. These include the continued decline in the male-to-female gender ratio, reflecting a significant rise in lung cancer incidence among American women, an increasing number of African-American patients, and a growing prevalence of adenocarcinoma. The rise in older patients presents therapeutic challenges, as comorbidities tend to increase with age, and older patients often tolerate aggressive, multimodal therapies less effectively than younger patients. Although the proportion of patients not receiving cancer-directed treatment increased across all age groups, the most notable rise was observed among older patients. Another survey using data mining techniques [40] aimed to identify the most effective methods for extracting knowledge and insights from existing lung cancer profile data. The study reviewed various data mining techniques applied to cancer research and related fields, noting that the data involved are often incomplete. Data cleaning—a critical step in the process—is particularly challenging due to the heterogeneous nature of the data sources, which often lack essential attributes. In this survey, we seek to address the gaps in lung cancer diagnosis by exploring innovative methodologies. We categorize surveys according to three main detection techniques used in the last 12 years: ML, DL, and hybrid. This provides a contemporary perspective on the field, emphasizing the role of automated diagnostic systems in minimizing human error and enhancing efficiency. Ultimately, the work underscores the urgent need for advanced diagnostic solutions and comprehensive screening programs to combat lung cancer, save lives, and drive progress in medical research. 
Our survey also addresses limitations such as dataset size, algorithmic complexity, and real-world validation that are essential for realizing the full potential of detection methods. Focusing on these limitations, future research can pave the way for the development of scalable, reliable, and efficient diagnostic systems. Lung cancer is recognized as one of the leading causes of cancer-related mortality worldwide, posing a significant health threat to both men and women. It primarily originates in the epithelial cells that line the airways of the lungs, with 90–95% of cases attributed to this cellular origin [12]. The disease manifests through various symptoms, including severe chest pain, persistent dry cough, breathlessness, and unexplained weight loss [4]. The significant correlation between smoking (particularly tobacco use) and the incidence of lung cancer is well documented, with smoking being implicated in over 80% of cases [19].

Lung cancer continues to be a major global health challenge, attributed to its high death rates and complex biological nature. Its close links to smoking and secondhand smoke highlight the pressing necessity for successful prevention and quitting initiatives. The diversity of lung tumors and the differences in how patients respond to treatment underscore the need for improved prognostic methods and individualized treatment strategies. By incorporating state-of-the-art technologies like AI, deep learning, and computer vision, we can revolutionize the early detection, precise diagnosis, and personalized treatment of lung cancer. Researchers are making great progress in addressing the shortcomings of conventional approaches by utilizing these tools in conjunction with detailed datasets and sophisticated imaging methods. The management of lung cancer is constantly evolving, and teamwork among researchers, clinicians, and technologists is leading to innovative diagnostic and therapeutic solutions. These developments promise not only to enhance patient outcomes, but also to help alleviate the worldwide healthcare burden associated with lung cancer. Research and innovation in this area must continue in order to tackle one of the major public health challenges of our era.

This survey is structured to help researchers working in the area of lung cancer detection. We created a taxonomy of relevant ML, DL, and hybrid techniques, which is followed throughout the paper, and present the pros and cons of these techniques. At the end, we provide future research directions that can help researchers to focus on important research gaps in this area. We employed a systematic research methodology to ensure the relevance and quality of the included literature. We set inclusion and exclusion criteria to select the most relevant research works. We collected peer-reviewed papers published in the most relevant journals and conferences, mostly from the last 10 years. We excluded unpublished papers, older papers, papers on topics other than lung cancer, and papers not using ML/DL techniques. We also show figures from relevant research works that illustrate the techniques and data types used. Some figures also depict the model accuracy comparisons reported in the most important works. At the end, a detailed table compares important related works and their limitations.

The rest of this paper is structured as follows. Section 2 provides a comparison with existing surveys on the same topic. Section 3 describes the research methodology. Section 4 describes the details of lung cancer prediction approaches. Section 5 presents our taxonomy and outlines its subsections. Section 6 discusses details of the problems and solutions under each category of taxonomy. Section 7 provides discussion and future research directions. Section 8 concludes the paper.

2. Comparison with Existing Surveys

Here, we discuss existing surveys on lung cancer prediction. The survey in [37] explores various methodologies for predicting lung cancer, including the evaluation of imaging techniques used to detect lung cancer, with a focus on their accuracy in identifying neoplasms. The study also discusses the use of classification systems to differentiate between types of lung cancer based on radiographic features, analyzes the correlation between radiographic findings and patient prognosis to aid outcome prediction, and examines patient demographics and clinical data to identify patterns that may predict lung cancer occurrence.

Relevant European organizations [38] have suggested that the process could be achieved through the implementation of well-designed, targeted demonstration programs across multiple countries. These programs would emphasize methodology, standardization, tobacco cessation, emotional impacts, cost-effectiveness, risk–benefit analyses, and education on healthy living.

A ten-year hospital data study in the USA [39] noted that certain trends in lung cancer patterns align with previous reports. These include the continued decline in the male-to-female gender ratio, reflecting a significant rise in lung cancer incidence among American women, an increasing number of African-American patients, and a growing prevalence of adenocarcinoma. The rise in older patients presents therapeutic challenges, as comorbidities tend to increase with age, and older patients often tolerate aggressive, multimodal therapies less effectively than younger patients. Although the proportion of patients not receiving cancer-directed treatment increased across all age groups, the most notable rise was observed among older patients.

Another survey using data mining techniques [40] aimed to identify the most effective methods for extracting knowledge and insights from existing lung cancer profile data. The study reviewed various data mining techniques applied in cancer research and related fields, noting that the data involved are often incomplete. Data cleaning—a critical step in the process—is particularly challenging due to the heterogeneous nature of the data sources, which often lack essential attributes.

In this survey, we seek to address the gaps in lung cancer diagnosis by exploring innovative methodologies. We categorize surveys according to three main detection techniques used in the last 10 years: ML, DL, and hybrid. Inclusion and exclusion criteria were used to select the most relevant research. We collected only peer-reviewed papers published in the most relevant journals and conferences. We also included the latest papers, published within the last 12 years, while excluding unpublished papers, older papers, papers focused on topics other than lung cancer, and papers not using ML/DL techniques. To organize the sections, we created a taxonomy of all relevant techniques, which is followed throughout the paper. At the end, we also provide future research directions that can help researchers to focus on important research gaps in the area. This provides a contemporary perspective on the field, emphasizing the role of automated diagnostic systems in minimizing human error and enhancing efficiency. Ultimately, this work underscores the urgent need for advanced diagnostic solutions and comprehensive screening programs to combat lung cancer, save lives, and drive progress in medical research. Our survey also addresses limitations such as dataset size, data modalities, algorithmic complexity, and real-world validation, which must be overcome to realize the full potential of detection methods. By focusing on these limitations, future research can pave the way for the development of scalable, reliable, and efficient diagnostic systems.

3. Research Methodology

3.1. Overview

This systematic review adheres to the PRISMA 2020 guidelines [41]. We used predefined keywords associated with deep learning (DL) and the diagnosis of lung cancer using computed tomography (CT) imaging to perform a structured search across five major databases (Scopus, IEEE, Springer, PubMed, and MDPI). After applying the inclusion criteria and reviewing titles and abstracts, we selected 70 studies published between 2014 and 2025 for in-depth analysis. The study selection process is documented in the PRISMA flow diagram (Figure 1). With a primary focus on methods for lung nodule detection, segmentation, and classification, we assessed the included studies based on their methodologies, clinical applicability, and performance metrics: accuracy, sensitivity, specificity, F1-score, false positive reduction, Dice similarity coefficient, and Intersection over Union.
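
For readers less familiar with the performance metrics listed above, the following sketch shows their standard definitions. The function names and the example counts and masks are hypothetical and are not taken from any reviewed study. The code assumes every denominator is non-zero.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard metrics from confusion-matrix counts (assumes non-zero denominators)."""
    sensitivity = tp / (tp + fn)   # recall / true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


def dice_and_iou(pred_mask, true_mask):
    """Dice coefficient and Intersection over Union for flat binary (0/1) masks."""
    intersection = sum(p * t for p, t in zip(pred_mask, true_mask))
    pred_sum, true_sum = sum(pred_mask), sum(true_mask)
    union = pred_sum + true_sum - intersection
    dice = 2 * intersection / (pred_sum + true_sum)
    iou = intersection / union
    return dice, iou


# Illustrative numbers only.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
dice, iou = dice_and_iou([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

Accuracy, sensitivity, specificity, and F1-score apply to detection and classification outputs, whereas the Dice coefficient and IoU compare a predicted segmentation mask with its ground truth.
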

Research Questions

The study addresses the following survey questions (SQs):

SQ1: Which Deep Learning (DL) architecture (e.g., CNN, ResNet, DenseNet, Transformer) is most frequently applied in recent literature (2015–2024) for lung nodule classification from CT scans?
SQ2: Which type of hybrid approach (e.g., CNN-SVM, DL-based Feature Extraction + Traditional ML) has demonstrated the most significant performance gain in reducing false positives in lung cancer screening?
SQ3: When evaluating models for lung nodule detection, which performance metric is prioritized by researchers to assess the model’s clinical viability?
SQ4: Which diagnostic sub-task (Detection, Segmentation, or Classification) presents the greatest technical challenge for current DL models in CT imaging?
SQ5: Which publicly available CT imaging dataset (e.g., LIDC-IDRI, NLST, LUNA16) is the most commonly cited and utilized for training and testing DL models in this domain?
SQ6: What is the biggest limitation associated with the publicly available datasets used in current lung cancer DL research?
SQ7: To ensure a model’s robustness and clinical relevance, which validation strategy is considered the most rigorous in recent DL literature?
SQ8: What is the single most significant practical challenge hindering the deployment of DL models for lung cancer diagnosis in a clinical setting?
SQ9: Which technique is most frequently used to enhance the interpretability of CNN-based lung cancer models?
SQ10: Besides classification, what is the primary role of traditional Machine Learning (ML) approaches (like SVM, Random Forest) when integrated into hybrid lung cancer models?

3.2. Literature Search and Selection

To find and select journal articles that matched our research goals, we used a systematic search strategy, as summarized in Table 1. A thorough search covering 2015 to 2024 was carried out using the following:

Databases: MDPI, Springer, IEEE Xplore, PubMed, and Scopus.
Grey literature: conference proceedings of Computer Vision and Pattern Recognition (CVPR) and Medical Image Computing and Computer Assisted Intervention (MICCAI), as well as preprints (arXiv, Semantic Scholar).
Keywords: “Deep Learning,” “Lung cancer,” “Pulmonary nodule,” “CT imaging,” “Computerized Tomography,” “Nodule Detection,” “Segmentation,” “Classification,” “CNN,” and “Vision Transformer.”
Boolean operators: Terms were combined using AND/OR, for example (“lung cancer” OR “pulmonary nodule”) AND (“deep learning” OR “CNN” OR “Vision Transformer” OR “Hybrid CNN-Transformer”) AND (“CT imaging” OR “Computerized tomography”) AND (“Detection” OR “Segmentation” OR “Classification”).
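
As a minimal sketch of how such a search string might be assembled, the snippet below joins the terms of each concept group with OR and then joins the groups with AND. The helper names `or_group` and `build_query` are hypothetical. Wrapping the first group in parentheses is an assumption about the intended operator precedence, and the exact query syntax accepted by each database differs.

```python
def or_group(terms):
    """Quote each term and join the terms of one concept with OR."""
    quoted = ['"{}"'.format(term) for term in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(groups):
    """Join the concept groups with AND."""
    return " AND ".join(or_group(group) for group in groups)


# Concept groups taken from the search terms listed above.
groups = [
    ["lung cancer", "pulmonary nodule"],
    ["deep learning", "CNN", "Vision Transformer", "Hybrid CNN-Transformer"],
    ["CT imaging", "Computerized tomography"],
    ["Detection", "Segmentation", "Classification"],
]
query = build_query(groups)
print(query)
```
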

3.3. Inclusion and Exclusion Criteria

Here we describe the criteria used for including and excluding research papers, as shown in Table 2.

3.4. Study Selection Process

A total of 612 articles were found through the document search: 597 from primary databases (Scopus, IEEE Xplore, PubMed, SpringerLink, MDPI, Elsevier, and Google Scholar) and 15 from other sources (such as ResearchGate and institutional repositories). Eighty records (13.1%) were eliminated in the initial screening: 74 were duplicate entries across several databases, and six were incomplete or non-English publications. The remaining 532 distinct records moved on to the next level of screening. In this second stage, 462 records were eliminated because they did not meet the inclusion criteria, for reasons such as a non-lung cancer focus, incompatible study design, or insufficient data. Finally, 70 studies were included in the systematic review. The overall selection and filtering process is summarized in the PRISMA flow diagram (Figure 1).
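
The counts reported above can be checked with simple arithmetic. The variable names below are illustrative; the numbers are those stated in the text.

```python
# Records identified
identified_databases = 597
identified_other = 15

# Records removed in the initial screening
removed_duplicates = 74
removed_incomplete_or_non_english = 6

# Records excluded in the second screening stage
excluded_at_screening = 462

identified = identified_databases + identified_other
removed_initial = removed_duplicates + removed_incomplete_or_non_english
screened = identified - removed_initial
included = screened - excluded_at_screening
removed_pct = round(100 * removed_initial / identified, 1)

print(identified, removed_initial, screened, included, removed_pct)
```
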

3.5. Study Characteristics

Figure 2 shows the distribution of a subset of the articles from 2016 to 2025: the number of publications on deep learning (DL) approaches rose significantly beginning in 2017 and peaked in 2023. Furthermore, according to the percentage distribution across the five databases and other sources (Figure 3), 18.26% of the studies were obtained from Scopus, whereas 7.10% were accessed through IEEE Xplore.

4. Lung Cancer Prediction

Lung cancer is recognized as one of the leading causes of cancer-related mortality worldwide, posing a significant health threat to both men and women. It primarily originates in the epithelial cells that line the airways of the lungs, with 90–95% of cases attributed to this cellular origin [12]. Various symptoms, such as severe chest pain, persistent dry cough, breathlessness, and unexplained weight loss, indicate the manifestation of the disease [4]. A strong connection between smoking (especially tobacco consumption) and the occurrence of lung cancer is thoroughly recorded, with smoking being linked to more than 80% of cases [19]. Moreover, second-hand smoke exposure significantly increases the risk of this disease among non-smokers, which heightens the public health imperative for effective smoking cessation initiatives [25]. A comprehensive investigation of lung cancer’s complexities is essential for grasping its multifaceted nature, especially regarding tumor biology. Issues for investigation concerning this malignancy frequently stem from a requirement for more accurate prognostic methods based on tumor organization. Conventional classification methods depend on discrete separations, which may result in coarse evaluations that ignore the tumor’s heterogeneous characteristics. This diversity complicates the effective prediction of patient outcomes and requires a transition to more nuanced classification systems that take into account the different morphological and organizational characteristics of lung tumors [27]. The combination of cutting-edge imaging methods and novel computational strategies offers a promising way to tackle these research challenges. It highlights the promise of integrating radiological clinical images with artificial intelligence (AI) and deep learning technologies. 
By utilizing these tools, researchers aim to extract quantifiable features from imaging data such as CT scans and MRIs, which can significantly enhance predictive assessments in lung cancer [23].

Utilizing computer vision in this context enables the recognition of patterns and anomalies that traditional analyses may overlook, resulting in enhanced early detection and better-informed treatment plans [24]. Furthermore, research into deep learning algorithms within radiology suggests they could revolutionize the diagnosis of lung cancer. Researchers are endeavoring to create models that can diagnose lung cancer with accuracy comparable to or exceeding that of experienced radiologists by training these algorithms on large datasets of confirmed lung cancer cases. This advancement can not only speed up the diagnostic process but also help to standardize imaging interpretations across various healthcare facilities, thus improving overall care quality [33,42]. Another essential aspect of lung cancer research focuses on treatment methods and the variability in patient responses to conventional therapies. Targeted therapies and immunotherapies have advanced significantly, exposing the considerable variability in patients’ treatment responses. This inconsistency promotes a continuous investigation into the identification of biomarkers that might assist in choosing personalized therapies suited to individual patients. To achieve this goal, extensive datasets that link genetic and phenotypic data to treatment outcomes are crucial. Such datasets will facilitate the creation of precision medicine strategies that could enhance lung cancer patients’ prognoses and treatment effectiveness [28]. This paper emphasizes the necessity of such research initiatives, pointing out that the field of lung cancer management is slowly changing. The convergence of AI, deep learning, and conventional oncology offers a promising arena for innovation, leading to new interventions that could significantly change the course of the disease. 
With the ongoing investigation of lung cancer’s complexities, advanced technologies are becoming crucial for creating diagnostic and therapeutic frameworks that are more effective [43]. Lung cancer poses a significant public health challenge because of its high death rates and complex biological mechanisms. In-depth investigation of the research initiatives demonstrates a unified endeavor to leverage technological progress in order to tackle the complex problems linked to the disease. Filled with promise, the pathway toward better understanding, diagnosing, and treating lung cancer is emphasized by the joint efforts of researchers, clinicians, and technologists. To improve patient outcomes and alleviate the total burden of lung cancer on healthcare systems around the world, it is vital to keep investigating these research issues [44].

Lung cancer continues to be a major global health challenge, attributed to its high death rates and complex biological nature. Its close links with smoking and second-hand smoke highlight the pressing requirement for successful prevention and quitting initiatives. The diversity of lung tumors and the differences in how patients respond to treatment underscore the need for improved prognostic methods and individualized treatment strategies. By incorporating state-of-the-art technologies like AI, deep learning, and computer vision, we can revolutionize the early detection, precise diagnosis, and personalized treatment of lung cancer. Researchers are making great progress in addressing the shortcomings of conventional approaches by utilizing these tools in conjunction with detailed datasets and sophisticated imaging methods. As the landscape of lung cancer management continues to evolve, the collaborative efforts of researchers, clinicians, and technologists are paving the way for innovative diagnostic and therapeutic solutions. These advancements not only hold promise for improving patient outcomes, but also contribute to reducing the global healthcare burden of lung cancer. Continued research and innovation in this field remain imperative for addressing one of the most pressing public health concerns of our time.

5. Taxonomy of Lung Cancer Detection

In this section, we present a taxonomy of lung cancer detection techniques as shown in Figure 4. Following the taxonomy, we describe each section of the taxonomy briefly.

5.1. Machine Learning

Machine learning derives conclusions from large datasets and predicts outcomes for unseen data. It enables machines to carry out tasks by learning patterns from examples of how those tasks should be performed [45].

5.1.1. Voting Classifier

In machine learning, voting classifiers are an ensemble approach in which several base classifiers are applied separately, after which their predictions are aggregated [27]. Combining the predictions of multiple machine learning models in this way can improve overall accuracy. Each model (e.g., a decision tree, SVM, or neural network) contributes a vote, and a majority or weighted vote determines the final output. This ensemble method is particularly beneficial in lung cancer detection and diagnosis, enhancing model robustness by aggregating diverse predictive capabilities [4].
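The aggregation step can be illustrated with a minimal sketch. The function name `weighted_vote`, the labels, and the weights below are hypothetical and are not taken from the surveyed studies; with equal weights the rule reduces to plain majority voting.

```python
from collections import defaultdict


def weighted_vote(predictions, weights=None):
    """Return the label with the largest total weight.

    predictions: one predicted label per base classifier.
    weights: optional non-negative weight per classifier; equal weights
             (the default) give plain majority voting.
    """
    if weights is None:
        weights = [1.0] * len(predictions)
    if len(weights) != len(predictions):
        raise ValueError("one weight per prediction is required")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    # Ties go to the label that appeared first in `predictions`.
    return max(totals, key=totals.get)


# Hypothetical outputs of three base classifiers for a single nodule.
votes = ["malignant", "benign", "malignant"]
majority = weighted_vote(votes)                           # two votes against one
weighted = weighted_vote(votes, weights=[0.2, 0.9, 0.3])  # 0.5 against 0.9
```

In the weighted case a single, highly trusted classifier can overrule the other two, which is the practical difference between majority and weighted voting.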

5.1.2. Support Vector Machine (SVM)

SVMs are supervised learning models used mainly for classification and regression tasks [46]. They are widely used for binary classification problems in medical imaging, especially lung cancer diagnosis. An SVM works by finding the optimal hyperplane that separates data into distinct categories, such as malignant or benign nodules. In [26], a hybrid algorithm incorporating an SVM and a Feed-Forward Back Propagation Neural Network (FFBPNN) achieved a classification accuracy of 98.08% by using polynomial kernels for feature optimization and classification. In another study, Gradient Boosting (GB) and Support Vector Machine (SVM) models in particular reached a remarkable diagnostic accuracy of 99.5% [44].
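As a rough illustration of how a trained kernel SVM assigns a class, the sketch below evaluates the standard decision function with a polynomial kernel. The support vectors, coefficients, and bias are made-up values chosen for readability, not parameters from [26] or any other surveyed model.

```python
def polynomial_kernel(x, z, degree=3, gamma=1.0, coef0=1.0):
    """K(x, z) = (gamma * <x, z> + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, z))
    return (gamma * dot + coef0) ** degree


def svm_decision(x, support_vectors, dual_coefs, bias, **kernel_params):
    """f(x) = sum_i (alpha_i * y_i) * K(sv_i, x) + b; the sign gives the class."""
    return sum(
        coef * polynomial_kernel(sv, x, **kernel_params)
        for sv, coef in zip(support_vectors, dual_coefs)
    ) + bias


# Illustrative (assumed) model parameters for a two-feature problem.
support_vectors = [[1.0, 1.0], [-1.0, -1.0]]
dual_coefs = [0.5, -0.5]  # each entry is alpha_i * y_i
bias = 0.0

score = svm_decision([2.0, 0.5], support_vectors, dual_coefs, bias, degree=2)
label = "malignant" if score > 0 else "benign"
```

The sample lies on the positive side of the decision boundary (the score is 5.0), so it is assigned to the positive class under this assumed labeling.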

5.1.3. Segmentation Techniques

In the fields of image processing and computer vision, segmentation techniques refer to methods that partition an image into multiple segments or regions based on particular characteristics or attributes. These techniques are widely used in a variety of fields, including self-driving cars, medical imaging, object identification, and satellite image processing [47]. In the preprocessing of medical images, segmentation plays a vital role in isolating regions of interest such as the lung parenchyma. Common methods include thresholding and morphological operations, which are frequently used to define lung boundaries, as well as region growing, Watershed Segmentation [43], and Genetic Cellular Neural Networks (G-CNN) [26], which are effective for recognizing nodules in noisy CT scans. The G-CNN approach integrates genetic algorithms with neural networks for robust segmentation.
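Thresholding followed by a morphological opening can be sketched on a toy intensity grid as follows. The threshold value, the 4-connected structuring element, and the image itself are assumptions made for illustration; real pipelines operate on full CT slices and typically rely on an image processing library.

```python
# The centre pixel plus its four direct neighbours (a "plus"-shaped element).
NEIGHBOURS = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]


def threshold(image, level):
    """Binary mask: 1 where the intensity is below `level` (dark, air-filled lung)."""
    return [[1 if px < level else 0 for px in row] for row in image]


def _at(mask, r, c):
    """Mask value at (r, c); positions outside the image count as background."""
    inside = 0 <= r < len(mask) and 0 <= c < len(mask[0])
    return mask[r][c] if inside else 0


def erode(mask):
    """Keep a pixel only if it and all four neighbours are set."""
    return [[1 if all(_at(mask, r + dr, c + dc) for dr, dc in NEIGHBOURS) else 0
             for c in range(len(mask[0]))] for r in range(len(mask))]


def dilate(mask):
    """Set a pixel if it or any of its four neighbours is set."""
    return [[1 if any(_at(mask, r + dr, c + dc) for dr, dc in NEIGHBOURS) else 0
             for c in range(len(mask[0]))] for r in range(len(mask))]


image = [
    [200, 200, 200, 200, 200],
    [200,  10,  10,  10, 200],
    [200,  10,  10,  10, 200],
    [200,  10,  10,  10, 200],
    [ 10, 200, 200, 200, 200],  # isolated dark pixel, i.e. noise
]
mask = threshold(image, level=100)
opened = dilate(erode(mask))  # opening removes the isolated noise pixel
```

The opening keeps the core of the 3 × 3 dark region while discarding the single noisy pixel in the bottom-left corner, at the cost of rounding the region's corners.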

5.1.4. SilNet

This is a technique for evaluating the quality of clusters produced by clustering algorithms [32]. SilNet is in line with methods that enhance feature extraction through convolutional layers, which are essential for identifying subtle patterns in medical images [34].

5.1.5. Gradient Boosting Classification

GBoost, frequently referred to as Gradient Boosting Classification (GBC), is a robust parallel tree-boosting algorithm. Its design aims to meet a variety of data science challenges with efficiency and accuracy. This technique is often employed by data scientists to produce novel outcomes in a range of machine-learning tasks [48].
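The core idea of boosting, fitting each new tree to the errors of the current ensemble, can be sketched with one-split trees (stumps) and squared loss. This is a generic, sequential illustration under those assumptions; it is not the parallel tree-boosting implementation referred to above, and the data are invented.

```python
def fit_stump(xs, residuals):
    """Best single-split regressor for 1-D inputs under squared error."""
    best = None
    for split in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= split]
        right = [r for x, r in zip(xs, residuals) if x > split]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, split, left_mean, right_mean)
    _, split, left_mean, right_mean = best
    return lambda x: left_mean if x <= split else right_mean


def gradient_boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Additive model: mean prediction plus a sum of shrunken stumps."""
    base = sum(ys) / len(ys)
    stumps = []
    predictions = [base] * len(xs)
    for _ in range(n_rounds):
        # For squared loss the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump(x)
                       for p, x in zip(predictions, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


# Toy data: a single feature, with label 1 for the larger values.
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 0, 1, 1, 1]
model = gradient_boost(xs, ys)
```

Each round halves the remaining residual on this toy problem, so the ensemble output approaches the 0/1 labels after a few rounds.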

5.2. Deep Learning

The emphasis of deep learning is on algorithms that draw inspiration from how the neural networks of the human brain are structured and operate. These algorithms utilize various layers of abstraction to learn data representations on their own. The term “deep” refers to the use of deep neural networks (NNs), which are composed of multiple layers and enable information to be processed in a more unique and nuanced way [43].

5.2.1. Optimal Deep Learning

Optimal deep learning refers to attaining the best performance or outcomes on deep learning tasks. This encompasses tasks such as preparing the dataset, designing the model architecture, fine-tuning hyperparameters, employing training methodologies, and more [33]. Approaches based on deep learning, like Convolutional Neural Networks (CNNs), reach their best performance by automating the process of feature extraction. As an example, the hybrid Kernel Attribute Selected Classifier (KASC) integrates SVM and deep learning to improve classification precision in lung cancer detection [49].

5.2.2. Feature Extraction

In deep learning, feature extraction plays an important role by allowing models to learn from data in an effective way. Unlike traditional machine learning, which frequently depends on manually designed features, deep models leverage several layers of abstraction to automatically recognize and extract pertinent characteristics from raw data. This process improves the model’s ability to identify structures and produce reliable estimates [50].

5.2.3. Back Propagation Neural Network

An artificial neural network that employs the backpropagation algorithm for training is known as a Backpropagation Neural Network (BPNN). Backpropagation is a supervised learning approach that applies the chain rule to compute gradients efficiently. It measures the contribution of each weight in the network to the total loss, which indicates how each weight should be updated during training [51].
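The chain-rule computation can be made concrete with a deliberately tiny network: one input, one hidden sigmoid unit, one sigmoid output, and a squared-error loss. The architecture, initial weights, and learning rate are assumptions chosen only to keep the arithmetic visible.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w1, w2):
    h = sigmoid(w1 * x)  # hidden activation
    y = sigmoid(w2 * h)  # network output
    return h, y


def loss(x, target, w1, w2):
    _, y = forward(x, w1, w2)
    return 0.5 * (y - target) ** 2


def backward(x, target, w1, w2):
    """Gradients of the loss with respect to w1 and w2 via the chain rule."""
    h, y = forward(x, w1, w2)
    delta_out = (y - target) * y * (1.0 - y)       # dL/d(w2 * h)
    grad_w2 = delta_out * h
    delta_hidden = delta_out * w2 * h * (1.0 - h)  # dL/d(w1 * x)
    grad_w1 = delta_hidden * x
    return grad_w1, grad_w2


x, target = 1.5, 1.0
w1, w2 = 0.4, -0.3
learning_rate = 0.5

loss_before = loss(x, target, w1, w2)
for _ in range(100):
    g1, g2 = backward(x, target, w1, w2)
    w1 -= learning_rate * g1  # gradient descent update
    w2 -= learning_rate * g2
loss_after = loss(x, target, w1, w2)
```

The error signal of the output unit (`delta_out`) is reused when computing the hidden unit's gradient, which is what makes backpropagation efficient in deeper networks.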

5.2.4. Deep Convolutional Neural Network

A Deep Convolutional Neural Network (DCNN) is a type of artificial neural network that processes data structured in a grid-like format, such as images. Because they automatically and adaptively learn the spatial hierarchies present in data, these networks are well suited to tasks such as object detection, image segmentation, and image classification [43]. A DCNN extends traditional CNNs with deeper architectures that learn hierarchical features. These models are particularly effective in the diagnosis of lung cancer, as they can detect intricate patterns on CT images [26].

5.2.5. CAD Model

Although the acronym CAD also denotes Computer-Aided Design, whose models are used in deep learning applications that require the comprehension and manipulation of three-dimensional (3D) structures and forms [52], in lung cancer imaging it refers to computer-aided detection and diagnosis. Such CAD systems integrate machine learning algorithms to assist radiologists in detecting lung nodules. Models such as convolutional networks, decision trees, and gradient boosting have been applied to improve detection sensitivity and reduce false positives [53].

5.2.6. Convolutional Neural Network

CNNs are especially effective for handling and analyzing visual data, such as images and videos [54]. They are currently used extensively in image classification tasks for medical imaging. Their layered architecture makes them ideal for extracting spatial features from CT scans.
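The spatial feature extraction performed by a convolutional layer can be sketched as a small filter sliding over an image patch. The patch and the edge-detecting kernel below are hand-picked for illustration; in a trained CNN the kernel values are learned from data.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1).

    As in most deep learning libraries, the kernel is not flipped, so this
    is strictly a cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Toy patch with a vertical edge: dark on the left, bright on the right.
patch = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
edge_kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = conv2d_valid(patch, edge_kernel)  # [[0, 2, 0], [0, 2, 0]]
```

The feature map responds only at the column where the intensity changes, which is the kind of local pattern that stacked convolutional layers combine into higher-level features.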

5.2.7. Hopfield Neural Network

A specific type of RNN known as a “Hopfield Neural Network” operates as a content-addressable memory system. This network allows retrieval of stored information based on partial or noisy input patterns, effectively functioning like associative memory [55]. It combines multiple layers of neural networks to process complex datasets.
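Content-addressable retrieval from a noisy input can be demonstrated with a minimal sketch. It assumes bipolar (+1/-1) patterns, Hebbian weight storage, and asynchronous updates; the two stored patterns are arbitrary examples.

```python
def train_hopfield(patterns):
    """Hebbian weights for bipolar (+1/-1) patterns, with a zero diagonal."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += p[i] * p[j] / n
    return weights


def recall(weights, state, max_sweeps=10):
    """Update one unit at a time until the state stops changing."""
    state = list(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(state)):
            field = sum(w * s for w, s in zip(weights[i], state))
            new_value = 1 if field >= 0 else -1
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:
            break
    return state


stored = [
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, -1, 1, -1, 1, -1, 1, -1],
]
weights = train_hopfield(stored)

noisy = [1, 1, 1, -1, -1, -1, -1, -1]  # first pattern with one element flipped
restored = recall(weights, noisy)
```

Starting from the corrupted input, the network settles back into the first stored pattern, which is the associative-memory behaviour described above.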

5.2.8. Convolutional Recurrent Neural Network

In a CRNN architecture, convolutional layers typically come first, followed by recurrent layers. The convolutional layers extract pertinent features from the input data, while the recurrent layers capture temporal correlations within the feature representations [44]. The architecture thus extends CNNs to handle temporal or sequential features in the data.

5.2.9. Artificial Neural Network

An Artificial Neural Network (ANN) is a computational model that mimics the layout and processes of the biological neural networks present in the human brain. It is composed of connected processing units, or “neurons,” arranged in hierarchical layers, and each neuron processes incoming signals and produces an output signal [56]. ANNs are flexible models used in classification and feature extraction tasks, such as identifying malignant nodules from extracted features [6].

5.3. Hybrid Techniques

In the domain of neural networks, integrating various neural network topologies or other machine learning methods to take advantage of their complementary strengths and enhance overall performance on a given task is commonly referred to as a hybrid technique [29].

5.3.1. Speeded-Up Robust Features Technique with Genetic Algorithm and Feed-Forward Backpropagation Neural Network

In tasks like visual recognition and scene analysis, the Speeded-Up Robust Features (SURF) algorithm can be combined with Genetic Algorithms (GA) and Feed-Forward Backpropagation Neural Networks (FFBPNN) to develop a robust hybrid approach [26]. SURF efficiently identifies local image features, which are then optimized with a GA to improve classification accuracy. This combination enhances the quality of the feature set and reduces computational complexity during lung cancer detection. Backpropagation is the foundation of neural network training, as it allows weights to be adjusted to reduce errors. Hybrid models employing FFBPNNs for lung cancer classification have attained enhanced accuracy via iterative error correction [26].
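The GA-based optimization of a feature set can be sketched as a search over binary masks, where each bit marks whether a feature is kept. Everything specific here is an assumption: the toy fitness function stands in for a classifier's validation score, and the population size, mutation rate, and crossover scheme are illustrative rather than the settings used in [26].

```python
import random


def genetic_feature_selection(n_features, fitness, population_size=20,
                              generations=30, mutation_rate=0.05, seed=0):
    """Search for a high-fitness binary feature mask."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        next_generation = population[:2]  # elitism: keep the two best masks
        while len(next_generation) < population_size:
            # Parents are drawn from the fitter half of the population.
            parent_a, parent_b = rng.sample(population[:population_size // 2], 2)
            cut = rng.randrange(1, n_features)  # single-point crossover
            child = parent_a[:cut] + parent_b[cut:]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_generation.append(child)
        population = next_generation
    return max(population, key=fitness)


# Hypothetical fitness: reward three "informative" features and penalize
# mask size. A real pipeline would score a classifier trained on the mask.
INFORMATIVE = {0, 3, 5}


def toy_fitness(mask):
    hits = sum(1 for i in INFORMATIVE if mask[i])
    return hits - 0.1 * sum(mask)


best_mask = genetic_feature_selection(8, toy_fitness)
```

Because the two best masks are always carried over unchanged, the best fitness in the population never decreases from one generation to the next.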

5.3.2. AlexNet with SoftMax

In the AlexNet architecture, the softmax function is often used as the activation function in the output layer to convert the raw scores generated by the network into probabilities. Thus, in a given classification task, AlexNet can produce a likelihood distribution over various categories [29]. It is well-suited for high-resolution medical images, leveraging deep features for accurate categorization [31].
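The conversion of raw scores into a probability distribution can be written in a few lines. The three logits below are invented values standing in for the output layer of a three-class network.

```python
import math


def softmax(scores):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # hypothetical output-layer scores for three classes
probs = softmax(logits)
predicted = probs.index(max(probs))  # index of the most likely class
```

Softmax preserves the ranking of the scores, so the predicted class is the one with the largest logit, while the probabilities indicate how confident the network is in that choice.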

5.3.3. Convolutional Neural Network with Deep Belief Network

CNNs and DBNs can be integrated to create a powerful hybrid model suitable for some applications, such as object classification, feature extraction, and image classification [57]. A CNN-DBN hybrid integrates the spatial feature extraction capabilities of CNNs with the probabilistic feature learning of DBNs. This combination enhances the detection and classification of lung nodules, addressing challenges like variability in nodule size and shape [52].

6. Problems and Solutions According to the Proposed Taxonomy

This section discusses in detail all state-of-the-art methods in each of the three main sections (ML, DL, and Hybrid) presented in our lung cancer detection taxonomy.

6.1. Predicting Lung Cancer with Machine Learning Approaches

The study in [1] employed some machine learning classification algorithms to predict lung cancer, including Support Vector Machines (SVM), Logistic Regression, Naive Bayes, Decision Trees, K-nearest neighbors (KNN), and Random Forest classifiers. In some instances, multiple classifiers were combined to form voting classifiers for improving prediction accuracy. In order to improve radiologist sensitivity while reducing the impact on interpretation time, Computer-Aided Detection (CAD) systems should be realistically integrated into radiology workflows by directly connecting them to the Electronic Medical Record (EMR) and the Picture Archiving and Communication System (PACS), as shown in Figure 5. Computer-aided diagnostic (CAD) systems were first conceptualized in the 1960s for the analysis of radiographic images, with early applications in the detection of lung nodules using chest X-rays. This system later expanded to CT-based lung cancer detection following the invention of computed tomography in the 1970s, helping clinicians interpret medical images more effectively. Over time, CAD revolutionized radiology by reducing workloads and improving diagnostic precision. The technical difficulty of attaining a high sensitivity with a low false positive (FP) rate is a major obstacle to adoption, as it compromises system interpretability and clinician trust. Comparing and generalizing methods created on private databases is currently hampered by the absence of standardized data sharing among institutions, which is another major obstacle. Lastly, to handle security and regulatory issues and stop possible system attacks, strong software security assurance is required. According to the analysis of individual studies, the system by Demir and Camurcu, which utilized SVM and was based on outer surface texture features for classification, reported the highest sensitivity, making it the best-performing technique based on this core detection metric. 
The basis for this superior performance was the use of outer surface texture features, which were concluded to be useful in increasing sensitivity and decreasing the False Positive rate. Its top performance measures were a sensitivity of 98.03%, a selectivity (specificity) of 87.71%, and a False Positive (FP) rate of 2.45/scan. The researchers used the publicly available LIDC/IDRI dataset which, in this particular study, consisted of 200 CT scans containing 609 nodules; however, as the core classifier was an SVM, the metric of epochs performed was not applicable or reported. The significance of the work is in advancing automated systems for the early detection of lung nodules, which is critical since diagnosis at early stages increases the five-year survival rate to 54.4%. The study identifies key limitations in the field, including the persistent challenges of achieving high sensitivity with a low FP rate, detecting all varieties of nodules (based on size, shape, and position), and developing robust techniques that are successful across different databases.
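For reference, the three detection metrics quoted above follow directly from confusion-matrix counts. The counts in this sketch are invented for illustration and are not the figures from the study discussed.

```python
def detection_metrics(tp, fn, tn, fp, n_scans):
    """Sensitivity, specificity, and false positives per scan.

    tp/fn: nodules correctly detected / missed.
    tn/fp: non-nodule candidates correctly rejected / wrongly flagged.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    fp_per_scan = fp / n_scans
    return sensitivity, specificity, fp_per_scan


# Hypothetical counts for a CAD system evaluated on 20 scans.
sensitivity, specificity, fp_per_scan = detection_metrics(
    tp=90, fn=10, tn=160, fp=40, n_scans=20
)
```

Lowering the decision threshold of a detector usually raises sensitivity but also increases false positives per scan, which is the trade-off noted above as an obstacle to adoption.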

The study [27] discussed how linear discriminant analysis and principal component analysis can be used in data mining to identify lung cancer in patients. The authors first gathered the images needed for training and then analyzed them to determine whether the uploaded images contain any edges. The study, “Early Stage Lung Cancer Prediction Using Various Machine Learning Techniques,” used and compared classification and ensemble models including Support Vector Machine (SVM), K-Nearest Neighbor (K-NN), Random Forest (RF), Artificial Neural Networks (ANN), and a Voting Classifier. According to the study, the Voting Classifier was the best model, chosen primarily because it achieved the highest performance with an Accuracy (ACC) of 99.5%. The reported performance measures are as follows: GBC (PCA) achieved Accuracy, Precision, Recall, and F1-score values of 0.95; and SVC (LDA) achieved Accuracy of 0.95, Precision of 0.95, Recall of 0.94, and F1-score of 0.94. The study notes that its models achieved high accuracy for small datasets. The models were applied to a microarray dataset of a kind frequently obtained from public repositories, though its precise size was not specified. Because the most effective model is an ensemble approach, the number of epochs executed was not specified. The research’s significance lies in providing a relatively more effective and cheaper approach to early-stage lung cancer prediction, which is essential for raising patient survival rates. Important limitations include the general complexity and computational demands of implementing an ensemble model such as the Voting Classifier, K-NN’s sensitivity to outliers and dependence on balanced data, and SVM’s inability to handle noisy and very large datasets. 
The dataset utilized comes from the LIDC-IDRI in DICOM format, which contains 1018 thoracic CT scan cases for a total of 244,527 CT images. Additionally, the study [47] used a Probabilistic Neural Network (PNN) as the primary classifier model and a novel segmentation technique, mainly focusing on a Computer-Aided Diagnosis (CAD) system for lung cancer detection. The segmentation method itself combines traditional optimal thresholding with operations based on the lung region’s centroid and convex edge characteristics. The CAD system using the suggested novel segmentation approach is regarded as the best model since it greatly enhanced the system’s diagnostic capabilities. The suggested approach achieved 97% accuracy, which is a significant improvement over the 88.5% accuracy of a system that relied solely on optimal thresholding. This conclusion is based on the diagnostic accuracy measure. Because the PNN can train quickly and ensure convergence to an ideal classifier, the model is significant because it can accurately segment the lungs even in the presence of peripheral pathology-bearing regions (PBRs), something that the traditional method was unable to do. Although the size of the image dataset and the number of epochs performed are not explicitly stated, the training set size consisted of 564 manually labeled Regions of Interest (ROIs) from a Chest CT Image Dataset that included JPEG images with 512 × 512 pixel resolution. The main drawback mentioned is that segmentation in asymmetric lungs is still difficult. The authors of [32] used five deep learning CNN architectures—GoogleNet, SqueezeNet, DenseNet, ShuffleNet, and MobileNetV2—to categorize CT lung tumors into benign and malignant groups, which were then compared. According to the study, the best CNN architecture for this classification task is GoogleNet. 
This model was regarded as the best due to its design, which successfully reduces the computational bottleneck and facilitates smooth operation by introducing an inception module to shrink dimensions. Accuracy, specificity, sensitivity, and Area Under the Curve (AUC) were used to gauge the model’s predictive performance. With an accuracy of 94.53%, specificity of 99.06%, sensitivity of 65.67%, and AUC of 86.84%, GoogleNet produced the best results. The Lung Image Database Consortium image collection (LIDC-IDRI), the biggest online lung CT dataset available, served as the dataset for the experiment. A total of 1646 datasets were gathered from the LIDC-IDRI database, resized to 244 × 244 pixels, and divided into 70% training and 30% testing. The Max Epochs parameter was set to 20 during the training process. The final drawback of GoogleNet is that more research is needed to increase the classification accuracy of lung lesions in CT images, even though the inception module contributes to its high accuracy and computational efficiency. The study in [41] assesses three approaches that combine Dimensionality Reduction (PCA, LDA), Feature Selection (Genetic Algorithm), and Deep Learning (CNN, ANN) with conventional Machine Learning classifiers (Gradient Boosting, Random Forest, Support Vector Machine). Because they both obtained the highest accuracy of 95% on the test set, the best models are the Support Vector Classification (SVC) on the LDA dataset and the Gradient Boosting Classification (GBC) on the PCA dataset. SVC (LDA) obtained Accuracy, Precision, Recall, and F1-score of 0.95, while GBC (PCA) obtained Accuracy, Precision, Recall, and F1-score of 0.95. The dataset is made up of 364 actual CT scan pictures from a Tehran, Iran, hospital. It is divided into two sets: 324 training images and 40 test images. For neural network models, the number of epochs executed was not specified explicitly. 
Although the main limitation is the recommendation for future work to improve the accuracy by possibly changing the order of feature selection and dimensional reduction, the significance lies in the novel framework of converting CT images to numerical features, applying dimensional reduction, and feature selection for the first time in this context. A comparison of accuracy in different ML models is shown in Figure 6.
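The accuracy, precision, recall, and F1-score values compared in this section are all derived from the same pairs of true and predicted labels. The sketch below shows the computation on a made-up set of eight predictions; the labels are not data from any of the cited studies.

```python
def classification_scores(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1-score for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 1 = malignant, 0 = benign.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
scores = classification_scores(y_true, y_pred)
```

On imbalanced datasets accuracy alone can be misleading, which is why studies that also report precision, recall, and F1-score are easier to compare.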

The advancements in machine learning and computer-aided detection (CADe) systems have significantly enhanced the accuracy and efficiency of lung cancer detection and classification. Various algorithms, including Support Vector Machines, Logistic Regression, Decision Trees, and deep learning architectures such as GoogleNet and CNNs, have demonstrated exceptional predictive performance. Segmentation methods like thresholding, region growing, and morphological filtering are essential for preprocessing to guarantee accurate nodule detection and decrease computational demands. Research emphasizes that the diagnostic accuracy can be enhanced by combining multiple classifiers into voting systems and by incorporating advanced segmentation methods. Moreover, improving image quality with the help of novel preprocessing methods results in superior CADe systems. Lesion detection and classification in CT scans are significantly enhanced by the use of deep learning architectures, exemplified by models such as GoogleNet that attain outstanding accuracy and specificity. The recent developments emphasize how crucial it is to incorporate machine learning and image processing methods into lung cancer diagnosis. Researchers are tackling difficulties related to early detection by utilizing advanced algorithms and comprehensive datasets. This work enhances patient outcomes and lays the groundwork for precision medicine. The promising outcomes achieved through these approaches, as indicated in Figure 6 and Table 3 and Table 4, suggest that lung cancer management could be revolutionized and that the struggle against this worldwide health problem could lead to a brighter future.

6.2. Recognizing Lung Cancer Using Deep Learning Approaches

In this section, we describe the most important DL techniques used for lung cancer detection. The work in [33] detailed a model for CT image classification utilizing the ODNN and LDA Model, which integrates an Improved Gravitational Search Algorithm for recognition and Linear Discriminant Analysis (LDA) for classifying lung nodules. This combined model was determined to be the best-performing, achieving superior results compared to other tested methods, including KNN, neural networks, deep neural networks (DNN), and SVM. The basis for its selection lies in its high performance metrics: Accuracy of 94.56%, Sensitivity of 96.2%, and Specificity of 94.2%. The model was analyzed using a Standard CT Database comprising 50 low-dosage lung cancer CT images and the UCI ML database containing 1000 images (500 infected and 500 non-infected). The significance of this approach is its success in combining optimization and classification to surpass existing benchmarks, although the paper does not specify its limitations or the number of epochs used. The paper also referenced the TCIA dataset (1449 images), used with a Deep Screener algorithm in related work. By using feature extraction, the study [50] eliminated non-nodule cancer images. Multilayer perceptrons reached 95% accuracy, but at the cost of increased time complexity. Researchers have explored various techniques for classifying different types of cancer. For instance, a Deep Convolutional Neural Network achieved classification accuracies of 89.0% for adenocarcinoma, and 60.0% and 70.3% for squamous and large cell carcinoma, respectively. Meanwhile, the authors of [51] employed the Support Vector Machine (SVM) algorithm for classification, integrated into an image processing method that first performs a Discrete Wavelet Transform (DWT) for feature extraction. 
This SVM-based approach is considered the leading solution because it enhances the sensitivity and accuracy of Computer-Aided Detection (CAD) systems by overcoming the deficiencies of other models, such as genetic algorithms (prone to blind mutation) and fuzzy logic (reliant on approximation). The model achieved 95.16% Accuracy, 98.21% Sensitivity, and 78.69% Specificity on the LIDC dataset, which consists of 1018 thoracic CT scan cases and a total of 244,527 CT images. The primary limitation cited in the study is the challenge posed by complex image processing techniques and the insufficiency of training images. The number of epochs is not applicable to the SVM algorithm used. The paper [42] presented a lung cancer recognition approach using Gray Level Co-occurrence Matrix (GLCM) features and an Artificial Neural Network with a back-propagation algorithm. The study utilized 50 CT scan images obtained from the Cancer Imaging Archive Database. The methodology involved several steps: image pre-processing, segmentation, feature extraction, and the application of a three-layer back-propagation neural network for identifying tumor growth. The results indicated that this framework achieved over 80% accuracy in differentiating between lung cancer and healthy lung tissue. A review of DL techniques for lung cancer diagnosis with computed tomography imaging is given in [58]. A vision transformer has also been combined with a CNN to detect lung cancer effectively [59]. Additionally, the research in [43] primarily used a Convolutional Neural Network (CNN) model designed to recognize and categorize three types of lung tissue: benign tissue, adenocarcinoma, and squamous cell carcinoma. This model was considered the best as it demonstrated greater accuracy than other pre-trained CNN models. 
The model’s performance measures include a Training Accuracy of 98.15% and a Validation Accuracy of 98.07%, with a Monte Carlo Average and Weighted Average of 0.99. The study used the LC25000 Lung and colon histopathological image dataset, utilizing a subset of 15,000 images (split for training, validation, and testing), but the number of epochs performed was not specified. The significance of the model is that it provides a feasible, efficient, and highly accurate method to help pathologists identify lung cancer with less time, effort, and cost. A potential limitation is that other CNN architectures and hyperparameter settings remain unexplored and are left to future work. Samples from the lung dataset are shown in Figure 7.

The research in [54] utilized and compared five Transfer Learning (TL) architectures based on Convolutional Neural Networks (CNNs): MobileNet, VGG16, VGG19, DenseNet-201, and ResNet-101, to classify lung CT scans as normal, malignant, or benign. The DenseNet-201 model was selected as the best because it achieved the highest accuracy among the five compared models in the experiments. Its reported performance measures are a mean Accuracy of 53%, a mean Recall of 43%, a mean Precision of 43%, and a mean F1-score of 43%. The study used a dataset of 1100 lung CT scans, but the number of epochs performed was not specified. The model’s significance lies in leveraging the high performance of pre-trained TL architectures for early lung cancer detection. The main limitation is that the current model only processes a single CT scan slice, and future research should examine the lungs from all angles, potentially using a 3D CNN. On the other hand, the work in [55] addressed pre-processing challenges related to the fluctuation in gray levels and relative contrast in CT images, which affect segmentation accuracy. To address problems related to intensity variation, the authors used a rule-based thresholding classifier in a pre-processing step. This classifier proved effective in recognizing cytoplasmic and nuclear areas, removing debris, and establishing optimal threshold values. The method achieved high sensitivity (83%), specificity (99%), and overall accuracy (98%). A diverse range of lung nodules is shown in Figure 8.

The technique used in [60] utilizes an SVM to classify nodules as benign or malignant, using watershed segmentation to identify the cancerous nodule in the lung CT scan image. The accuracy of the classifier is 86.6%, while the proposed model identifies cancer with 92% accuracy. Meanwhile, the study [44] proposed a novel deep learning model called Lung-RetinaNet, which is an improved RetinaNet architecture. The model was considered the best because its new structure incorporates a multi-scale feature fusion block, dilated convolution, and a contextual information module to enhance the localization and efficient detection of tiny lung tumors, significantly surpassing existing techniques. The reported performance measures are exceptionally high: Accuracy of 99.8%, Recall of 99.3%, Precision of 99.4%, F1-score of 99.5%, and AUC of 0.989. The study used a fused CT scan dataset for training and testing, but the exact size and the number of epochs performed were not explicitly specified. The significance of the model is its ability to achieve highly accurate detection of tiny lung tumors and to overcome the critical issue of class imbalance. The primary limitations include the model’s restriction to limited input image resolution and the inherent difficulty in distinguishing tiny tumors from background tissues. The research [61] compared a Custom CNN model and a VGG16 model with Transfer Learning (TL), determining that VGG16 with TL was the best model due to its highest overall accuracy. Its performance measures were: Accuracy 96.05%, Precision 96.06%, Recall/Sensitivity 96.07%, F1-Score 96.06%, and AUC 96.07%. The model was trained for 50 epochs using the public Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI) dataset, which contains 1114 CT scan images. 
The model’s limitation is that it only performs 2D classification on single CT slices, thereby failing to leverage the complete 3D volumetric information, but its significance is its proven ability to enhance lung cancer detection accuracy and reliability for clinical decision-making. The research in [62] utilized a Genetic Folding Strategy (GFS)-based Support Vector Machine (SVM) and compared it against several other models, including three standard SVM kernels (linear, polynomial, and radial basis function), Random Forest, Logistic Regression, Linear Regression, Gaussian Naive Bayes, Gradient Boosting, K-Neighbors, AdaBoost, and Quadratic Discriminant Analysis. According to the study, the proposed GFS model was the best, achieving the highest accuracy of 96.2% based on performance evaluations conducted on a real lung cancer dataset sourced from Kaggle repositories. The superiority of the GFS model was established based on its highest accuracy, 96.2%, an AUC of 97% from the ROC curve, a low mean squared error, a standard deviation of 2.03, and a favorable trade-off in complexity and testing time, outperforming the next best models, Random Forest (95.8% accuracy) and Quadratic Discriminant Analysis (96.1% accuracy). The dataset initially contained 309 patients (33 excluded due to null values), resulting in a final size of 276 instances (39 benign, 270 malignant), and the experiments were conducted over 300 generations with a population size of 50, using 5-fold cross-validation. 
Despite its high performance, the GFS model has limitations: increased computational complexity and longer training times due to its evolutionary approach, potential overfitting on smaller datasets, and the need for careful hyperparameter tuning. Nonetheless, it provides a highly accurate, novel AI-driven tool for lung cancer classification that can assist in early diagnosis and better patient stratification for targeted therapies, showcasing the potential of evolutionary algorithms in healthcare analytics. Lung cancer epidemiology and future directions are described in [63].
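
The 5-fold cross-validation reported for [62] can be illustrated with a minimal sketch. It shows only how 276 instances might be partitioned into five folds; the function name `k_fold_indices` is hypothetical, and the sketch does not reproduce the GFS training procedure or the authors' actual fold assignment.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs for plain k-fold cross-validation.

    Illustrative only: indices are taken in order, so real data should be
    shuffled (or stratified by class) before calling this function.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds receive one additional sample.
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


# 276 instances, as in the cleaned dataset described for [62].
folds = list(k_fold_indices(276, k=5))
fold_sizes = [len(test_idx) for _, test_idx in folds]
```

Each instance appears in exactly one test fold, so every model in the comparison is scored on data it did not see during training; the reported accuracy would then be the mean over the five folds.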

Support Vector Machines (SVMs) gained significant interest among pattern recognition researchers in the early 21st century. Lung cancer detection, a major research focus, has involved various methods and algorithms, including SVM and Linear Discriminant Analysis (LDA), among others. In the study [56], the authors proposed an Artificial Neural Network (ANN) model that diagnosed lung cancer with a success rate of 96.67%. After 1,418,105 learning cycles, the model achieved a training error rate of less than 1%. Moreover, the study identified age as the most influential factor in predicting lung cancer. An accuracy comparison of various DL techniques is shown in Figure 9. Table 5 summarizes performance measures, and Table 6 summarizes the datasets used in the above DL models.
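
The performance measures quoted throughout this subsection (accuracy, precision, recall/sensitivity, specificity, and F1-score) all derive from the same binary confusion matrix. The sketch below shows the standard definitions; the class and function names are illustrative, and the example counts are hypothetical rather than taken from any cited study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts, with malignant as the positive class."""
    tp: int  # malignant cases correctly flagged
    fp: int  # benign cases wrongly flagged as malignant
    tn: int  # benign cases correctly cleared
    fn: int  # malignant cases missed


def classification_metrics(c):
    """Return the measures most often reported by the surveyed studies."""
    total = c.tp + c.fp + c.tn + c.fn
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    specificity = c.tn / (c.tn + c.fp) if (c.tn + c.fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (c.tp + c.tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,  # also called sensitivity
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts for illustration only.
example = ConfusionCounts(tp=90, fp=5, tn=95, fn=10)
metrics = classification_metrics(example)
```

Because accuracy pools both classes, it can stay high on an imbalanced dataset even when recall is low, which is one reason the studies above report several measures side by side.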

6.3. Lung Cancer Prediction Using Hybrid Techniques

A detailed overview of the characteristics of SIFT (Scale-Invariant Feature Transform) and SURF (Speeded-Up Robust Features), which builds upon the SIFT algorithm, is presented in [26]. Their study utilized 827 images, achieving a 96% success rate in detecting nodules, with no false positives reported. It can thus be inferred that SURF uses the classification results to compute more efficient features. The authors used a Genetic Algorithm (GA) to optimize the features extracted with the SURF method. For classification, they employed a Support Vector Machine (SVM) to separate nodules into two classes, benign and malignant. The proposed KASC hybrid algorithm combined SURF feature extraction, GA-based optimization, and an SVM with a polynomial kernel, in conjunction with a Feed-Forward Backpropagation Neural Network (FFBPNN). When this hybrid method was applied to a dataset comprising 500 CT image samples, it yielded average precision, accuracy, recall, and F-measure values of 98.17%, 98.08%, 96.5%, and 97%, respectively. Nonetheless, some questions require further examination, including a comparison of Deep Neural Networks with FFBPNN to investigate possible gains in classification accuracy. Various image operations can be applied to support image analysis; some of these operations are shown in Figure 10.
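
The GA-based feature optimization step described for [26] can be sketched as a search over binary feature masks. This is a generic genetic algorithm under stated assumptions, not the authors' implementation: the function `evolve_feature_mask`, the operators (tournament selection, one-point crossover, bit-flip mutation, elitism), and the toy fitness function are all hypothetical. In a real pipeline the fitness would be something like the validation accuracy of the SVM trained on the selected SURF descriptors.

```python
import random


def evolve_feature_mask(n_features, fitness, generations=30, pop_size=20,
                        mutation_rate=0.05, seed=0):
    """Search for a 0/1 feature mask that maximizes `fitness` with a simple GA.

    Requires n_features >= 2 and pop_size >= 2.
    """
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    history = [fitness(best)]
    for _ in range(generations):
        next_gen = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(next_gen) < pop_size:
            # Tournament selection: the fitter of two random individuals wins.
            parent_a = max(rng.sample(population, 2), key=fitness)
            parent_b = max(rng.sample(population, 2), key=fitness)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = parent_a[:cut] + parent_b[cut:]
            # Bit-flip mutation toggles each gene with a small probability.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            next_gen.append(child)
        population = next_gen
        best = max(population, key=fitness)
        history.append(fitness(best))
    return best, history


# Toy fitness (hypothetical): reward three "informative" feature indices
# and penalize every selected feature to favor compact subsets.
INFORMATIVE = {0, 3, 7}


def toy_fitness(mask):
    hits = sum(mask[i] for i in INFORMATIVE)
    return hits - 0.1 * sum(mask)


best_mask, history = evolve_feature_mask(n_features=12, fitness=toy_fitness)
```

Elitism guarantees that the best fitness never decreases from one generation to the next, which makes the search easy to monitor; the selected mask would then determine which features are passed to the classifier.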

The research [64] used a hybrid Convolutional Neural Network–Support Vector Machine (CNN-SVM) model to classify lung CT images into four categories: adenocarcinoma, large cell carcinoma, squamous cell carcinoma, and normal. This hybrid model was identified as the best in their study, achieving a testing accuracy of 97.91%, an average sensitivity of 97.90%, specificity of 99.32%, precision of 97.96%, and an Area Under the Curve (AUC) of 1.000. The study employed the Chest CT-Scan Images Dataset, initially comprising 1000 images, which was expanded to 5103 images via data augmentation using color transformation techniques, and the model was trained for 60 epochs. A limitation of the proposed CNN-SVM is its potential dependency on the specific augmented dataset used; its generalizability to other medical image types or larger, more diverse datasets requires further validation. The significance of the model lies in its demonstrated capability to enhance the accuracy of computer-aided detection (CAD) systems for early lung cancer diagnosis, combining the feature extraction strength of a CNN with the classification efficiency of an SVM to process medical images effectively even with a moderately sized dataset. The research in [52] proposed a hybrid deep learning model called Cancer Cell Detection and Classification using Hybrid Neural Network (CCDC-HNN), which integrates an advanced 3D Convolutional Neural Network (3D-CNN) with a Recurrent Neural Network (RNN), specifically a Bidirectional LSTM, and identified this hybrid as the best model in their study. 
The CCDC-HNN model was deemed superior based on its performance in classifying lung nodules, achieving an accuracy (ACC) of 95%, a sensitivity (SE) of 87%, and a specificity (SP) of 90%, outperforming standalone 3D-CNN and RNN models. The study employed the LUNA16 dataset, a subset of the LIDC-IDRI database, comprising 888 patients split into 710 for training and 178 for validation. The model was trained in two stages involving the locking and unlocking of CNN layers, though the total number of epochs was not stated. A significant limitation of the proposed model is its computational complexity and the substantial resources required to train 3D-CNNs on volumetric CT data. Its significance lies in its potential for early and accurate diagnosis of lung cancer, distinguishing benign from malignant tumors through a hybrid approach that leverages both spatial and sequential feature learning. The research [3] presented ViT-DCNN, a hybrid deep learning architecture that integrates a Vision Transformer and a Deformable CNN to enhance the detection of lung and colon cancers in histopathological images. The model was trained and evaluated on the Lung and Colon Cancer Histopathological Images Dataset, which contains 25,000 images divided into five classes. After data splitting and augmentation, the model achieved an accuracy of 94.24%, precision of 94.37%, and an F1-score of 94.23% on the test set, surpassing numerous other well-known models. 
The research [29] utilized and compared several deep learning models, including LeNet, AlexNet, and VGG-16, and identified the hybrid AlexNet with a softmax classifier as the best model based on its highest achieved accuracy, attaining a performance of 99.52% accuracy, a precision of 99.203%, a recall of 88.265%, an F1-Score of 93.416%, and a loss of 0.649%, thereby outperforming other configurations like AlexNet+SVM and AlexNet+Deep kNN. The study employed a custom dataset of 500 CT images (250 cancer and 250 normal), which was augmented to improve training; however, the specific number of training epochs was not explicitly stated. A significant limitation of this model is its development and validation on a very small dataset, which raises concerns about its generalizability and robustness to more diverse and larger clinical datasets. The significance of the model lies in its demonstration of how a classic architecture like AlexNet, when combined with a softmax layer, can achieve exceptionally high accuracy on a specific task, presenting a potentially efficient and sustainable tool for assisting in the early detection of lung cancer from CT scans. The research [57] examined various deep learning models for lung nodule Computer-Aided Diagnosis (CAD) from CT scans, primarily focusing on Convolutional Neural Network (CNN) and Deep Belief Network (DBN)-based methods. The best model, representing one of the reviewed classification schemes, was selected on the basis of achieving the highest overall accuracy, with its key performance measures including a maximum Accuracy of 97.58%, and some models achieving high Sensitivity scores such as 98.1%, 97.19%, 94.19%, and 90.70%. The dataset used and discussed in the paper is the Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI). 
The main limitations identified for these models are the insufficient annotated datasets and over-fitting issues, as well as the need to improve the interpretability of neural network models. The significance of the research lies in its comprehensive review of deep learning applications for lung nodule detection and diagnosis, guiding future research in developing more robust CAD systems. An accuracy comparison of different hybrid models is shown in Figure 11. Table 7 and Table 8 summarize the performance measures and datasets used in the above hybrid technique studies.
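
The 710/178 partition of the 888 LUNA16 patients reported for [52] corresponds to roughly an 80/20 split. A minimal sketch of a patient-level split follows; the 0.8 fraction, the random seed, the helper name `split_patients`, and the placeholder patient IDs are assumptions for illustration, since the source does not describe how the split was drawn.

```python
import random


def split_patients(patient_ids, train_fraction=0.8, seed=42):
    """Split patient IDs into train and validation sets.

    Splitting by patient (not by slice or nodule) keeps all images from one
    patient on the same side, which avoids leakage between the two sets.
    """
    ids = sorted(patient_ids)          # fixed starting order for reproducibility
    random.Random(seed).shuffle(ids)   # seeded shuffle (assumed, not from [52])
    n_train = int(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


# Placeholder identifiers standing in for the 888 LUNA16 patients.
patient_ids = [f"patient_{i:03d}" for i in range(888)]
train_ids, val_ids = split_patients(patient_ids)
```

With 888 patients and a 0.8 fraction, the sketch yields 710 training and 178 validation patients, matching the counts quoted above.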

Figure 12 shows the sensitivity and specificity, and Table 9 summarizes the significant research works focused on lung cancer detection, presenting their techniques, limitations, evaluation criteria, and datasets.
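
Forest plots of sensitivity and specificity usually pair each point estimate with a confidence interval. As a hedged illustration of how such an interval can be obtained, the sketch below computes a Wilson score interval for a proportion; whether Figure 12 uses this particular interval is not stated in the text, and the counts in the example are hypothetical.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    p = successes / trials
    denom = 1 + z ** 2 / trials
    centre = (p + z ** 2 / (2 * trials)) / denom
    half_width = (z * math.sqrt(p * (1 - p) / trials
                                + z ** 2 / (4 * trials ** 2))) / denom
    return centre - half_width, centre + half_width


# Hypothetical example: 87 of 100 malignant cases detected (sensitivity 0.87).
low, high = wilson_interval(87, 100)
# The same proportion measured on ten times as many cases.
low_large, high_large = wilson_interval(870, 1000)
```

The interval narrows as the number of evaluated cases grows, which is why sensitivity figures obtained on small test sets should be compared with caution.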

7. Discussion and Future Directions

The findings of this study emphasize the transformative role of machine learning (ML) and deep learning (DL) techniques in lung cancer prediction and diagnosis. With accuracy rates ranging from 90% to 99.5%, these computational approaches demonstrate their capability to effectively detect and classify lung nodules using CT imaging data. Notable algorithms such as Support Vector Machines (SVM), Convolutional Neural Networks (CNN), and hybrid models that incorporate advanced feature extraction techniques, including Speeded-Up Robust Features (SURF) and Gray Level Co-occurrence Matrix (GLCM), have proven to be highly effective. By automating key processes like nodule detection and classification, these methodologies offer the potential to minimize human error, expedite diagnoses, and significantly improve patient outcomes. Despite the promising advancements, certain limitations persist. The reliance on relatively small datasets for training and validation raises concerns about the generalizability of these models across diverse populations. Integrating Computer-Aided Diagnosis (CAD) systems focused on lung cancer into radiology workflows offers significant potential by acting as a “second pair of eyes,” flagging suspicious nodules and potentially reducing diagnostic time and human error. However, several barriers impede their realistic integration. Regulatory hurdles are a major challenge, requiring rigorous validation to prove safety and efficacy before a system can be used clinically. Additionally, the “black box” nature of many deep learning models creates an interpretability problem, as radiologists need to understand a system’s reasoning to trust its recommendations. Finally, concerns around patient data sharing and privacy, governed by regulations like HIPAA and GDPR, make it difficult to amass the large, diverse datasets necessary to train generalizable and robust models. 
Addressing these issues—through clearer regulatory pathways, advancements in Explainable AI (XAI), and innovative data-sharing frameworks—is crucial for moving these promising technologies from research to practical clinical application. Additionally, some hybrid approaches and deep learning algorithms encounter challenges in detecting subtle anomalies or addressing the heterogeneity of nodules, particularly in early-stage cancers. Computational complexity further restricts the feasibility of deploying these models in resource-limited healthcare environments, where simplicity and efficiency are paramount. These limitations highlight the need for robust solutions to bridge the gap between theoretical effectiveness and practical application. Future research should address these challenges by focusing on several key areas. Expanding datasets to include larger and more diverse samples can enhance the reliability and generalizability of diagnostic models. Optimizing algorithms to balance accuracy with computational efficiency is also critical to their widespread adoption, especially in resource-constrained settings. In addition, validation of these techniques in real-world clinical environments will ensure that the proposed methods are not only accurate but also practical and effective when integrated into existing workflows. Finally, exploring the seamless integration of these tools with clinical decision-making processes will enable real-time application, thereby supporting healthcare professionals in making timely and informed diagnoses. This study highlights the immense potential of machine learning and deep learning techniques in revolutionizing lung cancer detection and diagnosis. The high accuracy rates achieved through various algorithms underscore their value in enhancing early detection, which is crucial to improving survival rates. 
However, addressing limitations such as dataset size, algorithmic complexity, and the lack of real-world validation is essential for realizing their full potential. The following are the main future research directions that can help improve lung cancer detection and prediction.

7.1. Limited Dataset Diversity

Models are frequently trained on datasets that lack diversity or are limited in size (e.g., LIDC-IDRI), which restricts their generalizability [1,26,41,44].

7.2. Early-Stage Cancer Detection

Nodules that are small or subtle in the initial phase of cancer are more difficult to identify, resulting in reduced sensitivity [24,55,57].

7.3. Computational Complexity

High computational resources are required for deep learning models, which makes it difficult to deploy them in low-resource settings [44,51,54].

7.4. Real-World Validation

A number of models are not validated in clinical settings that involve real-time data [18,44,65].

7.5. Integration of Multi-Modal Data

The majority of studies concentrate on CT scans; accuracy could be enhanced by incorporating genomics, patient history, or other imaging modalities [18,24,44].

8. Conclusions

To sum up the relevant literature focused on advanced methods and techniques for lung cancer detection, this survey reviewed the reported effectiveness of deep and machine learning approaches designed to detect and classify lung cancer, mainly based on CT scan data. Several algorithms, such as SVM, CNN, and hybrid models, have demonstrated high accuracy rates in identifying nodules and distinguishing between benign and malignant lesions. Combining classification algorithms with feature extraction techniques such as SURF and GLCM can further enhance prediction accuracy. The presented results indicate that these methods could assist radiologists in identifying and diagnosing patients earlier, thereby enhancing their outcomes. Future research may focus on refining existing algorithms, exploring additional feature extraction methods, and validating the reported results on larger and multi-modal datasets to further enhance the performance of lung cancer prediction models.

Author Contributions

Conceptualization, F.U.N. and A.K.M.; methodology, A.B.Z., F.U.N. and A.K.M.; software, A.K.M. and N.Q.; validation, F.U.N. and A.K.M.; formal analysis, A.B.Z. and F.U.N.; investigation, A.B.Z. and F.U.N.; resources, A.K.M. and N.Q.; writing—original draft preparation, A.B.Z.; writing—review and editing, F.U.N., A.K.M. and N.Q.; visualization, A.B.Z. and A.K.M.; supervision, A.K.M. and N.Q.; project administration, F.U.N. and A.K.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

We confirm that neither the manuscript nor any parts of its content are currently under consideration for publication with or published in another journal. All authors have approved the manuscript and agree with its submission to LabMed.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. ur Rehman, M.Z.; Javaid, M.; Shah, S.I.A.; Gilani, S.O.; Jamil, M.; Butt, S.I. An appraisal of nodules detection techniques for lung cancer in CT images. Biomed. Signal Process. Control 2018, 41, 140–151.
2. Ausawalaithong, W.; Thirach, A.; Marukatat, S.; Wilaiprasitporn, T. Automatic lung cancer prediction from chest x-ray images using the deep learning approach. In Proceedings of the 2018 11th Biomedical Engineering International Conference (BMEiCON), Chiang Mai, Thailand, 21–24 November 2018; pp. 1–5.
3. Saleh, A.Y.; Chin, C.K.; Penshie, V.; Al-Absi, H.R.H. Lung cancer medical image classification using hybrid CNN-SVM. Int. J. Adv. Intell. Inform. 2021, 7, 151–162.
4. Patra, R. Prediction of lung cancer using machine learning classifier. In Computing Science, Communication and Security, Proceedings of the First International Conference, COMS2 2020, Gujarat, India, 26–27 March 2020; Springer: Singapore, 2020; pp. 132–142.
5. Tidke, D.; Banait, S.S. Harnessing deep learning for lung cancer detection and prevention: A comprehensive survey. EPJ Web Conf. 2025, 328, 01007.
6. Samhitha, B.K.; Mana, S.C.; Jose, J.; Vignesh, R.; Deepa, D. Prediction of lung cancer using convolutional neural network (CNN). Int. J. Adv. Trends Comput. Sci. Eng. 2020, 9, 3361–3365.
7. Luo, G.; Zhang, Y.; Etxeberria, J.; Arnold, M.; Cai, X.; Hao, Y.; Zou, H. Projections of lung cancer incidence by 2035 in 40 countries worldwide: Population-Based study. JMIR Public Health Surveill. 2023, 9, e43651.
8. Jayaram, J.; Haw, S.-C.; Palanichamy, N.; Anaam, E.; Thillaigovindhan, S.K. A systematic review on effectiveness and contributions of machine learning and deep learning methods in lung cancer diagnosis and classifications. Int. J. Comput. Digit. Syst. 2025, 17, 1–12.
9. Bhise, S.S.; Khot, S.R. Early Stage Lung Cancer Diagnosis using ANN Classifier. In Proceedings of the 2021 International Conference on Artificial Intelligence and Smart Systems (ICAIS), Coimbatore, India, 25–27 March 2021; pp. 22–27.
10. Afroze, F.; Nishat, L.; Arjuman, F.; Yesmin, Z.A.; Nahar, L.; Tanjin, R. Expression of Ki-67 and E-cadherin in patients with non-small cell lung cancer attending a tertiary care hospital. Bangabandhu Sheikh Mujib Med. Univ. J. 2023, 16, 1–6.
11. Firdaus, Q.; Sigit, R.; Harsono, T.; Anwar, A. Lung cancer detection is based on ct-scan images with detection features using gray-level co-occurrence matrix (GLCM) and support vector machine (SVM) methods. In Proceedings of the 2020 International Electronics Symposium (IES), Surabaya, Indonesia, 29–30 September 2020.
12. Nadkarni, N.S.; Borkar, S. Detection of Lung Cancer in CT Images using Image Processing. In Proceedings of the 2019 3rd International Conference on Trends in Electronics and Informatics (ICOEI), Tirunelveli, India, 23–25 April 2019.
13. Pawar, A.B.; Jawale, M.A.; William, P.; Chhabra, G.S.; Rakshe, D.S.; Korde, S.K.; Marriwala, N. Implementation of blockchain technology using extended CNN for lung cancer prediction. Meas. Sens. 2022, 24, 100530.
14. Yu, C.; Fan, C.-Q.; Chen, Y.-X.; Guo, F.; Rao, H.-H.; Che, P.-Y.; Zuo, C.-J.; Chen, H.-W. Global research trends and emerging hotspots in nano-drug delivery systems for lung cancer: A comprehensive bibliometric analysis (1998–2024). Discov. Oncol. 2025, 16, 33.
15. Wang, Y.; Li, J.; Chang, S.; Dong, Y.; Che, G. Risk and influencing factors for subsequent primary lung cancer after treatment of breast cancer: A systematic review and two meta-analyses based on four million cases. J. Thorac. Oncol. 2021, 16, 1893–1908.
16. Wade, S.; Ngo, P.; He, Y.; Caruana, M.; Steinberg, J.; Luo, Q.; David, M.; McWilliams, A.; Fong, K.; Canfell, K.; et al. Estimates of the eligible population for Australia’s targeted National Lung Cancer Screening Program, 2025–2030. Public Health Res. Pract. 2024, 35, 2025–2030.
17. Abugabah, A.; AlZubi, A.A.; Al-Obeidat, F.; Alarifi, A.; Alwadain, A. Data mining techniques for analyzing healthcare conditions of urban space-person lung using meta-heuristic optimized neural networks. Clust. Comput. 2020, 23, 1781–1794.
18. Chiu, H.-Y.; Chao, H.-S.; Chen, Y.-M. Application of artificial intelligence in lung cancer. Cancers 2022, 14, 1370.
19. Hatuwal, B.K.; Thapa, H.C. Lung cancer detection using convolutional neural network on histopathological images. Int. J. Comput. Trends Technol. 2020, 68, 21–24.
20. Luo, Q.; Yu, X.Q.; Wade, S.; Caruana, M.; Pesola, F.; Canfell, K.; O’Connell, D.L. Lung cancer mortality in Australia: Projected outcomes to 2040. Lung Cancer 2018, 125, 68–76.
21. Makaju, S.; Prasad, P.W.C.; Alsadoon, A.; Singh, A.K.; Elchouemi, A. Lung Cancer Detection using CT Scan Images. Procedia Comput. Sci. 2018, 125, 107–114.
22. Malvezzi, M.; Santucci, C.; Boffetta, P.; Collatuzzo, G.; Levi, F.; La Vecchia, C.; Negri, E. European cancer mortality predictions for the year 2023 with a focus on lung cancer. Ann. Oncol. 2023, 34, 410–419.
23. Kanan, M.; Alharbi, H.; Alotaibi, N.; Almasuood, L.; Aljoaid, S.; Alharbi, T.; Albraik, L.; Alothman, W.; Aljohani, H.; Alzahrani, A.; et al. AI-driven models for diagnosing and predicting outcomes in lung cancer: A systematic review and meta-analysis. Cancers 2024, 16, 674.
24. Thanoon, M.A.; Zulkifley, M.A.; Mohd Zainuri, M.A.A.; Abdani, S.R. A review of deep learning techniques for lung cancer screening and diagnosis based on CT images. Diagnostics 2023, 13, 2617.
25. Bharati, S.; Podder, P.; Mondal, R.; Mahmood, A.; Raihan-Al-Masud, M. Comparative performance analysis of different classification algorithm for the purpose of prediction of lung cancer. In Intelligent Systems Design and Applications, Proceedings of the 18th International Conference on Intelligent Systems Design and Applications (ISDA 2018), Vellore, India, 6–8 December 2018; Springer International Publishing: Cham, Switzerland, 2019; pp. 447–457.
26. Nanglia, P.; Kumar, S.; Mahajan, A.N.; Singh, P.; Rathee, D. A hybrid algorithm for lung cancer classification using SVM and Neural Networks. ICT Express 2021, 7, 335–341.
27. Thallam, C.; Peruboyina, A.; Raju, S.S.T.; Sampath, N. Early-stage lung cancer prediction using various machine learning techniques. In Proceedings of the 2020 4th International Conference on Electronics, Communication and Aerospace Technology (ICECA), Coimbatore, India, 5–7 November 2020.
28. Pradhan, A.; Sarma, B.; Dey, B.K. Lung Cancer Detection using 3D Convolutional Neural Networks. In Proceedings of the 2020 International Conference on Computational Performance Evaluation (ComPE), Shillong, India, 2–4 July 2020; pp. 765–770.
29. Subramanian, R.R.; Mourya, R.N.; Reddy, V.P.T.; Reddy, B.N.; Amara, S. Lung cancer prediction using deep learning framework. Int. J. Control Autom. 2020, 13, 154–160.
30. Shahadat, N.; Lama, R.; Nguyen, A. Lung and colon cancer detection using a deep AI model. Cancers 2024, 16, 3879.
31. Hosni, M.; Carrillo-de-Gea, J.M.; Idri, A.; Fernandez-Aleman, J.L.; Garcia-Berna, J.A. Using ensemble classification methods in lung cancer disease. In Proceedings of the 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Berlin, Germany, 23–27 July 2019.
32. Ashhar, S.M.; Mokri, S.S.; Rahni, A.A.A.; Huddin, A.B.; Zulkarnain, N.; Azmi, N.A.; Mahaletchumy, T. Comparison of deep learning convolutional neural network (CNN) architectures for CT lung cancer classification. Int. J. Adv. Technol. Eng. Explor. 2021, 8, 126–134.
33. Raoof, S.S.; Jabbar, M.A.; Fathima, S.A. Lung Cancer Prediction using Machine Learning: A Comprehensive Approach. In Proceedings of the 2020 2nd International Conference on Innovative Mechanisms for Industry Applications (ICIMIA), Bangalore, India, 5–7 March 2020.
34. Mhaske, D.; Rajeswari, K.; Tekade, R. Deep Learning Algorithm for Classification and Prediction of Lung Cancer using CT Scan Images. In Proceedings of the 2019 5th International Conference On Computing, Communication, Control And Automation (ICCUBEA), Pune, India, 19–21 September 2019; pp. 1–5.
35. Zhou, Z.-H.; Jiang, Y.; Yang, Y.-B.; Chen, S.-F. Lung cancer cell identification based on artificial neural network ensembles. Artif. Intell. Med. 2020, 24, 25–36.
36. Radhika, P.R.; Nair, R.A.S.; Veena, G. A Comparative Study of Lung Cancer Detection using Machine Learning Algorithms. In Proceedings of the 2019 IEEE International Conference on Electrical, Computer and Communication Technologies (ICECCT), Coimbatore, India, 20–22 February 2019; pp. 1–4.
37. Karimullah, S.; Khan, M.; Shaik, F.; Alabduallah, B.; Almjally, A. An integrated method for detecting lung cancer via CT scanning via optimization, deep learning, and IoT data transmission. Front. Oncol. 2024, 14, 1435041.
38. Kauczor, H.-U.; Baird, A.-M.; Blum, T.G.; Bonomo, L.; Bostantzoglou, C.; Burghuber, O.; Čepická, B.; Comanescu, A.; Couraud, S.; Devaraj, A.; et al. ESR/ERS statement paper on lung cancer screening. Eur. Respir. J. 2020, 55, 1900506.
39. Fry, W.A.; Phillips, J.L.; Menck, H.R. Ten-year survey of lung cancer treatment and survival in hospitals in the United States. Cancer 1999, 86, 1867–1876.
40. Mohamed, T.I.A.; Ezugwu, A.E.-S. Enhancing lung cancer classification and prediction with deep learning and multi-omics data. IEEE Access 2024, 12, 59880–59892.
41. Maleki, N.; Niaki, S.T.A. An intelligent algorithm for lung cancer diagnosis using extracted features from Computerized Tomography images. Healthc. Anal. 2023, 3, 100150.
42. Mukherjee, S.; Bohra, S.U. Lung cancer disease diagnosis using machine learning approach. In Proceedings of the 2020 3rd International Conference on Intelligent Sustainable Systems (ICISS), Thoothukudi, India, 3–5 December 2020.
43. Karim, D.Z.; Bushra, T.A. Detecting Lung Cancer from Histopathological Images using Convolution Neural Network. In Proceedings of the TENCON 2021—2021 IEEE Region 10 Conference (TENCON), Auckland, New Zealand, 7–10 December 2021.
44. Mahum, R.; Al-Salman, A.S. Lung-RetinaNet: Lung cancer detection using a retinanet with multi-scale feature fusion and context module. IEEE Access 2023, 11, 53850–53861.
45. Ferdous, M.; Debnath, J.; Chakraborty, N.R. Machine learning algorithms in healthcare: A literature survey. In Proceedings of the 2020 11th International Conference on Computing, Communication and Networking Technologies (ICCCNT), Kharagpur, India, 1–3 July 2020; pp. 1–6.
46. Ghosh, S.; Dasgupta, A.; Swetapadma, A. A Study on Support Vector Machine based Linear and Non-Linear Pattern Classification. In Proceedings of the 2019 International Conference on Intelligent Sustainable Systems (ICISS), Palladam, India, 21–22 February 2019; pp. 24–28.
47. Elizabeth, D.S.; Nehemiah, H.K.; Raj, C.S.R.; Kannan, A. A novel segmentation approach for improving diagnostic accuracy of CAD systems for detecting lung cancer from chest computed tomography images. J. Data Inf. Qual. (JDIQ) 2012, 3, 1–16.
48. Subhi Malallah, H.; Bahjat Abdulrazzaq, M. Web-Based agricultural management products for marketing system: Survey. Acad. J. Nawroz Univ. 2023, 12, 49–62.
49. Aharonu, M.; Kumar, R.L. Systematic review of deep learning techniques for lung cancer detection. Int. J. Adv. Comput. Sci. Appl. 2023, 14, 725–736.
50. Sultana, A.; Khan, T.T.; Hossain, T. Comparison of four transfer learning and hybrid CNN models on three types of lung cancer. In Proceedings of the 2021 5th International Conference on Electrical Information and Communication Technology (EICT), Khulna, Bangladesh, 17–19 December 2021.
51. Kaucha, D.P.; Prasad, P.W.C.; Alsadoon, A.; Elchouemi, A.; Sreedharan, S. Early detection of lung cancer using SVM classifier in biomedical image processing. In Proceedings of the 2017 IEEE International Conference on Power, Control, Signals and Instrumentation Engineering (ICPCSI), Chennai, India, 21–22 September 2017.
52. Wankhade, S.; Vigneshwari, S. A novel hybrid deep learning method for early detection of lung cancer using neural networks. Healthc. Anal. 2023, 3, 100195.
53. Farheen, F.; Shamil, M.S.; Ibtehaz, N.; Rahman, M.S. Revisiting segmentation of lung tumors from CT images. Comput. Biol. Med. 2022, 144, 105385.
54. Mohite, A. Lung Cancer Diagnosis using Transfer Learning. Int. J. Sci. Res. Manag. 2021, 9, 621–634.
55. Riahi, T.; Shateri-Amiri, B.; Najafabadi, A.H.; Garazhian, S.; Radkhah, H.; Zooravar, D.; Mansouri, S.; Aghazadeh, R.; Bordbar, M.; Raiszadeh, S. Lung cancer management: Revolutionizing patient outcomes through machine learning and artificial intelligence. Cancer Rep. 2025, 8, e70240.
56. Nasser, I.M.; Abu-Naser, S.S. Lung Cancer Detection Using Artificial Neural Network. Int. J. Eng. Inf. Syst. (IJEAIS) 2019, 3, 17–23.
57. Gu, Y.; Chi, J.; Liu, J.; Yang, L.; Zhang, B.; Yu, D.; Zhao, Y.; Lu, X. A survey of computer-aided diagnosis of lung nodules from CT scans using deep learning. Comput. Biol. Med. 2021, 137, 104806.
58. Abdullahi, K.; Ramakrishnan, K.; Ali, A.B. Deep learning techniques for lung cancer diagnosis with computed tomography imaging: A systematic review for detection, segmentation, and classification. Information 2025, 16, 451.
59. Pal, A.; Rai, H.M.; Yoo, J.; Lee, S.-R.; Park, Y. ViT-DCNN: Vision transformer with deformable CNN model for lung and colon cancer detection. Cancers 2025, 17, 3005.
60. Blandin Knight, S.; Crosbie, P.A.; Balata, H.; Chudziak, J.; Hussell, T.; Dive, C. Progress and prospects of early detection in lung cancer. Open Biol. 2017, 7, 170070.
61. Klangbunrueang, R.; Pookduang, P.; Chansanam, W.; Lunrasri, T. AI-powered lung cancer detection: Assessing VGG16 and CNN architectures for CT scan image classification. Informatics 2025, 12, 18.
62. Ren, Z.; Zhang, Y.; Wang, S. A Hybrid Framework for Lung Cancer Classification. Electronics 2022, 11, 1614.
63. Didkowska, J.; Wojciechowska, U.; Mańczuk, M.; Łobaszewski, J. Lung cancer epidemiology: Contemporary and future challenges worldwide. Ann. Transl. Med. 2016, 4, 150.
64. Mezher, M.A.; Altamimi, A.; Altamimi, R. A genetic folding strategy based support vector machine to optimize lung cancer classification. Front. Artif. Intell. 2022, 5, 826374.
65. Cao, H.; Liu, H.; Song, E.; Ma, G.; Jin, R.; Xu, X.; Liu, T.; Hung, C.-C. A two-stage convolutional neural networks for lung nodule detection. IEEE J. Biomed. Health Inform. 2020, 24, 2006–2015.

Figure 1. PRISMA 2020 record selection procedure flow diagram (* Indicates initial collected records whereas ** indicates unrelated records removed).

Figure 2. Distribution of Selected Articles (2016–2025).

Figure 3. Selected Articles distribution in Percentage (2016–2025).

Figure 4. A Taxonomy of Lung Cancer Detection Techniques.

Figure 5. Block Diagram of a Typical CADe System [1].

Figure 6. Accuracy of Machine Learning Models [1,27,32,41,47].

Figure 7. Lung Dataset Samples [43].

Figure 8. A depiction showcasing a diverse range of lung nodules [52].

Figure 9. Accuracies of Deep Learning Models [20,33,43,50,51,52,54,56,60].

Figure 10. Image Operations [26].

Figure 11. Accuracies of Hybrid Models [26,29,57].

Figure 12. Forest plot of Sensitivity and Specificity for Machine Learning, Deep Learning, and Hybrid Approaches [1,3,26,29,32,33,44,51,52,54,55,57,61].

Table 1. Initial Selection Results.

Database	Records Identified	Duplicates Removed	Records After Screening	Studies Finally Included
Scopus	180	25	155	18
IEEE	75	8	67	7
Springer	70	10	60	8
PubMed	95	12	78	10
MDPI	60	7	53	6
Elsevier	50	5	45	6
Google Scholar	72	9	63	12
Others	15	4	11	3
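
The per-database counts in Table 1 can be cross-checked with simple arithmetic. The Python sketch below assumes that "Records After Screening" should equal "Records Identified" minus "Duplicates Removed"; this is an assumption, since the table does not define the column. It reports the column totals and any row where the two figures differ. The names TABLE_1, check_rows, and column_totals are illustrative and do not come from the survey.

```python
# Rows copied from Table 1: (database, records identified, duplicates removed,
# records after screening, studies finally included).
TABLE_1 = [
    ("Scopus", 180, 25, 155, 18),
    ("IEEE", 75, 8, 67, 7),
    ("Springer", 70, 10, 60, 8),
    ("PubMed", 95, 12, 78, 10),
    ("MDPI", 60, 7, 53, 6),
    ("Elsevier", 50, 5, 45, 6),
    ("Google Scholar", 72, 9, 63, 12),
    ("Others", 15, 4, 11, 3),
]


def check_rows(rows):
    """Return (database, identified - duplicates, reported) for rows that differ.

    Assumption: the 'after screening' column counts records left once
    duplicates are removed; the table itself does not state this.
    """
    return [
        (name, identified - duplicates, after)
        for name, identified, duplicates, after, _included in rows
        if identified - duplicates != after
    ]


def column_totals(rows):
    """Sum each of the four numeric columns across all databases."""
    return tuple(sum(row[i] for row in rows) for i in range(1, 5))


mismatches = check_rows(TABLE_1)
totals = column_totals(TABLE_1)

print("Totals (identified, duplicates, after screening, included):", totals)
for name, expected, reported in mismatches:
    print(f"{name}: identified - duplicates = {expected}, table reports {reported}")
```

Under that assumption every row balances except PubMed (95 − 12 = 83 against a reported 78). The table does not say whether five further PubMed records were removed during screening or whether one of the figures is a typo.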

Table 2. Inclusion and exclusion criteria.

Category	Inclusion Criteria	Exclusion Criteria	Rationale
Language	English	Non-English	To guarantee global accessibility and to avoid the time and resource constraints associated with translation.
Publication Date	2015 to 2024	Before 2015	To concentrate the review on the most recent decade of innovation in deep learning.
Research Focus	Research on machine learning, deep learning, and hybrid models	Research focusing on cancers other than lung cancer	To keep the literature relevant to the specific objectives of this research.
Text Availability	Publications where the complete full text is accessible.	Publications where only the abstract is available.	To enable a thorough and detailed analysis of the methodologies, results, and data presented in studies.
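
The criteria in Table 2 amount to a simple screening rule, which can be sketched in Python as follows. This is only an illustration of how such a filter might look, not the authors' screening procedure: the Record class, its field names, and the sample entries are hypothetical, and the 2015 to 2024 window is taken directly from the table.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Record:
    """Hypothetical bibliographic record; all field names are illustrative."""
    title: str
    language: str
    year: int
    covers_lung_cancer: bool
    uses_ml_dl_or_hybrid: bool
    full_text_available: bool


def exclusion_reasons(record: Record) -> List[str]:
    """Return the Table 2 criteria a record fails (an empty list means include)."""
    reasons = []
    if record.language.strip().lower() != "english":
        reasons.append("non-English")
    if not 2015 <= record.year <= 2024:
        reasons.append("published outside 2015-2024")
    if not (record.covers_lung_cancer and record.uses_ml_dl_or_hybrid):
        reasons.append("outside research focus")
    if not record.full_text_available:
        reasons.append("full text unavailable")
    return reasons


# Made-up sample records, used only to exercise the filter.
candidates = [
    Record("CT nodule classifier", "English", 2021, True, True, True),
    Record("Early CAD prototype", "German", 2012, True, True, False),
]
included = [r.title for r in candidates if not exclusion_reasons(r)]
print(included)
```

Returning the list of failed criteria, rather than a bare yes or no, makes it straightforward to report why each record was excluded.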

Table 3. Summary of Performance Measures for the above ML models.

Study	Model/Method	Performance Measure	Value
[1]	SVM (Demir and Camurcu)	Sensitivity	98.03%
		Specificity	87.71%
		FP rate	2.45/scan
[27]	Voting Classifier (Best Model)	Accuracy	99.5%
[47]	Novel Segmentation CAD System	Accuracy	97%
[32]	GoogleNet	Accuracy	94.53%
		Specificity	99.06%
		Sensitivity	65.67%
		AUC	86.84%
[41]	SVC (LDA)	Accuracy	95%
		Precision	95%
		Recall	95%
		F1-score	95%
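
For reference, the measures reported in Table 3 have standard definitions in terms of a binary confusion matrix. The Python sketch below states those textbook definitions; it is not taken from any surveyed study, the function name binary_metrics is illustrative, and the counts in the example are hypothetical.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification measures from confusion counts.

    tp/fp/tn/fn are true positives, false positives, true negatives and
    false negatives. Assumes each denominator is non-zero.
    """
    sensitivity = tp / (tp + fn)          # also called recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical counts, not drawn from any study in the table.
example = binary_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in example.items():
    print(f"{name}: {value:.2%}")
```

Because accuracy weights both classes by their frequency, a model can combine high accuracy and specificity with low sensitivity when positive cases are rare, which is the pattern in the GoogleNet row of Table 3.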

Table 4. Summary of Datasets used in the above ML models.

Study	Dataset Name	Year of Release	Image Modality	Number of Cases/Patients	Number of Images	Annotation	Image Format
[1,27,32]	LIDC-IDRI	2011	Computed Tomography (CT)	1018	244,527	Lesions marked by 4 radiologists into categories: nodule ≥3 mm, nodule <3 mm, and non-nodule ≥3 mm.	DICOM (Original)/Resized to 244 × 244 [32]
[41]	Hospital Dataset (Tehran, Iran)	-	Computed Tomography (CT)	Not explicitly stated (Implied ≤364)	364	Binary labels: 238 cancerous and 126 noncancerous.	Original format not stated/Resized to 512 × 512 pixels.
[47]	Chest CT Image Dataset	-	Computed Tomography (CT)	-	-	564 Regions of Interest (ROIs) detected and labeled by a radiologist.	JPEG

Table 5. Summary of Performance Measures obtained for the above DL models.

Study	Model/Method	Performance Measure	Percentage
[33]	ODNN	Accuracy	94.56%
		Sensitivity	96.2%
		Specificity	94.2%
[50]	Deep Screener Algorithm	Accuracy	95%
[51]	SVM with DWT for feature extraction	Accuracy	95.16%
		Sensitivity	98.21%
		Specificity	78.69%
[43]	Proposed CNN model	Training Accuracy	98.15%
		Validation Accuracy	98.07%
		Monte Carlo Avg	99%
		Weighted Avg	99%
[42]	ANN (GLCM)	Accuracy	>80%
[54]	DenseNet-201	Mean Accuracy	53%
		Mean Recall	43%
		Mean Precision	43%
		Mean F1-score	43%
[55]	Rule-based thresholding classifier	Accuracy	98%
		Sensitivity	83%
		Specificity	99%
[44]	Lung-RetinaNet (Improved RetinaNet)	Accuracy	99%
		Recall	99.3%
		Mean Precision	99.4%
		F1-score	99.5%
		AUC	98.9%
[56]	ANN	Accuracy	96.67%
[60]	Watershed Segmentation (SVM)	Model Accuracy	92%
[60]		Classifier Accuracy	86.6%
[61]	VGG16 (Transfer Learning)	Accuracy	96.05%
		Recall	96.07%
		Mean Precision	96.06%
		F1-score	96.06%
		AUC	96.07%
[62]	Genetic Folding Strategy (GFS) based SVM	Model Accuracy	96.2%
[62]		AUC	97%
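
Where Table 5 reports precision, recall, and F1-score together, the three values can be checked against each other. The F1-score is the harmonic mean of precision and recall, so it cannot exceed their arithmetic mean; the same bound holds for a macro-averaged F1. The Python sketch below applies that bound to the three such rows. The 0.1-point slack for rounding is an assumption, and the function names are illustrative.

```python
# (study, mean precision %, recall %, F1 %) as reported in Table 5.
TABLE_5_ROWS = [
    ("[54]", 43.0, 43.0, 43.0),
    ("[44]", 99.4, 99.3, 99.5),
    ("[61]", 96.06, 96.07, 96.06),
]

# Assumption: values rounded to one decimal can each be off by 0.05,
# so allow 0.1 points before flagging a row.
ROUNDING_SLACK = 0.1


def harmonic_mean(precision, recall):
    """F1-score implied by a single precision/recall pair."""
    return 2 * precision * recall / (precision + recall)


def f1_exceeds_bound(precision, recall, f1, slack=ROUNDING_SLACK):
    """True if the reported F1 is above the arithmetic mean of P and R plus slack."""
    return f1 > (precision + recall) / 2 + slack


flagged = [
    study
    for study, precision, recall, f1 in TABLE_5_ROWS
    if f1_exceeds_bound(precision, recall, f1)
]

for study, precision, recall, f1 in TABLE_5_ROWS:
    implied = harmonic_mean(precision, recall)
    print(f"{study}: reported F1 {f1:.2f}, implied by P and R {implied:.2f}")
print("Rows exceeding the bound:", flagged)
```

With these inputs only the Lung-RetinaNet row [44] is flagged: its reported F1-score of 99.5% is higher than both its recall (99.3%) and its precision (99.4%). This may reflect a transcription or rounding difference, or a different averaging scheme in the original study; the table alone does not settle which.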

Table 6. Summary of Datasets used with the above DL models.

Study	Dataset Name	Year of Release	Image Modality	Number of Images	Annotation	Image Format
[33,51]	Standard CT Database	2010	CT Scans	50 + 1000 (500 infected, 500 non-infected)	Class labels (infected/non-infected)	-
[50]	TCIA (The Cancer Imaging Archive)	Ongoing (Publicly available)	CT Scans	1449 images	Nodule locations, malignancy ratings	DICOM
[51]	LIDC-IDRI	2011	CT Scans	244,527 images (from 1018 cases)	Annotated nodules, masks, features	DICOM
[43,50]	LC25000 (Lung and Colon Cancer Histopathological)	2019	Histopathology	25,000 total (15,000 Lung: 5000 benign, 5000 adenocarcinoma, 5000 squamous)	Class labels (tissue type)	JPEG
[42]	Cancer Imaging Archive Database	Ongoing	CT Scans	50 images	-	-
[54]	IQ-OTH/NCCD Lung Cancer Dataset	2020	CT Scans	1190 slices (from 110 cases)	Class labels (normal, benign, malignant)	JPEG/PNG
[61]	LIDC-IDRI	2011	CT Scans	1114 images (selected subset)	Nodule Annotation	DICOM
[62]	Kaggle Repository (Real lung cancer dataset)	2021	Clinical/Structured Data	309 instances (276 after preprocessing)	Class labels (Benign/Malignant)	-

Table 7. Summary of Performance Measures obtained for the above hybrid approaches.

Study	Model/Method	Performance Measure	Value
[26]	KASC Hybrid Algorithm (SURF + GA + SVM + FFBPNN)	Accuracy	98.08%
		Precision	98.17%
		Recall	96.5%
		F1-Score	97%
[29]	AlexNet + Softmax	Accuracy	99.52%
		Precision	99.207%
		Recall	88.265%
		F1-Score	93.416%
		loss	64.9%
[52]	CCDC-HNN (Advanced 3D-CNN + RNN/BiLSTM)	Accuracy	95%
		Sensitivity	87%
		Specificity	90%
[57]	CNN + DBN (Deep Belief Network)	Accuracy	97.58%
[57]		Sensitivity	98.1%
[64]	CNN-SVM	Accuracy	97.91%
		Sensitivity	97.90%
		Specificity	99.32%
		Precision	97.96%
		AUC	1.000
[3]	ViT-DCNN (Vision Transformer + Deformable CNN)	Accuracy	94.24%
		Precision	94.37%
		F1-Score	94.23%
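
For binary classification evaluated on a single test set, accuracy is a weighted average of sensitivity and specificity, with the weight given by the share of positive cases: accuracy = p × sensitivity + (1 − p) × specificity. It therefore has to lie between the two. The Python sketch below solves for the implied share p in the two Table 7 rows that report all three measures. It assumes the binary, single-test-set setting, which the table does not confirm; multi-class or differently averaged results need not satisfy the identity. The function name is illustrative.

```python
# (study, accuracy %, sensitivity %, specificity %) as reported in Table 7.
TABLE_7_ROWS = [
    ("[52]", 95.0, 87.0, 90.0),
    ("[64]", 97.91, 97.90, 99.32),
]


def implied_prevalence(accuracy, sensitivity, specificity):
    """Solve accuracy = p * sensitivity + (1 - p) * specificity for p.

    Returns None when sensitivity equals specificity, because p is then
    undetermined. Assumes a binary task scored on one test set.
    """
    if sensitivity == specificity:
        return None
    return (accuracy - specificity) / (sensitivity - specificity)


results = {}
for study, accuracy, sensitivity, specificity in TABLE_7_ROWS:
    p = implied_prevalence(accuracy, sensitivity, specificity)
    consistent = p is not None and 0.0 <= p <= 1.0
    results[study] = (p, consistent)
    print(f"{study}: implied positive share = {p:.3f}, within [0, 1]: {consistent}")
```

For [52] the implied share falls outside the range 0 to 1, because the reported accuracy (95%) is higher than both the sensitivity (87%) and the specificity (90%). Under the stated assumption this suggests the three figures were measured on different splits or averaged differently, so they are best compared with caution.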

Table 8. Summary of Datasets used with the above Hybrid approaches.

Study	Dataset Name	Year of Release	Image Modality	Number of Images	Annotation	Image Format
[52,57]	LIDC-IDRI	2011	CT Scans	1018 cases/scans	Nodule location and classification	DICOM
[3]	LC25000	2019	Histopathology	25,000 total	Class labels (tissue type)	JPEG
[3]	DSB2017 (Data Science Bowl 2017)	2017	Low-dose CT images	500	Cancer ground truth for training set	DICOM

Table 9. Summary of Existing Lung Cancer Detection Models.

Authors	Proposed Technique	Limitations	Evaluation Criteria	Dataset Used
Rehman et al. [1]	SVM	Unable to perfectly handle nodule length, shape, etc.	Accuracy	LIDS
Nanglia et al. [26]	SURF + Genetic Algorithm and SVM + FFBPNN	Model underfitting because of weak training data.	Accuracy, Precision, Recall, F1-score	ELCAP
Thallam et al. [27]	Voting classifier built from SVM, KNN, and RF	Accuracy depends on the size of the dataset.	Accuracy	LIDC-IDRI
Raoof et al. [33]	Deep Learning (CNN, FCN, DBN)	Did not explore or address the potential benefits and effectiveness of deep learning methods.	Accuracy, Precision	TCIA
Elizabeth et al. [47]	Optimal Thresholding, Rolling Ball operator	Segmentation of lung tissues is not considered.	Accuracy, Specificity, Precision, Recall	JPEG images of Chest CT with PBRs
Ashhar et al. [32]	UNet + R-CNN	Additional study and research are needed to enhance its performance.	Accuracy, Specificity, Sensitivity	LIDC-IDRI
Sultana et al. [50]	SVM, LDA	The existing method may not be fully optimized and adequate for handling other cancer types.	Accuracy	histopathological images
Kaucha et al. [51]	SVM, K-Means Clustering	The existing algorithm may not adequately filter or select the most relevant nodules.	Accuracy	LIDS
Mukherjee et al. [42]	ANN back propagation-based GLCM	Current implementation still has limitations in terms of accuracy, improvement, or generalizability.	Accuracy	NSCLC Radiogenomics lung malignancy CT images
Karim et al. [43]	SVM, BN, RF, CNN	The current approach using the existing CNN architecture and hyperparameters does not achieve optimal accuracy for the classification task.	Accuracy	Histopathological images
Subramanian et al. [29]	Hybrid (SVM + CNN)	The existing model might be missing important predictive features that could improve its accuracy and effectiveness in predicting lung cancer.	Accuracy	Computed tomography images
Wankhade et al. [52]	SVM, SMOTE	The current approach may not fully leverage the potential of large datasets and sophisticated classification strategies to achieve higher efficiency.	Accuracy	LDCT
Mohite et al. [54]	CNN-ResNet50 + SVM-RBF	To ensure comprehensive detection of lung cancer, relying on a model that processes only a single CT-scan slice is inadequate.	Accuracy	LIDC-IDRI
Taher et al. [55]	HNN, FCM	The existing Computer-Aided Diagnosis system utilizes a Hybrid Neural Network, which may face challenges in achieving high accuracy when it detects lung cancer in its early stages.	Accuracy, Sensitivity, Specificity	Sputum Color Detection
Maleki et al. [41]	CNN, GB, RF, SVM	The current approach for feature selection might not be optimal.	Accuracy	CT scan images dataset of Tehran, Iran Hospital
Haichao Cao et al. [65]	TSCNN, 3DDP-IncepNet-RDU	The current method is competitive compared to other existing techniques; however, it does not necessarily outperform them.	Sensitivity	CT images
Farheen et al. [53]	MultiResUNet	The current approach lacks a front-end segmentation mechanism to isolate the lung within the CT image before processing it with the deep supervision model.	Accuracy	LOTUS
Maham et al. [44]	GCPSO, BoVW, CRNNs, ResNet	The current method faces challenges in precisely detecting and classifying tumors, largely because of low input image resolution and the difficulty in distinguishing tumors from surrounding tissues.	Accuracy, Recall, Precision, F1-score	CT scans
Yu Gu et al. [57]	CNN, DBN	The current approach does not encompass all relevant areas of inquiry.	Sensitivity	CT images

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Zahid, A.B.; Nisa, F.U.; Malik, A.K.; Qamar, N. Lung Cancer Prediction with Machine Learning, Deep Learning and Hybrid Techniques: A Survey. LabMed 2026, 3, 7. https://doi.org/10.3390/labmed3010007

