Lung cancer was first identified by medical professionals in the mid-19th century. Today, lung cancer ranks among the leading causes of cancer-related death. Although life expectancy with lung cancer is lower than with many other cancers, the survival rate increases significantly, to 54.4%, when the disease is detected at an early stage [
1]. As estimated by the WHO, in 2015, cancer was responsible for the loss of 8.8 million lives. Lung cancer was the cause of 1.69 million (or nearly 20%) of these deaths [
2]. The Malaysia National Cancer Registry Report 2012–2016 reveals lung cancer as the third-highest cancer detected among Malaysians, with males experiencing it more than females [
3]. Annually, 1.2 million people are diagnosed with lung cancer, accounting for 12.3% of total cancer diagnoses, and about 1.1 million die from it, accounting for 17.8% of total cancer deaths. Because of uncontrolled cell growth in the lungs, lung cancer causes severe difficulty with inhalation and exhalation [
4,
5]. The average lung cancer survival levels in 2017 were estimated at 65% [
6]. A study presenting projections of lung cancer in multiple countries is described in [
7]. According to a study by the WHO, lung-related diseases were the second leading contributor to mortality in 2015 and ranked fifth in 2017 based on contributing factors. These conditions are especially prevalent among smokers, who account for 85% of all cases [
8,
9]. Non-small cell lung cancer (NSCLC) accounts for approximately 80–90% of all lung cancer cases [
10,
11,
12,
13], while 15–20% of cases are small cell lung cancer (SCLC). Lung and breast cancer account for much of the tumor-related morbidity and mortality among women in the United States. Additionally, novel drug delivery systems (NDDS) can be designed for regulated release, which could help to overcome resistance and enable greater drug accumulation in tumors [
14]. In the last thirty years, early screening techniques and improvements in clinical diagnosis and treatment have led to a decrease in mortality and increased survival time for breast cancer patients [
15]. It was discovered that about 26–30% of such patients have quit smoking, while 70–74% are still smokers [
16]. Numerous traffic-related issues commonly found in urban areas contribute to extreme air pollution. As urban air pollution has such negative impacts, investigations of people’s lung health conditions are often carried out with the goal of lowering the death rate [
17]. Many individuals are diagnosed with lung cancer in the later stages, resulting in a poor prognosis. The challenge for healthcare professionals lies in identifying the most effective treatment options, as lung cancer presents a range of imaging features and histological variations, compounded by the advanced stage at which it is often diagnosed [
18]. Cigarette smoking is the leading factor contributing to lung cancer, which ranks among the most prevalent forms of cancer. This type of cancer represents over 25% of all deaths attributed to cancer and affects both men and women. It has been estimated that 80–85% of lung cancer fatalities are directly associated with smoking [
19,
20]. The authors of [
21] reported that only about half of the participants were current or former smokers, in contrast to the Mayo Clinic, Johns Hopkins, and Memorial Sloan–Kettering trials. Consistent with earlier research, there was no mortality advantage associated with chest X-ray (CXR) screening. Nonetheless, there were some encouraging findings: CXR screening identified 60% of the lung tumors that emerged during the screening period (as opposed to “interval” malignancies), half of which were stage I disease. It was estimated that cancer death rates in the UK and EU remain favorable. In the EU, rates decreased by 3.72% for women and 6% for men [
22]. This research centers on using different classification algorithms to diagnose lung cancer. Following diagnosis, patients with lung cancer typically have a five-year survival rate of 10% to 20%. Early detection methods, such as MRI and CT, are commonly used in medical procedures and significantly enhance patient survival rates. Lung cancer is typically caused by cigarette smoking, which accounts for 85% of cases, while roughly 10–15% of cases occur in people who have never smoked. Lung cancer is classified into two types based on growth patterns: SCLC and NSCLC [
23,
24]. How a patient is treated is determined by the kinds and categorization of their data [
25]. A CT scan, X-ray, biopsy, blood test, and patient assessment are performed when diagnosing cancer [
26]. Lung cancer, a serious form of malignant growth, poses significant challenges in terms of diagnosis and treatment. However, for non-smokers, early detection can lead to effective prevention or treatment. Nonsmokers may get lung cancer as a result of radon radiation, secondhand smoke, air pollution, or other causes. This cancer is one of the most common and aggressive varieties encountered in patients [
27]. Computed tomography (CT) scans, produced by merging many X-ray images obtained from various angles around the body, can reveal lung cancer [
28]. Lung cancer has a growth rate of 13%. In clinical practice, analyzing and interpreting lung CT images is a delicate procedure that requires significant time and expertise [
29]. A substantial percentage of patients can have a higher chance of survival if lung nodules are correctly diagnosed early [
30]. As cancer can react well to therapy when caught early, early detection of this health issue can considerably lower the disease’s death rate, thus saving many lives. The creation of automated tools to assess and categorize this illness helps to expedite diagnosis processes and lowers the likelihood of human error [
31]. Both active and passive smokers make up about 90% of lung cancer patients. Since most lung cancer patients do not exhibit symptoms in their early stages, most of these people are diagnosed with stage 3 or stage 4 lung cancer. Early screening is therefore highly useful. Sputum cytology, biopsy, and computed tomography (CT) scans can all be used for lung cancer screening [
32]. The tumor size and location are considered to characterize symptoms. In certain circumstances, no pain or symptoms emerge in the early stages, making analysis and detection challenging. Individuals diagnosed with lung cancer might face various symptoms, including coughing up blood (hemoptysis), shoulder pain associated with Pancoast syndrome, hoarseness resulting from vocal cord involvement, as well as significant weight loss, fatigue, and general weakness [
33]. Early identification of lung cancer is essential for successful treatment and recovery. The most prevalent techniques for diagnosing lung cancer in its initial stages include chest X-rays, CT scans, MRI, isotope scans, bronchoscopy, and various other diagnostic tests [
34]. A key technique, known as “pathological diagnosis,” involves analyzing needle biopsy samples obtained from patients to establish a diagnosis [
35]. Early detection using ML techniques is key to increasing the survival rate. If such an approach can be used to increase the efficiency and effectiveness of radiology diagnostics, it will be a significant step towards improving early detection. The lung cancer datasets utilized in this study were sourced from Data World and the UCI ML Repository. Initially, k-fold cross-validation was employed to divide the datasets into training and testing subsets. Next, several classification models were constructed using the training data, utilizing approaches like SVM, Logistic Regression, Naive Bayes, and Decision Trees. The training data serve to build the classification models, while the testing data are used to evaluate these models and calculate their accuracy [
36]. Lung cancer continues to be one of the most daunting health challenges worldwide, with considerable rates of morbidity and mortality. Even with progress in diagnostic and treatment technologies, the disease’s high prevalence, late-stage detection, and complex variations remain obstacles to effective management. Early detection and precise diagnosis are crucial for enhancing survival rates, especially with the aid of advanced imaging techniques and machine learning tools. This research aims to address deficiencies in lung cancer diagnosis by investigating different methodologies. Focusing on studies from the past 12 years, the survey offers a contemporary perspective on the field, highlighting the significance of automated diagnostic systems in minimizing human error and enhancing efficiency. In the end, this work highlights the crucial necessity for creative diagnostic approaches and thorough screening initiatives in order to fight lung cancer, preserve lives, and promote progress in medical research. Here, we discuss existing surveys focused on lung cancer prediction. The survey in [
37] explored various methodologies for predicting lung cancer, including the evaluation of imaging techniques used to detect lung cancer, with a focus on their accuracy in identifying neoplasms. The study also discussed the use of classification systems to differentiate between types of lung cancer based on radiographic features, analyzed the correlation between radiographic findings and patient prognosis to aid outcome prediction, and examined patient demographics and clinical data to identify patterns that may predict lung cancer occurrence. Relevant European organizations [
38] suggested that the process could be achieved through the implementation of well-designed, targeted demonstration programs across multiple countries. These programs would emphasize methodology, standardization, tobacco cessation, emotional impacts, cost-effectiveness, risk–benefit analyses, and education on healthy living. A ten-year hospital data study in the USA [
39] noted that certain trends in lung cancer patterns align with previous reports. These include the continued decline in the male-to-female gender ratio, reflecting a significant rise in lung cancer incidence among American women, an increasing number of African-American patients, and a growing prevalence of adenocarcinoma. The rise in older patients presents therapeutic challenges, as comorbidities tend to increase with age, and older patients often tolerate aggressive, multimodal therapies less effectively than younger patients. Although the proportion of patients not receiving cancer-directed treatment increased across all age groups, the most notable rise was observed among older patients. Another survey using data mining techniques [
40] aimed to identify the most effective methods for extracting knowledge and insights from existing lung cancer profile data. The study reviewed various data mining techniques applied to cancer research and related fields, noting that the data involved are often incomplete. Data cleaning—a critical step in the process—is particularly challenging due to the heterogeneous nature of the data sources, which often lack essential attributes. In this survey, we seek to address the gaps in lung cancer diagnosis by exploring innovative methodologies. We categorize surveys according to three main detection techniques used in the last 12 years: ML, DL, and hybrid. This provides a contemporary perspective on the field, emphasizing the role of automated diagnostic systems in minimizing human error and enhancing efficiency. Ultimately, the work underscores the urgent need for advanced diagnostic solutions and comprehensive screening programs to combat lung cancer, save lives, and drive progress in medical research. Our survey also addresses limitations such as dataset size, algorithmic complexity, and real-world validation that are essential for realizing the full potential of detection methods. Focusing on these limitations, future research can pave the way for the development of scalable, reliable, and efficient diagnostic systems. Lung cancer is recognized as one of the leading causes of cancer-related mortality worldwide, posing a significant health threat to both men and women. It primarily originates in the epithelial cells that line the airways of the lungs, with 90–95% of cases attributed to this cellular origin [
12]. The disease manifests through various symptoms, including severe chest pain, persistent dry cough, breathlessness, and unexplained weight loss [
4]. The significant correlation between smoking (particularly tobacco use) and the incidence of lung cancer is well documented, with smoking being implicated in over 80% of cases [
19].
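The evaluation workflow described earlier in this section (k-fold cross-validation, with models built on the training folds and accuracy computed on the held-out fold) can be illustrated with a minimal sketch. It is not the implementation used in the cited study: the data are synthetic and hypothetical, the helper names (`k_fold_indices`, `cross_validate`, `make_synthetic_data`) are invented for illustration, and a small Gaussian naive Bayes classifier written in plain Python stands in for the SVM, logistic regression, naive Bayes, and decision tree models, which in practice would typically come from an ML library.

```python
import math
import random
from statistics import mean, pvariance


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


class GaussianNaiveBayes:
    """Minimal Gaussian naive Bayes; a stand-in for any classifier with fit/predict."""

    def fit(self, X, y):
        self.stats = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            columns = list(zip(*rows))
            self.stats[label] = (
                math.log(len(rows) / len(X)),            # log prior of the class
                [mean(c) for c in columns],              # per-feature mean
                [pvariance(c) + 1e-9 for c in columns],  # per-feature variance, smoothed
            )
        return self

    def predict(self, X):
        return [max(self.stats, key=lambda lab: self._log_posterior(x, lab)) for x in X]

    def _log_posterior(self, x, label):
        log_prior, means, variances = self.stats[label]
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, m, v in zip(x, means, variances)
        )
        return log_prior + log_likelihood


def cross_validate(make_model, X, y, k=5, seed=0):
    """Return the test accuracy of each fold under k-fold cross-validation."""
    folds = k_fold_indices(len(X), k, seed)
    accuracies = []
    for i, test_idx in enumerate(folds):
        # Every fold except the i-th is used for training.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        model = make_model().fit([X[j] for j in train_idx], [y[j] for j in train_idx])
        predictions = model.predict([X[j] for j in test_idx])
        correct = sum(p == y[j] for p, j in zip(predictions, test_idx))
        accuracies.append(correct / len(test_idx))
    return accuracies


def make_synthetic_data(n_per_class=50, seed=0):
    """Hypothetical two-feature data: class 0 centered at (0, 0), class 1 at (3, 3)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, center in ((0, 0.0), (1, 3.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic_data()
    scores = cross_validate(GaussianNaiveBayes, X, y, k=5)
    print("fold accuracies:", [round(s, 2) for s in scores])
    print("mean accuracy:", round(mean(scores), 3))
```

Because `cross_validate` only requires an object with `fit` and `predict` methods, other classifiers can be compared on the same folds by passing a different factory, which keeps the accuracy figures directly comparable across models.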
Lung cancer continues to be a major global health challenge, attributed to its high death rates and complex biological nature. Its close links to smoking and secondhand smoke highlight the pressing necessity for successful prevention and quitting initiatives. The diversity of lung tumors and the differences in how patients respond to treatment underscore the need for improved prognostic methods and individualized treatment strategies. By incorporating state-of-the-art technologies like AI, deep learning, and computer vision, we can revolutionize the early detection, precise diagnosis, and personalized treatment of lung cancer. Researchers are making great progress in addressing the shortcomings of conventional approaches by utilizing these tools in conjunction with detailed datasets and sophisticated imaging methods. The management of lung cancer is constantly evolving, and teamwork among researchers, clinicians, and technologists is leading to innovative diagnostic and therapeutic solutions. These developments promise not only to enhance patient outcomes, but also to help alleviate the worldwide healthcare burden associated with lung cancer. Research and innovation in this area must continue in order to tackle one of the major public health challenges of our era.
This survey is structured to support researchers working on lung cancer detection. We created a taxonomy of relevant ML, DL, and hybrid techniques, which is followed throughout the paper, and present the pros and cons of each technique. At the end, we provide future research directions that can help researchers focus on important gaps in this area. We employed a systematic research methodology to ensure the relevance and quality of the included literature. We set inclusion and exclusion criteria to select the most relevant research works. We collected peer-reviewed papers published in the most relevant journals and conferences, mostly from the last 10 years. We excluded unpublished papers, older papers, papers on topics other than lung cancer, and papers not using ML/DL techniques. We also include many figures from relevant research works that show the techniques and data types used. Some figures also depict the model accuracy comparisons reported in the most important related works. At the end, a detailed table compares important related works and their limitations.