Early Lung Cancer Detection via AI-Enhanced CT Image Processing Software
Abstract
1. Motivation and Significance
1.1. Introduction
1.2. Related Work
- Support Vector Machines (SVMs): Widely used robust classifiers for binary medical image classification, particularly in distinguishing between benign and malignant nodules [10].
- Random Forest (RF): Ensemble of decision trees that reduces overfitting and improves generalization [11].
- K-Nearest Neighbors (KNNs): Non-parametric method that classifies nodules based on feature similarity [12].
- Gradient Boosting (GB): Combines weak learners sequentially to optimize performance, particularly in imbalanced datasets [13].
- Convolutional Neural Networks (CNNs): Automatically extract spatial features from CT scans, enabling precise classification [14].
- Generative Adversarial Networks (GANs): Used to generate synthetic data for augmentation and improved training of classifiers [15].
| Author(s) | Method(s) | Description |
|---|---|---|
| Wang [1] | CNN | Pulmonary nodule detection with high diagnostic accuracy. |
| Bhattacharjee et al. [16] | Multi-class DL Model | Automatic classification of lung and kidney CT images. |
| Thanoon et al. [6] | Deep Learning | Review of DL applications in lung cancer imaging. |
| Chuquicusma et al. [17] | GANs | Synthetic image generation for training enhancement. |
| Zhao et al. [18] | Forward/Backward GAN | Segmentation of lung nodules in CT images. |
| Aberle et al. [2] | Poisson Regression | Evaluation of early detection programs. |
| Yang et al. [19] | AdaBoost | Five-year survival prediction. |
| Liu et al. [20] | Gradient Boosting | Malignant lung nodule detection. |
| Sachdeva et al. [21] | Naive Bayes | Nodule classification. |
| Ye et al. [22] | Naive Bayes | Survival prediction with clinical data. |
| Maheswari et al. [23] | K-Means | Segmentation of lung nodules. |
| Shaukat et al. [24] | KNN | Feature-optimized classification. |
| Dezfuly & Sajedi [25] | Decision Trees | Prediction of treatment response. |
| Hussein et al. [26] | ANN | Lung nodule detection with high sensitivity. |
| Shen et al. [27] | SVM | Benign vs. malignant classification. |
| This study | Ensemble (RF, GB, KNN, SVM) | Ensemble framework integrating classical ML models, optimized for balanced performance, interpretability, and clinical applicability. |
2. Materials and Methods
2.1. Study Design and Eligibility Criteria
Inclusion Criteria
- (a) Chest CT studies in DICOM format;
- (b) Adult patients (aged 18 years or older);
- (c) Availability of a reference standard label (ground truth) at the study or lesion level; and
- (d) Sufficient image quality for analysis (slice thickness ≤ 2 mm, without severe motion or metal artifacts).
Exclusion Criteria
- (a) Non-chest CT scans;
- (b) Missing or ambiguous labels;
- (c) Incomplete or corrupted DICOM series;
- (d) Duplicated cases across datasets; and
- (e) Studies failing quality control (see Section 2.4).
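The eligibility rules above can be expressed as a simple screening pass. The following is a minimal sketch, not the authors' implementation: the `CTStudy` record, its field names, and the reason strings are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class CTStudy:
    """Hypothetical per-study record; field names are illustrative only."""
    study_id: str
    body_part: str             # e.g. "CHEST"
    patient_age: int           # years
    slice_thickness_mm: float
    has_label: bool            # reference standard label is available
    series_complete: bool      # DICOM series is complete and readable
    passed_qc: bool            # outcome of quality control


def exclusion_reasons(study, seen_ids):
    """Return the criteria a study fails (an empty list means eligible)."""
    reasons = []
    if study.body_part.upper() != "CHEST":
        reasons.append("non-chest CT")
    if study.patient_age < 18:
        reasons.append("patient under 18")
    if not study.has_label:
        reasons.append("missing or ambiguous label")
    if study.slice_thickness_mm > 2.0:
        reasons.append("slice thickness > 2 mm")
    if not study.series_complete:
        reasons.append("incomplete or corrupted series")
    if study.study_id in seen_ids:
        reasons.append("duplicate case")
    if not study.passed_qc:
        reasons.append("failed quality control")
    return reasons


def screen(studies):
    """Split studies into eligible ones and (study, reasons) exclusions."""
    eligible, excluded, seen_ids = [], [], set()
    for study in studies:
        reasons = exclusion_reasons(study, seen_ids)
        if reasons:
            excluded.append((study, reasons))
        else:
            eligible.append(study)
        seen_ids.add(study.study_id)
    return eligible, excluded
```

Recording the reasons for each exclusion, rather than silently dropping studies, makes it straightforward to report how many cases were removed under each criterion.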
Ethics
2.2. Reference Standard and Annotation Procedure
2.3. Datasets and Case Characteristics
Acquisition Parameters
Demographics
2.4. Preprocessing and Augmentation
- (1) DICOM import and resampling: All scans were imported using pydicom and resampled to an isotropic voxel size of 1.0 mm using linear interpolation.
- (2) Windowing and normalization: Images were converted to Hounsfield Units (HU) and lung windowing was applied (center HU, width 1500 HU). They were clipped to HU and normalized to the range.
- (3) Noise reduction: A 3D median filter was applied to reduce scanner noise and artifacts while preserving structural details.
- (4) Segmentation: Basic lung-field segmentation was performed using morphological operations and region-growing masks to isolate pulmonary regions and remove the surrounding background.
- (5) Feature standardization: After preprocessing, all pixel intensities were standardized using z-score normalization within the training set to maintain consistent dynamic range during model training.
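Steps (1), (2), (3), and (5) can be sketched with NumPy and SciPy as follows. This is an illustration under stated assumptions rather than the study's code: the window center, the clipping bounds, and the target range are not given in the text above, so `center` is a required parameter, clipping is assumed to use the window bounds, and the output range is assumed to be [0, 1]. The 3×3×3 median kernel is also an assumption, and lung-field segmentation (step 4) is omitted.

```python
import numpy as np
from scipy import ndimage


def resample_isotropic(volume, spacing_mm, target_mm=1.0):
    """Resample a (z, y, x) volume to isotropic voxels (linear interpolation)."""
    zoom_factors = [s / target_mm for s in spacing_mm]
    return ndimage.zoom(volume, zoom_factors, order=1)


def window_and_normalize(volume_hu, center, width=1500.0):
    """Clip HU values to the window and rescale to [0, 1] (assumed range)."""
    low, high = center - width / 2.0, center + width / 2.0
    clipped = np.clip(volume_hu, low, high)
    return (clipped - low) / (high - low)


def denoise(volume, size=3):
    """3D median filter; the 3x3x3 neighbourhood is an assumed kernel size."""
    return ndimage.median_filter(volume, size=size)


def fit_zscore(train_volumes):
    """Estimate mean and standard deviation from training volumes only."""
    values = np.concatenate([v.ravel() for v in train_volumes])
    return float(values.mean()), float(values.std())


def apply_zscore(volume, mean, std):
    """Standardize a volume with statistics fitted on the training set."""
    return (volume - mean) / (std + 1e-8)


def preprocess(volume_hu, spacing_mm, center):
    """Resample, window/normalize, then denoise a single HU volume."""
    resampled = resample_isotropic(volume_hu.astype(np.float32), spacing_mm)
    normalized = window_and_normalize(resampled, center)
    return denoise(normalized)


# Synthetic HU volume standing in for a CT scan (not study data).
rng = np.random.default_rng(0)
scan = rng.uniform(-1000, 400, size=(4, 8, 8))
# center=-600.0 is an illustrative value only; the source does not state it.
out = preprocess(scan, spacing_mm=(2.0, 1.0, 1.0), center=-600.0)
mean, std = fit_zscore([out])
standardized = apply_zscore(out, mean, std)
```

Fitting the z-score statistics on the training volumes and reusing them for validation and test data keeps information from held-out scans out of the normalization step.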
Augmentation
2.5. Missing Data Handling
2.6. Data Splits and External Validation
2.7. Sample Size and Power
2.8. Missing Data Handling
2.9. Statistical Analysis
2.10. Comparative Analysis of ML Models
2.11. Workflow Overview
- Image Acquisition and Storage: DICOM images are acquired, converted to standard formats (e.g., PNG), and stored in a centralized repository.
- Preprocessing and Segmentation: Images are normalized, denoised, and segmented to extract relevant anatomical areas.
- Feature Extraction: Radiomic and deep features are extracted using specialized filters and learned representations.
- Data Labeling and Splitting: Data are labeled as malignant or benign and split into training and testing sets.
- Model Training: Four models are trained: Random Forest (RF), Gradient Boosting (GB), K-Nearest Neighbors (KNN), and Support Vector Machine (SVM).
- Ensemble Prediction: A voting-based ensemble method aggregates predictions from the individual models.
- Evaluation: Performance is measured using accuracy, sensitivity, specificity, and training time.
2.12. Machine Learning Workflow
1. Image Acquisition and Storage: DICOM images were collected from various sources, converted into standard formats (e.g., JPG and PNG), and stored in a centralized image database.
2. Preprocessing and Segmentation: Images were normalized, filtered for noise, and segmented to focus on relevant anatomical regions.
3. Data Preparation: Labeled datasets were divided into training and testing sets for model development.
4. Model Training: Several machine learning algorithms were trained as follows:
   - Random Forest (RF): Ensemble of decision trees for generalized predictions.
   - Gradient Boosting (GB): Sequential learning with error correction.
   - K-Nearest Neighbors (KNN): Classification by proximity to known samples.
   - Support Vector Machine (SVM): Optimal hyperplane separation with high-margin classification.
5. Model Evaluation:
   - Models were evaluated using accuracy, precision, recall, F1 score, specificity, and AUC.
   - Diagnostic efficacy was assessed using sensitivity, specificity, and AUC, which are widely employed in clinical evaluation of diagnostic tools. These metrics quantify the model’s ability to correctly identify malignant cases while minimizing false positives.
   - An ensemble method combined predictions from all individual models.
6. Final Predictions: Trained models analyzed new images and delivered confidence-rated predictions for clinical support.
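The threshold-based metrics named in the evaluation step all derive from the four cells of a binary confusion matrix. The sketch below shows the arithmetic in plain Python on a small made-up example; the function names are illustrative, labels are assumed to be 1 for malignant and 0 for benign, and AUC is left out because it requires continuous scores rather than hard predictions.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = malignant)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def diagnostic_metrics(y_true, y_pred):
    """Accuracy, precision, recall (sensitivity), specificity, and F1 score."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Made-up labels: 4 malignant and 6 benign cases.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
metrics = diagnostic_metrics(y_true, y_pred)
```

In this toy example one malignant case is missed and one benign case is flagged, so sensitivity is 3/4 and specificity is 5/6.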
2.13. Ensemble Algorithms
- Random Forest.
- Gradient Boosting.
- K-Nearest Neighbors (KNN).
- Support Vector Machine (SVM).
2.13.1. General Ensemble Procedure
- Upload computed tomography (CT) image data.
- Preprocess images (normalization, noise reduction, etc.).
- Extract relevant features from the images.
- Divide the dataset into training and testing sets.
- Define and train each model on the training dataset.
- Make predictions with each model on the test set.
- Combine predictions using the ensemble method (e.g., voting or weighted average).
- Evaluate the ensemble using metrics such as accuracy, precision, recall, and F1-score.
Listing 1. Pseudocode for Ensemble Prediction and Evaluation.
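As an illustration of the general procedure (not a reproduction of Listing 1), the four models can be combined with scikit-learn's `VotingClassifier`. The sketch assumes soft voting, default-like hyperparameters, and a synthetic feature matrix from `make_classification` standing in for extracted image features; none of these choices are confirmed by the text.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for extracted image features (not the study's data).
X, y = make_classification(n_samples=300, n_features=20, n_informative=8,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# KNN and SVM are distance/margin based, so they are scaled in a pipeline.
ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("gb", GradientBoostingClassifier(random_state=0)),
        ("knn", make_pipeline(StandardScaler(),
                              KNeighborsClassifier(n_neighbors=5))),
        ("svm", make_pipeline(StandardScaler(),
                              SVC(probability=True, random_state=0))),
    ],
    voting="soft",  # assumption: average predicted class probabilities
)
ensemble.fit(X_train, y_train)
y_pred = ensemble.predict(X_test)

scores = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
}
```

Setting `voting="hard"` would switch to a majority vote over predicted labels, and the `weights` argument allows a weighted average if some models are trusted more than others.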
2.13.2. Random Forest
Listing 2. Random Forest Pseudocode.
2.13.3. Gradient Boosting
Listing 3. Gradient Boosting Pseudocode.
2.13.4. K-Nearest Neighbors
Listing 4. K-Nearest Neighbors Pseudocode.
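For readers who want a concrete reference point, the nearest-neighbor rule can be written in a few lines of NumPy. This is a generic sketch of the technique, not the content of Listing 4; it assumes Euclidean distance, an unweighted majority vote, and non-negative integer class labels.

```python
import numpy as np


def knn_predict(X_train, y_train, X_query, k=3):
    """Classify each query point by majority vote among its k nearest neighbors."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    X_query = np.asarray(X_query, dtype=float)
    # Pairwise Euclidean distances, shape (n_query, n_train).
    dists = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    # Indices of the k closest training samples for each query point.
    nearest = np.argsort(dists, axis=1)[:, :k]
    votes = y_train[nearest]
    # Most frequent label per row (labels must be non-negative integers).
    return np.array([np.bincount(row).argmax() for row in votes])


# Two well-separated toy clusters (0 = benign-like, 1 = malignant-like).
X_train = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_train = [0, 0, 0, 1, 1, 1]
predictions = knn_predict(X_train, y_train, [[0.2, 0.1], [5.5, 5.2]], k=3)
```

Because the vote depends directly on distances, features should be brought to a common scale before this rule is applied.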
2.13.5. Support Vector Machine
Listing 5. Support Vector Machine Pseudocode.
2.14. Model Architecture and Ensemble Logic
2.15. Implementation Details
2.16. System Modularity
2.16.1. User Interface (UI)
- Intuitive interface for healthcare professionals.
- Cross-device compatibility (desktop/tablet).
- DICOM upload, image preview, report generation, and role-based access.
- Built using Flask and Bootstrap for dynamic rendering and responsiveness.
2.16.2. Evaluation Logic
- Preprocessing of uploaded CT images.
- Feature extraction and model prediction.
- Generation of annotated reports.
2.16.3. Database Management
- SQLAlchemy-based backend.
- Secure storage of user accounts, DICOM metadata, predictions, and reports.
- Scalable design for integration with hospital information systems.
3. Results
3.1. Experimental Setup and Computational Environment
- (a) Data Collection and Preprocessing
  - DICOM medical images were collected and stored centrally.
  - Preprocessing included normalization and noise reduction.
  - Segmentation isolated relevant regions of interest.
- (b) Model Training
  - Models were trained on the local machine using the prepared datasets.
  - Flask was used for user interface prototyping.
  - Hardware acceleration via GPU (GTX 1080, CUDA 11.2).
- (c) Cloud Execution
  - Final execution on Google Colab Pro+.
  - NVIDIA A100-SXM GPU with 83.48 GB RAM, CUDA 11.6.
- (d) Model Evaluation
  - Evaluated using accuracy, precision, recall, F1 score, specificity, and AUC.
  - The ensemble method combined predictions from all individual models.
3.2. Performance Metrics
3.3. Precision–Recall Analysis
3.4. Internal vs. External Validation
3.5. Confusion Matrix Analysis
3.6. User Interface Overview
4. Discussion
4.1. Practical Considerations for Clinical Adoption
4.2. Limitations
4.3. Regulatory Considerations
4.4. Future Directions
5. Conclusions and Future Work
Future Research Directions
- Enhanced Image Preprocessing Techniques: Implementation of adaptive histogram equalization, deep learning-based denoising, and advanced segmentation methods to improve input quality before analysis.
- Integration of Multi-Modal Imaging: Combining CT data with PET and MRI modalities to provide a more holistic view of lung anatomy and function.
- Development of Real-Time Analysis Tools: Designing tools capable of providing instantaneous diagnostic feedback during clinical evaluations.
- Personalized Diagnosis Using Patient Data: Incorporating patient-specific data (e.g., medical history, genetic factors, and lifestyle information) to improve prediction accuracy and personalize treatment plans.
- Subtype-Specific Classification Models: Although the current model focuses on detecting lung cancer in general, distinguishing between histological subtypes (e.g., adenocarcinoma, squamous cell carcinoma, and small cell carcinoma) could improve clinical utility. Developing subtype-specific models trained on annotated datasets is a promising direction.
- Extension to Other Pathologies: Applying the methodology to detect other cancers and diseases, leveraging the adaptability of AI-based models.
- Curated Databases by Cancer Type: Establishing a structured database categorized by lung cancer subtype would support the development of highly sensitive and accurate classifiers. This resource could improve model training and evaluation, leading to more robust and clinically relevant tools.
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Wang, L. Deep learning techniques to diagnose lung cancer. Cancers 2022, 14, 5569.
- Aberle, D.R.; Adams, A.M.; Berg, C.D.; Black, W.C.; Clapp, D.J.; Fagerstrom, M.R. Reduced lung-cancer mortality with low-dose computed tomographic screening. N. Engl. J. Med. 2011, 365, 395–409.
- Kreyberg, L. The significance of the histological typing of lung carcinoma in relation to prognosis. Acta Pathol. Microbiol. Scand. 1956, 39, 413–429.
- Saphir, O.; Ozzello, L. The cytology of large cell carcinoma of the lung. Cancer 1950, 3, 1101–1109.
- Travis, W.D.; Brambilla, E.; Nicholson, A.G.; Yatabe, Y.; Austin, J.H.; Beasley, M.B.; Chirieac, L.R.; Dacic, S.; Duhig, E.; Flieder, D.B. The 2015 World Health Organization classification of lung tumors: Impact of genetic, clinical and radiologic advances since the 2004 classification. J. Thorac. Oncol. 2015, 10, 1243–1260.
- Thanoon, M.A.; Zulkifley, M.A.; Mohd Zainuri, M.A.A.; Abdani, S.R. A review of deep learning techniques for lung cancer screening and diagnosis based on CT images. Diagnostics 2023, 13, 2617.
- Quinlan, J.R. Induction of decision trees. Mach. Learn. 1986, 1, 81–106.
- MacQueen, J. Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, CA, USA, 21 June–18 July 1965; University of California Press: Berkeley, CA, USA, 1967; Volume 1, pp. 281–297.
- Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
- Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297.
- Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
- Cover, T.; Hart, P. Nearest neighbor pattern classification. IEEE Trans. Inf. Theory 1967, 13, 21–27.
- Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232.
- LeCun, Y.; Boser, B.; Denker, J.S.; Henderson, D.; Howard, R.E.; Hubbard, W.; Jackel, L.D. Backpropagation applied to handwritten zip code recognition. Neural Comput. 1989, 1, 541–551.
- Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. Adv. Neural Inf. Process. Syst. 2014, 27, 2672–2680.
- Bhattacharjee, A.; Rabea, S.; Bhattacharjee, A.; Elkaeed, E.B.; Murugan, R.; Selim, H.M.; Sahu, R.K.; Shazly, G.A.; Bekhit, M.M. A multi-class deep learning model for early lung cancer and chronic kidney disease detection using computed tomography images. Front. Oncol. 2023, 13, 1193746.
- Chuquicusma, M.J.M.; Hussein, S.; Burt, J.; Bagci, U. How to fool radiologists with generative adversarial networks? A visual Turing test for lung cancer diagnosis. In Proceedings of the 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), Washington, DC, USA, 4–7 April 2018; pp. 240–244.
- Zhao, L.; Xie, H.; Kang, Y.; Lin, Y.; Liu, G.; Sakato-Antoku, M.; Patel-King, R.S.; Wang, B.; Wan, C.; King, S.M.; et al. Heme-binding protein CYB5D1 is a radial spoke component required for coordinated ciliary beating. Proc. Natl. Acad. Sci. USA 2021, 118, e2025241118.
- Yang, B.; Yao, Z.; Yuan, F.; Zhang, X.; Guo, Y.; Zhang, S. A machine learning model for predicting malignant solitary pulmonary nodules in CT images. Sci. Rep. 2020, 10, 1–10.
- Liu, Y.; Kim, J.; Balagurunathan, Y.; Li, Q.; Kumar, V. Radiomics features are associated with EGFR mutation status in lung adenocarcinomas. Clin. Lung Cancer 2017, 18, 348–360.
- Sachdeva, R.K.; Garg, T.; Khaira, G.S.; Mitrav, D.; Ahuja, R.S. A Systematic Method for Lung Cancer Classification. In Proceedings of the 2022 10th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO), Noida, India, 13–14 October 2022; pp. 1–5.
- Ye, Z.; Song, P.; Zheng, D.; Zhang, X.; Wu, J. A Naive Bayes model on lung adenocarcinoma projection based on tumor microenvironment and weighted gene co-expression network analysis. Infect. Dis. Model. 2022, 7, 498–509.
- Maheswari, M.; Jothi, B.; Devi, P.G.; Chitradevi, B.; Pradeep, K.S. Automated early diagnosis of lung tumor based on deep learning algorithms. In Proceedings of the 2023 International Conference on Applied Intelligence and Sustainable Computing (ICAISC), Dharwad, India, 16–17 June 2023; pp. 1–5.
- Shaukat, F.; Raja, G.; Gooya, A.; Frangi, A.F. Fully automatic and accurate detection of lung nodules in CT images using a hybrid feature set. Med. Phys. 2017, 44, 3615–3629.
- Dezfuly, M.; Sajedi, H. Predict Survival of Patients with Lung Cancer Using an Ensemble Feature Selection Algorithm and Classification Methods in Data Mining. J. Inf. 2015, 1, 1–11.
- Hussein, S.; Cao, K.; Song, Q.; Bagci, U. Risk Stratification of Lung Nodules Using 3D CNN-Based Multi-task Learning. In Information Processing in Medical Imaging; IPMI Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2017; Volume 10265.
- Shen, W.; Zhou, M.; Yang, F.; Yang, C.; Tian, J. Multi-crop convolutional neural networks for lung nodule malignancy suspiciousness classification. Pattern Recognit. 2017, 61, 663–673.
- National Cancer Institute. Cancer Moonshot Biobank—Lung Cancer Collection (CMB-LCA). 2025. Available online: https://www.cancerimagingarchive.net/collection/cmb-lca/ (accessed on 22 August 2025).
- Al-Yasriy, H.F.; Al-Husieny, M.A. The IQ-OTH/NCCD Lung Cancer Dataset. 2020. Available online: https://www.kaggle.com/datasets/hamdallak/the-iqothnccd-lung-cancer-dataset (accessed on 11 August 2025).
- Cameron, A.C.; Trivedi, P.K. Regression Analysis of Count Data; Cambridge University Press: Cambridge, UK, 1998.
- Freund, Y.; Schapire, R.E. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 1997, 55, 119–139.
- LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324.
- Lloyd, S.P. Least squares quantization in PCM. IEEE Trans. Inf. Theory 1982, 28, 129–137.
- Rosset, A.; Spadola, L.; Ratib, O. OsiriX: An open-source software for navigating in multidimensional DICOM images. J. Digit. Imaging 2004, 17, 205–216.
- Fedorov, A.; Beichel, R.; Kalpathy-Cramer, J.; Finet, J.; Fillion-Robin, J.C.; Pujol, S.; Bauer, C.; Jennings, D.; Fennessy, F.; Sonka, M.; et al. 3D Slicer as an image computing platform for the Quantitative Imaging Network. Magn. Reson. Imaging 2012, 30, 1323–1341.
- Wolf, I.; Vetter, M.; Wegner, I.; Nolden, M.; Bottger, T.; Hastenteufel, M.; Schobinger, M.; Kunert, T.; Meinzer, H.P. The medical imaging interaction toolkit (MITK): A toolkit facilitating the creation of interactive software by extending VTK and ITK. In Medical Imaging 2004: Visualization, Image-Guided Procedures, and Display; SPIE: Bellingham, WA, USA, 2004; Volume 5367, pp. 16–27.
- Travis, W.D.; Linnoila, R.I.; Tsokos, M.G.; Hitchcock, C.L.; Cutler, G.B.; Nieman, L.; Chrousos, G.P. Neuroendocrine tumors of the lung with proposed criteria for large-cell neuroendocrine carcinoma: An ultrastructural, immunohistochemical, and flow cytometric study of 35 cases. Am. J. Surg. Pathol. 1991, 15, 529–553.
- Neacşu, F.; Vârban, A.Ş; Simion, G.; Şurghie, R.; Pătraşcu, O.M.; Sajin, M.; Dumitru, M.; Vrînceanu, D. Lung cancer mimickers—A case series of seven patients and review of the literature. Rom. J. Morphol. Embryol. 2021, 62, 697–704.
- Koss, M.N. Pulmonary eosinophilia. Curr. Top. Pathol. 1986, 75, 109–150.
- Nutman, T.B. Evaluation and differential diagnosis of marked, persistent eosinophilia. Immunol. Allergy Clin. N. Am. 2007, 27, 529–549.
- Coffin, C.M.; Watterson, J.; Priest, J.R.; Dehner, L.P. Extrapulmonary inflammatory myofibroblastic tumor (inflammatory pseudotumor). A clinicopathologic and immunohistochemical study of 84 cases. Am. J. Surg. Pathol. 1995, 19, 859–872.





| Dataset | Slice Thickness (mm) | kVp | Kernel | In-Plane Resolution (mm) |
|---|---|---|---|---|
| CMB-LCA (TCIA) [28] | 0.625–2.5 | 100–140 | Soft/Sharp (B30f–B70f) | 0.5–0.9 |
| IQ-OTH/NCCD [29] | 1.0 | 120 | B30f (soft tissue) | 0.65–0.80 |
| Dataset | Subjects (n) | Age, Mean ± SD (Years) | Female (%) | Class Balance (Malignant/Benign/Normal) |
|---|---|---|---|---|
| CMB-LCA (TCIA) [28] | 160 | 63.4 ± 9.7 | 42.5 | Lung cancer (pathology confirmed) |
| IQ-OTH/NCCD [29] | 110 | 58.1 ± 10.8 | 46.3 | 40/15/55 |
| Author(s) | Methods | Accuracy (%) | Complexity | Description |
|---|---|---|---|---|
| Aberle et al. [2] | Logistic Regression, SVM | 85.6 | Moderate | Screening with low-dose CT to reduce lung cancer mortality. |
| Zhao et al. [18] | Convolutional Neural Networks (CNN) | 90.3 | High | AI-based early detection of malignant nodules on CT. |
| Liu et al. [20] | Random Forest, Gradient Boosting | 88.4 | Moderate | Prediction of epidermal growth factor receptor (EGFR) mutation in lung adenocarcinomas. |
| Our study | RF, GB, KNN, SVM | 92.5 | Moderate–High | Ensemble model with advanced preprocessing and segmentation for reliable prediction. |
| Algorithm | Accuracy (%) | Sensitivity (%) | Specificity (%) | Training Time (s) |
|---|---|---|---|---|
| Random Forest | 92.5 | 91.2 | 93.8 | 150 |
| Gradient Boosting | 94.1 | 92.8 | 95.4 | 180 |
| K-Nearest Neighbors (KNN) | 89.7 | 88.5 | 90.9 | 120 |
| Support Vector Machine (SVM) | 91.3 | 90.1 | 92.5 | 160 |
| Research | Algorithms/Neural Networks Used | Functionalities | Accuracy and Effectiveness | Ease of Use | Scientific Contributions |
|---|---|---|---|---|---|
| Rosset et al. [34] | OsiriX: Watershed, K-means, Thresholding | DICOM visualization, 3D tools, PACS integration | High accuracy in image segmentation | Intuitive interface for clinicians | Enhanced 3D visualization and segmentation tools |
| Fedorov et al. [35] | 3D Slicer: GrowCut, statistical analysis | Image segmentation, quantitative analysis, 3D rendering | High-precision segmentation | Advanced, requires learning curve | State-of-the-art segmentation and quantification |
| Wolf et al. [36] | MITK: Segmentation and visualization | Interactive image processing, modality support | Real-time manipulation, highly accurate | User-friendly and adaptable | Real-time interactive processing platform |
| Aberle et al. [2] | Logistic Regression, SVM | LDCT screening for lung cancer | Reduction in mortality | Requires trained personnel and equipment | Early detection methodology |
| Zhao et al. [18] | Convolutional Neural Networks (CNN) | Malignant lung nodule detection | High early detection accuracy | Expert-friendly interface | Advancement in malignancy detection |
| Liu et al. [20] | Random Forest, Gradient Boosting | EGFR mutation prediction | High radiomic prediction accuracy | Needs ML knowledge | Mutation prediction using imaging data |
| Hussein et al. [26] | CNN | Lung nodule risk stratification | High stratification accuracy | Radiologist-oriented design | CNN application to nodule risk analysis |
| Our study | RF, GB, KNN, SVM | DICOM processing, segmentation, prediction | High accuracy via ensemble modeling | Intuitive, accessible UI | Innovations in preprocessing, segmentation, and ensemble design |
| 1-07.dcm | 1-10.dcm | 1-15.dcm | 1-16.dcm |
|---|---|---|---|
| 0.74% | 1.60% | 23.86% | 68.31% |
| 1-17.dcm | 1-20.dcm | 1-27.dcm | 1-30.dcm |
| 82.46% | 92.73% | 42.67% | 94.67% |
| 1-35.dcm | | | |
| 95.26% | | | |
| Accuracy | Precision | Recall (Sensitivity) | Specificity | F1-Score | AUC |
|---|---|---|---|---|---|
| 92.1% | 89.5% | 90.8% | 93.4% | 90.1% | 0.947 |
| Model | Accuracy | Precision | Recall | F1-Score | AUC |
|---|---|---|---|---|---|
| Random Forest | 87.2% | 85.4% | 86.1% | 85.7% | 0.902 |
| Gradient Boosting | 88.6% | 86.7% | 87.5% | 87.1% | 0.918 |
| k-Nearest Neighbors | 84.9% | 83.2% | 82.7% | 82.9% | 0.876 |
| Support Vector Machine | 86.5% | 84.1% | 85.0% | 84.5% | 0.891 |
| Ensemble (RF + GB + KNN + SVM) | 92.1% | 89.5% | 90.8% | 90.1% | 0.947 |
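
As a quick consistency check on the table above, the F1 score is the harmonic mean of precision and recall, F1 = 2PR / (P + R), so each reported F1 value can be recomputed from the other two columns. The short script below does this with the figures copied from the table; the variable and function names are illustrative.

```python
# (precision %, recall %, reported F1 %) copied from the table above.
reported = {
    "Random Forest": (85.4, 86.1, 85.7),
    "Gradient Boosting": (86.7, 87.5, 87.1),
    "k-Nearest Neighbors": (83.2, 82.7, 82.9),
    "Support Vector Machine": (84.1, 85.0, 84.5),
    "Ensemble (RF + GB + KNN + SVM)": (89.5, 90.8, 90.1),
}


def f1_from(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)


# Recompute F1 for every row, rounded to one decimal place like the table.
recomputed = {name: round(f1_from(p, r), 1)
              for name, (p, r, _) in reported.items()}

for name, (_, _, f1) in reported.items():
    print(f"{name}: reported {f1}, recomputed {recomputed[name]}")
```

All five recomputed values agree with the reported F1 scores to one decimal place.
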
| Dataset | Accuracy | Precision | Recall (Sensitivity) | Specificity | F1-Score | AUC |
|---|---|---|---|---|---|---|
| CMB-LCA (Internal Test Set) | 92.1% | 89.5% | 90.8% | 93.4% | 90.1% | 0.947 |
| IQ-OTH/NCCD (External Validation) | 88.6% | 85.2% | 87.0% | 89.7% | 86.1% | 0.918 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Silos-Sánchez, J.; Ruiz-Vanoye, J.A.; Trejo-Macotela, F.R.; Márquez-Vera, M.A.; Diaz-Parra, O.; Martínez-Mireles, J.R.; Ruiz-Jaimes, M.A.; Vera-Jiménez, M.A. Early Lung Cancer Detection via AI-Enhanced CT Image Processing Software. Diagnostics 2025, 15, 2691. https://doi.org/10.3390/diagnostics15212691















