Comparative Analysis of Machine Learning and Deep Learning Models for Lung Cancer Prediction Based on Symptomatic and Lifestyle Features
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The main question addressed by the article is the accuracy of various machine learning models in diagnosing lung cancer.
AI-related research is much needed in cancer care, and this report adds to the literature on potential widespread applications of machine learning models in the most common malignancy worldwide.
Conclusions are mostly consistent with data presented, no ethical concerns. References are appropriate.
Comments:
Line 34, I would not classify small cell lung cancer as slow-growing, it actually is an aggressive malignancy.
Lines 55 and 96, therapeutic results deteriorate markedly if the condition goes untreated for more than 3 years. Most lung cancer patients do not survive if not treated for more than 6 months. Untreated lung cancer for >3 years is quite rare and oncologists hardly see this scenario in their entire careers. This will need to be revised.
Table 1, last line: 'patience' needs to be changed to 'patient'.
Symptoms such as swallowing difficulty and the history of alcohol consumption may not be common symptoms seen in lung cancer. These variables may need to be omitted. Line 384 states that shortness of breath is less significant; in fact, this is the most common symptom seen in this disease. I think this should be included as a variable.
Author Response
1. Line 34, I would not classify small cell lung cancer as slow-growing, it actually is an aggressive malignancy.
Ans. Thank you for your suggestion. I have modified the sentence, which can be found in Lines 34-36.
Lung cancer encompasses a spectrum…and therapeutic challenges [3]. Line 34-36
2. Lines 55 and 96, therapeutic results deteriorate markedly if the condition goes untreated for more than 3 years. Most lung cancer patients do not survive if not treated for more than 6 months. Untreated lung cancer for >3 years is quite rare and oncologists hardly see this scenario in their entire careers. This will need to be revised.
Ans. Thank you for your suggestion. I have modified the sentence, which can be found in Lines 54-57.
The sentence in Line 96 has been deleted, as its content is already covered in Lines 54-57.
The evidence demonstrates… significantly more challenging [7,8,10]. Line 54-57
3. Table 1, last line: 'patience' needs to be changed to 'patient'.
Ans. Thank you for your suggestion. I have corrected the mistake, which can now be found in Table 2.
The patient has lung cancer
4. Symptoms such as swallowing difficulty and the history of alcohol consumption may not be common symptoms seen in lung cancer. These variables may need to be omitted. Line 384 states that shortness of breath is less significant; in fact, this is the most common symptom seen in this disease. I think this should be included as a variable.
Ans. Thank you for your suggestion. I have revised the text, which can be found in Lines 474-483.
Finally, literature indicated shortness… a threshold of 0.15 for inclusion. Line 474-476
Shortness of breath was eliminated … was below the required threshold. Line 476-478
On the other hand, swallowing difficulty… and retained for the analysis. Line 478-481
Future studies should consider other… (PCA), to confirm the outcome. Line 481-483
Reviewer 2 Report
Comments and Suggestions for Authors
- The title is too generic; revise it to reflect the methodology or specific contribution.
- The abstract lacks clear problem articulation and specific technical details—please revise.
- Keywords are too broad; replace them with more specific and domain-relevant terms.
- Acronyms (e.g., ML, DL) should be introduced once and used consistently thereafter. Follow this pattern throughout the manuscript (e.g., machine learning (ML)).
- The problem statement is not clearly defined in the abstract and introduction—please clarify.
- The literature review is shallow. Add at least 5–7 recent studies for both ML and DL in lung disease diagnosis and include a comparative table summarizing related works with datasets, models, and results.
- The methodology section is generic. Provide a stronger approach and compare with prior studies to show novelty.
- The use of Weka is not well-justified—clearly state what parts were automated and what was custom-simulated.
- Add statistical analysis of data features to improve understanding of the dataset.
- Justify the use of Pearson correlation for feature selection—why not explore other state-of-the-art methods?
- Clarify how the three different learning rates were selected—was any tuning method applied?
- Figure 2 needs to be resized, smoothed, and improved in quality.
- Redraw Figure 3 with better flowchart formatting; remove # and correct arrow placements.
- Figures 4, 5, and 6 are blurred—redraw them or improve resolution.
- Use a table to show neural network configurations (e.g., layers, neurons) instead of multiple blurry figures.
- In Line 290–291, the formula for precision and recall has double brackets—clarify if intentional.
- Combine Figures 7 and 8 into a single plot or label subplots as a, b, c, d for clarity.
- Figure 9 seems unnecessary—move it to the methodology section or remove it.
- Subsection headings should be more specific rather than generic.
- Avoid repeating acronyms like “Machine Learning (ML)” multiple times after the first definition.
- Provide a comprehensive comparison table with previous studies and clearly highlight how your approach outperforms them.
- The conclusion is weak—restructure it to reflect contributions, key findings, and future work.
Author Response
- The title is too generic; revise it to reflect the methodology or specific contribution.
Ans. Thank you for your suggestion. I have modified the title of the study.
- The abstract lacks clear problem articulation and specific technical details—please revise.
Ans. Thank you for your suggestion. I have rewritten the abstract, which can be found in Lines 10-25.
Lung cancer is a critical global…effective early detection techniques. Line 10-11
This research seeks to improve… and lifestyle factor data from Kaggle. Line 11-13
The data preprocessing steps… and Support Vector Machines. Line 13-18
Neural Network (NN) was evaluated… 80% train/test splitting method. Line 18-21
NN model was implemented with…improve accuracy and reduce noise. Line 21-23
ML models used in the study… learning rate, attained 92.86% accuracy. Line 23-25
- Keywords are too broad; replace them with more specific and domain-relevant terms.
Ans. Thank you for your suggestion. I have modified the keywords, which can be found in Lines 26-27.
Lung cancer prediction; Machine learning… Correlation matrix. Line 26-27
- Acronyms (e.g., ML, DL) should be introduced once and consistently used thereafter. Further follow this pattern throughout the manuscript (machine learning (ML))
Ans. Thank you for your suggestion. I now use the acronyms ML and DL consistently throughout the paper.
- The problem statement is not clearly defined in the abstract and introduction—please clarify.
Ans. Thank you for your suggestion. I have rewritten the abstract and introduction to clarify the problem statement, which can be found in Lines 10-25 and 58-81.
Lung cancer is a critical global…effective early detection techniques. Line 10-11
This research seeks to improve… and lifestyle factor data from Kaggle. Line 11-13
The data preprocessing steps… and Support Vector Machines. Line 13-18
Neural Network (NN) was evaluated… 80% train/test splitting method. Line 18-21
NN model was implemented with…improve accuracy and reduce noise. Line 21-23
ML models used in the study… learning rate, attained 92.86% accuracy. Line 23-25
In the last few years, ML and data… decision-making accuracy [8,9]. Lines 58-60
Researchers have utilized ML and… at the onset of treatment [6,7]. Lines 60-63
However, selecting an appropriate… its relation to patient habits [3,5]. Lines 63-64
ML automates disease prediction… patterns from large datasets [8,11]. Lines 65-66
Therefore, it is critical to develop… and facilitate timely intervention. Lines 67-68
This study systematically… factors aiming to address the concerns. Lines 68-70
This research aims to identify the ML… to fulfill the aims of the study: Lines 71-75
What are the predictive accuracy…performs best across these metrics? Lines 76-78
- How does feature selection affect… ML lung cancer detection accuracy? Line 79
- Do DL methods, such as NN, outperform… in lung cancer prediction? Lines 80-81
- The literature review is shallow. Add at least 5–7 recent studies for both ML and DL in lung disease diagnosis and include a comparative table summarizing related works with datasets, models, and results.
Ans. Thank you for your suggestion. I have rewritten the literature review section.
Table 1 mentions the summaries of recently published studies. Table 10 compares the current study to the recently published study. The rest of the description is found in Lines 91-138.
Machine learning has become an…and classification of disease subtypes. Lines 91-94
Maurya et al. (2024) [1] conducted… for early lung cancer prediction. Lines 95-98
Khanam and Foo (2021) [11] compared…for lung disease diagnosis. Lines 98-101
Protić et al. (2023) [12] explored… efficient diagnostic models. Lines 102-104
Dudáš (2024) [13] investigated… that contribute to lung diseases. Lines 104-107
Patra (2020) [14] utilized ML… with an accuracy of 81.25%. Lines 107-110
Radhika et al. (2019) [15] performed… SVM performing second best. Lines 110-113
Conversely, NB exhibited the … individual and ensemble classifiers. Lines 113-117
DL, a subfield of artificial intelligence… promise in this field. Lines 119-122
Esteva et al. (2021) [17] provided a… and the cultivation of trust. Lines 122-126
Alzubaidi et al. (2021) [8] reviewed…used in lung disease diagnosis. Lines 126-128
Vieira et al. (2021) [18] used a data… risk factor for lung cancer. Lines 129-132
Chakraborty et al. (2024) [6] discussed… utilized in healthcare. Lines 132-135
Liu et al. (2023) [9] explored medical… of lung disease diagnosis. Lines 135-138
- The methodology section is generic. Provide a stronger approach and compare with prior studies to show novelty.
Ans. Thank you for your suggestion. I have rewritten the methodology section, which can be found in Lines 156-327.
A robust and well-defined methodology…employed in this study. Line 156-159
Furthermore, it goes beyond a…the current study's methodology. Line 159-162
This study utilized a lung cancer…connected to Google LLC. Line 164-166
While some studies focus primarily…diagnosis [17,21]. Line 166-168
Unlike Patra (2020) [14], which…larger sample size for analysis. Line 168-170
The 'Lung_Cancer' attribute is… 2 to represent patients without it. Line 170-173
A detailed statistical analysis of the… data's inherent characteristics. Line 174-176
These insights are crucial for… the model's performance. Line 176-179
By understanding the data's properties… feature predictiveness. Line 179-182
Weka (Waikato Environment for… to automate several ML tasks. Line 183-184
Weka provides pre-built…of machine learning classifiers [11]. Line 184-188
The utilization of Weka facilitated… common ML workflow steps. Line 189-190
However, NN models were custom…the deep learning architectures. Line 190-192
This hybrid approach allowed us… and Python's customizability. Line 192-193
The NN uses Python programming… Notebook environment. Line 194
The data has been primarily analyzed… between ages 55 and 75. Line 200-202
Data preprocessing transforms raw data… while 39 are negative. Line 204-208
The current study uses… 276 data were included in the next phase. Line 210-212
Unlike Radhika et al. [15], which…data was found in the dataset. Line 214-216
The “InterquartileRange” filter of Weka… and extreme values. Line 218-219
Pearson's correlation technique… and the target variable [19]. Line 221-223
It also provides a straightforward… relevant linear relationships. Line 223-227
The coefficient, ranging from… suggests a significant association. Line 227-229
Weka's “correlation” filter determines…of the correlation coefficient. Line 229-231
The value 0.15 was used as a threshold… dysphagia, and allergies. Line 231-234
In line with Khanam and Foo [11]… of 0 to 1 using min-max scaling. Line 238-239
The "Normalize" filter in Weka was…mean value after normalization. Line 239-241
Following preprocessing, the dataset… and 238 with lung cancer. Line 244-245
Figure 2 shows the correlation… indicating the highest correlation. Line 246-248
The dataset was divided into 80% for… set for evaluation. Line 255-257
We further employed 10-fold… imbalanced datasets like this study. Line 257-261
In the K-fold cross-validation…the outcomes from all K iterations. Line 261-264
We implemented the ML algorithms… tuning using grid search. Line 266-268
Unlike Alzubaidi et al. (2021) [8], which… architectural comparison. Line 268-271
We evaluated several ML algorithms… for classification tasks. Line 271-275
We developed three distinct NN… the results were compared. Line 279-281
Within a NN, the activation function… sigmoid and ReLU. Line 281-283
The NN models were developed by…during backpropagation. Line 283-286
Additionally, Stochastic Gradient Descent… on the loss gradient. Line 286-288
We conducted experiments with… partitioning train and test data. Line 289-291
For K-fold cross-validation, we utilized…nature of the target variable. Line 291-293
The NN model was constructed… layer represent the eight attributes. Line 295-296
The hidden layer consists of a…model with a single hidden layer. Line 296-299
A four-layer NN model was… as the single-hidden-layer model. Line 303-305
The second layer comprises 41 hidden… with two hidden layers. Line 305-308
The input and output layers…layer Neural Network (NN). Line 312-313
The second, third, and fourth… NN model is depicted in Figure 6. Line 313-316
Table 5 provides a detailed… NN models used in this study. Line 319-320
The learning rates of 0.1, 0.01, and… convergence and performance. Line 323-326
While a more complete hyperparameter… sensitivity to learning rate. Line 326-328
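The 80/20 split and stratified 10-fold cross-validation described above can be sketched in plain Python. This is a minimal illustration, not the authors' Weka or Python setup; the function names, the fixed seed, and the round-robin fold assignment are assumptions chosen for clarity, and model fitting is omitted.

```python
import random
from collections import defaultdict

def _indices_by_class(labels):
    """Group row indices by their class label."""
    groups = defaultdict(list)
    for i, y in enumerate(labels):
        groups[y].append(i)
    return groups

def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx) with roughly the same class ratio in both parts."""
    rng = random.Random(seed)
    train, test = [], []
    for idx in _indices_by_class(labels).values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)

def stratified_k_folds(labels, k=10, seed=42):
    """Yield (train_idx, val_idx) for k folds; each class is dealt round-robin across folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for idx in _indices_by_class(labels).values():
        rng.shuffle(idx)
        for position, i in enumerate(idx):
            folds[position % k].append(i)
    for f in range(k):
        val = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, val
```

For example, with toy labels [1] * 40 + [2] * 10, stratified_split holds out 8 rows of class 1 and 2 rows of class 2, and each of the 10 folds validates on 4 rows of class 1 and 1 row of class 2, which is why stratification matters for an imbalanced target.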
- The use of Weka is not well-justified—clearly state what parts were automated and what was custom-simulated.
Ans. Thank you for your concern. I have added an explanation of why Weka is used in the current study, which can be found in Lines 183-193.
Weka (Waikato Environment for… to automate several ML tasks. Line 183-184
Weka provides pre-built…of machine learning classifiers [11]. Line 184-188
The utilization of Weka facilitated… common ML workflow steps. Line 189-190
However, NN models were custom…the deep learning architectures. Line 190-192
This hybrid approach allowed us… and Python's customizability. Line 192-193
- Add statistical analysis of data features to improve understanding of the dataset.
Ans. Thank you for your concern. I have added an explanation of why the statistical analysis of data features was used in the current study, which can be found in Lines 174-182.
A detailed statistical analysis…interpret model outcomes effectively. Lines 174-175
Descriptive statistics, revealing… the data's inherent characteristics. Lines 175-176
These insights are crucial for… the model's performance. Lines 176-179
By understanding the data's properties… feature predictiveness. Lines 179-182
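As a minimal sketch of the per-feature statistics the reviewer requested (minimum, maximum, mean, standard deviation), assuming the rows are loaded as dictionaries, missing values are stored as None, and 'AGE' is a numeric column (an illustrative column name, not taken from the manuscript):

```python
import statistics

def describe(rows, numeric_columns):
    """Summarize each numeric column with min, max, mean and sample standard deviation."""
    summary = {}
    for col in numeric_columns:
        # Skip missing entries so they do not distort the statistics.
        values = [row[col] for row in rows if row[col] is not None]
        summary[col] = {
            "min": min(values),
            "max": max(values),
            "mean": statistics.mean(values),
            "std": statistics.stdev(values),  # sample SD, divides by n - 1
        }
    return summary

# Hypothetical rows for illustration only.
rows = [{"AGE": 50}, {"AGE": 60}, {"AGE": 70}]
print(describe(rows, ["AGE"]))
```

The sample standard deviation (n - 1 denominator) is used here; a table reporting the population version would give slightly smaller values.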
- Justify the use of Pearson correlation for feature selection—why not explore other state-of-the-art methods?
Ans. Thank you for your concern. I have added an explanation of why Pearson correlation was used for feature selection in the current study, which can be found in Lines 221-234.
Pearson's correlation technique… and the target variable [19]. Line 221-223
It also provides a straightforward… relevant linear relationships. Line 223-227
The coefficient, ranging from… suggests a significant association. Line 227-229
Weka's “correlation” filter determines…of the correlation coefficient. Line 229-231
The value 0.15 was used as a threshold… dysphagia, and allergies. Line 231-234
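The correlation-based filter can be sketched as follows. The sketch assumes a feature is kept when the absolute Pearson coefficient with the target is at least 0.15; the response does not state whether the original Weka filter compares the absolute or the signed value, so treat that as an assumption. The feature names in the test are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant column: correlation is undefined, treat as no association
    return cov / (sx * sy)

def select_features(columns, target, threshold=0.15):
    """Keep features whose |r| with the target meets the threshold; also return all scores."""
    scores = {name: pearson(values, target) for name, values in columns.items()}
    kept = [name for name, r in scores.items() if abs(r) >= threshold]
    return kept, scores
```

Because Pearson's r only measures linear association, a feature with a strong non-linear relationship to the target can still fall below the threshold, which matches the limitation the reviewer noted in Round 2.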
- Clarify how the three different learning rates were selected—was any tuning method applied?
Ans. Thank you for your concern. I have explained how the three different learning rates were selected, which can be found in Lines 323-328.
The learning rates of 0.1, 0.01, and …preliminary experimentation. Lines 323-324
These values represent a range from… convergence and performance. Lines 324-326
While a more complete hyperparameter… sensitivity to learning rate. Lines 326-328
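The sensitivity to the three learning rates can be illustrated with a small experiment. To stay short and dependency-free, this sketch swaps in a single-layer logistic model trained by stochastic gradient descent as a stand-in for the paper's multi-layer NNs; the toy data, epoch count, and function names are assumptions, not the authors' configuration.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic_sgd(X, y, lr, epochs=50, seed=0):
    """Stand-in model (logistic regression, not the paper's NN) trained with per-sample SGD."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            grad = p - y[i]  # derivative of log-loss w.r.t. the pre-activation
            w = [wj - lr * grad * xj for wj, xj in zip(w, X[i])]
            b -= lr * grad
    return w, b

def log_loss(X, y, w, b):
    """Mean binary cross-entropy, with probabilities clipped away from 0 and 1."""
    eps, total = 1e-12, 0.0
    for xi, yi in zip(X, y):
        p = min(max(sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b), eps), 1 - eps)
        total -= yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total / len(X)

def sweep(X, y, rates=(0.1, 0.01, 0.001), epochs=50):
    """Train once per learning rate and report the final training loss for each."""
    return {lr: log_loss(X, y, *train_logistic_sgd(X, y, lr, epochs)) for lr in rates}
```

With a fixed epoch budget, smaller rates move the weights less, so on an easy toy problem the 0.001 run typically ends with a higher loss; on real data a too-large rate can instead oscillate, which is the trade-off the three values are meant to probe.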
- Figure 2 needs to be resized, smoothed, and improved in quality.
Ans. Thank you for your suggestion. I have redrawn Figure 2.
- Redraw Figure 3 with better flowchart formatting; remove # and correct arrow placements.
Ans. Thank you for your suggestion. I have redrawn Figure 3.
- Figures 4, 5, and 6 are blurred—redraw them or improve resolution.
Ans. Thank you for your suggestion. I have redrawn Figures 4, 5, and 6.
- Use a table to show neural network configurations (e.g., layers, neurons) instead of multiple blurry figures.
Ans. Thank you for your suggestion. I have redrawn Figures 4, 5, and 6 and included a table (Table 5) describing the configuration of the different NN models.
- In Line 290–291, the formula for precision and recall has double brackets—clarify if intentional.
Ans. Thank you for your suggestion. I have corrected the mistakes, which can be found in Lines 338-341.
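For reference, the single-bracket forms are precision = TP / (TP + FP) and recall = TP / (TP + FN). A minimal sketch computing these, plus F1 and accuracy, from confusion-matrix counts; the function name is illustrative:

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, F1 and accuracy from binary confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0  # guard: no positive predictions
    recall = tp / (tp + fn) if tp + fn else 0.0     # guard: no actual positives
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}
```

Returning 0.0 when a denominator is zero is one common convention; some tools report the value as undefined instead.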
- Combine Figures 7 and 8 into a single plot or label subplots as a, b, c, d for clarity.
Ans. Thank you for your suggestion. I have labeled the subfigures as a, b, c, and d.
- Figure 9 seems unnecessary—move it to the methodology section or remove it.
Ans. Thank you for your suggestion. I have deleted Figure 9.
- Subsection headings should be more specific rather than generic.
Ans. Thank you for your suggestion. I have modified several subsection headings throughout the paper:
2.1 Machine Learning Applications in Lung Disease Diagnosis Line 90
2.2 DL Applications in Lung Disease Diagnosis Line 118
2.3 Research Deficiencies and Prospective Avenues Line 140
3.1. Dataset Description, Features, and Tools Line 163
3.1.1. Descriptive Statistics Line 199
3.1.4. Identification of missing values Line 213
3.1.5. Identification and elimination of outliers Line 217
3.5.2. Data Normalization Techniques Line 237
3.3. Experimental Setup for Testing and Training data Line 254
3.5.4. Selection of different learning rates Line 322
4.1. Performance of Machine Learning Algorithms Line 330
5.1 Evaluation of Machine Learning Models Line 398
5.2 Efficacy of NN in Lung Cancer Diagnosis Line 408
5.3 Influence of Data Preprocessing and Feature Selection Line 421
5.4 Comparative Analysis with Prior Studies Line 430
- Avoid repeating acronyms like “Machine Learning (ML)” multiple times after the first definition.
Ans. Thank you for your suggestion. I now use the acronym ML consistently throughout the paper.
- Provide a comprehensive comparison table with previous studies and clearly highlight how your approach outperforms them.
Ans. Thank you for your suggestion. I have included a table (Table 10) comparing the findings of the current study with previous studies, which can be found in Line 440.
- The conclusion is weak—restructure it to reflect contributions, key findings, and future work.
Ans. Thank you for your suggestion. I have rewritten the conclusion, which can be found in Lines 443-449.
This study investigated the efficacy… to improve patient outcomes. Lines 443-446
The research provides a comparative… and lifestyle factors. Lines 443-447
This shows how a rigorous data… performance and reduced noise. Lines 447-449
The study identifies coughing, wheezing… significant risk indicators. Lines 449-451
It evaluates the performance of… for lung cancer prediction. Lines 451-454
Several ML models, including… for effective lung cancer prediction. Lines 455-456
DL models, notably a three-hidden-layer… patterns in the data. Lines 456-459
- The English could be improved to more clearly express the research.
Ans. Thank you for your suggestion. I have revised the English writing throughout the paper.
Lung cancer is a critical global…effective early detection techniques. Line 10-11
This research seeks to improve… and lifestyle factor data from Kaggle. Line 11-13
The data preprocessing steps… and Support Vector Machines. Line 13-18
Neural Network (NN) was evaluated… 80% train/test splitting method. Line 18-21
NN model was implemented with…improve accuracy and reduce noise. Line 21-23
ML models used in the study… learning rate, attained 92.86% accuracy. Line 23-25
Lung cancer prediction; Machine learning… Correlation matrix. Line 26-27
In the last few years, ML and data… decision-making accuracy [8,9]. Lines 58-60
Researchers have utilized ML and… at the onset of treatment [6,7]. Lines 60-63
However, selecting an appropriate… its relation to patient habits [3,5]. Lines 63-64
ML automates disease prediction… patterns from large datasets [8,11]. Lines 65-66
Therefore, it is critical to develop… and facilitate timely intervention. Lines 67-68
This study systematically… factors aiming to address the concerns. Lines 68-70
This research aims to identify the ML… to fulfill the aims of the study: Lines 71-75
What are the predictive accuracy…performs best across these metrics? Lines 76-78
- How does feature selection affect… ML lung cancer detection accuracy? Line 79
- Do DL methods, such as NN, outperform… in lung cancer prediction? Lines 80-81
Machine learning has become an…and classification of disease subtypes. Lines 91-94
Maurya et al. (2024) [1] conducted… for early lung cancer prediction. Lines 95-98
Khanam and Foo (2021) [11] compared…for lung disease diagnosis. Lines 98-101
Protić et al. (2023) [12] explored… efficient diagnostic models. Lines 102-104
Dudáš (2024) [13] investigated… that contribute to lung diseases. Lines 104-107
Patra (2020) [14] utilized ML… with an accuracy of 81.25%. Lines 107-110
Radhika et al. (2019) [15] performed… SVM performing second best. Lines 110-113
Conversely, NB exhibited the … individual and ensemble classifiers. Lines 113-117
DL, a subfield of artificial intelligence… promise in this field. Lines 119-122
Esteva et al. (2021) [17] provided a… and the cultivation of trust. Lines 122-126
Alzubaidi et al. (2021) [8] reviewed…used in lung disease diagnosis. Lines 126-128
Vieira et al. (2021) [18] used a data… risk factor for lung cancer. Lines 129-132
Chakraborty et al. (2024) [6] discussed… utilized in healthcare. Lines 132-135
Liu et al. (2023) [9] explored medical… of lung disease diagnosis. Lines 135-138
A robust and well-defined methodology…employed in this study. Line 156-159
Furthermore, it goes beyond a…the current study's methodology. Line 159-162
This study utilized a lung cancer…connected to Google LLC. Line 164-166
While some studies focus primarily…diagnosis [17,21]. Line 166-168
Unlike Patra (2020) [14], which…larger sample size for analysis. Line 168-170
The 'Lung_Cancer' attribute is… 2 to represent patients without it. Line 170-173
A detailed statistical analysis of the… data's inherent characteristics. Line 174-176
These insights are crucial for… the model's performance. Line 176-179
By understanding the data's properties… feature predictiveness. Line 179-182
Weka (Waikato Environment for… to automate several ML tasks. Line 183-184
Weka provides pre-built…of machine learning classifiers [11]. Line 184-188
The utilization of Weka facilitated… common ML workflow steps. Line 189-190
However, NN models were custom…the deep learning architectures. Line 190-192
This hybrid approach allowed us… and Python's customizability. Line 192-193
The NN uses Python programming… Notebook environment. Line 194
The data has been primarily analyzed… between ages 55 and 75. Line 200-202
Data preprocessing transforms raw data… while 39 are negative. Line 204-208
The current study uses… 276 data were included in the next phase. Line 210-212
Unlike Radhika et al. [15], which…data was found in the dataset. Line 214-216
The “InterquartileRange” filter of Weka… and extreme values. Line 218-219
Pearson's correlation technique… and the target variable [19]. Line 221-223
It also provides a straightforward… relevant linear relationships. Line 223-227
The coefficient, ranging from… suggests a significant association. Line 227-229
Weka's “correlation” filter determines…of the correlation coefficient. Line 229-231
The value 0.15 was used as a threshold… dysphagia, and allergies. Line 231-234
In line with Khanam and Foo [11]… of 0 to 1 using min-max scaling. Line 238-239
The "Normalize" filter in Weka was…mean value after normalization. Line 239-241
Following preprocessing, the dataset… and 238 with lung cancer. Line 244-245
Figure 2 shows the correlation… indicating the highest correlation. Line 246-248
The dataset was divided into 80% for… set for evaluation. Line 255-257
We further employed 10-fold… imbalanced datasets like this study. Line 257-261
In the K-fold cross-validation…the outcomes from all K iterations. Line 261-264
We implemented the ML algorithms… tuning using grid search. Line 266-268
Unlike Alzubaidi et al. (2021) [8], which… architectural comparison. Line 268-271
We evaluated several ML algorithms… for classification tasks. Line 271-275
We developed three distinct NN… the results were compared. Line 279-281
Within a NN, the activation function… sigmoid and ReLU. Line 281-283
The NN models were developed by…during backpropagation. Line 283-286
Additionally, Stochastic Gradient Descent… on the loss gradient. Line 286-288
We conducted experiments with… partitioning train and test data. Line 289-291
For K-fold cross-validation, we utilized…nature of the target variable. Line 291-293
The NN model was constructed… layer represent the eight attributes. Line 295-296
The hidden layer consists of a…model with a single hidden layer. Line 296-299
A four-layer NN model was… as the single-hidden-layer model. Line 303-305
The second layer comprises 41 hidden… with two hidden layers. Line 305-308
The input and output layers…layer Neural Network (NN). Line 312-313
The second, third, and fourth… NN model is depicted in Figure 6. Line 313-316
Table 5 provides a detailed… NN models used in this study. Line 319-320
The learning rates of 0.1, 0.01, and… convergence and performance. Line 323-326
While a more complete hyperparameter… sensitivity to learning rate. Line 326-328
This study investigated the efficacy… to improve patient outcomes. Lines 443-446
The research provides a comparative… and lifestyle factors. Lines 443-447
This shows how a rigorous data… performance and reduced noise. Lines 447-449
The study identifies coughing, wheezing… significant risk indicators. Lines 449-451
It evaluates the performance of… for lung cancer prediction. Lines 451-454
Several ML models, including… for effective lung cancer prediction. Lines 455-456
DL models, notably a three-hidden-layer… patterns in the data. Lines 456-459
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
Corrections have been made as suggested. No ethical concerns were identified. The limitations of the study have been discussed in the conclusion, with notable mention of the Pearson correlation, which only evaluates linear relationships. References are appropriate. Tables and figures are appropriate and explanatory. The paper is clear and shows scientific integrity. Conclusions appear to be consistent with the evidence presented. The findings advance current medical knowledge of AI methods used in lung cancer diagnosis.
Author Response
Thank you, Reviewer, for your valuable comments. I have learned from them.
Reviewer 2 Report
Comments and Suggestions for Authors
- The keywords should include more than five terms and be more specific to the research domain. Currently, terms like Machine Learning, Deep Learning, and Feature Selection are too general.
- Acronyms (e.g., ML, DL) should be introduced only once (e.g., machine learning (ML)) and used consistently throughout the manuscript. Avoid acronyms in the abstract. Additionally, ensure lowercase formatting for terms when defining acronyms (e.g., neural network (NN), not Neural Network (NN)).
- The abstract lacks a clear and well-defined problem statement. Please revise to highlight the research gap and the significance of your contribution.
- Table 1 should follow the journal’s formatting guidelines and include a comprehensive, descriptive caption rather than a generic title.
- The methodology section is too general. Begin with an overview that justifies your approach to solving the research problem. Include a well-designed Figure 1 (flowchart) and explain each methodological step in alignment with the figure.
- Include statistical analysis (e.g., mean, standard deviation) for numerical features in the dataset to improve clarity and understanding.
- Replace multiple unclear neural network diagrams with a single well-formatted table summarizing network configurations (e.g., layers, neurons) in Table 5. Provide explanation in the text accordingly.
- Table 10 should be revised for journal formatting and caption style. Rather than a narrative or limitation-based comparison, include a quantitative performance comparison with previous studies. Discuss limitations and justification for superiority in the main text.
- Include the ROC curve for the best-performing machine learning model to support the evaluation metrics.
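A ROC curve of the kind requested can be computed directly from a model's predicted scores. This sketch assumes labels are recoded so that 1 = lung cancer and 0 = no lung cancer (the dataset's 'Lung_Cancer' attribute uses 2 for patients without it), groups tied scores into a single point, and integrates AUC with the trapezoidal rule; the function names are illustrative.

```python
def roc_curve(y_true, scores):
    """Return (fpr, tpr) lists, lowering the decision threshold from the highest score."""
    pos = sum(1 for y in y_true if y == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("ROC needs at least one positive and one negative sample")
    pairs = sorted(zip(scores, y_true), key=lambda p: -p[0])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every sample tied at this score before emitting a point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / neg)
        tpr.append(tp / pos)
    return fpr, tpr

def auc(fpr, tpr):
    """Area under the ROC curve via the trapezoidal rule."""
    return sum((fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2 for k in range(1, len(fpr)))
```

For instance, labels [0, 0, 1, 1] with scores [0.1, 0.4, 0.35, 0.8] give an AUC of 0.75, and a perfectly separating score gives 1.0.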
Author Response
1. The keywords should include more than five and be more specific to the research domain. Currently, terms like Machine Learning, Deep Learning, and Feature Selection are too general.
Ans. Thank you for your suggestion. I have modified the keywords, which can be found in Lines 27-28.
Keywords: Lung cancer prediction… Correlation matrix. Lines 29-30
2. Acronyms (e.g., ML, DL) should be introduced only once (e.g., machine learning (ML)) and used consistently throughout the manuscript. Avoid acronyms in the abstract. Additionally, ensure lowercase formatting for terms when defining acronyms (e.g., neural network (NN), not Neural Network (NN)).
Ans. Thank you for your suggestion. I have removed the acronyms ML, DL, and NN from the abstract and keywords, and I now introduce them starting from the Introduction section.
This research seeks to improve…and lifestyle factor data from Kaggle. Line 11-14
Results highlight the importance…the machine learning models. Line 23-26
In the last few years, machine learning…making accuracy [8,9]. Line 61-63
This study systematically evaluates… to address the concerns. Line 71-74
- Do DL methods, such as neural… in lung cancer prediction? Line 82-83
3. The abstract lacks a clear and well-defined problem statement. Please revise to highlight the research gap and the significance of your contribution.
Ans. Thank you for your suggestion. I have rewritten the abstract to address the issues you mentioned; the changes can be found in Lines 11-18 and 26-28.
This research seeks to improve…and lifestyle factor data from Kaggle. Line 11-14
This research addresses this… and lifestyle factor dataset from Kaggle. Line 14-18
This study contributes to developing… improved patient outcomes. Line 26-28
4. Table 1 should follow the journal’s formatting guidelines and include a comprehensive, descriptive caption rather than a generic title.
Ans. Thank you for your suggestion. I have reformatted Table 1 according to the journal's formatting guidelines and rewritten its caption, which can be found in Lines 141-142.
Table 1: Comparative Analysis of ML and DL… Prediction Research Line 141-142
5. The methodology section is too general. Begin with an overview that justifies your approach to solving the research problem. Include a well-designed Figure 1 (flowchart) and explain each methodological step in alignment with the figure.
Ans. Thank you for your suggestion. I have modified the methodology section according to your suggestions, which can be found on Line 159-308.
This section details the methodology… of research findings. Line 160-162
The overall workflow, depicted in…models' predictive capabilities. Line 162-165
The study utilized a lung cancer… connected to Google LLC. Line 170-172
The dataset comprises clinical features,… data and CNNs [17,21]. Line 172-173
Unlike some previous work with… independent predictors. Line 173-177
The 'Lung_Cancer' attribute uses… patients without it. Line 177-178
A detailed statistical analysis, including… model performance. Line 179-182
Table 2 presents the dataset's… means, and standard deviations. Line 182-183
Table 2 describes the dataset used… for numeric attributes. Line 184-186
This table is essential for understanding… preprocessing or analysis. Line 186-187
The data has been primarily analyzed… between ages 55 and 75. Line 189-192
Weka (Waikato Environment for… to automate several ML tasks. Line 195-196
Weka provides pre-built implementations… learning classifiers [11]. Line 196-200
The utilization of Weka facilitated the…common ML workflow steps. Line 201-202
However, NN models were custom… the deep learning architectures. Line 202-203
This hybrid approach allowed us to… and Python's customizability. Line 204-205
The NN uses Python programming… Jupyter Notebook environment. Line 206
Data preprocessing is a critical step to… the accuracy of ML models. Line 208-209
The following preprocessing steps… in 276 remaining data points. Line 210-211
Missing Value Handling: Weka's…used to check for missing values. Line 211-212
Unlike some studies that use simple… values were found in the dataset. Line 213-214
Outlier Detection and Removal: Weka's… No outliers were detected. Line 215-216
Pearson's correlation technique was… attributes and the target variable. Line 218-220
The correlation coefficient, ranging… a significant association. Line 221-223
Weka's “correlation” filter determines…input and output properties. Line 223-224
A threshold of 0.15 was set to determine… the correlation coefficients. Line 225-228
To improve the computational efficiency… data to the range of 0 to 1. Line 232-233
The "Normalize" filter in Weka was applied… after normalization. Line 233-235
Figure 3 shows the correlation… indicating the highest correlation. Line 238-240
The preprocessed dataset, now consisting… and 20% for testing. Line 246-247
To ensure a robust estimate of model… for imbalanced datasets [11,19]. Line 247-250
In the K-fold cross-validation … remaining K-1 folds for training. Line 250-251
This process is repeated until each … outcomes from all K iterations. Line 251-253
Several ML algorithms such as… For the KNN model, K was set to 7. Line 256-258
Neural Network (NN) models with… training duration on performance. Line 260-263
ReLU activation functions were used… determine the optimal value. Line 263-265
The NN models were developed by… error during backpropagation. Line 266-268
Additionally, Stochastic Gradient… based on the loss gradient. Line 268-270
We conducted experiments with… partitioning train and test data. Line 271-273
For K-fold cross-validation, we… the architectures of the NN models. Line 273-276
The single hidden layer NN model was… and an output layer. Line 278-279
Eight neurons in the input … function for binary classification. Line 279-282
A four-layer NN model was… as the single-hidden-layer model. Line 283-285
The second layer comprises 41 hidden… activation function. Line 285-287
The input and output layers (first and fifth)… Neural Network (NN). Line 288-289
The second, third, and fourth hidden…the ReLU activation function. Line 289-291
The learning rates of 0.1, 0.01… and preliminary experimentation. Line 293-294
These values represent a range… convergence and performance. Line 294-296
While a more complete hyperparameter…sensitivity to learning rate. Line 296-298
Model performance was evaluated…used for the evaluation metrics: Line 300-302
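The correlation-based feature filtering and normalization described in the excerpts above (Pearson's coefficient between each attribute and the target, a threshold of 0.15, rescaling to the range 0 to 1) can be illustrated with the short Python sketch below. It is not Weka's implementation. It assumes that an attribute is kept when the absolute value of its correlation with the target is at least 0.15, and the column names and toy values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant column has no linear association with the target
    return cov / (sd_x * sd_y)


def select_features(columns, target, threshold=0.15):
    """Keep the columns whose |r| with the target reaches the threshold (assumed rule)."""
    return {
        name: values
        for name, values in columns.items()
        if abs(pearson(values, target)) >= threshold
    }


def min_max(values):
    """Rescale one numeric column to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)  # this sketch's choice for a constant column
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical toy data: 'age' tracks the label, 'noise' does not.
columns = {"age": [50, 60, 70, 80], "noise": [1, 2, 2, 1]}
target = [0, 0, 1, 1]
kept = select_features(columns, target)
normalized = {name: min_max(values) for name, values in kept.items()}
print(sorted(kept))  # ['age']
```

Filtering before normalizing does not change which attributes survive, because Pearson's coefficient is unaffected by a positive linear rescaling such as min-max normalization.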
6. Include statistical analysis (e.g., mean, standard deviation) for numerical features in the dataset to improve clarity and understanding.
Ans. Thank you for your suggestion. I have included statistical analysis (mean, standard deviation) for the numerical features in Table 2.
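As an illustration of the summary statistics now reported in Table 2, the following sketch computes the mean and standard deviation of each numeric attribute with Python's standard statistics module. Whether Table 2 uses the sample (n - 1) or the population (n) standard deviation is not stated here. The sketch assumes the sample version (statistics.stdev), and statistics.pstdev would give the population version. The toy column is hypothetical.

```python
import statistics


def describe(columns):
    """Mean and sample standard deviation for each numeric column."""
    return {
        name: (statistics.mean(values), statistics.stdev(values))
        for name, values in columns.items()
    }


# Hypothetical toy column; the real attributes are those listed in Table 2.
summary = describe({"age": [50, 60, 70, 80]})
mean_age, sd_age = summary["age"]
print(f"age: mean = {mean_age:.2f}, SD = {sd_age:.2f}")  # age: mean = 65.00, SD = 12.91
```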
7. Replace multiple unclear neural network diagrams with a single well-formatted table summarizing network configurations (e.g., layers, neurons) in Table 5. Provide explanation in the text accordingly.
Ans. Thank you for your suggestion. I have deleted all unclear figures (Figures 4-6) and kept Table 5. I have also described the structure of each NN model separately, which can be found on Lines 278-298.
The single hidden layer NN model was… and an output layer. Line 278-279
Eight neurons in the input … function for binary classification. Line 279-282
A four-layer NN model was… as the single-hidden-layer model. Line 283-285
The second layer comprises 41 hidden… activation function. Line 285-287
The input and output layers (first and fifth)… Neural Network (NN). Line 288-289
The second, third, and fourth hidden…the ReLU activation function. Line 289-291
The learning rates of 0.1, 0.01… and preliminary experimentation. Line 293-294
These values represent a range… convergence and performance. Line 294-296
While a more complete hyperparameter…sensitivity to learning rate. Line 296-298
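The single-hidden-layer configuration described above (eight input neurons, ReLU hidden units, a binary output, stochastic gradient descent) can be sketched without any framework, as below. Several details are elided in these excerpts, so the hidden-layer width (n_hidden=16), the sigmoid output, the binary cross-entropy loss, the He-style initialization, and the epoch count are assumptions made for illustration only. The learning rate of 0.1 is one of the values listed above. This is not the authors' code.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function (assumed output activation)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class OneHiddenLayerNet:
    """8 inputs -> ReLU hidden layer -> 1 sigmoid output; n_hidden is hypothetical."""

    def __init__(self, n_inputs=8, n_hidden=16, seed=0):
        rng = random.Random(seed)
        std = math.sqrt(2.0 / n_inputs)  # He-style initialization for ReLU units
        self.w1 = [[rng.gauss(0.0, std) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.gauss(0.0, math.sqrt(1.0 / n_hidden)) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        """Return hidden pre-activations, hidden activations and P(y = 1 | x)."""
        z1 = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(self.w1, self.b1)]
        h = [max(0.0, z) for z in z1]  # ReLU
        p = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return z1, h, p

    def sgd_step(self, x, y, lr):
        """One backpropagation + SGD update on binary cross-entropy for one example."""
        z1, h, p = self.forward(x)
        d_out = p - y  # gradient of BCE with respect to the output pre-activation
        for j, hj in enumerate(h):
            # Gradient reaching hidden unit j; ReLU passes it only when z1[j] > 0.
            d_hid = d_out * self.w2[j] if z1[j] > 0 else 0.0
            self.w2[j] -= lr * d_out * hj
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * d_hid * xi
            self.b1[j] -= lr * d_hid
        self.b2 -= lr * d_out


def mean_bce(net, data):
    """Average binary cross-entropy of the network on (x, y) pairs."""
    eps = 1e-12
    total = 0.0
    for x, y in data:
        p = net.forward(x)[2]
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)


def train(net, data, lr=0.1, epochs=50, seed=0):
    """Plain SGD: shuffle the examples each epoch and update after every one."""
    rng = random.Random(seed)
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            net.sgd_step(x, y, lr)
    return net


# Hypothetical toy data: 8 normalized features, label carried by the first one.
toy = [([float(label)] + [0.5] * 7, label) for label in (0, 1)] * 10
net = OneHiddenLayerNet()
before = mean_bce(net, toy)
train(net, toy, lr=0.1)
after = mean_bce(net, toy)
print(f"BCE before: {before:.3f}, after: {after:.3f}")
```

Deeper variants such as the five-layer model would repeat the hidden-layer step once per ReLU layer; the single-layer case is shown because it keeps the backpropagation arithmetic readable.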
8. Table 10 should be revised for journal formatting and caption style. Rather than a narrative or limitation-based comparison, include a quantitative performance comparison with previous studies. Discuss limitations and justification for superiority in the main text.
Ans. Thank you for your suggestion. I have modified Table 10 and explained the superiority of the current study in the main text, which can be found on Lines 404-422.
The outcomes of this study corroborate… cancer diagnosis (Table 10). Line 404-405
The current research has shown that… on medical datasets [1,16,14]. Line 405-407
Patra [14] achieved an accuracy… Naïve Bayes, and Decision Trees. Line 407-409
Maurya et al. [1] reached a maximum… Gradient-Boosted Trees. Line 409-410
These findings illustrate a gradual… techniques have advanced. Line 410-412
The current study notably surpasses… an accuracy of 92.86%. Line 413-414
This outcome underscores the… learning rate and epoch selection. Line 414-417
Importantly, this study further illustrates… ML models [16,22]. Line 418-419
The findings indicate that NN… symptomatic and lifestyle data. Line 419-422
9. Include the ROC curve for the best-performing machine learning model to support the evaluation metrics.
Ans. Thank you for your suggestion. I have included the ROC curve for the KNN model, which can be found on Line 325.
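For readers checking how an ROC curve for KNN is obtained, the sketch below scores each test case by the fraction of its K = 7 nearest training neighbors (Euclidean distance) that are positive. It then sweeps a threshold over those scores to obtain (false positive rate, true positive rate) points and integrates them with the trapezoidal rule to get the AUC. It is a generic illustration rather than Weka's IBk output; the distance metric, the scoring rule, and the toy data are assumptions.

```python
import math


def knn_score(train_x, train_y, query, k=7):
    """Fraction of the k nearest training points (Euclidean distance) labeled positive."""
    neighbors = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))[:k]
    return sum(y for _, y in neighbors) / len(neighbors)


def roc_points(labels, scores):
    """(FPR, TPR) pairs obtained by sweeping a threshold over every distinct score."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical toy data: two clusters described by two features.
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5), (6, 6), (10, 10)]
train_y = [0, 0, 0, 1, 1, 1, 1, 1]
test_x = [(0.5, 0.5), (5.5, 5.5), (3, 3), (9, 9)]
test_y = [0, 1, 0, 1]
scores = [knn_score(train_x, train_y, q) for q in test_x]
print(f"AUC = {auc(roc_points(test_y, scores)):.2f}")  # AUC = 1.00
```

Plotting the returned points with the false positive rate on the x-axis and the true positive rate on the y-axis gives the ROC curve; ties in the scores produce diagonal segments, which the trapezoidal rule credits with half the area.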