Article
Peer-Review Record

Explainable Mortality Prediction Model for Congestive Heart Failure with Nature-Based Feature Selection Method

Appl. Sci. 2023, 13(10), 6138; https://doi.org/10.3390/app13106138
by Nusrat Tasnim 1,2, Shamim Al Mamun 1,3,*, Mohammad Shahidul Islam 1, M. Shamim Kaiser 1,3 and Mufti Mahmud 3,4
Reviewer 1: Anonymous
Reviewer 2:
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Submission received: 13 February 2023 / Revised: 28 March 2023 / Accepted: 18 April 2023 / Published: 17 May 2023

Round 1

Reviewer 1 Report

Review MDPI Applied Sciences

Title: Explainable Mortality Prediction Model for Congestive Cardiac Failure with Nature-Based Feature Selection Method

 

The authors explored the combination of ML models with nature-based feature selection to predict mortality in ICU for heart failure patients. The predictions from 4 ML models and 3 nature-based algorithms were studied using accuracy and AUROC as metrics. Moreover, feature importance was studied using SHAP. In the reviewer’s opinion, this manuscript should be reconsidered after major revisions for reasons that will follow, but mainly because there is no clear contribution to the field.

 

In section 2.5 the authors describe the research gaps. However, when reading the paragraph, no clear gaps are identified. For instance, the first gap is described as nature-based algorithms being less used in mortality prediction. This means that some people have studied them and, therefore, it is not a gap, unless clarification can be added. The second gap is not a gap since that gap, in theory, has been filled by the other ML studies. Finally, the third gap, again, is not a gap. The use of the word “fewer” means that some work has presented explainable AI in the context of mortality prediction. Again, this needs more clarification to be considered a gap in the literature.

 

Back on page 2, the authors state their contributions with bullet points. Clearly, study and comparison with previous work is not a contribution; this is just how research should be conducted. As for the other bullet points, it is not clear to the reviewer whether these are actual contributions. They sound like a general approach to a classification problem (i.e., training and prediction, results analysis, and discussion of the model).

 

Section Background Studies does not provide a lot of insights in the field. Although I don’t have specific comments, I would like to mention that it reads like a list of papers. Table 1 makes a very useful summary of the field, but the text should take the reader by the hand and show why your work is important.

 

In the Dataset Description, it should be mentioned how balanced the dataset is. Moreover, some information regarding the delay between the measured features and death should be present.

 

Page 8 should be removed. It is not required to describe how to compute the mean (and all the other metrics) to the scientific community.

 

Can you please describe why the independent column "outcome" was imputed with the column’s most frequent value (page 11, 2nd paragraph)? I am OK with doing that for the inputs to the model, but one can argue that if you don’t have the output, the subject should be discarded.

 

In Section 3.4, there is no need to describe the nature-based algorithms, since you did not develop them. Just provide a general understanding to the reader and a reference to the original papers. The same applies to Section 3.5 (ML algorithms).

 

Please describe the training and testing procedure for your algorithms (cross-validation, hyperparameter tuning, etc.).

 

Table 4 and Figure 5 duplicate the information. Please remove one of the two. If you keep the figure, make it with better resolution; I would also suggest not using Excel.

 

On page 17, third paragraph, you mentioned the common features of the different nature-based algorithms. Surprisingly, the blood sodium, which is the most important feature according to SHAP, is not a common feature. Can you clarify why that is?

 

Please, remove Table 6, since all the information is available in Figure 6.

 

In Figure 6, the (a) and (c) are underneath the graphs. Also, make the figure font size consistent. Moreover, the (b) and (d) graphs have grey under them.

 

On page 20, second paragraph, you say the color represents the impact. Should it represent the value of the feature? Please clarify.

 

On page 22, first line, you say “… the prediction more likely…”. Please clarify more likely to what?

 

Please clarify what interpretations 1, 2, and 3 are. Are they different subjects?

 

In the title, change “Cardiac Failure” to “Heart Failure” to be consistent with the text.

Comments for author File: Comments.pdf

Author Response

Dear Reviewer,

Thank you for your careful reading and detailed comments on our manuscript. We have added a point-by-point response to your comments in the revised version of our manuscript. We feel that, after incorporating the changes you suggested, the manuscript quality has greatly improved. We have highlighted all the changes we made in blue.

 

Comments and Response:

 

The authors explored the combination of ML models with nature-based feature selection to predict mortality in ICU for heart failure patients. The predictions from 4 ML models and 3 nature-based algorithms were studied using accuracy and AUROC as metrics. Moreover, feature importance was studied using SHAP. In the reviewer’s opinion, this manuscript should be reconsidered after major revisions for reasons that will follow, but mainly because there is no clear contribution to the field.

  1. In section 2.5 the authors describe the research gaps. However, when reading the paragraph, no clear gaps are identified. For instance, the first gap is described as nature-based algorithms being less used in mortality prediction. This means that some people have studied them and, therefore, it is not a gap, unless clarification can be added. The second gap is not a gap since that gap, in theory, has been filled by other ML studies. Finally, the third gap, again, is not a gap. The use of the word “fewer” means that some work has presented explainable AI in the context of mortality prediction. Again, this needs more clarification to be considered a gap in the literature.

Author response: 

We are sorry that the research gaps were not stated clearly in the manuscript. The research gaps section has now been rewritten and included in the introduction section as follows:

 

“Background research in the field of ICU mortality prediction demonstrates that there is still room for improvement and development by carrying out comparative studies of feature selection using nature-based algorithms vs. no feature selection. Likewise, using Explainable AI to determine which features predominate in the prediction process for patients with heart failure can be an extension to make the models more transparent and trustworthy. We conducted our research using these points as research gaps.”

 

  1.   Back on page 2, the authors state their contributions with bullet points. Clearly, study and comparison with previous work is not a contribution; this is just how research should be conducted. As for the other bullet points, it is not clear to the reviewer whether these are actual contributions. They sound like a general approach to a classification problem (i.e., training and prediction, results analysis, and discussion of the model).

Author response: 

We regret that our contributions were not clearly written. According to this suggestion, the contributions were rewritten in the manuscript as follows:

  • This work broadens the field of mortality prediction in the ICU for heart failure patients by examining the impact of several nature-based feature selection methods on the prediction.
  • An assessment of the role of features in the prediction process was analyzed by SHAP in this study. Thus, it provides insight into the determinant feature which decided mortality in ICU.

 

  1.   Section Background Studies does not provide a lot of insights in the field. Although I don’t have specific comments, I would like to mention that it reads like a list of papers. Table 1 makes a very useful summary of the field, but the text should take the reader by the hand and show why your work is important.

Author response: 

We regret that the background studies section did not offer many new perspectives on the subject. Additional summary remarks have been added to this section, after the discussion of the related research articles, to demonstrate the significance of our work and to highlight the areas where more research may be done to further grow and enhance this field.

  1.   In the Dataset Description, it should be mentioned how balanced the dataset is. Moreover, some information regarding the delay between the measured features and death should be present.

Author response: 

We regret that we did not mention this issue in the manuscript. The dataset is not balanced; the data balancing was done using the SMOTE technique. This is now added in the pre-processing section as follows:

 

“It was also found that the dataset was imbalanced: there were 1017 records for output '0' (alive) and 159 records for class '1' (death). The Synthetic Minority Oversampling Technique (SMOTE) was used in this study to handle the imbalanced data. SMOTE is a statistical technique that increases the number of minority-class cases in a dataset in a balanced manner by creating new instances from existing minority cases.”
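For illustration, the oversampling step can be sketched in plain NumPy (our own minimal version with random stand-in data, not the study's code; in practice one would typically use the imbalanced-learn library's SMOTE implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def smote_oversample(X_min, n_new, k=5):
    """Minimal SMOTE sketch: interpolate between a minority sample and one
    of its k nearest minority-class neighbours (illustrative only)."""
    new_points = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        # distances from sample i to every minority sample
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbours = np.argsort(d)[1:k + 1]        # skip the sample itself
        j = rng.choice(neighbours)
        gap = rng.random()                          # interpolation factor in [0, 1)
        new_points.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(new_points)

# 159 death records vs. 1017 survivors: synthesise 1017 - 159 = 858 new rows
X_min = rng.normal(size=(159, 4))                   # stand-in for minority features
X_new = smote_oversample(X_min, 1017 - 159)
print(X_new.shape)
```

Each synthetic row lies on the line segment between a real minority sample and one of its neighbours, which is what keeps the oversampling "balanced" rather than simply duplicating records.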

 

After balancing the data, we observed changes in the performances of the models which have been modified in the manuscript.

  1.   Page 8 should be removed. It is not required to describe how to compute the mean (and all the other metrics) to the scientific community.

Author response: 

Thank you for this suggestion. According to this comment, the computing procedure of mean and all other metrics are removed.

  1.   Can you please describe why the independent column "outcome" was imputed with the column’s most frequent value (page 11, 2nd paragraph)? I am OK with doing that for the inputs to the model, but one can argue that if you don’t have the output, the subject should be discarded.

Author response: 

Yes, we agree that discarding the whole row can also be an option for handling null values. But it may create a loss of data and information. That’s why it was decided to impute independent columns with the most frequent values.

  1.   In Section 3.4, there is no need to describe the nature-based algorithms, since you did not develop them. Just provide a general understanding to the reader and a reference to the original papers. The same applies to Section 3.5 (ML algorithms).

Author response: 

Thank you for this suggestion. According to this suggestion, sections 3.4 and 3.5 now give only a general understanding of the utilized algorithms; all the detailed information has been removed from these sections.
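As a toy illustration of how such a nature-based wrapper method works (our own sketch with synthetic data and hypothetical settings, not the paper's implementation), a genetic algorithm can evolve binary feature masks scored by cross-validated accuracy:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data: features 0 and 3 carry the signal
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 8))
y = (X[:, 0] - X[:, 3] + 0.3 * rng.normal(size=200) > 0).astype(int)

def fitness(mask):
    """Fitness of a binary feature mask = mean 5-fold CV accuracy."""
    if not mask.any():
        return 0.0
    return cross_val_score(LogisticRegression(max_iter=200), X[:, mask], y, cv=5).mean()

pop = rng.integers(0, 2, size=(10, 8)).astype(bool)   # initial random masks
for _ in range(5):                                     # a few generations
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[-5:]]             # keep the best half
    children = []
    for _ in range(5):
        a, b = parents[rng.integers(5)], parents[rng.integers(5)]
        cut = rng.integers(1, 8)
        child = np.concatenate([a[:cut], b[cut:]])     # one-point crossover
        child = child ^ (rng.random(8) < 0.1)          # bit-flip mutation
        children.append(child)
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(m) for m in pop])]
print(best)
```

FPA and PSO differ in how candidate masks are updated, but they share this wrapper structure: candidate feature subsets are scored by a classifier and the search heuristic decides which subsets to try next.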

  1. Please describe the training and testing procedure for your algorithms (cross-validation, hyperparameter tuning, etc.).

Author response: 

We utilized 5-fold cross-validation in the training and testing procedure. This is now mentioned in the result analysis section as follows:

“In this study, a 5-fold cross-validation process was used for training and testing. The final accuracy is determined by averaging the accuracy results from each fold.”
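The 5-fold procedure can be sketched with scikit-learn (synthetic stand-in data rather than the MIMIC-III features; the classifier and data sizes here are illustrative assumptions):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic data standing in for the 1176-record dataset
rng = np.random.default_rng(42)
X = rng.normal(size=(1176, 10))
y = (X[:, 0] + 0.5 * rng.normal(size=1176) > 0).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print(scores, scores.mean())   # final accuracy = mean over the 5 folds
```

`cross_val_score` with `cv=5` trains on four folds and tests on the held-out fifth, rotating through all five splits; the reported accuracy is the mean of the five fold scores, as described above.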



 

  1.   Table 4 and Figure 5 duplicate the information. Please remove one of the two. If you keep the figure, make it with better resolution; I would also suggest not using Excel.

Author response: 

Thank you for this suggestion. According to the suggestion, Figure 5 is removed from the manuscript, as Table 4 and Figure 5 contained duplicate information.

 

  1. On page 17, third paragraph, you mentioned the common features of the different nature-based algorithms. Surprisingly, blood sodium, which is the most important feature according to SHAP, is not a common feature. Can you clarify why that is?

Author response: 

We acknowledge that the feature Blood Sodium was not selected in common by all the utilized algorithms; to our knowledge, there is no specific reason behind this. However, this feature was selected by both FPA and SHAP, and FPA works best among the three utilized algorithms.

  1. Please, remove Table 6, since all the information is available in Figure 6.

Author response: 

Thank you for this suggestion. According to this suggestion, Table 6 is removed.

  1. In Figure 6, the (a) and (c) are underneath the graphs. Also, make the figure font size consistent. Moreover, the (b) and (d) graphs have grey under them.

Author response: 

We are really sorry for this mistake. According to this suggestion, the subfigures of Figure 6 are re-oriented so that no caption is hidden underneath any figure. The grey under the (b) and (d) graphs has been removed.

  1. On page 20, second paragraph, you say the color represents the impact. Should it represent the value of the feature? Please clarify.

Author response: 

Our apologies for the unclear description of SHAP. Yes, the color represents the value of a feature: red indicates a higher feature value, while blue denotes a lower one.

This paragraph is rewritten in the manuscript clearly.

 

  1. On page 22, first line, you say “… the prediction more likely…”. Please clarify more likely to what?

Author response: 

Our apologies again for the incomplete line in the manuscript. This line is now rewritten in the manuscript as follows:

 

“Features that push the prediction toward the positive outcome (death, in this study) are displayed in red, while those that push it toward the negative outcome (survival) are displayed in blue.”
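The red/blue attribution idea can be illustrated with an exact Shapley computation on a toy linear risk score (our own sketch with hypothetical weights, not the study's model): positive attributions push the prediction toward death, negative ones toward survival.

```python
import itertools
import math
import numpy as np

# Hypothetical linear risk score over three illustrative features
weights = np.array([0.8, -0.5, 0.3])     # made-up model coefficients
background = np.zeros(3)                 # baseline feature values
x = np.array([1.0, 2.0, -1.0])           # instance being explained

def f(z):
    return float(weights @ z)

def shapley(i):
    """Exact Shapley value of feature i by enumerating all subsets."""
    n = len(x)
    others = [j for j in range(n) if j != i]
    total = 0.0
    for r in range(n):
        for subset in itertools.combinations(others, r):
            z = background.copy()
            z[list(subset)] = x[list(subset)]
            without_i = f(z)
            z[i] = x[i]
            with_i = f(z)
            w = math.factorial(r) * math.factorial(n - r - 1) / math.factorial(n)
            total += w * (with_i - without_i)
    return total

phi = [shapley(i) for i in range(3)]
print(phi)   # for a linear model this equals w_i * (x_i - background_i)
```

Here the first feature gets a positive attribution (it would appear red, pushing toward death) and the other two get negative ones (blue, pushing toward survival); the attributions always sum to the difference between the model's output for this instance and the baseline output.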

 

  1. Please clarify what interpretations 1, 2, and 3 are. Are they different subjects?

Author response: 

Yes, they are different instances: three different rows taken from the utilized dataset to observe how SHAP explains their predicted outcomes.

  1. In the title, change “Cardiac Failure” to “Heart Failure” to be consistent with the text.

Author response: 

Thank you for this suggestion. According to this comment, “Cardiac Failure” is changed to “Heart Failure” in the title.

Reviewer 2 Report

Dear Colleagues,

a paper of the form "with feature selection (A) the algorithm performs better than (B) without feature selection" calls for statistical proof (testing) involving means and variances of a NUMBER of trials. A statistical test should then be used to show that the distribution of (A) differs from the distribution of (B).

Your results (see Table 4 and Figure 5) do not contain several trials, and thus your paper is inconclusive...

Author Response

Dear Reviewer,

Thank you for your careful reading and thoughtful comments on our manuscript. We have added a point-by-point response to your comments in the revised version of our manuscript. We feel that, after incorporating the changes you suggested, the manuscript quality has greatly improved. We have highlighted all the changes we made in blue.

 

Comments and Response:

Dear Colleagues,

a paper of the form "with feature selection (A) the algorithm performs better than (B) without feature selection" calls for statistical proof (testing) involving means and variances of a NUMBER of trials. A statistical test should then be used to show that the distribution of (A) differs from the distribution of (B). Your results (see Table 4 and Figure 5) do not contain several trials, and thus your paper is inconclusive...

 

Author response: 

Thank you for the suggestion. In this study, 5-fold cross-validation was used, as recommended. The accuracy shown in the paper is the average over the five folds. The details of the trials and the mean accuracy with variance are shown in the following table:

(Here, LR = Logistic Regression, DT = Decision Tree, RF = Random Forest, GB = Gradient Boosting, FPA = Flower Pollination Algorithm, PSO = Particle Swarm Optimization, GA = Genetic Algorithm.)








Model     Fold 1      Fold 2      Fold 3      Fold 4      Fold 5      Avg. Accuracy   Variance
LR        0.70877193  0.66315789  0.69366197  0.72183099  0.70774648  69.9%           0.001
DT        0.76140351  0.83508772  0.84859155  0.86267606  0.81690141  82.5%           0.039
RF        0.82456140  0.90877193  0.90845070  0.95070423  0.95070423  90.9%           0.051
GB        0.81403509  0.94035088  0.90140845  0.95774648  0.95422535  91.0%           0.004
LR_FPA    0.70877193  0.69473684  0.69122807  0.74736842  0.73591549  71.6%           0.025
DT_FPA    0.68421053  0.89473684  0.87368421  0.87368421  0.91197183  84.8%           0.093
RF_FPA    0.79298246  0.96140351  0.94385965  0.97192982  0.97183099  92.8%           0.077
GB_FPA    0.73684211  0.94736842  0.93684211  0.97894737  0.95422535  91.1%           0.010
LR_PSO    0.63157895  0.60350877  0.58245614  0.49473684  0.55281690  57.3%           0.052
DT_PSO    0.78596491  0.83859649  0.81754386  0.83508772  0.81338028  81.8%           0.021
RF_PSO    0.80350877  0.97543860  0.91228070  0.91929825  0.91901408  90.6%           0.063
GB_PSO    0.76842105  0.84912281  0.84561404  0.87017544  0.86267606  83.9%           0.002
LR_GA     0.70526316  0.65964912  0.71929825  0.70175439  0.66197183  69.0%           0.027
DT_GA     0.75438596  0.73333333  0.73684211  0.77192982  0.75704225  75.1%           0.016
RF_GA     0.83508772  0.95087719  0.94736842  0.94736842  0.94014085  92.4%           0.050
GB_GA     0.79649123  0.82807018  0.81754386  0.80701754  0.81338028  81.3%           0.000



To address the problem of selecting the best model, statistical significance tests are used. The Friedman test was used in this study to compare the machine learning classifiers; its results are shown in the following table. The p-value is 0.0112, which is less than the significance level of 0.05, so the null hypothesis is rejected, indicating that the classifiers do not perform the same.



Friedman Test Result

Degree of Freedom   Chi-Square   P-Value
3                   11.1         0.0112
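As an illustrative check, the Friedman statistic can be recomputed from the per-fold accuracies with plain NumPy (our own sketch; the value depends on which models and folds are included, so it need not match the reported 11.1):

```python
import numpy as np

# Per-fold accuracies of the four base classifiers (from the table above)
acc = np.array([
    [0.70877193, 0.66315789, 0.69366197, 0.72183099, 0.70774648],  # LR
    [0.76140351, 0.83508772, 0.84859155, 0.86267606, 0.81690141],  # DT
    [0.82456140, 0.90877193, 0.90845070, 0.95070423, 0.95070423],  # RF
    [0.81403509, 0.94035088, 0.90140845, 0.95774648, 0.95422535],  # GB
])
k, n = acc.shape                      # k = 4 classifiers, n = 5 folds

# Rank the classifiers within each fold (1 = lowest accuracy; no ties here)
ranks = acc.argsort(axis=0).argsort(axis=0) + 1
rank_sums = ranks.sum(axis=1)

# Friedman statistic: chi2 = 12/(n*k*(k+1)) * sum(R_j^2) - 3*n*(k+1), df = k-1
chi2 = 12.0 / (n * k * (k + 1)) * float((rank_sums ** 2).sum()) - 3 * n * (k + 1)
print(rank_sums, chi2)
```

With these four base classifiers the rank sums come out as 5, 10, 17, and 18 (LR, DT, RF, GB), so LR is consistently the weakest across all five folds.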



Summing the ranks for the models with and without each feature selection algorithm yields the following table, which also shows that the FPA feature selection algorithm performs better than the others with the classifiers utilized in this study:

 

 

Rank   Feature Selection (with the classifiers)   Sum of Ranks
1      FPA                                        16
2      No Feature Selection                       11
3      GA                                         7
4      PSO                                        6

 

Reviewer 3 Report

Despite the authors' emphasis on the explainable characteristics of their study, the manuscript devotes extensive effort to the nature-based feature selection approaches. I think the final part, which is devoted to SHAP, can be interesting for potential readers.

The manuscript is well written. It somewhat suffers from a lack of novelty in methodology. Although we consider the nature-based algorithms (e.g., GA, PSO, ...) to belong to the conventional evolutionary optimization category, their application in mortality prediction can be an acceptable aspect of this manuscript.

 

Author Response

Dear Reviewer,

Thank you for your careful reading and positive comments on our manuscript. We have added a point-by-point response to your comments in the revised version of our manuscript. We feel that, after incorporating the changes you suggested, the manuscript quality has greatly improved. We have highlighted all the changes we made in blue.

 

Comments and Response:

Despite the authors' emphasis on the explainable characteristics of their study, the manuscript devotes extensive effort to the nature-based feature selection approaches. I think the final part, which is devoted to SHAP, can be interesting for potential readers.

The manuscript is well written. It somewhat suffers from a lack of novelty in methodology. Although we consider the nature-based algorithms (e.g., GA, PSO, ...) to belong to the conventional evolutionary optimization category, their application in mortality prediction can be an acceptable aspect of this manuscript.

 

Author response: 

We are thankful for your encouraging words. According to this suggestion, more information has been added to the section on SHAP to make it more interesting to potential readers.

 

Regarding the concern about the lack of novelty in the methodology, we would like to restate the research gap that this study tried to fill:

Research Gaps:

Background research in the field of ICU mortality prediction demonstrates that there is still room for improvement and development by carrying out comparative studies of feature selection using nature-based algorithms vs. no feature selection. Likewise, using Explainable AI to determine which features predominate in the prediction process for patients with heart failure can be an extension to make the models more transparent and trustworthy. We conducted our research using these points as research gaps.

 

And the contribution of this study is the following:

  • This work broadens the field of mortality prediction in the ICU for heart failure patients by examining the impact of several nature-based feature selection methods on the prediction.
  • An assessment of the role of features in the prediction process was analyzed by SHAP in this study. Thus, it provides insight into the determinant feature which decided mortality in ICU.

We hope that these contributions will make this field more enriched and developed than before.

Reviewer 4 Report

Dear Authors,

Thanks for your effort in the related study.

The study focuses on mortality prediction with different ML models and it also has feature engineering operations. Besides all those, the authors also suggest an explainable artificial intelligence method based on the SHAP plot method. The dataset is an open-sourced dataset called MIMIC-III which has high-volume samples.

Please see the other comments listed:

1. Authors are suggested to provide a general problem definition which should be included in the abstract section. Why is the explainable AI needed? Why do they use SHAP? 

2. The introduction section begins with the abbreviation ICU and continues with other abbreviations. Although the abbreviations are explained in the abstract section, the authors should provide the explanations again in the introduction. Please treat the abbreviations as reset from the introduction onward; the abstract section stands separately.

3. Please move the "research gaps" to the front. Nowadays, academia moves very fast and readers do not have much time. Readers should see the gaps and problem definition in the introduction; most readers will not keep reading until section 2.5.

4. Please redesign the contribution list. Item 1 (study the previous works and analyze them) is not a contribution. Item 4 is also not a contribution. It just addresses the comparing process. 

5. Please cancel the italic paragraph at the end of subsection 2.1.

After all, the study is well presented. Although the study does not contain any novel structure, it includes deep analyses of feature engineering and it provides a different perspective for the related MIMIC-III dataset. 

Best regards.

Author Response

Dear Reviewer,

Thank you for your positive and thoughtful comments on our manuscript. We have added a point-by-point response to your comments in the revised version of our manuscript. We feel that, after incorporating the changes you suggested, the manuscript quality has greatly improved. We have highlighted all the changes we made in blue.

 

Comments and Response:

 

Dear Authors,

Thanks for your effort in the related study. The study focuses on mortality prediction with different ML models and it also has feature engineering operations. Besides all those, the authors also suggest an explainable artificial intelligence method based on the SHAP plot method. The dataset is an open-sourced dataset called MIMIC-III which has high-volume samples.

Please see the other comments listed:

  1. Authors are suggested to provide a general problem definition which should be included in the abstract section. Why is the explainable AI needed? Why do they use SHAP? 

Author response:

Thank you for this suggestion. According to the suggestion, three points are added to the abstract section, as follows:

Problem Definition: The entire world witnessed a severe ICU patient capacity crisis a few years ago during COVID-19. The different Machine Learning (ML) models widely utilized in this research field can perform poorly due to a lack of proper feature selection. Although nature-based algorithms perform well for feature selection in other sectors, no comparative study of the performance of nature-based algorithms for feature selection has been found in the ICU mortality prediction field.

 

Why Explainable AI is needed: Explainable AI focuses on establishing transparency and traceability for statistical black-box machine learning techniques. Explainable AI is essential in the medical industry to foster public confidence and trust in machine learning model predictions.

Why do we use SHAP? SHAP is used in this study because it offers mathematical guarantees for the accuracy and consistency of explanations. It is trustworthy and suitable for both local and global explanations.

 

  1. The introduction section begins with the abbreviation ICU and continues with other abbreviations. Although the abbreviations are explained in the abstract section, the authors should provide the explanations again in the introduction. Please treat the abbreviations as reset from the introduction onward; the abstract section stands separately.

Author response:

We are sorry for this mistake. According to this suggestion, explanations for the abbreviations are given again in the introduction section of the revised manuscript.

  1.   Please move the "research gaps" to the front. Nowadays, academia moves very fast and readers do not have much time. Readers should see the gaps and problem definition in the introduction; most readers will not keep reading until section 2.5.

Author response:

Thank you for this suggestion. The section written on “research gaps” is now included in the introduction section. Section 2.5 is deleted from the manuscript.

  1. Please redesign the contribution list. Item 1 (study the previous works and analyze them) is not a contribution. Item 4 is also not a contribution. It just addresses the comparing process. 

Author response:

We regret that our contributions were not clearly written. According to this suggestion, the contributions were rewritten in the manuscript as follows:

  • This work broadens the field of mortality prediction in the ICU for heart failure patients by examining the impact of several nature-based feature selection methods on the prediction.
  •  An assessment of the role of features in the prediction process was analyzed by SHAP in this study. Thus, it provides insight into the determinant feature which decided mortality in ICU.

 

  1. Please cancel the italic paragraph at the end of subsection 2.1.

Author response: 

Thank you for this suggestion. According to this comment, the italic paragraph at the end of subsection 2.1 is canceled.

After all, the study is well presented. Although the study does not contain any novel structure, it includes deep analyses of feature engineering and it provides a different perspective for the related MIMIC-III dataset.

Author response: Thank you so much for your positive words.

Round 2

Reviewer 1 Report

My comments have been addressed.

Reviewer 3 Report

The authors made a sufficient effort for resolving the criticisms raised by the reviewers. I have no further comments.
