Review
Peer-Review Record

Meta-Analysis and Systematic Review of the Application of Machine Learning Classifiers in Biomedical Applications of Infrared Thermography

Appl. Sci. 2021, 11(2), 842; https://doi.org/10.3390/app11020842
by Carolina Magalhaes 1,2, Joaquim Mendes 1,2 and Ricardo Vardasca 1,2,3,4,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 15 December 2020 / Revised: 13 January 2021 / Accepted: 14 January 2021 / Published: 18 January 2021
(This article belongs to the Section Applied Biosciences and Bioengineering)

Round 1

Reviewer 1 Report

This article is a review focused on machine learning applied to medical applications of infrared thermography.

The objective of the article is fully satisfied.

However, I also believe that the description of the method, the structure of the manuscript, and the experimental analysis can be improved. The authors used PRISMA for their review, and their sources of information are well described.

The raw results are analyzed quantitatively and qualitatively. The qualitative categorization is in 4 items, the last being a mixture of several subjects. Forest plots 2-4 are interesting but questionable; they correspond to a statistical series of results obtained from unrelated studies (same databases? same tools? same biases?). I believe that these results are hardly comparable, even with specificity/precision ratios.
This is confirmed on line 229: "test of heterogeneity (...), suggesting a strong heterogeneity between the studies" and on line 277: "From the visual verification of the metastatistics results, it is possible to conclude that there is some heterogeneity between studies", which is contradicted some lines later, e.g., line 287: "The low χ2 value attained for diabetic foot studies, with a high p-value, reveals that heterogeneity is insignificant."
This is a very confusing message as a conclusion, and it is necessary that you clarify the take-home message.

More importantly, it is not explained how the forest plots are constructed: "R was used with the 'meta' package to perform univariate analysis and recover the sensitivity (SN), specificity (SP) and log of the diagnostic odds ratio (DOR) forest plots". This is puzzling because, going through a number of the articles, there are classification results but not only: sometimes m-ary classification, sometimes binary classification. So it is not clear to me how the authors merge heterogeneous results into a so-called meta-analysis. I'm sure they identify it, but the merging process is hazy.


The authors cite themselves 4 times, which seems inappropriate. Worse yet, omitting these references is a mistake:
B.B. Lahiri, S. Bagavathiappan, T. Jayakumar, J. Philip, Medical applications of infrared thermography: a review, Infrared Physics & Technology, Volume 55, Issue 4, 2012
and
N. Jasti, S. Bista, H. Bhargav, S. Sinha, S. Gupta, S. Chaturvedi, B.N. Gangadhar (2019). Medical applications of infrared thermography: a narrative review.

Their reviews cover a richer field of applications (from the detection of cancer to sexual psychophysiology), so it is not clear to me why PRISMA limits your selection so much...


Misspellings

- page 6/21: "to trace the summary operating characteristic of the receiver"
- What do you mean by "learners" on line 262 in "based on these learners"? Do you mean models?

In the discussion section, the conclusion "a clear preference for the use of ANN (19 studies), SVM (17 studies) and k-NN" is probably biased by the selection of articles (the majority from 2014 and before).

The authors cite themselves 4 times, which seems inappropriate.

Author Response

We would like to thank the reviewer for his/her interest and time in assessing our manuscript. We greatly appreciated the thoughtful comments and suggestions made to improve the manuscript. Based on them, we revised our manuscript to improve its content and quality.

Below we encompass the responses to the raised issues.

If there are any further changes or clarifications needed, please let us know. We will be happy to address them.

The raw results are analyzed quantitatively and qualitatively. The qualitative categorization is in 4 items, the last being a mixture of several subjects. Forest plots 2-4 are interesting but questionable; they correspond to a statistical series of results obtained from unrelated studies (same databases? same tools? same biases?). I believe that these results are hardly comparable, even with specificity/precision ratios.

This is confirmed on line 229: "test of heterogeneity (...), suggesting a strong heterogeneity between the studies" and on line 277: "From the visual verification of the metastatistics results, it is possible to conclude that there is some heterogeneity between studies", which is contradicted some lines later, e.g., line 287: "The low χ2 value attained for diabetic foot studies, with a high p-value, reveals that heterogeneity is insignificant."

This is a very confusing message as a conclusion, and it is necessary that you clarify the take-home message.

Answer:

The comment made by the reviewer is valid and we understand the confusion. It is true that the records included in forest plots 2-4 are a mix of several studies. However, all studies refer to the use of machine learning classifiers for biomedical applications with infrared thermal imaging, which is the same assessment method for all. Thus, we believed it would be of interest to plot them all together to provide a better visual assessment of the distribution of the metrics of SN, SP and diagnostic odds ratios among studies. To clarify this, we have added this information in line 228: “The plots were constructed with the information retrieved from all studies included in the quantitative analyses process, independently of the studied pathology. Thus, a better visual assessment of the distribution of the different metrics among studies is possible.”

In the sentences of lines 229 and 277 we are referring to the comparison of all records included in the quantitative analyses. Hence the explanation given in lines 282-283: “This variability may partially be caused by the comparison of studies that implement different learners for the classification of distinct pathologies”. When we refer to the low heterogeneity at line 287 we are referring only to diabetic foot applications, as is mentioned: "The low χ2 value attained for diabetic foot studies, with a high p-value, reveals that heterogeneity is insignificant." We have added a reference to Appendix B, where the information on χ2 and p-value is included for all studies as a group and independently for the breast cancer and diabetic foot papers, in the hope of aiding the understanding of this conclusion.
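For illustration, the kind of univariate analysis and forest plot described in the Methods can be reproduced along the following lines with the R 'meta' package. This is a minimal sketch only: the study labels, counts, and pathology subgroups are hypothetical placeholders, not data extracted for the review, and a recent version of 'meta' (with the subgroup argument) is assumed.

```r
# Minimal sketch of a univariate proportion meta-analysis with the R 'meta'
# package. All study labels and counts below are hypothetical placeholders.
library(meta)

dat <- data.frame(
  study     = c("Study A", "Study B", "Study C", "Study D"),
  tp        = c(45, 80, 30, 60),    # true positives
  diseased  = c(50, 90, 40, 70),    # subjects with the condition
  pathology = c("breast cancer", "breast cancer",
                "diabetic foot", "diabetic foot")
)

# Pooled sensitivity = tp / diseased on the logit scale; the same call
# pattern applies to specificity and to the (log) diagnostic odds ratio.
m <- metaprop(event = tp, n = diseased, studlab = study,
              data = dat, sm = "PLOGIT", subgroup = pathology)

# Cochran's Q (the chi-square heterogeneity test), its p-value and I^2 are
# reported both overall and within each subgroup, which is how heterogeneity
# can be strong across all studies yet insignificant within one pathology.
summary(m)

# Forest plot of the per-study and pooled estimates.
forest(m)
```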

More importantly, it is not explained how the forest plots are constructed: "R was used with the 'meta' package to perform univariate analysis and recover the sensitivity (SN), specificity (SP) and log of the diagnostic odds ratio (DOR) forest plots". This is puzzling because, going through a number of the articles, there are classification results but not only: sometimes m-ary classification, sometimes binary classification. So it is not clear to me how the authors merge heterogeneous results into a so-called meta-analysis. I'm sure they identify it, but the merging process is hazy.

The authors cite themselves 4 times, which seems inappropriate.

Answer:

We understand the reviewer's concern about the number of self-citations and agree that it may seem inappropriate. However, citations 54 and 55 refer to research focused on diabetic foot disease, while 58 and 59 focus on skin cancer related works; within each application, some concern static IRT and others dynamic assessment. All include the implementation of machine learning classifiers on data collected from IRT images. Hence, we consider that they supply meaningful information to the present review and should be kept in the submitted manuscript.

Worse yet, omitting these references is a mistake: B.B. Lahiri, S. Bagavathiappan, T. Jayakumar, J. Philip, Medical applications of infrared thermography: a review, Infrared Physics & Technology, Volume 55, Issue 4, 2012 and N. Jasti, S. Bista, H. Bhargav, S. Sinha, S. Gupta, S. Chaturvedi, B.N. Gangadhar (2019). Medical applications of infrared thermography: a narrative review.

Their reviews cover a richer field of applications (from the detection of cancer to sexual psychophysiology), so it is not clear to me why PRISMA limits your selection so much...

Answer:

We agree with the reviewer that the mentioned reviews contain very meaningful information regarding the application of medical infrared imaging. However, they do not cover articles focused on the implementation of this imaging technique together with machine learning methods for diagnosis purposes, which is the main focus of the presented review. Furthermore, we chose to include an exclusion criterion, after our bibliographic search, that removes review articles, i.e., the fourth criterion, mentioned in line 90: “Reviews (…) were eliminated, (…), making the fourth (…) criterion.”. This decision was based on a preference for including only primary sources, avoiding an excessively extensive data extraction process caused by repetitive/misleading information for the present review.

Misspellings

- page 6/21: "to trace the summary operating characteristic of the receiver"

Answer:

On the indicated page our text is: “to plot the summary receiver operating characteristic (SROC) curve…”, which seems more adequate than the phrasing suggested by the reviewer, although we appreciate his/her suggestion.
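For completeness, an SROC curve of this kind is usually traced from a bivariate model of sensitivity and false positive rate. The sketch below uses the R 'mada' package, which is an assumption on our side (the manuscript itself names only the 'meta' package), and the 2x2 counts per study are hypothetical.

```r
# Hedged sketch of a summary ROC (SROC) curve using the R 'mada' package;
# 'mada' is assumed here, and the per-study 2x2 counts are hypothetical.
library(mada)

dat <- data.frame(
  TP = c(45, 80, 30, 60),  FN = c(5, 10, 10, 10),
  FP = c(8, 12, 6, 9),     TN = c(42, 78, 34, 61)
)

fit <- reitsma(dat)            # Reitsma bivariate random-effects model
summary(fit)

plot(fit, sroclwd = 2,         # SROC curve with the summary estimate
     main = "Summary ROC curve")
points(fpr(dat), sens(dat))    # individual study operating points
```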

 

- What do you mean by “learners” on line 262 in “based on these learners”? Do you mean models?

Answer:

This question is very pertinent and well received. We use the term 'learner' to refer to the machine learning algorithm that learns from the training data and produces the model that then makes the predictions. Considering that it is the primary algorithm used by researchers when trying to make predictions, we thought the term 'learners' would be better suited for that particular sentence.

In the discussion section, the conclusion "a clear preference for the use of ANN (19 studies), SVM (17 studies) and k-NN" is probably biased by the selection of articles (the majority from 2014 and before).

Answer:

Thank you for pointing that out; it made us understand that additional information should be included in the manuscript to ease comprehension of the conclusion presented in line 258.

We went through Appendix A to reach the presented conclusion. As can be seen there, several studies using SVM models are dated after 2014 (only five in that year and prior), and more than ten studies using ANN occurred after 2014. In the case of the kNN method, every assessed study took place after 2014. Thus, since all studies included in this review were considered, the conclusion in question was not biased.

We have added a reference to Appendix A in line 264 to facilitate its interpretation.

The authors cite themselves 4 times, which seems inappropriate.

Answer:

That remark was already addressed in a previous response.

 

Reviewer 2 Report

Thanks to the authors for their nice work.

This manuscript reports extensive work done by the authors to search, classify and go deeply through a specific field of application in IR Thermography. It is a good reference for those who want to work in the area of application of AI in IRT for future improvements.

However, I would like to suggest a more extended discussion on which AI method is preferred regarding ACC, SN and SP for a specific application, and on how this can be improved in the view of the authors for future works (or possibly suggesting another approach). I admit the current manuscript might be only a review paper, but I would suggest adding some recommendations and a stronger conclusions section.

Finally, I would like to remind the authors to use acronyms after the first definition and to avoid using the full name after that (e.g., Machine Learning) throughout the manuscript.

Good luck

Author Response

We would like to thank the reviewer for his/her interest and time in assessing our manuscript. We greatly appreciated the thoughtful comments and suggestions made to improve the manuscript. Based on them, we revised our manuscript to improve its content and quality.

Below we encompass the responses to the raised issues.

If there are any further changes or clarifications needed, please let us know. We will be happy to address them.

However, I would like to suggest a more extended discussion on which AI method is preferred regarding ACC, SN and SP for a specific application, and on how this can be improved in the view of the authors for future works (or possibly suggesting another approach). I admit the current manuscript might be only a review paper, but I would suggest adding some recommendations and a stronger conclusions section.

Answer:

We thank the reviewer for these insights and agree with the suggestions made.

We addressed the first comment through some changes that we hope are satisfactory, namely in lines 275-277: “When looking for breast cancer results, maximum performance was achieved by [39], while [53] showed the best approach to aid diabetic foot diagnosis. Still, additional work can be done to better the current methodologies.” and lines 286-293: “Apart from the mentioned suggestions, the implementation of parameter optimization during the construction of the learner may yield better classification results, as well as the use of strategies to deal with potential class imbalance problems. In addition, the construction of user interfaces/dashboards designed to be utilized by health-care professionals would simplify the introduction of ML aiding tools in day-to-day clinical activities.”
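To illustrate the two quoted suggestions, a minimal sketch follows using the R 'caret' package (an assumed toolkit, not one prescribed in the manuscript): the SVM parameters are optimized during model construction, and a simple upsampling strategy counters class imbalance. The data are simulated, not taken from the reviewed studies.

```r
# Hedged sketch of parameter optimization during learner construction plus a
# class-imbalance strategy. 'caret' and the simulated data are assumptions.
library(caret)

set.seed(42)
dat <- twoClassSim(200, intercept = -8)   # simulated, imbalanced two-class data

ctrl <- trainControl(method = "cv", number = 5,
                     sampling = "up",               # rebalance by upsampling
                     classProbs = TRUE,
                     summaryFunction = twoClassSummary)

# Grid search over the radial-kernel SVM parameters during construction.
fit <- train(Class ~ ., data = dat,
             method = "svmRadial",
             metric = "ROC",
             tuneLength = 5,
             trControl = ctrl)
fit$bestTune                               # selected sigma/cost combination
```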

Finally, I would like to remind the authors to use acronyms after the first definition and to avoid using the full name after that (e.g., Machine Learning) throughout the manuscript.

Answer:

It was very perceptive of the reviewer to suggest this, and we have addressed it throughout the manuscript, using acronyms after the first definition where applicable.

Round 2

Reviewer 1 Report

Dear Authors,

Thank you for your revision; I won't request a new one, since your responses sound correct.

I still don't agree with your self-citations, but I leave it to the editor to decide whether it is fair or not.

You omitted to answer the question about the plots, but you probably did not understand my point, which is that you compare classification results together, i.e., binary classification results with multiclass problems, which is weird.

I anyway recommend the publication of your review.

Author Response

We would like to thank the reviewer for his/her interest and time in assessing our manuscript a second time. We greatly appreciated the thoughtful comments and suggestions made to improve the manuscript. Based on them, we revised our manuscript to improve its content and quality.

Below we encompass the responses to the raised issues.

If there are any further changes or clarifications needed, please let us know. We will be happy to address them.

I still don't agree with your self-citations, but I leave it to the editor to decide whether it is fair or not.

It is not a question of being fair; we cite 4 relevant works that meet the critical review criteria, and leaving them out would bias the performed research.

You omitted to answer the question about the plots, but you probably did not understand my point, which is that you compare classification results together, i.e., binary classification results with multiclass problems, which is weird.

We believe that the reviewer did not understand that the object of this critical research is the use of infrared thermal imaging together with machine learning algorithms, and not their individual biomedical applications. Since the imaging method is the same for all included studies, there is no multiclass classification, only binary classification for the imaging method. For the single applications, we provide the charts in the appendix.

I anyway recommend the publication of your review.

Thank you, we are grateful.
