Open Access Article

Peer-Review Record

Flexible Data Trimming Improves Performance of Global Machine Learning Methods in Omics-Based Personalized Oncology

Int. J. Mol. Sci. 2020, 21(3), 713; https://doi.org/10.3390/ijms21030713

by Victor Tkachev ¹, Maxim Sorokin ¹,², Constantin Borisov ³, Andrew Garazha ¹, Anton Buzdin ¹,²,⁴,⁵ and Nicolas Borisov ¹,²,⁴,*

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Submission received: 23 December 2019 / Revised: 16 January 2020 / Accepted: 17 January 2020 / Published: 22 January 2020

(This article belongs to the Special Issue Medical Genetics, Genomics and Bioinformatics)

Round 1

Reviewer 1 Report

ID IJMS-690184

"Flexible data trimming improves performance of global machine learning methods in omics-based personalized oncology" to be published in International Journal of Molecular Sciences.

General remarks

The authors present a study in which they apply a hybrid global-local approach to machine learning, termed FLOating Window Projective Separator (FloWPS), to seven popular machine learning methods: linear SVM, k nearest neighbors, random forest, Tikhonov (ridge) regression, binomial naïve Bayes, adaptive boosting and multi-layer perceptron. They performed computational experiments on 21 high-throughput gene expression datasets (41-235 samples per dataset) representing a total of 1778 cancer patients with known responses to chemotherapy treatments. FloWPS substantially improved the classifier quality for all global machine learning methods (SVM, random forest, binomial naïve Bayes, adaptive boosting and multi-layer perceptron): the area under the receiver operating characteristic curve (ROC AUC) for the treatment response classifiers increased from 0.61-0.88 to 0.70-0.94. They also tested the FloWPS-empowered methods for overtraining by examining the importance of different features for different machine learning methods in the same model datasets. The authors concluded that FloWPS increases the correlation of feature importance between the different machine learning methods, which indicates its robustness to overtraining. For all the datasets tested, the best performance of FloWPS data trimming was observed for the binomial naïve Bayes method, which can be valuable for the further building of machine learning classifiers in personalized oncology.
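For readers less familiar with the metric quoted above, the following is a minimal sketch of how an ROC AUC value can be computed from classifier scores. It uses the standard pairwise (Mann-Whitney) formulation and is not taken from the manuscript; the function name `roc_auc` and the coding of responders as 1 and non-responders as 0 are assumptions made for illustration only.

```python
def roc_auc(labels, scores):
    """Estimate ROC AUC as the probability that a randomly chosen positive
    sample receives a higher score than a randomly chosen negative one.
    Tied scores count as half a win.

    labels: iterable of 0/1 (assumed coding: 1 = responder, 0 = non-responder)
    scores: iterable of classifier scores, higher means "more likely positive"
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy scores, not data from the study.
example_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation, which is the scale on which the reported ranges should be read.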

The topic of the article is very interesting and it could serve as an important basis for further research.

There are only some minor corrections needed before the article is suitable for publication, marked in the attached manuscript.

Comments for author File: Comments.pdf

Author Response

///There are only some minor corrections needed before the article is suitable for publication, marked in the attached manuscript///

We thank the Reviewer for his/her valuable remarks that helped us to improve the manuscript.

Here are our responses, with indication of line numbers in the old version of the manuscript.

Lines 278-280: irrelevant text deleted.

Line 301: extra space deleted.

Line 365: typing error corrected.

Table 6: typing errors corrected.

Line 456: volume and pages added.

Line 465: pages added.

Line 471: pages added.

Line 473: this is a book chapter rather than a journal article, and it does not have a volume number.

Line 481: this is a Tinbergen Institute publication without volume and page numbers.

Line 484: upper case removed.

Line 489: pages added.

Line 498: space added.

Line 500: pages added.

Line 504: pages added.

Line 517: pages added.

Line 523: pages added.

Line 630: journal name corrected.

Line 633: journal name corrected.

Reviewer 2 Report

This paper applies a method previously published by the same group for sample-specific feature selection. This work is of high significance in the field of machine learning when data availability is a concern. The authors used multiple datasets and showed the performance of their method; however, the results are not presented in a well-organized way, and the presentation needs improvement to convey the message of the article clearly.

Here are my major points to improve the manuscript:

1. The Results section is somewhat confusing. I understood that the aim of this article is to apply the flexible data trimming preprocessing method (FloWPS) and test its performance with multiple machine learning methods, rather than to introduce the concept and the method. If that is the case, then Section 2.1 should not be part of the Results. Part of the information can be moved to the Methods (the algorithm part, especially Figure 2); other parts can be moved to the Introduction or Discussion.
2. The Results section should not include explanations of principles such as balance or cross-validation. If the authors need to discuss these concepts to defend their results, they can do so in the Discussion section. If the discussion is required to explain the methodology of the experiment, it can be placed in the Methods section. Mixing the results with methods and discussion makes the paper difficult to understand.
3. Did the authors calculate ROC AUC for each dataset separately? If so, can they provide a statistical significance test to support the notion that using FloWPS is better than not using it? This should be applied to Table 1, Table 2, Figure 3, Table 3, and Table 4. More details of how the comparison was done and which tests were used are required in both the Methods and Results sections.
4. In the Methods section, the authors mention an R package, while it is only a wrapper over a Python script. Please explain clearly that the Python code is also usable and that the R code is provided for convenience.

Author Response

The authors thank the Reviewer, whose valuable comments helped us to substantially improve the readability of the manuscript.

///1. The Results section is somewhat confusing. I understood that the aim of this article is to apply the flexible data trimming preprocessing method (FloWPS) and test its performance with multiple machine learning methods, rather than to introduce the concept and the method. If that is the case, then Section 2.1 should not be part of the Results. Part of the information can be moved to the Methods (the algorithm part, especially Figure 2); other parts can be moved to the Introduction or Discussion.///

Response: We agree. As suggested, we have moved the description of the data trimming method to the Materials and Methods section.

///2. The Results section should not include explanations of principles such as balance or cross-validation. If the authors need to discuss these concepts to defend their results, they can do so in the Discussion section. If the discussion is required to explain the methodology of the experiment, it can be placed in the Methods section. Mixing the results with methods and discussion makes the paper difficult to understand.///

Response: We agree again. The explanation of the balance factor between false positive and false negative errors has also been moved to the Materials and Methods section.

///3. Did the authors calculate ROC AUC for each dataset separately? If so, can they provide a statistical significance test to support the notion that using FloWPS is better than not using it? This should be applied to Table 1, Table 2, Figure 3, Table 3, and Table 4. More details of how the comparison was done and which tests were used are required in both the Methods and Results sections.///

Response: Yes, the ROC AUC values were calculated for each dataset separately. We have therefore performed a paired t-test for each ML method and each dataset to compare the AUC results with and without FloWPS. The low p-values for global ML methods and the high p-values for local ML methods confirm that FloWPS is useful for global ML methods and not useful for local ones. The corresponding p-values have been added to Tables 1-3, and an extra supplementary table, S4_1, has been added.
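The kind of comparison described in this response can be sketched as follows. This is a minimal illustration of a two-sided paired t-test on AUC values, not the authors' code: the response does not state exactly which AUC values form the pairs, so the pairing used here (one AUC with FloWPS and one without, measured on the same item) is an assumption, and the function names and numbers are hypothetical. The p-value is obtained by numerically integrating the Student's t density so that the sketch needs only the standard library; in practice a library routine such as `scipy.stats.ttest_rel` would normally be used instead.

```python
import math
from statistics import mean, stdev


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def paired_t_test(a, b, steps: int = 20000):
    """Return (t statistic, two-sided p-value) for paired samples a and b.

    steps is the (even) number of Simpson intervals used for integration.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("all paired differences are identical")
    df = len(diffs) - 1
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))

    # Simpson's rule for the probability mass between 0 and |t|.
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3

    # Two-sided p-value: mass outside the interval [-|t|, |t|].
    p = max(0.0, 1.0 - 2.0 * central)
    return t, p


# Made-up illustrative AUC values, NOT results from the manuscript.
auc_without = [0.62, 0.71, 0.68, 0.80, 0.75]
auc_with = [0.70, 0.78, 0.77, 0.85, 0.83]
t_stat, p_value = paired_t_test(auc_with, auc_without)
```

A paired test fits this setting because each AUC with FloWPS has a natural counterpart without it, so the test works on the per-pair differences rather than treating the two groups as independent.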

///4. In the Methods section, the authors mention an R package, while it is only a wrapper over a Python script. Please explain clearly that the Python code is also usable and that the R code is provided for convenience.///

Response: We thank the Reviewer for this suggestion. This explanation has been added to the Materials and Methods section.
