Open Access Article

Peer-Review Record

gbt-HIPS: Explaining the Classifications of Gradient Boosted Tree Ensembles

Appl. Sci. 2021, 11(6), 2511; https://doi.org/10.3390/app11062511

by Julian Hatwell *, Mohamed Medhat Gaber and R. Muhammad Atif Azad

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Submission received: 30 January 2021 / Revised: 2 March 2021 / Accepted: 6 March 2021 / Published: 11 March 2021

(This article belongs to the Special Issue Explainable Artificial Intelligence (XAI))

Round 1

Reviewer 1 Report

The manuscript titled "gbt-HIPS: Explaining the Classifications of Gradient Boosted Tree Ensembles" presents the idea of extracting a classification rule to explain Gradient Boosted Tree models. The text is interesting, and the method has been validated on several benchmark datasets.

The most important issue is the length of the article (37 pages). It is too long and many pages can be deleted without losing readability.
In my opinion, this text could be reduced to 10-12 pages. The article after a short introduction could start from point 5, where the new method is introduced.

Section 1 and Section 2, pages 2-7, cover issues related to explaining the machine learning model. This material could form the major part of a separate review article.
This article should focus on the proposed new method.

Section 3 provides well-known metric definitions (e.g., precision, pages 7-8) with minor modifications, such as "Lower Reliability" (page 9). It is nothing new. It could be removed.

Section 4 depicts Gradient Boosted Tree models, this algorithm is well known, pages 9-11 could be removed.

Section 6 (numerical experiments) provides a long introduction in which well-known validation methods (e.g., K-fold cross-validation, leave-one-out validation)
are described in detail. Page 17 can be reduced to a few lines.
The results are presented on multiple pages (pages 17-35), the same results are presented multiple times.
In my opinion, the most important results should be included in the manuscript, the rest should be available as support material.

Please remove repetitions:

1. The same results are presented two or three times (images, tabular data, sometimes box plots). Please move some of the results to support material.
2. The same ideas are presented many times. For example, problems with using only precision and coverage are mentioned in the introduction (a few times), in the description of the method (section 5.4), and in the description of the experiment (section 6.1 and again section 6.5).

I would like to support the idea of using a project webpage alongside the manuscript. This webpage should include the source code (this is already done) and the full set of results. The manuscript should include only the most important results.

Minor remarks:
- line 410, the word "and" is redundant
- when presenting tables, please highlight the best cells, e.g. in bold (Table 5, 6, 9, 10, 14, 15, 16).

Author Response

Thanks to the reviewer for your care and attention to detail. The instructions given to edit and improve the focus of the paper have proved very useful. We hope that the newly submitted version meets your expectations.

Here are our responses, point by point.

In my opinion, this text could be reduced to 10-12 pages.

We have dutifully made all the requested changes and removed some further unnecessary notation, such as the equations for the Friedman tests for significance, which can be found in the originating works. However, our paper is still 20 pages in length. Please note that at least 5 of these are full pages of diagrams. We have reduced the size of these images throughout, but some of the placement is an artefact of the LaTeX template, which we cannot alter. We would expect the typesetting team to be able to further reduce the length by at least a couple of pages. Such cosmetic changes alone should produce a paper around 13-14 pages in length, which is very close to your target.

The article after a short introduction could start from point 5, where the new method is introduced.

We have re-written the introduction: The context on XAI is just one paragraph and the related work simply introduces the competing methods used in the experimental study. Sections 2-4 have been deleted. The new text is highlighted in blue.

Section 1 and Section 2, pages 2-7, cover issues related to explaining the machine learning model. This material could form the major part of a separate review article.
This article should focus on the proposed new method.

Deleted.

Section 3 provides well-known metric definitions (e.g., precision, pages 7-8) with minor modifications, such as "Lower Reliability" (page 9). It is nothing new. It could be removed.

Deleted.

Section 4 depicts Gradient Boosted Tree models, this algorithm is well known, pages 9-11 could be removed.

Deleted.

Section 6 (numerical experiments) provides a long introduction in which well-known validation methods (e.g., K-fold cross-validation, leave-one-out validation)
are described in detail. Page 17 can be reduced to a few lines.

This part has been re-written in a highly condensed form, as requested. This edited text is highlighted in purple/red.

We needed to reintroduce 2-3 key ideas from the deleted section 2, to justify our method design and experimental design choices. These paragraphs are significantly trimmed compared to the original paragraphs that made up the old section 2. This new text is highlighted in blue.

The results are presented on multiple pages (pages 17-35), the same results are presented multiple times.
In my opinion, the most important results should be included in the manuscript, the rest should be available as support material.

Most of the tabulated results have been moved to a supplementary PDF file, which we have added to our GitHub repository.

Please remove repetitions:

The same results are presented two or three times (images, tabular data, sometimes box plots). Please move some of the results to support material.

As per previous point.

The same ideas are presented many times. For example, problems with using only precision and coverage are mentioned in the introduction (a few times), in the description of the method (section 5.4), and in the description of the experiment (section 6.1 and again section 6.5).

We believe that extensive editing has now consolidated the ideas into two new paragraphs at just one location, subsection "Quantitative Study," of the section "Materials and Methods." This text has been highlighted in blue. There are no more accompanying formulas or equations. This text is essential to the readers' understanding of our motivation for and use of non-standard metrics.
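For readers unfamiliar with the two standard metrics under discussion, the following is a minimal sketch of how the precision and coverage of a single classification rule are commonly computed. It is not taken from the paper: the function name, the toy rows, and the example rule are all hypothetical and chosen only for illustration, and the paper's non-standard metrics are not reproduced here.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def rule_precision_coverage(
    rule: Callable[[Dict[str, float]], bool],
    rows: Sequence[Dict[str, float]],
    labels: Sequence[str],
    target: str,
) -> Tuple[float, float]:
    """Return (precision, coverage) of a rule that predicts `target`.

    coverage  = fraction of all instances that satisfy the rule
    precision = fraction of covered instances whose label equals `target`
    """
    covered: List[str] = [y for x, y in zip(rows, labels) if rule(x)]
    coverage = len(covered) / len(rows) if rows else 0.0
    precision = (
        sum(1 for y in covered if y == target) / len(covered) if covered else 0.0
    )
    return precision, coverage


# Toy data, invented for this example only.
rows = [
    {"age": 25, "hours": 40},
    {"age": 47, "hours": 60},
    {"age": 52, "hours": 45},
    {"age": 33, "hours": 20},
    {"age": 61, "hours": 50},
]
labels = ["low", "high", "high", "low", "low"]

# Hypothetical rule: "age > 40 AND hours >= 45 => high".
rule = lambda r: r["age"] > 40 and r["hours"] >= 45

precision, coverage = rule_precision_coverage(rule, rows, labels, "high")
print(f"precision={precision:.3f}, coverage={coverage:.3f}")
```

On this toy data the rule covers three of the five instances, two of which carry the target label. A rule covering a single correctly labelled instance would score a perfect precision with negligible coverage, which is one common reason for not relying on these two numbers alone.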

See previous points. The majority of tabulated results are now in a separate file.

- line 410, the word "and" is redundant

This was removed as part of a deleted section.

- when presenting tables, please highlight the best cells, e.g. in bold (Table 5, 6, 9, 10, 14, 15, 16).

This is done now but affects tables that are now in the supplementary file.
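As an illustration of the reviewer's request, the sketch below shows one way to mark the best cell of a results row in bold when generating LaTeX tables. It is an assumption about tooling, not a description of how the authors produced their tables: the helper name `bold_best` and the scores are hypothetical, and only the data set name "adult" comes from this record.

```python
from typing import List, Sequence


def bold_best(
    values: Sequence[float],
    higher_is_better: bool = True,
    fmt: str = "{:.2f}",
) -> List[str]:
    """Format each value and wrap the best one(s) in LaTeX \\textbf{}.

    Ties are all marked in bold.
    """
    best = max(values) if higher_is_better else min(values)
    return [
        "\\textbf{" + fmt.format(v) + "}" if v == best else fmt.format(v)
        for v in values
    ]


# Invented scores for three hypothetical methods on one data set.
row = bold_best([0.91, 0.97, 0.88])
latex_row = " & ".join(["adult"] + row) + " \\\\"
print(latex_row)
```

For metrics where a lower value is better, such as elapsed time, the same helper can be called with `higher_is_better=False`.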

Reviewer 2 Report

Summary:

This paper describes a new GBT model that applies pruning and ranks important path snippets. The authors present the model and its algorithm. They also evaluate the model on binary and multi-class problems. Finally, they conclude that the proposed model gives a better fit to the classification problems.

This paper is well written and the presentation is straightforward. It would be better to check the English for grammatical errors.

Author Response

Reviewer 2 has not asked for any specific revisions. We have run a grammar check and made minor corrections accordingly. If the reviewer finds any corrections that we've missed, we kindly ask that you inform us of the line numbers on the new version of the paper.

Please note that reviewer 1 has asked for large amounts of the text to be removed. We trust that, in following their instructions, these revisions do not alter the assessment of Reviewer 2, who appears to be satisfied with the original submission.

Round 2

Reviewer 1 Report

Some of my comments were properly addressed, thank you.

In my opinion, the broken-line graphs used to show coverage and other results (Fig. 3, Fig. 4, Fig. 6 and others) assume continuity between the values on the X axis. In other words, they imply that there is a point (a dataset) between "adult" and "bank". The other problem with these diagrams is the order of the variables on the X axis; in such cases, the order should be based on similarity rather than alphabetical. I propose dropping such diagrams, both because of the problems mentioned and because they repeat the information presented by the box plots.

Please check and correct the entire manuscript again; after the text was reduced, some sentences do not read well.

Author Response

Thanks again to the reviewer for care and attention to detail. Please note the following point-by-point responses to the most recent comments.

In my opinion, the broken-line graphs used to show coverage and other results (Fig. 3, Fig. 4, Fig. 6 and others) assume continuity between the values on the X axis. In other words, they imply that there is a point (a dataset) between "adult" and "bank".

I propose dropping such diagrams, both because of the problems mentioned and because they repeat the information presented by the box plots.

We accept this criticism of the dot charts with joining lines. We also note that you commented on the repetition of information not only here but in the previous review round. Therefore, we have taken the decision to remove all the dot and line charts and rely, for the most part, on the box plots.

The two floor statistics represent a single value rather than a distribution, so box plots were not appropriate, but only in these two cases. For one of these, we retained a dot chart but without the joining line. For the other, we replaced the chart with a table. In the latter case, there wasn't any chart type that looked reasonable because the high prevalence of perfect and near-perfect scores results in too much over-plotting.

This change also helps with controlling the article length, which was a further concern of yours last time.

The other problem with these diagrams is the order of the variables on the X axis; in such cases, the order should be based on similarity rather than alphabetical.

We have considered this request at some length and found several ways that the data sets might be ordered based on similarity. However, in the end, we felt that there was a real risk of readers over-interpreting our ordering and finding some incorrect meaning or reading of the results. We decided to stick with alphabetical order, simply because it is completely arbitrary and neutral. Alphabetical ordering avoids us having to add a further paragraph of explanation about the layout of the results, which would distract from the main points. In any case, the decision to remove most of the dot charts means that this aesthetic choice only affects one remaining chart.

The good thing about the faceted layout of the box plots is that the alphabetical sequence is not immediately readable and does not dominate the interpretation of the results. This is another good reason to reduce the presentation of the results to box plots only.

Please check and correct the entire manuscript again; after the text was reduced, some sentences do not read well.

We have done this as requested and made a significant number of small corrections. These are not highlighted in the text, only because they are fairly trivial.

We trust that the latest version meets your expectations. We certainly feel that this much more focused version of the paper is a real improvement.

Reviewer 2 Report

The overall organization of the paper has been revised and shortened well, and the main points of the paper have been condensed.
The presentation of the paper has become better than in the last version. But it would be better to merge sections 3.1-3.4 because those are short and discrete.

Author Response

Thanks again to the reviewer for their due care and attention. We note in your latest comments that there is one point for us to review, as follows:

But it would be better to merge sections 3.1-3.4 because those are short and discrete.

We merged the very short Hardware Setup section (3.4) with the opening section (3.1). We also merged the Competing Methods (3.2) and Data Sets (3.3) into a single section. On reflection, going any further with this process would result in a single monolithic section that covered too wide a range of sub-topics. We feel that the new structure is the best parallel with all the other main sections, in terms of the size and scope of the subsections.
