Article
Peer-Review Record

Explainable AI Framework for Multivariate Hydrochemical Time Series

Mach. Learn. Knowl. Extr. 2021, 3(1), 170-204; https://doi.org/10.3390/make3010009
by Michael C. Thrun 1,2,*, Alfred Ultsch 1 and Lutz Breuer 3,4
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 30 December 2020 / Revised: 26 January 2021 / Accepted: 27 January 2021 / Published: 4 February 2021

Round 1

Reviewer 1 Report

The paper has been improved since the previous submission, and I think it could be published in its present form.

Author Response

We thank the reviewer for reading our revised work again. We improved the description of the research design and the description of the methods.

Reviewer 2 Report

The manuscript presents a computational approach to predict possible water quality from multivariate hydrochemical time series data. The paper is interesting, and it describes the methodology adequately. Overall, it is technically sound. However, the quality of the paper should be improved in parts for publication.

 

  1. First, the overall length of the paper could be shortened to make the article more concise. For example, I don’t think the material described in rows 97-118 is necessary for developing the manuscript. Much of it is not really relevant to the key problems that this paper wants to address. A shorter, more concise description of the background is necessary, but the long details will simply distract readers. Other lengthy parts include those in Section 2 (XAI Framework) as well as those at rows 241-242 and 252-263.
  2. I don’t see the validation details of the prediction results; I only see a single accuracy value of 96.5% presented at row 417. This is a major drawback of the study.
  3. I am confused by the meaning of “…. unable to provide meaningful and relevant explanations from….” mentioned in the Abstract at row 17, which seems to conflict with the conclusion.
  4. The URL links for obtaining the code mentioned at rows 775, 781, and 782 are not available.
  5. It would be more informative for readers if the authors could provide a particular example to show the difference between the results output by their framework and by others in terms of interpretability for the domain expert, as mentioned in rows 41-44.
  6. Some abbreviations mentioned in the manuscript should be given in full at first mention, such as LIME, SOMs, ESOMs, and so on.
  7. I am confused by the two mentions of “multimodal” at rows 138 and 139. Are both distances “multimodal”?
  8. A typo at row 158: SI H or I?

Author Response

Please see the attached pdf.

Author Response File: Author Response.pdf

Reviewer 3 Report

The topic of the paper is very interesting, but I have a lot of questions about its essence:

The scientific novelty of the work is very vague. Please provide a clearly structured, logical, and meaningful statement of the main contribution of this paper.

It would be good to extend the Related Works section using non-iterative ANN-based explainable AI (DOI: 10.1007/978-3-319-91008-6_58, DOI: 10.1007/978-3-030-20521-8_39).

There is a lot of Supplementary Information. Maybe some of it should be in the main text of the paper.

A lot of references are outdated. How do the authors want to argue for the importance of this research? Please fix this.

The black line in Fig. 2 is very interesting. It seems that it was drawn by hand. Please explain it or change the figure.

In general, authors should demonstrate that they have developed a truly Explainable AI Framework. Otherwise, it will not be accepted.

Author Response

Please see the attached pdf.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

My previous suggestions and questions have been addressed.

Reviewer 3 Report

Paper can be accepted

This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.


Round 1

Reviewer 1 Report

The paper aims to investigate an explainable AI framework in the context of hydrochemical time series. Although the topic is very interesting and suitable for the journal, the paper needs deeper restructuring. The paper is quite hard to read, with several definitions jotted down together without a main line of argument. Several modeling choices are not properly motivated or evaluated, and the explainability aspect (the main topic of the paper) is restricted to a few rules and graphs. In my opinion, the paper needs complete restructuring before it can be considered for publication.

Author Response

Dear Dr. Lesley Miao,
Thank you for handling our manuscript entitled “Explainable AI Framework for Multivariate Hydrochemical Time Series” and for giving us the chance to accommodate the reviewers’ comments. However, the comments from Reviewer 1 are difficult to address, as they are mainly of a very general nature without providing more detailed information on what should be done. We have addressed the comments as detailed in our responses below the reviewers’ comments (and in red letters in the supplemented PDF). It should be noted that the line numbers correspond to the Word review mode “All Markup” but not to the mode “Simple Markup”.

Reviewer 1 comments:    

The paper aims to investigate an explainable AI framework in the context of hydrochemical time series.
Although the topic is very interesting and suitable for the journal, the paper needs deeper restructuring. The paper is quite hard to read, with several definitions jotted down together without a main line of argument.

We thank the reviewer for taking the time to read our work. We improve the main line of argument in section 2.1.1 in Line 214 as follows: “We propose that a user selects a distance metric based on multimodality of the specific data set's distance distribution for which detailed mathematical definitions can be found in SI F. The motivation is that intra-cluster distances should be smaller than inter-cluster distances and the threshold between the two types of distances can be defined by a Bayes boundary which can be defined through a Gaussian mixture.”

We further state in Line 220 ff that
“Several metrics were investigated using the R package 'parallelDist' and the MD-plot function [10] in the R package 'DataVisualizations'. Multimodality was most evident in the probability density distribution of the Hellinger point distance measure in the case of the given dataset. Hence, the probability density distribution of this selected distance is modeled with a Gaussian mixture model…”

and separate the definitions w.r.t. cluster analysis from its evaluation by introducing section 2.2.1 into the manuscript (Line 291 ff). We make the manuscript easier to follow by reducing section 2.2.2 to Occam's razor, and we specify the definition in section 2.3.1 (Line 355 ff).

Several modeling choices are not properly motivated or evaluated,
We motivate the preprocessing now in Line 175 ff with “Data was standardized and de-correlated as described in SI J because distance measures are sensitive to correlations and the differences of variances of features“.

We improve section 2.1.1 to make our distance selection process clearer. Distance selection is motivated by multimodality as specified in Line 208 “Usually, partitioning and hierarchical clustering algorithms require a distance metric because they seek to find groups of similar objects [44] (i.e., objects with small distances between them).”

This property is evaluated as follows in Line 225 “Hence, the probability density distribution of the selected distance is modeled with a Gaussian mixture model and verified visually with QQplot as described in [45] with the R package 'AdaptGauss'.”
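A minimal sketch in R of this distance-selection and verification step, assuming a preprocessed numeric matrix Data (rows = days, columns = features); the package interfaces are used as documented, but the exact settings in the manuscript may differ:

# Sketch of the distance-selection step; Data is a placeholder for the
# standardized and de-correlated feature matrix.
library(parallelDist)
library(DataVisualizations)
library(AdaptGauss)

DistVec <- as.vector(parDist(as.matrix(Data), method = "hellinger"))

# Inspect the shape of the distance distribution; multimodality suggests that
# intra- and inter-cluster distances are separable.
MDplot(cbind(HellingerDistance = DistVec))

# Fit a Gaussian mixture model interactively, verify it visually with a QQ plot,
# and derive the Bayes boundary between intra- and inter-cluster distances.
gmm <- AdaptGauss(DistVec)
QQplotGMM(DistVec, gmm$Means, gmm$SDs, gmm$Weights)
Boundary <- BayesDecisionBoundaries(gmm$Means, gmm$SDs, gmm$Weights)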

Decision trees are evaluated by their accuracy in reproducing the clustering. This is now stated more clearly by moving the sentence “The maxim of quality states that only well-supported facts and no false descriptions should be reported. Quality will be measured by the accuracy of supervised decision trees representing the clustering.” to section 2.3.1, Line 357.

We state now in Line 475 “CART provides a decision tree presented in Fig. 6b that reproduces the clustering with an accuracy of 96.5 %.”
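A minimal sketch of how such an accuracy value can be computed, assuming a feature data frame Data and a cluster label vector Cls from the preceding clustering step (illustrative only, not the exact call used in the manuscript):

# Sketch: explain a given clustering Cls with CART and measure how well the
# tree reproduces the cluster labels; Data and Cls are placeholders.
library(rpart)

df  <- data.frame(Data, Cls = factor(Cls))
fit <- rpart(Cls ~ ., data = df, method = "class")

pred     <- predict(fit, df, type = "class")
accuracy <- mean(pred == df$Cls)   # e.g., 0.965 corresponds to 96.5 %
print(fit)                         # the splits form the human-readable rules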

Further modelling choices are motivated and evaluated in the manuscript as follows:
1. The projection method is motivated in Line 232: “The swarm-based projection method of the Databionic swarm algorithm is used to project the distance matrix of data into a two-dimensional plane [35,37]…. this projection method is parameter-free”. This is further described in Line 249 ff: “Projection points near to each other are not necessarily near in the high-dimensional space (vice versa for faraway points), but in planar projections of data, these errors are unavoidable (c.f. Johnson-Lindenstrauss Lemma [54]). Hence, the topographic map identifies data structures based on a projection.” A minimal sketch of this projection and clustering pipeline is given after this list.
2. The cluster analysis is motivated by the identification of data structures and is evaluated. In Line 294 we state: “If a cluster is either divided into separate valleys, or several clusters lie in the same valley of the topographic map, the compact (or connected) clustering approach is not appropriate for the data.” With regard to the heatmaps, we explain in Line 300: “A heatmap visualizes the homogeneity of clusters and the heterogeneity of intercluster distances if the clustering is appropriate.” Subsequently, intra- versus inter-cluster distances are addressed in Line 314: “In this sense, days that a cluster analysis partitions to the same cluster are similar if their intra-cluster distances are lower than the Bayes boundary.”
3. To check for simpler alternatives in model selection, we explain in Line 457: “To check for a possibility of a simpler model, a linear projection by the method projection pursuit [81] using a clusterability index of variance and ratio (c.f. [82]) is applied on the dataset. The linear projection does not reveal clear structures, even if the generalized U-matrix is applied to visualize high-dimensional distance structures in the two-dimensional space (Figure SI G, Fig. 11).”
4. The explainability in our approach is motivated as follows in Line 315 (please see next comment).
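The sketch referenced in item 1, assuming the CRAN interfaces of the 'DatabionicSwarm' and 'GeneralizedUmatrix' packages; function and argument names follow the package vignette and may differ between versions, and Data and the number of clusters are placeholders:

# Sketch of the parameter-free, projection-based clustering pipeline.
library(DatabionicSwarm)
library(GeneralizedUmatrix)

proj <- Pswarm(Data)                                        # swarm-based projection
vis  <- GeneratePswarmVisualization(Data, proj$ProjectedPoints, proj$LC)
Cls  <- DBSclustering(2, Data, vis$Bestmatches, proj$LC, PlotIt = FALSE)

# Topographic map of the generalized U-matrix, used as the internal
# verification of the clustering.
plotTopographicMap(vis$Umatrix, vis$Bestmatches, Cls)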

and the explainability aspect (the main topic of the paper) is restricted to a few rules and graphs.

We are thankful to the reviewer for pointing out that there are only a few rules providing the explanations in our work. To outline our goal and approach more clearly, we change Line 41 ff to: “For example, Miller et al. 2017 argue that most AI researchers are building XAIs for themselves [8] rather than for the intended domain expert. Hence, resulting rules are often not straightforward to understand or even meaningful for the domain expert. We have therefore focused on deriving simple but comprehensible rules for domain experts that are also understandable for non-experts outside the XAI community.”

To resolve this issue we propose in Line 366 “…the number of explanations should follow the Miller optimum of 4-7 [73,74]. Then, explanations are meaningful to a domain expert (c.f. discussion [8]).“

The explainability in our approach is motivated as follows in Line 315: “Explainability should follow the Gricean maxims of quality, relevance, manner, and quantity [69], for the usage of decision trees in this work. They summarized to meaningful and relevant explanations which should be then interpreted by a domain expert [8].” In the following, the explainability is evaluated in Line 340: “The number of rules measures the property of meaningfulness.” and in Line 346: “The property of relevance is qualitatively evaluated by class mirrored-density plots (class MD plots) [10]. Additionally, statistical testing of class-wise distributions of features can be performed to ensure that the classes defined by rules are tendentially contrastive and, in consequence, relevant.”

In my opinion, the paper needs complete restructuring before being considered for publication.

We now explain our structure better by revising the abstract in Line 18 from “Open-source code in R for the three steps of the XAI framework is provided.” to “The XAI framework can be swiftly applied to new data because open-source code in R for all steps of the XAI framework is provided and the steps are structured application-oriented.”

Further, we revise parts of the manuscript and restructure several sections. As explained above, we change a number of sections and improve the flow of the paper. This includes:
• We move data collection and preprocessing to the supplementary information.
• We improve figure 1 with section and subsection numbers to provide another guideline to the structuring of the manuscript.
• We shorten section 2.2 Cluster Analysis.
• We restructure sections 2.4.1 and 2.4.2.
• Section 2.2.2 “Validation of Clustering and Ockham’s Razor” is revised to “Occam’s Razor”.
• We introduce a new section 2.2.1 Verification of a Clustering, which now states in Line 291 f: “Clustering is verified by one internal approach and two external methods. The topographic map serves as an internal quality measure of a clustering.” and in Line 300: “Externally, the clustering can be evaluated with heatmaps and the Bayes boundary computed in section 2.1.1 ... The Bayes boundary is computed using a Gaussian mixture model and provides a data-driven hypothesis about the similarity of data points (i.e., days in the example of this work). In this sense, days that a cluster analysis partitions to the same cluster are similar if their intra-cluster distances are lower than the Bayes boundary.”

 

We hope that the revision now meets the criteria for publication in MAKE. We are very much looking forward to hearing from you.

Yours sincerely

Michael Thrun, Alfred Ultsch and Lutz Breuer

Reviewer 2 Report

The authors proposed an explainable AI framework for Multivariate Hydrochemical Time Series. There are some major points that need to be addressed as follows:

1. English language should be improved. There are some vague parts and jargon.

2. In my opinion, the content of the manuscript is long and it contains some unnecessary information. The authors should optimize and make it more concise.

3. The authors aimed to propose a new machine learning system, but why did the authors not compare the performance results with some baseline machine learning models?

4. It is necessary to have some validation data.

5. The authors should compare the performance with previous works on the same dataset.

6. AI or machine learning frameworks have been used in previous systems, e.g., in biomedical or healthcare applications (PMID: 32942564, PMID: 32613242). Therefore, the authors should refer to more works in this description.

7. In the GitHub repository, the authors should show some instructions to use their source codes.

8. The authors moved from section "2.1" to "2.3"

Author Response

Reviewer 2 comments:    

The authors proposed an explainable AI framework for Multivariate Hydrochemical Time Series. There are some major points that need to be addressed as follows:

1. English language should be improved. There are some vague parts and jargon.
We improve vague parts through restructuring and revising the language. Upon acceptance, we could use the MDPI language services for further improvement. However, we must also say that the paper was checked for language flaws by native speakers of the Springer Nature language service before submission.

2. In my opinion, the content of the manuscript is long and it contains some unnecessary information. The authors should optimize and make it more concise.

We shorten the main manuscript by moving some information to Supplementary Information J, named “Collection and Preprocessing of Multivariate Time Series Data”, in Line 659 ff to make the manuscript more concise.

The section starting in Line 173 ff is shortened as well: “The dataset contains 32,196 data points for 14 different variables with names and units defined in table 1. Further details about the data collection are described in SI J. Missing data was imputed (see SI J). Data was standardized and de-correlated as described in SI J because distance measures are sensitive to correlations and the differences of variances of features.”

We further reduce the length of section 2.2 Cluster Analysis.

3. The authors aimed to propose a new machine learning system, but why did the authors not compare the performance results with some baseline machine learning models?

We are grateful to the reviewer for allowing us to elaborate on this point. We change Line 76 to: “Two of the most recent XAI approaches are the unsupervised decision tree clustering eUD3.5 [1] and a hybrid of k-means clustering and a top-down decision tree [2]. These two unsupervised approaches are the two most-similar approaches to our proposed XAI framework and will be used as a baseline.”

Line 509 ff states “Applying the eUD3.5 algorithm [29] to the unprocessed data identified three clusters and resulted in 541 rules that explain various overlaps in the data points of the three clusters. The seven outliers identified in our analysis were disregarded from the data before using eUD3.5. In comparison, the XAI framework proposed here provides five rules. Furthermore, the class MDplot for nitrate does not show different states of water bodies for eUD3.5 (SI H, Fig. 12, right), but one high state of electric conductivity can be identified (SI H, Fig. 12, left).
Dasgupta et al. did not provide any source code in their work [30]. Therefore, the first part of the IMM algorithm [30], the k-means clustering [68] was performed with the unprocessed data. The seven Outliers identified in our analysis were disregarded from the data. Measuring the feature importance for this clustering [86] indicates that it is based mainly on sol71 leading to the assumption that the second part of the IMM algorithm, the decision tree would favor this feature strongly to explain the clusters. Compared to the DBS clustering, the contingency table is presented in SI B, table 3, and does not show an overlap of clusters between projection-based clustering and the first part of the IMM algorithm, k-means. Additionally, the class MDplots are presented in SI H, Fig 13, which do not show different states of water bodies, meaning that IMM's explanation would not be relevant to the domain expert.“
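As an illustration of how such a k-means plus decision-tree baseline can be set up in R (a sketch only, not the authors' exact procedure; Data and the number of clusters are placeholders):

# Sketch of a k-means + decision-tree baseline in the spirit of IMM.
set.seed(1)
km <- kmeans(Data, centers = 3, nstart = 25)

library(rpart)
df  <- data.frame(Data, Cluster = factor(km$cluster))
imm <- rpart(Cluster ~ ., data = df, method = "class")

# If a single feature dominates the splits, the resulting explanation is
# unlikely to be relevant to a domain expert.
imm$variable.importance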

4. It is necessary to have some validation data.

The first two steps of our approach are fully unsupervised, which makes the use of test or validation data infeasible. We state in Figure 1 in Line 121 ff: “Framework of Explainable AI for multivariate time series without implicit assumptions about data structures (data-driven). The framework has the three main steps of the identification of data structures, Cluster Analysis and providing explanations.” In order to make this clearer, we add in Line 475: “CART provides a decision tree presented in Fig. 6b that reproduces the clustering with an accuracy of 96.5 %.”

5. The authors should compare the performance with previous work on the same dataset.

We are terribly sorry, but we are unsure how to interpret this suggestion. Most of our previous work, which we reference, is not comparable, as stated in Line 717 ff:
“The data was published earlier by Aubert et al. [6]. However, Aubert et al. used a high-frequency temporal analysis. In comparison, this work focuses on the average daily measures for each variable, resulting in a low-frequency.”
We now additionally mention in Line 719: “Prior clustering was performed on data aggregated by sum instead of mean [35,99] resulting in a clustering that proved to be unfeasible to the domain expert [37] (SI B), not only due to the fact of the unusable aggregation but also since knowledge acquisition was performed on preprocessed data which proved to be problematic”.

If the suggestion of the reviewer is to compare other works on this dataset, then we would like to reply that we already compared our approach to two previous unsupervised approaches, as stated in Line 75 ff: “Two of the most recent XAI approaches are the unsupervised decision tree clustering eUD3.5 [29] and a hybrid of k-means clustering and a top-down decision tree [30]. These two unsupervised approaches are the two most-similar approaches to our proposed XAI framework and will be used as a baseline.”
We further like to refer to line 509 ff “Applying the eUD3.5 algorithm [1] to the unprocessed data identified three clusters and resulted in 541 rules that explain various overlaps in the data points of the three clusters. The seven outliers identified in our analysis were disregarded from the data before using eUD3.5. In comparison, the XAI framework proposed here provides five rules. Furthermore, the class MDplot for nitrate does not show different states of water bodies for eUD3.5 (SI H, Fig. 12, right), but one high state of electric conductivity can be identified (SI H, Fig. 12, left).
Dasgupta et al. did not provide any source code in their work [2]. Therefore, the first part of the IMM algorithm [2], the k-means clustering [3] was performed with the unprocessed data. The seven Outliers identified in our analysis were disregarded from the data. Measuring the feature importance for this clustering [4] indicates that it is based mainly on sol71 leading to the assumption that the second part of the IMM algorithm, the decision tree would favor this feature strongly to explain the clusters. Compared to the DBS clustering, the contingency table is presented in SI B, table 3, and does not show an overlap of clusters between projection-based clustering and the first part of the IMM algorithm, k-means. Additionally, the class MDplots are presented in SI H, Fig 13, which do not show different states of water bodies, meaning that IMM's explanation would not be relevant to the domain expert.”

6. AI or machine learning frameworks have been used in previous systems, e.g., in biomedical or healthcare applications (PMID: 32942564, PMID: 32613242). Therefore, the authors should refer to more works in this description.

Other mentioned works by the reviewer are now added as follows in Line 67 ff “There are two approaches for the explanation of machine learning systems: prediction, interpretation, and justification that is used for sub symbolic ML systems (defined in [13]) and interpretable approaches for symbolic ML systems (defined in [16]), which are explained through reasoning [17]. For example, recent sub-symbolic ML systems are introduced in [14,15] for which interpretation, and justification can be performed with LIME [18]. LIME approximates any classifier or regressor locally with an interpretable model.”
and referenced PMID: 32942564 and PMID: 32613242 as [14,15].
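For illustration, a LIME-style local explanation of an arbitrary classifier can be obtained in R roughly as follows (a sketch using the 'lime' and 'caret' packages on a placeholder dataset and model; the cited works use their own models and data):

# Sketch: approximate a classifier locally with an interpretable model (LIME).
library(caret)
library(lime)

model     <- train(Species ~ ., data = iris, method = "rf")   # placeholder classifier
explainer <- lime(iris[, -5], model)

# Explain a few individual predictions with a small number of features.
explanation <- explain(iris[1:3, -5], explainer, n_labels = 1, n_features = 4)
plot_features(explanation)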

7. In the GitHub repository, the authors should show some instructions to use their source codes.
A readme with instructions is now provided in the GitHub repository.

8. The authors moved from section "2.1" to "2.3"
We corrected the section number in line 277 and therefore in the subsequent subsections.

 

Reviewer 3 Report

The paper presents an interesting approach for interpretable AI in the context of hydrochemical data (multivariate time series). After reading the manuscript, I find that a comprehensive amount of work has been done, with remarkable findings which are worth being published in this journal after some revision work.

In this regard, I have several questions that I would like to ask:

  • How are these Topographic Maps related to Self-Organizing Maps (SOM)?
  • When creating the different decision trees, have you considered any pruning for improving the generalization capabilities of the model?
  • In terms of explainability, have you considered the use of any rule-based algorithms such as RIPPER?

 

Besides, I have some minor issues which must be corrected before accepting the manuscript:

  • Some references are missing the publication year (e.g., [14], [8]). Please revise the bibliography and include them.
  • Remove $ signs in reference [28].
  • In Fig. 1, "Occam's Rezor" must be "Occam Razor".
  • The reference in line 20 to "Miller et al.", which reference does it refer to? (Missing publication year for being able to locate the proper element in the bibliography).

Author Response

Reviewer 3 comments:    
The paper presents an interesting approach for interpretable AI in the context of hydrochemical data (multivariate time series). After reading the manuscript, I find that a comprehensive amount of work has been done, with remarkable findings which are worth being published in this journal after some revision work.
In this regard, I have several questions that I would like to ask:
How are these Topographic Maps related to Self-Organizing Maps (SOM)?

We elaborate on this point in Line 256 and in the following: “In SOMs with a low number of neurons, each neuron becomes a so-called best-matching unit or prototype and represents a cluster [9-11]. Emergent self-organizing maps are online SOMs with many neurons for which not every neuron is a best-matching unit. ESOMs do not cluster the input data directly [12]. Instead, ESOMs, combined with the U-Matrix approach, are able to visualize the structures of high-dimensional data [13]. However, the adjustment of parameters in SOMs, which define the selection of best-matching units and the annealing scheme, is challenging and can depend on the data [14]. Hence, in the simplified ESOM, the best-matching units are defined by priorly projected points, which omits the setting of any parameters [15]. The topographic map is a 3D landscape visualization approach for the U-matrix (see [16] for alternative approaches).”

When creating the different decision trees, have you considered any pruning for improving the generalization capabilities of the model?

This is an important question. We do not consider manual pruning because the number of rules is already within the Miller optimum. However, with the default parameters of the rpart package for CART and the evtree package for globally optimal classification and regression trees, automatic pruning is already performed. We add to Line 355: “Exemplary, the R package "rpart" and the package "evtree" are applied. If the number of leaves exceeds the Miller optimum, further pruning can be specified in the open-source libraries.” We further deleted the sentence referring to the libraries used in Line 371 ff.
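For readers who need to enforce the Miller optimum explicitly, tree complexity can also be restricted through the packages' control objects (a sketch with illustrative parameter values; Data and Cls are placeholders):

# Sketch: restricting tree complexity so that the number of leaves stays small.
library(rpart)
library(evtree)

df <- data.frame(Data, Cls = factor(Cls))

cart <- rpart(Cls ~ ., data = df, method = "class",
              control = rpart.control(cp = 0.01, maxdepth = 4))
cart_pruned <- prune(cart, cp = 0.02)   # post-pruning via the complexity parameter

evo <- evtree(Cls ~ ., data = df,
              control = evtree.control(maxdepth = 4, minbucket = 20))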

In terms of explainability, have you considered the use of any rule-based algorithms such as RIPPER?
In this manuscript, we focus on unsupervised decision trees, as Line 75 ff now states more precisely: “Two of the most recent XAI approaches are the unsupervised decision tree clustering eUD3.5 [29] and a hybrid of k-means clustering and a top-down decision tree [30]. These two unsupervised approaches are the two most-similar approaches to our proposed XAI framework and will be used as a baseline.”

Prior to writing the manuscript, we considered RIPPER, but it was not of use to us. We now provide an example in our source code, which results in a changed DOI on Zenodo (from 10.5281/zenodo.4274700 to 10.5281/zenodo.4318830).
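For completeness, a RIPPER rule set can be induced in R for such a comparison (a sketch using the JRip interface of the 'RWeka' package, which requires a Java installation; Data and Cls are placeholders, and this is not necessarily the example provided in the repository):

# Sketch: induce RIPPER (JRip) rules for comparison with the decision-tree rules.
library(RWeka)

df     <- data.frame(Data, Cls = factor(Cls))
ripper <- JRip(Cls ~ ., data = df)

print(ripper)      # the induced IF-THEN rules
summary(ripper)    # resubstitution performance of the rule set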

Besides, I have some minor issues which must be corrected before accepting the manuscript:
Some references are missing the publication year (e.g., [14], [8]). Please revise the bibliography and include them.

Thank you for pointing out this general EndNote error for proceedings. We corrected the MDPI template in EndNote; Lines 808, 791, and several others now show the publication year.

Remove $ signs in reference [28].

This is now corrected in Line 854.

In Fig. 1, "Occam's Rezor" must be "Occam Razor".

This is now corrected in Figure 1 in Line 121.

The reference in line 20 to "Miller et al.", which reference does it refer to? (Missing publication year for being able to locate the proper element in the bibliography).

Line 41 is changed to “For example, Miller et al. 2017 argue …”. The bibliography is now corrected using the improved MDPI template.

Round 2

Reviewer 1 Report

The paper changed very little from the first version. I still think it lacks soundness. The authors indicate in the title and introduction that they will provide a general framework for explainable AI for hydrochemical time series, but a general method is not presented. Rather, the authors present a study using several R packages that contributes little to explaining the ML method. The clustering algorithm used (the swarm-based projection method of the Databionic swarm) is rather unusual, and no justification is provided for this choice.

Reviewer 2 Report

My previous comments have been addressed satisfactorily.

Reviewer 3 Report

All my points have been properly tackled so I recommend the acceptance of the manuscript in its present form.
