Explainable AI Framework for Multivariate Hydrochemical Time Series

Understanding water quality and its underlying processes is important for the protection of aquatic environments, and this study benefits from the rare opportunity of access to a domain expert. Hence, an explainable AI (XAI) framework is proposed that is applicable to multivariate time series and yields explanations that are interpretable by a domain expert. In three steps, the XAI combines a data-driven choice of a distance measure with explainable cluster analysis through supervised decision trees. The multivariate time series consists of water quality measurements, including nitrate, electrical conductivity, and twelve further environmental parameters. The relationships between water quality and the environmental parameters are investigated by identifying similar days within a cluster and dissimilar days between clusters. The XAI does not depend on prior knowledge about the data structure, and its explanations are tendentially contrastive. The relationships in the data can be visualized by a topographic map representing high-dimensional structures. Two comparable decision-tree-based XAIs were unable to provide meaningful and relevant explanations from the multivariate time series data. Open-source code in R for the three steps of the XAI framework is provided.


Introduction
Human activities modify the global nitrogen cycle, particularly through agriculture. These practices have unintended consequences; for example, terrestrial nitrate losses to streams and estuaries can impact aquatic life [1]. A greater understanding of the variability in water quality and its underlying processes can improve the evaluation of the state of water bodies and lead to better recommendations for appropriate and efficient management practices [2].
Accordingly, the objective here is to describe the water quality in terms of nitrate (NO3) and electrical conductivity (EC) in the Schwingbach catchment (Germany) using environmental variables typically related to chemical water quality. Electrical conductivity is a measure that reflects water quality as a whole because it indicates the number of ions dissolved in the water. NO3 in water bodies is partially responsible for the phenomenon of eutrophication [3]. Eutrophication occurs when an excess of nutrients (including NO3) leads to the uncontrollable growth of aquatic plant life, followed by the depletion of dissolved oxygen [3,4]. For decades, water quality has mainly been measured through manual grab sampling of water and subsequent chemical analysis in the laboratory. Due to limited resources, high-resolution measurements on the order of days, hours, or even minutes were not available for a long time. With the advancement of deployable, in situ measuring techniques, such as UV spectrometry, a new era of field monitoring has been established [5]. However, we still lack reproducible, open-source, code-based methodological approaches, interpretable by domain experts, with which to analyze the resulting large datasets [6,7].
For example, Miller et al. argue that most AI researchers are building XAIs for themselves [8] rather than for the intended domain expert, so the explanations are often not meaningful for the domain expert. Thus, the XAI framework is introduced using a dataset from the Schwingbach catchment for which a domain expert will interpret the explanations that the XAI provides. In consequence, the proposed XAI framework's goal is to provide meaningful and relevant explanations to a domain expert based on the given data. Miller argues that an essential property of explanations is that they are contrastive [9]. This means that a domain expert would not only ask why an event happened but also why that event happened rather than an alternative [9]. It follows that interesting features describing water bodies should be distinguishable for the explanations describing these water bodies to be relevant. This work proposes to use class-wise mirrored-density plots (MD plots) [10] to investigate whether generated explanations are tendentially contrastive in interesting features (see Section 2.4.3 for details).
The whole XAI framework is a mix of swarm intelligence, game theory, neural networks, and supervised decision trees. These are combined in a sophisticated manner to detect meaningful relationships in data, which makes it a comprehensive new tool for interpretable machine learning or so-called explainable AI systems [11].
The main contribution of this paper is the proposition of an XAI that reveals cluster structures in time series with a solely data-driven approach. In this context, data-driven means that the authors aim to refrain from making explicit or implicit assumptions about the existence and type of data structures. The found cluster structures are verified with independent approaches and then used in decision trees.
Overall, this work shows how to search for days with similar behavior by using a transparent and open-source framework. The results explain similar environmental, and in particular water quality, situations, although NO3 stream concentrations "integrate" many processes varying in space and in time [12]. Finally, relevant and meaningful explanations could be used to predict future NO3 and EC values.

Related Works
There are two approaches to the explanation of machine learning systems: prediction, interpretation, and justification, which are used for subsymbolic ML systems (defined in [13]), and interpretable approaches for symbolic ML systems (defined in [14]), which are explained through reasoning [15]. For the former, a well-known example is LIME [16], which approximates any classifier or regressor locally with an interpretable model. In the sense of the latter, explanation represents a distinct approach to extract information from the learned model [17]. Typical interpretable ML systems, or so-called XAIs, comprise combinations of neural networks with rule-based expert systems [18,19], Bayesian networks with rule mining [20], hybrids of clustering and fuzzy classification [21] or neuro-fuzzy classification [22], interpretable decision sets [23] or decision tables [24], decision tree clustering [25], or clustering combined with generative models [26]. Two of the most recent XAI approaches are the unsupervised decision tree clustering eUD3.5 [27] and a hybrid of k-means clustering and a top-down decision tree [28].
Loyola-González et al. present the eUD3.5 algorithm, which uses a split criterion based on the silhouette index [27]. The silhouette index compares, for every object of a cluster, its homogeneity within the cluster with its heterogeneity to other clusters [29]. In eUD3.5, a node is split only if its possible descendants have a better split criterion than the best split criterion found so far. This leads to a decision tree that is based on the cluster structures. A cluster is associated with the class having the most members in the cluster. In eUD3.5, 100 different trees are generated, their performance is evaluated, and the best-performing tree is kept. The user can specify the number of desired leaf nodes (stop criterion). If the algorithm produces more leaf nodes than specified by the user, then leaf nodes are combined using k-means. The authors claim to have similar performance to k-means and better performance than other conventional decision tree clustering algorithms [27].
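The silhouette criterion underlying eUD3.5 compares each object's mean distance to its own cluster with its smallest mean distance to any other cluster. A minimal sketch in plain Python (not the eUD3.5 implementation; `silhouette_values` and the brute-force distance loop are illustrative only):

```python
def silhouette_values(points, labels):
    """Silhouette value s(i) = (b - a) / max(a, b) for every point,
    where a is the mean distance to the point's own cluster and
    b is the smallest mean distance to any other cluster [29]."""
    def dist(p, q):
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q)) ** 0.5

    values = []
    for i, (p, l) in enumerate(zip(points, labels)):
        # Mean distance to the point's own cluster (excluding itself).
        own = [dist(p, q) for j, (q, m) in enumerate(zip(points, labels))
               if m == l and j != i]
        a = sum(own) / len(own) if own else 0.0
        # Smallest mean distance to any other cluster.
        b = min(
            sum(dist(p, q) for q, m in zip(points, labels) if m == other)
            / sum(1 for m in labels if m == other)
            for other in set(labels) if other != l
        )
        values.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return values
```

Values near 1 indicate well-separated clusters; a split criterion based on this index prefers splits whose resulting partitions score higher.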
Dasgupta et al. use a hybrid of k-means with a decision tree called the iterative mistake minimization (IMM) algorithm [28]. The k-means method provides the labels and cluster centers with which the decision tree is built top-down using binary splits, in which each node of the tree is associated with a portion of the input data [28]. If this portion of input data contains more than one cluster center, then it is split so that the fewest points are separated from their cluster center; this type of optimal split is found by searching over all pairs using dynamic programming. The IMM algorithm terminates at a leaf node whenever all points have the same label, resulting in a number of leaves equal to the number of clusters [28]. Dasgupta et al. did not provide any source code in their work [28]. Therefore, in the first part of the IMM algorithm, the k-means clustering is used through the toolbox published in [61]. Then, feature importance in a k-means clustering is measured using the algorithm provided in the R package 'FeatureImpCluster' available on CRAN [80]. The importance per feature is measured by the misclassification rate, relative to the baseline cluster assignment, due to a random permutation of feature values [80].
More generally, time series clustering is either raw-data-based, feature-based, or model-based [30]. Often, the Euclidean metric on data points or the Kullback-Leibler dissimilarity on distributions is used. Cluster analysis is performed via medoid or agglomerative methods of conventional clustering algorithms [30]. Clustering algorithms are often sensitive to outliers and noise [31,32]. Such approaches also imply that the relevant cluster structures are spherical or of other specific shapes, because a global optimization function has to be used [33]. Then, evaluation is commonly performed on the basis of within-cluster variance [30]. Other common approaches to the evaluation of time series are shape-based, meaning that the shapes of two time series are matched according to specific dissimilarity measures [34]. These approaches have in common that they optimize a global objective function that defines the data structures, and they do not investigate whether cluster structures exist.

XAI Framework
Contrary to the approaches introduced above, this work applies an alternative to the optimization of a global objective function in the task of projection: the Databionic swarm (DBS) [35] and subsequent projection-based clustering [36]. In contrast to most other clustering algorithms, the topographic map visualization [37] identifies whether the data contains no cluster-based structures. Outliers can be interactively marked in the visualization [38] after the automated clustering process, in case they are not recognized sufficiently by the automatic clustering [38]. The clustering itself is one module of the XAI framework presented in Figure 1. There are three main steps in Figure 1. First, the data structures have to be identified (blue). Next, the cluster analysis is performed (orange). In the third step, explanations (green) are provided using the labels of the clustering and the multivariate time series data. Each step has several modules that are connected to each other and colored similarly to the step. Arrows outline the connections between the modules. In the first step, the time series data is aggregated appropriately (e.g., daily) and then standardized. Then, various available distance metrics are applied to the data and investigated for multimodality. If a distance distribution is multimodal, it can be modeled by a Gaussian mixture, and this distance should be preferred for the cluster analysis in the second step. If no distance distribution is multimodal, the framework continues with the Euclidean distance. Next, the chosen distances are used in the projection module, followed by the structure visualization through a topographic map.
The cluster analysis in step II is composed of three modules. The visualization of the topographic map enables the user to choose the number of clusters and the setting of the Boolean parameter. The number of clusters can be set as the number of visible valleys in the topographic map. The clustering can be further evaluated by the model of the distance distribution (last arrow between steps I and II), because the Gaussian mixture model of step I hypothesizes that the intra-cluster distances of the clustering should be smaller than the Bayesian boundary defined by the Gaussian mixture. Projection-based clustering can be changed to another clustering algorithm according to the preference of the user. The clustering can be validated by the topographic map [36]. Additionally, it is preferable to search for linear models and to validate the clustering externally (e.g., by a heatmap).
After validation in step II, the labels of the resulting clustering are used in supervised decision trees in step III for training on the un-preprocessed but aggregated data. Then, meaningful explanations are defined by paths in the decision tree. Class-wise distribution analysis and statistical testing of interesting features are performed to ensure that explanations are relevant to the domain expert. In the last module, a domain expert interprets relevant and meaningful explanations (c.f. [8]). The details of the analytic procedure can be found in the methods sections, which are organized according to these three steps, as illustrated in the titles in Fig. 1. The results sections are organized the same way. The R packages used in this work are summarized in SI H, Table 6.

Collection of Multivariate Time Series Data
Data used in this work have been collected in 2013/2014 in the Schwingbach Environmental Observatory (SEO) in central Germany. The mixed developed landscape is mainly characterized by agricultural land use (44%) and forests (48%). The Schwingbach is a small headwater stream draining the landscape. An in-situ hyperspectral UV spectrometer (ProPS, TriOS, Rastede, Germany; wavelength range 200-360 nm, path length 5 mm, solar-panel supplied) was used to measure absorption spectra every 15 min at the gauging station of the catchment's outlet. Prior to measurements, air blasts (5 s) were blown on the lens to prevent biofouling of the optics. Wavelength spectra at 200-220 nm were utilized for the calculation of nitrate concentration, following calibration with the local stream water matrix (see [6] for further details). All other variables used in this machine learning approach (Table 1) were also monitored at the same high frequency or aggregated to 15 min intervals.
Discharge and stream water temperature were recorded by pressure transducers (Diver DCX, Schlumberger Water Services, ON, Canada) at two gauging stations, at the outlet (q13) and upstream (q18) of the first-order stream of the Vollnkirchener Bach. Groundwater depths in three piezometers were measured with similar pressure transducers. The piezometers were located in different positions along the Schwingbach: GWl3 was measured in the lowland close to the outlet of the stream, GWl25 was recorded at a hillslope, and GWl32 upstream in the riparian zone. Electromagnetic induction sensors (5TE) attached to EM50 data loggers (Decagon, Labcell Ltd., Alton, UK), installed at 0.1 m depth in the riparian zone, were used to gauge soil moisture and soil temperature. All meteorological data were collected at a climate station 4 km from the outlet (Campbell Scientific Inc., CR1000 data logger, Loughborough, UK).
In total, the dataset contains 32,196 measurements. Further technical information on the SEO, the analytical procedures applied, the coding of abbreviations, and the experimental design in general is outlined in detail by [6], [7], and [39]. Four percent of the data are missing. For each day, the measurements were aggregated by the mean of all available measurements for that day. Then, missing values (i.e., days) were interpolated using the seven-nearest-neighbors approach. Distance measures are sensitive to the variance in the distribution of features. For example, the Euclidean metric weights features more strongly if they have values above 1. Therefore, the variance of features is usually standardized before a cluster analysis is performed.
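The nearest-neighbors interpolation of missing days can be sketched as follows. This is a minimal plain-Python illustration assuming neighbors are the nearest observed days in time; the exact procedure used in the study may define neighborhood differently (e.g., in feature space):

```python
def knn_impute_series(series, k=7):
    """Fill None entries of a daily series with the mean of the k
    nearest non-missing days (nearest in time). A sketch of the
    seven-nearest-neighbors idea, not the exact procedure used."""
    observed = [(i, v) for i, v in enumerate(series) if v is not None]
    filled = list(series)
    for i, v in enumerate(series):
        if v is None:
            # Sort observed days by temporal distance and keep the k closest.
            nearest = sorted(observed, key=lambda iv: abs(iv[0] - i))[:k]
            filled[i] = sum(val for _, val in nearest) / len(nearest)
    return filled
```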
The variables q13 and q18 were log-transformed. All variables, with the exception of rainfall, were normalized to values between zero and one through a robust normalization procedure [40], improved by [33]. Correlating variables have to be detected before further data evaluation; otherwise, these variables will be over-weighted in the following distance matrices. The discharges correlated linearly with each other (r = 0.95, p(S = 347,270, N = 351) < 0.001), and q13 was therefore excluded from the analysis. The water temperatures Wt13 and Wt18 also correlated linearly (r = 0.99, p(S = 18,386, N = 351) < 0.001); hence, Wt13 was removed as well.
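The idea behind robust normalization can be sketched with quantile-based bounds and clipping; this is a minimal illustration under that assumption, and the exact procedure of [40] as improved by [33] may differ:

```python
def robust_normalize(values, lo_q=0.01, hi_q=0.99):
    """Scale a feature to [0, 1] using robust quantile bounds instead
    of min/max, then clip; outliers therefore cannot dominate the
    scaling. A sketch, not the exact procedure of [40]/[33]."""
    s = sorted(values)
    n = len(s)
    lo = s[int(lo_q * (n - 1))]   # robust lower bound
    hi = s[int(hi_q * (n - 1))]   # robust upper bound
    span = hi - lo if hi > lo else 1.0
    return [min(1.0, max(0.0, (v - lo) / span)) for v in values]
```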
The outliers in the rainfall variable were detected via ABC analysis [41]. ABC analysis is a method to compute precise limits for acquiring subsets in skewed distributions by exploiting the mathematical properties of the distribution. The data containing positive values are divided into three disjoint subsets, A, B, and C, with subset A comprising the very profitable values, i.e., the largest data values ("the important few"). Subset B contains values for which the yield equals the effort required to obtain them, and subset C contains the non-profitable values, i.e., the smallest data values ("the trivial many"). The R package is called 'ABCanalysis'. Then, the rain was normalized with respect to the minimum value in group A. All other points in group A were capped by defining this value as the upper bound for rainfall.

Distance Selection
Usually, partitioning and hierarchical clustering algorithms require a distance metric because they seek to find groups of similar objects [42] (i.e., objects with small distances between them). If no specific distance measure is chosen, most algorithms use the Euclidean distance metric, and the user is not always able to change the distance metric manually (c.f. the 54 common algorithms in the R package 'FCPS'). Projection-based clustering with the Databionic swarm (DBS) has the advantage that a specific distance metric can be selected by the user, which is then used in the dimensionality reduction and clustering parts of the algorithm. However, the choice of distance remains undiscussed in prior work. We propose that a user selects a distance metric based on the properties of the specific data set's distance distribution. Here, the Hellinger point distance measure is selected because clear multimodality is visible in its probability density distribution. Several metrics were investigated using the R package 'parallelDist' and the MD-plot function [10] in the R package 'DataVisualizations'. The detailed mathematical definitions can be found in SI F. The probability density distribution is modeled with a Gaussian mixture model and verified visually with a QQ-plot as described in [43], using the R package 'AdaptGauss'. The Bayes boundary of the two modes with the highest weights separates the intra-cluster distances from the inter-cluster distances. It provides a data-driven hypothesis about the similarity of data points (i.e., days in the example of this work). In this sense, days that a cluster analysis partitions into the same cluster are similar if their intra-cluster distances are lower than the Bayes boundary.
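For orientation, the Hellinger point distance between two feature vectors can be sketched as follows. This uses one common convention, d(x, y) = (1/√2) · ‖√x − √y‖₂; libraries such as 'parallelDist' may scale it differently, so the exact definition in SI F is authoritative:

```python
import math

def hellinger_point_distance(x, y):
    """Hellinger point distance between two non-negative feature
    vectors under the convention d(x, y) = (1/sqrt(2)) * ||sqrt(x) -
    sqrt(y)||_2; conventions differ by a constant factor."""
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2
                         for a, b in zip(x, y))) / math.sqrt(2)
```

Applied to all pairs of (standardized, non-negative) daily feature vectors, this yields the distance distribution whose multimodality is then tested.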

Projection
The swarm-based projection method of the Databionic swarm algorithm is used to project the distance matrix of the data onto a two-dimensional plane [33,35]. Similar to the nonlinear and focusing projection methods ESOM [44], CCA [45], t-SNE [46], or NerV [47], the dimensionality reduction by the swarm first adapts to global structures. As time progresses, structure preservation shifts from global optimization to the preservation of local neighborhoods. Such a learning phase usually requires an annealing scheme and parameters to be set. However, by exploiting concepts of self-organization and emergence, swarm intelligence, and game theory, this projection method is parameter-free [33,35]. The intelligent agents of the swarm, called DataBots [48], operate on a toroidal grid, where positions are coded into polar coordinates to allow for the precise definition of their movement, neighborhood function, and annealing scheme.
In contrast to other focusing projection methods (e.g., [45,46,49]), the size of the grid and the annealing scheme are data-driven. During the learning phase, each agent moves across the grid or stays in its current position in the search for the most potent scent. The equation that mathematically defines the scent (c.f. Eq. 18 in [35]) uses information stored in the distance matrix (c.f. step II and SI F). Hence, agents search for other agents carrying data with the most similar features to themselves, with a data-driven decreasing search radius [35]. Every agent's movement is modeled using a game-theoretic approach, and the radius decreases only if a Nash equilibrium is found [50,51]. Contrary to ant-based clustering algorithms, DataBots do not move data. Instead, each DataBot possesses a scent, defined by one high-dimensional data point.
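The toroidal topology of the DataBots' grid can be made concrete with a small sketch of wrap-around distances. This uses Cartesian grid coordinates rather than the polar coordinates of DBS, and `toroidal_distance` is an illustrative helper, not part of the DBS implementation:

```python
def toroidal_distance(p, q, lines, columns):
    """Shortest Euclidean distance between two grid positions on a
    toroid of size (lines, columns): each axis wraps around, so the
    per-axis distance is the smaller of the direct and the wrapped
    difference."""
    dy = abs(p[0] - q[0])
    dy = min(dy, lines - dy)     # wrap around the vertical axis
    dx = abs(p[1] - q[1])
    dx = min(dx, columns - dx)   # wrap around the horizontal axis
    return (dx * dx + dy * dy) ** 0.5
```

On such a grid no position is privileged: agents near the borders have full neighborhoods, which avoids the border effects of planar maps.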

Structure Visualization by Topographic Map
Points that are near to each other in the projection are not necessarily near in the high-dimensional space (and vice versa for faraway points); in planar projections of data, such errors are unavoidable (c.f. the Johnson-Lindenstrauss lemma [52]). Hence, the topographic map identifies data structures based on a projection. First, the generalized U-matrix [33,53] is calculated on this projection using emergence through an unsupervised artificial neural network called a simplified (because parameter-free) emergent self-organizing map [37]. The generalized U-matrix generates the visualization of a topographic map with hypsometric tints, which can be vividly described as a virtual 3D landscape with a specific color scale chosen by an algorithm defining the contour lines [54]. The topographic map addresses the central problem in clustering, i.e., the correct estimation of the number of clusters. It allows the assessment of the number of clusters by inspecting the 3D landscape.
The topographic map corresponds to high-dimensional distance and density structures. Hypsometric tints are surface colors that represent ranges of elevation, and the contour lines are combined with this specific color scale. In this 3D landscape, the borders of the visualization are cyclically connected with a periodicity (L, C).

Step II: Cluster Analysis
In step II, projection-based clustering is applied by calculating the shortest paths [55] of the Delaunay graph between all projected points, weighted with the high-dimensional Hellinger point distances. This is possible because it was shown that the U-matrix is an approximation of the abstract U-matrix [56], which is based on Voronoi cells. The Voronoi cells define a Delaunay graph in which the edges between the projected points are weighted by the corresponding data points' high-dimensional distances.
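The shortest-path computation on such a weighted graph can be sketched with a standard Dijkstra implementation. The adjacency-dictionary format is an illustrative assumption; in the framework, the edges would come from the Delaunay graph of the projected points and the weights from the high-dimensional (e.g., Hellinger) distances:

```python
import heapq

def dijkstra(adjacency, source):
    """Shortest-path lengths from `source` in a weighted graph given
    as {node: [(neighbour, weight), ...]}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adjacency.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```

The resulting graph distances approximate distances along the data manifold rather than straight-line distances in feature space.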
The clustering approach itself involves two choices. For this dataset, the compact approach is used, in which the two clusters with the minimal variance (S) are merged until the number of clusters defined by the topographic map is reached. The other approach, for connected structures, and a general discussion of cluster structures can be found in [36].
Let C_r ⊂ D and C_k ⊂ D be two clusters such that r, k ∈ {1, …, n} and C_r ∩ C_k = {} for r ≠ k. Then, the merging cost, i.e., the variance (S) between the two clusters C_r and C_k, is defined in a Ward-like manner as S(C_r, C_k) = (|C_r| · |C_k|) / (|C_r| + |C_k|) · d(m_r, m_k)², where m_r and m_k denote the cluster centroids. In praxis, the choice of the Boolean parameter of compact versus connected can be evaluated in step I using the topographic map as specified in Figure 1: if a cluster is either divided into separate valleys, or several clusters lie in the same valley of the topographic map, the compact (or connected) clustering approach is not appropriate for the data. An extensive discussion of this behavior can be found in [36]. Additionally, the clustering can be improved further using the interactive interface provided in the R package 'ProjectionBasedClustering' [38]. The ultrametric portion of the distance is discussed in [57]. Projection, the topographic map of structures, and the cluster analysis are available in the R language on CRAN in the package 'DatabionicSwarm' [33]. It should be noted that the cluster analysis can also be performed independently of the projection method, with, for example, k-means, but this most often results in a focus on specific cluster structures [36]. This can be preferable if prior knowledge about the data exists that leads a user to a specific choice of a global cluster criterion.
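Assuming the minimal-variance merging cost takes the Ward-like form S(C_r, C_k) = |C_r|·|C_k| / (|C_r| + |C_k|) · d(m_r, m_k)² (an assumption for illustration; the exact criterion of projection-based clustering may differ), it can be computed as:

```python
def ward_merging_cost(cluster_r, cluster_k):
    """Ward-style variance between two clusters: the weighted squared
    distance between their centroids. A sketch assuming a Ward-like
    criterion, not necessarily the exact projection-based clustering
    cost."""
    def centroid(cluster):
        return [sum(col) / len(cluster) for col in zip(*cluster)]

    m_r, m_k = centroid(cluster_r), centroid(cluster_k)
    sq = sum((a - b) ** 2 for a, b in zip(m_r, m_k))
    n_r, n_k = len(cluster_r), len(cluster_k)
    return n_r * n_k / (n_r + n_k) * sq
```

Agglomerative merging then repeatedly joins the pair of clusters with the smallest cost until the chosen number of clusters is reached.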

Validation of Clustering and Ockham's Razor
Engle et al. state: "Cluster heatmaps are commonly used in biology and related fields to reveal hierarchical clusters in data matrices. Heatmaps visualize a data matrix by drawing a rectangular grid corresponding to rows and columns in the matrix and coloring the cells by their values in the data matrix.
In their most basic form, heatmaps have been used for over a century [58]. In addition to coloring cells, cluster heatmaps reorder the rows and/or columns of the matrix based on the results of hierarchical clustering. (…) Cluster heatmaps have high data density, allowing them to compact large amounts of information into a small space [59]." [60]. For distance matrices, the procedure can be performed as described above, meaning that the clustering reorders the distances, each pixel represents a distance value, and the clusters are divided by black lines. Further, the clustering is valid if mountains do not partition clusters, indicated by colored points of the same color within colored regions of points [58].
A heatmap visualizes the homogeneity of clusters and the heterogeneity of inter-cluster distances if the clustering is appropriate. The R package 'DataVisualizations' is used.
Ockham's razor states that if two models are applicable, the less complex one should be used [61]. Therefore, the authors suggest investigating whether simpler models can represent the data structures and provide meaningful and relevant explanations. A simpler projection approach assuming linear cluster structures and a simpler clustering approach [62] assuming spherical cluster structures are applied to the data. Moreover, spherical cluster structures are tested with the silhouette plot using the R package 'DataVisualizations'.

Step III: Providing Meaningful and Relevant Explanations
For the usage of decision trees in this work, explainability should follow the Gricean maxims of quality, relevance, manner, and quantity [63]. These are summarized as meaningful and relevant explanations, which should then be interpreted by a domain expert [8].

Decision Trees for Identified Data Structures
The conventional usage of decision trees is usually supervised, requiring a prior classification (e.g., [64]), but decision trees can alternatively be unsupervised, using split evaluation criteria that do not require a prior classification (e.g., [25]). Supervised decision trees are, for example, the classification and regression tree (CART) [65] or globally optimal classification and regression trees [66]. Here, we propose a third approach by using the clustering labels provided by the cluster analysis of step II instead of a prior classification. Contrary to common usage, the decision tree is exploited here to explain the identified data structures. The supervised decision trees are computed using the labels and unprocessed or back-transformed data.

Extracting Meaningful Explanations
The maxim of quality states that only well-supported facts and no false descriptions should be reported. Quality will be measured by the accuracy of supervised decision trees representing the clustering (see Section 2.5). The maxim of manner suggests being brief and orderly and avoiding obscurity and ambiguity [63]. For the explanation to be given in an appropriate manner, the standardization has to be either back-transformed to provide SI units of measurement or unprocessed data has to be used. The maxim of quantity states that neither too many nor too few explanations should be presented [63]. This work specifies the statement in the sense that the number of explanations should follow the Miller optimum of 4-7 [67,68]. Then, explanations are meaningful to a domain expert (c.f. the discussion in [8]). However, decision tree algorithms do not aim at meaningful explanations [63,69]. Therefore, a transformation of the decision tree into rules is necessary [63,69]. Here, rules are extracted from the decision tree by following each path from the root to a leaf. As examples, the R packages 'rpart' and 'evtree' are applied. The number of rules measures the property of meaningfulness.
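Extracting rules by following each root-to-leaf path can be sketched generically. The nested-dictionary tree format and `extract_rules` below are illustrative assumptions, not the interface of 'rpart' or 'evtree', which expose the same path information differently:

```python
def extract_rules(node, path=()):
    """Turn a decision tree given as nested dicts into root-to-leaf
    rules. Internal nodes: {'feature': f, 'threshold': t, 'left': ...,
    'right': ...}, where the left branch means feature <= threshold;
    leaves: {'label': cluster}. Returns (conditions, label) pairs."""
    if 'label' in node:
        return [(path, node['label'])]
    f, t = node['feature'], node['threshold']
    rules = extract_rules(node['left'], path + ((f, '<=', t),))
    rules += extract_rules(node['right'], path + ((f, '>', t),))
    return rules
```

Each returned rule is a conjunction of threshold conditions ending in a cluster label; counting these rules then measures meaningfulness against the Miller optimum.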

Evaluating the Relevance of Explanations
The maxim of relevance requires that only rules relevant to the expert are listed. Typically, explanations are especially relevant if they are tendentially contrastive (c.f. [9]). Suppose an explanation based on a clustering of the data is relevant (i.e., reveals to the domain expert high-dimensional structures that are relevant for similar days). In that case, the classes defined by such explanations should contain samples of different environmental states and be based on different processes. The property of relevance is qualitatively evaluated by class-wise mirrored-density plots (class MD plots) [10]. Additionally, statistical testing of class-wise distributions of features can be performed to ensure that the classes defined by rules are tendentially contrastive and, in consequence, relevant.
The mirrored-density plot (MD plot), introduced in [10], visualizes a density estimation in a similar way to the violin plot [70]. The MD plot uses the Pareto density estimation (PDE) approach [71] for density estimation. It can be shown that comparable methods have difficulties in visualizing the probability density function in the case of uniform, multimodal, skewed, and clipped data if the density estimation parameters remain at their default settings [10]. In contrast, the MD plot is particularly designed to discover interesting structures in continuous features and can outperform conventional methods [10]. The MD plot does not require any adjustments to density estimation parameters, which makes its usage compelling for non-experts. The class MD plot is available in the R package 'DataVisualizations'. The class MD plot visualizes the density of each class of an interesting feature separately and is used to show the relationship of the clusters of the XAI systems to NO3 and EC concentrations.

Results
An overview of the analysis is provided in Fig. 1. For clarity, the rest of this chapter is subdivided into three sections: the first consists of the selection of an appropriate distance metric, the extraction of a first hypothesis from the distribution of distances (3.1), and the topographic map; note that the projected points in the topographic map are already colored by the clustering of the second step. The second section presents the projection-based cluster analysis method and validates the clustering (3.3). The third section presents and evaluates the explanations.

Step I: Structure Identification
In Supplementary Information A (SI A), the probability density distributions of the 12 finally selected and preprocessed variables are visualized with mirrored-density plots [10] (see also Section 2.4.3, Step III). SI A shows the result of an appropriate standardization of features, resulting in similar variances.
As an example, Fig. 2 presents the probability density estimation (PDE) [71] of the distance feature df of the Hellinger point distance matrix of the preprocessed data in black and its Gaussian mixture model (GMM) in red. Specific definitions can be found in SI F. The Hellinger point distance [72] in the R package 'parallelDist' was chosen for the cluster analysis because the distribution of distances is statistically not unimodal according to Hartigan's dip test [73] (with p(D = 0.006385, N = 61,425) < 0.001). The distance distribution can be modeled through the Gaussian mixture model (GMM) using the expectation-maximization (EM) algorithm [74]. The distance distribution and GMM are visualized in Fig. 2. The QQ-plot verifies the GMM in Fig. 3. This serves as an indication of the existence of high inter-cluster distances (distances between different clusters) and outlier distances as well as small intra-cluster distances (distances within each cluster), meaning that a distance-based cluster structure can be found. The Bayes boundary of the GMM in Fig. 2 lies at 0.39. Next, the topographic map of high-dimensional structures evaluates the clustering by indicating which points are far away in the high-dimensional space (brown/white hills) or near (blue seas, green grassland). The topographic map with hypsometric tints generated with the R package 'GeneralizedUmatrix' is toroidal, meaning that the grid's borders are cyclically connected with a periodicity defined by the size of the grid of the projection of the Databionic swarm. In Fig. 4, a cutout island of the topographic map is shown. Every point symbolizes a day. The high-dimensional distances of the low-dimensional projected points are visualized. The topographic map shows three valleys and basins indicating clusters, and watersheds of hills and mountains shown by borderlines between clusters. Thus, the number of clusters is equal to the number of valleys. Without the cluster analysis providing the labels, all projected points would have the same color.
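The Bayes boundary between two GMM modes is the point where their weighted densities intersect; it can be located numerically with a minimal sketch, assuming two Gaussian modes with known parameters (`bayes_boundary` is an illustrative helper, not the 'AdaptGauss' implementation):

```python
import math

def bayes_boundary(m1, s1, w1, m2, s2, w2, lo, hi, steps=10000):
    """Locate the Bayes decision boundary between two weighted
    Gaussian modes by scanning (lo, hi) for the point where the
    weighted densities w1*N(m1, s1) and w2*N(m2, s2) intersect."""
    def g(x, m, s, w):
        return w / (s * math.sqrt(2.0 * math.pi)) * math.exp(-0.5 * ((x - m) / s) ** 2)

    best_x, best_gap = lo, float("inf")
    for i in range(steps + 1):
        x = lo + (hi - lo) * i / steps
        gap = abs(g(x, m1, s1, w1) - g(x, m2, s2, w2))
        if gap < best_gap:
            best_x, best_gap = x, gap
    return best_x
```

Intra-cluster distances below this boundary are then, under the mixture hypothesis, more plausibly drawn from the "similar days" mode than from the inter-cluster mode.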

Figure 4:
In the topographic map of high-dimensional structures, every point symbolizes a day and is colored according to the independently performed clustering; the color of a point is defined by the labels of projection-based clustering. Clusters lie in valleys. The topographic map shows two main clusters (magenta and yellow points) and a smaller cluster (black points). In addition, seven single outliers (marked in red and by red arrows) in the hydrology dataset are disregarded before the comparable XAIs are applied. The visualization of high-dimensional data structures is generated using the R package 'GeneralizedUmatrix'.

Step II: Cluster Analysis
In Figure 4, the labels of the clustering are hereafter visualized as the colors of the projected points. In addition to the two main clusters (magenta and yellow points) and one outlier cluster (black points), seven outliers can be identified as volcanoes or within the valleys, indicated by red arrows in Fig. 4. Next, a heatmap was created in order to verify the clustering. The heatmap shows intra- versus intercluster distances ordered by cluster (Fig. 5). Blue colors symbolize small distances, and yellow and red colors represent large distances. The heatmap depicts the clusters' homogeneity because a pattern of blue and teal colors is present for intracluster distances and a yellow-to-red pattern for intercluster distances. The median intracluster distances of clusters 1, 2, and 3 are 0.24, 0.36, and 0.31, respectively, all below the Bayes boundary of the GMM in Fig. 2 of 0.39. The average intercluster distance is 0.48 and above the Bayes boundary of 0.39. These results indicate that the intracluster distances are smaller than the intercluster distances, meaning that days within each cluster are evidently more similar to one another than days between clusters. To check for the possibility of a simpler model, a linear projection by projection pursuit [75] using a clusterability index of variance and ratio (cf. [76]) was applied to the dataset. The linear projection does not reveal clear structures, even if the generalized U-matrix is applied to visualize high-dimensional distance structures in the two-dimensional space (SI G, Fig. 11). Therefore, it can be assumed that the structures cannot be separated linearly, motivating the use of more complex and elaborate methods. The clustering can be reproduced with an accuracy of 86% using hierarchical clustering as described by Ward [77] if the seven outliers are disregarded, because the method is sensitive to outliers [78]. Silhouette plots (SI D, Fig. 9) indicate inappropriate values for this clustering procedure if a spherical cluster structure is assumed.
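The intra- versus intercluster comparison above can be sketched in a few lines. Given a symmetric distance matrix and cluster labels, the following Python snippet (illustrative only; the paper's analysis is in R) computes the median intracluster distance per cluster and the mean intercluster distance, the two quantities compared against the Bayes boundary.

```python
from statistics import mean, median

def cluster_distance_summary(D, labels):
    """Summarize a symmetric distance matrix D under a labeling:
    returns (median intracluster distance per cluster,
             mean intercluster distance)."""
    n = len(D)
    intra = {}   # cluster label -> list of within-cluster distances
    inter = []   # distances between points of different clusters
    for i in range(n):
        for j in range(i + 1, n):
            if labels[i] == labels[j]:
                intra.setdefault(labels[i], []).append(D[i][j])
            else:
                inter.append(D[i][j])
    return {c: median(d) for c, d in intra.items()}, mean(inter)
```

A clustering would be considered valid in this sense if every median intracluster distance falls below the Bayes boundary (0.39 in Fig. 2) while the mean intercluster distance lies above it.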

Step III: Providing Explanations
For the explanation of the clustering described in step II, non-standardized features have to be used because the system should explain the clustering to the domain expert, not the data scientist [8]. The clusters are explained by applying the evtree [66] and CART [65,79] algorithms. The evtree decision tree is shown in Fig. 6a, and the CART decision tree for the data is visualized in Fig. 6b. Both decision trees agree on the same feature sets and relations for each cluster except for cluster three, for which Rain < 0.2 is not required in evtree to differentiate it from clusters one and two, although this makes cluster three less meaningful. The boundaries vary slightly between CART and evtree. None of the outliers could be explained by either evtree or CART. CART has a lower error and improves the meaningfulness of cluster three. Therefore, the rules are extracted from the CART tree instead of the evtree by following a path from the root to a leaf. For example, rule R1 explains cluster one of Fig. 4 (magenta). The combination of rules and clusters describes the classes, which could predict future NO3 and EC values (Table 2). The description of class 2 gains more detail if maximum likelihood plots of rain and water temperature (Wt18) are used (SI E, Fig. 10).
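Rule extraction from a fitted tree amounts to walking every root-to-leaf path and collecting the threshold conditions along the way. The paper does this with the R packages 'evtree' and 'rpart'; the Python sketch below illustrates the traversal on a hypothetical stand-in tree whose feature names echo Table 2 but whose thresholds and class assignments are invented for illustration, not those of Fig. 6b.

```python
# Hypothetical binary decision tree (features as in Table 2;
# thresholds and class assignments are illustrative only).
tree = {"feature": "Wt18", "threshold": 12.5,
        "left": {"feature": "Rain", "threshold": 0.2,
                 "left": {"leaf": "class 3"},
                 "right": {"leaf": "class 2"}},
        "right": {"leaf": "class 1"}}

def extract_rules(node, path=()):
    """Walk a binary decision tree and emit one rule (a conjunction of
    threshold conditions) per leaf, as in Step III of the framework."""
    if "leaf" in node:
        return [(" AND ".join(path) or "TRUE", node["leaf"])]
    f, t = node["feature"], node["threshold"]
    return (extract_rules(node["left"], path + (f"{f} < {t}",)) +
            extract_rules(node["right"], path + (f"{f} >= {t}",)))
```

Applied to the stand-in tree, this yields one human-readable rule per class, which is exactly the form of the explanations collected in Table 2.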

Table 2:
Explanations based on rules derived from the decision tree of Fig. 6b. Abbreviations: rainfall intensity (rain), soil temperature (St24), soil moisture (Smoist24), and groundwater level at point 3 (GWl3). All values are expressed as percentages. For units of measurement, please see Table 1. Class 2, rule R5, is extended by SI E, Fig. 10. The color names of the projected points of Fig. 4 are mapped to the rules of this table.

Evaluating Explanations for different XAIs
Next, we investigate the NO3 and EC probability density distributions per class. In the last section, the clusters were explained by rules in order to define classes. The class-wise MD-plots of Fig. 7a and Fig. 7b show that the classes correspond to normal or high NO3 levels (Fig. 7a) as well as to low, intermediate, or high conductivity levels (Fig. 7b), because the distributions of the classes differ significantly from one another, with the exception of NO3 classes 2 and 3. Kolmogorov-Smirnov tests (SI C) confirm that the classes differ significantly from each other in the NO3 and EC distributions, except for class 2 versus class 3 in NO3. However, class 2 and class 3 differ significantly from each other in the variables rain and Wt18 (water temperature) (SI E, Fig. 10). Therefore, the explanations are relevant to the domain expert.
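The class-wise comparisons in SI C rely on two-sample Kolmogorov-Smirnov tests. A self-contained sketch of the test (the KS statistic plus the standard asymptotic series approximation of the two-sided p-value) could look as follows in Python; the paper's analysis was carried out in R, so this only illustrates the procedure.

```python
import math

def ks_2samp(a, b):
    """Two-sample Kolmogorov-Smirnov test: returns the KS statistic D
    (maximum ECDF difference) and an asymptotic two-sided p-value."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    # Merge-walk both samples, tracking the ECDF gap (ties advance both).
    while i < n and j < m:
        x1, x2 = a[i], b[j]
        if x1 <= x2:
            i += 1
        if x2 <= x1:
            j += 1
        d = max(d, abs(i / n - j / m))
    if d == 0.0:
        return 0.0, 1.0
    # Asymptotic Kolmogorov distribution (standard series approximation).
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    p = 2.0 * sum((-1.0) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, 101))
    return d, max(0.0, min(1.0, p))
```

For each class pair, `ks_2samp` would be applied to the per-class NO3 (or EC) values; a small p-value supports the claim that the two class distributions differ.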
Applying the eUD3.5 algorithm [27] to the unprocessed data identified three clusters and resulted in 541 rules that explain various overlaps in the data points of the three clusters. The seven outliers identified in our analysis were disregarded before applying eUD3.5. In comparison, the XAI framework proposed here provides five rules. Furthermore, the class-wise MD-plot for nitrate does not show different states of water bodies for eUD3.5 (SI H, Fig. 12, left), but one high state of electrical conductivity can be identified (SI H, Fig. 12, right).
Dasgupta et al. did not provide any source code in their work [28]. Therefore, the first part of the IMM algorithm [28], the k-means clustering [62], was performed on the unprocessed data. The seven outliers identified in our analysis were disregarded from the data. Measuring the feature importance for this clustering [80] indicates that it is based mainly on sol71, leading to the assumption that the second part of the IMM algorithm, the decision tree, would strongly favor this feature to explain the clusters. The contingency table comparing the first part of the IMM algorithm, k-means, with the DBS clustering is presented in SI B, Table 3, and does not show an overlap of clusters between projection-based clustering and k-means. Additionally, the class-wise MD-plots are presented in SI H, Fig. 13; they do not show different states of water bodies, meaning that IMM's explanation would not be relevant to the domain expert.
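To check whether a clustering leans on a single feature, as suspected here for the k-means step of IMM, one simple diagnostic is the per-feature variance ratio: the fraction of each feature's variance explained by the cluster labels. This is only one of several possible importance measures and not necessarily the one used in [80]; the Python sketch below illustrates the idea.

```python
def feature_importance(X, labels):
    """Rank features by the fraction of their variance explained by the
    cluster labels (between-cluster over total variance, per feature)."""
    n, p = len(X), len(X[0])
    clusters = set(labels)
    imp = []
    for f in range(p):
        col = [row[f] for row in X]
        gm = sum(col) / n  # grand mean of feature f
        total = sum((v - gm) ** 2 for v in col)
        between = 0.0
        for c in clusters:
            vals = [col[i] for i in range(n) if labels[i] == c]
            cm = sum(vals) / len(vals)  # cluster mean
            between += len(vals) * (cm - gm) ** 2
        imp.append(between / total if total > 0 else 0.0)
    return imp
```

A clustering driven almost entirely by one feature would show an importance near 1 for that feature and near 0 for the rest, which is the pattern that would make a subsequent decision tree collapse onto a single variable.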

Interpreting Explanations
The relevant and meaningful explanations acquired by the XAI framework proposed in this work (Table 2) can be interpreted as follows: while water temperature governs the biological turnover of nitrogen compounds in the stream water, hydrological variables such as groundwater level determine how and whether terrestrial NO3 pools are connected to the stream system by activating flow pathways. Furthermore, the rainfall-runoff generation processes either concentrate or dilute the stream NO3 concentration, according to the difference in NO3 concentration between the stream and the "new water" added to the stream system.
In the search for days with similar behavior, days with normal and high NO3 were identified. On 321 out of 343 days, the NO3 concentrations were normal (in the average range of [1, 3.5] mg/L). On such days, the EC concentrations were either high (in the average range of [0.034, 0.055] mS/m) or intermediate to low (in the average range of [0.25, 0.045] mS/m). Normal NO3 and higher EC occurred on dry days with increased stream water temperature and higher groundwater levels. From a data-driven perspective, these days were highly similar to one another (cf. cluster 1 in Fig. 4 and Fig. 5). The explanation for normal NO3 with low to normal EC concentrations is more complex and characterized by a duality: such days likely had an intermediate stream water temperature (6.1 °C < Wt18 < 12.5 °C) with either dry conditions (average rain < 0.15 mm) and low groundwater levels (< 1.28 m) or rainy conditions with high groundwater levels (see SI E). In contrast, high NO3 concentrations (in the average range of [3, 5.5] mg/L) and very low EC concentrations (in the average range of [0.025, 0.028] mS/m) occurred only if the stream water temperature was low on dry days. In particular, stream water temperature influences the activities of living organisms. The groundwater level (or head, in m) is the primary factor driving discharge in the Schwingbach catchment, while rainfall intensity triggers discharge and affects the leaching of nutrients [39].

Discussion
Selecting a suitable distance measure enabled the clustering of the above-described dataset. Cluster analysis is assumed to be valid if the intracluster distances are smaller (days more similar to each other) than the intercluster distances. As a parameter-free clustering algorithm, DBS was chosen. It enables the evaluation of the clustering with a topographic map in addition to the conventional heatmap. Projection-based clustering [36] with the projection of the Databionic swarm [35] is a flexible and robust clustering framework that has the ability to separate complex distance-based structures. The existence of data structures defining clusters and the number of clusters can be estimated prior to the clustering by the visualization of the topographic map. Such structures were identified by low intracluster distances and high intercluster distances in a Gaussian mixture model of the distance distribution and verified by the heatmap (Fig. 5) and the topographic map (Fig. 4). However, a simpler linear model (SI G, Fig. 11) or a spherical cluster structure were inappropriate (SI D, Fig. 9; SI B, Table 3). It follows that most conventional approaches for clustering listed in [34] and recent XAIs [27,28] would not be appropriate to detect meaningful and relevant data structures. Statistical testing indicates that the distributions of the variables of interest differ between classes (SI C and E). Further results imply that the explanations for the clustering were meaningful because brief rules were extracted by applying decision trees and using maximum likelihood plots. Overall, it can be deduced that this dataset contains linearly non-separable, distance-based, non-spherical cluster structures that are meaningful and relevant to the domain expert.
The major difference between comparable XAIs and the approach followed here is that the comparable approaches use unsupervised decision trees, whereas this work uses decision trees trained on a clustering that is performed independently. This has two main advantages.
First, Thrun and Ultsch showed that k-means can only grasp spherical cluster structures, which is a severe restriction [36]. Moreover, the Silhouette plot is only useful in the case of spherical or ellipsoidal structures [29,81], leading to the assumption that eUD3.5 will prefer hyperellipsoidal cluster structures and that IMM will only work on specific cluster structures because it is based on k-means. In contrast, the XAI presented here outperforms k-means because it can find a large variety of cluster structures through the exploitation of emergence and self-organization [35]. These authors state that "Additionally, [clustering algorithms] may return meaningless results in the absence of natural clusters [82][83][84] in contrast to DBS, which will clearly indicate that no cluster structures exist" [35]. Moreover, DBS is able to discover small classes [35], whereas eUD3.5 is not [27] (p. 52381).
Second, the compared approaches of eUD3.5 and IMM are limited to either using transformed data, which provides no meaningful explanations to the domain expert, or using non-transformed data. If unprocessed data with SI units are used, the results showed that the comparable XAIs would not provide meaningful explanations, because eUD3.5 presents many rules for each cluster that do not cover the clusters well enough, and because IMM would focus on one feature. Moreover, neither IMM nor eUD3.5 provides a clustering that is relevant to the domain expert, because the class-wise distributions of NO3 and EC do not show distinct states of water bodies (SI H). In sum, the explanations are relevant and meaningful in our work, whereas for eUD3.5 and IMM they are not.

Conclusion
No prior knowledge usable for a cluster analysis was available. Therefore, a machine-learned AI system relying on the Databionic swarm (DBS) method was used for the projection and clustering of environmental and water quality data. Explanations were provided in a three-step approach of data structure identification, followed by cluster analysis, for which explanations were provided using unprocessed data.
Explanations were provided as rules through a combination of supervised decision trees trained on the labels of the clustering of the preprocessed data. Contrary to comparable XAI approaches, the explanations of the XAI framework proposed here were meaningful and explained relevant content in a human-understandable way. The explanations suggest that the stream water quality data regarding NO3 and EC can be described by a combination of one variable related to biological processes (water temperature) and two variables related to hydrological processes (rain and groundwater level). Our XAI provided explicit ranges of values and could enable future prediction of stream water quality. One comparable XAI (eUD3.5) failed to extract relevant and meaningful rules. Another XAI (IMM) failed because it focuses on specific cluster structures and features, hence relying on prior knowledge about data structure.
The XAI framework presented here allows for the unbiased detection of meaningful data structures in high-dimensional datasets. Such datasets are becoming more and more available, not only in hydrochemistry but also in other environmental disciplines, due to technical innovation in monitoring equipment. Our explainable AI provides a unique possibility to search for unknown structures and can provide meaningful and relevant explanations because it does not rely on prior knowledge about data structure.

Figure 1 :
Figure 1: Framework of explainable AI for multivariate time series without implicit assumptions about data structures (data-driven). The framework has the three main steps of identification of data structures, cluster analysis, and providing explanations. Each step has several modules explained in the methods section.

Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 17 November 2020, doi:10.20944/preprints202011.0451.v1

The specific color scale is chosen to display various valleys, ridges, and basins: blue colors indicate small distances (sea level), green and brown colors indicate middle distances (low hills), and shades of white indicate vast distances (high mountains covered with snow and ice). Valleys and basins represent clusters, and the watersheds of hills and mountains represent the borders between clusters.
The clustering criterion is defined in terms of the data points in the two clusters, the cardinalities of the two sets, and the high-dimensional distance based on weighted shortest paths in the Delaunay graph. The clustering can be visualized by a dendrogram, allowing an alternative selection of the number of clusters: large changes in the fusion levels of the ultrametric portion of the distance indicate the best cut.

Figure 2 :
Figure 2: Distribution analysis of the distances using a Gaussian mixture model (GMM) with the R package 'AdaptGauss'. The black line indicates the estimated distribution of the distance feature df (defined in SI F). The distribution is estimated using PDE. The blue lines depict the single Gaussian distributions ("modes") of the model, and the red line the overall model, i.e., the superposition of the single Gaussians into a mixture. The Bayes boundary (magenta) separates the first mode from the second and third modes, leading to the hypothesis that the first mode consists of intracluster distances once clustering is performed. PDE = Pareto density estimation (Ultsch, 2005).

Figure 3 :
Figure 3: The quantile-quantile plot (QQ plot) visualizes a good match between the distance distribution and the GMM through a straight line. The plot is generated using the R package 'AdaptGauss'.

Figure 5:
Figure 5: The four clusters have distinctive distances, as shown by the heatmap. The black lines divide the distances between the data points belonging to each cluster. The outliers are summarized in cluster 4. There are small distances within each cluster and large distances between the clusters. The heatmap was generated with the R package 'DataVisualizations'.

Figures 6a and 6b:
Figure 6a: Globally optimal classification and regression tree (evtree) analysis visualizes a decision tree for the dataset using the labels of the three clusters identified by projection-based clustering. The error of class 1 is 15%, of class 2 is 6.4%, and of class 3 is 8.3%. Outliers are summarized in class 4. The rules are quite similar to those of Fig. 6b but have a higher error. The tree was generated using the R package 'evtree'.

Figure 7a :
Figure 7a: Class-wise mirrored-density plot (MD-plot) of the three explained classes and the outliers with regard to NO3. There are two classes of low to intermediate N concentrations and one class of high N concentrations. Classes are colored similarly to the clusters in Fig. 4. The MD-plot was generated using the R package 'DataVisualizations'.

Figure 7b :
Figure 7b: Class-wise mirrored-density plot (MD-plot) of the three explained classes with regard to electrical conductivity (EC). There is a class of high concentrations, a class of low to intermediate concentrations, and a class of low concentrations. Classes are colored similarly to the clusters in Fig. 4. The MD-plot was generated using the R package 'DataVisualizations'.

Figure 8 :
Figure 8: The distribution of variables after preprocessing is visualized using mirrored-density plots (MD-plots) of the hydrology dataset. The magenta overlay marks features that are statistically not skewed or multimodal. The MD-plot was generated using the R package 'DataVisualizations'.

Figure 11 :
Figure 11: Toroidal topographic map of a projection pursuit approach by [75] of the hydrology dataset. The linear projection does not reveal a linear structure, even if the generalized U-matrix is used to visualize high-dimensional distances of the two-dimensional projection [53]. This visualization was generated using the R package 'GeneralizedUmatrix' and the projection and clustering using 'FCPS'.

Figure 12:
Figure 12: Class-wise mirrored-density plot (MD-plot) of the three classes defined by the 541 rules of eUD3.5 with regard to nitrate NO3 (left) and electrical conductivity C (right). The seven outliers identified by DBS were disregarded beforehand. In the case of nitrate, no clear differences between the distributions of the classes are visible. There is one high to intermediate class of C concentrations and classes of low to intermediate C concentrations. The MD-plot was generated using the R package 'DataVisualizations'.

In Table 1, abbreviations and SI units of all variables are provided. The data was published earlier by Aubert et al. [6]. However, Aubert et al. used a high-frequency temporal analysis. In comparison, this work focuses on the average daily measures of each variable, resulting in a low-frequency analysis. Data gaps due to technical problems and data quality control reduced the available data coverage to two growing seasons (05 March 2013 to 24 September 2013, n = 15,475 measurements; 27 April 2014 to 23 October 2014, n = 16,721 measurements) of data points for 14 different variables.

Table 1 :
Measured environmental variables with abbreviations and units. The probability density distributions of the transformed dataset are visualized in the supplementary section.