Predicting Bulk Average Velocity with Rigid Vegetation in Open Channels Using Tree-Based Machine Learning: A Novel Approach Using Explainable Artificial Intelligence

Predicting the bulk-average velocity (UB) in open channels with rigid vegetation is complicated due to the non-linear nature of the parameters. Despite their higher accuracy, existing regression models fail to highlight the feature importance or causality of the respective predictions. Therefore, we propose a method to predict UB and the friction factor in the surface layer (fS) using tree-based machine learning (ML) models (decision tree, extra tree, and XGBoost). Further, Shapley Additive exPlanation (SHAP) was used to interpret the ML predictions. The comparison emphasized that the XGBoost model is superior in predicting UB (R = 0.984) and fS (R = 0.92) relative to the existing regression models. SHAP revealed the underlying reasoning behind predictions, the dependence of predictions, and feature importance. Interestingly, SHAP adheres to what is generally observed in complex flow behavior, thus, improving trust in predictions.


Introduction
Flow through vegetation is often observed in rivers and channels, and the condition is particularly common during floods. The interaction between flow and vegetation produces a complicated flow field. Therefore, understanding and analyzing the flow situation is highly important for various engineering and management aspects of these rivers and channels. The literature showcases many studies related to understanding the flow situation across these water bodies. Huai et al. [1], Nikora et al. [2], and Tang et al. [3] attempted to examine the vertical velocity distribution of a flow through vegetation, and they established mathematical models to express the velocity variation. Nikora et al. [2] specified five layers that can be observed in a complex flow regime, namely, bed-boundary, uniform, mixing, logarithmic, and wake (refer to Figure 1). The first layer is the closest to the channel bed. It is generally thin, and a rapid increase in the longitudinal velocity can be observed with height above the bed. The second layer, called uniform, has a state of equilibrium between sliding forces and drag forces. Subsequently, a complicated layer (mixing) can be identified, in which the chaotic nature of the flow makes the velocity distribution difficult to predict. By employing the fundamentals of river engineering, flood discharge can be evaluated solely in terms of the bulk mean velocity (U B ). However, as a result of the complicated flow field, predicting U B is extremely difficult under submerged vegetation conditions. Such vegetation remarkably alters the hydrodynamics of the flow [4]. Hence, the research community has explored the applicability of numerical expressions to predict U B ; for example, various single-layer and multi-layer approaches have been introduced [4].
Cheng [5] used the Darcy-Weisbach formula to derive a single-layer model. They included provisions to consider the hydraulic radius and vegetation obstructions. In addition, a relationship was constructed between the Darcy-Weisbach coefficient, energy slope, vegetation density, and submergence. Tinoco et al. [6] employed genetic programming (GP) to search for an acceptable mathematical expression in which the Froude number was set as the target parameter; a Chezy-like formula was found as the final expression. In addition, Gualtieri et al. [7] examined distinct conventional equations in flow with vegetation and high submergence. They reported that Keegan's equation is one of the best-performing models. Generally, these single-layer models are simple in that the effects of vegetation-induced drag and roughness-induced resistance are ignored. Furthermore, the flow resistance equations developed by Cheng [5], Gualtieri et al. [7], and Tinoco et al. [6] were established in flumes without vegetation; only the friction resistance arising from bed roughness was considered in those equations. Nevertheless, in flow through vegetation, drag is the dominant source of resistance rather than bed roughness [8], and bed-induced roughness is markedly different from vegetation-induced roughness. Therefore, the direct application of traditional equations deserves careful attention.
In contrast, two-layer models separate the flow into two layers: a vegetation layer and a surface layer above it. The top of the vegetation layer forms the lower boundary of the surface layer. Subsequently, the velocity in each layer (U V (vegetation layer) and U S (surface layer)) is estimated, and U B is derived as a weighted combination of the two. The average velocity in the vegetation layer is usually calculated using a force balance between the sliding force and the vegetation-induced drag force [8][9][10][11][12]. The velocity in the surface layer is estimated using a logarithmic assumption [12], equations similar to the Darcy-Weisbach formula [10], or equivalent considerations [8]. In addition, Kolmogorov's theory of turbulence [13], genetic programming [9], and the representative roughness height [5,10] were also used in the literature.
As these expressions highlight, two major parameters that affect flow resistance are submergence and non-dimensional vegetation density. Huthoff et al. [8], Augustijn et al. [14], Nepf [15], and Pasquino et al. [16] reported that the shear layer resembles a boundary layer as the submergence ratio increases (H/H V > 5, where H is the total flow height and H V is the height of vegetation). Belcher et al. [17] have described three distinct flow regimes based on the density of vegetation (λ << 0.1, λ = 0.1, and λ > 0.23, where λ is the density of vegetation). For λ << 0.1 (sparse vegetation), the shear layer again resembles a boundary layer, whereas a shear layer resembling a free shear layer with an inflection point can be observed for transitional (λ = 0.1) and dense (λ > 0.23) vegetation. Recently, Shi et al. [4] combined the two-layer approach and GP to develop analytical models to predict U B . In addition, they proposed an equation to predict the friction coefficient (f S ) of the surface layer.
However, these equations and methods failed to achieve a mean relative error (MRE) within 10% [4]. Moreover, developing such integrated formulae is less practical, and numerical modeling generally requires significant time and effort. The previous equations demonstrate that the relationship is extremely non-linear. As GP-based studies suggest, ML can be applied to explore such complex relationships because it is fast and requires less effort, and unique model architectures are available that can approximate complex relationships. Many researchers have used ML in the hydrology field [18][19][20][21]. Generally, ordinary and ensemble learning methods have been frequently used to examine complex relationships; however, it has been reported that ensemble models are highly efficient in hydrological modeling [22][23][24].
For example, Cannon and Whitefield [25] examined changes in streamflow as a result of climate change using multiple linear regression (MLR) and ensemble neural networks (ENN). Accordingly, the ENN model performed better in contrast to the MLR method (4% improvement in R 2 ). Diks and Vrugt [26] used model averaging ensemble methods to forecast streamflow in the Leaf River watershed, Mississippi. They proposed Granger-Ramanathan averaging (GRA) as the superior model averaging method in their study. Later, Li et al. [27] reported that ensemble methods consisting of bagged-MLR and bagged-support vector machines (SVM) outperformed individual ML models. Tiwari and Chatterjee [28] attempted to predict daily river discharge using bootstrap and wavelet artificial neural networks (ANN). Similarly, the combined model showcased its superiority in predicting streamflow with respect to the individual models. Erdal and Karakurt [29] employed tree-based learners (classification and regression trees (CART)) to build ensemble models (bagged regression tree (BRT), stochastic gradient-boosting regression trees (GBRT)). The study showed that BRT and GBRT are better than the CART and SVM models (17% improvement in RMSE indices). Kim et al. [30] proposed a method to estimate discharge using satellite altimetry data using ensemble regression ML. They combined conventional rating curves with the ensemble method, and it was effectively used to predict discharge in the Congo River.
Alternatively, ML is popular in predicting extreme events such as droughts [31,32] and flood events. For instance, several studies focused on predicting floods and geospatial mapping of flood susceptibility with the aid of ML [33][34][35]. Shu and Burn [36] evaluated the flood index using ANNs. They concluded that ensemble ANNs were significantly (10% improvement in relative squared error) reliable compared to individual ANNs. Araghinejad et al. [37] reported that a 50% improvement in precision was obtained for ensemble ANNs in predicting floods. Lee et al. [38] developed boosted tree models and random forests for flood mapping. The random forest model provided moderately better results compared to boosted tree models. Recently, Arabameri et al. [33] claimed that ensemble methods provide excellent accuracy in flash flood susceptibility mapping.
Unlike surface hydrology, hydrogeology has to deal with a shortage of data. These processes show a highly non-linear nature; therefore, accurate predictions are greatly important. Singh et al. [39] developed tree-based models to reproduce groundwater hydrochemistry in the north Indian region. Again, ensemble models were more effective than single-tree models. Ref. [40] evaluated the performance of wavelet-based ML models (extreme learning machine (ELM), the group method of data handling (GMDH), and wavelet ANN) to predict groundwater levels. Similarly, several attempts reported the superior performance of ensemble models in hydrogeology [41,42].
However, none of these studies highlighted the human comprehensibility of ML predictions. Regardless of their higher accuracies, the predictions are unexplainable because the models act as black boxes. Although hyperparameters are tuned during model training, the end-user is not aware of the inner-working methodology of the ML model. Further, the end-user does not know which parameters are significant for a particular prediction. Such drawbacks weaken the user's confidence in ML-based predictions and prevent the implementation of ML models in real-life hydrological applications.
Explainable artificial intelligence (XAI) intends to eliminate the previously mentioned drawbacks of ML. XAI helps to identify important parameters by revealing the inner-working principle of the ML model. Therefore, it provides the end-user with a better understanding for decision making. XAI is becoming popular in many fields (e.g., data science, business, engineering) as a result of its human-comprehensible explanations [43]. Accordingly, XAI turns a black box model into a glass box model, thereby elucidating the working principle and causality behind predictions [44,45]. A few studies have used explainable/interpretable ML to predict evapotranspiration and estuarine water quality [46,47].
To the best of the authors' knowledge, no related studies have been conducted to predict the bulk average flow (with vegetation) using explainable ML. The objective of this study is to investigate the performance of interpretable ML in improving bulk average flow predictions relative to conventional regression models. Moreover, as a core part of the study, the same approach is used to predict the friction coefficient of the surface layer (f S ). Therefore, the present study is imperative and novel as it: (a) engages tree-based ML models in the prediction of U B and f S , and (b) interprets the inner workings of the ML models to improve the end-users' confidence. The interpretable ML distinguishes the influencing parameters and provides instance-level and global explanations of the model. This is significant because it cross-validates ML predictions using experimental data. Overall, the study emphasizes that XAI does not essentially require sacrificing precision or complexity but rather supports the model's predictions by providing human-understandable interpretations.
Since the hydrology research community is new to XAI, the authors start by introducing XAI and subsequently, in Section 2, describe the specific interpretation model we used: SHAP (Shapley Additive exPlanations). Section 3 describes the ML models that we employed for the study. Section 4 provides the data description and working methodology, and Section 5 consists of a performance evaluation of the ML models. The novel ML interpretations are provided in Section 6. Section 7 concludes the paper, and the limitations and future work of this study are presented in Section 8.

Explainable Artificial Intelligence (XAI)
As previously highlighted, ML-based predictions require transparency to advance the end-users' confidence [48,49]. According to Lundberg and Lee [50], the best explanation for a model is the model itself. For example, models such as decision trees with lower tree depths are self-explanatory. As the tree grows into deep layers, the model and the explanation become moderately complex. For complex models whose inner workings are explicitly unknown, post-hoc explanations are recommended. Such explanations strongly contribute to advancing decision making and provide the underlying reasoning behind predictions. Figure 2 shows the summarized classification of explanation methods.
Data-driven and model-driven explanations are the main categories of local interpretation methods [51]. Model-driven methods investigate the inner components of ML models and do not need a global interpretation of the model's inner-working methodology. The explanation depends on the method category used to interpret how the model performs a given task. These methods are computationally inexpensive and convenient to implement. Moreover, they are subdivided into class activation maps [52][53][54], gradient-based interpretation [55,56], and correlation-score interpretation [57][58][59][60].
Data-driven explanations depend on the inputs for the interpretation process; they do not require understanding the working principle inside the ML model. Instead, they scrutinize the effect of deviations in the input data on the ML model. There are three subsections under data-driven interpretations, namely, concept-based interpretations [61,62], adversarial-based interpretations [63,64], and perturbation-based interpretations [56,65,66]. The authors suggest perturbation-based interpretations for the present study. Perturbation works by masking a segment of the input data of the ML model; masking separate regions provides a set of disturbances. Afterward, the disturbed set is fed to the model to obtain a new set of predictions, and the original predictions are compared with the predictions obtained from the disturbed sample. Accordingly, the significance of the different segments of the input data is obtained. Within each perturbation method, unique strategies and explanation rules are observed. For example, perturbation methods include several models such as CXplain, LIME [67], RISE [68], and SHAP [50]. The authors note that SHAP and LIME are the most frequently used in ML-based studies. These two methods differ in how they calculate weights: LIME creates dummy instances and weighs the instances based on their similarity, whereas SHAP uses Shapley values to estimate the weight of each sampling instance. As a result of the dummy instances, Moradi and Samwald [69] argued that LIME does not provide the actual feature value but rather considers the neighborhood of a particular instance. SHAP provides a unified measure of feature importance compared to LIME. Therefore, we recommend SHAP explanations to elucidate the tree-based models and their predictions.

SHAP (Shapley Additive Explanations)
Lundberg and Lee [50] suggested using SHAP to elucidate ML predictions based on game theory: the inputs are referred to as players, while the prediction becomes the payout. SHAP determines the contribution of each player to the game. Lundberg and Lee [50] introduced several versions of SHAP (e.g., DeepSHAP, KernelSHAP, LinearSHAP, and TreeSHAP) for specific ML model categories. TreeSHAP is used in the present study to explain the ML predictions. It uses a linear explanatory model and Shapley values (Equation (1)) to estimate the initial prediction model.
g(z′) = φ 0 + Σ_{j=1}^{p} φ j z′ j (1)

where p and φ j ∈ R denote the maximum size of the coalition and the feature attribution, respectively. Lundberg and Lee [50] provided Equations (2) and (3) to calculate the feature attribution:

φ j = Σ_{S ⊆ {x 1 , …, x p }\{x j }} [|S|! (p − |S| − 1)! / p!] (g x (S ∪ {x j }) − g x (S)) (2)

g x (S) = E[g(x)|x S ] (3)

In Equation (2), S represents a subset of the features (input), and x is the vector of feature values of the instance to be interpreted. Thus, the Shapley value is denoted through a value function (g x ). Here, p symbolizes the number of features; g x (S) is the prediction obtained from the features in S; and E[g(x)|x S ] represents the expected value of the function on subset S.
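To make the weighting in Equation (2) concrete, the Shapley value can be computed exactly for a small toy model. The sketch below is purely illustrative: the three-feature model, the instance, and the zero baseline are invented for the example and are not part of the study's data.

```python
from itertools import combinations
from math import factorial

def shapley(value_fn, p):
    """Exact Shapley values via Equation (2): for each feature j, sum
    |S|!(p-|S|-1)!/p! * (g_x(S with j) - g_x(S)) over all subsets S."""
    phi = []
    for j in range(p):
        others = [k for k in range(p) if k != j]
        total = 0.0
        for r in range(len(others) + 1):
            for S in combinations(others, r):
                w = factorial(len(S)) * factorial(p - len(S) - 1) / factorial(p)
                total += w * (value_fn(set(S) | {j}) - value_fn(set(S)))
        phi.append(total)
    return phi

# Toy model: f(x) = 3*x1 + 2*x2*x3, instance x = (1, 1, 1), baseline 0.
x = (1.0, 1.0, 1.0)
def g_x(S):
    """Value function g_x(S): features outside S are set to the baseline 0."""
    z = [x[i] if i in S else 0.0 for i in range(3)]
    return 3 * z[0] + 2 * z[1] * z[2]

phi = shapley(g_x, 3)
# The additive term 3*x1 is attributed entirely to feature 1, while the
# interaction 2*x2*x3 is split equally between features 2 and 3.
```

The efficiency property of Shapley values guarantees that the attributions sum to the prediction minus the baseline prediction, which is what makes SHAP a self-consistent decomposition of each individual prediction.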

Data Description
The data set was obtained from a series of studies conducted by [70][71][72][73][74][75][76][77]. Even though 315 instances were available, 27 instances are theoretically inexplicable (U v (velocity in the vegetation layer) > U s (velocity in the surface layer)). Therefore, the remaining 288 instances were employed for ML [4]. Descriptive statistics are summarized in Table 1. The data set is simultaneously used to: (a) predict f S (friction coefficient in the surface layer) and (b) predict U B (bulk average velocity). From the experimental data, Equation (4) was used to calculate U B ; all parameters are presented in Table 1. Equation (5) was used to calculate f S .
where Q is the measured flow (m³/s), H S is the height of the surface layer (m), H V is the height of the vegetation layer (m), U B is the bulk average velocity (m/s), B is the channel width (m), and λ is the vegetation density. Figure 3 shows the pairwise plot of each variable. Accordingly, Q and H S have a moderate correlation with U B . The remaining parameters show an explicitly non-linear behavior with U B . Further, none of the parameters show a good correlation with f S . Therefore, simple models, such as linear regression, are inadequate for building a relationship, and the present study employed tree-based models to predict U B and f S .
However, the previously introduced equations (to predict U B ) do not include Q and, in most cases, d. In addition, the term H can be linearly expressed as H S + H V . Therefore, we neglect the parameters Q, d, and H for the ML model, and U B is supposed to be a function of S, λ, B, H V , N, and H S . Equations (8)–(14) are several existing regression models developed to predict U B . Shi et al. [4] assumed that f S = g(S, λ, H/H V , aH V ), where a = 4λ/(πd). However, the solution they obtained only consists of λ (refer to Equation (15)).
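The resulting feature selection can be sketched as follows. The data here are random placeholders standing in for the experimental set: the column names follow the notation above, but the value ranges and the synthetic numbers are assumptions, not the study's data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 288  # number of instances retained in the study

# Hypothetical stand-in for the experimental data set (Table 1);
# ranges are illustrative placeholders only.
df = pd.DataFrame({
    "S": rng.uniform(1e-4, 1e-2, n),       # energy slope
    "lambda": rng.uniform(0.01, 0.3, n),   # vegetation density
    "B": rng.uniform(0.3, 1.2, n),         # channel width (m)
    "H_V": rng.uniform(0.05, 0.25, n),     # vegetation layer height (m)
    "N": rng.integers(100, 2000, n),       # vegetation element count
    "H_S": rng.uniform(0.05, 0.4, n),      # surface layer height (m)
    "Q": rng.uniform(0.005, 0.1, n),       # discharge (m^3/s)
    "d": rng.uniform(0.003, 0.01, n),      # stem diameter (m)
})
df["H"] = df["H_S"] + df["H_V"]  # H is linearly dependent on H_S and H_V

# Drop Q, d, and H as described; U_B = f(S, lambda, B, H_V, N, H_S)
features = ["S", "lambda", "B", "H_V", "N", "H_S"]
X = df[features].to_numpy()
```

Dropping H avoids feeding the models a feature that is an exact linear combination of two retained features, which would add no information to the trees.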
where f S is the friction coefficient in the surface layer, g is the gravitational acceleration (ms −2 ), S is the energy slope, and U S is the velocity in the surface layer (m/s).
U V is the velocity in the vegetation layer (m/s).
ν is the kinematic viscosity of the fluid, and R refers to the hydraulic radius.
where r v represents the vegetation-related hydraulic radius, and r v * is the non-dimensional vegetation-related hydraulic radius.

f S = 0.102 + 3.73λ (15)

Machine Learning Models
The authors proposed a single ordinary method (decision tree) and two ensemble methods (extra tree and XGBoost) for this study. As highlighted in the introduction, ensemble methods result in higher efficiency than individual models. However, we intend to explain the models' results. For the ordinary models, we used intrinsic model explanations, whereas post-hoc explanations will be used for the ensemble methods. Moreover, the two ensemble methods used here are based on the decision tree.

Decision Tree Regressor
The decision tree can be introduced as the primary structure of tree-based ML algorithms. It serves either classification or regression applications. The working methodology of the decision tree is convenient to understand and interpret because it splits a complicated task into multiple simpler forms [78]. A regular decision tree structure is formed based on hierarchically arranged conditions from roots to leaves [79]. Ahmad et al. [80] provided an interesting conclusion regarding decision tree structure: it is transparent and, subsequently, can be used to generate new data sets through continual splitting. The training sequence of a decision tree model is based on recursive breakdown and multiple regression. This is initiated from the root node and continuously performed until the terminating criteria are met [81]. Each leaf node of an evolved structure can be theoretically approximated to a simple linear regression. Afterward, pruning is performed to reduce the complexity of the model and to improve the generalization.
The regression tree model attempts to distinguish data groups with the same predicted variable by examining the variables. Similar to classification, decisions are made about which variables to partition, the corresponding values of the partitioned variables, the number of partitions, and the decisions at the terminals. Generally, the sum of the square error (SSE) is used to produce the recursive splits (Equation (16)). For each partition, the response variable y is separated into two groups of data, R 1 and R 2 , and the tree examines a predictor variable x with respect to the split threshold:

SSE = Σ_{i∈R 1 } (y i − ȳ 1 )² + Σ_{i∈R 2 } (y i − ȳ 2 )² (16)

where ȳ 1 and ȳ 2 are the mean values of the response variables of each group. The split threshold for a predictor variable is chosen to minimize the SSE. Thus, the tree grows with recursive splits and split thresholds, similar to classification. The terminal node denotes the mean of the y values of the samples collected within that node. However, there can be instances where a complex function defines a terminal node.
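A one-variable split search following Equation (16) can be sketched as below; this is a simplified illustration of the splitting criterion, not the exact implementation used in the study.

```python
import numpy as np

def best_split(x, y):
    """Scan candidate thresholds on predictor x and pick the split that
    minimizes SSE = sum_R1 (y_i - ybar1)^2 + sum_R2 (y_i - ybar2)^2."""
    order = np.argsort(x)
    x_sorted, y_sorted = x[order], y[order]
    best = (None, np.inf)
    for k in range(1, len(x_sorted)):
        left, right = y_sorted[:k], y_sorted[k:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best[1]:
            # Midpoint between neighboring x values as the threshold
            best = (0.5 * (x_sorted[k - 1] + x_sorted[k]), sse)
    return best

# Step-shaped toy target: two flat groups with a jump at x = 5
x = np.arange(10, dtype=float)
y = np.where(x < 5, 1.0, 3.0)
threshold, sse = best_split(x, y)
# The SSE-minimizing threshold falls between the two flat groups (4.5),
# and each resulting group is perfectly homogeneous (SSE = 0).
```

Applied recursively to each resulting group, this search is exactly the growth procedure described above.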

Extra Tree Regressor
The extra tree is an ensemble tree-based approach that can be used for either classification or regression [49,82]. The algorithm creates an ensemble of unpruned trees following a top-down process. The extra tree method differs from other tree ensembles because it splits nodes using random cut points, and it uses the whole sample to grow each tree. As a result of the random cut points and ensemble averaging, the extra tree approach can reduce variance compared to the weaker randomization of other methods. In terms of computation, the complexity of the growing process is in the order of N log N with respect to the learning sample size (N). Geurts et al. [83] mentioned that extra trees are approximately 1.5-3 times larger than random forests in terms of leaves. However, the complexity of the extra tree is relatively small since leaves grow exponentially. Moreover, the extra tree is typically faster than tree bagging and random forests when computation time is considered.
The geometric assessment showed that the extra tree asymptotically creates piecewise continuous multi-linear functions. Therefore, the resulting models are smooth in contrast to other ensemble methods that optimize cut points. In essence, it leads to an improvement in accuracy in the regions of input space in which the target function is smooth. Geurts et al. [83] reported that the extra tree is less likely to overfit.
For regression, the extra tree uses relative variance reduction. For example, if K i and K j represent two subsets of cases from K corresponding to the outcomes of a split s, then the score can be expressed as follows (Equation (17)):

Score(s, K) = [var{z|K} − (|K i |/|K|) var{z|K i } − (|K j |/|K|) var{z|K j }] / var{z|K} (17)

where var{z|K} denotes the variance of the outcome z in sample K.
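Equation (17) can be checked with a minimal sketch (illustrative only): a split that perfectly separates two homogeneous groups scores 1, while an uninformative split scores close to 0.

```python
import numpy as np

def split_score(z, mask):
    """Relative variance reduction of Equation (17) for a split of sample K
    into K_i (mask True) and K_j (mask False)."""
    z_i, z_j = z[mask], z[~mask]
    var = z.var()
    return (var
            - len(z_i) / len(z) * z_i.var()
            - len(z_j) / len(z) * z_j.var()) / var

z = np.array([1.0, 1.0, 1.0, 5.0, 5.0, 5.0])
perfect = split_score(z, z < 3)  # separates the two homogeneous groups
useless = split_score(z, np.array([True, False, True, False, True, False]))
```

The score is bounded by 1 (all within-group variance removed) and approaches 0 when the split leaves the group variances unchanged, which is why it serves as a ranking criterion among the randomly drawn cut points.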

Extreme Gradient Boosting Regressor (XGBoost)
XGBoost is an implementation of gradient-boosted decision trees [84,85]. A decision tree is used as the base learner for the integration. In essence, it is an ensemble algorithm built on gradient descent iterations. Continuous splits are performed to grow the tree structures: each decision tree computes the feature and corresponding threshold along with the best branch effect. Ultimately, the predictions become more consistent. XGBoost can be used for either classification or regression problems, and it is popular among data scientists as a result of its superior execution speed. The workflow of XGBoost is as follows.
For example, let R be a data set with k features and m examples: R = {(x i , y i ) : i = 1, 2, …, m}, x i ∈ R^k , y i ∈ R. Accordingly, suppose ŷ i is a prediction of a model generated from the following sequence (Equation (18)):

ŷ i = Σ_{j=1}^{J} g j (x i ) (18)

where J denotes the number of trees, and g j represents the jth tree. To solve Equation (18), suitable functions should be found that minimize the regularized objective ζ(ϕ) in Equation (19):

ζ(ϕ) = Σ_i L(y i , ŷ i ) + Σ_j Ω(g j ) (19)

Here, L represents the loss function, which measures the difference between the actual (y i ) and predicted (ŷ i ) outputs. The second term measures the complexity of the model and avoids possible overfitting. The extended version of the complexity term Ω(g j ) is expressed in Equation (20):

Ω(g j ) = YT + 0.5 ϑ ‖w‖² (20)

where T denotes the number of leaf nodes, and w is the vector of leaf weights. Boosting is used to train the model by minimizing the objective function, with a new function added at each iteration. Here, Y denotes the difficulty of node segmentation (the penalty for adding a leaf), and ϑ is the L2 regularization coefficient. Since XGBoost is also a decision-tree-based model, multiple hyper-parameters, including subsample and maximum depth, are employed to reduce overfitting and further enhance the performance.
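As an illustration of the additive training in Equation (18), the following pure-NumPy sketch boosts regression stumps on squared loss with shrinkage. It is a deliberate simplification of XGBoost (it omits the regularization term of Equations (19) and (20) and the second-order approximation), intended only to show how each new tree g j is fitted to the residual of the current ensemble.

```python
import numpy as np

def fit_stump(x, residual):
    """One depth-1 regression tree: the base learner g_j in Equation (18)."""
    best = (None, None, None, np.inf)
    for t in np.unique(x)[:-1]:
        left, right = residual[x <= t], residual[x > t]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best[3]:
            best = (t, left.mean(), right.mean(), sse)
    return best[:3]

def boost(x, y, rounds=50, eta=0.3):
    """Additive training: each round fits a stump to the residual of the
    running prediction, i.e. gradient boosting with squared loss."""
    pred = np.zeros_like(y)
    for _ in range(rounds):
        t, lv, rv = fit_stump(x, y - pred)
        pred += eta * np.where(x <= t, lv, rv)  # shrinkage factor eta
    return pred

x = np.linspace(0.0, 1.0, 40)
y = np.sin(2 * np.pi * x)
pred = boost(x, y)
```

After a few dozen rounds the sum of shrunken stumps closely tracks the smooth target, which is the mechanism by which the full XGBoost model approximates the strongly non-linear U B and f S relationships.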

Performance Evaluation of Tree-Based Models
For model training, 70% (201 out of 288 instances) of the sample was employed, while the remainder was used for validation. R² was used as the training score. Figure 4 depicts a summary of the training process for each model. All models are accurate and deviate little from the observed values. However, the training process for f S is less consistent than that for U B . In addition, the highest training score (R²) was obtained from the XGBoost model for both f S and U B . Even the weakest model, the decision tree, obtains a 0.99 training score for U B . Accordingly, the ML models have separately learned the nonlinearity associated with each target. Hyper-parameters were simultaneously optimized using a grid search. Grid search methods create numerous models using different combinations of hyperparameters; the models are then evaluated to obtain the optimum hyperparameters based on prediction accuracies. The effect of each hyperparameter is separately illustrated in Appendix A.
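The 70/30 split and grid search described above can be sketched with scikit-learn as follows. The synthetic data, the target function, and the hyper-parameter grid are placeholders for illustration, not the study's actual data or grid.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
X = rng.uniform(size=(288, 6))  # 288 instances, 6 features, as in the study
y = X[:, 0] + np.sin(3 * X[:, 1]) + 0.05 * rng.normal(size=288)  # toy target

# 70/30 split (201 training instances), mirroring the paper's setup
X_tr, X_val, y_tr, y_val = train_test_split(X, y, train_size=0.7, random_state=0)

# Grid search builds one model per hyper-parameter combination and keeps
# the combination with the best cross-validated R^2 score
grid = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid={"max_depth": [3, 5, 8], "min_samples_leaf": [1, 3, 5]},
    scoring="r2",
    cv=5,
)
grid.fit(X_tr, y_tr)
val_r2 = grid.score(X_val, y_val)  # R^2 of the refitted best model on hold-out
```

Evaluating the selected model on the untouched 30% gives the validation scores reported in the next paragraphs, keeping the hyper-parameter choice independent of the hold-out data.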
In order to apply XAI, the validation accuracies of the predictions (87 instances) of the tree-based models were estimated by comparing the f S and U B values. In addition, the equations proposed by previous authors were used for comparison. Figure 5 shows the comparison of the friction coefficient (f S ) predictions. Both the extra tree and decision tree regressors obtained moderately lower accuracies compared to the training sequence. The authors emphasize that a large deviation between training and validation occurs as a result of overfitting; using a larger number of data samples would mitigate this issue. Accordingly, we suggest the XGBoost model as the superior model (R² = 0.84). The model introduced by Shi et al. [4] failed to surpass the accuracy of even the weakest tree model. Unlike the training sequence, the validation reveals distinct attributes of each model. For example, the equation suggested by Shi et al. [4] shows larger deviations for predictions closer to zero compared to the tree-based models. Both the extra tree and decision tree have predictions that exceed the 20% error margin. However, the XGBoost model, despite a considerably lower tree depth, more often provides predictions within the margin, thereby avoiding such inconsistencies. Figure 6 depicts the validation predictions (U B ) obtained from the different models. The first row consists of the ML predictions, whereas the second consists of those of the existing regression models. Among the models, the highest accuracy was obtained for the XGBoost (R² = 0.97) model, and the lowest was obtained for the decision tree model (R² = 0.78). Consistent with the conclusions of Shi et al. [4], the model proposed by Huthoff et al. [8] fits the validation set better, and the equation from Cheng [5] achieved an R² of 0.92. Compared to Huthoff's model, XGBoost shows fewer deviations in its predictions. For example, the region 0.3 < U B < 0.6 contains inconsistencies, as shown in Figure 6f.
However, these inconsistencies are smaller for the XGBoost model. The two equations proposed by Shi et al. [4] and Cheng [5] provide conservative values compared to Huthoff's equation.
All prediction accuracies were numerically evaluated using the equations provided in Appendix B (refer to Table 2). The obtained R values show how well the predictions fit the experimental observations. All ML predictions exceeded R = 0.8 for UB and R = 0.75 for fS, indicating a strong correlation between predictions and observations. Subsequently, the R² values indicate differing degrees of deviation of the ML predictions from the actual values. For example, the decision tree deviated more from the experimental data than the rest of the ML models. However, the XGBoost model outperformed all of the models, including the previous regression models. Shi's and Huthoff's equations reached good R² values; still, the R² value of XGBoost is higher than that of either model. This demonstrates the flexibility of the XGBoost algorithm in performing a task comparable to those complex regression models. The MAE and RMSE values obtained for XGBoost and Huthoff's equation are 0.025 and 0.04, and 0.038 and 0.06, respectively. The authors suggest fractional bias values between −0.3 and 0.3 for an acceptable model. Accordingly, the negative fractional bias values indicate that the ML models slightly underestimate their predictions. However, the extra tree and decision tree were not perfect for predicting fS. Still, all three ML models are superior to the equation proposed by Shi et al. [4]. For instance, Shi's equation obtained the highest MAE and RMSE values (0.069 and 0.11). Compared to the UB predictions, the fS predictions achieved moderately lower accuracy, except for the XGBoost predictions. XGBoost obtained an R² value of 0.85 and an MAE of 0.042 for the validation predictions. Moreover, it captured the variation better than the existing equation. Figures 7 and 8 show Taylor diagrams of the validation predictions for both tasks.
Accordingly, the XGBoost model is superior to the other models in both tasks, with a strong correlation. For UB predictions, however, Huthoff's model is comparable to, though slightly less accurate than, the XGBoost model.
In addition, Belcher et al. [17] and Nepf [15] introduced three flow regimes based on λ. For the current data set, λ << 0.1 is considered sparse vegetation, whereas λ ≥ 0.1 is considered transitional (and λ > 0.23 dense). We compare the XGBoost predictions and Huthoff's predictions across these flow regimes (refer to Figure 9), since the preceding analysis showed Huthoff's model to be the best of the existing regression models. XGBoost shows higher prediction accuracy (with lower deviations) under sparse vegetation conditions; in the same regime, deviations of up to 60% can be expected from Huthoff's regression model. From sparse vegetation to the transition regime, however, both models show comparable prediction accuracy for UB. As previously mentioned, ML models can overfit a training data set, resulting in higher prediction scores for the training sequence and lower scores for validation. Therefore, the authors examine this drawback using the R² scores of training and validation (refer to Figure 10). Accordingly, all three models are acceptable for predicting UB with rigid vegetation. However, the performance of the decision tree and extra tree is questionable for predicting fS: the gap between their validation and training scores is significant for both models. Therefore, considering these abrupt changes, the authors do not recommend these two models for further fS validation predictions. Notably, the validation scores of the tree-based models were still better than the score observed for Shi's equation. This highlights the distinct prediction behavior of each tree-based model, despite all being decision-tree-based algorithms.
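The regime thresholds above can be expressed as a small helper function (a sketch; the boundary values follow the text, with λ > 0.23 treated as dense):

```python
def flow_regime(lam: float) -> str:
    """Classify vegetation density (lambda) into the flow regimes of
    Belcher et al. [17] and Nepf [15], using the thresholds adopted
    for the current data set."""
    if lam < 0.1:
        return "sparse"        # lambda << 0.1
    if lam <= 0.23:
        return "transitional"  # 0.1 <= lambda <= 0.23
    return "dense"             # lambda > 0.23
```

Each validation instance can then be bucketed by regime before comparing the XGBoost and Huthoff predictions, as done in Figure 9.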

Intrinsic Model Explanation
For tree-based models, the evolved tree structures can be illustrated graphically. However, depending on model complexity, the inner workings are conveniently explainable only for simple models such as shallow decision trees. The present study developed separate models for UB and fS predictions. In both cases, the developed decision tree consists of eight layers (tree depth = 8). Regardless of the unique advanced methods used in each tree-based algorithm, the basic decision formation is similar. Therefore, this study presents the first three layers of the decision tree formed to predict UB (see Figure 11). We used the mean squared error (MSE) as the criterion for the recursive splits; therefore, the MSE gradually decreases at deeper layers. The tree identifies B as the parameter for the split at the root node. Accordingly, samples satisfying the criterion (B < 0.755) move to the left side, whereas the remaining samples move to the right. The term "value" represents the mean value of the dependent variable for the samples passing through a box. The tree continues splitting on the dominant features; for example, at the second level, the tree selects HS and S as the dominant features. At each box, the "IF-THEN" sequence is associated with the left arrow. The early splits separate samples with large variation, but the tree becomes complex with depth. This underscores the need for a post-hoc explanation (e.g., SHAP) of the inner workings of the ML model. Figure 12 shows the mean absolute SHAP value for the overall UB predictions. Accordingly, the energy slope (S) has the highest impact on UB (+0.09). Next, the effects of channel width (B) and surface height (HS) are dominant in the variation of UB. The number of stems (N), vegetation density (λ), and vegetation height (HV) have a moderately lower impact on UB.
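The recursive MSE splitting behind Figure 11 can be sketched as follows (assuming scikit-learn; the data and the feature names B, S, and HS are illustrative stand-ins, and the default split criterion of `DecisionTreeRegressor` is the squared error, i.e., the MSE described above):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 3))                      # stand-ins for B, S, HS
y = 2 * X[:, 0] + X[:, 1] + rng.normal(scale=0.05, size=200)

# max_depth=8 matches the tree depth used for the UB and fS models;
# the default criterion minimizes the squared error (MSE) at each split.
tree = DecisionTreeRegressor(max_depth=8, random_state=0).fit(X, y)

# Each printed node is an "IF feature <= threshold" rule; a leaf's
# "value" is the mean of the dependent variable for the samples
# reaching that node, exactly as in the boxes of Figure 11.
rules = export_text(tree, feature_names=["B", "S", "HS"], max_depth=2)
print(rules)
```

Printing only the first levels (here `max_depth=2`) mirrors the truncated view of the tree presented in Figure 11; deeper levels quickly become too complex to read, which is precisely the motivation for the post-hoc SHAP explanation.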
The same explanation can be resolved into individual instances to obtain a global explanation, as shown in Figure 13. Accordingly, the variation is markedly different from the values obtained in Figure 12. For the energy slope, SHAP identifies that lower energy slopes have a mostly negative (and occasionally positive) impact on the overall output, whereas higher energy slopes have only a positive impact on the predicted UB. The observed influence of channel width (B) is concentrated in the negative region. For example, lower B values produce a negative (low-magnitude) impact on the model's output, while a greater positive impact is observed as the channel width increases. Interestingly, SHAP captures the dominance of HS, whereby an increase in HS can result in a higher positive effect on UB.

Post-Hoc Explanation
However, the effects of N, λ, and HV act in the opposite direction to the previous parameters. For instance, UB decreases when the vegetation density increases, and a similar effect is observed when the height of vegetation increases. Thus far, the ML interpretation provides an overview of the predictions and their dependence.
In comparison to the equations proposed to predict UB, SHAP conveniently identifies the dominant parameters and ranks their influence. Previous models mapped predictions through complex combinations of input features, from which an explicit explanation is impossible. SHAP, however, provides explanations by mapping the primary parameters, making it convenient to obtain an overall explanation. Further, developing a stepwise regression model requires time and significant expertise, and it still overlooks the dependence and interactions between parameters. SHAP overcomes such drawbacks and provides the full picture in less time.
In addition, SHAP provides instance-based explanations with the feature importance value (SHAP value). An instance-based explanation is helpful to distinguish the effect of various parameters in particular instances. Figure 14 shows explanations of four selected instances. The instance-based explanations are notably different from the global explanation.
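The feature-importance values behind these explanations are Shapley values. As an illustration of the principle (a pure-Python sketch of the exact, brute-force definition, not the optimized TreeSHAP algorithm used by the SHAP library), each feature's Shapley value is its marginal contribution averaged over all feature coalitions:

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values for a single instance.

    predict  -- model as a function of a full feature vector
    x        -- the instance being explained
    baseline -- reference values used for "absent" features
    """
    n = len(x)

    def value(subset):
        # Features in `subset` take their instance values; the rest
        # are replaced by the baseline (reference) values.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):
            for S in combinations(others, k):
                # Shapley weight: |S|! (n - |S| - 1)! / n!
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += w * (value(set(S) | {i}) - value(set(S)))
        phi.append(total)
    return phi

# Toy additive model: each feature's Shapley value equals its own
# contribution relative to the baseline, and the values sum to the
# difference between the prediction and the base value.
f = lambda z: 2 * z[0] + 3 * z[1] + z[2]
phi = shapley_values(f, x=[1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0])
```

The efficiency property visible in the toy example (the Shapley values sum to the prediction minus the base value) is exactly what the instance-based plots in Figure 14 display as contributions stacked from the base value to the model output.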
For example, the energy slope, which was the dominant feature in the global interpretation, is not a key feature in every instance. In Figure 14a,b, channel width holds the greatest importance. Interestingly, as the height of vegetation (HV) increases from 0.45 to 1.5, UB decreases from 0.64 to 0.36. The negative impact of this increase in vegetation height contributes most of the shift from the base value (the average value observed during training) of UB. The effects of the remaining parameters (Figure 14b) change only slightly with respect to Figure 14a. In Figure 14c,d, the increase in UB (0.1 to 0.19) depends on several factors. For example, the energy slope changes from S = 0.00054 to S = 0.004, creating a significant positive impact on the output. Simultaneously, an increase in λ from 0.061 to 0.12 creates a slightly pronounced negative effect on UB. In both instances, the height of vegetation holds inferior significance. Figure 15 shows SHAP dependence plots. When a particular (first) feature is selected, the plot identifies the second feature that interacts the most with it; the y-axis represents the SHAP value of the selected feature. For example, the energy slope (Figure 15a) interacts mostly with the channel width (B). However, only the lower values of B interact with the higher values of S. The SHAP value increases drastically over 0 < S < 0.01 and then plateaus. Features N and HS also interact most with feature B, although their interactions differ notably from that observed in Figure 15a. The SHAP value of N decreases with its feature value, and the lower values of N interact more with the higher values of B (Figure 15c); moreover, the lower values of N mostly obtain SHAP values below 0. It is noteworthy that a fair correlation is observed between the SHAP value of HS and the feature value of HS. HS values in the range 0.5 to 2 interact with the higher values of B.
For λ, B, and HV, the most interactive variable is the energy slope (S). A weak negative correlation is observed between λ and its SHAP value. Except for λ, frequent interaction is noticed between the higher values of HV and S, as well as between B and S. Figure 16 shows the mean absolute SHAP value obtained for the fS predictions. The effect of the vegetation density is dominant. Next, the vegetation diameter and surface layer height obtained comparable SHAP values. The vegetation diameter was omitted from the UB prediction model, as its contribution during training was trivial; similarly, the effect of channel width was minor for the fS predictions. The energy slope, which had the greatest impact on the UB predictions, affects this output less. The lowest feature contribution was obtained for the number of stems.
All three (d, HS, and S) have a mixed impact on the model output (see Figure 17): both lower and higher feature values contribute positive and negative impacts on fS. Higher values of vegetation density can contribute up to a 0.15 SHAP value, whereas the lowest values reach a SHAP value of −0.1. Instances exist where lower energy slopes contribute a higher positive effect on the model output than the vegetation density does. In contrast, the effects of HV and λ are explicitly reversed for fS relative to UB; a higher vegetation height can increase fS. The same four instances explained previously were selected for the present explanation (Figure 18). Across three instances, the vegetation density shows the maximum impact regardless of its direction (negative/positive). In Figure 18a,b, an increase in vegetation height results in an increase in fS: the SHAP value changes from almost zero to +0.07, corresponding to an increase in fS from 0.09 to 0.17. Likewise, a slight decrease in surface height increases fS (Figure 18b). When λ increases from 0.061 to 0.12, the corresponding SHAP value decreases from 0.11 to 0.09. However, an increase in HS from 0.1 to 0.184 has a positive impact on fS. All four instances indicate an increase in the SHAP value (from negative to positive) when the number of stems per unit bed area (N) increases. The effect of the energy slope is almost comparable in magnitude across all four cases. Figure 19 shows that, for the same data set, two prediction models can exhibit different interactions. The vegetation density and energy slope show a moderate correlation with their corresponding SHAP values. However, their interactions with the second features (d and λ) show mixed variation. The values of HS mostly interact with the lower values of N (Figure 19c). Similar observations are noticed between d and HS, as well as between HV and S (Figure 19b,d).
Despite the weak linear correlation, a mixed interaction is observed between N and S (Figure 19e).
We highlighted that the SHAP explanations are vital to understanding the complex behavior of flow with vegetation. The various plots (e.g., the global explanation, instance-based explanations, feature dependence, feature interactions, and feature importance) provide different insights into the prediction model. In contrast, previous regression models fail to elucidate their predictions or the importance of each feature. Confidence in ML-based prediction models increases in the presence of human-comprehensible explanations and the causality of predictions. Therefore, we strongly believe that these explanations will appeal to domain experts and improve end-users' confidence. In addition, a person with minimal technical knowledge can understand the provided explanations through the basic parameters.

Conclusions
The following remarks are important findings from this study, which proposed employing XAI and ML to predict the bulk average velocity of open channels with rigid vegetation:
• Ordinary (decision tree) and ensemble tree-based models (extra tree and XGBoost) are accurate in predicting the bulk average velocity (UB). However, XGBoost showcased superior performance, even when compared to the existing regression models (R = 0.984).
Further, the XGBoost model accurately predicts the friction coefficient of the surface layer (fS), with an accuracy of R = 0.92. Compared to the existing regression models, XGBoost provides consistent predictions under sparse vegetation conditions (λ << 0.1). However, because of its complex tree structure, a post-hoc explanation was required to elucidate the XGBoost predictions.
• SHAP revealed the inner workings of the XGBoost model and the underlying reasoning behind the respective predictions. The explanations present the contribution of each feature both globally and for individual instances, identifying the dominant parameters. Unlike the existing complex regression models, SHAP provides the causality of predictions without sacrificing the accuracy or complexity of the ML models. Knowledge obtained through SHAP can be used to validate models against experimental data; for example, the SHAP explanations adhere to what is generally observed in complex flow with rigid vegetation. Therefore, we believe that this will improve end-users' and domain experts' trust in implementing ML in hydrology-related studies.

Limitation and Future Work
This study presented valuable insights into employing explainable artificial intelligence (XAI) with tree-based ML to interpret the rationale behind bulk average velocity predictions in an open channel with rigid vegetation. However, we highlight the limitations of the present study to guide future research in this area:

•
The proposed work focused on open channel flow with rigid vegetation. However, the results do not rule out applying the method to flexible vegetation. A separate study can be carried out using experimental data and explainable ML, providing a great opportunity to explain the underlying reasoning behind such complex applications. Further, the ability of XAI and ML can be explored in other hydrology-related applications.

•
We suggested tree-based ordinary and ensemble methods because their optimization is more convenient. Further, these models follow a deterministic and human-comprehensible process compared to a neural network. However, several researchers have already used ANN models for hydrology-related studies. Therefore, we suggest examining the performance of advanced ML architectures, such as artificial neural networks (ANN), deep neural networks, and generative adversarial networks (GAN), for the proposed work. These studies can be combined with XAI to obtain the inner workings of the models and improve end-users' and domain experts' trust in these advanced ML models.

•

It is important to evaluate explanation models other than SHAP. For example, Moradi and Samweld [69] reported that the explanation process of LIME is markedly different from that of SHAP. Knowledge of different (post-hoc) explanation methods will assist in comparing a set of obtained predictions (feature importance).

Appendix B
The performance and associated uncertainty of ML-based predictions should be evaluated with respect to the original sample. For this purpose, we employed five indices, namely the coefficient of correlation (R), coefficient of determination (R²), mean absolute error (MAE), root mean square error (RMSE), and fractional bias (Equations (A1)-(A5)). Further, these indices assist in determining the best model in terms of overall performance.
Pi and Oi denote the predicted and experimental values, respectively; Ō and P̄ refer to the mean values of the experimental and predicted sets.
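These indices can be sketched in NumPy as follows (since Equations (A1)-(A5) are not reproduced in this excerpt, the fractional-bias form below, in which under-prediction yields a negative value as stated in the text, is an assumed common definition):

```python
import numpy as np

def evaluate(obs, pred):
    """Compute R, R^2, MAE, RMSE, and fractional bias for predictions
    `pred` against experimental observations `obs`."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    r = np.corrcoef(obs, pred)[0, 1]          # coefficient of correlation
    ss_res = np.sum((obs - pred) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot                # coefficient of determination
    mae = np.mean(np.abs(obs - pred))         # mean absolute error
    rmse = np.sqrt(np.mean((obs - pred) ** 2))
    # Fractional bias (assumed mean-based form): negative when the
    # model under-predicts on average, consistent with the discussion
    # of Table 2 above.
    fb = (pred.mean() - obs.mean()) / (0.5 * (pred.mean() + obs.mean()))
    return {"R": r, "R2": r2, "MAE": mae, "RMSE": rmse, "FB": fb}
```

Applying this function to each model's validation predictions reproduces the kind of comparison summarized in Table 2.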