Identification of Trends in Dam Monitoring Data Series Based on Machine Learning and Individual Conditional Expectation Curves

Abstract: Dams are complex systems that involve both the structure itself and its foundation. Rheological phenomena, expansive reactions, or alterations in the geotechnical parameters of the foundation, among others, result in non-reversible and cumulative modifications in the dam response, leading to trends in the monitoring data series. The accurate identification and definition of these trends to study their evolution are key aspects of dam safety. This manuscript proposes a methodology to identify trends in dam behavioural data series by identifying the influence of the time variable on the predictions provided by Machine Learning (ML) models. Initially, Individual Conditional Expectation (ICE) curves and Shapley Additive Explanations (SHAP) values are employed to extract temporal dependence, and the ICE curves are found to be more precise and efficient in terms of computational cost. The temporal dependencies found are fitted, using a Grey Wolf Optimiser (GWO) algorithm, to different functions characteristic of irreversible processes in dams. The function that provides the best fit is selected as the most plausible. The results obtained allow us to conclude that the proposed methodology is capable of obtaining estimates of the most common trends that affect movements in concrete dams with greater precision than the statistical models most commonly used to predict the behaviour of these types of variables. These results are promising for its general application to other types of dam monitoring data series, given the versatility demonstrated for the unsupervised identification of temporal dependencies.


Introduction and Background
Dams, like any other infrastructure, respond to external and internal loads. The anisotropy and evolution of the mechanical properties of the materials they are composed of and their foundations make the whole system complex and evolutionary.
The mechanical response of the dam to variations in external variables exhibits reversible behaviour as long as the materials remain in the elastic zone. However, plasticization of materials, rheological changes, expansive chemical reactions, or degradation due to external factors impose irreversible deformations. Identification and definition of these trends are key to dam safety.
Traditional tools for trend detection in time series, such as dam monitoring data, are based on univariate analysis. A large number of scientific references on this topic can be found applied to various fields [1][2][3].
These methods, being univariate, do not consider possible trends in loads, such as temperature increases or water level decreases due to global climate change. Therefore, the trends of the behavioural variables of a dam defined by such univariate models may be influenced by the trends that the causal variables may exhibit and, consequently, may not adequately reflect the irreversible variations in its structural response mode, leading to a misinterpretation of the structural health of the dam.
Water 2024, 16, 1239 2 of 25 Therefore, it is necessary to identify irreversible variations in the dam response (trends) using models that consider both behavioural variables and loads.
The most common models that relate causal and behavioural variables in current practice for dam safety monitoring are statistical models such as HST, HTT, variants thereof, or other types of multiple linear regression [4,5].
Numerical models are less widely used in monitoring instrumentation data because of the significant heterogeneity of materials, the complexity of the physical processes governing the dam's response to external actions, and their high computational cost.
The use of artificial intelligence (AI) models applied to dam safety has proliferated in recent times. Numerous scientific references related to the use of Machine Learning (ML), Deep Learning (DL), or hybrid models with other types of models such as statistical, time series, or physics-based numerical models can be found [6][7][8][9][10][11][12]. Machine Learning models have shown good performance in monitoring data prediction and are more accessible to interpretation than DL models. However, despite the fact that these types of models generally provide better accuracy in predictions than any of the former, the practical application of such models is currently far from commonplace. This may be due to their label as black-box models.
For the consideration of irreversible effects attributable solely to the time variable, statistical models incorporate a series of terms that depend exclusively on this variable. The response part corresponding to these terms is then interpreted as the trend in the behaviour of the target variable. If the actual trend does not correspond to the shape of the function incorporated in the regression through the terms dependent on the time variable, the irreversible behaviour obtained with the model will lead to a misinterpretation of the structural response, not only because it cannot adjust to the actual trend, but also because it affects the coefficients of the other terms during least squares fitting.
Behaviour models based on Machine Learning do not impose any predetermined form of relationship between variables on the model and can incorporate time as just another feature in the dataset with which they are trained. In this way, data-driven models based on Machine Learning can learn, during their training stage and from monitoring data, how the dam's response changes over time while considering the rest of the causal variables at the same time.
In these black-box models, the interpretation of the relationships between different variables and the target variable is not direct, as it is in statistical models, and requires specific methods. In the literature, multiple references can be found on the interpretation or explanation of these types of complex models.
Cortez et al. [13] proposed different Sensitivity Analysis (SA) methods mainly focused on determining the importance of the inputs. They suggested the use of a Variable Effect Characteristic (VEC) curve to visualise the average impact of a given input on the model response based on the mean or median values of the rest of the inputs.
Based on this work, Lin et al. [14] proposed a method for explaining an Optimized Sparse Bayesian Learning model. The explanation was focused on the relative importance of the input variables.
Lundberg et al. [15] presented the Shapley Additive Explanations (SHAP) method to interpret predictions in complex models. Based on additive feature attribution methods and game theory, SHAP values provide the expected change in model prediction when a particular feature is conditioned.
Shao et al. [16] used SHAP methods to evaluate the importance of factors involved in a multiple monitoring point (MMP) model oriented to predict settlements in a CFRD dam and to gain control of the settlement trends in this type of dam. [19], as model explanation techniques that allow understanding the logic behind complex ML models.
Individual Conditional Expectation curves (ICE), introduced by Goldstein (2015) [20], are a visual tool used in the analysis of regression models to understand the relationship between a specific predictor variable and the response variable. These curves show how the conditional expectation of the target variable changes for a specific record, as the explanatory variable under analysis varies while keeping the other variables constant. In other words, they provide a graphical representation of the relationship between an independent variable and the response at an individual level.
The use of ICE curves has become popular in the context of Machine Learning model interpretability [21,22] and enables the acquisition of a detailed perspective on how a particular variable influences the predictions of the model.
Apley (2020) [23] warns about the use of this type of analysis in data sets with strongly correlated variables due to the possibility of considering unlikely combinations of these variables in the calculation of partial dependence and proposes the alternative Accumulated Local Effects plots (ALE plots). Baucells (2021) [24] compared ICE and Partial Dependence Plots (PDP) with other alternatives, including ALE plots, preferring the former.
The use of ICE curves or PDP in dam safety to interpret the relations found by ML models between target variables and regressors, including time as a trend indicator, has been proposed by several authors [25][26][27][28]. These authors interpret the temporal dependence obtained by these methods as the part of the behaviour that is related only to time, that is, the existing trend. However, since the real trend is not known, the validity of this interpretation cannot be evaluated.
More complex and difficult to explain Deep Learning models can also be used for predicting the behaviour of dams. Several authors have included interpretability within the main objectives of their research [29]. In these cases, the explanation has also been aimed at determining the importance of the inputs in the predictions provided by the models.
Given the expressed need to understand and evaluate dam behaviour in terms of their safety, particularly those behaviours that do not constitute a response to the measured causal variables and result in irreversible behaviours, the objective of this research is to define an effective methodology capable of identifying and properly defining trends in dam behaviour. To achieve this, the following actions are necessary:

• Determine whether the partial dependency or conditional expectation obtained by methods of interpreting complex ML models is representative of the actual trend existing in dam behaviour.
• Evaluate whether these dependencies respond as expected from an engineering perspective, and if not, define a method to rationalise the results obtained from this perspective.
• Compare the results with those that would be obtained from a conventional statistical model, such as multiple linear regression commonly used today to monitor the evaluation of this type of dam [30,31].

Overview of the Methodology
Although data on dam behaviour showing irreversible movements are available, trends are not known a priori, making it impossible to verify the goodness of identification carried out by ML models.
Therefore, to evaluate the ability of ML models to identify the irreversible part of movement behaviour, it is necessary to have data sets that incorporate known trends. Given the complexity of the processes behind dam monitoring data, the problem is addressed in two phases of increasing complexity.
In the first phase, the generic capacity of ML methods to identify trends coupled with pure periodic synthetic series is evaluated. A set of synthetic cases is created by adding different trend laws to a pure periodic carrier series. Predictive models of these series are developed using various ML methodologies. The part of the model response corresponding to the time variable is extracted through a conditional expectation method and compared with the trend introduced in the series. The analysis of the results obtained provides information on the theoretical capacity of the ML models and interpretation methods to identify the trend.
In the second phase, the complexity inherent in dam monitoring data series is introduced. To obtain a series similar to that corresponding to real behaviour in which the real trend is known in advance, the same trend laws applied in the previous phase are introduced into a set of real data that do not show temporal dependence or irreversible behaviours. Since it is not known a priori if the data obtained from the monitoring of a real dam exhibit any trend, an HTT model is developed on a series of real movements in the pendulum of an arch dam. When time-dependent polynomial terms are removed from the HTT model, the resulting series is taken as a stationary behaviour series (without trend). Different trend laws are added to these series to proceed in the same way as with the pure synthetic cases.
The workflow for both phases of the study proceeds as follows:
1. Creation of synthetic stationary data series: pure oscillatory functions in Phase I and stationary series based on real dam behaviour data in Phase II;
2. Selection of trend shape functions and creation of trend series;
3. Generation of experiments: synthetic data series with a trend developed by combining the two above;
4. Development of prediction models for the series: neural networks (NN), Support Vector Machines (SVM), Boosted Regression Trees (BRT), and HTT models;
5. Extraction of the response part associated with the time variable using both ICE curves and SHAP values (methods for the interpretation of ML models) in three tests; measurement and comparison of results; selection of the interpretation method to use; and application to all the experiments of the phase;
6. Fitting functions to the different extracted trends through regression;
7. Determination of the error obtained on the real irreversible components.
The work concludes with an analysis and discussion of the results obtained.
In Figure 1, a general outline of the methodology followed is provided.

Methodology
This section outlines the approaches taken for the development of each of the tasks that make up the methodology followed.

Generation of Stationary Series
As indicated, to evaluate the trend identification capability of the different ML methods analysed, it is necessary to have series where the trend is known. To achieve this, the approach is to create stationary series to which the irreversible term of the predefined trend is added, thus creating time series with known trends.
Two types of synthetic stationary series were proposed:
• Pure synthetic stationary series: stationary oscillatory series with constant frequency.
• Behaviour-based stationary series: series based on real behaviours of stationary dams.

Pure Synthetic Stationary Series
To create a stationary oscillatory synthetic series, hereafter referred to as a pure synthetic series, a dataset of n real data points was taken from monitoring the radial movement of a direct pendulum in an arch dam. For this movement, a multiple linear regression model of the HTT type was trained, producing the coefficients of the corresponding polynomial, in which $z_i$ is the radial movement of the pendulum at register $i$; $a_k$ is the coefficient of term $k$ in the polynomial; $h_i$ is the water level at register $i$; $T_{air,i}$ is the air temperature at register $i$; $T_{air\_mmdd,i}$ is the moving average of the last dd registers of the air temperature at register $i$; and $T_i$ is the temporal index of register $i$.
Two sinusoidal oscillatory series $x_{i,j}$, $j \in [1, 2]$, $i \in [1 : n]$, were generated within the ranges covered by the water level ($j = 1$) and temperature ($j = 2$) variables of the dam dataset ($y_{i,j}$) using the following expression:

$$x_{i,j} = \sin\left(\frac{2\pi T_i}{365}\right)\left(y_{\max,j} - y_{\min,j}\right) + y_{med,j}$$

Using these series, a synthetic stationary oscillatory series was constructed employing the same regression polynomial trained on the monitoring data, where $\tilde{z}_i$ is the synthetic oscillatory variable at register $i$ and $x_{i,2\_mmdd}$ is the moving average at register $i$ of $x_{i,2}$ over the last dd registers. The statistics and appearance of the pure synthetic stationary series obtained are shown in Table 1 and Figure 2.
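As a minimal sketch of how the sinusoidal carrier series could be generated, the following Python function evaluates the expression above for a list of daily time indices. The function name, the two-year length and the numerical ranges are illustrative assumptions and are not taken from the dam dataset; the amplitude factor is applied literally as it appears in the expression.

```python
import math

def sinusoidal_series(time_index, y_min, y_max, y_med, period=365.0):
    """Return x_i = sin(2*pi*T_i/period) * (y_max - y_min) + y_med for each T_i."""
    amplitude = y_max - y_min
    return [math.sin(2.0 * math.pi * t / period) * amplitude + y_med
            for t in time_index]

# Hypothetical ranges for a water-level-like variable (not from the paper).
days = list(range(730))  # two years of daily records
level = sinusoidal_series(days, y_min=600.0, y_max=620.0, y_med=610.0)
```

The series repeats every 365 records, which is what makes it stationary before any trend is added.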

Behaviour-Based Stationary Series
The behaviour of monitored variables in dams, while exhibiting seasonal components and oscillatory characteristics, deviates from pure synthetic stationary series as it responds to a much more complex system. Therefore, the development of synthetic series that, while stationary, capture this behaviour is proposed in order to assess the ability of ML models to identify trends.
To obtain this kind of series, we started with the HTT model trained on a data set of n real data points from the radial movement of a direct pendulum in an arch dam, using the same expression as used to create the pure synthetic series. This HTT model incorporates two time-dependent terms, $\ldots + a_{11} T_i + a_{12} e^{T_i} + \ldots$ By setting the calibrated polynomial coefficients $a_{11}$ and $a_{12}$ to zero, the series of predicted movement without considering the temporal effect is obtained.
The statistics and appearance of the behaviour-based stationary series obtained are shown in Table 2 and Figure 3.
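The idea of switching off the temporal effect can be illustrated with a small sketch. It assumes a model that is linear in its coefficients, stored in a dictionary keyed by coefficient name; the reduced set of terms, the coefficient values and the helper names are hypothetical and serve only to show the mechanism.

```python
import math

def htt_predict(coeffs, terms):
    """Evaluate a linear-in-parameters model: sum over k of a_k * term_k."""
    return sum(coeffs[name] * terms[name] for name in coeffs)

def remove_time_terms(coeffs, time_terms=("a11", "a12")):
    """Return a copy of the coefficients with the time-dependent ones set to zero."""
    return {name: (0.0 if name in time_terms else value)
            for name, value in coeffs.items()}

def record_terms(level, time):
    """Terms of one record (hypothetical reduced polynomial, normalised time)."""
    return {"a0": 1.0, "a1": level, "a11": time, "a12": math.exp(time)}

# Hypothetical calibrated coefficients (not the values fitted in the paper).
coeffs = {"a0": 1.0, "a1": 0.5, "a11": 0.002, "a12": 0.01}
stationary = remove_time_terms(coeffs)
early = htt_predict(stationary, record_terms(level=0.3, time=0.1))
late = htt_predict(stationary, record_terms(level=0.3, time=0.9))
```

With the two coefficients set to zero, two records with the same load but different time indices give the same prediction, which is the property required of the base series.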

Generation of Trend Series
With the aim of evaluating the generality of the methodology, the use of different trend laws was proposed: linear, exponential, sigmoidal, and expansive reaction in concrete. These types of law are the most common in the movements recorded in dams. Depending on the process that governs these drifts and their degree of development, different types of trends can be encountered.
Geotechnical instability of the foundation can lead to linear creep phenomena over very long periods of time or to faster collapse phenomena in which the deformation rate increases exponentially over time. Other processes, such as expansive reactions in concrete, exhibit a slow onset in their development, which increases until a maximum expansion rate is reached and then gradually decreases until they practically stop. Thus, they respond to sigmoidal functions. Depending on the start point of the dam movement records, the data period, and the reactivity of the process, observations will generally cover a portion of these functions. Two sigmoidal formulations are used: a generic formulation in which the complete form of the function is developed, and another, proposed by Araujo (2005) [32] for the characterisation of residual movements observed on the crest of dams affected by internal reactions, which is introduced only for decreasing slope ranges.
Consequently, the corresponding series of n records was constructed according to the following formulations:
• Linear Trend: $t_{l,i} = m \cdot T_i + k_l$, where $T_i$ is the time variable, $m$ is the slope of the line, and $k_l$ is the y-intercept.
• Exponential Trend: $t_{e,i} = d \cdot p^{T_i} + k_e$, where $T_i$ is the time variable, $d$ is the amplification parameter, $p$ is the parameter associated with the growth rate of the trend, and $k_e$ is the y-intercept.
• Sigmoidal Trend $t_{s,i}$, in which $T_i$ is the time variable, $q$ is the parameter controlling the total increment in the y-axis, $k_1$ is the parameter defining the time position of the inflection point of the function, $k_2$ is the parameter defining the slope at the inflection point, and $k_s$ is the y-intercept at time zero.
• Expansive Reaction Trend [32], in which $T_i$ is the time variable, $k_a$ is the value of the y-intercept, $B$ is the value of the y-intercept corresponding to the maximum increment, $C$ is the time of the inflection point, and $p$ is the parameter influencing the shape of the curve.
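A short sketch of how such trend laws might be coded is given below. The linear and exponential functions follow the parameters described above. The logistic function is only an assumed generic sigmoid, included to illustrate the S-shaped behaviour; it is not the exact sigmoidal or expansive-reaction expression used in the study, and all parameter values in the usage lines are hypothetical.

```python
import math

def linear_trend(t, m, k_l):
    """Linear law: slope m and y-intercept k_l."""
    return m * t + k_l

def exponential_trend(t, d, p, k_e):
    """Exponential law: amplification d, growth base p, offset k_e."""
    return d * p ** t + k_e

def logistic_trend(t, q, k1, k2, k_s):
    """Assumed generic sigmoid: total increment q, inflection at t = k1,
    steepness k2 and vertical offset k_s (illustrative form only)."""
    return q / (1.0 + math.exp(-k2 * (t - k1))) + k_s

# Hypothetical trend series over ten years of daily records.
days = range(3650)
linear_series = [linear_trend(t, m=0.001, k_l=0.0) for t in days]
sigmoid_series = [logistic_trend(t, q=5.0, k1=1800.0, k2=0.003, k_s=0.0) for t in days]
```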

Generation of Experiments
Using the defined stationary series and trends, a battery of experiments is generated by their combination to evaluate the ability of ML models to identify trends in the series.
Thus, the experiments are divided into two main groups to be developed in Phase I and Phase II, based on the pure synthetic series and the stationary behaviour series, respectively.
Each type of base series is combined with the different trends, resulting in four series in each phase, to which the base series without trend of each type is added.In total, five series compose each of the two phases.
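The combination described above can be sketched as follows, assuming that the base series and each trend are stored as equally long lists; the function name, the dictionary keys and the toy values are illustrative.

```python
def build_experiments(base_series, trends):
    """Combine one stationary base series with each trend law.

    Returns a dictionary with the untrended base series plus one
    series per trend, i.e. five series when four trends are supplied.
    """
    experiments = {"no_trend": list(base_series)}
    for name, trend in trends.items():
        experiments[name] = [b + t for b, t in zip(base_series, trend)]
    return experiments

# Toy values only, to show the shape of the result.
base = [1.0, 2.0, 1.0, 0.0]
trends = {
    "linear": [0.0, 0.1, 0.2, 0.3],
    "exponential": [0.0, 0.1, 0.3, 0.7],
    "sigmoidal": [0.0, 0.2, 0.8, 1.0],
    "expansive": [0.0, 0.5, 0.8, 0.9],
}
phase_series = build_experiments(base, trends)
```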

Development of Prediction Models
For each series, four types of prediction model are trained: SVM, BRT, NN, and HTT, resulting in a total of 20 experiments per phase, 40 experiments in total.
To avoid overfitting of the models during calibration, the cross-validation (CV) method was used. With this method, the training data set is divided into a series of n folds and n models are trained, each using all but one of the folds for training and the remaining fold for validation.
The division of the folds can be performed randomly or by sequential blocks. In this case, the latter strategy was employed, which is more suitable for modelling the behaviour of dams because, in time series where various regressors are involved, random division may provide the model with information from neighbouring records that could influence its interpretation.
Thus, in this research, one fold was taken for each year of data in the series.
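A blocked split with one fold per year could look like the sketch below. It assumes the usual cross-validation convention in which the held-out year is used for validation and the remaining years for training; the function name and the example years are hypothetical.

```python
def yearly_folds(years):
    """Yield (train_idx, val_idx) pairs, holding out one calendar year per fold."""
    for held_out in sorted(set(years)):
        val_idx = [i for i, y in enumerate(years) if y == held_out]
        train_idx = [i for i, y in enumerate(years) if y != held_out]
        yield train_idx, val_idx

# Year of each record in a toy series (hypothetical).
record_years = [2001] * 3 + [2002] * 2 + [2003] * 4
folds = list(yearly_folds(record_years))
```

Because each validation block is a contiguous year, records adjacent in time do not end up on both sides of the split, except at the block boundaries.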
To select the hyperparameters of each model, a brute-force grid search was employed.
The parameters of the terms used for the HTT model were tuned by least squares.

Extraction of ML Trends
The hypothesis is that the ICE curve with "time" as the mobile explanatory variable, or the SHAP values corresponding to this variable, provides the part of the behaviour captured in the target variable that cannot be explained through the remaining explanatory variables, namely, the part that depends solely on time: the trend.
The determination of the trend based on the ICE curves relative to time in the different models is carried out according to the following process.

• For each record in the dataset, the target variable is predicted with the trained models, while keeping all explanatory variables constant, except for the time variable, which varies throughout its domain in the data set. This provides a set of as many curves as there are records in the dataset, reflecting the variation of the model's response when only the time variable changes, while keeping the values of the remaining variables fixed for each record.
• For each value of the time variable in the dataset, the average of the values obtained for that instant in the set of ICE curves is computed.
• The hypothetical trend curve is defined by these averaged values.
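The three steps above can be sketched in a few lines of Python. The sketch assumes that the trained model is available as a callable `predict` taking a dictionary of feature values; that interface, the feature names and the toy model are assumptions made for illustration and do not correspond to any particular ML library.

```python
def ice_time_curve(predict, records, time_key, time_grid):
    """Average of the ICE curves of `predict` with respect to the time feature."""
    curves = []
    for record in records:
        curve = []
        for t in time_grid:
            modified = dict(record)    # keep all other variables fixed
            modified[time_key] = t     # vary only the time variable
            curve.append(predict(modified))
        curves.append(curve)
    n = len(curves)
    return [sum(curve[j] for curve in curves) / n for j in range(len(time_grid))]

# Toy model with a known linear dependence on time (hypothetical).
def toy_model(r):
    return 2.0 * r["level"] + 0.1 * r["time"]

records = [{"level": 1.0, "time": 0.0}, {"level": 3.0, "time": 1.0}]
grid = [0.0, 1.0, 2.0]
mean_curve = ice_time_curve(toy_model, records, "time", grid)
```

For this toy model the averaged curve rises by 0.1 per unit of time, which is exactly the time dependence built into it.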
In this study, there are strong correlations between explanatory variables, since integrations of level or temperature are used over different time periods. However, the time variable does not show this strong correlation with any of them, and the combinations of values of strongly correlated variables are not altered in the process, making the method of ICE curves applicable. In any case, beyond any theoretical discussion, the capability of the procedure is demonstrated by the test campaign conducted.
SHAP values corresponding to the variable 'time' are obtained following this process:
• For each instance in the dataset, SHAP values are calculated by performing the following actions: (i) evaluating the prediction of the model when the 'time' variable is included (active) and when it is excluded (inactive); (ii) comparing these predictions to determine the impact of the feature; and (iii) considering all possible combinations of features to ensure fair attribution.
• The SHAP values for the variable 'time' represent the average marginal contribution of this feature across all possible combinations of features.
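For a small number of features, the marginal contributions described above can be enumerated exactly, as in the following sketch. It assumes that an "excluded" feature is represented by replacing its value with a baseline value, which is one common convention and not necessarily the one used in the study; the `predict` interface and the toy model are also hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_value(predict, record, baseline, feature):
    """Exact Shapley value of `feature` for one record.

    Features outside the active coalition take their baseline value
    (an assumption of this sketch).
    """
    others = [k for k in record if k != feature]
    n = len(record)

    def value(active):
        mixed = {k: (record[k] if k in active else baseline[k]) for k in record}
        return predict(mixed)

    total = 0.0
    for size in range(len(others) + 1):
        weight = factorial(size) * factorial(n - size - 1) / factorial(n)
        for subset in combinations(others, size):
            active = set(subset)
            total += weight * (value(active | {feature}) - value(active))
    return total

def toy_model(r):
    return 2.0 * r["level"] + 0.1 * r["time"]

phi_time = shapley_value(toy_model,
                         record={"level": 3.0, "time": 10.0},
                         baseline={"level": 1.0, "time": 0.0},
                         feature="time")
```

The number of coalitions grows exponentially with the number of features, which is consistent with the higher computational cost reported for SHAP values compared with ICE curves.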
Trends are determined by both methods on a sample of three experiments in each phase. The RMSE, MAE, and $R^2$ values of the trends obtained are compared with the real ones, and the method that provides the best result is selected.
The trends for the remaining experiments in each phase are determined using the selected method.

Adjustment of ML Trend Laws
The trends obtained with each model will be different from each other and will respond to the nature of each model's algorithm and the relationships found between the explanatory variables and the target variable. While SVMs, by adjusting hyperplanes, or the employed neural networks (perceptron) provide smoother curves, BRT, being based on decision trees, provides stepped lines.
Adjusting these trend lines to the expected, or more common, trends in dam behaviour described in Section 3.2 is carried out with the coefficients of each law as the variables to be adjusted: $m$ and $k_l$ for the linear trend; $d$, $p$, and $k_e$ for the exponential trend; $q$, $k_1$, $k_2$, and $k_s$ for the sigmoidal trend; and $k_a$, $B$, $C$, and $p$ for the expansive reaction trend. Adjustment of the trend lines to these laws is carried out using the Grey Wolf Optimiser (GWO) algorithm [33]. For each trend extracted from the ML models, the four laws are fitted, and the RMSE, MAE, and $R^2$ of each fit are measured. The law with the lowest errors and the best performance is selected as the most plausible. The correctness of the trend selection is analysed.
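The fitting step can be illustrated with a compact sketch of a Grey Wolf Optimiser applied to the linear law. It is not the implementation used in the study: the function names, the population size, the number of iterations, the parameter bounds and the normalised time axis are assumptions, and the three leading wolves are kept as a best-so-far archive, which is a simplification.

```python
import math
import random

def rmse(params, law, times, target):
    """Root mean squared error of law(t, *params) against the extracted trend."""
    squared = [(law(t, *params) - y) ** 2 for t, y in zip(times, target)]
    return math.sqrt(sum(squared) / len(squared))

def gwo_fit(law, times, target, bounds, n_wolves=30, n_iter=200, seed=0):
    """Fit the parameters of `law` by minimising the RMSE with a basic GWO."""
    rng = random.Random(seed)
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    leaders = []  # best three (score, position) pairs found so far
    for it in range(n_iter):
        scored = [(rmse(w, law, times, target), list(w)) for w in wolves]
        leaders = sorted(leaders + scored, key=lambda s: s[0])[:3]
        a = 2.0 * (1.0 - it / n_iter)  # decreases linearly from 2 to 0
        for w in wolves:
            for d, (lo, hi) in enumerate(bounds):
                pulls = []
                for _, leader in leaders:
                    big_a = 2.0 * a * rng.random() - a
                    big_c = 2.0 * rng.random()
                    pulls.append(leader[d] - big_a * abs(big_c * leader[d] - w[d]))
                # Move towards the average pull of the leaders, within bounds.
                w[d] = min(max(sum(pulls) / len(pulls), lo), hi)
    scored = [(rmse(w, law, times, target), list(w)) for w in wolves]
    best_score, best_params = min(leaders + scored, key=lambda s: s[0])
    return best_params, best_score

def linear_law(t, m, k_l):
    return m * t + k_l

# Hypothetical extracted trend: a straight line on a normalised time axis.
times = [i / 99.0 for i in range(100)]
extracted = [0.5 * t for t in times]
params, error = gwo_fit(linear_law, times, extracted,
                        bounds=[(-5.0, 5.0), (-5.0, 5.0)])
```

Repeating the call with each candidate law and keeping the one with the lowest error mirrors the selection of the most plausible trend described above.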

Determining Error in Real Trends
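Once the type of law that best fits each trend line extracted from the different ML models is selected, the fitted laws are compared with the real trends introduced in the base series, measuring RMSE, MAE, and $R^2$:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$

where $\hat{y}_i$ represents the predicted values, $\bar{y}$ represents the mean of the actual values, $y_i$ represents the $i$th actual value, and $n$ is the number of records in the sample.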
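For reference, the three error measures can be computed as in the sketch below, which uses their standard definitions; the function names and the sample values are illustrative.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(y - p) for y, p in zip(actual, predicted)) / len(actual)

def r2(actual, predicted):
    """Coefficient of determination."""
    mean = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_tot = sum((y - mean) ** 2 for y in actual)
    return 1.0 - ss_res / ss_tot

# Toy comparison between a real trend and a fitted one (hypothetical values).
real_trend = [1.0, 2.0, 3.0, 4.0]
fitted_trend = [1.0, 2.0, 3.0, 5.0]
scores = (rmse(real_trend, fitted_trend), mae(real_trend, fitted_trend), r2(real_trend, fitted_trend))
```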

Results and Discussion
In this section, the results obtained from the application of the methodology described to the series comprising the experiments of Phase I and Phase II are presented.

Phase I Study on Pure Synthetic Series
In Figure 4, the base of the pure synthetic series of the experiments in this phase can be observed, generated from the criteria outlined in Section 3.1.1, and the series obtained by adding them to the trends generated from the expressions described in Section 3.2. In each graph, both the resulting series and the trend employed can be observed.

In each of these series, the four types of predictive models indicated in Section 3.4 are trained. In this way, a total of 20 models are generated, the results of which are presented graphically in Figure 5, along with a summary of the errors committed by each model in Table 3.
The trends obtained using ICE curves and SHAP values on a sample of three experiments in this phase are shown in Figure 6 alongside the corresponding real trend for each experiment. Table 4 below presents the RMSE, MAE, and $R^2$ values obtained by each of the methods, highlighting in bold the best result in each case. ICE curves provide better results while requiring substantially less computational effort and, therefore, will be used to identify the trend in experiments of this phase.
Using the ICE curves as the best option, the temporal dependency lines of all the experiments in this phase are obtained, as shown in Figure 7, along with the trend line used for each set of models.
The lines of temporal dependence are obtained as the average of the ICE curves of each model, and therefore, the initial value of these curves will generally be different from zero. Considering that, in a behaviour affected by a trend, the irreversible increments that can be observed are relative to the value of the moment when the series begins, the initial value of this trend can be taken as null. For this reason, the initial value of each model's lines of temporal dependence is subtracted from them so that they start at zero.
On each of these lines of temporal dependence, the adjustment is made to the types of function established according to the methodology described in Section 3.6, and the error of each adjustment is measured with its line of temporal dependence. The one with the lowest RMSE is selected as the best approximation. The table compares the selected fitting laws with the laws actually introduced in each case. In Figure 8, the adjusted laws of temporal dependence with the lowest error are shown for each type of ML model, along with the trend introduced in each case. Table 5 shows the errors made by each fit with respect to the introduced trend.

Phase II-Study on Series of Real Complexity
In this second phase, the methodology used for Phase I is replicated but applied to the stationary behaviour series described in Section 3.1.2. In this case, the series resulting from the addition of the different trends to the base series can be observed in Figure 9 below.
The prediction models developed on these series provide the results shown in Figure 10 and Table 6. Table 7 below presents the RMSE, MAE, and $R^2$ values obtained by each of the methods, highlighting in bold the best result in each case. ICE curves provide better results while requiring substantially less computational effort and, therefore, will be used to identify the trend in experiments of this phase.
From the trained models and following the method described in Section 3.5 for the ICE curves, the temporal dependence curves are obtained, which can be observed in Figure 12. The best fits of the different temporal dependence curves to the real trends are shown graphically in Figure 13, and the errors committed are listed in Table 8.

Regarding the Representativeness of Conditional Expectation Compared to Trend
The first specific objective of this study was to verify whether the conditional expectation relative to the time variable that can be extracted from the interpretation methods of ML models, considered black-box models, such as ICE curves or SHAP values, is representative of the existing trends in dam monitoring data series derived from the irreversible evolution phenomena of their behaviour.
Experiments conducted on pure synthetic series and series based on the real behaviour of radial movements of a pendulum in a dam reveal that the partial dependencies of the response of ML models with respect to the time variable found using both ICE curves and SHAP values do indeed have a close relationship with the real trends artificially introduced in both phases of the study.
The results obtained show how conditional expectation curves follow a behaviour similar to real trends. From a quantitative perspective, it is observed that the part of the ML model response attributed to the time variable using the interpretation methods employed significantly aligns with the real values. This demonstrates that it is possible to consider these conditional expectations as a representative reference of the real trend that may be integrated into dam behaviour series.
From the comparative study conducted on a sample of three experiments from each phase on the results obtained following the methodology of ICE curves and SHAP values, it is evident that better results are obtained when the former is used.While the morphology and general amplitude of the temporal dependencies found by both methods follow the real trends introduced, the SHAP values present greater local variability and poorer fit than the ICE curves.
An important factor in the results obtained is that there is no strong correlation between the time variable and the rest of the causal variables. Where such a correlation does exist, the procedure followed by the ICE curves evaluates the model response for combinations of causal variables that cannot occur in reality, which can significantly affect the result. Therefore, the validity of the result obtained with the time variable cannot necessarily be extrapolated to other explanatory variables: the conditional expectation associated with a strongly correlated variable may not be a good representation of the dam's response to that variable.

On the Engineering Significance of the Temporal Dependencies Found
The second specific objective was to evaluate whether the conditional expectations obtained respond to the irreversible behaviour characteristics of dams, equivalent to those used to generate trends artificially. In this regard, it is observed that although the ranges and general shapes of the conditional dependencies identified respond to the introduced trends, substantial differences are observed from the dam engineering perspective, necessitating a review of the extracted conditional dependencies for engineering interpretation.
The conditional expectations found in BRT models provide a staggered pattern derived from their own nature based on decision trees.These shapes might suggest an irreversible behaviour of the dam subject to strong punctual increases in movements, which, given that the solution to the problem is known in this study, is not correct.
On the other hand, SVM provides opposite slopes at the ends of the behaviour prediction series compared to the real ones.
Neural networks also show a divergence from the real trends, especially at their final ends.
These behaviours observed in ML behaviour models are particularly evident when there is no trend.In these cases, studying the expected future evolution of the trend would result in significant errors if the result were extrapolated beyond the data cutoff date.In the case of SVM or neural network models, differences in slopes would lead to increasing errors over time or even changes in the trend direction.In the case of BRT models, the results of the conditional expectation for values of the time variable outside the data range would remain constant at their last value.
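The staggered shape and the constant response outside the data range both follow from the piecewise-constant nature of decision trees. The sketch below illustrates this with a single, hand-written one-dimensional regression tree on an invented linear trend; it is a simplified stand-in, since a BRT model is a sum of many such trees, but a sum of piecewise-constant functions is itself piecewise constant.

```python
def fit_tree(ts, ys, depth=3, min_leaf=2):
    """Fit a 1-D regression tree by greedy squared-error splits.
    `ts` must be sorted. Returns a float (leaf) or a tuple
    (threshold, left_subtree, right_subtree)."""
    n = len(ts)
    if depth == 0 or n < 2 * min_leaf:
        return sum(ys) / n
    best_sse, best_i = None, None
    for i in range(min_leaf, n - min_leaf + 1):
        left, right = ys[:i], ys[i:]
        mean_l, mean_r = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - mean_l) ** 2 for y in left) + sum((y - mean_r) ** 2 for y in right)
        if best_sse is None or sse < best_sse:
            best_sse, best_i = sse, i
    threshold = (ts[best_i - 1] + ts[best_i]) / 2
    return (
        threshold,
        fit_tree(ts[:best_i], ys[:best_i], depth - 1, min_leaf),
        fit_tree(ts[best_i:], ys[best_i:], depth - 1, min_leaf),
    )

def predict(tree, t):
    """Walk down the tree until a leaf value is reached."""
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if t <= threshold else right
    return tree

# Invented data: a purely linear trend sampled at 20 time steps.
ts = [float(i) for i in range(20)]
ys = [0.1 * t for t in ts]
tree = fit_tree(ts, ys)

levels = {predict(tree, t / 10) for t in range(0, 200)}  # distinct steps inside the data range
last = predict(tree, 19.0)    # response at the last observed time
beyond = predict(tree, 50.0)  # response far beyond the data cutoff
```

With a depth of 3 the tree can produce at most eight distinct levels, so the smooth linear trend is rendered as a staircase, and any time value past the last split falls into the same final leaf.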
Given this disparity of results compared to what is expected from an engineering perspective, it becomes necessary to adjust the conditional expectations found to mathematical functions that meet engineering criteria.
It is observed that, once the conditional expectations are adjusted to the shape functions, these problems are logically resolved, resulting in the final trends thus defined being highly representative of the real trends introduced, making them useful for assessing the existing trend.
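The fitting and selection step can be sketched as below. This is a minimal Grey Wolf Optimizer written for illustration under several assumptions: the three candidate shape functions and their parameter bounds are hypothetical examples rather than the functions of Section 3.6, the pack size and iteration count are arbitrary, and the target curve is synthetic.

```python
import math
import random

def rmse(params, func, ts, ys):
    """RMSE between a candidate function and the temporal dependence line."""
    return math.sqrt(sum((func(t, *params) - y) ** 2 for t, y in zip(ts, ys)) / len(ts))

def gwo(cost, bounds, wolves=20, iters=200, seed=0):
    """Minimal Grey Wolf Optimizer: the pack moves towards the three best
    positions found so far (alpha, beta, delta) while `a` decays from 2 to 0."""
    rng = random.Random(seed)
    pack = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(wolves)]
    leaders = []  # best three (cost, position) pairs seen so far
    for it in range(iters):
        scored = [(cost(w), list(w)) for w in pack] + leaders
        leaders = sorted(scored, key=lambda pair: pair[0])[:3]
        a = 2.0 * (1 - it / iters)
        for w in pack:
            for d, (lo, hi) in enumerate(bounds):
                pulls = []
                for _, lead in leaders:
                    A = 2 * a * rng.random() - a
                    C = 2 * rng.random()
                    pulls.append(lead[d] - A * abs(C * lead[d] - w[d]))
                w[d] = min(max(sum(pulls) / 3, lo), hi)  # keep inside bounds
    scored = [(cost(w), list(w)) for w in pack] + leaders
    return min(scored, key=lambda pair: pair[0])  # (best cost, best parameters)

# Hypothetical candidate shapes, each equal to zero at t = 0 (assumptions).
CANDIDATES = {
    "linear": (lambda t, a: a * t, [(-5, 5)]),
    "exponential": (lambda t, a, b: a * (1 - math.exp(-b * t)), [(-5, 5), (0, 10)]),
    "sigmoidal": (
        lambda t, a, b, c: a / (1 + math.exp(-b * (t - c))) - a / (1 + math.exp(b * c)),
        [(-5, 5), (0, 20), (0, 1)],
    ),
}

def fit_all(ts, ys):
    """Fit every candidate and return {name: (rmse, parameters)}."""
    results = {}
    for name, (func, bounds) in CANDIDATES.items():
        results[name] = gwo(lambda p, f=func: rmse(p, f, ts, ys), bounds)
    return results

# Synthetic zero-based temporal dependence line on a normalised time axis.
ts = [i / 20 for i in range(21)]
ys = [2.0 * (1 - math.exp(-3.0 * t)) for t in ts]
results = fit_all(ts, ys)
best_name = min(results, key=lambda name: results[name][0])
```

Selecting by lowest RMSE does not guarantee that the generating family is recovered: two families with similar shapes over the study period can fit almost equally well.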
However, it should be noted that for cases without trends, ML models tend to identify a linear trend, albeit generally with a reduced slope.
The ML models that provide better approximations to real trends, outside of cases without trends or with linear trends, are BRT and neural networks. Although the overall accuracy of SVM models is not much worse than that of the others, it is penalised by the interpretations it makes at the ends of the prediction series.
In general, the type of formula that best fits is the same as the one that defines the trend. However, in the case of exponential trends, both in Phase I and in Phase II, sigmoidal-type trend formulas provide the best fit. For the conditional expectations found by the SVM and BRT models, however, the best fit is achieved using the exponential function in both cases.
In the case of sigmoidal trends, in Phase II, the best fit was achieved with an AAR-type function. Note that both formulations respond to a sigmoidal-type function.

On the Comparison of Results Obtained with the Most Common Statistical Methods in Dam Safety
It is observed that, for cases without trends or with linear trends, it is the HTT-type multiple linear regression model that provides a perfect fit both in prediction and trend. This is because the carrier series of the two phases were created from HTT models with the same term structure. Since the form imposed on the response model incorporates the specific terms of a linear trend, it is logical that in these two cases, the fit is perfect.
In these cases, ML models, although logically worse than HTT, provide a fairly similar overall fit of the trend, with little impact over the study period, although the type of fitted formula sometimes differs from the real one.
However, when the introduced trend deviates from a straight line, the restrictions imposed by the polynomial structure on HTT have a greater impact on trend identification. In these cases, HTT models assign almost straight alignments to the temporal part of the behaviour, deviating significantly from the real trend. The trends obtained from the ML models clearly fit more closely to the real trends, thus showing significantly greater utility for detecting and defining trends in dam behaviour.

Limitations of Methodology and Future Research Lines
In both phases of the study, trained ML models provide good predictions, as reflected in the errors and R² obtained. This provides an indication of how well the regressors employed are able to explain the modelled behaviour. In real cases, sometimes, the information necessary for explaining the modelled behaviours is not complete or sufficiently precise, resulting in vaguer behaviour models. The conditional expectation extracted from these models may then be influenced and lose representativeness. Therefore, further research is needed to study the impact of the quality of ML models on trend identification.

Conclusions
Through this study, it was demonstrated how the conditional expectation relative to the time variable extracted from ML models that predict dam behaviour responds to the existing trend.
It has been verified that it is possible to define the trend in the series consistent with the knowledge of dam engineering by adjusting the conditional expectations extracted from the models to mathematical functions corresponding to different types of known irreversible phenomena in dams.
The greater versatility and detection capability of this method were evidenced compared to common practices used in dam safety.
Therefore, the proposed methodology provides a useful tool for dam safety experts in identifying, defining, and studying the irreversible phenomena to which a dam can be subjected, which is fundamental to its safety.

Figure 4. Pure synthetic series and Applied Trends. The pure synthetic series corresponding to the different experiments in Phase I can be observed in grey. Each of these series is the sum of the pure synthetic series for the "Without Trend" case and the introduced trend, which is plotted in black for each case.

Figure 5. Predictions obtained with the different models trained on each Phase I experiment. Poorer performance of the HTT models can be observed in cases of non-linear trends. The models corresponding to the exponential trend case show a lower accuracy at the end of the series.

Figure 6. Conditional expectation relative to time variable, or trends, obtained using ICE curves and SHAP values in a sample of three experiments of Phase I. Lower local variability and better fit to real trend can be observed in the trends obtained with ICE curves.

Figure 7. Real trends and lines of conditional expectation with respect to the time variable obtained in each of the ML and HTT models trained for each experiment in Phase I.

Figure 8. Real trends and best fits to typical irreversible behaviour functions of dams for each type of ML and HTT model trained for each of the experiments in Phase I. The best fit to the real trend of each case is highlighted in thicker red lines.

Figure 9. Behaviour-based stationary series and Applied Trends. The behaviour-based stationary series corresponding to the different experiments in Phase II can be observed in grey. Each of these series is the sum of the pure synthetic series for the "Without Trend" case and the introduced trend, which is plotted in black for each case.

Figure 10. Predictions obtained with the different models trained in each Phase II experiment. Poorer performance of the HTT models can be observed in cases of non-linear trends.

Figure 11. Conditional expectation relative to time variable, or trends, obtained using ICE curves and SHAP values in a sample of three experiments of Phase II. Lower local variability and better fit to the real trend can be observed in the trends obtained with the ICE curves.

Figure 12. Real trends and lines of conditional expectation with respect to the time variable obtained in each of the ML and HTT models trained for each experiment in Phase II.

Figure 13. Real trends and best fits to typical irreversible behaviour functions of dams for each type of ML and HTT model trained for each of the experiments in Phase II. The best fit to the real trend of each case is highlighted in thicker red lines.
[18]unalieva et al. (2024) [17] conducted a review of techniques for interpreting ML models, fundamentally classifying them into different categories: model-based, representation-based, and post hoc models. Within this latter group, they include methods such as the aforementioned SHAP, the Local Interpretable Model-Agnostic Explanations (LIME) introduced by Ribeiro et al. (2016) [18], or the GradCAM focused on Deep Learning

Table 1. Data Statistics of the pure synthetic stationary series.

Table 2. Data Statistics of the behaviour-based stationary series.

Table 3. RMSE, MAE, and R² calculated for each of the ML and HTT models trained on the different experiments of Phase I. The best ML model for each type of trend is highlighted in bold.

Table 4. RMSE, MAE, and R² of the trends obtained using the ICE curves and SHAP values in Phase I with respect to the real trend artificially introduced in the data series. It can be observed that the results provided by the ICE curves improve in almost all cases compared to those obtained with the SHAP values. The best performance indexes are highlighted in bold.

Table 5. RMSE, MAE, and R² of the fits to the different types of functions representing potentially expected irreversible behaviour patterns in dam data, according to engineering criteria, of the lines of conditional expectation relative to the time variable determined by each ML and HTT model trained in each of the experiments corresponding to Phase I. The best fit of each model type has been highlighted in bold, and the fit with the lowest error and model has been underlined in grey.

Table 6. RMSE, MAE, and R² calculated for each of the ML and HTT models trained on the different experiments of Phase II. The best ML model for each type of trend is highlighted in bold.

Table 7. RMSE, MAE, and R² of the trends obtained using the ICE curves and SHAP values in Phase II with respect to the real trend artificially introduced in the data series. A generally better outcome of ICE curves can be observed compared to SHAP values. The best performance indexes are highlighted in bold.