Prediction and Interpretation of Water Quality Recovery after a Disturbance in a Water Treatment System Using Artificial Intelligence

Abstract: In this study, an ensemble machine learning model was developed to predict the recovery rate of water quality in a water treatment plant after a disturbance. XGBoost, one of the most popular ensemble machine learning models, was used as the main framework of the model. Water quality and operational data observed in a pilot plant were used to train and test the model. A disturbance was identified when the observed turbidity was higher than the given turbidity criterion. Accordingly, the recovery rate of water quality at a time t was defined during the falling limb of the turbidity recovery period as the ratio of the difference between the peak turbidity and the turbidity observed at time t to the difference between the peak turbidity and the turbidity criterion. The root mean square error–observation standard deviation ratio of the XGBoost model improved from 0.730 to 0.373 through pretreatment, i.e., removing the observations for the rising limb of the disturbance from the training data. Moreover, Shapley value analysis, a novel explainable artificial intelligence method, was used to provide a reasonable interpretation of the model's performance.


Introduction
Safety management in water resources and drinking water systems is an important issue that requires global efforts to improve public health. Various contaminants in streams and reservoirs used for drinking water supply threaten public health, and water treatment plants are crucial public facilities for providing safe water to the public. Thus, the optimization of water treatment plant operations is essential for the stable supply of safe drinking water.
The quality of drinking water produced in a water treatment plant is affected by various factors, including the quality of raw water, the composition of water treatment facilities, and the operational conditions of the treatment processes. Natural disasters, such as typhoons, heavy rains, and earthquakes, are also important factors that should be considered for the proper management of water treatment plants [1][2][3]. The effect of a disaster on a water supply system is often observed under various conditions, including damage to pipelines, destruction of pumping systems, and malfunctioning of electrical operational systems [4,5]. The damage to water treatment plants caused by natural disasters causes consequent malfunctions in the water treatment system, resulting in abnormal water quality in the water treatment process, such as increased turbidity, contamination, and leakage of drinking water. One common accident in water treatment plant operations is the suspension of coagulants (e.g., poly aluminum chloride) owing to a malfunction of the agent supply system. This malfunction results in an increase in the turbidity of the subsequent processes, which represents a disturbance in the water treatment process. Turbidity is the most widely used index that quantitatively represents the status of water quality in a drinking water treatment plant. An increase in turbidity represents an increase in the degree of contamination, including organic materials, nutrients, and heavy metals [6,7].
Considering the proper management of water supply systems, the prediction of water quality caused by various malfunctions, including disasters, is essential. Recent studies have provided a methodological approach using statistical methods or advanced machine learning to predict or quantify the damage to social infrastructure caused by various disasters [4,8,9]. Inoue et al. [10] used deep learning and support vector machines for anomaly detection in water treatment plants [11]. Recently, Chen and Guestrin [12] developed an ensemble risk assessment model to determine the risk grade of rainstorm disasters in target areas using random forest (RF) and deep neural networks. The application of machine learning for the management of water treatment systems is still being studied. Furthermore, water quality in drinking water supply systems is difficult to predict because various physical, chemical, and biological factors affect the quality of drinking water produced in water treatment plants [6,11,13].
Practically, post-disaster management is crucial for the proper management of water treatment systems. Recovery or post-disaster management is an emerging issue in the management of public infrastructure [14,15]. Moreover, studies on the application of advanced machine learning models to predict water quality in water resources and water treatment systems have increased [16][17][18][19]. However, studies on the quantification and prediction of recovery stages in water treatment plants after disasters and related disturbances are still at an early stage.
Artificial neural networks (ANNs) were some of the first machine learning algorithms to be developed, and various advanced algorithms (e.g., ensemble machine and deep learning algorithms) have been continuously developed to overcome the limitations of previous models. For example, deep learning has been developed to overcome the limitations of conventional ANNs, such as vanishing gradients and overfitting [20][21][22]. Tree-based ensemble machine learning models, such as RFs and gradient boosting decision trees (GBDTs), are some of the most popular and widely used machine learning models. Ensemble machine learning models have been increasingly applied in water quality management studies recently [23,24]. The ensemble model is composed of multiple independent tree-based models known as weak learners, and model performance can be improved by determining the final model prediction obtained by combining the results of each weak learner [25][26][27].
This study provides a methodological approach to quantify the recovery rate of water quality after the occurrence of malfunctions in water treatment plants. In the first part of the study, an ensemble machine learning model was developed to predict the recovery rate through the following steps: (i) the recovery rate was defined as the change in turbidity after a disturbance in the water treatment process; and (ii) a machine learning model was developed from pilot-scale field operational data, with XGBoost (XGB), the most popular GBDT algorithm, used as the main structure of the model framework. In the second part of the study, explainable artificial intelligence (XAI) was used to provide an understandable interpretation of the model's performance. A machine learning model is often referred to as a black box model, and a limitation on its practical application for managing water treatment plants is that its results are difficult to interpret. Recently, Dunnington et al. [28] compared the performance of a physically based model with that of three machine learning models in predicting organic carbon removal in a water treatment system. Their study emphasized the importance of model result interpretability in the practical application of machine learning models. XAI is a novel method that provides an interpretation of the performance of a machine learning model based on its characteristics and relationships; it therefore overcomes an important limitation of black box-based machine learning models and improves the practicability of advanced machine learning models [29][30][31]. In this study, a novel XAI method, the Shapley value (SHAP), was used to analyze the model's performance and provide an understandable interpretation of the model's predictions [32].

Pilot Plant Operation
Operational data for water treatment processes were measured in a pilot-scale water treatment plant (pilot plant) (Figure 1) located at the Bupyung Water Treatment Plant Office in Incheon City, South Korea. The processes applied at the pilot plant comprised an intake tank, a flash-mix tank, a flocculation and settlement tank, and a filtration tank. The hourly water quality observational data from the pilot plant were used for the model development in this study, as shown in Table 1. Hourly turbidity and pH were measured using electronic sensors, the water level was measured using an ultrasonic level sensor, and the flow rate was measured using an electromagnetic sensor.
The turbidity of water after filtration is one of the most representative indices of water quality status. First, the criterion for abnormal turbidity (T_r) was determined, and the disturbance period was defined as the period when the observed turbidity was higher than T_r, as shown in Figure 2. Thus, the disturbance period started after the observed turbidity became higher than the criterion T_r. The turbidity then increased continuously until it reached the maximum turbidity during the disturbance (T_max), after which it decreased until it became lower than T_r. The recovery rate (R) of turbidity at time t during the falling limb of the disturbance was determined from the ratio of the difference between T_max and T_t to the difference between T_max and T_r, as in Equation (1). R was calculated from the instant immediately after the turbidity reached T_max to the instant the turbidity decreased below T_r. By the definition used in this study, R is considered as 100% during the rising limb of the turbidity after the disturbance.

R_t = (T_max − T_t) / (T_max − T_r) × 100 (1)

where R_t (%) is the recovery rate of turbidity during the falling limb of the disturbance at time t; T_max (NTU) is the maximum turbidity during a disturbance event; T_r (NTU) is the criterion for abnormal water quality; and T_t (NTU) is the turbidity observed at time t.
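As a minimal sketch of Equation (1) (function and variable names are ours, not the paper's), the recovery rate can be computed from a single turbidity observation as follows:

```python
def recovery_rate(t_obs, t_max, t_r):
    """Recovery rate R (%) per Equation (1): the drop from peak turbidity
    relative to the total drop needed to return to the criterion t_r."""
    if t_max <= t_r:
        raise ValueError("t_max must exceed the criterion t_r")
    r = (t_max - t_obs) / (t_max - t_r) * 100.0
    # By definition, R is capped at 100% once turbidity falls below t_r.
    return min(r, 100.0)

# Example: peak 2.0 NTU, criterion 0.1 NTU, current observation 1.05 NTU
print(recovery_rate(1.05, t_max=2.0, t_r=0.1))  # about 50%
```

At the peak (t_obs = T_max) the function returns 0%, and once the turbidity drops to or below T_r it returns 100%, matching the definition above.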

Operation Scenario
Disturbance with respect to water quality caused by an accident was simulated in the pilot plant. The fine particles in the water were aggregated into larger particles by adding a coagulant so that they could be removed in the succeeding filtration process. The suspension of the coagulant thus caused increased turbidity in the water treatment process, including in the filtration tank. In this study, seven disturbance events were imitated by suspending the input of the coagulant (12% poly aluminum chloride) during the entire operation period (Table 2). The observed data during the observation periods (Table 2), including the simulated disturbance periods, were used for the training and testing of a machine learning model to predict R during the falling limb of a disturbance in the water treatment process. The observed data for the four observation periods (Periods 1-4 in Table 2) were used for model development. A total of 469 observations from 21 September 2020 to 26 February 2021 and 118 observations from 27 February 2021 to 5 March 2021 were used for the training and testing of the machine learning model, respectively (Figure 3). The ratio of the data used for training and testing of the machine learning model was 80:20.
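The 80:20 chronological split described above can be sketched as follows (a simplified illustration; the study split the data by calendar periods rather than by a bare index):

```python
def chronological_split(records, train_frac=0.8):
    """Split time-ordered observations into training and testing sets
    without shuffling, so the test period strictly follows training."""
    cut = int(len(records) * train_frac)
    return records[:cut], records[cut:]

hourly = list(range(587))          # 469 + 118 hourly observations
train, test = chronological_split(hourly)
print(len(train), len(test))       # 469 118
```

Keeping the split chronological avoids leaking future observations into the training set, which matters for operational time series such as these.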

Input Variables
Five observed input variables (Table 1) were used as independent variables for model development, and the recovery rate, R, calculated using (1), was used as the dependent variable predicted by the model. The turbidity of the effluent in the filtration tank, TB_R3, is approximately 0.05 NTU under normal conditions, and the peak TB_R3 ranged from 0.50 NTU to 2.16 NTU under the imitated abnormal conditions. The criterion for the abnormal condition, T_r, was determined as 0.1 NTU. Furthermore, R was calculated from the turbidity value in the filtration tank (TB_R3) during the falling limb of TB_R3.

Pretreatment of the Input Variable
The proposed model predicted the recovery rate R during the falling limb of the disturbance period. During the rising limb of the disturbance, recovery was not calculated because the peak of the disturbance cannot be identified until the turbidity reaches its maximum value. Thus, R was considered to be 100% (normal status) in both the normal operation period and the rising limb of the disturbance period (Figure 2). The model accounted for this status by being trained on a dataset that excluded the rising-limb data, which were identified using the logical condition in (2). The data that fulfilled the condition in (2) at time t were excluded from the training dataset. This pretreatment was not applied to the test data. The model performances with pretreatment using (2) (Model 1) and without pretreatment (Model 2) were compared by training the model for each case.
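A sketch of this pretreatment step, assuming the rising limb is flagged where turbidity exceeds the criterion and is still increasing (our assumed form of the paper's condition (2), which is not reproduced in this extracted text):

```python
def exclude_rising_limb(turbidity, t_r):
    """Return the indices to keep for training: drop samples where
    turbidity is above the criterion t_r and still increasing
    (an assumed stand-in for the paper's logical condition (2))."""
    keep = []
    for t in range(len(turbidity)):
        rising = (turbidity[t] > t_r
                  and t > 0
                  and turbidity[t] > turbidity[t - 1])
        if not rising:
            keep.append(t)
    return keep

series = [0.05, 0.3, 0.9, 2.0, 1.2, 0.4, 0.08]   # one disturbance event
print(exclude_rising_limb(series, t_r=0.1))       # indices kept for training
```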

Post-Treatment of Machine Learning Results
R was only defined during the falling limb of the disturbance period, and R during the rising limb was determined as R = 100%. The logical condition (3) was applied to the model predictions during the rising limb of the disturbance as a post-treatment to reflect this status in the machine learning model developed in this study. According to (3), R at time t + 1 was considered as 100%; therefore, R at the peak of the rising limb in the model prediction was determined as 100%. Consistent with the definition of the recovery rate, the predicted R was also set to 100% whenever it exceeded 100%.
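The post-treatment can be sketched as follows (a simplified stand-in for the paper's condition (3), whose exact form is not reproduced here; the rising-limb flag is assumed to be known):

```python
def post_treat(predictions, rising_mask):
    """Post-treat predicted recovery rates: force R = 100% on the rising
    limb and cap any prediction that exceeds 100%, consistent with the
    definition of the recovery rate."""
    out = []
    for r, rising in zip(predictions, rising_mask):
        if rising or r > 100.0:
            out.append(100.0)
        else:
            out.append(r)
    return out

print(post_treat([35.0, 101.2, 80.0], [False, False, True]))
```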

Machine Learning Model Selection
The recovery rate, R, is determined based on the change in turbidity, which is affected by the condition of the water quality in the preceding processes of the water treatment plant. In this study, the GBDT, a tree-based ensemble learning algorithm, was used to predict the recovery rate after a disturbance in a water treatment plant. The GBDT and the RF are popular ensemble learning models [33][34][35]. The RF is composed of multiple individual decision tree models, where the average of the individual decision tree predictions is taken as the final result of the model. In contrast, the GBDT is composed of a sequence of decision tree models known as weak learners (Figure 4). The results of the previous decision tree model affect the results of the subsequent decision tree model by assigning a higher weight to data with higher residual errors in the previous decision tree model. Through this process, model performance is improved in the subsequent step [25,26]. The GBDT is optimized by minimizing the objective function, which is composed of two parts: the loss function (l) and the regularization function (Ω), where X is an input variable and θ_k is an independent and identically distributed random vector ((4), (5) and Figure 4) [36]. The loss function (l) represents the total sum of the differences between the observation (y_i) and the model prediction (ŷ_i) in each decision tree model for all the input data samples (n). The regularization function reduces complexity and prevents overfitting of the machine learning model (5). In (4), f_k corresponds to an individual k-th decision tree with weight w, TL is the number of leaves on the tree, α represents the complexity of each leaf, and β is a parameter that scales the penalty [12,36].
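The typeset equations did not survive extraction; using the symbols defined above and the standard XGBoost formulation, Equations (4) and (5) can plausibly be reconstructed as (a hedged reconstruction, not the paper's exact typesetting):

```latex
\hat{y}_i = \sum_{k=1}^{K} f_k(X_i;\,\theta_k), \qquad
\mathrm{Obj} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right)
             + \sum_{k=1}^{K} \Omega(f_k) \quad (4)

\Omega(f_k) = \alpha\, TL + \tfrac{1}{2}\,\beta\, \lVert w \rVert^{2} \quad (5)
```

Here the first term of (4) accumulates the per-sample loss over all n samples, and (5) penalizes both the number of leaves TL (scaled by α) and the magnitude of the leaf weights w (scaled by β).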
In this study, XGB, one of the most popular GBDT models, was used to develop an ensemble machine learning model to predict recovery rates in water treatment processes after disturbance. The model was developed using the Python XGB library [12,[36][37][38], where the hyperparameters of the model were optimized by a grid search method using the scikit-learn grid search library [37].

Model Evaluation
Three evaluation indices were used to evaluate the prediction performance of the XGB model: the Nash-Sutcliffe efficiency (NSE), root mean square error (RMSE), and RMSE-observation standard deviation ratio (RSR) (6)-(8). The value of NSE ranges from −∞ to 1, where NSE approaches 1 when the model prediction shows a better fit with the observations. The RMSE ranges from 0 to ∞ and is calculated as the square root of the mean of the squared differences between the observations and predictions; a smaller RMSE represents better prediction performance. The value of the RSR ranges from 0 to 1, where the RSR approaches 0 when the model prediction shows a better fit with the observations. The model prediction is considered satisfactory when RSR < 0.70 [39,40].
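Equations (6)-(8) are not reproduced in this extracted text; the three indices can be sketched in plain Python following their standard definitions:

```python
from math import sqrt

def rmse(obs, pred):
    """Root mean square error between observations and predictions."""
    return sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 minus the ratio of residual variance
    to the variance of the observations about their mean."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - p) ** 2 for o, p in zip(obs, pred))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den

def rsr(obs, pred):
    """RMSE-observation standard deviation ratio."""
    mean_obs = sum(obs) / len(obs)
    sd = sqrt(sum((o - mean_obs) ** 2 for o in obs) / len(obs))
    return rmse(obs, pred) / sd

obs = [10.0, 20.0, 30.0, 40.0]
pred = [12.0, 18.0, 33.0, 41.0]
print(round(nse(obs, pred), 3), round(rsr(obs, pred), 3))
```

Note that with these definitions RSR = sqrt(1 − NSE), so the two indices improve together.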

XAI for Model Interpretation
A machine learning model is often known as a black box model because of the limited interpretability of its results. XAI is a novel method for providing an understandable interpretation of how a model prediction is determined from the input variables used in the model's development. SHAP analysis is one of the most representative XAI algorithms; it provides an interpretation of model performance by comparing model prediction results with different combinations of input variables [32]. The SHAP value was calculated for each input variable from the weighted mean of the marginal contribution of the input variable (9) [32]. Through this analysis, SHAP provides quantified values for the contribution of the target input variables with respect to model performance. In this study, SHAP analysis was used to provide an understandable interpretation of the contribution of the input variables to the model's performance.
where φ_i is the SHAP value of the ith input variable; V is the set of all input variables; S is a subset of the input variables without the ith input variable; x_S denotes the values of the input variables in S; x_S∪{i} is the data set that includes the ith input variable; and g_S(x_S) is the prediction based on input x_S.
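To make Equation (9) concrete, the following sketch computes exact Shapley values for a toy prediction function by brute-force enumeration of subsets S (all names are hypothetical; practical SHAP implementations such as TreeSHAP use efficient approximations instead):

```python
from itertools import combinations
from math import factorial

def shapley(predict, x):
    """Exact Shapley values: for each variable i, the weighted mean of
    its marginal contribution predict(S + {i}) - predict(S) over all
    subsets S of the remaining variables, as in Equation (9)."""
    names = list(x)
    n = len(names)
    phi = {}
    for i in names:
        others = [v for v in names if v != i]
        total = 0.0
        for r in range(len(others) + 1):
            for S in combinations(others, r):
                weight = factorial(r) * factorial(n - r - 1) / factorial(n)
                with_i = {v: x[v] for v in S}
                with_i[i] = x[i]
                without_i = {v: x[v] for v in S}
                total += weight * (predict(with_i) - predict(without_i))
        phi[i] = total
    return phi

# Toy model: missing variables default to a baseline of zero.
predict = lambda s: 2.0 * s.get("TB_R3", 0.0) + 3.0 * s.get("PH_R2", 0.0)
phi = shapley(predict, {"TB_R3": 1.0, "PH_R2": 2.0})
print(phi)  # the contributions sum to predict(x) - predict({})
```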

Water Quality of the Pilot Plant
The characteristics of all input variables used for the model development are summarized in Table 3. The five variables observed in the pilot plant were used as independent variables, and the recovery rate, R, calculated using (1), was used as the dependent variable for the XGB model development.

Model Prediction of Recovery Rate
The performances of the two XGB models (Model 1 with pretreatment and Model 2 without pretreatment) were compared, and both models were optimized over three hyperparameters (n_estimators, max_depth, and learning_rate) using a grid search method. The optimal hyperparameters for each model are summarized in Table 4. The XGB model quantitatively predicted water quality recovery during the falling limb of the disturbance until the water quality recovered to the defined normal status. The observations and the prediction results of the two XGB models on the testing data are compared in Figure 5. The evaluation results show that the model's performance was improved by the pretreatment of the input variables used for training the XGB model, with the prediction of Model 1 showing a better fit to the one-to-one line. The evaluation results of the two XGB models using the three indices are compared in Table 5. The values of NSE, RMSE, and RSR improved from 0.467 to 0.860, 10.310 to 5.280, and 0.730 to 0.374, respectively, when the rising limb of the turbidity in the filtration tank was excluded from the training data in the pretreatment process. Pretreatment was not applied to the test data. This result indicates that the XGB model is appropriately trained using the pretreatment process.

The model predictions of R in the two models were compared for two disturbance events (Events 6 and 7). The improvement of the XGB model using the pretreatment process is shown in Figure 6. Model 1 predicted the start and end periods of the disturbance for both events. The scale of recovery was also well predicted for Event 6, whereas the model underestimated the observations for Event 7.

SHAP
The prediction of a machine learning model is determined by the internal computation of the input variables; thus, the result is affected by the complicated relationships between the input variables. SHAP analysis provided an understandable explanation of how the model simulation results were determined. The SHAP values represent the relative importance of the input variables in the XGB model. The colored dots in Figure 7 represent the distribution of SHAP values for each observation. In Figure 7, the variables on the y-axis are sorted by their SHAP values in descending order, and the color hue represents the actual value of the observation: red and blue indicate high and low observation values, respectively. The SHAP analysis suggests that TB_R3 (the turbidity at the filtration process) is the factor with the largest effect on the model simulation result. A larger negative SHAP value is observed when the observed TB_R3 is larger, as represented in red. Moreover, the SHAP value becomes close to zero, and slightly higher than zero, when the observation becomes smaller, as represented in blue. This distribution of SHAP values indicates that the model prediction of R becomes larger when the observed turbidity in the filtration tank, TB_R3, becomes smaller. This corresponds to the actual relationship between TB_R3 and R, in which the recovery rate increases to 100% as the turbidity decreases below 0.1 NTU. The SHAP values for PH_R2 show that PH_R2 does not significantly affect R, although R tends to increase slightly when PH_R2 increases.


Exploratory Data Analysis with SHAP
The SHAP analysis also provides an explanation for individual observations. Recent studies have used SHAP analysis to improve the interpretability of machine learning model results [31,41]. The SHAP force plot shows how the model prediction is affected by an individual observation. Figure 8a shows a SHAP force plot of the observation at 23:00 on 4 March 2021. For this datum, the observed R was 24.29%, whereas the model prediction of R was 61.89%; this is an example of the considerable over-prediction of R indicated by the red dotted box in Figure 5. The force plot shows that the observed TB_R3 of 1.394 NTU had the greatest effect in reducing the predicted R, followed by L_R1 and TB_R2. Although the observed input variables reduced the predicted R, the prediction was still larger than the observed R.
A possible interpretation of this over-prediction can be provided by combining SHAP analysis with exploratory data analysis (EDA), a method that examines data from various perspectives. The SHAP analysis showed that the model prediction was affected by the input variables in the order TB_R3 > L_R1 > TB_R2. The relationship between the distribution of these input variables and R was explored using a target plot (Figure 9), drawn with the open-source library pdpbox. Figure 9a shows the distribution of the observed R in the training data within given ranges of the two input variables TB_R3 and TB_R2. For example, for the data in the TB_R2 range between 1.94 and 4.58 NTU and the TB_R3 range between 0.092 and 1.908 NTU, there are 11 observed Rs with an average of 59.92%, as marked by the red dotted box in Figure 9a. It can therefore be inferred that the model was trained to predict a relatively high R when the input variables fall within this range; as a result, the model predicted an R of 61.89%. The observed R of 24.29% at 23:00 on 4 March 2021 is an exceptional case given the distribution of the other input variables and caused the noticeable over-prediction by the XGB model. Another target plot, for the input variables L_R1 and TB_R3, supports a similar interpretation. The observed L_R1 and TB_R3 at 23:00 on 4 March 2021 were 1.891 m and 1.394 NTU, respectively; this observation falls within the L_R1 range between 1.891 and 1.892 m and the TB_R3 range between 0.092 and 1.908 NTU, as shown by the red dotted box in Figure 9b, where there are ten observations with an average R of 85.97%. Since the model was trained on this relationship, it over-predicted this case, which had an exceptionally low observed R.
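The statistic read off one cell of a target plot (count and mean of the target within a two-dimensional feature bin) can be reproduced with a few lines of plain Python. The sketch below uses entirely hypothetical (TB_R2, TB_R3, R) observations; the bin edges are the ones quoted for the red dotted box in Figure 9a.

```python
from statistics import mean

def target_plot_cell(rows, tb_r2_range, tb_r3_range):
    """Count and average of observed R for rows whose (TB_R2, TB_R3)
    fall in the given bin, mirroring one cell of a target plot."""
    in_cell = [r for (tb_r2, tb_r3, r) in rows
               if tb_r2_range[0] <= tb_r2 < tb_r2_range[1]
               and tb_r3_range[0] <= tb_r3 < tb_r3_range[1]]
    return len(in_cell), (mean(in_cell) if in_cell else None)

# Hypothetical (TB_R2 [NTU], TB_R3 [NTU], R [%]) training observations.
rows = [(2.1, 0.5, 70.0), (3.0, 1.2, 55.0),
        (4.0, 1.5, 60.0), (0.5, 0.05, 98.0)]
n, avg = target_plot_cell(rows, (1.94, 4.58), (0.092, 1.908))
```

With these toy rows, three observations land in the cell and their mean R is reported, which is exactly the kind of "11 observations, average 59.92%" summary the paper reads from pdpbox's output for the real training data.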
For the observation at the subsequent time step, 00:00 on 5 March 2021, the observed and model-predicted Rs were 49.93% and 50.44%, respectively. The observed TB_R3 and TB_R2 values still belong to the range marked by the red dotted box in Figure 9a. The force plot of this observation in Figure 8b shows that the observed values of TB_R3 = 0.9556 NTU and TB_R2 = 3.054 NTU had the greatest effect on the prediction. The model performed well for this observation because it fell within the range of the distribution of the observations used for training.
The influence of an input variable on the model's prediction is also explained using the SHAP dependence analysis (Figure 10). In the SHAP dependence plot, the x-axis and the first y-axis represent the observed value of an input variable and its corresponding SHAP value, respectively, while the second y-axis represents the observed value of the interaction variable. Figure 10 shows that the SHAP value of TB_R3 decreased as TB_R3 increased. This relationship demonstrates that an increase in the observed TB_R3 had a negative effect on the model prediction, and the corresponding observed value of TB_R2 also increased.

Conclusions
In this study, the recovery rate of water quality in a water treatment plant after a disturbance to the water treatment process was defined, and an XGB model was developed to predict it. Based on the definition of the recovery rate, a pretreatment process for the input data was used to improve the model's performance: the rising limb of turbidity was selectively removed from the training data, whereas no pretreatment was applied to the data used to test the model. A considerable improvement was observed with pretreatment of the input variables compared to the model without pretreatment: NSE, RMSE, and RSR improved from 0.467 to 0.860, 10.310 to 5.280, and 0.730 to 0.374, respectively.
Moreover, a novel XAI method was used to interpret the model's performance. The analysis of the model's performance using the SHAP values and the target plots of the training input variables provided a reasonable interpretation of the model's prediction results.
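The three reported metrics follow their usual definitions, with RSR being the RMSE normalized by the standard deviation of the observations; a minimal sketch (the sample series is hypothetical):

```python
from math import sqrt

def rmse(obs, sim):
    """Root mean square error between observed and simulated series."""
    return sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit."""
    m = sum(obs) / len(obs)
    return 1.0 - (sum((o - s) ** 2 for o, s in zip(obs, sim))
                  / sum((o - m) ** 2 for o in obs))

def rsr(obs, sim):
    """RMSE-observations standard deviation ratio: 0 is a perfect fit."""
    m = sum(obs) / len(obs)
    sd = sqrt(sum((o - m) ** 2 for o in obs) / len(obs))
    return rmse(obs, sim) / sd

# Hypothetical observed/predicted recovery rates (%), for illustration only.
obs = [24.3, 49.9, 61.9, 86.0]
sim = [30.0, 50.4, 61.9, 80.0]
```

With these definitions the two goodness-of-fit measures are linked by NSE = 1 - RSR², which is why the reported RSR improvement (0.730 to 0.374) and NSE improvement (0.467 to 0.860) move together.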
The results obtained in this study demonstrate the applicability of a machine learning model for recovery prediction in water treatment processes after malfunctions. The importance of pretreatment of the data used in the model's development, considering the characteristics of the input variables, has also been emphasized. The methodological approach presented in this study provides a useful approach for more stable and efficient management of water treatment systems.

Conflicts of Interest: The authors declare no conflict of interest.