A Hybrid Water Balance Machine Learning Model to Estimate Inter-Annual Rainfall-Runoff

Watershed climatic diversity poses a hard problem when it comes to finding suitable models to estimate inter-annual rainfall runoff (IARR). In this work, a hybrid model (dubbed MR-CART) is proposed, based on a combination of MR (multiple regression) and CART (classification and regression tree) machine-learning methods, applied to an IARR predicted data series obtained from a set of non-parametric and empirical water balance models in five climatic floors of northern Algeria between 1960 and 2020. A comparative analysis showed that the Yang, Sharif, and Zhang’s models were reliable for estimating input data of the hybrid model in all climatic classes. In addition, Schreiber’s model was more efficient in very humid, humid, and semi-humid areas. A set of performance and distribution statistical tests were applied to the estimated IARR data series to show the reliability and dynamicity of each model in all study areas. The results showed that our hybrid model provided the best performance and data distribution, where the R2Adj and p-values obtained in each case were between (0.793, 0.989), and (0.773, 0.939), respectively. The MR model showed good data distribution compared to the CART method, where p-values obtained by signtest and WSR test were (0.773, 0.705), and (0.326, 0.335), respectively.


Introduction
The irregular distribution of water resources in Mediterranean countries has been one of the most observed problems during the past twenty years, due to the great interannual variability of precipitation, seasonal rainfall regimes, summer drought, and intense precipitation [1]. In this area, several scientific studies have predicted a change in water balance states due to climate change, irregular water demand by different sectors, and poor management in the distribution to agricultural areas [2,3]. Underground water is a form of hydraulic resource, which crosses the soil surface. This depends mainly on precipitation and actual evapotranspiration [4]. Rainfall-runoff modeling helps us to determine the distribution of water accumulation on surfaces, which are characterized by geomorphological and climatic diversity, to understand the hydrological phenomena, and to visualize the state of the water system due to changes in permeable surfaces, vegetation, and climatic events. Rainfall-runoff estimation is a very complex area of study-it requires knowing the interconnection between several variables, which have a relationship with actual evapotranspiration, such as the climatic characteristics of the watershed, vegetation, water storage capacity, basin morphology, and meteorological parameters [5].
In the literature, water balance models used for inter-annual estimation are classified into three categories: empirical, physical, and conceptual [6]. The empirical models are The dataset used for this modeling was spatially obtained from 102 hydro-climatic stations between 1960 and 2020, using the inter-annual time scale which is the inter-annual rainfall (IAR), the inter-annual potential evapotranspiration (IAE0), and the real inter-annual rainfall-runoff (IARR). The independent variables of each sub-model used in this study were IAR and IAE0, which also represent the input data to the proposed model, obtained from the measurement history of the Algerian National Agency for Hydrological Resources (ANRH), the National Environmental Information Centers (NCEI-NOAA) https://www.ncei.noaa.gov/, accessed on 12 May 2021, and the climate knowledge portal https://climateknowledgeportal.worldbank.org/, accessed on 17 May 2021. Furthermore, the real IARR was used in this study as a response variable (dependent) in the machine learning and regression models, and to compare and verify the reliability of the proposed model in each bioclimatic area. The latter was obtained by reading the rainfall-runoff maps that are provided by the ANRH service.

Water Balance Model
In this section, a set of non-parametric and empirical models are used to propose a dynamic and reliable estimation of actual evapotranspiration (IAEa) on the inter-annual time scale. Estimating IAEa helps to quantify the quantity of IARR according to the water balance, as represented by Equation (1) [15], which controls the amount of input and The dataset used for this modeling was spatially obtained from 102 hydro-climatic stations between 1960 and 2020, using the inter-annual time scale which is the inter-annual rainfall (IAR), the inter-annual potential evapotranspiration (IAE o ), and the real interannual rainfall-runoff (IARR). The independent variables of each sub-model used in this study were IAR and IAE o , which also represent the input data to the proposed model, obtained from the measurement history of the Algerian National Agency for Hydrological Resources (ANRH), the National Environmental Information Centers (NCEI-NOAA) https://www.ncei.noaa.gov/, accessed on 12 May 2021, and the climate knowledge portal https://climateknowledgeportal.worldbank.org/, accessed on 17 May 2021. Furthermore, the real IARR was used in this study as a response variable (dependent) in the machine learning and regression models, and to compare and verify the reliability of the proposed model in each bioclimatic area. The latter was obtained by reading the rainfall-runoff maps that are provided by the ANRH service.

Water Balance Model
In this section, a set of non-parametric and empirical models are used to propose a dynamic and reliable estimation of actual evapotranspiration (IAE a ) on the inter-annual time scale. Estimating IAE a helps to quantify the quantity of IARR according to the water balance, as represented by Equation (1) [15], which controls the amount of input and output water in a watershed, in the form of IAR, IAE a , IARR, and the change in water storage (∆S), where ∆S is considered to be negligible. IAR = IAE a + IARR + ∆S (1) The different E a models which were analyzed, and for which performance was compared on five bioclimatic floors in northern Algeria between 1960 and 2020, are as follows: 2.2.1. Schreiber Schreiber [8] proposed a simple exponential as represented by Equation (3), which shows the relationship between actual inter-annual evapotranspiration (IAE a ) in terms of inter-annual precipitation (IAR) and mean annual potential evapotranspiration (IAE o ).

Ol'Dekop
Ol'Dekop [9] used a trigonometric hyperbolic tangent function to show the relationship between the mean annual potential evapotranspiration (IAE o ) and the drying factor (Q), which represents the ratio between IAE o and IAR, where the equation of this model is as follows: where, Q = IAR IAE o .

Pike
This equation is a simple formula derived from the Turk model [16], where it is proposed that replacing the value 0.9 by 1 gives a better result [17]. The model formula is as follows:

Budyko
Budyko [10] applied a geometric mean between Schreiber [8] and Ol'dekop [9], on the basis that the Schreiber model gives a result lower than the real data, while Ol'dekop's estimation shows higher values, to give much better results (6).

Yang
Yang [13] proposed an alternative model (7) to estimate the mean annual actual evapotranspiration using Budyko's hypothesis, in which an adjustable parameter was introduced which can use the watershed characteristics and give a better estimation.

Sharif
This is an improvement of the Mezentsev-Choudhury-Yang (MCY) model which replaces the b, k, and n parameters of the MCY equation with values 0, 2, and 1, respectively [12,18].
where IAR is the inter-annual rainfall, IAE o is the inter-annual potential evapotranspiration, and IAE a is the inter-annual actual evapotranspiration. Zhang [15] proposed a relational model which uses simple interpolators between two water balance ratios (9) and (10), defined by Budyko [19]. These interpolators are also related to the mean potential evapotranspiration (IAE o ) and plant available water content given by the coefficient (w). This relation is shown by Equation (11).
where R is surface runoff, IAR is the mean annual rainfall, IAE a is the mean actual evapotranspiration, and R n is net radiation.

Multiple Regression Model (MR)
Regression is a graphical model, which expresses the goodness of fit between two or more sets of data. In hydro-climatic science, it is most frequently used for modeling, optimization, and comparative study between predictive and actual series. A simple regression illustrates the relationship between the dependent variable (Y) and the independent variable (X). In multiple regression, more than one independent variable (X i ) can have a relationship with the dependent variable (Y) [20]. This relationship can be linear (MLR), or non-linear (MNLR) [21]. The least-squares method is used to estimate the model coefficients. The MLR equation is defined as follows: where α is the intercept, β i is the regression coefficient, and ε is the regression residual. The subset problem is related to the choice of the selected variables or the best regression model. It involves using the set of n observations and m explanatory variables to build efficient multiple regression models by reducing the model trend errors. In the literature, the choice of a subset of explanatory variables is based on the objective function, which measures the efficiency of the model by balancing the number of explanatory variables used and the adjustment error according to several criteria, such as R 2 Adj , MSE, and Mallows' Cp, etc. [22].

Classification and Regression Tree Model (CART)
The CART model is a non-parametric procedure to predict continuous dependent variables with categorical and/or continuous predictor variables. This method is used in many fields [23][24][25]. In this model, the data is partitioned into nodes based on the conditional binary responses to questions that include the predictor variable y.CART models use a binary tree to divide the predictor space recursively into subsets in which the distribution of y is successively more homogeneous [26]. The decision tree is constructed using automatic stepwise variable selection to identify mutually exhaustive and exclusive subgroups of a population [27]. In the first step, the method selects the best optimum breakpoint in which the dependent variable may be separated into two groups. Then, each of the two resulting groups is further separated into two other subsets. Following this logic, the method generates a tree structure in which the dependent variable is optimally divided into a certain number of groups, which are characterized by maximum internal homogeneity and maximum external differentiation [28]. In modeling, CART uses a set of techniques for structuring data clusters, such as AID and CHAID [29]. The tree defines a set of rules for each node followed by its predicted value. In each estimation, the model verifies if the independent variable X i accepts the clause, beginning from the child rules node to a father rules node to give the predicted value. A defect of this model is the possibility of having redundancy values.

Results
The progressive steps of modeling are described below in three core sections, starting with the statistical description of the elementary data series used in each sub-model, followed by comparison and performance analysis to choose the best water balance models used to generate the input variables, as well as the hybrid model design (dubbed MR-CART).

Data Description
The dataset used in this study contained 212 lines and 3 columns represented by IAR, IAE o , and IARR variables, which were observed by 212 watersheds. The dataset was processed to give a comparative view of the variability and data distribution between the input and response variables used in the modeling. This analysis is shown in Table 1 by a set of statistical parameters applied to unclassified and classified data using bio-climatic classes. According to the table, 45/212 data lines belonged to the semi-dry region, 16/212 data lines were obtained from the Mediterranean region, 15/212 data lines belonged to the semi-humid area, 11/212 data lines represented the humid region, and 15/212 data lines were obtained from the very humid class. The results showed great variability of IARR, which was observed in all the northern Algeria areas, given by a CV equal to 1.181. The actual IARR measurement showed a non-stationary distribution around the mean, where the median was less than the mean, which was given by values of 39 and 81.719, respectively. A total of 75% of these data lines had values less than the center of the value interval (AV), which equaled 252. This set of data varied between 7 and 104.3. In this area, the IAR input variable showed greater variability compared to IAE o , where the CV equaled 0.404 and 0.073, respectively. A total of 75% of IAR data lines ranged from 222 to 600, which belonged to the inter-annual aridity classes of semi-humid, Mediterranean, and semi-dry. On the other hand, the IAE o provided stationary values, which were distributed around the mean, where the median and the mean were very close and equal to 1357.500 and 1351.598, respectively. A large change in variability and non-stationary distribution pose big problems in modeling reliability where data classification is required. To address this, we have represented the data series according to aridity classes. In the five data classes, the CV values showed no large variability.
For the IARR series, the CV provided values between 0.149 and 0.404, which were lower than 1.181 when we used the data series of northern Algeria. A decrease in variability was also observed by IAE o and IAR in each climatic class, where the values of CV that were obtained for each data set were lower than 0.073 and 0.404, respectively. According to this classification, the data distribution of each variable used in the water balance models was stationary around the mean. Table 1 shows that the median and the mean for each variable were very close for each sub-series. Inter-annual rainfall runoff (IARR), inter-annual potential evapotranspiration (IAE o ), inter-annual rainfall (IAR), number of data (N. data), 1st quartile (1st Q), 3rd quartile (3rd Q), average (AV), std. deviation (SD),coefficient of variation (CV). * CV = SD/Mean. ** AV = (Min + Max)/2.

Experimental Results
In this section, the set of water balance models presented above was compared and their performance in estimating IARR in five bioclimatic regions in northern Algeria between 1960 and 2020 was analyzed. We used boxplots and a set of performance tests including R 2 [30], R 2 Adj [31], MAE, and RMSE [32] to compare the distribution and variability of the predicted and actual IARR data for each model. Regression graphs and residual analysis were applied to determine the degree of fit between the data, and the modeling performance obtained by MR and CART machine learning. The hybrid model dynamicity is described in the next section, and the t-test and the z-test are introduced to analyze the significance of differences between the actual and predicted values of IARR in the whole study area using the different estimation models. This analysis showed the importance of the aridity factor in the estimation of underground water in large surface areas. As the first step in this study, we used the best W and N parameters proposed by the Zhang and Yang model in each bioclimatic area. According to the literature, the range values for the two parameters are as follows: W ∈ [0.5, 2.5], and N ∈ [0.5, 2.5]. Figures 2 and 3 show the predicted data distribution of the Zhang and Yang model obtained values of W and N in boxplot form, which represent graphically the min, max, 1st Q, median, and 3rd Q values of each subset.
The results given in Figure 2 show that the best value of W of 0.5 was obtained in the very humid, humid, and semi-humid areas. In the Mediterranean and semi-dry region, the W parameter was 0.7; in each climatic area, the mean and median value of the predicted data series obtained for the best W was closer to that of the real series. Zhang's model provided a more divergent estimation when W was more than 0.7, where the predicted values were lower than the actual data. Table 2 shows the Zhang model performance using different W values in each climatic region. In the very humid and semi-humid areas, the R 2 and the R 2 Adj showed the greatest performance when W equaled 0.5; in addition, the MAE and the RMSE had minimum errors compared to the other cases. In the humid areas, the best W value was 0.5, as the R 2 Adj , MAE, and RMSE showed the best results, which were equal to 0.792, 8.981, and 10.757, respectively. In the Mediterranean and semi-dry floors, the R 2 and the R 2 Adj , MAE, and RMSE showed the best results when W equaled 0.7. For the Yang model, the best estimate was obtained when the parameter n was chosen as 1.5 in all the five climate areas (Figure 3), where the mean and median values given by the predicted series were closer to that of the measured data series. In all comparative cases, Yang's model gave values above the real data for n equals 0.5 and 1. In contrast, when the n parameter was greater than 1.5, the predicted values were lower than the real IARR. In the very humid areas, the real data series had greater variability compared to the predicted data, where the median was less than the mean in the real dataset, and the range between the 1st Q and the 3rd Q was greater than the quartile variation range of the predicted data. Table 3 shows that the maximum errors given by Yang's model, when compared with the results of each climatic region for n equals 1.5, were obtained in the very-humid region, where the MAE and RMSE were equal to 63.289 and 77.753, respectively. The R 2 and R 2 Adj parameters showed that Yang's model gave good performances in all northern Algeria areas for n equals 1.5 when compared with other values of n. However, the best performance was obtained in the semi-humid area, where the R 2 Adj equaled 0.934. In the Mediterranean region, the model performed less well, as demonstrated by an R 2 Adj equaling 0.508.   A pre-selection analysis of the best water balance models which were used to estimate the input data of the independent variable in the MR and CART machine learning model is presented in Figure 4 and Table 4. The figure shows a comparative graphical analysis of the predicted and actual data distribution using boxplots. In addition, Table 4 shows the results of the descriptive and performance tests of each model applied in each climatic area. The graphs show that in the very humid area, the IARR series obtained by the Schreiber, Yang, Sharif, and Zhang models gave a closer distribution to the real data when compared with the predicted data obtained by the Ol'dekop, Pike, and Budyko models. According to the graphical results, the data estimated by the Schreiber model was located below the actual data. The 1stQ, mean and median parameters showed that the best variability with real data was obtained by the Sharif, Yang, and Zhang models. Moreover, Table 4 shows that the best performance was given by the Sharif model, where the R 2 and the R 2 Adj equaled 0.775 and 0.757, respectively. The Schreiber, Yang, Sharif, and Zhang models gave a good estimate of IARR, where the MAE and the RMSE showed that the residual values were minimal compared to the other models. In the humid area, the four models showed the same behavior when using data observed in the very humid area. According to the graphs, the Schreiber and Zhang model provided the best data distribution with actual data. that it is preferable to select the Yang, Sharif, and Zhang models to estimate IARR with machine learning.  On the other hand, the Yang model was more efficient than the Schreiber model. However, Table 4 shows that the best performance was obtained by the Zhang  The performance analysis showed that the best R 2 , R 2 Adj , MAE, and RMSE were also given by the Zhang model. However, the error analysis showed that the set of models can be accepted to estimate the input data in the IARR modeling where the MAE and RMSE do not exceed 15. On the other hand, with the Ol'dekop, Pike, and Budyko models the error MAE and RMSE was significant, being greater than 40. In the Mediterranean area, the Schreiber model had drawbacks in IARR estimation, where the R 2 and R 2 Adj equaled 0.407 and 0.364, respectively. The data predicted by this model had the same distribution compared to the Ol'dekop, Pike, and Budyko estimations. In the four models, the interval of variation of values was too small compared to the actual data, where the error given by MAE and RMSE was more than 20 (Table 4). According to the table, the four models gave a biased estimation, where the R 2 showed values less than 0.45. In contrast, the graphs show that Sharif's model gave a very high data distribution. The 1st Q, mean, median, and 3rd Q parameters demonstrated that the Zhang model gave the best distribution. Moreover, the performance analysis showed that the Yang, Sharif, and Zhang models performed well in estimating the IARR, where the R 2 and R 2 Adj values obtained by the three models were more than 0.60. The MAE and RMSE showed that these gave minimal errors compared to other models, where the residual values were less than 13 (Table 4). In semi-dry areas, the 1st Q, mean, median, and 3rd Q parameters showed that the best distribution of predicted data, when compared with the actual values, was obtained by the Yang and Zhang models ( Figure 4 and Table 4). The Sharif model showed good variability and an estimate above the actual data series. According to Table 4, the R 2 and the R 2 Adj values showed that the best model was the Yang model, where the obtained errors were the minimum compared to the other models. The R 2 showed that all the models can perform; however, the MAE, RMSE, and the statistical criteria of data variability showed that it is preferable to select the Yang, Sharif, and Zhang models to estimate IARR with machine learning.

Proposed Method
The MR machine learning with the R 2 Adj criterion was applied on subsets (X i ) of the IARR predicted data which were obtained from the best non-parametric and empirical water balance models shown previously in Table 4 and Figure 4 on each climatic floor. The analysis steps and the graphical representation of this model were performed using the XLSTAT library, version 2018. The degree of fit between the predicted and the actual IARR data is shown in Figure 5 in the form of linear regression graphs. We have also graphically represented coefficients of each obtained trend model and the residuals standardized between the two series. According to the figure, the MR model showed the best performance compared to the water balance models selected previously. In the very humid area, the model showed a good adjustment of data where it belongs to the confidence range; moreover, the R 2 Adj of the MR model proved its reliability compared to the Schreiber, Yang, Sharif, and Zhang models, which was 0.8927 (Table 4). standardized between the two series. According to the figure, the MR model showed the best performance compared to the water balance models selected previously. In the very humid area, the model showed a good adjustment of data where it belongs to the confidence range; moreover, the R 2 Adj of the MR model proved its reliability compared to the Schreiber, Yang, Sharif, and Zhang models, which was 0.8927 (Table 4).  The model performed well when using the subsets obtained by the Schreiber, Yang, and Sharif models, where the standardized residuals were negligible being between −1.5 and 1.5 ( Figure 5). The trend model obtained in this region is given as follows: IARR VH MR = 19.16×IARR Schreiber − 24.31 × IARR Yang(n = 1.5) + 5.63 × IARR Sharif + 639.52 (13) In the humid areas, the MR model showed very good performance when using the input data (X i ) obtained by the Sharif and Zhang (w = 0.5) models, where the R 2 Adj equaled 0.8171 which showed the best performance compared to the selected water balance models given in Table 4. The model demonstrated the minimum errors, where the standardized residual values were [−2, 2]. In this case, the MR computational equation is as follows: However, in the semi-humid region, the subset selection criteria showed that the MR model gave the best performance when using the data obtained by the Zhang model, where the estimation equation given by this machine learning is as follows: IARR SH MR = 1.09 × IARR Zhang (w = 0.5) − 15.18 (15) In this region, the predicted data showed good similarity with the measured values, as shown by the R 2 Adj which equaled 0.9385. The use of the subsets which represented the predicted data obtained by the Yang, Sharif, and Zhang models in the MR model as an independent variable (X i ) showed very good performance in the Mediterranean region, which was given by an R 2 Adj equal to 0.7636. In the regression graph, the predicted and actual values showed a good fit; moreover, the standardized residual values indicated no trend ( Figure 5). The model equation is given as follows: IARR ME MR = 7.92 × IARR Yang (n = 1.5) + 0.43 × IARR Sharif − 7.77 × IARR Zhang(w = 0.7) − 33.14 (16) The predicted data series obtained by the Yang model in the semi-dry area proved to be the best subset that can be used in the MR model, in which the R 2 Adj parameter gave the best value, equaling 0.7038. Moreover, the residual analysis showed a better error distribution where most of the values were between −1 and 1. The model equation is defined as follows: IARR SD MR = 1.04 × IARR Yang (n = 1.5) − 4.41 (17) Figure 6 shows a nonlinear relationship between the predicted IARR and the aridity index (A_Index) data obtained by each water balance model which was applied in all the northern Algeria areas. In this study, the A_Index series was obtained by Equation (18). In addition, the Prd-IARR variable was substituted by the A_Index in the next step using the trend equation given by each model in Figure 6 to change the subset bounds of each child node (Tables 5 and 6). This last characterized each climatic region and can make it easy to read the interval bounds for each node since the values are classified from min to max according to the most humid region to the driest, respectively.

A_Index Zhang
IARR_Zhang (mm) Figure 6. Graphs of non-linear regression between the A-Index data series and predicted IARR, which were obtained by the set of water balance models used in the northern Algeria region. Adjusted coefficient of determination (R 2 Adj ).  This last characterized each climatic region and can make it easy to read the interval bounds for each node since the values are classified from min to max according to the most humid region to the driest, respectively. A_Index = IAE a IAR (18) In all cases, Figure 6 shows a good fit between the(A_Index) and the predicted IARR data series obtained by the non-parametric and empirical water balance models, where the R 2 Adj showed very good values which varied between 0.9511 and 0.9727. Moreover, the regression graphs showed good similarity between the data in which all the values fell within the confidence ranges.
The conceptual steps of the decision tree and the predicted IARR data classification used in the CART model in each climate area are detailed in Tables 5 and 6, where the set of parent and child nodes and the number of data (objects) used by each node is presented. The tables also present the set of estimation models which showed very good performance and a better classification of the independent variable (Prd-IARR) used in the CART nonparametric model. The conceptual results of the model in the very humid, humid, and semi-humid regions are given in Table 5. For the Mediterranean and semi-dry areas, the model structure is represented in Table 6. The Q parameter was obtained by multiplying A_Index values by 10 3 , which maintains the values classification and facilitates reading the bounds of each interval. It was also used in the formal algorithm of the model given in Table 7 to make it easier and more dynamic in application. Table 5 shows that in the very humid area, the CART model proposes a tree of two levels classified by the parent nodes numbered 1 and 4, obtained by the subset data given by the Zhang (w = 0.5) and Sharif models, respectively. In the humid zone, the Schreiber and Yang (n = 1.5) models showed very good performance, whereas the CART model showed a tree of two levels given by the parent nodes 1 and 2 in which 36.36% of the input data (X i ) accepted the values obtained by the Schreiber model that was defined as follows: Q ∈ [828.610, 835.011]. However, the subset Q ∈ [835.011, 860.960] accepted the Q value 844.45 as an optimum boundary to subdivide this class into two subsets obtained by the Yang model. In the semi-humid area, the CART model accepted only the data given by the de Schreiber model as a subset (X i ), where the proposed tree had only one level. Table 6 shows a tree of two levels given by the CART model in the Mediterranean area, where the Zhang (w = 0.7) and Sharif models showed the best data classification. Moreover, The Q subset of the Zhang model (Q ∈ [913.504, 923.161]) accepted another more efficient classification using Sharif's data. In the semi-dry area, the CART model showed that the data obtained from the Sharif model provided a better classification, in which the tree structure of this model is given on one level and eight child nodes ( Table 6). The application steps of the CART model used to estimate the IARR in each climatic area are shown in Table 7 as a formal algorithm; the set of rules and estimated values (IARR-CART) corresponding to each child node are shown in the table. The algorithm execution needs only to read the tree from the child node to the parent node. For example, in the very humid region, to check if the Q value that was obtained by the Zhang model (Q Zhang ) belongs to the interval [627.73, 723.36], it needs, as the first step, to check if there is another value Q obtained by the Sharif model (Q Sharif ), in which the Q Zhang and Q Sharif can verify the clause defined by node 5 or 6. Where the two-child condition cannot be verified, the model uses the parent condition to ensure the belonging of the Q Zhang value. In the end, the model gave the value 367.47 as the estimated result of IARR. The algorithm stops when the whole IARR series is estimated.
The hybrid model's performance, the degree of similarity between actual and predicted values, as well as the standardized residual analysis, are shown in Figure 7 for each climatic region. In the very wet area, the model showed excellent performance of data demonstrated by an R 2 Adj equaling 0.9452 when the data subset estimated by the MR and CART model were used as independent variables in the multiple regression model used by the MR-CART model. In this area, the equation used to estimate IARR is defined as follows: The hybrid model showed good performance in the humid region, where the R 2 Adj equaled 0.8748. In the regression graph, no trend was observed by the model; moreover, the standardized residuals values were negligible, being between −2 and 1. The trend equation obtained by this model that is used to estimate IARR is as follows: IARR H MR−CART = 0.4426 × IARR H MR + 0.60636 × IARR H CART − 6.1320 (20) In the semi-humid area, the subset selection criteria in the multiple regression model which was used to define the trend equation of the MR-CART model gave great importance to the dataset obtained by MR. The model showed a small improvement and good performance compared to previously applied machine learning in which all data showed a good fit in the regression graph, with all values falling within the confidence intervals. The MR-CART model equation is given as follows: In the Mediterranean area, the hybrid model showed very good performance compared to all the models previously applied, where the R 2 Adj equaled 0.8919, showing more than 10% performance improvement. In this area, the mathematical equation of the model is defined as follows: IARR ME MR−CART = 0.4467 × IARR ME MR + 0.6578 × IARR ME CART − 4.2848 (22) A small improvement in the estimated IARR was observed by the hybrid model compared to the MR model in the semi-dry climatic floor, where the R 2 Adj equaled 0.7193. The regression curve showed that no trend was given by this model. The equation is given as a function of the IARR MR and IARR CART variables, as follows: The comparative performance analysis of the three proposed models, which are MR, CART, and MR-CART is shown in Table 8 for each climatic region, where a set of statistical parameters was used to study the predicted IARR data distribution relative to the actual data. The results showed strong performance and good dynamicity of the hybrid model compared to the MR and CART model. In the very humid region, the R 2 and R 2 Adj showed that the greatest values were given by the MR-CART model, which equaled 0.9574 and 0.9452, respectively. On the other hand, the CART model performed better than the MR model. The variability analysis showed that the data series obtained by the hybrid model had a very similar distribution to the real data, where the SD given by the two series equaled 102.307 and 105.244, respectively. In the humid area, the hybrid model showed an improvement compared to the MR and CART models, where the R 2 and R 2 Adj equaled 0.886 and 0.875, respectively. In addition, the error for MAE and RMSE showed minimum values when compared to errors given by the other models. Moreover, the predicted data series obtained by the hybrid model showed a close variability to the real data, shown by an SD equal to 17.628.
The MR model was more efficient than the CART model showing a good distribution of data compared to the real values in the semi-humid area, which was given by an R 2 Adj and an SD equal to 0.9385 and 15.2901, respectively. On the other hand, the hybrid model had the best performance, as shown by an R 2 Adj equal to 0.949. The comparative study showed that the series obtained by this model had better variability compared with the real series. In addition, the RMSE and MAE errors showed that the MR-CART model gave minimal errors compared to the MR and CART models, respectively. The hybrid model also showed the best performance in the Mediterranean and semi-dry areas, as shown by R 2 Adj equaling 0.892 and 0.719, respectively. However, in the semi-dry region, the series obtained by CART showed a high level of similarity with the predicted data of the hybrid model, where the SD showed a close variability obtained from the two series equaling 7.153 and 7.214, respectively. In addition, the residual analysis given by RMSE showed values equaling 4.570 and 4.521, respectively. The application steps of the MR-CART model are shown in Figure 8 in the form of a flowchart that expresses the operating dynamism, beginning with the input data selection and estimation through to obtaining the final results. The model is divided into three basic sections, which are given in the figure by input data, check data and model estimation, and output result. According to the figure, the model uses the IAR and the IAE o as independent variables (X i ) in the Schreiber, Yang, Sharif, and Zhang models to estimate IAE a and Q. In the preprocessing step, the MR-CART model prepares the IARR predicted and Q subsets for the next step. At each treatment, the model checks the climatic characteristics of the measuring station using the spatial classification of the IAR interval to select the corresponding equation of the MR-CART model. The model searches for the best rule given by the CART model which can verify the suitability of the Q value to generate the IARR predicted value (IARR CART ). This last is used in the MR-CART equation. The process is recursive depending on the spatial sample size. Finally, the predicted dataset (IARR MR-CART) is given in the last section of the model as the final result.

Discussion
A performance comparison of the proposed models with the non-parametric and empirical water balance models used in this study is shown in Figure 9 and Table 9. The models were applied in the northern Algeria area without taking into account the data classification of each climatic level. This allowed us to compare the residual trend and the dynamicity of each model in the large areas. The performance tests and the spatial distribution of the predicted and actual data are shown in the form of radars and scattergram graphs (Figure 9). In addition, a set of parametric and non-parametric tests, which were the T-test [33], Z-test [34], F-test [35], sign test [36], and WSRtest [37] were applied to verify if there were significant differences in the means, variance, and distribution between the real and predicted data for each model. The results showed that the best per-

Discussion
A performance comparison of the proposed models with the non-parametric and empirical water balance models used in this study is shown in Figure 9 and Table 9. The models were applied in the northern Algeria area without taking into account the data classification of each climatic level. This allowed us to compare the residual trend and the dynamicity of each model in the large areas. The performance tests and the spatial distribution of the predicted and actual data are shown in the form of radars and scattergram graphs (Figure 9). In addition, a set of parametric and non-parametric tests, which were the T-test [33], Z-test [34], F-test [35], sign test [36], and WSRtest [37] were applied to verify if there were significant differences in the means, variance, and distribution between the real and predicted data for each model. The results showed that the best performance and distribution of predicted data compared to the actual values was obtained by the MR-CART hybrid model, with R 2 , R 2 Adj shown in the graphs equaling 0.9884 and 0.9883, respectively. Moreover, the RMSE and MAE errors obtained by the model showed the smallest values, equaling 10.501 and 5.478, respectively. According to the performance tests, the CART model was placed in the second position compared to the other models ( Figure 9). However, the scattergrams showed the model had drawbacks when compared to the real data distribution, as most of the predicted values obtained by the CART model were repetitive. Thus, it is more efficient to use the MR model. The latter showed good performance, as shown by R 2 and R 2 Adj equaling 0.9789 and 0.9787, respectively. In this study, all the non-parametric and empirical water balance models gave lower performance than the proposed models, where the R 2 and R 2 Adj were lower than 0.95. In addition, the RMSE and MAE showed that these models gave significant errors. Table 9 shows significant residuals were obtained by the Schreiber, Ol'dekop, Pike, and Budyko models, in which the parametric (e.g., t-test, z-test, f -test) and the non-parametric (e.g., sign test, WSR test) tests showed poor variability and data distribution; respectively, compared to actual data, where the p-values given by these tests were less than 0.05. On the other hand, Zhang's model gave good data estimation compared to the Yang and Sharif models, where the p-value obtained by each test was between 0.209 and 0.447. However, the predicted data series given by both models showed no significant difference in variability with the actual data series, where the t-test, z-test, and f -test results obtained for the two models showed p-values of more than 0.05.
In comparison, the data distribution of the two series was poor, with the sign test and the WSR test showing p-values less than 0.05. According to Table 9, the proposed models (MR, CART, and MR-CART) were the most efficient and no significant difference was observed compared to the real IARR series, the p-values given by all tests being more than 0.5. The best model remained MR-CART, in which all the tests showed the best results and the data series obtained had very good similarity with the real dataset. In addition, the p-values obtained for the sign test and the WSR test showed that it is preferable to use the MR model as a second choice. The latter had better data distribution compared to the CART model despite its performance shown in Figure 9. Table 9. Two sample parametric and non-parametric statistical tests used to compare variability and data distribution of real and predicted IARR that were obtained through a set of water balance models in the northern Algeria area.

Conclusions and Future Work
The rainfall-runoff estimation, using an inter-annual time scale in a large area which is characterized by great climatic diversity, suffers from the problem of finding a better dynamic model adaptable to the spatial variability and the climatic conditions of the region. There are several models, but most are classified as non-parametric and empirical for local application, or are conceptual and physical and are difficult to apply due to da-   Figure 9. Graphs of (a) scattergrams and (b) radars showing data distribution and performance of proposed models in Algeria's northern area. Coefficient of determination (R 2 ), adjusted coefficient of determination (R 2 Adj ), mean absolute error (MAE), root mean square error (RMSE).

Conclusions and Future Work
The rainfall-runoff estimation, using an inter-annual time scale in a large area which is characterized by great climatic diversity, suffers from the problem of finding a better dynamic model adaptable to the spatial variability and the climatic conditions of the region. There are several models, but most are classified as non-parametric and empirical for local application, or are conceptual and physical and are difficult to apply due to dataset availability problems (such as vegetation index and watershed storage capacity). In this work, MR and CART machine learning was used to propose a dynamic model based on IARR predicted data as input data, obtained by a set of the most efficient water balance models in each climatic class in which both models applied the selection criteria to the input data subsets to give the best estimation. The experimental part of the modeling was applied in the northern Algeria area which is characterized by very humid, humid, semi-humid, Mediterranean, and semi-dry climates. A comparative study between water balance models in each climate floor showed that the Yang, Sharif, and Zhang models performed better throughout the northern Algeria area. It was shown that the choice of Yang's parameter (n) equaled 1.5 giving the best performance in all the study areas. However, Zhang's model showed excellent performance in the very humid, humid, and semi-humid areas when w equaled 0.5. Furthermore, the model gave good reliability in the Mediterranean and semi-dry areas when w equaled 0.7. In addition, the Schreiber model showed good performance in the very humid, humid, and semi-humid regions, where the R 2 Adj varied between 0.667 and 0.928. In the five climatic classes, the performance analysis showed that the MR and CART model was more reliable compared to the water balance models used above, where, in the very humid region, the R 2 Adj showed good performance for both models, shown by values of 0.8927 and 0.9101, respectively. This performance was also obtained in the humid region, where the R 2 Adj equaled 0.8171 and 0.8450, respectively. In the semi-humid floor, the MR and CART model showed a small improvement compared to the previous models, where the input data subsets used in the two models were obtained by Zhang and Schreiber, respectively. In the Mediterranean and semi-dry areas, both machines showed a better performance as given by an R 2 Adj equal to (0.7636, 0.7038) and (0.8382, 0.7137), respectively. The aridity data series (A_Index) showed good similarity with predicted data which was obtained by all the water balance models cited above, where the R 2 Adj had values more than 0.95. This dataset was used by the CART model to generalize the data classification of each child node in the formal algorithm of the model. The MR model showed a better distribution of data compared to that obtained for the CART model, where the p-values for the sign test and the WStest equaled (0.773, 0.705) and (0.326, 0.335), respectively. According to the performance tests, the MR-CART hybrid model showed the best performance, where the R 2 Adj had values between 0.793 and 0.989 in the five climatic classes, and 0.9883 in the northern Algeria region. In addition, the parametric and non-parametric tests (i.e., t-test, z-test, f -test, sign test, and WSRtest) showed that the hybrid model was dynamic and gave better variability and data distribution compared to the real data series, in which the p-values obtained by all the tests were between 0.7193 and 0.989. Future work will seek to develop a forecasting model to estimate inter-annual rainfall runoff (IARR) using continuous and discontinuous hydro-climatic datasets. We would also like to observe the effect of the climatic indices on the spatial estimation of IARR.
Author Contributions: All authors of this manuscript have directly participated in the planning, execution, and analysis of this study. A.A. worked on the hybrid model proposal, statistical analysis, and model comparison; A.L. worked on machine learning performance, validation part, and work supervision; I.K. worked on data collection, pre-processing, and mapping; K.M. worked on hydrological concepts, validation part, and work supervision. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.