Forecasting of SPI and SRI Using Multiplicative ARIMA under Climate Variability in a Mediterranean Region: Wadi Ouahrane Basin, Algeria

Water resources have always been a major concern, particularly in arid and semiarid parts of the world. Low precipitation and its uneven distribution in Algeria, along with fast population and agriculture activity increase and, particularly, recent droughts, have made water availability one of the country’s most pressing issues. The objectives of the studies reported in this article are to investigate and forecast the meteorological and hydrological drought in Wadi Ouahrane basin (270 km2) using linear stochastic models known as Autoregressive Integrated Moving Average (ARIMA) and multiplicative Seasonal Autoregressive Integrated Moving Average (SARIMA). In particular, data from 6 precipitation stations and 1 hydrometric station for the period 1972–2018 were used to evaluate the Standardized Precipitation Index (SPI) and the Standardized Runoff Index (SRI) for 12 months. Then, the multiplicative ARIMA model was applied to forecasting drought based on SPI and SRI. As a result, the ARIMA model (1,0,1)(0,0,1)12 for SPI and (1,0,1)(1,0,1)12 for SRI were shown to be the best models for drought forecast. In fact, both models exhibited high quality for SPI and SRI of 0.97 and 0.51 for 1-month and 12-month lead time, respectively, based on validation R2. In general, prediction skill decreases with increase in lead time. The models can be used with reasonable accuracy to forecast droughts with up to 12 months of lead time.


Introduction
Drought is a recurring temporary natural phenomenon that generally results from a decrease in precipitation relative to its long-term average and can occur in any climate [1]. However, an exact drought definition is not simple, as different drought types exist, which can be defined based on the several hydrometeorological variables related to drought, including precipitation, soil moisture, river flow, water level, and groundwater level. For example, to distinguish meteorological drought the precipitation variable is used, to distinguish agricultural drought the soil moisture variable is used, and other hydrological cycle variables are used to diagnose hydrological drought [2,3]. Meteorological drought is known as the source of other types of droughts. Hydrological and agricultural droughts arise from meteorological droughts [4].
In order to study this complex phenomenon, drought indicators are considered as a key to quantifying the definition of drought in the implementation of drought programs [5,6]. One widely used drought index was presented by Palmer in 1965 (Palmer Drought Stress Index or PDSI) and included previous precipitation as well as modeled soil moisture storage and evapotranspiration [7]. One of the often used indices is SPEI (Standardized Precipitation Evapotranspiration Index). This index is based on a standardized monthly climatic balance calculated as the difference between the cumulative precipitation and the potential evapotranspiration; the latter is estimated based on temperature and other meteorological variables [8][9][10][11]. McKee et al. [8] developed the Standardized Precipitation Index (SPI) and applied it to the Colorado area. SPI has several advantages over the Palmer Drought Stress Index [8,12]. The first advantage is that SPI is based on precipitation only. For this reason, SPI is not affected by soil type. Second, the SPI can be calculated at different time scales, which allows it to describe different types of droughts: shorter time scales for meteorological and agricultural droughts, and longer scales for hydrological droughts [13]. Third, due to the normal distribution of SPI, droughts are expressed on a scale that can be used for any location and time duration. Moreover, Hayes et al. [14] argued that SPI detects a moisture deficit faster than PDSI. Paolo et al. [15] used several drought indicators in Portugal and concluded that SPI is superior to other indicators for drought monitoring. Moreover, for hydrological drought, an index similar to the standardized precipitation index called the Standardized Runoff Index (SRI) can be used, which is based on monthly flow [16].
Time series methods provide an important approach in drought forecasting. One of the most widely used time series models is the Autoregressive Integrated Moving Average Model (ARIMA) [17]. The widespread use of the ARIMA model is due to its flexibility and possibility for systematic search (identification and estimation) at each stage of a model [18]. The ARIMA model has several advantages over other models, in particular its predictability and richer information about changes over time [19]. ARIMA models have also been used to analyze and model hydrological time series [20]. Mishra and Desai [21] used the ARIMA and seasonal ARIMA (SARIMA) models to predict the SPI time series in the Kansabati River basin in India. The results show reasonably good agreement between the observed and predicted data, 1-2 months ahead. The root mean square error and mean average error value increases with increase in lead time. The models could be used to forecast droughts with up to 2 months of lead time with reasonable accuracy. Fernandez et al. [22] used the SARIMA model to predict flow in a small basin in Galicia, Spain. Durdu [23] developed linear stochastic models to predict meteorological drought in Turkey using the SPI series. Bazrafshan et al. [24] used multiplicative ARIMA to predict seasonal and monthly hydrological drought. The results showed that the forecasts at seasonal scale had less error. Concerning the Mediterranean region, the ARIMA model was used in Lebanon to forecast temperature, precipitation, and then drought projections [25]. Finally, the authors concluded that the method can serve as a tool for proactive climate change mitigation or adaptation plans. Shatanawi et al. [26], based on calculation preformed in the Jordan River basin, showed that ARIMA models can be used to forecast the future drought trends.
As mentioned above, drought, which is characterized by high spatial and temporal variability [27], has significant environmental, social, and economic consequences [28]. In particular, the investigation of both meteorological and hydrological drought is important for water management and early warning and mitigation at basin scale [29]. Within this context, the objective of this study is to use the multiplicative ARIMA (and SARIMA) model for meteorological and hydrological drought forecasting in the Wadi Ouaharane watershed located in Algeria. The ARIMA model was tested in this region, which experiences highly variable precipitation and runoff. In the knowledge of authors, the method has not yet been used to forecast drought in Algeria.
For this study, we calculated time series of SPI and SRI at the 12-month time scale and developed stochastic models to forecast and simulate these time series.

Study Area and Data
The study area is the Wadi Ouahrane basin of northern Algeria, which is located between 36 • 00 N and 36 • 24 N and between 01 • 00 E and 01 • 30 E. It is a small tributary of Wadi Cheliff, extending over 270 km 2 with a maximum altitude of 991 m and a minimum altitude of 165 m ( Figure 1). The Wadi Ouahrane basin is limited to the east by the basin of Wadi Fodda, to the west by the Wadi Ras basin, to the north by the Wadi Allala basin, and to the south by the Wadi Sly basin. The basin is monitored by six pluviometric stations and one hydrometric station ( Figure 1). It is influenced by the Mediterranean climate, with an interannual average precipitation of 333 mm during the period 1972-2018. The mean annual temperature is 18 • C. The Ouled Farès sector receives a rainfall that varies from 207 to 628 mm, which is located below 200 m altitude. It occupies nearly 40% of the basin's area. The Benairia sector, located at more than 350 m altitude, receives an average annual rainfall that varies between 234 and 749 mm. This sector covers about 60% of the basin. The Wadi Ouahrane basin is defined by impermeable marl bedrock that covers 80% of the basin's surface. In contrast to the formations in the southern part of the basin, which are composed of conglomerate and red sand and have an average permeability, these soft lithological strata in the northern half of the basin are continually subjected to significant water erosion. The primary agricultural activities in the wadi Ouahrane watershed, in terms of land usage, are mixed farming and cereal cultivation [30]. The Köppen-Geiger classification [31] identifies the climate of the basin as a hot-summer Mediterranean climate, thus presenting relatively mild winters (with rain) and very hot summers (often very dry). With this climate, the coldest month generally averages above 0 • C, at least 1 month's average temperature reaches values higher than 22 • C, and at least 4 months average above 10 • C. Concerning rainfall, in the hot-summer Mediterranean climate, rainfall in the wettest month of winter averages over three times that in the driest month of summer, which receives less than 30 mm.
The precipitation data used in this study were collected at 6 stations ( Figure 1, Table 1) at the monthly scale in the period from 1972 to 2018. These precipitation data are taken from the Algeria National Agency of Meteorology and Hydrology (ANRH) and the National Office of Meteorology (ONM). Monthly runoff data from 1972 to 2018 were also collected from ANRH. In this study, the Thiessen polygons method was applied to compute the monthly areal mean basin precipitation. Weights for the 6 stations were computed based on the Thiessen polygons method ( Figure 1). To construct the SPI, this monthly weighted mean precipitation was used to approximate the basinwide amount.

Standardized Precipitation (Runoff) Index
As mentioned above, to study meteorological drought, several indicators have been developed, of which the standardized precipitation index (SPI), first developed by McKee et al. [8], is particularly widely used. For hydrological drought, an index similar to the SPI can be developed as the standardized runoff index (SRI), which is based on monthly mean streamflow and was first proposed by Shukla and Wood [16]. The computational principles of these indices are presented in Figure 2. To forecast the SPI and SRI indices, the commonly known ARIMA model was used [25]. All steps of stochastic modeling were performed in R software in version 4.1.2. When the time series is stationary, the basic form of ARMA models for a standardized normal variable Z t can be used [32]: where Z t is the observed series, ϕ is the polynomial of order p, and θ is the polynomial of order q. If a time series is nonstationary, it can be made stationary using the difference operator. Using the dth difference of the time series and modeling it with ARMA (p,q) leads to the emergence of a new set of nonseasonal autocorrelated cumulative moving average models (ARIMA (p,d,q)). The basic form of the nonseasonal ARIMA model follows [32]: where ϕ and θ are polynomials of order p and q, respectively, and B is the lag operator that returns previous values of Z t .
In the case of using the seasonal difference operator with delay w and their fitting with ARMA models (p, q), ARIMA seasonal models (P, D, Q) w are created. A combination of seasonal and nonseasonal models forms the so-called multiplicative ARIMA models. The basic form of this model follows [33]: where p is the order of the nonseasonal AR model, P is the order of the seasonal AR model, q is the order of the nonseasonal MA model, Q is the order of the seasonal MA model, w is the season length, ε is a random variable (net perturbation), and B is the lag operator The development of time series models includes three stages: identification, parameter estimation, and a model adequacy test [17]. In the identification step, the general form of the model is determined according to the time series behavior. In the parameter estimation stage, the parameters of the model selected in the previous stage are calculated by a statistical method such as the momentum method or least squares and maximum likelihood. Finally, in the model adequacy test stage or diagnostic check, the selected model is used to model the time series and criteria such as the normality test and the independence of the residual are measured.

Model Development
The first step is the model identification. At this stage, the time series correlation structure is determined by testing the autocorrelation function (ACF) and partial autocorrelation (PACF) [17]; this information is then used to determine the general form of the univariate model. According to the number of significant steps in each of the ACF and PACF diagrams, the range of order changes of the models can be obtained. Finally, by combining them, the candidate models can be found. The model with the lowest Akaike information criterion (AIC) and Schwarz Bayesian criterion (SBC) is selected as the best model. The mathematical formulas of AIC [34] and SBC [35] follow: where m = (p + q + P + Q) is the number of estimated parameters and L is the likelihood function of ARIMA models. After the model identification step, the parameters of each selected model were calculated. Preliminary estimates of the parameters were calculated from the ACF time series obtained at the identification stage.
The parameter estimation is followed by the diagnostic check. In this step, first the residual series is obtained from the difference of the observed time series and the modelpredicted time series fit, then two validation tests are performed on the residues: (i) the white noise ACF and PACF of residuals and (ii) the normal probability of residuals and Kolmogorov-Smirnov statistics of residuals. Concerning the first test, if the ACF and PACF values of the residual series are consistent with zero within the confidence range, there is no significant correlation between the residues, and the residues are pure perturbation [18]. With respect to the latter test, the Kolmogorov-Smirnov (K-S) statistic is used to test the normality of residuals from different sets of models of the fit of data, as follows: where D is the maximum deviation, F(x) is the completely specified theoretical cumulative distribution function under the null hypothesis, and F(x) is the sample cumulative density function based on n observations. For a chosen significance level α, if D is greater than the critical value D tab , the null hypothesis related to normality is rejected for the chosen level of significance.

Kappa (κ)
Comparing the SPI (SRI) classes in the observed and predicted series, the disagreement between the mild drought and the moderate drought categories is not as great as the disagreement between the mild drought and the severe drought categories. Therefore, by considering certain weights for each case of disagreement, it is possible to make a more accurate comparison of the classes in the observed and predicted series. In this paper, we used kappa statistics [36][37][38] to comparison observed and predicted series.
The significant test statistic of the weighted kappa, statistic with the null disagreement hypothesis is as follows [37]: where n is the number of series observations.

Model Validation
To evaluate the accuracy of the hydrological and meteorological drought forecasts and forecasting performance of models at the validation stage, results were compared based on the correlation coefficient (R 2 ), Mean Absolute Error (MAE), and Root Mean Square Error (RMSE) [20,39]. Figure 3 shows a flowchart of the proposed method for drought forecasting. All modeling steps were performed using R software, version 4.1.2.

Assessment of Drought Based on SPI and SRI
In this study, SPI and SRI time series on a 12-month time scale were calculated using monthly precipitation and runoff data over a 45-year period (1973-2017). A 12-month duration of drought is one of the most common among other time scales in the studied watershed [40,41]. Moreover, in larger catchments, there is a strong relationship between meteorological and hydrological droughts if analyses are conducted for a more extended accumulation period [42,43].
According to Li et al. [44], the SPI is highly sensitive to onset of meteorological droughts, because meteorological droughts usually occur after the continuous lack of precipitation, but the SRI is effective in identifying the duration and termination of drought. Figure 4 shows the changes of SPI-12 and SRI-12 in the study area. The two variables follow an almost similar trend, and runoff and precipitation show a severe drought in the late 1980s and early 1990s, and generally dry conditions since the late 1990s. Time lags appear between SPI and SRI due to the delayed response of runoff to precipitation variability [45]. It can be therefore be observed that the time of the onset of hydrological drought tends to be later than the meteorological drought in most of the drought events. According to Sattar and Kim [46], the onset time of hydrological drought lags behind meteorological drought. This situation is similar to the one presented in this study. Data from 1973 to 2010 were used for model development and 2011 to 2017 for validation, and the three modeling steps were performed on each series. Climate 2021, 9, x FOR PEER REVIEW 10 of 21

Assessment of Drought Based on SPI and SRI
In this study, SPI and SRI time series on a 12-month time scale were calculated using monthly precipitation and runoff data over a 45-year period (1973-2017). A 12-month duration of drought is one of the most common among other time scales in the studied watershed [40,41]. Moreover, in larger catchments, there is a strong relationship between meteorological and hydrological droughts if analyses are conducted for a more extended accumulation period [42,43].
According to Li et al. [44], the SPI is highly sensitive to onset of meteorological droughts, because meteorological droughts usually occur after the continuous lack of precipitation, but the SRI is effective in identifying the duration and termination of drought. Figure 4 shows the changes of SPI-12 and SRI-12 in the study area. The two variables follow an almost similar trend, and runoff and precipitation show a severe drought in the late 1980s and early 1990s, and generally dry conditions since the late 1990s. Time lags appear between SPI and SRI due to the delayed response of runoff to precipitation variability [45]. It can be therefore be observed that the time of the onset of hydrological drought tends to be later than the meteorological drought in most of the drought events. According to Sattar and Kim [46], the onset time of hydrological drought lags behind meteorological drought. This situation is similar to the one presented in this study. Data from 1973 to 2010 were used for model development and 2011 to 2017 for validation, and the three modeling steps were performed on each series.  Figure 5 shows the ACF and PACF correlograms of the SPI-12 and SRI-12 time series. Owing to the ACF fluctuations in SPI-12 (Figure 5a), the time series is a combination of exponential and sinusoidal damping waves, and lagged correlations are significant in the first 20 time steps. Therefore, an MA model can be proposed. In the case of its PACF (Figure 5b), it is significant in the first step, which can suggest an AR model. In addition, the PACF is significant at 12 and 24 months, so the series has a 12-month period indicative of an annual cycle in precipitation. Therefore, a combination of AR and MA models as a multiplicative SARIMA model is proposed for modeling this series. The same applies to  Figure 5 shows the ACF and PACF correlograms of the SPI-12 and SRI-12 time series. Owing to the ACF fluctuations in SPI-12 (Figure 5a), the time series is a combination of exponential and sinusoidal damping waves, and lagged correlations are significant in the first 20 time steps. Therefore, an MA model can be proposed. In the case of its PACF (Figure 5b), it is significant in the first step, which can suggest an AR model. In addition, the PACF is significant at 12 and 24 months, so the series has a 12-month period indicative of an annual cycle in precipitation. Therefore, a combination of AR and MA models as a multiplicative SARIMA model is proposed for modeling this series. The same applies to the SRI-12. ACF (Figure 5c) is attenuated as a combination of exponential and sinusoidal waves, and in the first 10 time steps, it is significant that an MA model with different orders can be proposed, and PACF (Figure 5d) in steps 1, 2, and 3 means that an AR model can be proposed with degrees 0, 1, 2, and 3. It also has a periodicity of 12 and 24, which indicates that the model is seasonal. Therefore, this variable is also a multiple (seasonal) ARIMA model. According to the number of significant steps in each of the ACF and PACF diagrams, the range of order changes of the models can be obtained and finally, by combining them, the candidate models can be obtained. A model with the least AIC and SBC is selected as the best model. All SPI-12 and SRI-12 time series models were identified and the best model for each time series based on AIC and SBC was obtained from more than 860 candidate models. The same model was selected as the best by Han et al. [47] to drought forecasting in the Guanzhong Plain. Table 2 shows an example of several models, parameters, and criteria of AIC and BIC. The standard error calculated for each of the model parameters was generally small compared to the parameters. Therefore, these parameters can be used in modeling. Then, validation tests were performed on the residues to evaluate the adequacy of the selected model, which are ACF and PACF residuals and residual histogram and normal residual probability. Figure 6a,b shows ACF and PACF residues for SPI-12 time series and Figure 6c,d shows SRI-12 for two selected ARIMA models (0,1,0)(0,1,1) 12 and ARIMA (2,0,1)(1,0,1) 12 , respectively. According to the figures, it can be seen that most of the ACF and PACF values in both indices are within the confidence limit. Therefore, there is no significant correlation between the residuals; the residuals are white noise.  Figure 7 shows the residual histogram and the normal probability for the SPI-12 and SRI-12 time series. According to Figure 7a,b, it can be seen that the residuals of both time series follow the normal distribution, so the residuals are white noise and the cumulative distribution plot of the residuals is a straight line when drawn on normal probabilistic paper. Therefore, the residuals follow the normal distribution.

Drought Forecasting Using Selected Models
For validation, the time series from 2011 to 2017 was used for SPI-12 and SRI-12. In Figure 8a,b, comparison of the observed and predicted time series shows the two drought indicators. As the results show, there is good agreement between the two series. That is, the time series model was able to simulate drought indices well. The main statistical features were compared between the observed and predicted data using Z test for mean and F test for standard deviation. The results are presented in Table 3.
The results of the mean test (Z test) indicate that the null hypothesis, that the means are equal, is accepted for the time series SPI-12 and SRI-12 at a significance level of 95%; therefore, the stochastic model maintains the observed series average well. The results of the variance comparison test (F test) indicate that the null hypothesis, that the variances are equal, is accepted at the 95% significance level for the time series studied. Therefore, there is no significant difference between the observed and predicted values of variance.
We also calculated weighted kappa in drought classes. Based on the results shown in Table 4, there is good agreement between observed in drought classes and predicted (K > 0.7).  The forecast was performed for SPI-12 and SRI-12 from 1 to 12 month lead times using the best model. Table 5 shows R 2 , RMSE, and MAE values between observed data and predicted data using the selected best model for all time series. Results show that with a longer lead time, the coefficient of correlation decreases and error coefficient increases between observed and predicted data. The best models selected from the multiplicative ARIMA approach using a time series data of SPI-12 and SRI-12 series can be used for drought forecasting at whatever lag is desired.

Discussion
Generally, the results show that SPI and SRI indicators have a similar behavior and SRI tends to occur later than SPI indicators for particular events. In this study, the ARIMA model was used to forecast meteorological and hydrological drought. The popularity of the ARIMA model in many areas is due to its flexibility and the systematic searching at each stage (identification, estimation, and diagnostic check) for an appropriate model [18]. Quite visible differences in the behavior of ACF and PACF between SPI and SRI can be explained by mechanisms influencing the two forms of drought.
Comparison of multiplicative SARIMA models used in meteorological and hydrological droughts shows that increasing the forecast lead time (from 1 to 12 months) increases the error rate. The reason for this is the cumulative error from the previous steps, which spreads to the next steps [21]. It can be concluded that stochastic models provide good results in the case study only for the next 12 months; in other words, it is better to use this linear statistical approach to forecast only the next 12 monthly steps ahead to prevent anomalies in the results, and for more than 12 months to consider using other models.
Meteorological drought in this region is determined by a high variability of precipitation and is influenced by the El Niño Southern Oscillation (ENSO) [48], Mediterranean Oscillation (MO) [49], and Western Mediterranean Oscillation (WeMO) [50,51]. The negative phase of the NAO dominated between 1940 and 1980, corresponding to a period when precipitation was above normal and thus SPI generally had positive values in the Mediterranean basin [52]. More recently, drier conditions dominated, corresponding to commonly negative SPI values (Figure 4). These teleconnections with climate indices offer additional possibilities for drought forecasting approaches.
Considering the hydrological drought, expressed by SRI, a lag time between the occurrence of meteorological and hydrological drought is visible. It is mainly visible in Figure 8, where negative SRI are observed after negative SPI. The time buffering is influenced by catchment properties, including land cover, geological properties, and human activities. The ARIMA model (1,0,1) (0,0,1) for the SPI and (1,0,1) (1,0,1) for the SRI model were found as the best models to predict drought. Moreover, the models help to identify seasonality of SPI and SRI indicators that is strongly dependent on the precipitation seasonal variability.
The ARIMA model has some limitation, as it is not suitable for nonlinear relationships, especially in a complex and dynamic problems [53], and is incapable of fully capturing nonlinear patterns hidden within a time series. These limitations are less of a problem for forecasting SPI and SRI, which are transformed to follow a standard normal distribution, as compared to precipitation and streamflow, whose distribution is highly non-normal.
Our results are broadly consistent with those of studies from other regions of the world that tried to use the ARIMA model to predict meteorological and hydrological drought. Karimi et al. [54] detected that an ARIMA model can be used to forecast SPI index in 1, 3, 6, 9, 12 monthly time scales in the Karkheh River watershed. Han et al. [47] describe results of drought forecasting where the ARIMA model linked with remote sensing was used to analyze change of the Vegetation Temperature Condition Index (VTCI) in the Guanzhong Plain. Finally, they conclude that the ARIMA model gave accurate results in forecasting the index. In contrast, Beyaztas and Yaseen [55] showed that Functional Principal Component Analysis (FPCA) gave better results compared to the ARIMA model when predicting the Palmer Drought Severity Index (PDSI).

Conclusions
Drought forecasting is an important issue, especially in arid regions and under climate change. Many techniques are used to forecast drought, but one of the more popular is the ARIMA model. In this work, a multiplicative ARIMA model was used for meteorological and hydrological drought forecast in the Wadi Ouaharane watershed located in Algeria. Moreover, the model can be used in other regions and different types of catchments because parameters of the ARIMA model are optimized based on time series. The model does not have parameters that are linked with catchment features; instead, the model only requires time series characteristics. To use the ARIMA model, a long time series with constant time steps is necessary.
Drought was indicated by the SPI representing meteorological drought and the SRI for hydrological drought. Results showed that despite some similarity between the courses of the two indices, time lags appear between SPI and SRI due to the delayed response of runoff to precipitation variability. Analyses showed that the ARIMA model (1,0,1) (0,0,1) for the SPI and (1,0,1) (1,0,1) for the SRI were found as the best models to predict drought. Based on R 2 , both models had good quality; for the SPI and the SRI, R 2 was equal to 0.96 and 0.97, respectively, at 1-month lag. Finally, we conclude that the seasonal ARIMA model can be used to forecast meteorological and hydrological drought indices in arid regions.