Prediction of Multi-Scalar Standardized Precipitation Index by Using Artificial Intelligence and Regression Models

Abstract: Accurate monitoring and forecasting of drought are crucial. They play a vital role in the optimal functioning of irrigation systems, risk management, drought readiness, and alleviation. In this work, Artificial Intelligence (AI) models, comprising the Multi-layer Perceptron Neural Network (MLPNN) and the Co-Active Neuro-Fuzzy Inference System (CANFIS), and a regression model, Multiple Linear Regression (MLR), were investigated for multi-scalar Standardized Precipitation Index (SPI) prediction in the Garhwal region of Uttarakhand State, India. The SPI was computed on six different scales, i.e., 1-, 3-, 6-, 9-, 12-, and 24-month, using the monthly rainfall records of the available years. The significant lags used as inputs for the MLPNN, CANFIS, and MLR models were obtained with the Partial Autocorrelation Function (PACF) at a significance level of 5% for SPI-1, SPI-3, SPI-6, SPI-9, SPI-12, and SPI-24. The multi-scalar SPI values predicted by the MLPNN, CANFIS, and MLR models were compared with the calculated SPI at each time scale through different performance evaluation indicators and visual interpretation. The results indicated that CANFIS was more reliable for drought prediction at Dehradun (3-, 6-, 9-, and 12-month scales), Chamoli and Tehri Garhwal (1-, 3-, 6-, 9-, and 12-month scales), Haridwar and Pauri Garhwal (1-, 3-, 6-, and 9-month scales), Rudraprayag (1-, 3-, and 6-month scales), and Uttarkashi (3-month scale) stations. The MLPNN model was best at Dehradun (1- and 24-month scales), Tehri Garhwal and Chamoli (24-month scale), Haridwar (12- and 24-month scales), Pauri Garhwal (12-month scale), Rudraprayag (9-, 12-, and 24-month scales), and Uttarkashi (1- and 6-month scales) stations, while the MLR model was found to be optimal at Pauri Garhwal (24-month scale) and Uttarkashi (9-, 12-, and 24-month scales) stations.
Furthermore, the modeling approach can foster a straightforward and trustworthy expert intelligent mechanism for projecting multi-scalar SPI and for decision making on remedial arrangements to tackle meteorological drought at the stations under study.


Introduction
Drought is a critical ecological issue affecting the Earth and its inhabitants. It refers to water scarcity that severely impacts different segments of society, such as hydropower, agriculture, industry, and water supply. The key factors for assessing drought severity include its duration, its location in absolute time (i.e., start and end points), its areal coverage, and its scale or intensity [1]. Wilhite and Glantz [2] used theoretical and operational definitions to describe drought. The theoretical definition generally expresses drought as a shortage of precipitation that harms crops and harvests, and it is vital for setting up a drought policy. The operational description (Wavelet-Support Vector Machine), and CS-SVM (Cuckoo Search-Support Vector Machine) for drought estimation based on SPI in Urmia Lake watershed, Iran. The performance of the WSVM model was better compared to the other models. Abbasi et al. [17] studied the meteorological drought in Urmia Lake (Iran) based on multi-scalar SPI and SPEI, with projections made using the Gene Expression Programming (GEP) model. Drought occurred during 1959-1967 and 1998-2009. The observations indicated that the superiority of the GEP model increased as the time scale of SPI and SPEI increased.
In a related context, Memarian et al. [18] applied the CANFIS model to forecast drought in Birjand (Iran) using a combination of climatic signals, i.e., NINO 1 + 2, NINO 3, Multivariate ENSO Index, Tropical Southern Atlantic Index, Atlantic Multi-decadal Oscillation Index, NINO 3.4, and lagged values of SPI. The results highlighted the feasibility of the CANFIS model for drought forecasting over the study region. Rafiei-Sardooi et al. [19] employed neuro-fuzzy (NF) and time-series (ARIMA) models to predict meteorological drought in the Jiroft plain of Iran using 3- and 12-month SPI. The results demonstrated that the NF model outperformed the ARIMA model. Malik et al. [20] predicted meteorological drought in the Kumaon region of Uttarakhand State (India) by employing the CANFIS, MLPNN, and MLR models with SPI-1, SPI-3, SPI-6, SPI-9, SPI-12, and SPI-24. The results were evaluated with performance measures (i.e., RMSE, NSE, COC, and WI) and revealed that the CANFIS models provided better estimates than the other models at the different study stations. To our knowledge, the efficacy of the MLPNN, CANFIS, and MLR models for meteorological drought projection using the Standardized Precipitation Index (SPI) has not yet been investigated in the Garhwal region of Uttarakhand State, India.
In this paper, the analysis was carried out using monthly rainfall data at the Chamoli, Dehradun, Haridwar, Pauri Garhwal, Rudraprayag, Tehri Garhwal, and Uttarkashi stations with three objectives: (1) to calculate the SPI at various time scales (1-, 3-, 6-, 9-, 12-, and 24-month) from the available rainfall records; (2) to formulate the MLPNN, CANFIS, and MLR models for meteorological drought projection based on inputs selected through PACF analysis of the multi-scalar SPI; and (3) to standardize and corroborate the AI and regression models for prediction of multi-scalar SPI values using visual and statistical indicators.

Study Area and Data Assembly
This study was carried out in the Garhwal region of Uttarakhand State (India), situated in the central Himalayan zone and spanning 32,499 km² (Figure 1) between 77°34′ E and 81°03′ E longitudes and 28°43′ N and 31°28′ N latitudes, with altitudes varying from 276 m to 5046 m above mean sea level. It borders Uttar Pradesh in the south, Himachal Pradesh in the northwest, and China and Nepal in the northeast and southeast. Uttarakhand has 13 districts divided into two administrative divisions: (1) Garhwal, which comprises seven districts (Chamoli, Dehradun, Haridwar, Pauri Garhwal, Rudraprayag, Tehri Garhwal, and Uttarkashi) and (2) Kumaon, which comprises six districts (Almora, Bageshwar, Champawat, Nainital, Pithoragarh, and Udham Singh Nagar). The state has a temperate climate, except in the plains, where it is tropical. The temperature varies from sub-zero to 43 °C. The yearly rainfall ranges between 260 mm and 3955 mm, of which 60% to 85% occurs during the monsoon season (June to September).
For estimating meteorological drought, seven rain gauge stations were used, as shown in Figure 1. The longitude, latitude, altitude, and data availability of each station are presented in Table 1. The data used in this study consisted of monthly rainfall records for the seven stations, obtained from the India Meteorological Department (IMD), Pune.

Standardized Precipitation Index (SPI)
McKee et al. [6] at Colorado State University formulated the SPI to describe, monitor, and examine droughts on various time scales. A detailed description of the SPI was presented by Guttman [21] and Hayes et al. [22]. The calculation of SPI for a particular time scale at any location requires long-term (≥30 years) monthly precipitation records. In general, the SPI is computed by fitting a statistical probability distribution to the rainfall aggregated over the time scale of interest. This is done separately for every month and every location. The SPI is then obtained by transforming the fitted probability distribution into the standardized normal distribution (Z-distribution). Many studies have examined rainfall distribution using different probability distributions [23,24].
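In this research, the two-parameter gamma distribution [24-26] was used to compute the SPI by using Equation (1):

$$g(x) = \frac{1}{\beta^{\alpha}\,\Gamma(\alpha)}\, x^{\alpha - 1} e^{-x/\beta}, \quad x > 0 \qquad (1)$$

where $\alpha > 0$ is the shape parameter, $\beta > 0$ is the scale parameter, $x$ is the aggregated rainfall amount, and $\Gamma(\alpha)$ is the gamma function.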
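The SPI procedure described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the gamma parameters are estimated with Thom's approximation to maximum likelihood (the text does not state the estimation method), zero-rainfall totals are handled with a mixed distribution, and the function names `rolling_sums`, `fit_gamma_thom`, `gamma_cdf`, and `spi_from_totals` are hypothetical. In practice `spi_from_totals` would be applied separately to the aggregated totals of each calendar month.

```python
import math
from statistics import NormalDist


def rolling_sums(rain, scale):
    """Aggregate monthly rainfall over a moving window of `scale` months."""
    return [sum(rain[i - scale + 1:i + 1]) for i in range(scale - 1, len(rain))]


def fit_gamma_thom(values):
    """Estimate gamma shape (alpha) and scale (beta) from positive values.

    Assumption: Thom's approximation to the maximum-likelihood estimate.
    The values must not all be identical, otherwise `a` below is zero.
    """
    n = len(values)
    mean = sum(values) / n
    a = math.log(mean) - sum(math.log(v) for v in values) / n
    alpha = (1.0 + math.sqrt(1.0 + 4.0 * a / 3.0)) / (4.0 * a)
    beta = mean / alpha
    return alpha, beta


def gamma_cdf(x, alpha, beta):
    """Gamma CDF via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    z = x / beta
    term = 1.0 / alpha
    total = term
    n = 0
    while term > 1e-15 * total and n < 100000:
        n += 1
        term *= z / (alpha + n)
        total += term
    log_prefactor = -z + alpha * math.log(z) - math.lgamma(alpha)
    return min(1.0, total * math.exp(log_prefactor))


def spi_from_totals(totals):
    """Transform aggregated rainfall totals into SPI values (z-scores)."""
    n = len(totals)
    positive = [v for v in totals if v > 0]
    q = (n - len(positive)) / n  # empirical probability of a zero total
    alpha, beta = fit_gamma_thom(positive)
    std_normal = NormalDist()
    spi = []
    for v in totals:
        h = q + (1.0 - q) * gamma_cdf(v, alpha, beta)
        h = min(max(h, 1e-6), 1.0 - 1e-6)  # keep the inverse CDF finite
        spi.append(std_normal.inv_cdf(h))
    return spi


# Illustrative (made-up) 3-month totals for one calendar month, in mm.
example_totals = [120.0, 80.0, 200.0, 150.0, 60.0, 95.0, 310.0, 40.0, 175.0, 130.0]
example_spi = spi_from_totals(example_totals)
```

Because the transformation is monotonic, the wettest total receives the highest SPI and the driest the lowest; negative values indicate drier-than-median conditions.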

Co-Active Neuro-Fuzzy Inference System (CANFIS)
To solve nonlinear optimization problems, Jang et al. [29] proposed the CANFIS technique. The CANFIS model has since proved effective in numerous fields of science [20,30-34]. The basic structure of CANFIS integrates an ANN and a FIS (fuzzy inference system) in one framework. Figure 2 shows the typical structure of the CANFIS model, composed of five layers (i.e., Layer-1: fuzzification, Layer-2: rule, Layer-3: normalization, Layer-4: defuzzification, and Layer-5: fuzzy association). Each input is passed and treated through these layers. By considering two inputs ($x$ and $y$) under the Takagi-Sugeno-Kang (TSK) fuzzy system with IF-THEN rules, the CANFIS model can be described as [35,36]:

Rule 1: IF $x$ is $A_1$ and $y$ is $B_1$, THEN $f_1 = p_1 x + q_1 y + r_1$ (2)

Rule 2: IF $x$ is $A_2$ and $y$ is $B_2$, THEN $f_2 = p_2 x + q_2 y + r_2$ (3)

Here, $A_1$, $A_2$ and $B_1$, $B_2$ are the fuzzy sets for the inputs $x$ and $y$, and $p_1$, $q_1$, $r_1$ and $p_2$, $q_2$, $r_2$ are parameters of the consequent part (Figure 2). The functioning of every layer is defined as [37,38]:

$$O_{1,i} = \mu_{A_i}(x) \ \text{for } i = 1, 2; \qquad O_{1,i} = \mu_{B_{i-2}}(y) \ \text{for } i = 3, 4 \qquad (4)$$

$$O_{2,i} = w_i = \mu_{A_i}(x)\,\mu_{B_i}(y), \quad i = 1, 2 \qquad (5)$$

$$O_{3,i} = \bar{w}_i = \frac{w_i}{w_1 + w_2}, \quad i = 1, 2 \qquad (6)$$

$$O_{4,i} = \bar{w}_i f_i = \bar{w}_i\,(p_i x + q_i y + r_i), \quad i = 1, 2 \qquad (7)$$

$$O_{5} = \sum_i \bar{w}_i f_i = \frac{\sum_i w_i f_i}{\sum_i w_i} \qquad (8)$$

In Equation (4), $A_i$ and $B_{i-2}$ are the linguistic labels and $\mu_{A_i}$ and $\mu_{B_{i-2}}$ represent the membership functions (MFs) for the linguistic labels. In Equation (5), $w_i$ is the weight (or firing strength) associated with the inputs ($x$ and $y$). In Equation (6), $\bar{w}_i$ is the normalized firing strength. In this study, the CANFIS network was designed through supervised learning. Gaussian (Gauss) MFs were applied for input data classification, together with the TSK fuzzy model, the hyperbolic tangent activation function (for data normalization), and the delta-bar-delta (DBD) learning algorithm; this configuration was found to be more potent for multi-scalar SPI prediction at the seven study locations.
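To make the five-layer flow concrete, the following sketch runs one forward pass of a two-input, two-rule TSK system with Gaussian MFs. It is an illustration only: the membership parameters and consequent coefficients are made-up values, the name `tsk_forward` is hypothetical, and neither the hyperbolic tangent normalization nor the DBD training step mentioned above is included.

```python
import math


def gaussian_mf(value, center, sigma):
    """Gaussian membership degree of `value` for a fuzzy set (center, sigma)."""
    return math.exp(-0.5 * ((value - center) / sigma) ** 2)


def tsk_forward(x, y, mf_a, mf_b, consequents):
    """One forward pass through the five layers.

    mf_a, mf_b  : one (center, sigma) pair per rule for inputs x and y
    consequents : one (p, q, r) tuple per rule, so that f = p*x + q*y + r
    Returns the crisp output and the normalized firing strengths.
    """
    # Layer 1: fuzzification of each input
    mu_a = [gaussian_mf(x, c, s) for c, s in mf_a]
    mu_b = [gaussian_mf(y, c, s) for c, s in mf_b]
    # Layer 2: firing strength of each rule (product of memberships)
    w = [a * b for a, b in zip(mu_a, mu_b)]
    # Layer 3: normalization of the firing strengths
    total = sum(w)
    w_bar = [wi / total for wi in w]
    # Layer 4: rule outputs weighted by normalized firing strengths
    f = [p * x + q * y + r for p, q, r in consequents]
    weighted = [wb * fi for wb, fi in zip(w_bar, f)]
    # Layer 5: summation into a single crisp output
    return sum(weighted), w_bar


# Hypothetical parameters: two lagged SPI values as inputs, two rules.
output, strengths = tsk_forward(
    x=0.0, y=0.0,
    mf_a=[(-1.0, 1.0), (1.0, 1.0)],
    mf_b=[(-1.0, 1.0), (1.0, 1.0)],
    consequents=[(0.0, 0.0, 1.0), (0.0, 0.0, 3.0)],
)
```

With the inputs placed midway between the two symmetric fuzzy sets, both rules fire equally, so the output is the plain average of the two rule consequents.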

Multi-Layer Perceptron Neural Network (MLPNN)
MLPNN was first proposed by Haykin [39]. It consists of layers of parallel processing elements, known as neurons. Every layer is fully connected to the next layer through weighted links (W). A typical MLPNN is composed of three layers, Layer-1: input (i), Layer-2: hidden (j), and Layer-3: output (k), with inter-layer weights (Wij and Wjk), as illustrated in Figure 3. The number of hidden layers and neurons was specified according to the number of predictands and predictors [40].
In this research, the 2n + 1 (n represents the inputs) concept was exploited to decide the optimal number of neurons in the hidden layer [7,10]. The designed architecture of MLPNN was then applied to forecast the meteorological drought condition at various locations depending on the SPI values used as input at different lags.
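The 2n + 1 sizing rule and the three-layer structure can be illustrated with a small forward pass. This is a sketch under assumptions: the hyperbolic tangent hidden activation, the linear output, the random weight initialization, and the names `hidden_size`, `init_mlp`, and `mlp_forward` are all illustrative choices that the text does not specify, and no training step is shown.

```python
import math
import random


def hidden_size(n_inputs):
    """Number of hidden neurons given n inputs, following the 2n + 1 rule."""
    return 2 * n_inputs + 1


def init_mlp(n_inputs, seed=0):
    """Create small random weights W_ij (input to hidden) and W_jk (hidden to output)."""
    rng = random.Random(seed)
    h = hidden_size(n_inputs)
    w_ij = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(h)]
    b_j = [0.0] * h
    w_jk = [rng.uniform(-0.5, 0.5) for _ in range(h)]
    b_k = 0.0
    return w_ij, b_j, w_jk, b_k


def mlp_forward(inputs, params):
    """Forward pass: tanh hidden layer (assumed) followed by a linear output."""
    w_ij, b_j, w_jk, b_k = params
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(w_ij, b_j)
    ]
    return sum(w * h for w, h in zip(w_jk, hidden)) + b_k


# Three lagged SPI values as inputs give 2 * 3 + 1 = 7 hidden neurons.
params = init_mlp(n_inputs=3)
prediction = mlp_forward([0.4, -0.2, 1.1], params)
```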

Multiple Linear Regression (MLR)
The degree of association between the target (dependent) parameter and two or more independent parameters was determined using MLR, written as [30,33]:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n \qquad (9)$$

where $Y$ represents the dependent (target) parameter, $\beta_0, \beta_1, \beta_2, \dots, \beta_n$ are the regression coefficients of the MLR equation, and $X_1, X_2, \dots, X_n$ are the independent parameters. The subscript $n$ denotes the number of independent parameters, each corresponding to one regression coefficient.
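As an illustration of how such a regression can be fitted to lagged SPI values, the sketch below builds a lagged design matrix and solves the ordinary least squares normal equations with Gaussian elimination. The least squares estimator is an assumption (the text does not state how the coefficients were obtained), and the names `make_lagged`, `solve_linear`, `fit_mlr`, and `predict_mlr` are hypothetical.

```python
def make_lagged(series, lags):
    """Build predictor rows [s(t-k) for k in lags] and targets s(t)."""
    max_lag = max(lags)
    rows = [[series[t - k] for k in lags] for t in range(max_lag, len(series))]
    targets = [series[t] for t in range(max_lag, len(series))]
    return rows, targets


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_mlr(rows, targets):
    """Return [intercept, coef_1, ..., coef_n] from the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(p)]
    return solve_linear(xtx, xty)


def predict_mlr(coeffs, row):
    """Evaluate the fitted linear equation for one row of predictors."""
    return coeffs[0] + sum(c * x for c, x in zip(coeffs[1:], row))


# Synthetic check: the targets follow 1 + 2*X1 - 0.5*X2 exactly.
demo_rows = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 8.0]]
demo_targets = [1.0 + 2.0 * a - 0.5 * b for a, b in demo_rows]
demo_coeffs = fit_mlr(demo_rows, demo_targets)
```

For a real SPI series, `make_lagged(series, lags)` would supply the rows and targets, with `lags` taken from the input selection step.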

Input Selection and Model Development
Choosing proper inputs is a tedious task, particularly for nonlinear hydrological processes. In this study, SPI-1, -3, -6, -9, -12, and -24 were calculated using long-term monthly rainfall records. The autocorrelation function (ACF) and partial autocorrelation function (PACF) were applied to choose the significant inputs (lags). Both ACF and PACF were computed using the expressions below [41,42]:

$$ACF_k = \frac{\sum_{t=1}^{N-k} (x_t - \bar{x})(x_{t+k} - \bar{x})}{\sum_{t=1}^{N} (x_t - \bar{x})^2} \qquad (10)$$

$$PACF_{k,k} = \frac{ACF_k - \sum_{j=1}^{k-1} PACF_{k-1,j}\, ACF_{k-j}}{1 - \sum_{j=1}^{k-1} PACF_{k-1,j}\, ACF_j} \qquad (11)$$

in which $k$ defines the lag of the data series $x_t$, $\bar{x}$ designates the average of the whole data series, and $N$ is the number of data points. Afterward, the computed PACF values were tested at the 5% significance level against the upper and lower critical limits (UCL and LCL) given by Equation (12) [43]:

$$UCL,\ LCL = \pm \frac{1.96}{\sqrt{N}} \qquad (12)$$
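The lag selection step can be sketched as follows: compute the sample ACF, derive the PACF with the Durbin-Levinson recursion, and keep the lags whose PACF exceeds the 5% critical limits of ±1.96/√N. The recursion is the standard textbook form and is assumed here to match the procedure in the text; the function names and the synthetic AR(1) test series are illustrative only.

```python
import math
import random


def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    out = [1.0]
    for k in range(1, max_lag + 1):
        num = sum((series[t] - mean) * (series[t + k] - mean) for t in range(n - k))
        out.append(num / denom)
    return out


def pacf(series, max_lag):
    """Partial autocorrelation for lags 0..max_lag (Durbin-Levinson recursion)."""
    r = acf(series, max_lag)
    phi = [[0.0] * (max_lag + 1) for _ in range(max_lag + 1)]
    out = [1.0]
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi[k - 1][j] * r[k - j] for j in range(1, k))
        den = 1.0 - sum(phi[k - 1][j] * r[j] for j in range(1, k))
        phi[k][k] = num / den
        for j in range(1, k):
            phi[k][j] = phi[k - 1][j] - phi[k][k] * phi[k - 1][k - j]
        out.append(phi[k][k])
    return out


def significant_lags(series, max_lag):
    """Lags whose PACF lies outside the +/- 1.96 / sqrt(N) critical limits."""
    limit = 1.96 / math.sqrt(len(series))
    values = pacf(series, max_lag)
    return [k for k in range(1, max_lag + 1) if abs(values[k]) > limit]


# Synthetic AR(1) series standing in for an SPI series (illustrative data only).
rng = random.Random(42)
demo_series = [0.0]
for _ in range(499):
    demo_series.append(0.7 * demo_series[-1] + rng.gauss(0.0, 1.0))
demo_lags = significant_lags(demo_series, max_lag=5)
```

The selected lags would then serve as the inputs of the MLPNN, CANFIS, and MLR models for the corresponding SPI scale.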

Application of AI and Regression Models for Multi-Scalar SPI Prediction
The drought condition was projected by assessing the suitability of the MLPNN, CANFIS, and MLR models applied to all SPI scales at the seven study sites. All models were trained using 70% of the data set, and the remaining 30% was used for testing. The performance of the MLPNN, CANFIS, and MLR models during testing was assessed using NSE, RMSE, COC, and WI, as reported in Tables 4 to 6 for the Chamoli, Dehradun, Haridwar, Pauri Garhwal, Rudraprayag, Tehri Garhwal, and Uttarkashi stations. As noted from Table 4, the CANFIS performance was observed to be the best with 2 Gaussian ( Table 5 outlines that the MLPNN models produced better estimates for SPI-12 at Chamoli  Similarly, Table 6 summarizes that the best performance of MLR models was for projecting the SPI
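The 70/30 split and the four indicators can be sketched as below. The formulas are the commonly used definitions of RMSE, Nash-Sutcliffe efficiency (NSE), Pearson correlation (taken here as COC), and Willmott's index of agreement (WI); the chronological ordering of the split is an assumption, since the text only gives the proportions, and all function names are hypothetical.

```python
import math


def chronological_split(values, train_fraction=0.7):
    """Split a series into leading training and trailing testing portions."""
    cut = round(len(values) * train_fraction)
    return values[:cut], values[cut:]


def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def nse(obs, pred):
    """Nash-Sutcliffe efficiency (1 is perfect, 0 matches the observed mean)."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - p) ** 2 for o, p in zip(obs, pred))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def coc(obs, pred):
    """Pearson coefficient of correlation."""
    mean_obs = sum(obs) / len(obs)
    mean_pred = sum(pred) / len(pred)
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(obs, pred))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_pred = sum((p - mean_pred) ** 2 for p in pred)
    return cov / math.sqrt(var_obs * var_pred)


def willmott_index(obs, pred):
    """Willmott's index of agreement (1 is perfect agreement)."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - p) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - mean_obs) + abs(o - mean_obs)) ** 2 for o, p in zip(obs, pred))
    return 1.0 - num / den


# Illustrative SPI values: calculated (observed) versus predicted.
observed = [0.5, -1.0, 1.5, 0.0, -0.4, 0.9]
predicted = [0.4, -0.8, 1.2, 0.1, -0.5, 1.0]
scores = {
    "RMSE": rmse(observed, predicted),
    "NSE": nse(observed, predicted),
    "COC": coc(observed, predicted),
    "WI": willmott_index(observed, predicted),
}
```

Lower RMSE and higher NSE, COC, and WI indicate a better model, which is the basis on which the models are compared during testing.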

Performance Assessment by Using Scatter Plots and Taylor Diagram
The correspondence between the predicted and calculated values for all SPI scales obtained with the MLPNN, CANFIS, and MLR models during the testing span at the Chamoli, Dehradun, Haridwar, Pauri Garhwal, Rudraprayag, Tehri Garhwal, and Uttarkashi stations is presented in Figures 4-10. As these figures show, the regression line of CANFIS was close to the best-fit line (highlighted in red) for SPI-1, SPI-3, SPI-6, SPI-9, and SPI-12 at Chamoli and Tehri Garhwal; SPI-3, SPI-6, SPI-9, and SPI-12 at Dehradun; SPI-1, SPI-3, SPI-6, and SPI-9 at Haridwar and Pauri Garhwal; SPI-1, SPI-3, and SPI-6 at Rudraprayag; and SPI-3 at Uttarkashi. The MLPNN regression line was close to the best-fit line at Chamoli and Tehri Garhwal for SPI-24; Dehradun for SPI-1 and SPI-24; Haridwar for SPI-12 and SPI-24; Pauri Garhwal for SPI-12; Rudraprayag for SPI-9, SPI-12, and SPI-24; and Uttarkashi for SPI-1 and SPI-6. Likewise, for the MLR models, these lines were close at Pauri Garhwal for SPI-24 and at Uttarkashi for SPI-9, SPI-12, and SPI-24.

Accordingly, the spatial pattern of predicted and calculated (observed) values of multi-scalar SPI for the MLPNN, CANFIS, and MLR models was also assessed using the Taylor diagram (TD), a polar plot that gives a graphical judgment of model performance based on SD, COC, and RMSE. Figures 11-17 show the TD of the MLPNN, CANFIS, and MLR models at Chamoli, Dehradun, Haridwar, Pauri Garhwal, Rudraprayag, Tehri Garhwal, and Uttarkashi, respectively, for the testing span.
Consequently, Figure 11a-f shows that the CANFIS model with the chosen lags can be used for SPI projection at the 1-, 3-, 6-, 9-, and 12-month time scales, and the MLPNN model for the 24-month time scale, at Chamoli. Figure 12a-f shows that the CANFIS model can be used at the 3-, 6-, 9-, and 12-month time scales, and the MLPNN model at the 1- and 24-month time scales, at Dehradun. Figure 13a-f shows that the CANFIS model can be used at the 1-, 3-, 6-, and 9-month time scales, and the MLPNN model at the 12- and 24-month time scales, at Haridwar. Figure 14a-f shows that the CANFIS model can be used at the 1-, 3-, 6-, and 9-month time scales, the MLPNN model at the 12-month time scale, and the MLR model at the 24-month time scale, at Pauri Garhwal. Figure 15a-f shows that the CANFIS model can be used at the 1-, 3-, and 6-month time scales, and the MLPNN model at the 9-, 12-, and 24-month time scales, at Rudraprayag. Figure 16a-f shows that the CANFIS model can be used at the 1-, 3-, 6-, 9-, and 12-month time scales, and the MLPNN model at the 24-month time scale, at Tehri Garhwal. Figure 17a-f shows that the CANFIS model can be used at the 3-month time scale, the MLR model at the 9-, 12-, and 24-month time scales, and the MLPNN model at the 1- and 6-month time scales, at Uttarkashi.
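The statistics summarized on a Taylor diagram can be computed as in the sketch below. One point to note is that the diagram conventionally uses the centered RMS difference (means removed), which is linked to the standard deviations and the correlation by the law of cosines; treating the RMSE on the diagram as this centered quantity is an assumption, and the names `taylor_stats` and `polar_position` are hypothetical.

```python
import math


def taylor_stats(reference, model):
    """Return (sd_reference, sd_model, correlation, centered RMS difference)."""
    n = len(reference)
    mean_ref = sum(reference) / n
    mean_mod = sum(model) / n
    dev_ref = [r - mean_ref for r in reference]
    dev_mod = [m - mean_mod for m in model]
    sd_ref = math.sqrt(sum(d * d for d in dev_ref) / n)
    sd_mod = math.sqrt(sum(d * d for d in dev_mod) / n)
    cov = sum(a * b for a, b in zip(dev_ref, dev_mod)) / n
    corr = cov / (sd_ref * sd_mod)
    crmsd = math.sqrt(sum((b - a) ** 2 for a, b in zip(dev_ref, dev_mod)) / n)
    return sd_ref, sd_mod, corr, crmsd


def polar_position(sd_model, correlation):
    """Polar coordinates of a model on the diagram: (radius, angle in radians)."""
    return sd_model, math.acos(correlation)


# Illustrative SPI values: calculated (reference) versus one model's output.
calculated = [0.5, -1.0, 1.5, 0.0, -0.4, 0.9]
modelled = [0.4, -0.8, 1.2, 0.1, -0.5, 1.0]
sd_r, sd_m, corr, crmsd = taylor_stats(calculated, modelled)
radius, angle = polar_position(sd_m, corr)
```

A model plots closer to the reference point when its standard deviation matches that of the calculated SPI and its correlation is high, which is how the preferred model for each station and time scale can be read from the diagram.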

Discussion
Tables 4-6 show that for the short time scale, i.e., SPI-1, both the AI and regression models yielded higher values of RMSE and lower values of NSE, COC, and WI, which indicates unsatisfactory performance at the short SPI time scale. Apart from this, a comparison between the AI models (i.e., CANFIS and MLPNN) and the regression model (i.e., MLR) is shown in Table 7, which reveals that CANFIS gained the highest ranking, followed by MLPNN, at all study sites except Uttarkashi. Overall, the MLR model received the lowest ranking at the Chamoli, Dehradun, Haridwar, Pauri Garhwal, Rudraprayag, and Tehri Garhwal stations and the highest at the Uttarkashi station.

To put the results in context, the findings of this research were compared with recent investigations conducted in different parts of the world on the prediction of meteorological droughts using stochastic and AI models. Mokhtarzad et al. [48] applied SVM, ANFIS, and ANN techniques to forecast meteorological drought in Tehran based on SPI. According to the results, the SVM model provided more precise estimates than the ANFIS and ANN models. Nguyen et al. [49] studied the potential of the ANFIS model for meteorological drought prediction using SPEI and SPI in Khanhhoa Province, Vietnam. The investigation revealed the superior performance of the ANFIS model for SPEI and SPI in the study area. Zhang et al. [50] predicted drought conditions by employing the SVR, WANN, ANN, and ARIMA models with 3- and 6-month SPI values in the Haihe River Basin, China. The estimates of SPI-6 and SPI-3 showed that the WANN model outperformed the other models. Liu et al. [51] used the Self-Adaptive Evolutionary-Extreme Learning Machine (SADE-ELM), Online Sequential-ELM (OS-ELM), and ELM for meteorological drought prediction based on SPEI and SPI in Khanhhoa Province, Vietnam. They found that the performance of the SADE-ELM models was superior to that of the other models. Mouatadid et al. [52] used the LSSVR, MLR, ANN, and ELM models to forecast drought through multi-scalar SPEI and SPI in eastern Australia. The results showed that the ANN and ELM models performed better than the LSSVR and MLR models. Özger et al. [53] applied standalone ANFIS, SVM, and M5 models and their hybrids coupled with empirical mode decomposition (i.e., EMD-ANFIS, EMD-SVM, EMD-M5) and wavelet decomposition (i.e., WD-ANFIS, WD-SVM, WD-M5) for self-calibrated Palmer Drought Severity Index (SC-PDSI) prediction in the southern part of Turkey. The results indicated the improved performance of the hybrid WD-ANFIS, WD-SVM, and WD-M5 models over the other models.

Considering the excellent performance of the AI models applied in this study, i.e., CANFIS and MLPNN, it would be insightful to compare the performance of these two models with that of other models in future studies. In this regard, more recent meteorological data may be considered. Comparing the two models applied in this study with other drought prediction models under different climate projections would also be an essential task for future studies, to assist water managers in better long-term planning of water resources exploitation.
Accordingly, this study confirmed the superiority of AI models such as MLPNN and CANFIS in predicting meteorological droughts of various durations at the selected study stations.

Conclusions
This study analyzed the feasibility of applying AI and regression models to predict meteorological drought based on multi-scalar SPI at the Pauri Garhwal, Chamoli, Rudraprayag, Dehradun, Haridwar, Uttarkashi, and Tehri Garhwal stations. The Partial Autocorrelation Function (PACF) was used to choose the optimal input parameters (lags) for the MLPNN, CANFIS, and MLR models at the 5% significance level on the SPI-24, SPI-12, SPI-9, SPI-6, SPI-3, and SPI-1 data series. The estimates yielded by the MLPNN, CANFIS, and MLR models were compared with the calculated (observed) values of multi-scalar SPI using statistical indicators, such as NSE, RMSE, WI, and COC, and visually through Taylor diagrams and scatter plots for the study stations. The results revealed that the applied AI models (i.e., CANFIS and MLPNN) significantly enhanced the modeling performance by improving the WI, COC, and NSE and decreasing the RMSE at the study stations. The performance of MLR in predicting meteorological drought was the poorest at the study stations, except for SPI-24 at Pauri Garhwal and SPI-9, SPI-12, and SPI-24 at Uttarkashi. This study will help develop a reliable and standard intelligent system that can be used for the considered rainfall stations. It will be valuable for policymakers and water resources managers in framing drought mitigation strategies for the study regions.