A Theoretical Approach for Forecasting Di ﬀ erent Types of Drought Simultaneously, Using Entropy Theory and Machine-Learning Methods

: Precipitation deﬁcit can a ﬀ ect di ﬀ erent natural resources such as water, soil, rivers and plants, and cause meteorological, hydrological and agricultural droughts. Multivariate drought indexes can theoretically show the severity and weakness of various drought types simultaneously. This study introduces an approach for forecasting joint deﬁcit index (JDI) and multivariate standardized precipitation index (MSPI) by using machine–learning methods and entropy theory. JDI and MSPI were calculated for the 1–12 months’ time window (JDI 1–12 and MSPI 1–12 ), using monthly precipitation data. The methods implemented for forecasting are group method of data handling (GMDH), generalized regression neural network (GRNN), least squared support vector machine (LSSVM), adaptive neuro-fuzzy inference system (ANFIS) and ANFIS optimized with three heuristic optimization algorithms, di ﬀ erential evolution (DE), genetic algorithm (GA) and particle swarm optimization (PSO) as meta-innovative methods (ANFIS-DE, ANFIS-GA and ANFIS-PSO). Monthly precipitation, monthly temperature and previous amounts of the index’s values were used as inputs to the models. Data from 10 synoptic stations situated in the widest climatic zone of Iran (extra arid-cold climate) were employed. Optimal model inputs were selected by gamma test and entropy theory. The evaluation results, which were given using mean absolute error (MAE), root mean squared error (RMSE) and Willmott index (WI), show that the machine learning and meta-innovative models can present acceptable forecasts of general drought’s conditions. The algorithms DE, GA and PSO, could improve the ANFIS’s performance by 39.4%, 38.7% and 22.6%, respectively. Among all the applied models, the GMDH shows the best forecasting accuracy with MAE = 0.280, RMSE = 0.374 and WI = 0.955. In addition, the models could forecast MSPI better than JDI in the majority of cases (stations). Among the two methods used to select the optimal inputs, it is di ﬃ cult to select one as a better input selector, but according to the results, more attention can be paid to entropy theory in drought studies.


Introduction
Drought forecasting is one of the major concerns to the water managers, agriculturalists and other water exploiters. Iran, especially its central desert, is situated on the arid and extra-arid belt of the world and the drought intensity variations directly influence the people's life, agriculture, surface water and groundwater in the country. During the years 1998-2000, Iran experienced one of the worst and the costliest drought periods that occurred in the last 50 years. During this three-year dry period, water scarcity reached a crisis point in more than 270 cities, thousands of villages ran out of drinking water, the rate of surface water flow decreased to 55%, and Iran's dam and reservoirs have had to operate at their minimum volume capacity to transfer water because of the low inflows and high temperatures [1,2]. Thus, the investigation of the meteorological, hydrological and agricultural droughts is of great importance to the country. Drought indices can show the weakness and intensity of different types of droughts separately, but in recent years, certain indices have been introduced for indicating the intensity and weakness of the various types of droughts together, in a single series [3]. Among the most important of these indices, the multivariate standard precipitation index (MSPI) and joint deficit index (JDI) can be pointed out. These indices combine some series of standard precipitation indices (SPI) that each of them indicates a certain type of drought [4]. The combination is based on the various statistical methods, and the result reflects the general status of the drought in a given month.
JDI was first introduced by Kao and Govindaraju [5] as a joint drought index. They introduced the modified standard precipitation index (SPImod) in a study and pooled 1-12 month time window to come up with JDI and asserted that the method could be implemented on all of the other hydrological variables, such as the discharge rates of the rivers plus the precipitation data. After that, Mirabbasi et al. [6] calculated JDI for 50 rain gauges in the north of Iran during the period from 1970 to 2007. They concluded that JDI could provide a comprehensive view of the short-and long-term status of the drought in Iran, and they also forecasted rainfall for one to three-month horizons to investigate the drought status by JDI for the forthcoming months. In a study, Hao and Singh [7] investigated the multivariate indices of drought, including JDI, and the applied characteristics and weak and strong points of these indices. Ma et al. [8] combined JDI and standardized Palmer Index to introduce SPDI-JDI (standardized Palmer drought index-based joint drought index) index for ten climatic zones in Texas.
In 2014, Bazrafshan et al. [9] introduced the MSPI index as a joint drought index. They examined SPI's time windows by performing principal component analysis and applied them to four stations from various climates in Iran and showed that the first principal component (PC1) could account for 74% of the general drought variations. Bazrafshan et al. [2] have investigated MSPI and JDI in terms of multivariate drought monitoring for 42 meteorological stations all over Iran. They calculated both of the indices for 1-12 month time window (which monitors meteorological, hydrological and agricultural droughts simultaneously) and showed that MSPI outperforms JDI for illustrating the intensity and weakness of drought in Iran's climates. Bateni et al. [10] used MSPI for introducing the agrometeorological drought indices (AMDI-SA). In addition, MSPI prediction has been reported for Iranian climates in Aghelpour et al. [11] study for agricultural drought forecasting.
Since the droughts begin slowly, it is possible to rely on real-time predictions so that measures can be taken and policies can be adopted for reducing the drought effects [12]. In recent years, many models have been employed for predicting the drought indices, amongst which machine learning-based models can be pointed out. Moreover, metaheuristic models are obtained by blending these models with optimization algorithms that are applied to predicting drought. These models have been widely used for predicting the SPI family, such as SPI as the meteorological drought index [13][14][15][16] and SPEI as the agricultural drought index [17][18][19][20][21], and their favorable performance has been reported under various climatic conditions. For example, Kisi et al. [22] used adaptive neuro-fuzzy inference system (ANFIS) and its hybrid forms with particle swarm optimization (PSO) and genetic algorithm (GA) for forecasting SPI as a meteorological drought index. Malik et al. used [23] a co-active neuro-fuzzy inference system (CANFIS) for forecasting streamflow drought index (SDI), which is a hydrological drought index of the SPI family. Soh et al. [17] used the wavelet ANFIS model and SPEI indicator for forecasting agricultural drought. In most of the drought studies, machine learning models had significantly acceptable performances in predicting different types of droughts separately.
According to the research done, however, no model has been evaluated to forecast different types of droughts simultaneously, as well as the selection of optimal model inputs is a challenge for nonlinear dynamic models. The questions as to which inputs should be used for model development have been a challenge in practice. Although much research has been done on drought forecasting issues, most of them do not consider effective input selection for drought modeling. As a result, the main challenge lies in the evaluation of existing data and assessing the adequacy of data. Although modeling improves performance by adding information, observations show that adding information improves model performance only slightly. Modeling accuracy can be reduced by increasing information. This is because additional information causes the model to be over-fitted. The over-fitted model also performs well in training but has a very poor performance in testing. Overfitting also occurs when multivariate models have too much input data. It is, therefore, important to know which inputs are effective in modeling and which are not [24][25][26][27]. Furthermore, MSPI has been forecasted by ANFIS and its hybrid-type models [11], but it has been investigated just from the perspective of agricultural drought. In fact, the present study is the first research trying to forecast multivariate drought in extra-arid climates, based on the indices JDI and MSPI; which can theoretically show the severity and weakness of meteorological, hydrological and agricultural droughts, simultaneously. Thus, the current research paper is intended to evaluate the performance of machine learning and metaheuristic models for forecasting these two multivariate drought indices. Furthermore, as another innovation in drought forecasting studies, entropy theory and gamma tests have been used for selecting the suitable predictor input variables.

Study Region
Based on the extended De Marton Climatic Zones Classification Method, the widest climatic class of Iran is the extra-arid cold climatic zone. This zone has an area of 352,738.4 km 2 in Iran, which accounts for about 21.4% of the country's total area [28]. Based on this method, ten synoptic stations are specified in this climactic breadth, as shown in Figure 1. In the present study, drought monitoring and prediction are carried out in these ten stations. The reason for choosing this climate is that drought's damaging effects are more significant in extra arid areas.
The information used in this study includes monthly precipitation (P) and temperature (T) data whose positions and statistical specifications have been summarized in Table 1. As observed from the table, precipitation data have skewed distribution (skewness ranges from 1.81 (Kabootarabad) to 3.11 (Yazd)). The data of each of these stations were procured from the general office of the country's meteorology organization for the period of time from their establishment years till 2017. The precipitation data have been utilized for calculating both of the indices, i.e., JDI and MSPI, and also used as input to the prediction models. The monthly temperature data were also used as input, and their effects on the models' accuracy were also investigated. The information used in this study includes monthly precipitation (P) and temperature (T) data whose positions and statistical specifications have been summarized in Table 1. As observed from the table, precipitation data have skewed distribution (skewness ranges from 1.81 (Kabootarabad) to 3.11 (Yazd)). The data of each of these stations were procured from the general office of the country's meteorology organization for the period of time from their establishment years till 2017. The precipitation data have been utilized for calculating both of the indices, i.e., JDI and MSPI, and also used as input to the prediction models. The monthly temperature data were also used as input, and their effects on the models' accuracy were also investigated.    The index was designed for the investigation of the joint behavior of the precipitation in SPImod's 1-month to 12-month time windows considering their seasonal changes and combines the time windows of SPImod, as one time window of JDI (JDI [1][2][3][4][5][6][7][8][9][10][11][12]. JDI is essentially based on Copula Functions. This index uses a 12-dimensional experimental copula for showing the associations between SPImod's time windows [5]. The experimental copula is based on the empirical ranking scales of joint experimental likelihood. For example, the 12D experimental copula is computed by Equation (1) for an n-numbered sample: where I (A) denotes the logical A term that is assigned a value equal to unity in case of holding true and zero in the otherwise case. (R i1 , . . . , R i12 ) are the ranks of the observed ith datum that are, respectively, shown as u 1 , . . . , u 12 . For more information, please refer to Kao and Govindaraju [5].

Multivariate Standardized Precipitation Index
The index is capable of blending several SPI variables, each of which belongs to a time window in one single series. The method that was used by Bazrafshan et al. [9] for this blending is principal component analysis (PCA). By this method, a large part of the drought intensity variability in different time windows is summarized in 1 series [9]. MSPI is based on the first principal component (PC1) that is a linear combination of k-time windows capable of explaining the highest percentage of variability in k first variables. The notable point is that their values cannot be compared between various places and months due to the algebraic properties of PC1, meaning that, unlike SPI that takes a mean value between 0 and 1, PC1 lacks such a trait, and hence, it must be standardized in respect to the mean and standard deviation of the various months (Equation (2)): In this equation, Z 1ym is the standardized value of PC1 in yth year and mth month; PC 1m is the mean value of PC1 in mth month, and SD 1m is the standard deviation of PC1 in the mth month. Here, Z 1ym is the value of MSPI. For more information regarding the MSPI calculation method, please refer to the original article by Bazrafshan et al. [9]. The drought intensity categories of these two indices are like SPI, as presented in Table 2: Table 2. Classification of the SPI's values and corresponding probability limits [9,29]. Since JDI is calculated for 1-12 months' time window (JDI 1-12 ), MSPI has also been calculated for the same time window (MSPI 1-12 ). Moreover, because 1-month SPI (SPI 1 ) indicates the drought status from meteorological perspectives, 3-month SPI (SPI 3 ) shows drought status from soil moisture viewpoint, 6-month SPI (SPI 6 ) reflects hydrological drought conditions (river streams and dam reservoirs), and 9-month to 12-month SPIs (SPI 9 , SPI 10 , SPI 11 and SPI 12 ) can express the agricultural drought status [4], so the 1-12 month time window can concomitantly exhibit all the meteorological, soil moisture, hydrological and agricultural drought conditions, together. In the current research paper, these two indices (JDI and MSPI) were calculated using MATLAB Software.

Input Determination Methods
After calculating the indices and before prediction by the models, two methods of entropy theory and gamma test were utilized for selecting the optimal inputs and accelerating the modeling task. These methods are described below.

Gamma Test
Gamma test shows the variance of the estimated error according to the direct data. Thus, the error estimation that is commonly termed gamma test can be considered as equivalent to the sum of nonlinear error squares [30]. To have a better understanding of the gamma test in case of assuming that two points x and x' approaching one another in the input space corresponding to y and y' as outputs, these two points should also be approaching one another in the output space, otherwise the non-approaching of these two points stems from the difference caused by errors (noises) [31].

Entropy Theory
Shannon proposed the main concept of information theory in 1948 [32]. The definition of entropy is derived from information theory that analyzes a series of statistical structures from numerical series, signs and symbols establishing relationships between the signals [33]. In general, four types of entropy were introduced for quantifying the information content: boundary entropy, joint entropy, conditional entropy and information transfer entropy. The present study uses information transfer entropy that is succinctly explained as follows: Shannon specified the first boundary entropy, named H(x), from a discrete random variable, x: The parameter takes a value equal to unity (K = 1) on the condition that the amount of H(x) logarithm is expressed based on Napier's constant (e); n expresses the number of events with a probability of P( The entropy of two random and independent variables of x and y is obtained from the sum of the boundary entropy of the two variables: When the two random variables of x and y are associated, the joint entropy takes a value smaller than the total entropy obtained from Equation (5): Information transfer entropy is another type of entropy specifying the mutual relationship between x and y, and it is obtained as the differential of the total and joint entropies for the two dependent variables x and y [34]: Here, H(x|y) expresses the conditional entropy. The conditional entropy indicates x if y expresses the residual uncertainty in x on the condition that y is clearly known and vice versa.
According to the above explanations, entropy calculations can be generalized to M variables. The total entropy for the independent variable, x m (m = 1. 2. . . . .M), is computable from Equation (7): If the variables are associated, their joint entropy is calculated from Equation (8): The complementary information for entropy theory can be found in [27].

Adaptive Neuro-Fuzzy Inference System (ANFIS)
ANFIS is a model that uses neural network learning algorithms and fuzzy logic for designing nonlinear maps between the input and output spaces. The model is capable of learning by neural networks and defining and using the relationships between input-output variables by fuzzy rules, and subsequently creating the input structure of the system.
Three methods of grid partitioning, subtractive clustering and fuzzy c-means are utilized in the classical ANFIS model, and the present study uses fuzzy c-means [35]. For more information about the details of the ANFIS model and fuzzy c-means method, the reference to articles by Jang et al. [36] is suggested.
In the present study, three strong optimization algorithms have also been used for optimizing the ANFIS model, as briefly explained below.

Differential Evolution (DE) Algorithm
This algorithm is a method in the area of complementary calculations, and it was introduced by Storn and Price [32]. DE begins working by providing a series of suggested responses and tries to find the optimum response within a series of consecutive iterations. The most important feature of this algorithm is its high speed, simplicity and strength. DE sets three primary parameters of population size (NP), mutation weight (F) and crossing (Cr). For more information, please see the articles presented by Storn and Price [37] and Halabi et al. [38].

Genetic Algorithm (GA)
This algorithm was introduced by Holland [39]. The principal component in GA is a chromosome that is consisted of genes, and each gene explains a parameter of the problem [40]. As the most important stage in the genetic algorithm, the selection is the process through which the next generation's chromosomes are determined based on the current generation's survival law. In this algorithm, the search is commenced in an initial population of the strains and the unique strains are evaluated in each repetition corresponding to an efficiency condition and assigned a proportion value. For complementary information, refer to the article by Jaramillo et al. [41].

Particle Swarm Optimization (PSO) Algorithm
PSO was inspired by the collective movement of the fish or birds' flight in groups [42], and it was subsequently employed successfully in various sciences. In this algorithm, each solution is just a particle in the search space [43]. All of the particles are assigned a fitness value evaluated by a fitness function that must be optimized. Further information on the PSO algorithm can be found in Eberhart and Kennedy [42].
The schematic form of the ANFIS model, as well as its combination with DE, GA and PSO algorithms, is illustrated in Figure 2, considering the studied inputs and outputs. In this prediction approach, the time lags of each index can only be used to predict the same index; so here, this procedure is used for predicting JDI and MSPI separately.

Particle Swarm Optimization (PSO) Algorithm
PSO was inspired by the collective movement of the fish or birds' flight in groups [42], and it was subsequently employed successfully in various sciences. In this algorithm, each solution is just a particle in the search space [43]. All of the particles are assigned a fitness value evaluated by a fitness function that must be optimized. Further information on the PSO algorithm can be found in Eberhart and Kennedy [42].
The schematic form of the ANFIS model, as well as its combination with DE, GA and PSO algorithms, is illustrated in Figure 2, considering the studied inputs and outputs. In this prediction approach, the time lags of each index can only be used to predict the same index; so here, this procedure is used for predicting JDI and MSPI separately.

Group Method of Data Handling (GMDH)
This model was offered in 1970 by Ivakhnenko, and it is a multivariate nonlinear statistical analysis [39]. GMDH is based on the ability to distinguish nested second-or third-degree polynomials for simulating regression and performing classification. In GMDH, each composite layer is comprised of several nodes as the model's inputs for the transfer function. The results are transferred to the next layer that is per se employed as an input layer for the next layer of the network [40]. For more information, refer to articles by Ivakhnenko [44], Aghelpour and Varshavian [45], Ashrafzadeh et al. [46] and Aghelpour et al. [47] 2.4.6. Generalized Regression Neural Network (GRNN) This type of neural network is one type of multipurpose basic and approximating radial neural networks for smooth functions. GRNN is a three-layer neural network in which, like the other neural

Group Method of Data Handling (GMDH)
This model was offered in 1970 by Ivakhnenko, and it is a multivariate nonlinear statistical analysis [39]. GMDH is based on the ability to distinguish nested second-or third-degree polynomials for simulating regression and performing classification. In GMDH, each composite layer is comprised of several nodes as the model's inputs for the transfer function. The results are transferred to the next layer that is per se employed as an input layer for the next layer of the network [40]. For more information, refer to articles by Ivakhnenko [44], Aghelpour and Varshavian [45], Ashrafzadeh et al. [46] and Aghelpour et al. [47] 2.4.6. Generalized Regression Neural Network (GRNN) This type of neural network is one type of multipurpose basic and approximating radial neural networks for smooth functions. GRNN is a three-layer neural network in which, like the other neural networks [48]. The number of neurons existent on the first and last layers is, respectively, equal to the dimensions of the input and output vectors, but, unlike the other networks, the number of the neurons in the hidden layer of GRNN is equal to the number of the observed data. This type of neural network uses normal efficiency function in each of the neurons of the hidden layer, and the data inputted into this function for each neuron include the Euclidean distance between the input and observed data related to that neuron. See the sources for complementary information on GRNN [45,48,49].

Least Squares Support Vector Machine (LSSVM)
The least-squares support vector machine (LSSVM) was proposed by Suykens and Vandewalle in 1999, and it is used for the prediction of the turbulent time series [50]. LSSVM is a type of support vector machine that uses a collection of linear equations for training, while SVM uses a second-degree optimization problem, and this is their major difference [51]. For complementary information about this model, refer to Moazenzadeh and Mohammadi [52], Mohammadi and Aghashariatmadri [53] and Guan et al. [54].

Performance Evaluation Scales
To ensure the accuracy of the models' predictions, mean absolute error (MAE), root mean square error (RMSE), Willmott index (WI) and Pearson's correlation coefficient (R) were employed, and their related equations are shown in Equations (9)-(12): where y i and y are, respectively the observed data and their means; f i and f are the model-predicted data and their means, and n is the number of the data.

Input Selection
At first, the models' inputs, including temperature, precipitation and the values of the index itself, were backwardly arranged up to 12 steps. Since the periodicity of a hydrological process has a 1 year (12 months) return period, it is logical to consider up to 12-month time lags as inputs. Then, gamma tests and entropy theory were utilized for selecting the optimal inputs. The results of these evaluations are given in Tables 3 and 4. In these tables, P and T, respectively, denote the total monthly precipitation and mean temperature. The subscripts t i , as well, indicate ith time lag for each variable and they were considered as the models' input in the present study.
In the present study, the processes followed in the article by the references of Ahmadi et al. [18], Biazar et al. [55] and Biazar et al. [56], have also been implemented in such a way that the entropy of the entire data was first calculated in these tables based on the entropy theory and the variables were omitted one-by-one and the entropy values of each combination are calculated in a second stage. If by deleting a variable from the input combination, the entropy value exceeds the primary entropy, the variable should be removed from the input combination. As for the gamma test, the same two stages are executed so that the variable whose gamma value becomes smaller than the initial gamma is removed. This process is separately repeated for all the variables (P, T, JDI and MSPI), and the results of the input selection for JDI and MSPI are summarized in Tables 3 and 4.
Next, the input and target data (JDIt/MSPIt) are inserted into the models. In the modeling processes of the present study, 75% of the input and target samples was used for training, and 25% of them was applied in testing. In all of the stations, the modeling and prediction of multivariate drought were carried out based on the inputs selected by both of the methods (gamma and entropy). According to the paper's issue (predicting), in all of the models, the test part was shown in the evaluation tables and discussed. Anar P (a)  Anar P (a)

Results of JDI Prediction
Predicting JDI was implemented by two input scenarios and seven models in 10 stations. The results of JDI prediction are shown in Table 5.
This includes 140 evaluations for JDI. In these evaluations, there are 70 comparisons between entropy theory and gamma test, which shows the superiority of entropy theory in 43 cases (61.4%).   In this figure, the best prediction of each station in the test period (bold cases of Table 5) is presented beside its observation. At the first look at these plots (Figure 3), there is not significant overlapping of the outputs for the stations in JDI prediction. However, in some of the cases, the overlapping can be acceptable. For example, in Boshrouyeh station, the moderate and severe   The bold numbers are the best performance of each station, on its own row.
In this figure, the best prediction of each station in the test period (bold cases of Table 5) is presented beside its observation. At the first look at these plots (Figure 3), there is not significant overlapping of the outputs for the stations in JDI prediction. However, in some of the cases, the overlapping can be acceptable. For example, in Boshrouyeh station, the moderate and severe droughts that occurred in Augusts of 2011, 2012, 2013 and 2014, are predicted accurately. Or the moderate drought classes for Yazd station in September of 2002,2003,2005,2007,2011,2013,2014 and 2016, were acceptably predicted. However, in the wet classes of the stations, the overlapping is weak and unacceptable.

Results of MSPI Prediction
MSPI prediction was also implemented by the inputs of the two aforementioned methods, entropy theory and gamma test, and the evaluation results are shown in Table 6.
Similarly, for this index, there are 140 evaluations and 70 comparisons between the input selectors too. In 40 cases (about 57.1%), entropy theory was superior in comparison to gamma tests. In MSPI prediction like JDI, GMDH is evaluated as the most accurate model, which had the best predictions in all of the investigated stations. The most accurate prediction of MSPI with MAE = 0.280, RMSE = 0.374 and WI = 0.955, is made for Marvast station by GMDH model and inputs of entropy theory. In evaluating ANFIS and its hybrid forms, the results showed that DE, GA and PSO algorithms could cause on average 45.9%, 45.4% and 24.2% improvement for ANFIS, respectively. The weakest MSPI prediction is reported for the ANFIS model in Boshrouyeh station, with MAE = 1.235, RMSE = 1.086 and WI = 0.772, which is given by the inputs of gamma tests. In Figure 4, the overlapping of MSPI predictions and their observations are shown in time series plots.  The bold numbers are the best performance of each station, on its own row. ISPRS Int. J. Geo-Inf. 2020, 9,   In these graphs, the best predictions of each station (bold cases of Table 5) are drawn. At first look at Figure 4, it is obvious that there is stronger overlapping in MSPI prediction compared to JDI in all of the stations. The predictions of stations East Isfahan, Isfahan, Kashan and Yazd, have significant overlapping with their observations, even in severe and extreme drought events. The weakest MSPI predictions can be reported for Biarjmand and Boshrouyeh stations, but in comparison with JDI, they are significantly more accurate. In addition, in wet classes, the predictions of MSPI are In these graphs, the best predictions of each station (bold cases of Table 5) are drawn. At first look at Figure 4, it is obvious that there is stronger overlapping in MSPI prediction compared to JDI in all of the stations. The predictions of stations East Isfahan, Isfahan, Kashan and Yazd, have significant overlapping with their observations, even in severe and extreme drought events. The weakest MSPI predictions can be reported for Biarjmand and Boshrouyeh stations, but in comparison with JDI, they are significantly more accurate. In addition, in wet classes, the predictions of MSPI are acceptable in all of the investigated stations. In addition, the scatter plots for comparing the predicted, and observed values are provided in Figure 5.

Models' Comparisons
In this section, a Taylor diagram was used for comparing the overall performance of the models in such a way that the entire outputs of each model for the entire stations were evaluated against the real values of the indices ( Figure 6).

Models' Comparisons
In this section, a Taylor diagram was used for comparing the overall performance of the models in such a way that the entire outputs of each model for the entire stations were evaluated against the real values of the indices ( Figure 6). The diagram exhibits the error rate, correlation coefficient and the standard deviation of the estimated values in comparison to the observed amounts [57,58]. Taylor diagrams ( Figure 6) show that ANFIS and GRNN have had the lowest accuracy in contrast to the other models in terms of both of the suggested inputs (entropy theory and gamma test), and GMDH was found with the best predictions. It can be stated, according to the Taylor diagram, that the models' predictions are more correlated and less erroneous when applying the inputs of entropy theory. Furthermore, the standard deviation of the outputs also approaches the real values to some extent, with the use of entropy theory's inputs. In diagrams related to JDI, the maximum value of R is about 0.7, and the minimum value of RMSE is about 0.65 that is pertinent to the GMDH model and entropy theory. MSPI diagrams also confirm that GMDH (especially with entropy theory's inputs) has the best performance with RMSE below 0.5 and R found to be about 0.9. In this case (MSPI with entropy theory's inputs), the standard deviation values of the predicted data are closest to that of the observed data.
The models' error distributions are shown in violin plots (Figure 7). In this violin plot, the entire errors of each model (for all the stations) were delineated for each index. It can be seminally observed that the violin's curvature is softer in the violin plots drawn for ANFIS and GRNN models. This is reflective of the idea that these two models' errors have relatively more uniform distribution than the other models; or, in other words, the errors' frequencies slightly differ between the larger and smaller values, and the errors with larger values are relatively more frequent in these two models. In the ANFIS-PSO model and, after this, in LSSVM, ANFIS-GA and ANFIS-DE models, the violins' curvatures are reduced and their frequency is gradually increased about zero error. In the GMDH model, this issue reaches its peak point in such a way that the GMDH model's errors feature the highest frequency about zero in all the four diagrams of Figure 7. The issue can be clearly seen in the MSPI's outputs. The diagram exhibits the error rate, correlation coefficient and the standard deviation of the estimated values in comparison to the observed amounts [57,58]. Taylor diagrams ( Figure 6) show that ANFIS and GRNN have had the lowest accuracy in contrast to the other models in terms of both of the suggested inputs (entropy theory and gamma test), and GMDH was found with the best predictions. It can be stated, according to the Taylor diagram, that the models' predictions are more correlated and less erroneous when applying the inputs of entropy theory. Furthermore, the standard deviation of the outputs also approaches the real values to some extent, with the use of entropy theory's inputs. In diagrams related to JDI, the maximum value of R is about 0.7, and the minimum value of RMSE is about 0.65 that is pertinent to the GMDH model and entropy theory. MSPI diagrams also confirm that GMDH (especially with entropy theory's inputs) has the best performance with RMSE below 0.5 and R found to be about 0.9. In this case (MSPI with entropy theory's inputs), the standard deviation values of the predicted data are closest to that of the observed data.
The models' error distributions are shown in violin plots (Figure 7). In this violin plot, the entire errors of each model (for all the stations) were delineated for each index. It can be seminally observed that the violin's curvature is softer in the violin plots drawn for ANFIS and GRNN models. This is reflective of the idea that these two models' errors have relatively more uniform distribution than the other models; or, in other words, the errors' frequencies slightly differ between the larger and smaller values, and the errors with larger values are relatively more frequent in these two models.
In the ANFIS-PSO model and, after this, in LSSVM, ANFIS-GA and ANFIS-DE models, the violins' curvatures are reduced and their frequency is gradually increased about zero error. In the GMDH model, this issue reaches its peak point in such a way that the GMDH model's errors feature the highest frequency about zero in all the four diagrams of Figure 7. The issue can be clearly seen in the MSPI's outputs.

Evaluation of the Models' Accuracy in Drought Classes
In this part of the present study, the models' errors are evaluated with regard to the various drought classes. Something that draws attention at first glance in Figure 8 is that JDI is one class shorter than MSPI.

Evaluation of the Models' Accuracy in Drought Classes
In this part of the present study, the models' errors are evaluated with regard to the various drought classes. Something that draws attention at first glance in Figure 8 is that JDI is one class shorter than MSPI. The reason for this issue is JDI's weakness in showing the very severe droughts, and it had been previously examined by Bazrafshan et al. [2]. Considering Figure 8, MSPI takes a smaller RMSE value in comparison to JDI for all of the drought classes. The lowest prediction error pertains to the normal class, and the RMSE value is increased after this with the increase in the intensity of drought or wetness. It can also be observed that drought is predicted with smaller errors in contrast to wetness for identical intensities. For instance, in MSPI predictions based on the inputs suggested by entropy theory, RMSE value was smaller in severe drought periods than severe wetness periods and/or more accurate predictions were offered by the models for the very severe drought periods as compared to very intensive wetness periods. In addition, in predicting moderate and severe droughts (for both MSPI and JDI) as well as predicting very severe droughts (MSPI), GMDH followed by ANFIS-DE and ANFIS-GA can be trusted to give more reliable results compared to other models.

Discussion
Forecasting JDI has not been found among the previous studies up to now. MSPI forecasting was found only in one study [11] that is getting discussed. Aghelpour et al. [11] used the GA & PSO algorithms to optimize ANFIS, and there are also similar stations (Isfahan & Yazd). Comparing these two prediction studies shows that the similar models, including ANFIS, ANFIS-GA & ANFIS-PSO, had better performances for these two stations, in the research of Aghelpour et al. [11]. There are similarities between the models and stations; so, the difference can only be related to the time windows. In Aghelpour et al. [11], the prediction was implemented for the time windows MSPI9-12, but in the current study, MSPI1-12 was investigated. MSPI9-12 is an index of agricultural drought and deal with about one type of drought. However, MSPI1-12 is an index for three types of droughts, including meteorological, hydrological and agricultural drought simultaneously. In predicting three types of droughts simultaneously, more error is completely normal, compared to predicting just one type of drought.
In comparing the two multivariate drought indices, these prediction results can complement the study by Bazrafshan et al. [2]. They found MSPI is more appropriate than JDI for monitoring multivariate drought status in Iran. The present study adds that MSPI predictions are also more accurate than JDI's. Hence, it can be stated that the MSPI is more appropriate for both monitoring and predicting multivariate droughts compared to JDI in Iran. The reason for this issue is JDI's weakness in showing the very severe droughts, and it had been previously examined by Bazrafshan et al. [2]. Considering Figure 8, MSPI takes a smaller RMSE value in comparison to JDI for all of the drought classes. The lowest prediction error pertains to the normal class, and the RMSE value is increased after this with the increase in the intensity of drought or wetness. It can also be observed that drought is predicted with smaller errors in contrast to wetness for identical intensities. For instance, in MSPI predictions based on the inputs suggested by entropy theory, RMSE value was smaller in severe drought periods than severe wetness periods and/or more accurate predictions were offered by the models for the very severe drought periods as compared to very intensive wetness periods. In addition, in predicting moderate and severe droughts (for both MSPI and JDI) as well as predicting very severe droughts (MSPI), GMDH followed by ANFIS-DE and ANFIS-GA can be trusted to give more reliable results compared to other models.

Discussion
Forecasting JDI has not been found among the previous studies up to now. MSPI forecasting was found only in one study [11] that is getting discussed. Aghelpour et al. [11] used the GA & PSO algorithms to optimize ANFIS, and there are also similar stations (Isfahan & Yazd). Comparing these two prediction studies shows that the similar models, including ANFIS, ANFIS-GA & ANFIS-PSO, had better performances for these two stations, in the research of Aghelpour et al. [11]. There are similarities between the models and stations; so, the difference can only be related to the time windows. In Aghelpour et al. [11], the prediction was implemented for the time windows MSPI9-12, but in the current study, MSPI 1-12 was investigated. MSPI 9-12 is an index of agricultural drought and deal with about one type of drought. However, MSPI 1-12 is an index for three types of droughts, including meteorological, hydrological and agricultural drought simultaneously. In predicting three types of droughts simultaneously, more error is completely normal, compared to predicting just one type of drought.
In comparing the two multivariate drought indices, these prediction results can complement the study by Bazrafshan et al. [2]. They found MSPI is more appropriate than JDI for monitoring multivariate drought status in Iran. The present study adds that MSPI predictions are also more accurate than JDI's. Hence, it can be stated that the MSPI is more appropriate for both monitoring and predicting multivariate droughts compared to JDI in Iran.

Conclusions
Machine learning models have good accuracies for predicting multivariate drought based on JDI and MSPI and can be considered good alternatives among the models that can provide short-term predictions.
Among the studied models, GMDH was found to have the best accuracy and is suggested to be employed for multivariate drought predictions. The reason for that the number of parameters in this model is more than the other models. The number of layers and neurons in GMDH is changeable and creates a large range of network make-up, whereas GRNN has only spread parameters. The optimizers of ANFIS, including DE, GA and PSO, could improve the ANFIS's accuracy by 39.4%, 38.7% and 22.6%, respectively. Hence, in multivariate drought forecasting, the use of DE and GA is suggested.
On the other hand, according to the results of the gamma test and entropy theory, precipitation and temperature values of the previous months (in addition to the past amounts of the indices) are also influential in forecasting multivariate drought. Hence, the use of these variables as the models' inputs is suggested for predicting the general status of the drought. None of the two methods of input selection for short-term prediction of the indices proposed in the current research can be decisively considered superior to the other, but entropy theory seems to be more promising according to the general results. Generally, forecasting some different types of drought together (e.g., meteorological, hydrological and agricultural droughts simultaneously) by the current method is a "theoretical study" and to be changed to an "applied study", it is suggested to validate the indices by the actual simultaneous events of these different drought types.