Forecasting of SPI and Meteorological Drought Based on the Artificial Neural Network and M5P Model Tree

Pande, Chaitanya B.; Al-Ansari, Nadhir; Kushwaha, N. L.; Srivastava, Aman; Noor, Rabeea; Kumar, Manish; Moharir, Kanak N.; Elbeltagi, Ahmed

doi:10.3390/land11112040

¹

Indian Institute of Tropical Meteorology, Pune 411008, India

²

Civil, Environmental and Natural Resources Engineering, Lulea University of Technology, 97187 Lulea, Sweden

³

Division of Agricultural Engineering, ICAR-Indian Agricultural Research Institute, Pusa Campus, New Delhi 110012, India

⁴

Department of Civil Engineering, Indian Institute of Technology (IIT) Kharagpur, Kharagpur 721302, India

⁵

Department of Agricultural Engineering, Bahuddin Zakariya University, Multan 34200, Pakistan

⁶

College of Agricultural Engineering and Technology, Dr. R.P.C.A.U., Pusa 848125, India

⁷

Indian Institute of Forest Management, Bhopal 462003, India

⁸

Agricultural Engineering Department, Faculty of Agriculture, Mansoura University, Mansoura 35516, Egypt

^*

Authors to whom correspondence should be addressed.

Land 2022, 11(11), 2040; https://doi.org/10.3390/land11112040

Submission received: 14 September 2022 / Revised: 7 November 2022 / Accepted: 8 November 2022 / Published: 14 November 2022

(This article belongs to the Special Issue Earth Observation (EO) for Land Degradation and Disaster Monitoring)

Abstract

Climate change has increased the frequency and severity of droughts worldwide, prompting scientists to develop drought prediction models to mitigate their impacts. One of the most important challenges in addressing droughts is developing accurate models to predict their discrete characteristics, i.e., occurrence, duration, and severity. The current research examined the performance of two machine learning approaches, the Artificial Neural Network (ANN) and the M5P model tree, in forecasting the most widely used drought measure, the Standardized Precipitation Index (SPI), at two time scales (SPI-3 and SPI-6). The drought models were developed using rainfall data from two stations in India (Angangaon and Dahalewadi) for 2000–2019, wherein the first 14 years were used for model training and the remaining six years for model validation. Subset regression analysis was performed on 12 different input combinations to choose the best input combination for SPI-3 and SPI-6. A sensitivity analysis was then carried out on the best input combination to find the most effective parameter for forecasting. The performance of the developed ANN (4, 5), ANN (5, 6), ANN (6, 7), and M5P models was assessed through several statistical indicators, namely MAE, RMSE, RAE, RRSE, and r. The results revealed that SPI (t-1) is the most sensitive parameter, with the highest values of β = 0.916 and 1.017 for SPI-3 and SPI-6 prediction, respectively, at both stations. The best input combinations were combination 7 (SPI-1/SPI-3/SPI-4/SPI-5/SPI-8/SPI-9/SPI-11) for SPI-3 and combination 4 (SPI-1/SPI-2/SPI-6/SPI-7) for SPI-6, selected on the basis of the highest R² and Adjusted R² and the lowest MSE. The M5P model had higher r values and lower RMSE values than the ANN (4, 5), ANN (5, 6), and ANN (6, 7) models. Therefore, the M5P model was superior to the other developed models at both stations.

Keywords:

standard precipitation index; drought forecasting; machine learning

1. Introduction

Drought is a natural hazard that has a harmful impact on society and the environment [1]. It has a tremendous influence on water availability, climate, and agricultural production, and a large effect on a region’s economy [2,3]. Drought is not easy to define because the duration of an event is difficult to estimate. It builds up slowly and leaves a lasting, severe effect over a geographical area without major infrastructure destruction [4,5]. Droughts differ in length, intensity, and severity. For simplicity, drought is defined here as an event in which water levels are low because of a persistent lack of rainfall.

Droughts come in several forms, i.e., meteorological, agricultural, hydrological, and socioeconomic, with meteorological drought being the most common [6,7]. Meteorological drought occurs when precipitation falls well below its long-term average. It is the most studied type for drought monitoring because it often initiates all the others [8]. The frequency of meteorological drought depends not on the amount of precipitation in a region but on its variability: large negative departures of rainfall from the local norm indicate drought. Any climatic zone can experience this, including Northeast India, one of the rainiest areas of the globe. It can even happen in humid and tropical regions such as Malaysia [9], India [10], and Bangladesh [11,12]. A recent study found that the rainier regions of the earth, such as the tropics, will be more at risk than ever of devastating droughts [13]. As a result, there needs to be an increased focus on drought monitoring and forecasting in tropical areas. Droughts are often worse in tropical areas because their ecosystems are accustomed to high yearly rainfall [14].

Maharashtra, a tropical region of India, has faced a series of devastating droughts [15]. These droughts were caused by low rainfall and low water reserves. The resulting water scarcity has a profound and dire impact on the environment and the people who live there, affecting agriculture, infrastructure, and health [16,17]. These effects are intensified by population expansion, land use alterations, agricultural growth, and industrial development [18]. There is a pressing need for appropriate drought characterization and modeling to ensure viable planning and governance of water resources. However, drought is emerging as a serious issue, and its characteristics make it challenging to determine the duration and intensity of events, as well as their spatial extent and inter-arrival period [19,20].

Drought is a common natural disaster in Maharashtra that normally occurs once a year in one part of the state or another [21]. According to the government of Maharashtra, the state, with 112.4 million people, has 197.9 billion cubic meters (BCM) of available water resources, comprising 163.9 BCM of surface water and 33.9 BCM of groundwater. About 43% of the area of Maharashtra lies in deficit or highly deficit sub-basins and encounters constant droughts. The state is more prone to agricultural damage from drought than from other natural disasters [21]. Therefore, meteorological droughts have become a hot research topic in India [22,23]. Although various studies have evaluated the effect and danger of droughts on agriculture, the economy, water resources, and society [22,24], no research has been conducted on forecasting droughts in India. This is especially important for India, as the country faces issues such as climate change.

The weather in India is changing due to its proximity to the equator and an increase in global temperatures [25,26]. With this change, the country has experienced major weather extremes, flooding, and other disasters. In regions of India where drought is a constant concern, experts have noted an increase in the return period of droughts due to changes in temperature and rainfall patterns [27]. Climate change is expected to cause more economic damage through droughts by affecting water resources and generating water scarcity [28]. These negative effects demand models that forecast and monitor drought effectively, so that strategies for managing drought-related risks can be planned in a timely manner [29,30]. Drought forecasting is an essential part of drought management. Improper forecasting leads to poor management and can even harm the environment. Thus, there is a demand for quick, reliable, and precise drought forecasting models that can give quantitative information on forthcoming drought-related dangers. With such models, droughts can be forecasted accurately by using the right combination of input variables or drought indices [31,32,33].

A wide variety of drought indices (DIs) have been developed to monitor drought [34]. One of the most widely used and statistically robust is the standardized precipitation index (SPI), which is simple, easy to understand, and independent of climatic factors other than precipitation [26]. The SPI [35] was introduced to help predict drought. It has been broadly adopted by the drought forecasting community and used in many studies to explore drought variability in agricultural and hydrological regions [1,36,37]. Machine learning (ML) techniques use a set of instructions that allow computers to learn from previous input and improve without extensive explicit programming [38]. ML algorithms have been applied to a variety of climatological problems, such as rainfall and temperature forecasting, where they model the empirical relationships among variables [39]; drought forecasting [40]; extreme weather forecasting [41]; and streamflow modelling [42]. Some of the most widely used ML algorithms for modelling the relationship between variables are relevance vector machines (RVM), artificial neural networks (ANN), k-nearest neighbours (KNN), extreme learning machines (ELM), support vector machines (SVM), genetic programming (GP), and random forests [43,44,45,46,47,48,49]. Several statistical models are also used for forecasting droughts, including the autoregressive integrated moving average (ARIMA), multiple linear regression (MLR), and Markov chain models [50]. The SPI is derived from the probability distribution of rainfall deficits, so its scale is not linear. This is troublesome for drought forecasting because traditional statistical techniques struggle to predict such a nonlinear quantity. ML has been demonstrated to be an essential tool in the fight against climate change. Recently, it has been able to model drought indices and climatology with high accuracy. Many different types of ML model can be used for predicting the SPI; among the most popular are the artificial neural network (ANN) and the M5P model tree [51,52]. Although scientists and scholars have proposed many different models for modelling DIs, it is difficult to generalize or create a “perfect” model that works for the tropical region. In addition, an inappropriate combination of inputs in a model’s structure can lead to misleading results. Additionally, each area responds distinctively to stochastic events and historical conditions. Therefore, there is a need to identify the best model for predicting the SPI in the tropical region.

The current research focuses on drought forecasting because, over the last five decades, this area has suffered severely from drought and from shortages of water for irrigation and drinking. In this area, forecasting should be the main priority in planning for and handling any natural drought, and effective plans should be developed to lessen the impact of drought on human, agricultural, and hydrological systems. In this view, we investigated the viability and usefulness of ML models for evaluating SPI-3 and SPI-6 in the area during 2000–2019. After many candidate inputs were built, best subset regression was used to choose the most useful factors as inputs to the developed models. Although machine learning models are widely used for forecasting in general, this paper focuses on developing such models for SPI forecasting in Maharashtra, India. Machine learning models of two types, ANN and the M5P model tree, were developed for forecasting the SPI at two different time scales (SPI-3 and SPI-6). The drought models were developed using rainfall data from two stations in India (Angangaon and Dahalewadi) for the period 2000–2019, with three objectives: (1) to develop and compare the machine learning models based on the best input combination and sensitivity analysis; (2) to forecast SPI-3 and SPI-6; and (3) to find the best models for meteorological drought forecasting in the semi-arid region.

2. Materials and Methods

2.1. Study Area

The upper Godavari River basin is situated in central India, in the state of Maharashtra. It is located between 19°00′00′′ and 20°30′00′′ N latitudes and 73°20′00′′ and 75°40′00′′ E longitudes (Figure 1). The normal annual rainfall in the basin is 1110 mm, and the entire basin area has a tropical climate.

2.2. Methodology

2.2.1. Standardized Precipitation Index (SPI)

Definition of SPI: The Standardized Precipitation Index (SPI) is a relatively recent drought index that considers only precipitation. It is a probability-based index that may be applied to any time scale. Short time scales of a month or two are relevant because some processes, such as dryland agriculture, are quickly affected by atmospheric behavior. Ref. [53] established the SPI, which is widely used to quantify rainfall deficits and thereby estimate drought conditions. Computing the SPI requires long-term historical rainfall records, which are fitted to a suitable probability distribution, typically the gamma distribution, and then transformed into a normal distribution [54]. The present study considered the time series from 2000 to 2019, wherein the first 14 years were used for model training and the remaining six years for model validation. The SPI was calculated from the precipitation data with a drought indices software package using Equations (1)–(8). Equation (1) gives the probability density function (PDF) of the gamma distribution, where β is the scale parameter, α is the shape parameter, x is the rainfall amount, and Γ(α) is the gamma function.

f(x; α, β) = \frac{1}{β^{α} Γ(α)} x^{α - 1} e^{-x/β}, \quad \text{for } x, α, β > 0

(1)

Equations (2) and (3) give the approximate optimal estimates of the parameters α and β. Equations (4) and (5) give the cumulative probability for non-zero rainfall, while Equation (6) combines it with the probability of zero rainfall (q).

α = \frac{1}{4A} \left( 1 + \sqrt{1 + \frac{4A}{3}} \right), \quad \text{where } A = \ln(x_{avg}) - \frac{\sum \ln(x)}{n}

(2)

β = \frac{x_{avg}}{α}

(3)

F(x; α, β) = \int_{0}^{x} f(x; α, β) \, dx = \frac{1}{β^{α} Γ(α)} \int_{0}^{x} x^{α - 1} e^{-x/β} \, dx

(4)

F(x; α, β) = \frac{1}{Γ(α)} \int_{0}^{x/β} t^{α - 1} e^{-t} \, dt, \quad \text{after the substitution } t = \frac{x}{β}

(5)

H (x) = q + (1 - q) F (x; α, β)

(6)
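The fitting steps in Equations (2)–(6) can be sketched in a few lines of Python. This is a minimal illustration and not the software package used in the study. It assumes the standard Thom approximation for the shape parameter and β = x_avg/α for the scale parameter, and it evaluates the gamma cumulative probability with a plain power series. The function names `thom_gamma_fit`, `gamma_cdf`, and `mixed_cdf` are hypothetical.

```python
import math


def thom_gamma_fit(rain):
    """Estimate gamma shape (alpha) and scale (beta) from the non-zero rainfall totals."""
    positive = [x for x in rain if x > 0]
    n = len(positive)
    mean = sum(positive) / n
    # A = ln(mean) - mean(ln x); it is positive unless all values are identical.
    a_stat = math.log(mean) - sum(math.log(x) for x in positive) / n
    if a_stat <= 0:
        raise ValueError("rainfall values must not all be identical")
    alpha = (1.0 + math.sqrt(1.0 + 4.0 * a_stat / 3.0)) / (4.0 * a_stat)
    beta = mean / alpha  # assumption: scale follows from mean = alpha * beta
    return alpha, beta


def gamma_cdf(x, alpha, beta, terms=500):
    """Cumulative gamma probability F(x; alpha, beta) via the incomplete-gamma power series."""
    if x <= 0:
        return 0.0
    t = x / beta
    term = 1.0 / alpha
    total = term
    for n in range(1, terms):
        term *= t / (alpha + n)
        total += term
        if term < total * 1e-15:
            break
    return total * math.exp(-t + alpha * math.log(t) - math.lgamma(alpha))


def mixed_cdf(x, rain):
    """H(x) = q + (1 - q) F(x), where q is the fraction of zero-rainfall records."""
    q = sum(1 for v in rain if v == 0) / len(rain)
    alpha, beta = thom_gamma_fit(rain)
    return q + (1.0 - q) * gamma_cdf(x, alpha, beta)
```

The series is adequate for moderate values of x/β; for very large arguments a library routine for the regularized incomplete gamma function would be the safer choice.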

Following these steps yields the SPI, wherein the cumulative probability is transformed to a standardized normal variate, as shown in Equations (7) and (8), where p₀, p₁, p₂, and q₁, q₂, q₃ are constants.

SPI = - \left( k - \frac{p_{0} + p_{1} k + p_{2} k^{2}}{1 + q_{1} k + q_{2} k^{2} + q_{3} k^{3}} \right), \quad \text{when } k = \sqrt{\ln \left( \frac{1}{[H(x)]^{2}} \right)} \text{ and } 0 < H(x) \leq 0.5

(7)

SPI = + \left( k - \frac{p_{0} + p_{1} k + p_{2} k^{2}}{1 + q_{1} k + q_{2} k^{2} + q_{3} k^{3}} \right), \quad \text{when } k = \sqrt{\ln \left( \frac{1}{[1 - H(x)]^{2}} \right)} \text{ and } 0.5 < H(x) < 1

(8)

Ref. [53] categorized meteorological droughts based on SPI values: values between 0 and −0.99 correspond to mild drought, between −1 and −1.49 to moderate drought, between −1.5 and −1.99 to severe drought, and values of −2 or less to extreme drought.
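As an illustration of the last two steps, the sketch below converts a cumulative probability H(x) into an SPI value with the rational approximation of Equations (7) and (8) and then assigns a drought class. The paper does not list the constants; the values used here are the widely quoted Abramowitz and Stegun coefficients and should be read as an assumption, as should the hypothetical function names.

```python
import math

# Assumed constants (Abramowitz and Stegun rational approximation);
# the source only states that p0..p2 and q1..q3 are constants.
P0, P1, P2 = 2.515517, 0.802853, 0.010328
Q1, Q2, Q3 = 1.432788, 0.189269, 0.001308


def spi_from_probability(h):
    """Map a cumulative probability 0 < h < 1 to an approximate standard normal quantile."""
    if not 0.0 < h < 1.0:
        raise ValueError("h must lie strictly between 0 and 1")
    tail = h if h <= 0.5 else 1.0 - h
    k = math.sqrt(math.log(1.0 / tail ** 2))
    z = k - (P0 + P1 * k + P2 * k * k) / (1.0 + Q1 * k + Q2 * k * k + Q3 * k ** 3)
    # Lower half of the distribution gives negative SPI, upper half positive SPI.
    return -z if h <= 0.5 else z


def drought_class(spi):
    """Drought category following the SPI thresholds quoted from Ref. [53]."""
    if spi <= -2.0:
        return "extreme drought"
    if spi <= -1.5:
        return "severe drought"
    if spi <= -1.0:
        return "moderate drought"
    if spi <= 0.0:
        return "mild drought"
    return "no drought"
```

For example, `drought_class(spi_from_probability(0.05))` falls in the severe class, because the 5% quantile of the standard normal distribution is about −1.645.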

2.2.2. Machine Learning Models

Artificial Neural Network (ANN)

The artificial neural network (ANN) comprises processing neurons, or nodes, which are interconnected in a designated order and are thus able to perform simple to complex numerical manipulations [55,56]. The current study used a feed-forward neural network (FNN) and a recurrent neural network (RNN) for network construction in the MATLAB software. Within the hidden layer, node numbers 4, 5, 6, and 7 were tested [ANN (4, 5), (5, 6), and (6, 7)]. The node numbers were selected for testing by a common trial-and-error approach. Further, three training algorithms were used, viz., Bayesian regularization (BR), Levenberg–Marquardt (LM), and gradient descent with momentum and adaptive learning rate back-propagation (GDX). During training, weights and biases were initialized as random variables with specified distributions. To avoid over-fitting, regularization parameters were estimated using statistical techniques. SPI data were imported as input such that forecasts one, two, and three time steps (~6 months) ahead could be targeted. The input layer passes the data through the successive layers, and the results are produced by the output layer. Equations (9)–(12) were used for constructing and processing the present ANN model, where a, b, and c denote the neurons in the input, hidden, and output layers, respectively. The neurons of the input layer are connected to the hidden layer via the w_ba weights, and the neurons in the hidden layer are connected to those in the output layer through the w_ca weights. Equation (9) gives the net weighted input and bias (Net_b) of the b^th neuron of the hidden layer, where x_a is the input value to the a^th neuron of the input layer and m_b is the bias of the b^th hidden neuron. Equation (10) generates the output (y_b) of the b^th hidden neuron by applying the transfer or activation function of the hidden layer [f(Net_b)]. Equation (11) gives the net weighted input (Net_c) to the output neuron. Similarly, Equation (12) gives the output of the c^th neuron in the output layer.

Net_{b} = \sum_{a = 1}^{p} (w_{ba} x_{a} + m_{b})

(9)

y_{b} = f(Net_{b}) = f_{h} \left( \sum_{a = 1}^{p} (w_{ba} x_{a} + m_{b}) \right)

(10)

Net_{c} = \sum_{b = 1}^{q} w_{ca} y_{b} + m_{c}

(11)

y_{c} = f(Net_{c}) = f_{0} \left( \sum_{b = 1}^{q} w_{ca} f_{h} \left( \sum_{a = 1}^{p} (w_{ba} x_{a} + m_{b}) \right) + m_{c} \right)

(12)
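The forward pass described by Equations (9)–(12) can be sketched as follows. This is not the MATLAB network used in the study: it is a single-hidden-layer illustration that follows the common convention of adding each neuron's bias once, and it assumes a tanh hidden activation and a linear output activation. The function and parameter names are hypothetical.

```python
import math


def forward(x, w_hidden, b_hidden, w_out, b_out,
            f_h=math.tanh, f_o=lambda v: v):
    """One forward pass: inputs -> hidden layer (f_h) -> output layer (f_o).

    w_hidden[b][a] is the weight from input a to hidden neuron b;
    w_out[c][b] is the weight from hidden neuron b to output neuron c.
    """
    # Hidden layer: weighted sum of the inputs plus the neuron bias, then activation.
    hidden = [
        f_h(sum(w * xi for w, xi in zip(row, x)) + bias)
        for row, bias in zip(w_hidden, b_hidden)
    ]
    # Output layer: weighted sum of the hidden outputs plus the output bias.
    return [
        f_o(sum(w * h for w, h in zip(row, hidden)) + bias)
        for row, bias in zip(w_out, b_out)
    ]
```

With this layout, one hidden layer of four neurons fed by seven lagged SPI values would use a 4 × 7 hidden weight matrix; training (for example by Levenberg–Marquardt) would then adjust the weights and biases.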

M5P Model Tree

Ref. [57] introduced the M5 algorithm, which was later reconstructed to develop the M5P model tree. The M5P model integrates a conventional decision tree with linear regression functions. Ref. [58] described the four steps of the M5P algorithm: (1) splitting the input space; (2) developing a linear regression model; (3) pruning; and (4) smoothing. In addition, the M5P algorithm is recognized as robust because it deals efficiently with missing data. Since M5P can also handle and process large datasets efficiently while keeping output errors low, this study used it to forecast drought in the study area.

The splitting criterion for the M5P model tree is based on the error calculated at every node (linear regression functions are assigned to the terminal nodes). The standard deviation of the class values reaching a node is used as the measure of error at that node. Each attribute is tested at the node in order to select one for splitting: the attribute chosen is the one that maximizes the expected error reduction, obtained as the standard deviation reduction (SDR) shown in Equation (13), where A represents the set of instances that reach the node, A_i represents the subset of instances that have the i^th outcome of the potential split, and SD represents the standard deviation.

SDR = SD(A) - \sum_{i} \frac{|A_{i}|}{|A|} SD(A_{i})

(13)
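To make the splitting criterion concrete, the following sketch computes the SDR of Equation (13) for a candidate split and scans the thresholds of a single numeric attribute for the one with the largest reduction. It illustrates the criterion only; pruning, smoothing, and the leaf regression models of M5P are not shown, and the helper names are hypothetical.

```python
from statistics import pstdev


def sdr(parent, subsets):
    """Standard deviation reduction: SD(A) minus the size-weighted SD of the subsets."""
    n = len(parent)
    return pstdev(parent) - sum(len(s) / n * pstdev(s) for s in subsets)


def best_split(xs, ys):
    """Return (threshold, sdr) for the binary split of one attribute with the largest SDR."""
    pairs = sorted(zip(xs, ys))
    best = (None, 0.0)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical attribute values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x <= threshold]
        right = [y for x, y in pairs if x > threshold]
        gain = sdr(ys, [left, right])
        if gain > best[1]:
            best = (threshold, gain)
    return best
```

A split that separates the target values perfectly removes all within-subset spread, so its SDR equals the standard deviation of the parent node.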

MATLAB was used to implement the ML models in this work. It is a programming platform designed for engineers and scientists and is built around the MATLAB language, a matrix-based language that allows a natural expression of computational mathematics.

2.3. Model Performance Evaluation

For model performance evaluation, this study used six statistical indices, viz., mean square error (MSE), root mean square error (RMSE), relative root square error (RRSE), mean absolute error (MAE), relative absolute error (RAE), and coefficient of determination (R²). The MSE measures how close the fitted values are to the data points, as shown in Equation (14) [59]. The RMSE represents the root mean square deviation of the forecasted values from the observed values of the time series, as shown in Equation (15) [59]. The RRSE is the square root of the relative squared error, i.e., the squared error normalized by the squared deviation of the observations from their mean, which expresses the error on the same scale as the quantity being forecast, as shown in Equation (16) [60]. The MAE represents the mean absolute deviation of the predicted values from the observed values of the time series, as shown in Equation (17) [59], while the RAE is the ratio of the total absolute error to the total absolute deviation of the observations from their mean, which expresses the magnitude of the absolute error relative to the variability of the observations, as shown in Equation (18) [61]. Further, R² measures the linear association between the dependent and independent variables, as shown in Equation (19) [62].

Models with a higher value of R² (closer to 1) and lower values of MSE, RMSE, RRSE, MAE, and RAE are considered comparatively better models for SPI drought simulation. In the following equations (Equations (14)–(19)), O_i and P_i represent the observed and predicted (simulated) values for the i^th data point; O_Avg and P_Avg represent the mean of the observed and predicted values; and N represents the number of observations.

MSE = \frac{1}{N} \sum_{i = 1}^{N} (O_{i} - P_{i})^{2}

(14)

RMSE = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} (O_{i} - P_{i})^{2}}

(15)

RRSE = \sqrt{\frac{\sum_{i = 1}^{N} (O_{i} - P_{i})^{2}}{\sum_{i = 1}^{N} (O_{i} - O_{Avg})^{2}}}

(16)

MAE = \frac{1}{N} \sum_{i = 1}^{N} |O_{i} - P_{i}|

(17)

RAE = \frac{\sum_{i = 1}^{N} |O_{i} - P_{i}|}{\sum_{i = 1}^{N} |O_{i} - O_{Avg}|}

(18)

R^{2} = 1 - \frac{\sum_{i = 1}^{N} (O_{i} - P_{i})^{2}}{\sum_{i = 1}^{N} (O_{i} - O_{Avg})^{2}}

(19)
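Equations (14)–(19) translate directly into code. The sketch below is a plain-Python illustration with a hypothetical function name. It returns RAE and RRSE as fractions, whereas the values reported in the results tables appear to be expressed as percentages, which would correspond to multiplying by 100.

```python
import math


def evaluate(observed, predicted):
    """Return MSE, RMSE, RRSE, MAE, RAE and R2 for paired observed/predicted values."""
    n = len(observed)
    o_avg = sum(observed) / n
    sq_err = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    abs_err = sum(abs(o - p) for o, p in zip(observed, predicted))
    # Spread of the observations around their own mean (the reference predictor).
    sq_dev = sum((o - o_avg) ** 2 for o in observed)
    abs_dev = sum(abs(o - o_avg) for o in observed)
    return {
        "MSE": sq_err / n,
        "RMSE": math.sqrt(sq_err / n),
        "RRSE": math.sqrt(sq_err / sq_dev),
        "MAE": abs_err / n,
        "RAE": abs_err / abs_dev,
        "R2": 1.0 - sq_err / sq_dev,
    }
```

Note that with these definitions R² = 1 − RRSE², so the two indices always rank models in the same order.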

3. Results

This section presents the selection of inputs to the developed models for SPI-3 and SPI-6 prediction, the sensitivity analysis of the input parameters, and the performance evaluation of the developed models at both stations, i.e., Angangaon and Dahalewadi. The input variables of the models are lagged SPI values (SPI-1, SPI-2, and so on, where the suffix is the lag in months), used in the different input scenarios of the models. All SPI values were estimated using the SPI package in the R programming software, with which the month-wise SPI was computed for the study area. The results are presented in the sub-sections below, together with descriptions, tables, and figures.

3.1. Input Selection Using Best Subset Model for the SPI-3, and 6 Months

Regression analysis was performed on different input combinations to select the best input combination for model development at both stations. These best input combinations were used to develop the models for the prediction of SPI-3 and SPI-6 at Angangaon station and Dahalewadi station. The regression analysis was carried out on 12 different input combinations. The selection of the best input combination is based on the values of mean square error (MSE), coefficient of determination (R²), Adjusted R², Mallows’ Cp, Akaike’s AIC, and Amemiya’s PC. The best input combination is the one with the highest values of R² and Adjusted R² and the lowest values of MSE, Mallows’ Cp, Akaike’s AIC, and Amemiya’s PC.
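As a rough illustration of how such input combinations can be prepared and compared, the sketch below builds a lagged input matrix from an SPI series and computes the adjusted R² penalty for the number of inputs. It is not the subset regression software used in the study; the function names are hypothetical, and the standard adjusted R² formula is assumed.

```python
def make_lagged(series, lags):
    """Build rows [series[t - lag] for each lag] and targets series[t]."""
    start = max(lags)  # first index for which every lag is available
    rows = [[series[t - lag] for lag in lags] for t in range(start, len(series))]
    targets = [series[t] for t in range(start, len(series))]
    return rows, targets


def adjusted_r2(r2, n, p):
    """Adjusted R2 for a regression with n samples and p predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
```

For instance, `make_lagged(spi, [1, 3, 4, 5, 8, 9, 11])` would produce the inputs of combination 7 for a monthly list `spi`. Adjusted R² rises only when an added lag improves the fit by more than would be expected by chance.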

Table 1 shows the regression analysis performed to determine the best input combination for SPI-3 and SPI-6 prediction at the Angangaon station. Table 1A shows that combination 7, with variables SPI-1/SPI-3/SPI-4/SPI-5/SPI-8/SPI-9/SPI-11, has the highest values of R² and Adjusted R², 0.758 and 0.750, respectively, for SPI-3 prediction. Similarly, Table 1B shows that combination 4 (SPI-1/SPI-2/SPI-6/SPI-7) was selected as the best input combination for the prediction of SPI-6 at the Angangaon station.

Table 2A,B shows the values of the performance evaluators used to select the best input combination at Dahalewadi station. Combination 7 (SPI-1/SPI-3/SPI-4/SPI-5/SPI-8/SPI-9/SPI-11) and combination 4 (SPI-1/SPI-2/SPI-6/SPI-7) were selected as the best input combinations for the prediction of SPI-3 and SPI-6, respectively. For the prediction of SPI-3, combination 7 has the highest R² and Adjusted R², 0.758 and 0.750, and the lowest MSE, 0.471 (Table 2A). For predicting SPI-6, combination 4 has the highest R² and Adjusted R², 0.847 and 0.844, and the lowest MSE, 0.417, at Dahalewadi meteorological station (Table 2B).

3.2. Sensitivity Analysis

The sensitivity analysis was performed on the selected input variables to identify the most effective parameters at Angangaon station and Dahalewadi station. The results of the sensitivity analysis for SPI-3 and SPI-6 at Angangaon station are presented in Table 3A,B. Table 3A shows that the input parameters SPI (t-1), SPI (t-3), SPI (t-4), SPI (t-5), SPI (t-8), SPI (t-9), and SPI (t-11), with standardized coefficient (β) values of 0.916, −0.168, 0.146, −0.138, 0.121, −0.094, and 0.113, respectively, were identified as the effective parameters for SPI-3 prediction at both stations. Similarly, for SPI-6 prediction, the input parameters SPI (t-1) (β = 1.017), SPI (t-2) (β = −0.085), SPI (t-6) (β = −0.184), and SPI (t-7) (β = 0.167) were found to be the effective parameters at both stations (Table 3B). Therefore, SPI (t-1) is the most sensitive parameter, with the highest β values of 0.916 and 1.017 for SPI-3 and SPI-6 prediction, respectively (Table 4A,B). The graphical representation of the effective input parameters is shown in Figure 2 and Figure 3.

3.3. Evaluation Machine Learning Models Based on the Best-Selected Subset Models

The performance of the developed ANN (4, 5), ANN (5, 6), ANN (6, 7), and M5P models was assessed through several statistical indicators, namely MAE, RMSE, RAE, RRSE, and R². The best model is the one with the highest R² and the lowest MAE, RMSE, RAE, and RRSE.

3.3.1. Angangaon Station

Table 5A,B shows the results of the ANN (4, 5), ANN (5, 6), ANN (6, 7), and M5P models for the prediction of SPI-3 and SPI-6 based on the statistical indicators. Table 5A shows that the M5P model outperformed the ANN (4, 5), ANN (5, 6), and ANN (6, 7) models for SPI-3 prediction during the training and testing phases. The MAE, RMSE, RAE, RRSE, and r values of the M5P model were 0.709 and 0.388, 0.948 and 0.551, 76.47 and 48.58, 67.61 and 48.21, and 0.757 and 0.884, respectively, during the training and testing phases. For the prediction of SPI-6 (Table 5B), ANN (6, 7) performed better than the other models during the training phase, while the M5P model was superior during the testing phase. The MAE, RMSE, RAE, RRSE, and r values of ANN (6, 7) during the training phase were 0.502, 0.743, 45.77, 48.56, and 0.885, respectively. During the testing phase, the corresponding values for the M5P model were 0.396, 0.530, 46.85, 37.80, and 0.927. Line plots and scatter plots for the ANN (4, 5), ANN (5, 6), ANN (6, 7), and M5P models during the testing phase were also analyzed, as shown in Figure 4 (SPI-3 prediction) and Figure 5 (SPI-6 prediction). The coefficients of determination (R²) for the ANN (4, 5), ANN (5, 6), ANN (6, 7), and M5P models were 0.705, 0.726, 0.740, and 0.782, respectively, for SPI-3 prediction. For SPI-6 prediction, the R² value for ANN was 0.861 and that of the M5P model was 0.857. The developed models are in good agreement with the 1:1 line. Therefore, the quantitative and qualitative analyses show that the M5P model was the most accurate model for the prediction of SPI-3 and SPI-6 at Angangaon station. All the developed models also performed better for SPI-6 prediction than for SPI-3 prediction.

3.3.2. Dahalewadi Station

The results of the developed ANN (4, 5), ANN (5, 6), ANN (6, 7), and M5P models for the prediction of SPI-3 and SPI-6, based on the performance evaluators, are shown in Table 6A,B. For SPI-3 prediction, the M5P model was superior, with MAE, RMSE, RAE, RRSE, and r values of 0.708 and 0.388, 0.947 and 0.551, 76.38 and 48.57, 67.53 and 48.21, and 0.758 and 0.885, respectively, during the training and testing phases (Table 6A). Similarly, the values of the M5P model for SPI-6 prediction (Table 6B) during the training and testing phases were, respectively, 0.454 and 0.396, 0.710 and 0.530, 41.39 and 46.84, 46.38 and 37.80, and 0.888 and 0.927. Furthermore, the graphical analysis showed that the R² values for the developed ANN (4, 5), ANN (5, 6), ANN (6, 7), and M5P models were 0.705, 0.726, 0.740, and 0.782, respectively, during the testing phase for SPI-3 prediction (Figure 6). Likewise, the values were 0.861, 0.862, 0.861, and 0.860, respectively, during the testing phase for SPI-6 prediction (Figure 7). The developed models are in good agreement with the 1:1 line. It was also observed that the results improved during the testing phase for all developed models. Therefore, the M5P model outperformed the other developed models at Dahalewadi station. Comparing the SPI-3 and SPI-6 models, the M5P model for SPI-6 prediction was superior, and the results of all models improved for SPI-6 prediction.

4. Discussion

It is crucial to explore the potential application of machine learning methods and data mining approaches for dry season monitoring in order to develop better adaptation strategies. Recent research has demonstrated that certain climate occurrences, such as drought episodes and the risks they pose, can be accurately predicted using machine learning algorithms [62]. Machine learning techniques are now widely used in several scientific fields, including flood prediction and evaluation [63], determining dust pollution [64], modelling soil and landscapes [65], and landslide susceptibility assessment [66]. According to earlier studies, machine learning models outperform conventional statistical techniques, and they can also handle enormous datasets and produce more accurate results [65,66]. For drought forecasting, we split the SPI-3 and SPI-6 datasets, which cover 2000–2019, into training and testing subsets. Because the data form a time series, the split was chronological: the first 14 years were used for training and the remaining six years for testing, and the models performed well for drought forecasting on the testing period. Machine learning models generally benefit from large datasets, and larger datasets tend to yield more accurate models. We evaluated all models with performance metrics such as RMSE and MSE. These metrics and the Taylor diagram are also helpful for finding the best models for SPI-3 and SPI-6 drought forecasting, particularly in the semi-arid region. Before modelling, we cleaned the dataset and removed the missing values. The machine learning models performed well on all datasets, and their results are sufficient for drought forecasting in the semi-arid region. A model that achieves about 70% accuracy is already very useful for drought forecasting under a changing climate in the semi-arid region. We took ground reality as the basis and developed the SPI-3 and SPI-6 drought forecasts to be helpful for farmers and crops, particularly in the winter and summer seasons.
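A chronological split of a lagged SPI series, as used for the training and testing subsets, might be sketched as follows. This is an illustration under stated assumptions: the helper names `make_lagged` and `chronological_split` are hypothetical, the sine series is only a stand-in for a real monthly SPI record, and the 0.7 training fraction corresponds to the 14-year/6-year division given in the abstract.

```python
import numpy as np

def make_lagged(series, lags):
    """Build (X, y): each row of X holds series[t - k] for k in lags; y is series[t]."""
    s = np.asarray(series, dtype=float)
    max_lag = max(lags)
    X = np.column_stack([s[max_lag - k: len(s) - k] for k in lags])
    y = s[max_lag:]
    return X, y

def chronological_split(X, y, train_fraction=0.7):
    """Split without shuffling, so every test sample is later than every training sample."""
    n_train = int(len(y) * train_fraction)
    return X[:n_train], y[:n_train], X[n_train:], y[n_train:]

# Hypothetical stand-in for a monthly SPI series (240 months = 20 years).
spi = np.sin(np.arange(240) / 6.0)
X, y = make_lagged(spi, lags=[1, 2, 6, 7])  # lags of input combination 4 (SPI-6)
X_tr, y_tr, X_te, y_te = chronological_split(X, y, train_fraction=0.7)
print(X_tr.shape, X_te.shape)
```

Keeping the temporal order matters here because a random shuffle would let the models see months that come after the ones they are asked to forecast.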

The Taylor diagram helps in understanding model performance for SPI-3 and SPI-6 because it summarizes the correlation coefficient and the standard deviations in a single plot. It provides a compact graphical check of model performance, which is why many researchers now use it for model comparison. Excessive evapotranspiration and moisture deficiency are two effects of extreme droughts on water resource imbalance [67]. Drought has been shown by some researchers to cause unaffordable socioeconomic losses, decreased agricultural productivity, and environmental deterioration [68]. The onset of droughts is indicated by a downward trend in long-run average precipitation (normal precipitation) for a given basin [69,70]. Droughts are characterized by low relative humidity, high temperatures, high wind velocity, and rainfall characteristics such as the intensity, duration, and distribution of rainfall during agricultural growing seasons [15]. The Taylor diagram [13] represents the performance of all developed models based on the correlation coefficient (r), root mean square deviation (RMSD), and standard deviation (SD) at Angangaon station (Figure 8) and Dahalewadi station (Figure 9). It is clear from Figures 8 and 9 that the M5P model has higher r values and lower RMSD and SD values than the ANN (4, 5), ANN (5, 6), and ANN (6, 7) models. Therefore, the M5P model was found to be superior to the other developed models at both stations.
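The three quantities shown on a Taylor diagram are linked by a law-of-cosines relationship, which is why they can be drawn on one plot. The sketch below, assuming population standard deviations and the centred form of the RMSD (means removed before differencing), computes the statistics and checks the identity numerically. The function name `taylor_statistics` and the sample values are illustrative assumptions, not results from the study.

```python
import numpy as np

def taylor_statistics(observed, predicted):
    """Return (SD of observations, SD of predictions, r, centred RMSD)."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    sd_obs = obs.std()    # population standard deviation (ddof=0)
    sd_pred = pred.std()
    r = np.corrcoef(obs, pred)[0, 1]
    # Centred RMSD: remove each series' mean before taking the difference.
    crmsd = np.sqrt(np.mean(((pred - pred.mean()) - (obs - obs.mean())) ** 2))
    return sd_obs, sd_pred, r, crmsd

# Illustrative values only (not data from the study).
obs = np.array([-1.2, -0.4, 0.3, 1.1, 0.2])
pred = np.array([-1.0, -0.5, 0.1, 0.9, 0.4])
sd_o, sd_p, r, crmsd = taylor_statistics(obs, pred)

# Law of cosines: RMSD^2 = SD_obs^2 + SD_pred^2 - 2 * SD_obs * SD_pred * r
identity = np.sqrt(sd_o ** 2 + sd_p ** 2 - 2.0 * sd_o * sd_p * r)
print(np.isclose(crmsd, identity))
```

On the diagram, a model plots closer to the observation point when its r is high and its SD is close to the observed SD, which together imply a small centred RMSD.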

5. Conclusions

The purpose of this study was to investigate the feasibility of machine learning models for forecasting the SPI drought index at two time scales (i.e., SPI-3 and SPI-6) in Maharashtra, India. The developed models used monthly rainfall data from 2000–2019 at two meteorological stations (i.e., Angangaon and Dahalewadi). The forecasting models were built with the help of the statistical auto-correlation method. For the prediction of SPI-3, the best input combination, combination 7 (SPI-1/SPI-3/SPI-4/SPI-5/SPI-8/SPI-9/SPI-11), has the highest values of R² and Adjusted R² (0.758 and 0.750) and the lowest MSE (0.471), while for predicting SPI-6, combination 4 (SPI-1/SPI-2/SPI-6/SPI-7) has the highest values of R² and Adjusted R² (0.847 and 0.844) and the lowest MSE (0.417), at both meteorological stations. Moreover, SPI (t-1) is the most sensitive parameter, with the highest values of β = 0.916 and 1.017 for SPI-3 and SPI-6 prediction, respectively. The ANN (4, 5), ANN (5, 6), and ANN (6, 7) models gave consistent results, with low RMSE and high R² at both stations in forecasting SPI-3 and SPI-6. However, the M5P model shows the best overall performance: its RMSE values for SPI-3 and SPI-6 during training were (0.948, 0.919) at Angangaon and (0.947, 0.710) at Dahalewadi, and during testing were (0.551, 0.530) at both stations. It is clear from the quantitative and qualitative analysis that the M5P model was the most accurate model for predicting SPI-3 and SPI-6 at both stations. This research will assist in establishing a forecasting system for the studied rainfall stations. It will also be a valuable tool for planners, policymakers, and water resource managers to mitigate droughts.
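The ranking of lagged inputs by β can be illustrated with standardized regression coefficients. The sketch below is a minimal Python example, assuming that β denotes the ordinary least squares coefficient obtained after scaling each input and the target to zero mean and unit variance. The function name `standardized_coefficients` and the synthetic data are assumptions for illustration and do not reproduce the study's dataset.

```python
import numpy as np

def standardized_coefficients(X, y):
    """Fit OLS on z-scored inputs and target; return one beta per input column."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)
    yz = (y - y.mean()) / y.std()
    # No intercept is needed because both sides are centred.
    beta, *_ = np.linalg.lstsq(Xz, yz, rcond=None)
    return beta

# Synthetic example: the target depends strongly on column 0 and weakly on column 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 0.9 * X[:, 0] - 0.2 * X[:, 1] + 0.05 * rng.normal(size=200)
beta = standardized_coefficients(X, y)
most_sensitive = int(np.argmax(np.abs(beta)))
print(beta.round(3), most_sensitive)
```

Because every variable is on the same scale, the input with the largest absolute β is the one to which the prediction is most sensitive, which is the sense in which SPI (t-1) is singled out above.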

Author Contributions

C.B.P.: original draft writing, discussion section, formal analysis, methodology, supervision, data collection and analysis for modeling purposes, writing review and editing, investigation; A.E.: development of ML models, formal analysis, and writing review and editing; N.L.K.: writing of the results and discussion sections, development of machine learning graphs, and creation and analysis of the Taylor diagrams; A.S., K.N.M., R.N., M.K. and N.A.-A.: original draft writing, writing review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The datasets used and/or analyzed during the current study are available from the first author on reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

References

Mishra, A.K.; Singh, V.P. Drought modelling – A review. J. Hydrol. 2011, 403, 157–175.
Amalero, E.G.; Ingua, G.L.; Erta, G.B.; Emanceau, P.L. Effects of drought stress on growth and yield of barley. Agron. Sustain. Dev. 2005, 23, 407–418.
Deo, R.C.; Kisi, O.; Singh, V.P. Drought forecasting in eastern Australia using multivariate adaptive regression spline, least square support vector machine and M5Tree model. Atmos. Res. 2017, 184, 149–175.
Haykin, S. Neural Networks, a Comprehensive Foundation, 2nd ed.; Prentice-Hall: Englewood Cliffs, NJ, USA, 1999; pp. 135–155.
Rhee, J.; Im, J. Meteorological drought forecasting for ungauged areas based on machine learning: Using long-range climate forecast and remote sensing data. Agric. For. Meteorol. 2017, 237–238, 105–122.
Ray, D.K.; Gerber, J.S.; Macdonald, G.K.; West, P.C. Climate variation explains a third of global crop yield variability. Nat. Commun. 2015, 6, 5989.
Wilhite, D.A.; Svoboda, M.D.; Hayes, M.J. Understanding the complex impacts of drought: A key to enhancing drought mitigation and preparedness. Water Res. Manag. 2007, 21, 763–774.
Ahmed, K.; Shahid, S.; Chung, E.S.; Wang, X.J.; Harun, S.B. Climate change uncertainties in seasonal drought severity-area-frequency curves: Case of arid region of Pakistan. J. Hydrol. 2019, 570, 473–485.
Ukkola, A.M.; de Kauwe, M.G.; Roderick, M.L.; Abramowitz, G.; Pitman, A.J. Robust future changes in meteorological drought in CMIP6 projections despite uncertainty in precipitation. Geophys. Res. Lett. 2020, 47, e2020GL087820.
Reddy, A.R.; Chaitanya, K.V.; Vivekanandan, M. Drought-induced responses of photosynthesis and antioxidant metabolism in higher plants. J. Plant Physiol. 2004, 161, 1189–1202.
Alamgir, M.; Khan, N.; Shahid, S.; Yaseen, Z.M.; Dewan, A.; Hassan, Q.; Rasheed, B. Evaluating severity–area–frequency (SAF) of seasonal droughts in Bangladesh under climate change scenarios. Stoch. Environ. Res. Risk Assess. 2020, 34, 447–464.
Sohn, S.J.; Tam, C.Y.; Ahn, J.B. Development of a multimodel-based seasonal prediction system for extreme droughts and floods: A case study for South Korea. Int. J. Climatol. 2013, 33, 793–805.
Wang, Q.; Liu, Y.Y.; Zhang, Y.Z.; Tong, L.J.; Li, X.; Li, J.L.; Sun, Z. Assessment of spatial agglomeration of agricultural drought disaster in China from 1978 to 2016. Sci. Rep. 2019, 9, 14393.
Sundnes, K.O. Glossary of Terms. Scand. J. Public Health 2014, 42, 178–190.
Mishra, A.K.; Singh, V.P. A review of drought concepts. J. Hydrol. 2010, 391, 202–216.
Kushwaha, N.L.; Rajput, J.; Elbeltagi, A.; Elnaggar, A.Y.; Sena, D.R.; Vishwakarma, D.K.; Mani, I.; Hussein, E.E. Data Intelligence Model and Meta-Heuristic Algorithms-Based Pan Evaporation Modelling in Two Different Agro-Climatic Zones: A Case Study from Northern India. Atmosphere 2021, 12, 1654.
Rahmati, O.; Falah, F.; Dayal, K.S.; Deo, R.C.; Mohammadi, F.; Biggs, T.; Moghaddam, D.D.; Naghibi, S.A.; Bui, D.T. Machine learning approaches for spatial modeling of agricultural droughts in the south-east region of Queensland Australia. Sci. Total Environ. 2020, 699, 134230.
Tan, M.L.; Juneng, L.; Tangang, F.T.; Chan, N.W.; Ngai, S.T. Future hydro-meteorological drought of the Johor River Basin, Malaysia, based on CORDEX-SEA projections. Hydrol. Sci. J. 2019, 64, 921–933.
Deo, R.C.; Şahin, M. Application of the artificial neural network model for prediction of monthly standardized precipitation and evapotranspiration index using hydro-meteorological parameters and climate indices in eastern Australia. Atmos. Res. 2015, 161–162, 65–81.
Spinoni, J.; Barbosa, P.; de Jager, A.; McCormick, N.; Naumann, G.; Vogt, J.V.; Magni, D.; Masante, D.; Mazzeschi, M. A new global database of meteorological drought events from 1951 to 2016. J. Hydrol. Reg. Stud. 2019, 22, 100593.
Elbeltagi, A.; Zerouali, B.; Bailek, N.; Bouchouicha, K.; Pande, C.; Santos, C.A.G.; Towfiqul Islam, A.R.; Al-Ansari, N.; El-kenawy, E.-S.M. Optimizing hyperparameters of deep hybrid learning for rainfall prediction: A case study of a Mediterranean basin. Arab. J. Geosci. 2022, 15, 933.
Bandyopadhyay, N.; Bhuiyan, C.; Saha, A.K. Drought mitigation: Critical analysis and proposal for a new drought policy with special reference to Gujarat (India). Prog. Disaster Sci. 2020, 5, 100049.
Suarez, M.L.; Kitzberger, T. Differential effects of climate variability on forest dynamics along a precipitation gradient in northern Patagonia. J. Ecol. 2010, 98, 1023–1034.
Mouatadid, S.; Raj, N.; Deo, R.C.; Adamowski, J.F. Input selection and data-driven model performance optimization to predict the Standardized Precipitation and Evaporation Index in a drought-prone region. Atmos. Res. 2018, 212, 130–149.
Bhuiyan, C.; Singh, R.P.; Kogan, F.N. Monitoring drought dynamics in the Aravalli region (India) using different indices based on ground and remote sensing data. Int. J. Appl. Earth Obs. Geoinf. 2006, 8, 289–302.
Kushwaha, N.L.; Rajput, J.; Shirsath, P.B.; Sena, D.R.; Mani, I. Seasonal climate forecasts (SCFs) based risk management strategies: A case study of rainfed rice cultivation in India. J. Agrometeorol. 2022, 24, 10–17.
Dai, A. Drought under global warming: A review. Wiley Interdiscip. Rev. Clim. Change 2011, 2, 45–65.
McKee, T.B. Drought monitoring with multiple time scales. In Proceedings of the Ninth Conference on Applied Climatology, American Meteorological Society, Dallas, TX, USA, 15–20 January 1995; pp. 233–236.
Niranjan Kumar, K.; Rajeevan, M.; Pai, D.S.; Srivastava, A.K.; Preethi, B. On the observed variability of monsoon droughts over India. Weather. Clim. Extrem. 2013, 1, 42–50.
Wang, Y.; Witten, I.H. Induction of Model Trees for Predicting Continuous Classes (Working Paper 96/23); University of Waikato, Department of Computer Science: Hamilton, New Zealand, 1996; Available online: https://hdl.handle.net/10289/1183 (accessed on 13 September 2022).
Sakaa, B.; Elbeltagi, A.; Boudibi, S.; Chaffaï, H.; Islam, A.R.M.; Kulimushi, L.C.; Choudhari, P.; Hani, A.; Brouziyne, Y.; Wong, Y.J. Water quality index modeling using random forest and improved SMO algorithm for support vector machine in Saf-Saf river basin. Environ. Sci. Pollut. Res. 2022, 29, 48491–48508.
Buttafuoco, G.; Caloiero, T.; Coscarelli, R. Analyses of drought events in Calabria (Southern Italy) using standardized precipitation index. Water Res. Manag. 2015, 29, 557–573.
Singh, B.; Sihag, P.; Singh, K. Modelling of impact of water quality on infiltration rate of soil by random forest regression. Model. Earth Syst. Environ. 2017, 3, 999–1004.
Zhang, R.; Chen, Z.Y.; Xu, L.J.; Ou, C.Q. Meteorological drought forecasting based on a statistical model with machine learning techniques in Shaanxi province, China. Sci. Total Environ. 2019, 665, 338–346.
Qutbudin, I.; Shiru, M.S.; Sharafati, A.; Ahmed, K.; Al-Ansari, N.; Yaseen, Z.M.; Shahid, S.; Wang, X. Seasonal drought pattern changes due to climate variability: Case study in Afghanistan. Water 2019, 11, 1096.
Park, S.; Im, J.; Jang, E.; Rhee, J. Drought assessment and monitoring through blending of multi-sensor indices using machine learning approaches for different climate regions. Agric. For. Meteorol. 2016, 216, 157–169.
Shahid, S. Spatial and temporal characteristics of droughts in the western part of Bangladesh. Hydrol. Process. 2008, 22, 2235–2247.
Parmar, A.; Mistree, K.; Sompura, M. Machine learning techniques for rainfall prediction: A review. In Proceedings of the International Conference on Innovations in information Embedded and Communication Systems, Coimbatore, India, 17–18 March 2017; Volume 3.
Tian, Y.E.; Xu, Y.-P.; Wang, G. Agricultural drought prediction using climate indices based on Support Vector Regression in Xiangjiang River basin. Sci. Total Environ. 2018, 622–623, 710–720.
Khan, N.; Shahid, S.; Juneng, L.; Ahmed, K.; Ismail, T.; Nawaz, N. Prediction of heat waves in Pakistan using quantile regression forests. Atmos. Res. 2019, 221, 1–11.
Yaseen, Z.M.; El-shafie, A.; Jaafar, O.; Afan, H.A.; Sayl, K.N. Artificial intelligence based models for stream-flow forecasting: 2000–2015. J. Hydrol. 2015, 530, 829–844.
Pande, C.B.; Kadam, S.A.; Jayaraman, R.; Gorantiwar, S.; Shinde, M. Prediction of soil chemical properties using multispectral satellite images and wavelet transforms methods. J. Saudi Soc. Agric. Sci. 2022, 21, 21–28.
Kouadri, S.; Pande, C.B.; Panneerselvam, B.; Moharir, K.N.; Elbeltagi, A. Prediction of irrigation groundwater quality parameters using ANN, LSTM, and MLR models. Environ. Sci. Pollut Res. 2022, 29, 21067–21091.
Elbeltagi, A.; Kumar, N.; Chandel, A.; Arshad, A.; Pande, C.B.; Islam, A.R.M. Modelling the reference crop evapotranspiration in the Beas-Sutlej basin (India): An artificial neural network approach based on different combinations of meteorological data. Environ. Monit. Assess. 2022, 194, 141.
Elbeltagi, A.; Nagy, A.; Mohammed, S.; Pande, C.B.; Kumar, M.; Bhat, S.A.; Zsembeli, J.; Huzsvai, L.; Tamás, J.; Kovács, E.; et al. Combination of Limited Meteorological Data for Predicting Reference Crop Evapotranspiration Using Artificial Neural Network Method. Agronomy 2022, 12, 516.
Elbeltagi, A.; Kumar, M.; Kushwaha, N.L.; Pande, C.B.; Ditthakit, P.; Vishwakarma, D.K.; Subeesh, A. Drought indicator analysis and forecasting using data driven models: Case study in Jaisalmer, India. Stoch. Environ. Res. Risk Assess. 2022.
Orimoloye, I.R.; Olusola, A.O.; Belle, J.A.; Pande, C.B.; Ololade, O.O. Drought disaster monitoring and land use dynamics: Identification of drought drivers using regression-based algorithms. Nat. Hazards 2022, 112, 1085–1106.
Kumar, M.; Elbeltagi, A.; Pande, C.B.; Ahmed, A.N.; Chow, M.F.; Pham, Q.B.; Kumari, A.; Kumar, D. Applications of Data-driven Models for Daily Discharge Estimation Based on Different Input Combinations. Water Res. Manag. 2022, 36, 2201–2221.
Gulhane, V.A.; Rode, S.V.; Pande, C.B. Correlation Analysis of Soil Nutrients and Prediction Model Through ISO Cluster Unsupervised Classification with Multispectral Data. Multimed. Tools Appl. 2022.
Jalalkamali, A.; Moradi, M.; Moradi, N. Application of several artificial intelligence models and ARIMAX model for forecasting drought using the Standardized Precipitation Index. Int. J. Environ. Sci. Technol. 2015, 12, 1201–1210.
Passioura, J.B. Drought and drought tolerance. Plant Growth Regul. 1996, 20, 79–83.
Zin, W.Z.W.; Jemain, A.A.; Ibrahim, K. Analysis of drought condition and risk in Peninsular Malaysia using Standardised Precipitation Index. Theor. Appl. Climatol. 2013, 111, 559–568.
Masroor, M.; Rehman, S.; Avtar, R.; Sahana, M.; Ahmed, R.; Sajjad, H. Exploring climate variability and its impact on drought occurrence: Evidence from Godavari Middle sub-basin, India. Weather. Clim. Extrem. 2020, 30, 100277.
Fung, K.F.; Huang, Y.F.; Koo, C.H.; Soh, Y.W. Drought forecasting: A review of modelling approaches 2007–2017. J. Water Clim. Change 2019, 11, 771–799.
Haykin, S.; Network, N. A comprehensive foundation. Neural Netw. 2004, 2, 41. Available online: https://ieeexplore.ieee.org/iel4/91/8807/x0153119.pdf (accessed on 13 September 2022).
Quinlan, J.R. Learning with continuous classes. In Proceedings of the 5th Australian Joint Conference on Artificial Intelligence, Hobart, Tasmania, 16–18 November 1992; Adams & Sterling, Ed.; World Scientific: Singapore, 1992; Volume 92, pp. 343–348. Available online: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.34.885&rep=rep1&type=pdf (accessed on 13 September 2022).
Quiring, S.M. Monitoring drought: An evaluation of meteorological drought indices. Geogr. Compass 2009, 3, 64–88.
Willmott, C.J.; Matsuura, K. Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance. Clim. Res. 2005, 30, 79–82. Available online: https://www.int-res.com/articles/cr2005/30/c030p079.pdf (accessed on 13 September 2022).
Zargar, A.; Sadiq, R.; Naser, B.; Khan, F.I. A review of drought indices. Environ. Rev. 2011, 19, 333–349.
Shah, D.; Mishra, V. Integrated Drought Index (IDI) for Drought Monitoring and Assessment in India. Water Resour. Res. 2020, 56, e2019WR026284.
Chen, C.; Twycross, J.; Garibaldi, J.M. A new accuracy measure based on bounded relative error for time series forecasting. PLoS ONE 2017, 12, e0174202.
Nabaei, S.; Sharafati, A.; Yaseen, Z.M.; Shahid, S. Copula based assessment of meteorological drought characteristics: Regional investigation of Iran. Agric. For. Meteorol. 2019, 276, 107611.
Mandal, I.; Pal, S. Modelling human health vulnerability using different machine learning algorithms in stone quarrying and crushing areas of Dwarka river Basin. Eastern India. Adv. Space Res. 2020, 66, 1351–1371.
Ebrahimi-Khusfi, Z.; Taghizadeh-Mehrjardi, R.; Roustaei, F.; Ebrahimi-Khusfi, M.; Mosavi, A.H.; Heung, B.; Soleimani-Sardo, M.; Scholten, T. Determining the contribution of environmental factors in controlling dust pollution during cold and warm months of western Iran using different data mining algorithms and game theory. Ecol. Ind. 2021, 132, 108287.
Zeraatpisheh, M.; Jafari, A.; Bagheri Bodaghabadi, M.; Ayoubi, S.; Taghizadeh-Mehrjardi, R.; Toomanian, N.; Kerry, R.; Xu, M. Conventional and digital soil mapping in Iran: Past, present, and future. Catena 2020, 188, 104424.
Saha, A.; Saha, S. Comparing the Efficiency of Weight of Evidence, Support Vector Machine and Their Ensemble Approaches in Landslide Susceptibility Modelling: A Study on Kurseong Region of Darjeeling Himalaya, India. Remote Sens. Appl. Soc. Environ. 2020, 19, 100323.
Anderson, M.C.; Zolin, C.A.; Sentelhas, P.C.; Hain, C.R.; Semmens, K.; Yilmaz, M.T.; Gao, F.; Otkin, J.A.; Tetrault, R. The evaporative stress index as an indicator of agricultural drought in Brazil: An assessment based on crop yield impacts. Remote Sens. Environ. 2016, 174, 82–99.
Akyuz, D.E.; Bayazit, M.; Onoz, B. Markov chain models for hydrological drought characteristics. J. Hydrometeorol. 2012, 13, 298–309.
Edwards, D.C.; McKee, T.B. Characteristics of 20th-Century Drought in the United States at Multiple Time Scales; Climatology Report No. 97-2; Colorado State University: Ft. Collins, CO, USA, 1997.
Keyantash, J.; Dracup, J.A. The quantification of drought: An evaluation of drought indices. Bull. Am. Meteorol. Soc. 2002, 83, 1167–1180.

Figure 1. Location map of the study stations (insets show the location of the study sites in Maharashtra state, India).

Figure 2. The standardized coefficients of the input variables for sensitivity analysis at Angangaon station for (a) SPI-3, and (b) SPI-6.

Figure 3. The standardized coefficients of the input variables for sensitivity analysis at Dahalewadi station for (a) SPI-3, and (b) SPI-6.

Figure 4. Line plot (left) and scatter plot (right) of observed vs. estimated SPI values by the (a) ANN-(4, 5), (b) ANN-(5, 6), (c) ANN-(6, 7), and (d) M5P during testing at Angangaon station for SPI-3.

Figure 5. Line plot (left) and scatter plot (right) of observed vs. estimated SPI values by the (a) ANN (4, 5), (b) ANN (5, 6), (c) ANN (6, 7), and (d) M5P during testing at Angangaon station for SPI-6.

Figure 6. Line plot (left) and scatter plot (right) of observed vs. estimated SPI values by the (a) ANN (4, 5), (b) ANN (5, 6), (c) ANN (6, 7), and (d) M5P during testing at Dahalewadi station for SPI-3.

Figure 7. Line plot (left) and scatter plot (right) of observed vs. estimated SPI values by the (a) ANN (4, 5), (b) ANN (5, 6), (c) ANN (6, 7), and (d) M5P during testing at Dahalewadi station for SPI-6.

Figure 8. Taylor diagrams of ANN (4, 5), ANN (5, 6), ANN (6, 7), and M5P during testing span at Angangaon station for (a) SPI-3, (b) SPI-6.

Figure 9. Taylor diagrams of ANN (4, 5), ANN (5, 6), ANN (6, 7), and M5P during testing span at Dahalewadi station for (a) SPI-3, (b) SPI-6.

Table 1. The best subset regression analysis for determining the best input combinations at Angangaon station.

(A) SPI-3
Nbr. of variables	Variables	MSE	R²	Adjusted R²	Mallows’ Cp	Akaike’s AIC	Schwarz’s SBC	Amemiya’s PC
1	SPI-1	0.502	0.735	0.734	13.135	−162.225	−155.280	0.267
2	SPI-1/SPI-11	0.488	0.743	0.741	7.321	−167.872	−157.455	0.261
3	SPI-1/SPI-3/SPI-11	0.477	0.750	0.747	2.935	−172.308	−158.419	0.256
4	SPI-1/SPI-3/SPI-4/SPI-11	0.477	0.751	0.746	4.245	−171.014	−153.653	0.258
5	SPI-1/SPI-3/SPI-4/SPI-5/SPI-11	0.473	0.754	0.749	3.038	−172.324	−151.490	0.256
6	SPI-1/SPI-3/SPI-4/SPI-5/SPI-8/SPI-11	0.472	0.756	0.749	3.522	−171.904	−147.599	0.257
7	SPI-1/SPI-3/SPI-4/SPI-5/SPI-8/SPI-9/SPI-11	0.471	0.757	0.750	4.032	−171.468	−143.690	0.257
8	SPI-1/SPI-3/SPI-4/SPI-5/SPI-8/SPI-9/SPI-11/SPI-12	0.472	0.758	0.750	5.537	−169.990	−138.740	0.259
9	SPI-1/SPI-3/SPI-4/SPI-5/SPI-8/SPI-9/SPI-10/SPI-11/SPI-12	0.473	0.758	0.749	7.223	−168.321	−133.599	0.261
10	SPI-1/SPI-3/SPI-4/SPI-5/SPI-7/SPI-8/SPI-9/SPI-10/SPI-11/SPI-12	0.475	0.758	0.748	9.116	−166.435	−128.240	0.263
11	SPI-1/SPI-3/SPI-4/SPI-5/SPI-6/SPI-7/SPI-8/SPI-9/SPI-10/SPI-11/SPI-12	0.477	0.759	0.747	11.000	−164.557	−122.890	0.265
12	SPI-1/SPI-2/SPI-3/SPI-4/SPI-5/SPI-6/SPI-7/SPI-8/SPI-9/SPI-10/SPI-11/SPI-12	0.479	0.759	0.746	13.000	−162.557	−117.418	0.267
(B) SPI-6
Nbr. of variables	Variables	MSE	R²	Adjusted R²	Mallows’ Cp	Akaike’s AIC	Schwarz’s SBC	Amemiya’s PC
1	SPI-1	0.429	0.840	0.839	4.217	−196.719	−189.800	0.161
2	SPI-1/SPI-2	0.426	0.842	0.840	3.562	−197.388	−187.009	0.161
3	SPI-1/SPI-6/SPI-7	0.418	0.846	0.844	−0.094	−201.173	−187.334	0.158
4	SPI-1/SPI-2/SPI-6/SPI-7	0.417	0.847	0.844	0.453	−200.682	−183.384	0.159
5	SPI-1/SPI-6/SPI-7/SPI-9/SPI-12	0.417	0.847	0.844	1.546	−199.630	−178.873	0.159
6	SPI-1/SPI-2/SPI-6/SPI-7/SPI-9/SPI-12	0.417	0.848	0.844	2.640	−198.580	−174.363	0.160
7	SPI-1/SPI-2/SPI-3/SPI-6/SPI-7/SPI-9/SPI-12	0.417	0.849	0.844	3.516	−197.764	−170.088	0.161
8	SPI-1/SPI-2/SPI-3/SPI-4/SPI-6/SPI-7/SPI-9/SPI-12	0.418	0.849	0.843	5.376	−195.912	−164.775	0.162
9	SPI-1/SPI-2/SPI-3/SPI-4/SPI-5/SPI-6/SPI-7/SPI-9/SPI-12	0.420	0.849	0.843	7.165	−194.136	−159.540	0.163
10	SPI-1/SPI-2/SPI-3/SPI-4/SPI-5/SPI-6/SPI-7/SPI-9/SPI-11/SPI-12	0.421	0.849	0.842	9.016	−192.292	−154.237	0.165
11	SPI-1/SPI-2/SPI-3/SPI-4/SPI-5/SPI-6/SPI-7/SPI-8/SPI-9/SPI-11/SPI-12	0.423	0.849	0.841	11.004	−190.306	−148.791	0.166
12	SPI-1/SPI-2/SPI-3/SPI-4/SPI-5/SPI-6/SPI-7/SPI-8/SPI-9/SPI-10/SPI-11/SPI-12	0.425	0.849	0.841	13.000	−188.310	−143.335	0.167

Where: SPI-1 means the SPI-3 months value of one-day lag (t − 1), SPI-2 means the SPI-3 months value of two-day lag (t − 2), SPI-6 means the SPI-3 months value of six-day lag (t − 6), and so on. The best model for the selected selection criterion is displayed in blue.
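The R², Adjusted R², and MSE columns of Table 1 can be illustrated with a small best subset search over lagged inputs. The following is a minimal Python sketch, assuming ordinary least squares with an intercept and an MSE defined as the residual sum of squares divided by the residual degrees of freedom (an assumption about the convention of the statistical package). The function names `fit_scores` and `best_subset` and the synthetic data are hypothetical and do not reproduce the study's results.

```python
from itertools import combinations

import numpy as np

def fit_scores(X, y):
    """Fit OLS with an intercept; return (R2, adjusted R2, MSE)."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    sse = float(resid @ resid)
    sst = float(np.sum((y - y.mean()) ** 2))
    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    mse = sse / (n - p - 1)
    return r2, adj_r2, mse

def best_subset(X, y, size):
    """Return the column indices of the subset of a given size with the highest R2."""
    candidates = combinations(range(X.shape[1]), size)
    return max(candidates, key=lambda cols: fit_scores(X[:, list(cols)], y)[0])

# Synthetic example: five candidate inputs, only columns 0 and 3 matter.
rng = np.random.default_rng(1)
X = rng.normal(size=(150, 5))
y = 1.0 * X[:, 0] + 0.5 * X[:, 3] + 0.1 * rng.normal(size=150)
cols = best_subset(X, y, size=2)
r2, adj_r2, mse = fit_scores(X[:, list(cols)], y)
print(cols, round(r2, 3), round(adj_r2, 3), round(mse, 3))
```

R² can only rise as inputs are added, whereas Adjusted R² and the degrees-of-freedom MSE penalize extra inputs, which is consistent with the way those two columns level off and then worsen for the larger combinations in the table.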

Table 2. The best subset regression analysis for determining the best input combinations at Dahalewadi station.

(A) SPI-3
Nbr. of variables	Variables	MSE	R²	Adjusted R²	Mallows’ Cp	Akaike’s AIC	Schwarz’s SBC	Amemiya’s PC
1	SPI-1	0.501	0.735	0.734	13.148	−162.296	−155.352	0.267
2	SPI-1/SPI-11	0.488	0.743	0.741	7.325	−167.951	−157.535	0.261
3	SPI-1/SPI-3/SPI-11	0.477	0.750	0.747	2.951	−172.376	−158.486	0.256
4	SPI-1/SPI-3/SPI-4/SPI-11	0.477	0.751	0.747	4.266	−171.077	−153.715	0.258
5	SPI-1/SPI-3/SPI-4/SPI-5/SPI-11	0.473	0.754	0.749	3.042	−172.403	−151.570	0.256
6	SPI-1/SPI-3/SPI-4/SPI-5/SPI-8/SPI-11	0.472	0.756	0.750	3.525	−171.985	−147.679	0.257
7	SPI-1/SPI-3/SPI-4/SPI-5/SPI-8/SPI-9/SPI-11	0.471	0.758	0.750	4.028	−171.556	−143.778	0.257
8	SPI-1/SPI-3/SPI-4/SPI-5/SPI-8/SPI-9/SPI-11/SPI-12	0.472	0.758	0.750	5.537	−170.073	−138.823	0.259
9	SPI-1/SPI-3/SPI-4/SPI-5/SPI-8/SPI-9/SPI-10/SPI-11/SPI-12	0.473	0.758	0.749	7.226	−168.402	−133.679	0.261
10	SPI-1/SPI-3/SPI-4/SPI-5/SPI-7/SPI-8/SPI-9/SPI-10/SPI-11/SPI-12	0.475	0.758	0.748	9.118	−166.516	−128.322	0.263
11	SPI-1/SPI-3/SPI-4/SPI-5/SPI-6/SPI-7/SPI-8/SPI-9/SPI-10/SPI-11/SPI-12	0.477	0.759	0.747	11.000	−164.640	−122.973	0.265
12	SPI-1/SPI-2/SPI-3/SPI-4/SPI-5/SPI-6/SPI-7/SPI-8/SPI-9/SPI-10/SPI-11/SPI-12	0.479	0.759	0.746	13.000	−162.641	−117.501	0.267
(B) SPI-6
Nbr. of variables	Variables	MSE	R²	Adjusted R²	Mallows’ Cp	Akaike’s AIC	Schwarz’s SBC	Amemiya’s PC
1	SPI-1	0.429	0.840	0.839	4.222	−196.736	−189.817	0.161
2	SPI-1/SPI-2	0.426	0.842	0.840	3.565	−197.406	−187.027	0.161
3	SPI-1/SPI-6/SPI-7	0.418	0.846	0.844	−0.092	−201.193	−187.354	0.158
4	SPI-1/SPI-2/SPI-6/SPI-7	0.417	0.847	0.844	0.454	−200.703	−183.405	0.159
5	SPI-1/SPI-6/SPI-7/SPI-9/SPI-12	0.417	0.847	0.844	1.545	−199.652	−178.895	0.159
6	SPI-1/SPI-2/SPI-6/SPI-7/SPI-9/SPI-12	0.417	0.848	0.844	2.639	−198.602	−174.385	0.160
7	SPI-1/SPI-2/SPI-3/SPI-6/SPI-7/SPI-9/SPI-12	0.417	0.849	0.844	3.516	−197.785	−170.109	0.161
8	SPI-1/SPI-2/SPI-3/SPI-4/SPI-6/SPI-7/SPI-9/SPI-12	0.418	0.849	0.843	5.377	−195.933	−164.796	0.162
9	SPI-1/SPI-2/SPI-3/SPI-4/SPI-5/SPI-6/SPI-7/SPI-9/SPI-12	0.420	0.849	0.843	7.165	−194.157	−159.561	0.163
10	SPI-1/SPI-2/SPI-3/SPI-4/SPI-5/SPI-6/SPI-7/SPI-9/SPI-11/SPI-12	0.421	0.849	0.842	9.016	−192.314	−154.259	0.165
11	SPI-1/SPI-2/SPI-3/SPI-4/SPI-5/SPI-6/SPI-7/SPI-8/SPI-9/SPI-11/SPI-12	0.423	0.849	0.841	11.004	−190.327	−148.812	0.166
12	SPI-1/SPI-2/SPI-3/SPI-4/SPI-5/SPI-6/SPI-7/SPI-8/SPI-9/SPI-10/SPI-11/SPI-12	0.425	0.849	0.841	13.000	−188.331	−143.357	0.167

The best model for the selected selection criterion is displayed in blue.

Table 3. The regression analysis for identifying the most effective parameters at Angangaon station.

(A) SPI-3
Source	Value	Standard Error	t	Pr > |t|	Lower Bound (95%)	Upper Bound (95%)
SPI-(t-1)	0.916	0.048	19.262	<0.0001	0.823	1.010
SPI-(t-2)	0.000	0.000
SPI-(t-3)	−0.168	0.076	−2.206	0.028	−0.318	−0.018
SPI-(t-4)	0.146	0.088	1.666	0.097	−0.027	0.320
SPI-(t-5)	−0.138	0.069	−2.006	0.046	−0.274	−0.002
SPI-(t-6)	0.000	0.000
SPI-(t-7)	0.000	0.000
SPI-(t-8)	0.121	0.069	1.746	0.082	−0.016	0.258
SPI-(t-9)	−0.094	0.076	−1.231	0.219	−0.245	0.056
SPI-(t-10)	0.000	0.000
SPI-(t-11)	0.113	0.048	2.355	0.019	0.018	0.207
SPI-(t-12)	0.000	0.000
(B) SPI-6
Source	Value	Standard Error	t	Pr > |t|	Lower Bound (95%)	Upper Bound (95%)
SPI-(t-1)	1.017	0.065	15.696	<0.0001	0.889	1.145
SPI-(t-2)	−0.085	0.070	−1.218	0.225	−0.223	0.053
SPI-(t-3)	0.000	0.000
SPI-(t-4)	0.000	0.000
SPI-(t-5)	0.000	0.000
SPI-(t-6)	−0.184	0.070	−2.623	0.009	−0.322	−0.046
SPI-(t-7)	0.167	0.065	2.583	0.010	0.040	0.295
SPI-(t-8)	0.000	0.000
SPI-(t-9)	0.000	0.000
SPI-(t-10)	0.000	0.000
SPI-(t-11)	0.000	0.000
SPI-(t-12)	0.000	0.000

Table 4. The regression analysis for identifying the most effective parameters at Dahalewadi station.

(A) SPI-3
Source	Value	Standard Error	t	Pr > |t|	Lower Bound (95%)	Upper Bound (95%)
SPI-(t-1)	0.916	0.048	19.265	<0.0001	0.823	1.010
SPI-(t-2)	0.000	0.000
SPI-(t-3)	−0.168	0.076	−2.204	0.029	−0.318	−0.018
SPI-(t-4)	0.146	0.088	1.666	0.097	−0.027	0.320
SPI-(t-5)	−0.139	0.069	−2.010	0.046	−0.275	−0.003
SPI-(t-6)	0.000	0.000
SPI-(t-7)	0.000	0.000
SPI-(t-8)	0.121	0.069	1.748	0.082	−0.015	0.258
SPI-(t-9)	−0.094	0.076	−1.234	0.218	−0.245	0.056
SPI-(t-10)	0.000	0.000
SPI-(t-11)	0.113	0.048	2.358	0.019	0.019	0.207
SPI-(t-12)	0.000	0.000
(B) SPI-6
Source	Value	Standard Error	t	Pr > |t|	Lower Bound (95%)	Upper Bound (95%)
SPI-(t-1)	1.017	0.065	15.696	<0.0001	0.889	1.145
SPI-(t-2)	−0.085	0.070	−1.218	0.225	−0.223	0.053
SPI-(t-3)	0.000	0.000
SPI-(t-4)	0.000	0.000
SPI-(t-5)	0.000	0.000
SPI-(t-6)	−0.184	0.070	−2.623	0.009	−0.322	−0.046
SPI-(t-7)	0.167	0.065	2.584	0.010	0.040	0.295
SPI-(t-8)	0.000	0.000
SPI-(t-9)	0.000	0.000
SPI-(t-10)	0.000	0.000
SPI-(t-11)	0.000	0.000
SPI-(t-12)	0.000	0.000

Table 5. MAE, RMSE, RAE, RRSE, and r for machine learning algorithms-based models during the training and testing span at Angangaon station.

(A) SPI-3
Machine Learning Algorithm	Training					Testing
Machine Learning Algorithm	MAE	RMSE	RAE	RRSE	r	MAE	RMSE	RAE	RRSE	r
ANN (4, 5)	0.905	1.275	97.62	90.94	0.652	0.650	0.828	81.30	72.45	0.840
ANN (5, 6)	0.884	1.225	95.45	87.33	0.670	0.650	0.816	81.20	71.41	0.852
ANN (6, 7)	0.885	1.228	95.54	87.57	0.674	0.636	0.794	79.50	69.47	0.860
M5P	0.709	0.948	76.47	67.61	0.757	0.388	0.551	48.58	48.21	0.884
(B) SPI-6
Machine learning algorithm	Training					Testing
Machine learning algorithm	MAE	RMSE	RAE	RRSE	r	MAE	RMSE	RAE	RRSE	r
ANN (4, 5)	0.516	0.754	47.08	49.29	0.884	0.603	0.754	71.29	53.72	0.928
ANN (5, 6)	0.507	0.747	46.28	48.85	0.885	0.579	0.730	68.39	52.01	0.928
ANN (6, 7)	0.502	0.743	45.77	48.56	0.885	0.564	0.715	66.59	50.95	0.928
M5P	0.627	0.919	57.17	60.06	0.799	0.396	0.530	46.85	37.80	0.927

Table 6. MAE, RMSE, RAE, RRSE, and r for machine learning algorithms-based models during the training and testing span at Dahalewadi station.

(A) SPI-3
Machine Learning Algorithm	Training					Testing
Machine Learning Algorithm	MAE	RMSE	RAE	RRSE	r	MAE	RMSE	RAE	RRSE	r
ANN (4, 5)	0.904	1.275	97.55	90.89	0.653	0.650	0.828	81.29	72.44	0.840
ANN (5, 6)	0.884	1.224	95.35	87.26	0.671	0.650	0.816	81.21	71.42	0.852
ANN (6, 7)	0.885	1.228	95.51	87.57	0.675	0.636	0.794	79.50	69.47	0.861
M5P	0.708	0.947	76.38	67.53	0.758	0.388	0.551	48.57	48.21	0.885
(B) SPI-6
Machine learning algorithm	Training					Testing
Machine learning algorithm	MAE	RMSE	RAE	RRSE	r	MAE	RMSE	RAE	RRSE	r
ANN (4, 5)	0.516	0.754	47.07	49.29	0.884	0.603	0.754	71.27	53.72	0.928
ANN (5, 6)	0.507	0.747	46.27	48.84	0.885	0.579	0.730	68.38	52.00	0.928
ANN (6, 7)	0.502	0.743	45.77	48.55	0.885	0.563	0.715	66.58	50.94	0.928
M5P	0.454	0.710	41.39	46.38	0.888	0.396	0.530	46.84	37.80	0.927

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Pande, C.B.; Al-Ansari, N.; Kushwaha, N.L.; Srivastava, A.; Noor, R.; Kumar, M.; Moharir, K.N.; Elbeltagi, A. Forecasting of SPI and Meteorological Drought Based on the Artificial Neural Network and M5P Model Tree. Land 2022, 11, 2040. https://doi.org/10.3390/land11112040

