Modeling and Forecasting of Rice Prices in India during the COVID-19 Lockdown Using Machine Learning Approaches

Via national lockdowns, the COVID-19 pandemic disrupted the production and distribution of foodstuffs worldwide, including rice (Oryza sativa L.) production, affecting prices in India's agroecosystems and markets. The present study was performed to assess the impact of the COVID-19 national lockdown on rice prices in India and to develop statistical and machine learning models to forecast price changes under similar crisis scenarios. To estimate rice prices under COVID-19, the autoregressive integrated moving average (ARIMA) model, the artificial neural network (ANN) model, and the extreme learning machine (ELM) model, together with their intervention counterparts, were applied. The results obtained using the ARIMA intervention model revealed that during the COVID-19 lockdown in India, rice prices increased by INR 0.92/kg. In addition, the ELM intervention model required less computation time and provided better results than the other models, because it captures the nonlinear pattern in the time series data along with the intervention variable, which was treated as an exogenous variable. Forecasting models can be a useful tool in supporting decision makers, especially under unpredictable crises. The study results are of great importance for the national agri-food sector, as they can help authorities and policymakers plan and design more sustainable interventions in the food market during (inter)national crisis situations.


Introduction
Following the outbreak of the COVID-19 pandemic, which was declared in March 2020 and reached almost every country, governments attempted to contain the virus and minimize the consequences of the health crisis through various measures, including national lockdowns [1]. However, the mandated lockdowns had a significant impact on the agri-sector: they delayed the delivery of most inputs (e.g., fertilizers and other agrochemicals), as well as of foodstuffs to markets, and caused significant fluctuations in both supply and prices [2].
Rice (Oryza sativa L.) is one of the most important commodities in the agricultural supply chain. More people are fed directly by rice than by any other crop, making it the most significant global food crop. Asia produces >90% of the world's milled rice, and for the majority of the population in Southeast Asia, rice is a primary source of nutrition. India has the largest rice-growing area in the world, 43.82 million hectares, producing 112 million tons of rice [3]; yet, despite having the largest area under paddy cultivation among paddy-growing countries, it lags behind China in output volume. India's productivity is substantially lower than that of Egypt, Japan, China, Vietnam, the USA, and Indonesia, and is even below the global average [4][5][6]. Thus, any disturbance in India's rice agroecosystems, supply chain, or market can trigger rice price fluctuations not only nationally but also internationally and even globally.
Price forecasting and modeling support the formulation of policies required for long-term and comprehensive economic development, as well as decision making and efficient scheduling for the national economy. Time series models use the past values of the series under consideration to build effective forecasting approaches. The autoregressive integrated moving average (ARIMA) model is widely used, owing to its statistical properties and the well-known Box-Jenkins model-building procedure [7]. In many studies, ARIMA models have been successfully applied to forecast various consumption and demand series [8], including the production and exports of different agricultural commodities [9][10][11][12][13]. In addition, an ARIMA model combined with a genetic algorithm was recently employed to estimate maize yield [14] and oilseed production [15] in India's agroecosystems.
The ARIMA intervention model developed by Box and Tiao [16] is the most popular method for modeling interrupted time series data. Bianchi and colleagues [17] previously used the ARIMA intervention model for application planning and budgeting. Ray et al. [18] used ARIMA intervention models to forecast cotton yield in India, and in the study by Ramasubramanian and Ray [19], the ARIMA intervention model performed better than the plain ARIMA model; Jeffrey and Kyner [20] employed an ARIMA intervention model to estimate Chinese stock prices. Through interviews conducted from February 2020 to January 2021, Corchuelo Martínez-Azúa et al. [21] evaluated the impact of the pandemic on agri-food businesses in the Spanish province of Extremadura. Di Marcantonio et al. [22] employed a multinomial logit regression model to identify the factors affecting the impact of the COVID-19 pandemic on food waste. However, general time series models such as ARIMA and the ARIMA intervention models cannot capture nonlinear components in a time series, which leads to weak forecasts.
Numerous parametric nonlinear models have been developed to handle time series with nonlinear components, which arise when the data-generating process is highly heterogeneous, nonlinear, complicated, and chaotic. Over the years, artificial neural networks (ANN) have been among the most popular methods for modeling and forecasting such time series and have been successfully applied under different conditions [22][23][24][25][26][27].
The key benefit of artificial intelligence (AI) is its ability to model nonlinear, complex, and imprecise data without requiring a precise model specification. Traditional AI algorithms forecast a series from its historical values, whereas intervention AI models treat the intervention variable as an exogenous input [24,27]. However, ANN generally requires a long time to tune the model parameters; to overcome this issue, we employed the extreme learning machine (ELM) model and developed an ELM intervention model, both of which train much faster. ELM, proposed by Huang et al. [28], can achieve good generalization performance while learning thousands of times faster than backpropagation networks. Furthermore, ELM has been reported to outperform support vector machines in classification and regression applications [29][30][31].
Prominent classical time series models, such as ARIMA and its intervention models, have been used to determine the impact of agricultural plans or unforeseen changes. Because these models cannot capture nonlinear patterns in time series data, they need to be extended. To address this issue, we established ANN- and ELM-based intervention models, which contain a single intervention variable in the input layer that functions as an exogenous variable. The recent lockdown imposed by the government of India due to the COVID-19 pandemic had an abrupt impact on the prices of agricultural commodities. This study therefore set out to assess how the COVID-19 lockdown affected rice prices in India, and used statistical and machine learning time series intervention models to forecast future prices. We believe that the study outcomes can be exploited by national and international authorities and policymakers in shaping more resilient and proactive measures and plans regarding similar future crises in the agricultural sector.
Agronomy 2022, 12, 2133 3 of 13

Data Source
The input data for the time series analysis were daily rice prices (INR/kg) from January to June 2020, obtained from the Ministry of Consumer Affairs of the Government of India. Since the Indian government declared a national lockdown on 25 March 2020, this date was regarded as the date of intervention. Rice prices from 1 January to 24 March 2020 formed the pre-intervention period, and prices from 25 March to 30 June 2020 formed the post-intervention period. In addition, data from 1 January to 23 June 2020 were used as the training set, and data from 24 to 30 June 2020 as the validation set.
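The dating rules above translate directly into an intervention indicator and a chronological split. The minimal Python sketch below assumes one observation per calendar day; the published series may skip weekends or holidays, in which case the same date rules apply to whichever dates are present. The constant and function names are illustrative, not taken from the authors' code.

```python
from datetime import date, timedelta

# Assumption: one observation per calendar day (the real series may have gaps).
START, END = date(2020, 1, 1), date(2020, 6, 30)
LOCKDOWN = date(2020, 3, 25)          # date of intervention
VALIDATION_START = date(2020, 6, 24)  # first day of the validation set

def daily_dates(start, end):
    """Return every calendar day from start to end, inclusive."""
    n_days = (end - start).days + 1
    return [start + timedelta(days=k) for k in range(n_days)]

dates = daily_dates(START, END)

# Step intervention dummy: 0 before the lockdown, 1 from 25 March onward.
intervention = [1 if d >= LOCKDOWN else 0 for d in dates]

# Chronological split: training up to 23 June, validation 24-30 June.
train_dates = [d for d in dates if d < VALIDATION_START]
valid_dates = [d for d in dates if d >= VALIDATION_START]

print(len(dates), sum(intervention), len(train_dates), len(valid_dates))  # → 182 98 175 7
```

With daily data, 84 days fall in the pre-intervention period and 98 in the post-intervention period; the split is strictly chronological, so no future observation leaks into the training set.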

ARIMA Model
The most popular methodology in linear time series analysis is the Box-Jenkins procedure of model building using the ARIMA model. The model is written in the form ARIMA (p, d, q), where p is the order of the autoregressive process, d is the degree of differencing required to make the series stationary, and q is the order of the moving average process. For a time series $Y_t$, the model is

$$\phi(B)(1-B)^{d}\,Y_t = \theta(B)\,\varepsilon_t, \qquad \phi(B) = 1 - \phi_1 B - \cdots - \phi_p B^{p}, \qquad \theta(B) = 1 - \theta_1 B - \cdots - \theta_q B^{q},$$

where $\phi_i$ are the autoregressive parameters, $\theta_j$ are the moving average parameters, $d$ is the degree of differencing, $B$ is the backshift operator ($B Y_t = Y_{t-1}$), and $\varepsilon_t$ is white noise. The steps in ARIMA model building are: (1) model order identification using the ACF (autocorrelation function), PACF (partial autocorrelation function), AIC (Akaike information criterion), and BIC (Bayesian information criterion); (2) model estimation using maximum likelihood estimation (MLE); (3) diagnostic checking of residuals; and (4) forecasting of out-of-sample values.
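To make the backshift notation concrete, the sketch below generates an ARIMA(1,1,1) path, the order later selected for the rice series, directly from the difference equation $(1-\phi_1 B)(1-B)Y_t = (1-\theta_1 B)\varepsilon_t$. It assumes the Box-Jenkins sign convention (minus sign on the moving average term), zero pre-sample values, and illustrative parameters $\phi_1 = 0.5$, $\theta_1 = 0.2$ that are not the estimates in Table 3. The function name is hypothetical.

```python
def simulate_arima_111(phi, theta, shocks, y0=0.0):
    """Build Y_t from (1 - phi*B)(1 - B) Y_t = (1 - theta*B) eps_t.

    The differenced series w_t = Y_t - Y_{t-1} follows the ARMA(1,1)
    recursion w_t = phi*w_{t-1} + eps_t - theta*eps_{t-1}; Y_t is then
    recovered by cumulative summation starting from y0.
    Pre-sample values of w and eps are assumed to be zero.
    """
    w_prev, eps_prev = 0.0, 0.0
    y = [y0]
    for eps in shocks:
        w = phi * w_prev + eps - theta * eps_prev
        y.append(y[-1] + w)  # undo the differencing: Y_t = Y_{t-1} + w_t
        w_prev, eps_prev = w, eps
    return y

# A single unit shock shows how the AR and MA terms shape the response.
path = simulate_arima_111(phi=0.5, theta=0.2, shocks=[1.0, 0.0, 0.0, 0.0], y0=10.0)
print([round(v, 3) for v in path])  # → [10.0, 11.0, 11.3, 11.45, 11.525]
```

Because d = 1, one shock shifts the level permanently: with more zero shocks the path settles at 11.6 (= 10 + 1 + 0.3/(1 - 0.5)) rather than returning to 10.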

ARIMA Intervention Model
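Intervention analysis, as established by Box and Tiao [16], is a time series technique that incorporates the effects of external forces, called interventions, into ARIMA modeling. An intervention model is a time series model used to investigate the impact of external factors on a series. Intervention models are a special case of transfer function models in which the exogenous variable is categorical. Accordingly, the intervention model based on a seasonal ARIMA process can be written as

$$Y_t = \frac{\omega(B)\,B^{b}}{\delta(B)}\, I_t + \frac{\theta(B)\,\Theta(B^{s})}{\phi(B)\,\Phi(B^{s})\,(1-B)^{d}(1-B^{s})^{D}}\,\varepsilon_t,$$

where $I_t$ is the intervention indicator, $\omega(B)$ and $\delta(B)$ are polynomials in $B$ describing the size and dynamics of the intervention effect, $b$ is the delay before the effect begins, $\Phi(B^{s})$ and $\Theta(B^{s})$ are the seasonal autoregressive and moving average polynomials with seasonal period $s$, and $D$ is the order of seasonal differencing. The first term, $\frac{\omega(B)B^{b}}{\delta(B)} I_t$, is known as the intervention component; the model can be extended to include multiple intervention components, accounting for various types of interventions that influence the process.
Of the three types of interventions, the COVID-19 pandemic falls under step intervention: it began at a specific point in time across countries, its impact may persist indefinitely, and it may either increase or decrease the level of the series. The indicator variable for this type of intervention is coded as 0 before the pandemic and 1 during the pandemic.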
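The sketch below illustrates what a step intervention does to the level of a series. It assumes a first-order denominator $\delta(B) = 1 - \delta B$ and no delay, so the intervention component $m_t = \omega S_t/(1 - \delta B)$ obeys the recursion $m_t = \delta m_{t-1} + \omega S_t$, where $S_t$ is the 0/1 step dummy. A single impact parameter ω, as reported for the ARIMA intervention model, corresponds to the abrupt case δ = 0; the gradual case δ > 0 is shown only for contrast. The function name is hypothetical.

```python
def step_response(omega, delta, onset, n):
    """Intervention component m_t = omega / (1 - delta*B) * S_t.

    S_t is a step dummy (0 before `onset`, 1 from `onset` on), so m_t obeys
    m_t = delta * m_{t-1} + omega * S_t. With delta = 0 the shift is abrupt
    and permanent; with 0 < delta < 1 it builds up gradually towards
    omega / (1 - delta).
    """
    m, prev = [], 0.0
    for t in range(n):
        s_t = 1.0 if t >= onset else 0.0
        prev = delta * prev + omega * s_t
        m.append(prev)
    return m

print(step_response(omega=1.0, delta=0.0, onset=2, n=5))  # → [0.0, 0.0, 1.0, 1.0, 1.0]
print(step_response(omega=1.0, delta=0.5, onset=2, n=5))  # → [0.0, 0.0, 1.0, 1.5, 1.75]
```

In the abrupt case the series level simply moves up by ω from the onset date, which is how a single estimated impact parameter is read as a price change per kilogram.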

Artificial Neural Network
Over the last three decades, the ANN has been the most frequently used AI technique for time series modeling and forecasting. For time series regression problems, the ANN model uses time lags, analogous to the AR and MA terms, as its inputs. The ANN mimics the way the human brain processes information, in three layers, i.e., the input, hidden, and output layers. In a time series framework, the neural network follows a feed-forward architecture, as the input moves in a forward direction. The general expression of the ANN is

$$y_t = a_0 + \sum_{j=1}^{q} a_j\, g\!\left(b_{0j} + \sum_{i=1}^{p} b_{ij}\, y_{t-i}\right) + \varepsilon_t,$$

where $a_j$ and $b_{ij}$ are the synaptic weights ($a_0$ and $b_{0j}$ being bias terms), $p$ is the number of input nodes (lags), $q$ is the number of hidden nodes, and $g$ is the activation function. The difference between the expected and actual values was treated as the error, and the error function

$$E = \frac{1}{N}\sum_{n=1}^{N}\left(y_n - \hat{y}_n\right)^{2}$$

was minimized during the ANN training stage, where $N$ is the number of error terms. The network parameters were updated by $\Delta w_{ij} = -\eta\, \partial E / \partial w_{ij}$, where $\eta$ is the learning rate of the model. Most commonly, the logistic function is used as the activation function from the first (input) to the hidden layer, and a linear activation function is used from the hidden to the last (output) layer. This provides a more balanced output for problems with continuous target values [24,27].
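The training loop described above can be sketched in plain NumPy. This is an illustration under stated assumptions, not the software used in the study: a synthetic series already scaled to [0, 1] (logistic units saturate on raw prices of INR 30-35/kg), full-batch gradient descent, and the hypothetical helper names `lag_matrix`, `train_ann`, and `predict`. The defaults p = 2 lags and q = 5 hidden nodes mirror the 2:5S:1L architecture reported in the results.

```python
import numpy as np

def lag_matrix(y, p):
    """Rows [y_{t-1}, ..., y_{t-p}] paired with the target y_t."""
    X = np.column_stack([y[p - i:len(y) - i] for i in range(1, p + 1)])
    return X, y[p:]

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_ann(X, target, q=5, eta=0.1, epochs=2000, seed=0):
    """One hidden layer of q logistic units and a linear output unit.

    Weights move by Delta w = -eta * dE/dw with E = (1/N) * sum((y_n - yhat_n)^2),
    using full-batch gradient descent.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    B = rng.normal(scale=0.5, size=(p, q))  # input-to-hidden weights b_ij
    b0 = np.zeros(q)                         # hidden biases b_0j
    a = rng.normal(scale=0.5, size=q)        # hidden-to-output weights a_j
    a0 = 0.0                                 # output bias a_0
    losses = []
    for _ in range(epochs):
        H = logistic(X @ B + b0)             # hidden activations, shape (n, q)
        err = a0 + H @ a - target            # yhat_n - y_n
        losses.append(float(np.mean(err ** 2)))
        g_out = 2.0 * err / n                # dE/dyhat_n
        g_hid = np.outer(g_out, a) * H * (1.0 - H)  # back-propagate through g
        # All gradients above use the current weights; now update every weight.
        a0 -= eta * g_out.sum()
        a = a - eta * (H.T @ g_out)
        B = B - eta * (X.T @ g_hid)
        b0 = b0 - eta * g_hid.sum(axis=0)
    return (B, b0, a, a0), losses

def predict(params, X):
    B, b0, a, a0 = params
    return a0 + logistic(X @ B + b0) @ a

# Synthetic, already-scaled series (not the rice price data).
y = 0.5 + 0.4 * np.sin(0.3 * np.arange(120))
X, target = lag_matrix(y, p=2)
params, losses = train_ann(X, target, q=5)
print(f"initial MSE {losses[0]:.4f}, final MSE {losses[-1]:.4f}")
```

The many thousands of small weight updates in this loop are exactly the tuning cost that the ELM approach avoids.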

ELM Model
The extreme learning machine (ELM) is a single-hidden-layer feed-forward neural network whose training procedure differs from that of a regular neural network: the input-to-hidden weights are assigned randomly, and the hidden-to-output weights are estimated using the generalized inverse of the hidden layer output matrix [28], leading to fast and accurate training in the model building procedure.
The output of ELM is given as

$$o_j = \sum_{i=1}^{h} \beta_i\, g\!\left(w_i \cdot x_j + b_i\right), \qquad j = 1, \ldots, N,$$

where $h$ is the number of hidden neurons, $g$ is the activation function, $w_i$ is the input weight vector connected to the $i$th hidden neuron, $x_j$ is the $j$th input vector, $b_i$ is the bias of the $i$th hidden neuron, $\beta_i$ is the output weight connected to the $i$th hidden neuron, and $N$ is the number of input samples.
The general steps of the ELM algorithm [32] are as follows:
(i) Assign the weights and biases from the input to the hidden layer randomly.
(ii) Compute the hidden layer output matrix $H$, with entries $H_{ji} = g(w_i \cdot x_j + b_i)$.
(iii) Determine the output weights $\beta = H^{\dagger} T$, where $H^{\dagger}$ is the Moore-Penrose generalized inverse of $H$ and $T$ is the vector of targets.
ELM therefore has two phases: in the first phase, ELM initializes the hidden layer, which maps the input data to a feature space; in the second phase, the Moore-Penrose inverse [32] is used to compute the solution for the linear output parameters.
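Steps (i)-(iii) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions: sigmoid activation, input weights drawn uniformly from [-1, 1], toy inputs in place of the rice series, and hypothetical function names. It uses the plain Moore-Penrose solution from step (iii); the study's ELM models used LASSO between the hidden and output layers, which this sketch does not reproduce.

```python
import numpy as np

def hidden_output(X, W, b):
    """Hidden layer output matrix H with H[j, i] = g(w_i . x_j + b_i), g = sigmoid."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

def elm_fit(X, T, h=4, seed=0):
    """Fit an ELM with h hidden neurons following steps (i)-(iii)."""
    rng = np.random.default_rng(seed)
    # (i) random input-to-hidden weights and biases; these are never trained
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], h))
    b = rng.uniform(-1.0, 1.0, size=h)
    # (ii) hidden layer output matrix
    H = hidden_output(X, W, b)
    # (iii) output weights via the Moore-Penrose generalized inverse
    beta = np.linalg.pinv(H) @ T
    return W, b, beta

def elm_predict(model, X):
    W, b, beta = model
    return hidden_output(X, W, b) @ beta

# Toy inputs standing in for six lagged values per row (not the study's data).
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(100, 6))
T = X @ np.array([0.3, 0.2, 0.1, 0.1, 0.1, 0.1]) + 0.01 * rng.normal(size=100)
model = elm_fit(X, T, h=4)
print("training MSE:", np.mean((elm_predict(model, X) - T) ** 2))
```

All fitted quantities come from a single pseudoinverse, with no iterative weight updates, which is the source of ELM's speed advantage over backpropagation training.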

Machine Learning Intervention Models
Conventional ML approaches forecast solely on the basis of the predicted variable's prior values. The machine learning intervention model is a modification of the standard machine learning model that takes additional independent variables as intervention components [33]; it assumes that a variable's future values are determined by its previous values as well as by external influences. In machine learning forecasting models, each observed value is assumed to be an unknown nonlinear function of its past values. For a univariate time series $X_p$, $p = 1, 2, \ldots, n$, with $X_p \in \mathbb{R}$, and a nonlinear function $F$ of $m$ lags,

$$X_p = F\left(X_{p-1}, X_{p-2}, \ldots, X_{p-m}\right) + \varepsilon_p,$$

where $\varepsilon_p$ is a zero-mean error.
Assume that $i$ interventions occurred at time periods $q_1, q_2, \ldots, q_i$; then, depending on the type of unexpected change (intervention), we define $i$ secondary variables $\Phi^{1}_{p}, \Phi^{2}_{p}, \ldots, \Phi^{i}_{p}$. Finally, the model with $m$ lags can be written as

$$X_p = F\left(X_{p-1}, \ldots, X_{p-m}, \Phi^{1}_{p}, \ldots, \Phi^{i}_{p}\right) + \varepsilon_p.$$

In this study, the intervention variable is added to the input layer of two models, the ANN and ELM models, hereafter called the ANN intervention and ELM intervention models. The ELM intervention model works on the same principle as the ANN intervention model; the main difference is that in the ELM model, LASSO is used from the hidden to the output layer (that is, to estimate the output weights). This section thus explains both intervention models, the artificial neural network with intervention and the extreme learning machine with intervention, using the intervention concept.
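The only change the intervention variants make to the network is in its inputs. The sketch below (hypothetical helper `intervention_design`, NumPy assumed) stacks $m$ lagged values and the $i$ intervention dummies $\Phi$ into one input row per time point, so that each target $X_p$ sits beside $X_{p-1}, \ldots, X_{p-m}$ and the dummy values at $p$ itself. The study's ELM intervention model used six lags and one exogenous variable, and its ANN intervention model one lag; any $m$ works here.

```python
import numpy as np

def intervention_design(x, m, dummies):
    """Inputs for X_p = F(X_{p-1}, ..., X_{p-m}, Phi^1_p, ..., Phi^i_p) + eps_p.

    x       : 1-D array, the univariate series X_1..X_n
    m       : number of lags
    dummies : array of shape (n,) or (n, i), one column per intervention variable
    Returns (inputs, target), with rows aligned on the last n - m time points.
    """
    x = np.asarray(x, dtype=float)
    dummies = np.asarray(dummies, dtype=float).reshape(len(x), -1)
    # Column k-1 holds the lag-k value X_{p-k}.
    lags = np.column_stack([x[m - k:len(x) - k] for k in range(1, m + 1)])
    # The dummy enters contemporaneously, as an extra exogenous input column.
    return np.hstack([lags, dummies[m:]]), x[m:]

# Toy series with a step intervention starting at index 5.
x = np.arange(10.0)
step = (np.arange(10) >= 5).astype(float)
inputs, target = intervention_design(x, m=3, dummies=step)
print(inputs.shape)  # → (7, 4)
```

The resulting matrix can be fed to an ANN or ELM in place of the purely lagged inputs; the network itself is unchanged.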
The Brock-Dechert-Scheinkman (BDS) test was used to check for nonlinearity, and the Diebold-Mariano (DM) test was employed to test whether the differences in forecast accuracy among the models considered in this study are significant. Details of these tests are given in [34,35]. Finally, forecasting error was measured with the most widely used criterion, the mean absolute percentage error (MAPE):

$$\text{MAPE} = \frac{100}{n}\sum_{t=1}^{n}\left|\frac{y_t - \hat{y}_t}{y_t}\right|,$$

where $y_t$ and $\hat{y}_t$ are the actual and forecast values and $n$ is the number of observations.
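Both accuracy criteria are short to compute. The standard-library sketch below implements MAPE, $100/n \sum |(y_t - \hat{y}_t)/y_t|$, and a basic Diebold-Mariano statistic. It assumes squared-error loss and one-step-ahead forecasts (so the loss differential needs no autocorrelation correction) and uses the asymptotic normal distribution; the DM variant used in the study is described in [34,35] and may differ. The function names are hypothetical.

```python
import math
from statistics import mean, pvariance

def mape(actual, forecast):
    """Mean absolute percentage error (in percent); actual values must be non-zero."""
    return 100.0 * mean(abs((a - f) / a) for a, f in zip(actual, forecast))

def dm_test(actual, forecast_1, forecast_2):
    """Diebold-Mariano test of equal accuracy for two one-step-ahead forecasts.

    Assumes squared-error loss and horizon h = 1, so the loss differential d_t
    is treated as serially uncorrelated. Returns (statistic, two-sided p-value)
    from the asymptotic standard normal distribution; a negative statistic
    means forecast_1 has the smaller average loss.
    """
    d = [(a - f1) ** 2 - (a - f2) ** 2
         for a, f1, f2 in zip(actual, forecast_1, forecast_2)]
    stat = mean(d) / math.sqrt(pvariance(d) / len(d))
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|stat|))
    return stat, p_value

# Hypothetical values, for illustration only.
print(round(mape([100.0, 200.0], [110.0, 180.0]), 6))  # → 10.0
```

MAPE ranks the models, whereas the DM statistic asks whether the gap between two particular models is larger than sampling noise would explain.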

Results
To predict rice prices under the influence of the nationwide COVID-19 lockdown imposed by the government of India, intervention time series and machine learning models were developed in this study. The period from 1 January to 24 March 2020 was considered the pre-intervention period, and the period from 25 March to 30 June 2020 the post-intervention period. Figure 1 depicts the time series plot of rice prices in India, where the red line marks the date of intervention (25 March), when the government imposed the national lockdown. A slight increase in the price of rice during the lockdown is clearly visible in the post-intervention section of Figure 1. Table 1 displays the descriptive statistics of the rice price data, showing that the price series was fairly symmetrical and leptokurtic, with a coefficient of variation of 1.9%, which indicates low relative variability. The overall average rice price in India during the lockdown was INR 33.79/kg, and the minimum and maximum prices were INR 31.60/kg and INR 35.28/kg, respectively.
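The quantities summarized in a descriptive-statistics table can be computed with the standard library. The sketch below assumes population moments and reports excess kurtosis (positive means leptokurtic); statistical packages differ in small-sample corrections, so values may not match Table 1 exactly. The function name and the example prices are hypothetical.

```python
from statistics import mean, pstdev

def describe(prices):
    """Mean, range, coefficient of variation, skewness, and excess kurtosis.

    Skewness and kurtosis use population moments; excess kurtosis is 0 for a
    normal distribution and positive for a leptokurtic (peaked) one.
    """
    mu, sd = mean(prices), pstdev(prices)
    m3 = mean((x - mu) ** 3 for x in prices)
    m4 = mean((x - mu) ** 4 for x in prices)
    return {
        "mean": mu,
        "min": min(prices),
        "max": max(prices),
        "cv_percent": 100.0 * sd / mu,  # standard deviation as % of the mean
        "skewness": m3 / sd ** 3,
        "excess_kurtosis": m4 / sd ** 4 - 3.0,
    }

# Hypothetical prices (INR/kg), for illustration only.
print(describe([33.1, 33.5, 33.8, 34.0, 34.2]))
```

A coefficient of variation of 1.9% means the standard deviation is less than 2% of the mean, i.e., prices varied only slightly around their average.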
Before time series modeling, the nature of the data must be established; hence, the BDS (Brock-Dechert-Scheinkman) test was applied to the rice prices in India, and the results show that the data under investigation were nonlinear (p < 0.0001; Table 2).

Results of ARIMA Model
The Box-Jenkins autoregressive integrated moving average (ARIMA) model is the most widely used classical time series model in forecasting studies; hence, it was employed in this study. To identify a suitable candidate model, different combinations of autoregressive and moving average orders were tried, and the orders with the lowest AIC and BIC values were considered best for the data under consideration. For the rice price series, the best combination of autoregressive order, moving average order, and degree of differencing (needed to make the series stationary) was ARIMA (1,1,1). The model parameters were estimated by maximum likelihood, as shown in Table 3. Once the model is estimated, the next step is diagnostic checking. For this, the Box-Pierce test for residual autocorrelation was used; the residuals are not correlated, as the p-value is 0.59.
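The residual check can be sketched with the standard library alone. Below, $Q = n \sum_{k=1}^{m} r_k^2$, where $r_k$ is the lag-$k$ residual autocorrelation, and the p-value uses the closed-form chi-square tail that exists when the degrees of freedom are even. That restriction, the function name, and the choice of $m$ are assumptions of this sketch (the text does not state how many lags were used). For an ARIMA(1,1,1) fit, passing `fitted_params=2` subtracts the estimated ϕ and θ from the degrees of freedom.

```python
import math

def box_pierce(residuals, m, fitted_params=0):
    """Box-Pierce portmanteau test: Q = n * sum_{k=1..m} r_k^2.

    Under the null of no autocorrelation, Q is approximately chi-square with
    m - fitted_params degrees of freedom. This sketch only evaluates the
    p-value for an even number of degrees of freedom, where the chi-square
    tail has the closed form exp(-Q/2) * sum_{i < df/2} (Q/2)^i / i!.
    """
    n = len(residuals)
    mu = sum(residuals) / n
    e = [x - mu for x in residuals]
    denom = sum(v * v for v in e)
    r = [sum(e[t] * e[t + k] for t in range(n - k)) / denom for k in range(1, m + 1)]
    q_stat = n * sum(rk * rk for rk in r)
    df = m - fitted_params
    if df <= 0 or df % 2:
        raise ValueError("this sketch needs a positive, even number of degrees of freedom")
    half = q_stat / 2.0
    p_value = math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))
    return q_stat, p_value

# A strongly alternating toy sequence (not model residuals from the study).
q_stat, p_value = box_pierce([1.0, -1.0, 1.0, -1.0], m=2)
print(q_stat)  # → 3.25
```

A large p-value, such as the 0.59 reported for ARIMA (1,1,1), means the test finds no evidence of remaining autocorrelation in the residuals.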

Results of ARIMA Intervention Model
The ARIMA intervention model was fitted to the rice price series; the model building steps are similar to those of the ARIMA model, except that the impact parameter is estimated by incorporating a dummy variable (0: no impact, 1: impact) in the model. In this study, the ARIMA intervention model (2,0,1) was determined to be appropriate for rice prices. Table 3 shows the parameters obtained by maximum likelihood estimation. The intervention parameter (Impact (ω)) for prices is estimated to be 0.92 (p = 0.060, i.e., significant at the 10% but not at the 5% level). This suggests that the lockdown had a positive impact on rice prices, implying that the price increased by INR 0.92/kg during the lockdown period. The diagnostic checking of residuals indicates that they are white noise, as the Box-Pierce test gives a non-significant p-value (p = 0.963). Therefore, the fitted model is adequate for the data under consideration. Ray et al. [18] obtained similar results in a study on cotton yield prediction.

Results of the ANN Model and the ANN Intervention Model
The model with two tapped delays (lags) and five hidden nodes (2:5S:1L), with a sigmoidal (S) activation function from the input to the hidden nodes and a linear (L) activation function from the hidden to the output layer, was chosen as the appropriate ANN model on the basis of its low MAPE values. Following model fitting, the Box-Pierce test was used to diagnose the residuals, which were not autocorrelated and can be considered random (p = 0.47). Similarly, the model chosen for the ANN intervention has one tapped delay, five hidden nodes, and one intervention variable, which was treated as an exogenous variable. The sigmoidal and linear activation functions were used for the input-hidden and hidden-output layers, respectively, in both the ANN and ANN intervention models. The same Box-Pierce test showed that the residuals of the ANN intervention model are also non-autocorrelated and random, as the probability values are non-significant (Table 4).

Results of the ELM Model and the ELM Intervention Model
The appropriate model orders for both the ELM and the ELM intervention models were chosen based on the lowest MAPE values. For this dataset, the ELM model with six tapped delays (lags) and four hidden nodes was chosen as the optimal model (Table 5). Following model fitting, the Box-Pierce test showed that the residuals were not autocorrelated and can be considered random (p = 0.84). Similarly, for the ELM intervention model, a combination of six input lags, four hidden nodes, and one exogenous variable was chosen as the appropriate model order. A sigmoidal activation function was used from the input to the hidden layer, and LASSO was used from the hidden to the output layer. The Box-Pierce test indicated that the residuals of this model are also non-autocorrelated and random; the results are presented in Table 5.

Discussion
The estimated ARIMA intervention parameter indicates that the COVID-19 lockdown had a positive impact on rice prices in India: rice prices increased by INR 0.92/kg during the lockdown period. Similar results were found in other time series intervention impact analyses [36,37]. The forecasting performance of all models examined in this study was assessed using the MAPE values of both the training and testing sets.
Comparing the performance results in Table 6, the ELM intervention model outperformed all other models. The machine learning models performed better than the standard time series models (ARIMA and the ARIMA intervention model) in both the training and testing sets. The performance ranking of the models considered in this study is: first, the ELM intervention model, followed by ELM, then the ANN models, and finally, the ARIMA models. The superiority of the ELM intervention model may be attributed to its ability to replicate the complex, nonlinear structure of the data while also accounting for the effect of an external intervention variable. Additionally, whereas the ANN required more time to tune the model parameters and train the network, the ELM trained much faster, making it particularly useful for describing the COVID-19 pandemic's effects on rice prices in India. Table 7 shows the predicted values of all models, revealing that the ARIMA, ANN, ARIMA intervention, and ANN intervention models yield nearly the same values. Thus, these four models did not generalize as well as the ELM intervention model, whose forecasts differed from theirs.
Several studies [38][39][40][41][42][43][44] that forecast time series data in agriculture and related fields found that AI models performed better than the standard ARIMA model, in accordance with the present findings. While the MAPE values rank the models, the DM test assesses whether the difference in forecast accuracy between two models is statistically significant; it has similarly been used for pairwise comparisons between models [45][46][47][48]. The results of the DM test revealed that in both the training and testing sets, the ELM intervention model performed better than all other models (Table 8).

Conclusions
The purpose of this study was to assess the impact of the nationwide lockdown imposed by the government of India due to COVID-19 on rice prices from January to June 2020, and subsequently to forecast future prices. Statistical and machine learning models were developed using data on the prices of rice, which is consumed by a majority of the population in Asia, especially in India. The results obtained with the ARIMA intervention model revealed that rice prices in India increased during the sudden lockdown imposed by the government to contain the spread of COVID-19. In agricultural impact studies, popular traditional models such as the ARIMA and ARIMA intervention models are usually employed to model and forecast time series data and to understand the effect of policies, unexpected aberrations, or unforeseen circumstances. However, ARIMA and its variants were not efficient for the dataset under consideration, as they cannot capture the nonlinearity present in the time series data. To overcome this problem, machine learning models, namely the ANN and ELM intervention models, were employed to capture the nonlinearity in the data, with the intervention variable included as an exogenous variable in the first (input) layer. The results indicated that for modeling and forecasting these price data, the ELM intervention model was the best choice, owing to its ability to capture nonlinearity in the time series. Since such data are often nonlinear, employing a linear time series model can result in faulty modeling and misleading forecasts, and policies formulated on the basis of these forecasts may fail and lead to poor planning.
It is advisable to use time series and machine learning models suited to the nature of the data under consideration. When employing statistical and machine learning models, care must also be taken to develop models appropriate to the dataset, because an unsuitable choice of model can lead to different conclusions.
The results of this study show that the ELM intervention model produces results faster and more accurately than the other tested models for predicting agri-food prices of this kind, as it recognizes the nonlinear pattern in the time series data along with the intervention variable, treated as an exogenous variable. Thus, this model can be a valuable tool for planning and designing more sustainable interventions in the food market during (inter)national crisis situations.