Agricultural Price Prediction Based on Combined Forecasting Model under Spatial-Temporal Influencing Factors

Abstract: Grain product price fluctuations affect the input of production factors and impact national food security. Under the influence of complex factors, such as spatial-temporal influencing factors, price correlation, and market diversity, it is increasingly important to improve the accuracy of grain product price prediction for sustainable agricultural development. Successful prediction of agricultural product prices therefore plays a vital role in the government's market regulation and the stability of national food security. In this paper, the price of corn in Sichuan Province is taken as an example. Firstly, the apriori algorithm was used to search for the spatial-temporal influencing factors of price changes. Secondly, the attention mechanism, Long Short-term Memory (LSTM), Autoregressive Integrated Moving Average (ARIMA), and Back Propagation (BP) neural network models were combined into the AttLSTM-ARIMA-BP model to predict the price accurately. Compared with the other seven models, the AttLSTM-ARIMA-BP model achieves the best prediction effect and the strongest robustness, which improves the accuracy of price forecasting in complex environments and makes application to other fields possible.


Introduction
China is the second largest corn producer in the world, and fluctuations in its corn prices have attracted much attention. Price stability plays an important role in promoting sustainable social development. If the price fluctuates too sharply, it will affect not only the cost expenditure of producers and consumers, but also the government's policy regulation [1], and even national food security. Corn prices are highly complex and nonlinear, and are vulnerable to economic globalization, climate change, epidemics, and many other factors [2]. According to data from China's agricultural big data website, large fluctuations in corn prices have occurred frequently over the past two decades. From 2003 to 2020, the price of corn in Sichuan Province rose from USD 167.16 per metric ton to USD 320.9 per metric ton. At the beginning of 2021, the price of corn in Sichuan Province rose from USD 426.87 per metric ton to USD 09 per metric ton in a week, an increase of 12.94%. Such short-term sharp price fluctuations make it increasingly challenging to predict the price of corn accurately [3], and have a negative impact on sustainable agricultural development.
The change process of agricultural product prices is complex, including jump changes and steady changes [4]. Practice has shown that the combination model has become one of the most effective prediction methods because of its better robustness and accuracy. For example, in 2017, Zheng et al. [5] used empirical mode decomposition and a Long Short-term Memory (EMD-LSTM) neural network for short-term load forecasting of a power system. In 2018, Niu et al. [6] established a variational mode decomposition and autoregressive
The rest of this paper is arranged as follows: Section 2 introduces the development of the relevant models; Section 3 sets forth the data source, data collation, model method, and the specific parameters in each model; Section 4 elaborates and discusses the experimental results of each model; finally, Section 5 presents the conclusions and prospects for the future.

Literature Review
This section introduces the development and application of LSTM, BP, ARIMA models, and the association rule mining algorithm, respectively.

Long Short-Term Memory
Since deep learning [8] was proposed by Geoffrey Hinton in 2006, it has been used in a wide range of research owing to its lower cost, better robustness, and other advantages, including but not limited to image recognition [9], traffic flow prediction [10], stock market prediction [11], and machine translation [12]. By adding the input gate, the forget gate, and the output gate, LSTM can keep a long-term memory in time series [13]. In 2015, Chen et al. [14] used the LSTM model to predict the return rate of Chinese stocks. In 2021, Yan et al. [15] found that multi-hour prediction based on the LSTM model was the optimal prediction model in research on predicting a multi-station air quality index in Beijing. However, the LSTM model does not distinguish the importance of the information at each time node, and the introduction of an attention mechanism can better solve this problem [16]. In 2019, Chen et al. [17] used the LSTM model based on the attention mechanism to predict and discuss the trend of the Hong Kong stock market. The results show that after adding the attention mechanism, the positive and negative motion accuracy improved by 3.06% and 0.26%, respectively, which indicates that the LSTM model based on the attention mechanism performs better than the original model. In 2021, Zheng et al. [18] added the attention mechanism to short-term traffic flow prediction to calculate different weights and distinguish the importance of different time flow sequences, which effectively improved the prediction performance.

Back Propagation Neural Network
BP is a nonlinear multilayer feedforward neural network [19]. Because of its simple structure, outstanding nonlinear mapping ability, and low computational complexity, this model is widely used in image processing, system simulation, optimal prediction, and other fields [20]. The theory and application of BP are relatively mature nowadays. In 2018, Li et al. [21] proposed an air pollution prediction model based on the BP neural network (BPNN). In 2019, Hua et al. [22] proposed a short-term power prediction method for photovoltaic power plants based on the LSTM-BP model, and the experimental results show that the improved method has less prediction error. In 2022, Liu et al. [23] greatly improved the timeliness of prediction results by using BPNN to analyze internet financial risks.

Autoregressive Integrated Moving Average
ARIMA, a famous and mature linear statistical model proposed by Box and Jenkins in the 1970s, can predict future values by analyzing the relationship between past values of a time series [24]. This model, which is developed from the initial autoregressive (AR) model and the moving average (MA) model, has been successfully applied in many fields, such as oil production prediction, energy consumption prediction, and precipitation prediction [25]. In 2016, Sen et al. [26] found that ARIMA was the best prediction model in an experiment on predicting greenhouse gas emissions. In 2018, Oliveira et al. [27] used bootstrap aggregation and the ARIMA model to predict mid- and long-term power consumption. In 2019, Aasim et al. [28] proposed a Repeated Wavelet Transform based on the ARIMA model for short-term wind speed prediction. In 2020, Lai et al. [29] used a statistical prediction model based on ARIMA to obtain predictions of recent local temperature and precipitation. In 2021, Fan et al. [30] used the ARIMA model and the LSTM model to predict oil well production. These studies show that ARIMA stands out for its relative stability in prediction.

The Association Rule Mining Algorithm
The association rule mining algorithm was first proposed by Agrawal et al. [31] in 1993. The downward closure property is at the core of the apriori algorithm [32]. To solve the problem of the apriori algorithm generating a large number of candidate sets, Han et al. [33] introduced Frequent Pattern Tree Growth (the FP-Tree Growth algorithm) in 2000. In 2001, H-mine, proposed by Pei et al. [34], explored the hyper-structure mining of frequent patterns to build alternative trees. To solve the problem of generating unnecessary candidate spanning trees during dynamic association rule mining, Xu et al. [35] proposed the IFP-Growth algorithm in 2002. In 2003, Liu et al. [36] proposed the condensed frequent pattern tree to elaborate on the principle of top-down and bottom-up traversal modes. Given that the importance varies across different transactions in actual data, Shao et al. [37] proposed a mining algorithm prediction model based on correlation weighting in 2020. Wu et al. [38] proposed a frequent fuzzy item set mining method, which significantly increased the efficiency of the traditional apriori algorithm.

Data Sources
Including futures prices in the prediction model is conducive to improving the accuracy of cash price prediction [39], and current prices are vulnerable to different temporal and spatial factors. Therefore, in the study of corn prices in Sichuan Province, we use both futures prices and spot prices in the model. The experimental data are collected from China's agricultural big data website, Dalian Commodity Exchange, and Zhengzhou Commodity Exchange. Various factors have hit the economy heavily [40] and caused the prices of agricultural products to soar. To study in depth the situation in which prices fluctuate violently, all the daily average price data were selected from March 2011 to April 2021. From China's agricultural big data website, we collected the corn price data of some provinces in China, and the national average price data of corn, early rice, and middle-late rice. The soybean futures price data are collected from Dalian Commodity Exchange. Moreover, the futures price data of common wheat and high-quality strong gluten wheat are collected from Zhengzhou Commodity Exchange.

Data Cleaning
Due to the influence of holidays, some daily price data were missing. Moreover, some daily price data remained unchanged for several consecutive days, which harmed the experiment. To solve these two problems, we converted the daily average price data into weekly average price data. The data were numbered and sorted in time order, and finally 511 observations were obtained. The data of agricultural product prices and futures prices are displayed as half violin plots in Figure 2. Among them, China is the corn price in China; GD is the corn price in Guangdong; JL is the corn price in Jilin; LN is the corn price in Liaoning; JS is the corn price in Jiangsu; SD is the corn price in Shandong; ER is the early rice price; MLR is the middle-late rice price; WH is the high-quality strong gluten wheat futures price; PM is the common wheat futures price; Soybean is the soybean futures price; and SC is the corn price in Sichuan.

Data Processing
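Before the experiment, the values of the sample data need to be normalized. The normalization formula is as follows:

$$x' = \frac{x - \min}{\max - \min}$$

where $x$ is a raw data value, $x'$ is the normalized value, max is the maximum value of the data, and min is the minimum value of the data.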
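As an illustration of the min-max normalization described here, the following is a minimal sketch rather than the authors' code. The function names and the sample prices are made up for the example, and mapping a constant series to zero is an assumed convention.

```python
def min_max_normalize(values):
    """Scale a list of prices to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumed convention: a constant series has no spread, so map it to 0.0.
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]


def min_max_denormalize(scaled, lo, hi):
    """Map normalized values back to the original price scale."""
    return [s * (hi - lo) + lo for s in scaled]


# Made-up weekly prices, only for illustration.
prices = [320.0, 335.5, 310.2, 348.9]
scaled = min_max_normalize(prices)
restored = min_max_denormalize(scaled, min(prices), max(prices))
```

Keeping the minimum and maximum of the training data makes it possible to map predicted values back to price units before computing errors.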
To understand whether past prices impact the current price, we lag all price data by up to 12 weeks, that is, three months, and record each lag in turn. Different agricultural products have different degrees of price change, so we set different coefficients to judge the price change. We calculate the difference between the price of the current week and the price of each of the previous 1 to 12 weeks, and compare the difference with the set coefficient to determine whether the price is rising, falling, or remaining unchanged (specific parameters are shown in Table 1). Then, according to these price changes, the apriori algorithm is used to find 12 spatial-temporal factors affecting the change of corn price in Sichuan Province, and there are 499 observations for each spatial-temporal factor. In the apriori algorithm, the support is generally expressed as $P(A \cup B)$, which represents the probability that item sets A and B occur simultaneously in transaction T. The specific formula is:

$$\mathrm{Support}(A \Rightarrow B) = P(A \cup B)$$

The confidence is generally expressed as $P(B \mid A)$, which represents the percentage of transactions containing A that also contain B, that is, the conditional probability that B occurs given that A occurs. The specific formula is:

$$\mathrm{Confidence}(A \Rightarrow B) = P(B \mid A) = \frac{P(A \cup B)}{P(A)}$$

Using a train-test split, the data obtained by the apriori algorithm are divided into training and test sets in a ratio of eight to two. The prediction results are obtained by putting the data into the forecast model for training.
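The labeling, support, and confidence steps described above can be sketched as follows. This is a hedged illustration, not the authors' implementation: the item naming scheme, the single `threshold` argument (standing in for the per-product coefficients of Table 1), the toy transactions, and the chronological ordering of the 80/20 split are all assumptions.

```python
def label_change(current, previous, threshold):
    """Label the move from `previous` to `current` as 'up', 'down', or 'flat'."""
    diff = current - previous
    if diff > threshold:
        return "up"
    if diff < -threshold:
        return "down"
    return "flat"


def lagged_transactions(series, name, max_lag=12, threshold=0.0):
    """Build one item set per week that has `max_lag` weeks of history."""
    transactions = []
    for t in range(max_lag, len(series)):
        items = {
            f"{name}_lag{k}_{label_change(series[t], series[t - k], threshold)}"
            for k in range(1, max_lag + 1)
        }
        transactions.append(items)
    return transactions


def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """P(consequent | antecedent); assumes the antecedent occurs at least once."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


def chronological_split(rows, train_fraction=0.8):
    """Assumed split: the earliest 80% of rows train, the rest test."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


# Toy transactions with made-up items (GD = Guangdong, SC = Sichuan).
toy = [
    {"GD_up", "SC_up"},
    {"GD_up", "SC_up"},
    {"GD_up", "SC_flat"},
    {"GD_down", "SC_down"},
]
rule_support = support(toy, {"GD_up", "SC_up"})
rule_confidence = confidence(toy, {"GD_up"}, {"SC_up"})
```

With 511 weekly observations and 12 lags, `lagged_transactions` yields 499 rows, which matches the number of observations reported for each factor.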

Performance Index
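In the study of corn prices in Sichuan Province, we use root mean square error (RMSE), mean absolute percentage error (MAPE), and mean absolute error (MAE) to calculate the prediction errors of each model. The fitting degree between the predicted value and the actual value is judged by calculating the coefficient of determination $R^2$. The calculation formulas of RMSE, MAPE, MAE, and $R^2$ are as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{t=1}^{N}\left(y_t - \hat{y}_t\right)^2}$$

$$\mathrm{MAPE} = \frac{100\%}{N}\sum_{t=1}^{N}\left|\frac{y_t - \hat{y}_t}{y_t}\right|$$

$$\mathrm{MAE} = \frac{1}{N}\sum_{t=1}^{N}\left|y_t - \hat{y}_t\right|$$

$$R^2 = 1 - \frac{\sum_{t=1}^{N}\left(y_t - \hat{y}_t\right)^2}{\sum_{t=1}^{N}\left(y_t - \bar{y}\right)^2}$$

where $y_t$ represents the real value; $\hat{y}_t$ represents the predicted value; $\bar{y}$ represents the average of the real values; and $N$ represents the total number of data.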
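The four performance indices can be computed with a few lines of plain Python. The sketch below uses the usual definitions of RMSE, MAPE (in percent), MAE, and the coefficient of determination; the sample values are made up, and nonzero real values are assumed for MAPE.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent; assumes no real value is zero."""
    return 100.0 * sum(abs((a - b) / a) for a, b in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up real and predicted values, only for illustration.
y_true = [100.0, 200.0, 300.0, 400.0]
y_pred = [110.0, 190.0, 310.0, 390.0]
```

Lower RMSE, MAPE, and MAE indicate smaller errors, while an $R^2$ closer to 1 indicates a better fit.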

Linear Regression Model
The linear regression (LR) model is one of the most widely used models in the field of prediction. Given observations of the dependent variable $Y$ and of the $p$ independent variables (predictors) $X_1, X_2, \cdots, X_p$, the model constructs a linear relationship between them. The specific formula is [41]:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \varepsilon$$

where $\beta_0, \beta_1, \cdots, \beta_p$ represent the regression coefficients of the independent variables, and $\varepsilon$ represents the random disturbance or error, which is assumed to follow a normal distribution with a mean of zero and a constant variance.

Random Forest Model
The Random Forest (RF) model is the representative of the Bagging algorithm in ensemble learning [42]. In RF, the correlation between trees is reduced by randomization in two directions, samples and features [43]. The random forest first selects N samples randomly from the whole sample set, and then selects K features randomly from the feature set. Within the selected samples, an optimal partition feature is chosen to build a decision tree. These two steps are repeated M times, that is, M decision trees are generated to form a random forest. Therefore, the Random Forest model is a prediction model composed of multiple decision trees, which integrates several weak learners into a strong learner.
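The two randomization steps can be sketched without fitting any trees. The code below only draws the row and feature subsets for each of the M trees and averages per-tree predictions, as in regression bagging. It is a minimal sketch: sampling rows with replacement (bootstrap) is an assumption consistent with Bagging, and the function names are hypothetical.

```python
import random


def plan_random_forest(n_rows, n_features, n_samples, k_features, m_trees, seed=0):
    """Return, for each tree, the sampled row indices and the chosen feature indices."""
    rng = random.Random(seed)
    plans = []
    for _ in range(m_trees):
        # Assumption: rows are drawn with replacement (bootstrap sampling).
        rows = [rng.randrange(n_rows) for _ in range(n_samples)]
        # Features are drawn without replacement, K out of n_features.
        features = sorted(rng.sample(range(n_features), k_features))
        plans.append((rows, features))
    return plans


def average_prediction(tree_predictions):
    """Combine the outputs of the individual trees by averaging."""
    return sum(tree_predictions) / len(tree_predictions)


plans = plan_random_forest(n_rows=399, n_features=12, n_samples=399, k_features=4, m_trees=10)
```

Each `(rows, features)` pair would then be handed to a decision tree learner; in practice a library implementation is normally used instead.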

Extreme Gradient Boosting Model
The XGBoost model is a supervised machine learning algorithm based on pre-sorting. Its idea is to sort all the feature data by numerical value, search for the best split point, and divide the data into left and right nodes so as to reduce the forecast error and ultimately improve accuracy [44]. The XGBoost model is a representative of the Boosting algorithm in ensemble learning [45], and it can accurately capture the nonlinear characteristics between various predictive variables.

Light Gradient Boosting Machine Model
The LightGBM model is a relatively novel and efficient gradient boosting decision tree algorithm, which can solve the problem of excessive space consumption in the XGBoost model [46]. By using the Gradient-based One-Side Sampling (GOSS) technique to discard data with small gradients, the LightGBM model can save time and space. Moreover, by using the Exclusive Feature Bundling (EFB) technique to bundle mutually exclusive features into one, this model can reduce dimensionality [47]. Therefore, compared with the traditional Gradient Boosting Decision Tree (GBDT) model, the LightGBM model trains faster and processes massive data rapidly [48].

LSTM Model
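LSTM is a neural network model that processes time series. It can predict not only univariate time series, but also multivariate time series data [49]. The model structure is shown in Figure 3. The "gate" is the key technology of the LSTM model. By controlling the input, storage, and output of the data, LSTM can deal with problems such as long-term dependence and gradient explosion [50]. The detailed formulas are given by the following equations:

$$f_t = \sigma\left(W_f\left[h_{t-1}, x_t\right] + b_f\right)$$

$$i_t = \sigma\left(W_i\left[h_{t-1}, x_t\right] + b_i\right)$$

$$\tilde{c}_t = \tanh\left(W_c\left[h_{t-1}, x_t\right] + b_c\right)$$

$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$

$$o_t = \sigma\left(W_o\left[h_{t-1}, x_t\right] + b_o\right)$$

$$h_t = o_t \odot \tanh\left(c_t\right)$$

where $f_t$, $i_t$, and $o_t$ are the forget gate, the input gate, and the output gate; $\tilde{c}_t$ is the candidate cell state; $c_t$ is the cell state; $h_t$ is the hidden state; $x_t$ is the input at time $t$; $W$ and $b$ are the weight matrices and bias vectors of the corresponding gates; $\sigma$ is the sigmoid function; and $\odot$ denotes element-wise multiplication.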
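To make the role of the gates concrete, the sketch below performs a single LSTM time step with the standard input, forget, and output gates. For readability it assumes a scalar input and a scalar hidden state, and the parameter layout (`params[gate] = (w_x, w_h, b)`) is hypothetical. A real model would use a deep learning library with vector-valued states.

```python
import math


def sigmoid(z):
    """Logistic function used by the three gates."""
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step for scalar input and scalar hidden state.

    params maps each of 'f', 'i', 'o', 'c' to a tuple (w_x, w_h, b).
    """
    def pre_activation(gate):
        w_x, w_h, b = params[gate]
        return w_x * x_t + w_h * h_prev + b

    f_t = sigmoid(pre_activation("f"))        # forget gate: how much old memory to keep
    i_t = sigmoid(pre_activation("i"))        # input gate: how much new candidate to store
    o_t = sigmoid(pre_activation("o"))        # output gate: how much memory to expose
    c_tilde = math.tanh(pre_activation("c"))  # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde        # updated cell state
    h_t = o_t * math.tanh(c_t)                # updated hidden state
    return h_t, c_t


# With all-zero parameters every gate equals 0.5 and the candidate is 0.
zero_params = {gate: (0.0, 0.0, 0.0) for gate in ("f", "i", "o", "c")}
h1, c1 = lstm_step(x_t=0.3, h_prev=0.0, c_prev=2.0, params=zero_params)
```

Running the step repeatedly over a window of past prices, carrying `h_t` and `c_t` forward, is how the cell accumulates information across time.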

AttLSTM Model
The AttLSTM model combines the LSTM model and the attention mechanism [51], and the model structure is shown in Figure 4. From Figure 4, we can see that the AttLSTM model has four parts, namely the input layer, the LSTM layer, the attention layer, and the output layer. In decoding, the attention mechanism uses a scoring function to calculate the weights of different inputs, which changes the traditional encode-decode structure [52]. The input vectors of the input layer enter the LSTM units in the LSTM layer and become the vectors $h_1, h_2, \cdots, h_t$. After a nonlinear transformation, the vectors $u_1, u_2, \cdots, u_t$ are generated. The specific formula is as follows:

$$u_t = \tanh\left(W_t h_t + b_t\right)$$

where $W_t$ is the weight coefficient and $b_t$ is the offset vector. After the vectors enter the attention layer from the LSTM layer, normalization and a weighted sum generate the feature representation $V$. The specific formulas are as follows:

$$\alpha_t = \frac{\exp\left(u_t^{\top} u_s\right)}{\sum_{t} \exp\left(u_t^{\top} u_s\right)}$$

$$V = \sum_{t} \alpha_t h_t$$

where $u_s$ represents the randomly initialized attention matrix. Finally, the data vectors are transferred to the output layer for training, and the results are obtained.
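The attention layer can be illustrated with a small pure-Python sketch: each hidden state is transformed with tanh, scored against a context vector, normalized with a softmax, and used to form the weighted sum V. The helper names are hypothetical, and the weights below are hand-picked for the example rather than learned.

```python
import math


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_pool(hidden_states, W, b, u_s):
    """Return the attention weights alpha_t and the pooled representation V."""
    scores = []
    for h in hidden_states:
        u = [math.tanh(z + bias) for z, bias in zip(matvec(W, h), b)]  # u_t = tanh(W h_t + b)
        scores.append(dot(u, u_s))                                     # score of u_t against u_s
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract the max for numerical stability
    total = sum(exps)
    alphas = [e / total for e in exps]          # softmax normalization
    dim = len(hidden_states[0])
    V = [sum(a * h[j] for a, h in zip(alphas, hidden_states)) for j in range(dim)]
    return alphas, V


# Hand-picked toy values, only for illustration.
hidden = [[1.0, 0.0], [0.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
alphas, V = attention_pool(hidden, identity, [0.0, 0.0], [1.0, 0.0])
uniform_alphas, uniform_V = attention_pool(hidden, identity, [0.0, 0.0], [0.0, 0.0])
```

In the first call the context vector favors the first hidden state, so it receives the larger weight; with a zero context vector all time steps are weighted equally.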

AttLSTM-ARIMA-BP Combination Model
In this paper, we propose a combined model of AttLSTM-ARIMA-BP to study the corn price data in Sichuan Province.
ARIMA is a simple prediction model that, without the help of other variables, can predict a single column of data by using the correlation within a time series [53]. The specific formula is as follows:

$$\left(1 - \sum_{i=1}^{p} \phi_i L^i\right)\left(1 - L\right)^d y_t = \left(1 + \sum_{i=1}^{q} \theta_i L^i\right)\varepsilon_t$$

where $L$ is the lag operator; $p$ is the autoregressive order; $d$ is the number of differences; $q$ is the moving average order; $\phi_i$ and $\theta_i$ are the autoregressive and moving average coefficients; and $\varepsilon_t$ is the white noise. However, ARIMA can only capture the linear relationship in the corn price data. The farther the target is from the historical data, the worse the prediction is [54]. Therefore, we consider using a combination model to solve this problem.
In the traditional combination algorithm based on the ARIMA model, the residual data obtained from ARIMA are trained by LSTM, and the relationship between the two results is regarded as linear. The linear addition method adds the results trained by LSTM to the prediction results obtained by ARIMA to produce the final results [55]. The specific formulas are as follows:

$$y_t = m_t + n_t$$

$$r_t = y_t - \hat{m}_t$$

$$\hat{y}_t = \hat{m}_t + \hat{r}_t$$

where $y_t$ is the raw data; $m_t$ is the linear part of the raw data; $n_t$ is the nonlinear part of the raw data; $r_t$ is the residual data obtained by ARIMA; $\hat{m}_t$ is the predicted data obtained by ARIMA training on $m_t$; $\hat{r}_t$ is the training result obtained by LSTM training on $r_t$; and $\hat{y}_t$ is the final prediction result. Sometimes, mere linear addition cannot improve the accuracy of the final prediction, because the relationship between the prediction results of LSTM and ARIMA may be nonlinear [56]. Therefore, we consider using the BP model to train on the prediction results of LSTM and ARIMA, so that the final prediction $\hat{y}_t$ is a nonlinear function of $\hat{m}_t$ and $\hat{r}_t$ learned by the BP network. The core of the BP neural network model is an error analysis of the training results against the expected effect, and the expected results are finally obtained by modifying the weights and thresholds [57]. In each neuron, the output $y_i$ is obtained by applying the activation function $f$ to the weighted sum of the input signals $x_i$, where $w_i$ is the synaptic weight. The model structure is shown in Figure 5.
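The difference between linear addition and a BP-style nonlinear combination can be sketched as follows. This is an illustrative forward pass only, under stated assumptions: the bias term stands in for the threshold, the single hidden layer and the tanh activation are choices made for the example, and no training (error back propagation) is shown.

```python
import math


def arima_residuals(y, m_hat):
    """r_t = y_t - m_hat_t: what the linear model leaves unexplained."""
    return [yt - mt for yt, mt in zip(y, m_hat)]


def additive_forecast(m_hat, r_hat):
    """Traditional hybrid: y_hat_t = m_hat_t + r_hat_t."""
    return [mt + rt for mt, rt in zip(m_hat, r_hat)]


def neuron(inputs, weights, bias=0.0, activation=math.tanh):
    """Output of one neuron: activation of the weighted input sum plus a bias."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


def bp_forward(inputs, hidden_layer, output_weights, output_bias=0.0):
    """Forward pass of a one-hidden-layer network with a linear output neuron.

    hidden_layer is a list of (weights, bias) pairs, one per hidden neuron.
    """
    hidden = [neuron(inputs, w, b) for w, b in hidden_layer]
    return neuron(hidden, output_weights, output_bias, activation=lambda z: z)


# Made-up values, only for illustration.
y = [10.0, 12.0]
m_hat = [9.5, 12.5]
r = arima_residuals(y, m_hat)
linear = additive_forecast(m_hat, r)
nonlinear = bp_forward([m_hat[0], r[0]], [([0.1, 0.2], 0.0), ([0.3, -0.1], 0.0)], [1.0, 1.0])
```

When the residual forecast is exact, linear addition recovers the raw series; the network instead learns how to weight and bend the two inputs, which can help when their relationship is not additive.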
Sustainability 2022, 14, x FOR PEER REVIEW
Since the final prediction effect of the model will be affected by other factors, we select several groups of data with the highest weight as the input of the model through the attention mechanism. The BP model is trained on the predicted value $\hat{m}_t$ obtained by ARIMA, the residual value $\hat{r}_t$ obtained by LSTM, and the groups of data with the highest weight $a_t$ selected by the attention mechanism, and the predicted results are finally obtained. That is, the final prediction $\hat{y}_t$ is the output of the BP network with $\hat{m}_t$, $\hat{r}_t$, and $a_t$ as inputs.
The specific implementation process of the model is as follows:
Step 1. Input the raw data of corn price into the ARIMA model to obtain the predicted value $\hat{m}_t$ and the residual value $r_t$.
Step 2. Use the attention mechanism to calculate the weights of the factors influencing the corn price in Sichuan Province, and select the data of the three influencing factors with the highest weights, $a_t$, as the input of the subsequent model.
Step 3. Train the residual data $r_t$ by LSTM and obtain the training value $\hat{r}_t$.
Step 4. Obtain the final predicted value by inputting $\hat{m}_t$, $\hat{r}_t$, and $a_t$ into the BP model.
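Steps 2 and 4 amount to picking the three factors with the largest attention weights and stacking them next to the ARIMA and LSTM outputs as inputs to the BP model. The sketch below shows only that bookkeeping; the factor names, weights, and series values are made up, and the ARIMA, LSTM, and BP models themselves are not implemented here.

```python
def top_k_factors(weights, k=3):
    """Return the names of the k factors with the largest attention weights."""
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def build_bp_inputs(m_hat, r_hat, factors, selected):
    """One input row per week: [m_hat_t, r_hat_t, a_t for each selected factor]."""
    rows = []
    for t in range(len(m_hat)):
        rows.append([m_hat[t], r_hat[t]] + [factors[name][t] for name in selected])
    return rows


# Made-up attention weights and series, only for illustration.
attention_weights = {"GD_lag12": 0.30, "GD_lag11": 0.25, "JL_lag2": 0.20, "SD_lag1": 0.10}
factors = {
    "GD_lag12": [0.41, 0.43],
    "GD_lag11": [0.42, 0.44],
    "JL_lag2": [0.39, 0.40],
    "SD_lag1": [0.38, 0.37],
}
m_hat = [0.50, 0.52]   # ARIMA predictions (Step 1)
r_hat = [0.01, -0.02]  # LSTM residual predictions (Step 3)

selected = top_k_factors(attention_weights, k=3)
bp_inputs = build_bp_inputs(m_hat, r_hat, factors, selected)
```

Each row of `bp_inputs` has five entries, which would form the input layer of the BP network in Step 4.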

Experimental Design
In the research on corn price forecasting in Sichuan Province, we used the LR and ARIMA models from statistics, and the RF, XGBoost, LightGBM, LSTM, AttLSTM, and BP models from machine learning. Among them, the ARIMA model was completed in Eviews, and the LR, RF, XGBoost, and LightGBM models were implemented with the Scikit-learn package. We used Keras in Python to run the LSTM and AttLSTM models. In addition, in this environment, we combined the AttLSTM, ARIMA, and BP models into the AttLSTM-ARIMA-BP model to study the corn price in Sichuan Province. The parameter settings of each model are shown in Table 2.

Table 2. Parameter settings of each model (columns: Model, Hyperparameters; first row: LR).

The Results of the Predictive Regression Models
Among the predictive regression models, we use the LR, RF, XGBoost, and LightGBM models to study corn prices in Sichuan Province, where the RF, XGBoost, and LightGBM models are predictive regression models in ensemble learning [58]. The experimental results are shown in Figure 6a, where the red, blue, green, and purple dots represent the predicted values of the LR, RF, LightGBM, and XGBoost models, respectively.
Firstly, we can observe from Figure 6a that the first 88 predicted values of the ensemble learning models are consistent with the changing trend and fluctuation of the real values, but from the 88th data point onward, the real values and predicted values differ greatly. In LR, the results for the latter part are better than those of the ensemble learning models mentioned above. However, the prediction effect in some intervals is still not ideal, such as the 55th to the 72nd and the last four intervals.
Secondly, we divide the prediction results of the predictive regression models into five parts, each of which has 20 groups of data, calculate the MAE value of each part, and collect the error values in Table 3. We can observe from Table 3 that in ensemble learning, the last part of the data has the largest error value. In LR, although the error value of the last part of the data is relatively small, it is still large in absolute terms, and the largest error values occur in the second and third parts of the data. To sum up, when corn prices in Sichuan Province fluctuate by leaps and bounds, the prediction accuracy of the predictive regression models is not ideal.
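The segment-wise error analysis can be reproduced with a short helper that splits the test predictions into equal consecutive parts and computes the MAE of each. This is a minimal sketch with synthetic data; placing any leftover points in the last part is an assumption for series whose length is not a multiple of the number of parts.

```python
def segment_mae(y_true, y_pred, n_parts=5):
    """MAE of each of n_parts consecutive segments (assumes len(y_true) >= n_parts)."""
    size = len(y_true) // n_parts
    scores = []
    for part in range(n_parts):
        start = part * size
        # Assumption: leftover points, if any, are added to the last segment.
        end = len(y_true) if part == n_parts - 1 else start + size
        errors = [abs(a - b) for a, b in zip(y_true[start:end], y_pred[start:end])]
        scores.append(sum(errors) / len(errors))
    return scores


# Synthetic example: 100 points whose error grows by 1 in each block of 20.
y_true = [float(i) for i in range(100)]
y_pred = [value + (i // 20) for i, value in enumerate(y_true)]
part_errors = segment_mae(y_true, y_pred)
```

Reporting the error per segment makes it visible when a model fits the stable early part of the test period but degrades once prices start to jump.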

The Results of LSTM Models
Because the predictive regression models give unsatisfactory results when the price of corn fluctuates by leaps and bounds, we adopted LSTM to study the price data further.
With the single LSTM model, we find that the prediction accuracy reaches its peak when next week's price is predicted from the corn price data of Sichuan Province over the past four weeks. To compare the prediction accuracy of the LSTM models more conveniently, we set the time window of all LSTM models to 4, so a total of 99 price data are predicted. It can be seen from the orange line in Figure 6b that although the single LSTM model predicts the changing trend and fluctuation of the price well, the prediction result still lags behind the real values (hysteresis). Hence, the single LSTM model is not desirable.
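A time window of 4 means that each training sample consists of four consecutive weekly prices and the target is the price of the following week. The helper below is a minimal sketch of that windowing, with a hypothetical function name and a toy series.

```python
def make_windows(series, window=4):
    """Turn a series into (inputs, targets): `window` past values predict the next one."""
    inputs = [series[i:i + window] for i in range(len(series) - window)]
    targets = [series[i + window] for i in range(len(series) - window)]
    return inputs, targets


# Toy series, only for illustration.
weekly_prices = [float(v) for v in range(10)]
X, y = make_windows(weekly_prices, window=4)
```

A series of length n yields n - 4 samples, because the first four observations have no complete history window before them.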
To solve the hysteresis problem, we adopted a multivariate LSTM model to predict the price of corn in Sichuan Province. It can be seen from the blue line in Figure 6b that although the multivariate LSTM model solves the hysteresis problem, its output looks like a "smooth" curve, so it cannot predict the fluctuation of price changes. Therefore, the multivariate LSTM model is also not desirable.
The attention mechanism performs well when capturing price fluctuations [59], so we continue to research corn prices in Sichuan Province by integrating the attention mechanism into LSTM. The weights of each column of agricultural product and futures prices are shown in Figure 7. As observed from the pink line in Figure 6b, AttLSTM predicts the price of corn in Sichuan Province well in terms of both price change trends and price fluctuations. To illustrate the prediction accuracy of AttLSTM more rigorously, we collated the errors of each LSTM model in Table 4. Compared with the predictive regression models, only the AttLSTM model shows a significant improvement in prediction accuracy, especially for the first four data sets. For the last data set, the AttLSTM model still fails to achieve the desired prediction.

The Prediction Result of the AttLSTM-ARIMA-BP Model
The ARIMA model can only capture linear relationships in prices, and its predictions worsen the further the target is from the historical data. To address the hysteresis problem, we feed the predicted values obtained from the ARIMA model and the residuals obtained from the LSTM model into the BP model. Since past prices always affect the current price, we also select the three columns with the largest weights in Figure 7 as inputs, namely the price of corn in Guangdong Province 12 weeks ago, the price of corn in Guangdong Province 11 weeks ago, and the price of corn in Jilin Province 2 weeks ago. The BP model is trained on the predicted value obtained by ARIMA, the residual value obtained by LSTM training, and the three data columns with the highest weights obtained by the attention mechanism, and it outputs the predicted price of corn in Sichuan Province. We divide the prediction results of the model into five parts, each of which has 20 groups of data. We then calculate the MAE of each part and collect the errors in Table 4. Judging by the fit in Figure 8 and the MAE scores in Table 4, we conclude that the AttLSTM-ARIMA-BP model predicts the corn price in Sichuan Province accurately, whether the price changes steadily or fluctuates sharply.
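The per-part MAE calculation can be sketched as follows. This is an illustrative sketch rather than the code used in the study: the function name `segment_mae` and the synthetic values are assumptions. If fewer than 100 predictions are available, the last part is simply shorter.

```python
def segment_mae(actual, predicted, n_parts=5, part_size=20):
    """Mean absolute error of each consecutive part of the prediction results."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    scores = []
    for p in range(n_parts):
        lo, hi = p * part_size, (p + 1) * part_size
        errors = [abs(a - f) for a, f in zip(actual[lo:hi], predicted[lo:hi])]
        if errors:  # skip a part that has no data at all
            scores.append(sum(errors) / len(errors))
    return scores


# Synthetic example: the error grows by 1 in each successive part.
actual = [100.0] * 100
predicted = [100.0 + (i // 20 + 1) for i in range(100)]
part_scores = segment_mae(actual, predicted)
print(part_scores)
```

Reporting the MAE per part rather than over the whole series makes it visible whether a model degrades in the part where prices move sharply.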


Experimental Comparison
In this part, we analyze the prediction effect of each model by comparing their MAPE, RMSE, MAE, and R².
As can be seen in Figure 9, whichever error measure is used, the AttLSTM-ARIMA-BP model records the smallest error, with a MAPE of 0.0043, an MAE of 1.51, and an RMSE of 1.642, whereas the ensemble learning models record the largest error. In Figure 10, we set the horizontal coordinate to the true value of corn prices in Sichuan Province and the vertical coordinate to the predicted value, fitted a curve to the scatter plot, and calculated the coefficient of determination R². By comparison, the AttLSTM-ARIMA-BP model has the highest R², which is 0.9992.
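For reference, the four measures can be computed as in the sketch below, which uses their standard textbook definitions. Two points are assumptions: MAPE is returned as a fraction rather than a percentage, and R² is computed as 1 - SS_res / SS_tot directly from the true and predicted values, which may differ from the curve-fitting procedure used for Figure 10. The sample values are synthetic.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error as a fraction; actual values must be nonzero."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Synthetic values for illustration only.
y_true = [100.0, 200.0, 300.0, 400.0]
y_pred = [110.0, 190.0, 310.0, 390.0]
print(mae(y_true, y_pred), rmse(y_true, y_pred), mape(y_true, y_pred), r_squared(y_true, y_pred))
```

MAE and RMSE are in the units of the price itself, whereas MAPE and R² are unitless, which is why the magnitudes of the reported values differ so much.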


Experimental Discussion
Because of some uncontrollable factors, agricultural prices fluctuate dramatically. Traditional forecasting models can achieve good forecasts when prices change smoothly, but they fail to achieve the ideal forecasting results when prices fluctuate dramatically.
Among the predictive regression models, the RF, LightGBM, and XGBoost models show high accuracy only when prices change mildly, whereas the LR model gives precise predictions when prices have just started to bounce. The single LSTM model suffers from hysteresis in its prediction results. AttLSTM improves the accuracy of the prediction results, but its predictions are still not ideal when the price bounces. With the smallest error values and the largest R², the AttLSTM-ARIMA-BP model can accurately predict the price of corn in Sichuan Province whether the price changes steadily or fluctuates dramatically. Moreover, this model has broader application and development prospects in research on time series data.

Conclusions
The price of agricultural products is a key element in sustainable agricultural development and interacts strongly with the economy. In recent years, large-scale fluctuations in national food prices have been frequent and have had a significant impact on society. This paper applies the apriori algorithm to study agricultural product prices in different regions together with futures prices, and concludes that prices at different times and in different places affect current prices. Through several experiments, a combined AttLSTM-ARIMA-BP price forecasting model is finally proposed. This model is suitable not only for price forecasting in periods of steady change, but also gives accurate forecasts in times of large price changes. The results of this study can help economists formulate hedging strategies in the face of the drawbacks of market self-regulation, and help investors make better asset allocation decisions, thus reducing risk. In addition, this study enriches the existing theory of price forecasting models and contributes to the sustainable development of the agricultural products market.
To improve the accuracy of the model's predictions further, the influence of international and regional armed conflicts, epidemics, transportation, storage, and natural disasters should also be taken into account. In the era of big data, public concern about the market expressed on the internet can also lead to large price fluctuations. In follow-up studies, the robustness and practicability of the model can be improved by collecting more data and adding perturbation factors to the model. We believe that the forecasting model proposed in this paper can make accurate predictions in times of dramatic price fluctuations, thus contributing to market regulation and sustainable economic development.

Figure 2. Agricultural products and futures price data. (The left is the scatter distribution of data, the right is the half violin in the form of density distribution, the top of the half violin is the maximum value, the bottom is the minimum value, the middle solid circle is the average value, the inner box represents 25-75% of the data, the upper and lower line of the box represent the next standard deviation of the mean value, and the purple broken line is the mean line of different groups).

Figure 5. BP model structure.
Since the final prediction effect of the model will be affected by other factors, we select several groups of data with the highest weight as the input of the model through the attention mechanism.

Step 3. Train the residual data $r_t$ by LSTM and obtain the training value $\hat{r}_t$.
Step 4. Obtain the final predicted value by inputting $\hat{m}_t$, $\hat{r}_t$, and $a_t$ to the BP.
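The way the inputs for Step 4 fit together can be sketched as follows. This sketch only assembles the input rows for a BP network and omits the network itself. The function names, the column names, the attention weights, and all numeric values are hypothetical and chosen only for illustration.

```python
def top_k_columns(weights, k=3):
    """Names of the k columns with the largest attention weights."""
    return sorted(weights, key=weights.get, reverse=True)[:k]


def build_bp_inputs(arima_pred, resid_pred, columns, weights, k=3):
    """One row per time step: ARIMA prediction, LSTM residual prediction, top-k columns."""
    chosen = top_k_columns(weights, k)
    rows = []
    for t in range(len(arima_pred)):
        row = [arima_pred[t], resid_pred[t]] + [columns[name][t] for name in chosen]
        rows.append(row)
    return chosen, rows


# Hypothetical attention weights and lagged price columns.
weights = {"Guangdong_lag12": 0.30, "Guangdong_lag11": 0.25,
           "Jilin_lag2": 0.20, "futures_lag1": 0.05}
columns = {
    "Guangdong_lag12": [410.0, 412.0],
    "Guangdong_lag11": [412.0, 415.0],
    "Jilin_lag2": [380.0, 381.0],
    "futures_lag1": [395.0, 396.0],
}
arima_pred = [400.0, 402.0]   # ARIMA predicted values (hypothetical)
resid_pred = [1.5, -0.8]      # LSTM-predicted residuals (hypothetical)

chosen, rows = build_bp_inputs(arima_pred, resid_pred, columns, weights)
print(chosen)
print(rows)
```

Each row would then be passed to the BP network as one training or prediction sample, with the actual price at that time step as the target.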

Figure 6. Prediction results of each model. (a) Prediction results of the regression models. (b) Prediction results of each LSTM model.

Figure 7. Weight of each part.

Figure 10. The validation set results under different models.

Table 1. Parameter settings of association rule mining.

Table 3. Errors of each part in regression prediction models.

Table 4. Errors of each part in LSTM and AttLSTM-ARIMA-BP models.