1. Introduction
As a basic material for human survival, grain is a cornerstone of human civilization and is vital to social stability and economic development.
In comparison to general commodity prices, agricultural commodity prices are subject to a more intricate array of factors, such as income levels, international oil prices, grain supply and demand conditions, and market speculation [1], and display irregular fluctuations characterized by non-stationarity and non-linearity [2]. Violent price fluctuations adversely affect farmers' production decisions and consumers' purchasing behavior, and may even cause uneven development of the national economy [3].
Thus, forecasting grain futures prices is of great importance for farmers, food supply chain participants, governments, and regulators [4]. Effective forecasting helps market participants cope with fluctuations, make reasonable decisions, and promote the sustainable development of the food industry. However, owing to the complex interplay of multiple factors, grain futures price prediction remains a challenging field, and continuous research and innovation are needed to improve the accuracy and reliability of predictions.
There are three main prediction approaches in the time series field: traditional econometric methods, artificial intelligence methods based on machine learning or deep learning, and combined prediction methods. Traditional econometric models place high demands on data stationarity; highly volatile data may lead to unstable prediction results and poor performance on long time series [5,6].
Compared with traditional econometric models, artificial intelligence models can achieve better prediction results on large-scale data with many relevant features [7]. Typical representatives include artificial neural networks (ANNs) and long short-term memory neural networks (LSTM). The work [8] compared the errors of the LSTM, TDNN, and ARIMA models in forecasting the international monthly prices of corn and palm oil and found that the errors of the LSTM model were smaller than those of ARIMA and TDNN. Jiang Zhihang applied BiLSTM to predict cotton prices and achieved a lower error than LSTM [9].
Researchers have applied the temporal convolutional network (TCN) model to predict pork prices and found that the TCN's advantage becomes more pronounced when handling large volumes of data [10]. The work [11] used ConvLSTM to predict the stock market and achieved higher prediction accuracy than the classical LSTM model.
Although these methods have achieved good results in their respective fields, they all have limitations in application. For example, the LSTM network loses sequence feature information on long time series and suffers from disordered structural information between data [12]. BiLSTM can obtain global context information through its bidirectional structure and thereby capture overall sequence patterns, but it overlooks some local features [13]. Similarly, ConvLSTM is not effective at handling time series with long-term dependencies, and its complex structure leads to poor interpretability [14]. A single model therefore has difficulty grasping all the relevant dependencies in a time series [15], can be misleading, and is subject to at least three sources of uncertainty: data uncertainty, parameter uncertainty, and model uncertainty [16]. Thus, it is difficult to find a single model that applies to all situations [17].
To this end, researchers have tried combining multiple models for prediction. Sun combined variational mode decomposition (VMD), ensemble empirical mode decomposition (EEMD), and LSTM, using VMD and EEMD to decompose the data twice to reduce its complexity, yielding the VMD-EEMD-LSTM model [18]. In view of the complexity and long-term dependence of prices, Lu proposed a price-prediction model combining a CNN and LSTM and introduced the attention mechanism to optimize it; compared with the RNN, MLP, CNN, LSTM, CNN-RNN, and other benchmark models, accuracy was improved [19].
Researchers have also introduced attention mechanisms, which are typically combined with deep learning models [20,21], such as multi-head attention [22] and self-attention [23]. The attention mechanism has the advantage of overcoming the long-term-dependency and information-loss issues of recurrent neural networks [24]. It assigns a different importance to each element in the input sequence and focuses on the inputs with stronger correlations, thus better representing the input data. Sun used models such as the SWS-CNN to predict the rise and fall of financial assets and found that the SWS-CNN-Attention model, which adds an attention mechanism to the SWS-CNN, performed better, with higher accuracy, precision, recall, and F1-score than other plain deep learning models [25]. The work [26] combined a CNN, BiLSTM, and an attention mechanism to address the information loss caused by overly long time series inputs; the CNN-BiLSTM-Attention model showed a significant accuracy improvement over the single LSTM, CNN-BiLSTM, CNN-LSTM, CNN, and other models.
In summary, combined models can effectively utilize the information in the sample data, overcome the drawbacks of single models, and provide more comprehensive and accurate predictions; they integrate useful information from various methods and thereby improve prediction accuracy [27]. However, combined models introduce new problems. Their structure is more complex, which raises the cost of prediction and poses additional challenges for practical deployment and application. Moreover, model performance can vary significantly across datasets and conditions owing to the substantial impact of random factors.
To enhance the accuracy and efficiency of the prediction model and support reasonable decision making, this paper considers the rationality of data feature selection and model combination to further improve prediction performance, especially for long-term time series prediction tasks, and proposes a grain futures price prediction model based on BiLSTM, DSConvLSTM, and an attention mechanism. The main work is as follows:
- (1)
Feature selection optimization: Calculate the mutual information value between each feature (an influencing factor of the grain futures price) and the futures price, sort the features by their mutual information values, and finally determine the optimal number of features through comparative experiments instead of traditional manual settings or theoretical hypotheses.
- (2)
Lightweight improvement: A more lightweight depthwise separable convolution (DSConv) is introduced to replace the standard convolution (SConv) in ConvLSTM, which reduces the complexity of the model without sacrificing its performance [28].
- (3)
Model combination: This paper proposes a combined model, Bi-DSConvLSTM-Attention, for grain futures price prediction. The BiLSTM and DSConvLSTM neural networks are combined to exploit their respective advantages, and the attention mechanism is introduced to enhance the model's focus on relevant features by dynamically adjusting the weights of different time steps, thereby improving the accuracy and efficiency of grain futures price prediction.
- (4)
Comparative analysis of model performance: This paper first uses the wheat futures price to determine the specific attention mechanism. Then, again taking wheat futures price prediction as an example, comparative experiments were conducted against the Bi-ConvLSTM-Attention, LSTM, BiLSTM, LSTM-Attention, TCN-Attention, CNN-BiLSTM-Attention, and BiLSTM-Attention models. The proposed model achieved the best performance in terms of the root mean squared error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and coefficient of determination (R²).
- (5)
Generalization capability test: The soybean futures price was selected for the generalization experiment, and the experimental results showed that the Bi-DSConvLSTM-Attention model also achieved the best performance on various evaluation indicators.
2. Materials and Methods
The price prediction model is divided into a preprocessing module, a feature selection module, and a prediction module. The original data are preprocessed in the preprocessing module, and the mutual information method is used to evaluate the dependence of the grain futures price on each candidate feature. The data are then divided into training and test sets. The Bi-DSConvLSTM-Attention model learns to predict the grain futures price from the training set, and finally, the test set is used to assess the model's performance. The main process is shown in Figure 1.
2.1. Mutual Information Feature Selection
There are many factors affecting the grain futures price, which can cause the "Curse of Dimensionality" problem in model training. In this paper, dimensionality reduction is carried out through feature selection. An ideal feature selection algorithm should remove irrelevant, weakly relevant, and redundant features and retain non-redundant, strongly relevant features [29]. For the various influencing factors in the data, the mutual information method is used for feature selection, and the optimal number of features is determined by comparative experiments on the model. The mutual information value is calculated as shown in Formula (1):

$$ I(X_i;Y)=\iint p(x_i,y)\,\log\frac{p(x_i,y)}{p(x_i)\,p(y)}\,\mathrm{d}x_i\,\mathrm{d}y,\quad i=1,2,\ldots,n \tag{1} $$

where $X_i$ and $Y$ are continuous random variables, $n$ is the total number of influencing factors of the grain futures price, $X_i$ is the $i$-th influencing factor, $Y$ is the grain futures price, $p(x_i,y)$ is the joint probability density function of $X_i$ and $Y$, and $p(x_i)$ and $p(y)$ are the marginal probability density functions of $X_i$ and $Y$, respectively.

The mutual information value $I(X_i;Y)$ measures the correlation between the grain futures price $Y$ and the influencing factor $X_i$; the larger the value, the greater the dependence between them.
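To make the ranking step concrete, the following is a minimal sketch assuming scikit-learn's `mutual_info_regression` as the estimator (the paper does not name its implementation); `features_df` and `price` are hypothetical placeholders for the 42-feature table and the futures price column.

```python
import pandas as pd
from sklearn.feature_selection import mutual_info_regression

def rank_features_by_mi(features_df: pd.DataFrame, price: pd.Series) -> pd.Series:
    """Rank candidate features by mutual information with the futures price."""
    mi = mutual_info_regression(features_df.values, price.values, random_state=0)
    return pd.Series(mi, index=features_df.columns).sort_values(ascending=False)

# ranking = rank_features_by_mi(features_df, price)
# top_k = ranking.index[:4]   # candidate inputs for the comparative experiments
```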
2.2. Depthwise Separable Convolution
In contrast to the traditional convolution operation, depthwise separable convolution (DSConv) decomposes it into two parts: depthwise convolution (DWConv) and pointwise convolution (PWConv) [30]. This approach can significantly reduce the number of parameters and computational cost of the model [31]. The process, illustrated with a multi-channel input feature map as an example, is shown in Figure 2.
In the first step (DWConv), a single convolution kernel is applied to each channel of the input feature map independently, so the number of output channels equals the number of input channels. In the second step (PWConv), the DWConv output is processed with N pointwise (1 × 1) convolution kernels, expanding the number of channels to N. In standard convolution, by contrast, each of the N kernels spans all channels of the input feature map and directly produces a new feature map with N channels. The numbers of parameters required by DSConv and standard convolution (SConv) can be calculated and compared with the formulas below.
SConvLSTM and DSConvLSTM differ significantly in their parameter counts. Taking a $k \times k$ convolution kernel as an example, the numbers of parameters are calculated as shown in Formulas (2) and (3), respectively:

$$ P_{SConv} = k^{2} \cdot C_{in} \cdot C_{out} + b \cdot C_{out} \tag{2} $$

$$ P_{DSConv} = k^{2} \cdot C_{in} + C_{in} \cdot C_{out} + b \cdot (C_{in} + C_{out}) \tag{3} $$

In the formulas, $C_{in}$ denotes the number of input channels, $C_{out}$ denotes the number of output channels, and $b$ is the bias term, which can be 0 or 1. Ignoring the bias terms, $P_{DSConv}/P_{SConv} = 1/C_{out} + 1/k^{2}$; since in practice both $C_{out}$ and $k$ are at least 2, $P_{SConv}$ must be greater than $P_{DSConv}$. See Section 3.7 for detailed results.
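As a sanity check on Formulas (2) and (3), the snippet below compares the parameter counts Keras reports for a standard and a depthwise separable convolution with the illustrative sizes k = 3, C_in = 32, C_out = 64 (not the paper's configuration). Note that Keras's `SeparableConv2D` applies a bias only after the pointwise step, so its count corresponds to Formula (3) with the bias acting on the output channels alone.

```python
import tensorflow as tf

c_in, c_out, k = 32, 64, 3
inp = tf.keras.Input(shape=(28, 28, c_in))

sconv = tf.keras.layers.Conv2D(c_out, k, padding="same")
dsconv = tf.keras.layers.SeparableConv2D(c_out, k, padding="same")
sconv(inp)    # build the layers so their weights exist
dsconv(inp)

print("SConv params: ", sconv.count_params())   # 3*3*32*64 + 64      = 18,496
print("DSConv params:", dsconv.count_params())  # 3*3*32 + 32*64 + 64 =  2,400
```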
2.3. Bi-DSConvLSTM-Attention
The bidirectional depthwise separable convolutional long short-term memory neural network model combined with the attention mechanism (Bi-DSConvLSTM-Attention) consists of a BiLSTM layer, a DSConvLSTM layer, and an attention layer. Its structure is shown in Figure 3:
- (1)
BiLSTM layer: By considering both forward and backward information, it helps to fully understand the context in time series data and improves the model's ability to capture temporal patterns. BiLSTM consists of two LSTM layers: one processes the input sequence in order, and the other processes it in reverse order; finally, the outputs of the two layers are merged to obtain the complete context information at each time step. The LSTM unit structure is shown in Figure 4.
The LSTM is calculated as follows:

$$ f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f) \tag{4} $$
$$ i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i) \tag{5} $$
$$ \tilde{C}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c) \tag{6} $$
$$ C_t = f_t \otimes C_{t-1} + i_t \otimes \tilde{C}_t \tag{7} $$
$$ o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o) \tag{8} $$
$$ h_t = o_t \otimes \tanh(C_t) \tag{9} $$

where $\sigma$ represents the sigmoid activation function, $x_t$ represents the input data, and $h_{t-1}$ represents the previous time step's output. $W_f$, $W_c$, $W_i$, and $W_o$ represent the weight matrices multiplied by $x_t$, and $b_f$, $b_c$, $b_i$, and $b_o$ represent the biases, under the forget gate, cell state, input gate, and output gate, respectively. $U_f$, $U_c$, $U_i$, and $U_o$ denote the weights for $h_{t-1}$ under the forget gate, cell state, input gate, and output gate, respectively. Applying the hyperbolic tangent activation function tanh yields the new candidate cell state $\tilde{C}_t$, and then, using Formula (7), the new cell state value is computed.
After passing through the BiLSTM layer, the output h consists of the concatenated forward and backward hidden states, and this result is reshaped into the 5D tensor format required as input by the DSConvLSTM layer, as sketched below.
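A minimal Keras sketch of this hand-off, assuming 30 time steps, 4 input features, and 64 LSTM units (illustrative values, not the paper's published configuration):

```python
from tensorflow.keras import layers

time_steps, n_features, units = 30, 4, 64
x = layers.Input(shape=(time_steps, n_features))
h = layers.Bidirectional(layers.LSTM(units, return_sequences=True))(x)
# ConvLSTM2D expects (batch, time, rows, cols, channels); lay the 2*units
# concatenated BiLSTM states out along the "cols" axis with one channel.
h5d = layers.Reshape((time_steps, 1, 2 * units, 1))(h)
```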
- (2)
DSConvLSTM layer: DSConvLSTM introduces DSConv operations into the traditional LSTM recurrent structure, treating commodity futures prices as translations in the temporal direction. By leveraging the translation invariance property of convolution, DSConvLSTM can automatically learn spatial and temporal features in time series data, thereby improving the model’s performance and expressive power. We replace the fully connected layers in the LSTM with DSConv to capture local patterns in the sequence. The computation formula for DSConvLSTM is as follows:
$$ i_t = \sigma(W_{xi} * x_t + W_{hi} * h_{t-1} + W_{ci} \otimes c_{t-1} + b_i) \tag{10} $$
$$ f_t = \sigma(W_{xf} * x_t + W_{hf} * h_{t-1} + W_{cf} \otimes c_{t-1} + b_f) \tag{11} $$
$$ \tilde{c}_t = \tanh(W_{xc} * x_t + W_{hc} * h_{t-1} + b_c) \tag{12} $$
$$ c_t = f_t \otimes c_{t-1} + i_t \otimes \tilde{c}_t \tag{13} $$
$$ o_t = \sigma(W_{xo} * x_t + W_{ho} * h_{t-1} + W_{co} \otimes c_t + b_o) \tag{14} $$
$$ h_t = o_t \otimes \tanh(c_t) \tag{15} $$

The formulas assume that the current time step is $t$, the input is $x_t$, the hidden state and cell state at the previous time step are $h_{t-1}$ and $c_{t-1}$, the output is $h_t$, the cell state is $c_t$, the input gate is $i_t$, the forget gate is $f_t$, the output gate is $o_t$, and the candidate state is $\tilde{c}_t$; $\otimes$ represents the Hadamard product, and $*$ represents the (depthwise separable) convolution operation. $W_{x\cdot}$, $W_{h\cdot}$, and $W_{c\cdot}$ denote the kernels for the input, hidden layer, and cell state, respectively, while $b_i$, $b_f$, $b_c$, and $b_o$ denote the bias terms.
- (3)
Attention layer: The attention mechanism is widely used in natural language processing, image detection, speech recognition, and other fields [32].
Attention consists of three main phases:

Phase 1: Calculate the similarity weight of each piece of input data, as shown in Formula (16):

$$ e_j = W h_j + b \tag{16} $$

Phase 2: Normalize the similarity weights with the softmax function, as in Formula (17):

$$ \alpha_j = \mathrm{softmax}(e_j) = \frac{\exp(e_j)}{\sum_{k}\exp(e_k)} \tag{17} $$

Phase 3: The normalized similarity weights and their corresponding data are weighted and summed to obtain the output matrix R, as shown in Formula (18):

$$ R = \sum_{j} \alpha_j h_j \tag{18} $$

where $W$ denotes the connection weight matrix, $b$ is the bias, and $h_j$ and $e_j$, like $h_t$ and $e_t$, denote the hidden state vector and its similarity weight at time $j$, respectively.
The attention mechanism enhances the model's focus on relevant features by dynamically adjusting the weights of different time steps, which helps to mitigate the performance degradation of neural networks as the input length grows and the low computational efficiency caused by an unreasonable input order.
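Putting the three layers together, the following is a hedged end-to-end sketch of the Bi-DSConvLSTM-Attention pipeline. Keras ships no depthwise-separable ConvLSTM, so the standard `ConvLSTM2D` stands in here for the paper's custom DSConvLSTM cell (which replaces its convolutions with DSConv), the attention block follows Formulas (16)-(18), and all layer sizes are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_model(time_steps=30, n_features=4, units=64, filters=32):
    inp = layers.Input(shape=(time_steps, n_features))
    # (1) BiLSTM layer: forward + backward context at every time step.
    h = layers.Bidirectional(layers.LSTM(units, return_sequences=True))(inp)
    # Reshape to the 5D tensor (time, rows, cols, channels) ConvLSTM2D needs.
    h = layers.Reshape((time_steps, 1, 2 * units, 1))(h)
    # (2) (DS)ConvLSTM layer: convolutional recurrence over the sequence.
    h = layers.ConvLSTM2D(filters, kernel_size=(1, 3), padding="same",
                          return_sequences=True)(h)
    h = layers.Reshape((time_steps, -1))(h)
    # (3) Attention over time steps per Formulas (16)-(18):
    scores = layers.Dense(1)(h)              # e_j = W h_j + b
    alpha = layers.Softmax(axis=1)(scores)   # alpha_j = softmax(e_j)
    context = layers.Lambda(
        lambda t: tf.reduce_sum(t[0] * t[1], axis=1))([h, alpha])  # R
    out = layers.Dense(1)(context)           # next-day futures price
    return Model(inp, out)

model = build_model()
model.compile(optimizer="adam", loss="mse")
```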
2.4. Data Presentation
Considering the diversity of grains and the wide application of wheat and soybean, whose markets are relatively transparent, this paper takes wheat as the primary example for the experiments. Following the literature [33], this paper selects historical daily oil and natural gas prices and inflation indices from the Kaggle website [34] and collects U.S. wheat, U.S. soybean, corn, and gold futures prices, as well as the USD/RMB exchange rate, wheat output, and other data from websites such as Investing.com (accessed on 23 July 2023) [35]. First, the data collected from different sources were integrated; then, records with missing values and outliers were deleted. Finally, 2118 wheat and soybean records were retained. The 42 features are listed in Table 1. The training and test sets were divided at a ratio of 4:1, that is, 1669 records in the training set and 399 records in the test set.
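A minimal sketch of the chronological 4:1 split, assuming the retained records sit in a date-sorted CSV (the file name and date column are hypothetical):

```python
import pandas as pd

# Hypothetical file holding the retained records, sorted by trading date.
df = pd.read_csv("grain_futures.csv", parse_dates=["date"]).sort_values("date")
split = int(len(df) * 0.8)                  # 4:1 ratio, no shuffling
train_df, test_df = df.iloc[:split], df.iloc[split:]
```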
3. Analysis and Discussion
3.1. Evaluation Metrics
In order to accurately evaluate the model's performance, this paper uses four indicators: the root mean squared error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and coefficient of determination (R²). Their expressions are given in Formulas (19)-(22), respectively:

$$ \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^{2}} \tag{19} $$
$$ \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i-\hat{y}_i\right| \tag{20} $$
$$ \mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i-\hat{y}_i}{y_i}\right| \tag{21} $$
$$ R^{2} = 1-\frac{\mathrm{SSE}}{\mathrm{SST}} \tag{22} $$

In the formulas, $n$ is the number of measurements, $y_i$ is the actual value of the i-th sample, and $\hat{y}_i$ is the corresponding predicted value. SSE represents the sum of the squared differences between the actual observed values and the values predicted by the model. SST represents the sum of the squared differences between the actual observations and the mean of the observations.
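For reference, a direct NumPy transcription of Formulas (19)-(22):

```python
import numpy as np

def evaluate(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """Compute RMSE, MAE, MAPE (%), and R^2 per Formulas (19)-(22)."""
    err = y_true - y_pred
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    mape = np.mean(np.abs(err / y_true)) * 100
    r2 = 1 - np.sum(err ** 2) / np.sum((y_true - np.mean(y_true)) ** 2)
    return {"RMSE": rmse, "MAE": mae, "MAPE": mape, "R2": r2}
```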
3.2. Environment
The main software and third-party library versions used in the experiment were Python 3.6.13, Numpy 1.14.6, Pandas 1.0.5, Tensorflow 2.6.2, Scikit-learn 0.23.2, Keras 2.6.0, and Attention 4.1. After multiple training iterations, the parameters of the Bi-DSConvLSTM-Attention model were set as shown in Table 2.
3.3. Feature Selection Analysis
The ranking of each feature's mutual information value with the wheat futures price was obtained with the mutual information method of Section 2.1, as shown in Figure 5, which displays the mutual information values of the top-ranked features. Features with low relevance to the target were excluded, and features with higher relevance were selected as the model input to achieve faster training and better model performance.
From Figure 5, it can be seen that the mutual information values of the features with wheat futures decline in a gradient, and the first four features, the wheat closing, low, high, and opening prices, have a high correlation. After the fifth feature, the correlation changes little, so the top 1, 2, 3, 4, 5, 6, and 7 features were each used as input for experimental comparison to determine the optimal number. With the top 1 to 7 features selected by the mutual information method as the input of the Bi-DSConvLSTM-Attention model, the performance evaluation results are shown in Table 3.
The experimental results show that the model performed best when the first four features were used as input. Therefore, the optimal number of features was four; that is, wheat_close, wheat_low, wheat_high, and wheat_open were selected as the model's input features, which greatly reduced the data dimensionality and improved training efficiency.
3.4. Attention Mechanisms’ Analysis
On the basis of the Bi-DSConvLSTM model, a comparative analysis was conducted with eight-head attention, four-head attention, two-head attention, self-attention, and no attention. The prediction results are shown in Table 4.
The experiments show that the attention mechanism can effectively improve the accuracy of the model. Compared with the self-attention mechanism, the eight-head and four-head attention mechanisms achieved essentially the same RMSE, MAE, MAPE, and R² for wheat futures prediction but required longer training time for the same number of epochs, while the two-head attention mechanism had a higher error. Therefore, for futures price prediction, this paper chooses the self-attention mechanism.
3.5. The Performance Analysis
Considering computational resources and efficiency, we selected wheat_close, wheat_low, wheat_high, and wheat_open as the input features for the models. We compared the price prediction performance of seven existing models: Bi-ConvLSTM-Attention, LSTM, BiLSTM, LSTM-Attention, TCN-Attention, CNN-BiLSTM-Attention, and BiLSTM-Attention [36]. The wheat price prediction performance of each model on the test set is shown in Table 5.
The experimental results showed that the Bi-DSConvLSTM-Attention model performed better on various performance evaluation indicators compared with Bi-ConvLSTM-Attention, LSTM, BiLSTM, ConvLSTM, TCN-Attention, CNN-BiLSTM-Attention, and BiLSTM-Attention.
The RMSE of the Bi-DSConvLSTM-Attention model was 17.50%, 64.27%, 61.73%, 52.46%, 63.15%, 45.59%, and 57.77% lower than that of Bi-ConvLSTM-Attention, LSTM, BiLSTM, ConvLSTM, TCN-Attention, CNN-BiLSTM-Attention, and BiLSTM-Attention, respectively. The MAE was 21.60%, 71.97%, 70.32%, 55.30%, 67.82%, 57.54%, and 63.48% lower, and the MAPE was 25.68%, 72.36%, 71.50%, 56.00%, 70.90%, 58.65%, and 60.71% lower, than those of the same models, respectively. On R², the proposed Bi-DSConvLSTM-Attention model also outperformed the other common models. The prediction results of these models for the futures price over the next 180 days are visualized in Figure 6.
3.6. Generalization Analysis
In order to verify the universality of Bi-DSConvLSTM-Attention on different grains, this paper also conducts the same experiment on soybean. The experimental results are shown in Table 6.
The experimental results showed that the error of Bi-DSConvLSTM-Attention was much lower than that of the commonly used time series prediction models. It was verified that the model has strong generalization ability.
The prediction results of these models for the futures price over the next 180 days are visualized in Figure 7.
3.7. Efficiency Analysis
As shown in Section 2.2, DSConvLSTM has fewer parameters than SConvLSTM, so Bi-DSConvLSTM-Attention should be more efficient. In this paper, we compared the average run time of Bi-DSConvLSTM-Attention and Bi-ConvLSTM-Attention over multiple runs. The experimental results are shown in Table 7.
The experimental results showed that using DSConv reduces the run time by about 10% and the number of parameters by about 25%, so DSConv offers a significant efficiency advantage.
3.8. Return Test Analysis
Following the literature [37], in order to assess whether our model can yield positive returns, we conducted multiple training iterations and calculated the average profit and loss (PNL) on the test dataset. The average PNL on the test set was 1.52%. Since the PNL was positive, holding or selling grain according to the forecasts of our prediction model can yield positive returns.
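The paper does not spell out its trading rule, so the following is only a hedged sketch of one common convention for computing a directional PNL from the forecasts: take a long position when the model predicts a price rise and a short (sell) position otherwise, then average the realized daily returns.

```python
import numpy as np

def average_pnl(actual: np.ndarray, predicted: np.ndarray) -> float:
    """Mean realized return (%) of a sign-following strategy (assumed rule)."""
    realized = np.diff(actual) / actual[:-1]        # actual next-day returns
    signal = np.sign(predicted[1:] - actual[:-1])   # predicted direction
    return float(np.mean(signal * realized) * 100)
```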