Article

Research on Price Prediction of Stock Price Index Based on Combination Method with Introduction of Options Market Information

1
School of Management, Suzhou Vocational University, Suzhou 215104, China
2
School of Finance, Nanjing University of Finance and Economics, Nanjing 210023, China
*
Author to whom correspondence should be addressed.
Information 2025, 16(4), 328; https://doi.org/10.3390/info16040328
Submission received: 11 March 2025 / Revised: 7 April 2025 / Accepted: 17 April 2025 / Published: 21 April 2025

Abstract

This study establishes a combination method-based prediction model for the CSI 300 stock index price embedded with options market information. Firstly, utilizing options and spot market information, a BP neural network is employed to predict the CSI 300 stock index price. Secondly, a logical framework based on a combination method is constructed to further optimize the CSI 300 stock index price prediction through decomposition–clustering, error adjustment, and weighted integration approaches. The results demonstrate the following: (1) Compared to price predictions based solely on spot market information, the introduction of options market information significantly enhances the forecasting performance for the CSI 300 index price. (2) From the perspective of options moneyness classification, after incorporating options information, different types of options contracts exhibit varying impacts on the CSI 300 index price prediction. Prior to optimization, predictions incorporating in-the-money call options with maximum trading volume yield the optimal performance based on the MSE metric. (3) Under the logical framework of the combination method, the prediction effect for the CSI 300 stock index price is gradually improved after introducing the decomposition–clustering method, the error adjustment method, and the price-weighted integration method, which shows that it is appropriate to use the combination method to optimize the price prediction. Overall, this study proposes a combination method for price forecasting incorporating options market information across diverse contract types. It allows for weighted integration of prediction results derived from various options information, offering a novel research angle for spot market price prediction. The study also underscores the importance of implicit information mining and multi-market information fusion for price prediction, which is expected to become a key research focus in this field.

1. Introduction

With the continuous development of artificial intelligence technology in recent years, machine learning-based time series price prediction has attracted research attention [1,2,3,4,5,6,7]. This approach has also gained increasing popularity among investors, who seek to leverage it for enhanced profitability in the stock market. The existing literature extensively discusses various stock price prediction methods. To enhance the performance of financial time series forecasting models, Cheng and Wei [8] proposed a hybrid model integrating the empirical mode decomposition (EMD) with support vector regression to predict the stock index price of the Taiwan Stock Exchange. Their study found that the improved price forecasting model significantly outperformed traditional linear regression prediction models. Li and Bao [9] utilized the empirical mode decomposition (EMD) method to extract multi-time-scale features from each sequence, separately modeled low-frequency and high-frequency components, and applied an optimization algorithm to five datasets from major global markets. The findings demonstrated the model’s superiority in the price prediction process. Tang and Lin [10] adopted a value investing perspective by selecting the top 200 stocks in terms of market capitalization from the A-share market. They employed the random forest (RF) method to screen financial feature variables significantly influencing the performance of support vector regression (SVR) and dynamically searched for SVR parameters using a quantum genetic algorithm (QGA), establishing an RF-QGA-SVR model for annual stock ranking. The return rate of the constructed stock portfolio significantly outperformed the market benchmark returns. Liu et al. [11] proposed a model based on convolutional neural networks and long short-term memory neural networks (CNN-LSTM) to analyze quantitative strategies in the stock market. 
They utilized a CNN to develop a quantitative stock selection strategy for identifying stock trends and then employed the LSTM to formulate a quantitative timing strategy for profit enhancement, achieving returns superior to the benchmark index. Wei and Chaudhary [12] proposed a systematic method integrating manual selection, segmentation algorithms, and training error feedback, employing recurrent neural networks (RNNs) to identify trend characteristics in stock price time series, which significantly enhanced the predictive power. In addition, Lin et al. [13] proposed a hybrid model combining LSTM with complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) to predict the CSI 300 index and the S&P 500 index, using loss functions as evaluation criteria. Wu et al. [14] predicted Brent crude oil futures closing prices using an LSTM model, followed by a GRU-based error correction method, which significantly improved forecast accuracy.
From the above literature, it can be seen that there have been many studies on price prediction based on machine learning methods. However, most of the literature focuses on price prediction based on spot market information. Options market price information can guide spot market price [15,16,17,18,19,20]. Ahn et al. [21] investigated the price discovery mechanism between SSE 50 index options and the corresponding spot markets, finding that the options market rapidly exhibits a price leadership role in the underlying spot market. Goncalves-Pinto et al. [22] investigated the predictive power of options markets on stock returns and pointed out that the options-implied prices can provide an anchor for the fundamental value of stocks. Patel et al. [23] showed that derivatives markets have stronger price discovery abilities than previously thought. About a quarter of new information first appears in options prices before spreading to stock prices. Additionally, Cui et al. [18] conducted regression analyses demonstrating that the various types of information implicit in options prices can provide significant incremental information, thereby enhancing the predictive power of regression models. Wang [24] found that the CSI 300 stock index options played a very important role in price discovery and risk control based on the options pricing model and GARCH model. Therefore, it is necessary to incorporate information from the options market when conducting asset price forecasting in the spot market.
In summary, machine learning-driven time series price forecasting has garnered significant scholarly attention, yielding numerous methodological advancements. Researchers have explored diverse technical combinations by introducing novel elements from varied perspectives to enhance predictive performance. However, prevailing approaches predominantly focus on spot market data analysis, neglecting the introduction of options market information. Given the leading role of derivatives prices in spot prices, it is necessary to incorporate options information into spot price forecasting to provide valuable incremental information. However, existing studies have rarely proposed combined forecasting methods using different options contracts. This area requires further exploration. Hence, this study proposes a combination method for predicting the CSI 300 stock index price based on a logical framework of “original series decomposition–sample entropy clustering–machine learning prediction–error adjustment-weighted integration”, providing an effective supplement to the research of time series price prediction methods. Furthermore, within this combination forecasting framework, we innovatively incorporate options-implied spot price signals derived from diverse contracts to generate multiple corresponding prediction results and then perform weighted integration of the prediction results derived from various options information, offering a novel research angle for spot market price prediction.
The remainder of this paper is organized as follows. Section 2 presents the logical framework based on a combination method for CSI 300 index price forecasting. Section 3 reports the detailed investigation into combination method-based price prediction and presents the results. Finally, Section 4 details the research conclusions and outlines future research directions.

2. Theoretical Framework

This section primarily discusses the logical framework of the proposed combination method for CSI 300 index price prediction and elaborates on the main models and technical approaches employed in this combination method, thereby establishing a theoretical foundation for subsequent price prediction research. With this consideration, the discussion in this section focuses on two aspects: first, the logical framework of the combination method for CSI 300 index price prediction, which is detailed in Section 2.1, and second, the relevant models and technical approaches within this logical framework, as discussed in Section 2.2.

2.1. Logical Framework of Combination Method

This study proposes a combination method incorporating options market information for CSI 300 index price forecasting. The options pricing formula is utilized within the logical framework of this combination method to extract the spot price information, which is implicit in the options market. From the perspective of moneyness classification, we investigated the differences in prediction performance arising from spot price information embedded in five categories of options contracts: at-the-money options, in-the-money options with maximum trading volume, in-the-money options with maximum open interest, out-of-the-money options with maximum trading volume, and out-of-the-money options with maximum open interest. This study explores the effectiveness of integrating decomposition–clustering, error adjustment, and price-weighted integration techniques. The distinct forecasting results derived from different options contracts establish the foundation for obtaining weighted integration predictions with the combination method. The logical framework of the proposed approach is illustrated in Figure 1.
As shown in Figure 1, the combination method for CSI 300 index price prediction proposed in this paper can be decomposed into three components: price prediction optimization based on the decomposition–clustering method, price prediction optimization based on error adjustment, and price prediction optimization based on weighted integration.

2.2. Technical Approaches Within the Logical Framework of the Combination Method

Following the logical framework proposed in Section 2.1, the spot price information implied by the options market is first extracted, and the differences in prediction effectiveness arising from distinct options information are then analyzed. In the specific price forecasting process, CEEMDAN is employed to decompose the time series into subsequences, a BP neural network is applied for subsequence-based price prediction, and the final forecasting results are obtained through error adjustment and weighted integration methods. This section discusses the options pricing formula, CEEMDAN, and the BP neural network employed in this research.

2.2.1. Options Pricing Formula

In this research, extracting the implied spot price information embedded in the options market requires the use of an options pricing formula. By inverting the pricing formula, the implied spot price $S$ is derived from the observed market prices of options. The options pricing formula is provided by Formula (1).
$$c_t = S_t e^{-q(T-t)} N(d_1) - X e^{-r(T-t)} N(d_2) \tag{1}$$
where $d_1 = \dfrac{\ln(S_t/X) + (r - q + \sigma^2/2)(T-t)}{\sigma\sqrt{T-t}}$ and $d_2 = d_1 - \sigma\sqrt{T-t}$. Here, $c_t$ denotes the call option price, $S_t$ is the spot price of the underlying asset, $N(\cdot)$ is the standard normal cumulative distribution function, $r$ denotes the continuous risk-free rate, and $q$ represents the continuous dividend yield of the underlying asset. $X$ is the strike price, and $T - t$ is the time to maturity. The variable $\sigma$ denotes volatility, measured in this research using the 90-day historical volatility. According to Formula (1), the spot price $S$ can be derived inversely from the observed option price and the other known variables in the formula. The implied spot price $S$ is the implied information from the options market employed in this research. The data for the variables required to obtain the implied spot price information from the options market can be downloaded from the Wind Database.
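The inversion of Formula (1) can be illustrated with a minimal sketch. The paper does not state which numerical solver was used; the bisection search below is an assumption, chosen because the call price is strictly increasing in the spot price, so the root is unique. The function names (`call_price`, `implied_spot`) and the input values are hypothetical and serve only as an illustration.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def call_price(s: float, x: float, r: float, q: float,
               sigma: float, tau: float) -> float:
    """European call price from Formula (1); tau = T - t, in years."""
    d1 = (math.log(s / x) + (r - q + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return s * math.exp(-q * tau) * norm_cdf(d1) - x * math.exp(-r * tau) * norm_cdf(d2)


def implied_spot(c_obs: float, x: float, r: float, q: float,
                 sigma: float, tau: float,
                 tol: float = 1e-8, max_iter: int = 200) -> float:
    """Solve call_price(S) = c_obs for S by bisection (assumed solver)."""
    if c_obs <= 0.0:
        raise ValueError("observed call price must be positive")
    lo, hi = 1e-8, x
    # Expand the upper bracket until the model price exceeds the observed price.
    while call_price(hi, x, r, q, sigma, tau) < c_obs:
        hi *= 2.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if call_price(mid, x, r, q, sigma, tau) < c_obs:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# Hypothetical inputs for illustration only (not taken from the paper's data).
price = call_price(s=3900.0, x=3800.0, r=0.02, q=0.01, sigma=0.2, tau=0.25)
recovered = implied_spot(price, x=3800.0, r=0.02, q=0.01, sigma=0.2, tau=0.25)
print(round(recovered, 4))
```

A round trip (price a call at a known spot, then invert) is a convenient sanity check before applying the routine to market quotes.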

2.2.2. CEEMDAN

Financial data contain significant noise that affects the prediction performance of price forecasting models. By applying the CEEMDAN algorithm, the original time series can be decomposed into multiple subsequences, where the low-frequency and medium-frequency subsequences exhibit smaller fluctuations and clearer deterministic trends, making them easier to predict. Consequently, better prediction results for the CSI 300 index can be achieved through the improved forecasting of these subsequences and summing the predictions of these subsequences to obtain the final prediction result.
The CEEMDAN algorithm [25] is an improvement on EMD [26] and EEMD [27]. Both EMD and EEMD suffer from modal aliasing (mode mixing), in which a single intrinsic mode function contains oscillations of widely different scales. CEEMDAN was developed to address this issue, and its specific implementation process is as follows.
(1) Add Gaussian white noise $G_i(t)$ to the original signal $M_i(t)$ to be decomposed, forming a new composite signal $N_i(t)$, as shown in Formula (2):
$$N_i(t) = M_i(t) + G_i(t) \tag{2}$$
(2) Decompose the newly generated signal $N_i(t)$ using the EMD method, retain the first intrinsic mode function (IMF) $d$ obtained each time, repeat this process $n$ times, and average the $n$ sets of $d$ to obtain $IMF_1$.
(3) Remove the first IMF (the averaged signal $IMF_1$) from the original composite signal to obtain the residual signal $x_i(t)$, as shown in Formula (3):
$$x_i(t) = N_i(t) - IMF_1 \tag{3}$$
(4) Repeat the above operations on the new residual signal: add Gaussian white noise, decompose the generated signal, retain the first IMF, repeat $n$ times, take the average (denoted as $IMF_2$), and then remove $IMF_1$ and $IMF_2$ from the original signal. Continue this process until the final residual signal can no longer be decomposed.

2.2.3. BP Neural Network

The original time series is decomposed using CEEMDAN (Section 2.2.2) to obtain multiple subsequences, and modeling is then performed at the subsequence level to predict each subsequence. Within the logical framework of the proposed combination method, machine learning tools are employed to predict the individual subsequences; that is, machine learning serves as one pathway for price prediction within this framework. A BP neural network is used for price prediction modeling in this research, although other machine learning tools could also be applied for this purpose, as discussed in the conclusion. This section briefly introduces the BP neural network.
A BP neural network [28] is a multi-layer feedforward network trained by error backpropagation, and its strong ability to map high-dimensional functions enables it to handle complex nonlinear problems in parallel. It consists of a three-layer structure: an input layer with $m$ neurons, a hidden layer with $q$ neurons, and an output layer with $l$ neurons. The mathematical relationship between these three layers is as follows [29]:
From the input layer to the hidden layer,
$$b_i = f_b\left(\gamma_i + \sum_{h=1}^{m} w_{hi} \times a_h\right), \quad i = 1, 2, \ldots, q \tag{4}$$
where $a_h$ denotes the input of the $h$th neuron in the input layer; $b_i$ denotes the output of the $i$th neuron in the hidden layer; $w_{hi}$ denotes the weight of the connection between the $h$th neuron in the input layer and the $i$th neuron in the hidden layer; $\gamma_i$ denotes the bias of the $i$th neuron in the hidden layer; and $f_b$ denotes the activation function of the hidden layer.
From the hidden layer to the output layer,
$$y_j = f_y\left(\omega_j + \sum_{i=1}^{q} v_{ij} \times b_i\right), \quad j = 1, 2, \ldots, l \tag{5}$$
where $y_j$ denotes the output of the $j$th neuron in the output layer; $v_{ij}$ denotes the connection weight from the $i$th neuron in the hidden layer to the $j$th neuron in the output layer; $\omega_j$ denotes the bias of the $j$th neuron in the output layer; and $f_y$ denotes the activation function of the output layer.
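The forward pass defined by Formulas (4) and (5) can be sketched in a few lines. The paper does not specify the activation functions, so the sigmoid hidden activation and the identity output activation below are assumptions, as are the function name `forward` and the toy weights; training by backpropagation is not shown.

```python
import math
from typing import Callable, List, Tuple


def sigmoid(z: float) -> float:
    """Assumed hidden-layer activation f_b."""
    return 1.0 / (1.0 + math.exp(-z))


def identity(z: float) -> float:
    """Assumed output-layer activation f_y (common for regression)."""
    return z


def forward(a: List[float],
            w: List[List[float]], gamma: List[float],
            v: List[List[float]], omega: List[float],
            f_b: Callable[[float], float] = sigmoid,
            f_y: Callable[[float], float] = identity
            ) -> Tuple[List[float], List[float]]:
    """Forward pass of a three-layer network.

    a[h]     : input of the h-th input neuron (m values)
    w[h][i]  : weight from input neuron h to hidden neuron i
    gamma[i] : bias of hidden neuron i (q values)
    v[i][j]  : weight from hidden neuron i to output neuron j
    omega[j] : bias of output neuron j (l values)
    """
    m, q, l = len(a), len(gamma), len(omega)
    # Formula (4): hidden-layer outputs b_i.
    b = [f_b(gamma[i] + sum(w[h][i] * a[h] for h in range(m))) for i in range(q)]
    # Formula (5): output-layer outputs y_j.
    y = [f_y(omega[j] + sum(v[i][j] * b[i] for i in range(q))) for j in range(l)]
    return b, y


# Toy network with m = 2 inputs, q = 2 hidden neurons, l = 1 output.
hidden, output = forward(
    a=[1.0, 2.0],
    w=[[0.0, 0.0], [0.0, 0.0]], gamma=[0.0, 0.0],
    v=[[1.0], [1.0]], omega=[0.25],
)
print(hidden, output)
```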

3. Price Prediction Based on Combination Method

Section 2 established the logical framework of the combination method for CSI 300 index price prediction, laying the foundation for this section’s research. Building upon Section 2, CSI 300 index price forecasting was investigated based on the combination method. Within the logical framework outlined in Section 2, prediction performance was optimized in this study through decomposition–clustering methods, error adjustment methods, and weighted integration methods. Additionally, to enable comparative analysis with the above approaches, price prediction performance was analyzed without employing the combination method, as described in Section 3.1. This analysis also reveals the influence of spot price information implied by different options contracts on CSI 300 price prediction accuracy, establishing a comparative baseline for model performance in subsequent research. The dataset employed in this research was obtained from the Wind Database, spanning all trading days from 1 January 2020 to 31 December 2023, yielding 970 daily observations. During the research, the training set and testing set were divided in a ratio of 7:3. We used data from day t − 1 to predict the closing price of the CSI 300 index for day t.
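As a minimal sketch of the data preparation described above, the snippet below pairs each day $t-1$ value with the day $t$ target and splits the samples 7:3. The paper does not say how the split was made; a chronological split (earlier samples for training, later samples for testing) is assumed here because it is the usual choice for time series, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def make_lagged_samples(series: Sequence[float]) -> List[Tuple[float, float]]:
    """Pair the value at day t-1 (input) with the value at day t (target)."""
    return [(series[t - 1], series[t]) for t in range(1, len(series))]


def chronological_split(samples: list, train_ratio: float = 0.7):
    """Split samples in time order (assumed): first part train, rest test."""
    cut = int(len(samples) * train_ratio)
    return samples[:cut], samples[cut:]


# Toy series standing in for daily closing prices (illustrative values only).
closes = [float(v) for v in range(10)]
samples = make_lagged_samples(closes)
train, test = chronological_split(samples)
print(len(train), len(test))
```

Keeping the split chronological avoids using future observations when fitting the model.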

3.1. Price Prediction Based on the Original Time Series with Options Market Information

This study proposes a combination method that incorporates spot price information implied by the options market to forecast the CSI 300 index price. To achieve this, it is essential to first extract the spot price information implied by the CSI 300 index options market. Based on the aforementioned options pricing formula, the spot price implied by the current options market price can be derived. During the research, the spot price information implied by the options market was classified into five categories based on the options' moneyness: information from at-the-money (ATM) call options, in-the-money (ITM) call options with the maximum trading volume, in-the-money (ITM) call options with the maximum open interest, out-of-the-money (OTM) call options with the maximum trading volume, and out-of-the-money (OTM) call options with the maximum open interest.
When incorporating options market information for predicting the closing price, the input data include the previous day’s closing price of the CSI 300 index and the spot price information implied by the corresponding options. Figure 2 shows the predicted price movements and prediction errors of the test set for the CSI 300 closing price prediction.
Figure 2 provides an intuitive visual representation of the CSI 300 index price prediction results. It illustrates the trend variations between predicted and actual values as well as the fluctuation range of errors. One can visually assess the prediction performance by examining the consistency of trends and the magnitude and concentration of error ranges. In Figure 2, the vertical axis denotes the price level (or prediction error) of the CSI 300 index, which is a dimensionless numerical value typically measured in “points” in practice. The x-axis represents time, indicating time indices, which correspond sequentially to the sample points in the test set.
Figure 2a–j visually demonstrate the prediction effects after incorporating spot price information implied by different options contracts. The options corresponding to Figure 2a–j are the CSI 300 stock index ATM call options, ITM call options with the maximum trading volume, ITM call options with the maximum open interest, OTM call options with the maximum trading volume, and OTM call options with the maximum open interest, respectively. As can be seen from Figure 2, the overall forecasting performance for the CSI 300 index closing price is good, with predicted and actual values exhibiting broadly consistent trends. Furthermore, the error fluctuation range in the test set is primarily clustered within the interval of −50 to 50.
The effectiveness of price prediction after the introduction of options market information is evaluated in Table 1, which also reports the CSI 300 index closing price prediction results based solely on spot market information as a baseline for assessing the value of options information. For the spot-only prediction, the closing price from the previous day (t − 1) was used as the input. As can be seen from Table 1, the MSE for price prediction using solely spot market information is 1188.5568. When options-market-implied spot price information is incorporated, the prediction accuracy improves across all options contract types, with MSE values in the range [1067.9506, 1170.6870]. Notably, the best prediction performance is achieved by introducing information from in-the-money options with the maximum trading volume, which yields the smallest MSE, 120.6062 lower than the MSE obtained using only spot market information. In the options market, investors can take long or short positions and trade on margin. Thus, when investors anticipate a rise in the spot price, they can buy call options in the options market through margin trading, and the options market price at that time reflects investors' views on the spot market trend. Therefore, it is necessary to incorporate the spot price information implied by the options market into the prediction of the CSI 300 index closing price.
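For reference, the MSE metric used in Table 1 and the improvement quoted above can be reproduced with a short sketch. The function name `mse` is hypothetical; the two MSE figures are those reported in the text.

```python
from typing import Sequence


def mse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean squared error between actual and predicted values."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# MSE values reported in the text for the test set.
mse_spot_only = 1188.5568
mse_itm_max_volume = 1067.9506

# Absolute and relative improvement from adding options information.
improvement = mse_spot_only - mse_itm_max_volume
relative_improvement = improvement / mse_spot_only
print(round(improvement, 4), round(100 * relative_improvement, 2))
```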

3.2. Optimization of Prediction Effect Based on Decomposition–Clustering Method

3.2.1. Decomposition and Clustering of Time Series

As described in Section 2, CEEMDAN was applied in this study to decompose the original sequences (CSI 300 index closing prices and spot price sequences implied by the five types of options contracts) into multiple subsequences, as shown in Figure 3.
Figure 3 specifically illustrates the decomposition results of the CSI 300 index closing prices. Figure 3 demonstrates that the decomposition yields 10 subsequences: IMF1 to IMF9 and a residual sequence. From Figure 3, it can be observed that IMF1 exhibits the highest frequency, which decreases sequentially from IMF1 to IMF9, while the residual sequence reflects the average trend. From high to low frequency, the subsequences demonstrate decreasing volatility and increasing trend dominance, enhancing their predictability. Accurate predictions of each frequency-specific subseries were aggregated to reconstruct the original series’ forecast, with CEEMDAN-based denoising critically improving low- and medium-frequency predictions, thereby boosting overall accuracy.
After the original CSI 300 index sequence is decomposed into multiple subsequences via CEEMDAN, modeling price predictions separately for all 10 subsequences would significantly increase computational complexity. To address this, the 10 subsequences were grouped into a smaller number of categories for predictive modeling. Since the CEEMDAN-derived subsequences exhibit varying levels of complexity, sample entropy, an effective tool for measuring time series complexity, was computed for each subsequence and used as the basis for a K-means clustering analysis [30]. The subsequences were classified into three frequency-based clusters (high-, medium-, and low-frequency components), and the subsequences within each cluster were then summed to generate three aggregated time series. This process establishes a foundation for subsequent subsequence-based price prediction. We first calculated the sample entropy of each subsequence corresponding to the original sequence of the CSI 300 index, as shown in Table 2. Table 2 also lists the sample entropies of the subsequences derived from CEEMDAN-processed spot prices implied by the aforementioned five types of options contracts.
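To make the complexity measure concrete, the following is a minimal sketch of sample entropy. The paper does not report the embedding dimension or tolerance it used; the conventional defaults $m = 2$ and $r = 0.2$ times the standard deviation are assumptions here, as is the function name.

```python
import math
import random
import statistics
from typing import Sequence


def sample_entropy(x: Sequence[float], m: int = 2, r_factor: float = 0.2) -> float:
    """Sample entropy SampEn(m, r) with r = r_factor * std(x).

    m and r_factor are conventional defaults (assumed, not from the paper).
    """
    n = len(x)
    r = r_factor * statistics.pstdev(x)

    def count_matches(length: int) -> int:
        # Use the first n - m templates so both lengths share the same count.
        templates = [x[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                # Chebyshev distance between the two templates.
                dist = max(abs(u - v) for u, v in zip(templates[i], templates[j]))
                if dist <= r:
                    count += 1
        return count

    b = count_matches(m)      # matches of length m
    a = count_matches(m + 1)  # matches of length m + 1
    if a == 0 or b == 0:
        return float("inf")   # undefined: no matches found
    return -math.log(a / b)


# A perfectly regular series versus a pseudo-random one (illustrative data).
regular = [0.0, 1.0] * 50
rng = random.Random(0)
noisy = [rng.random() for _ in range(200)]
print(sample_entropy(regular), sample_entropy(noisy))
```

Higher values indicate a less regular, harder-to-predict series, which is why the high-frequency IMFs are expected to score highest.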
As can be seen from Table 2, the sample entropy of each subsequence obtained via the CEEMDAN method is different, which reflects the different complexity of each subsequence. The sample entropies of the first three subsequences (IMF1-IMF3) all exceed 1.8 and are significantly higher than those of the other subsequences. The sample entropy of subsequence IMF4 is around 1, while the remaining subsequences exhibit notably smaller sample entropy values, all below 0.6. Based on the sample entropy of each subsequence, the K-means clustering algorithm was used to classify these subsequences into three categories: high-frequency, medium-frequency, and low-frequency, as shown in Table 3.
According to Table 3, the high-frequency category corresponds to subsequences IMF1-IMF3, the medium-frequency category corresponds to subsequence IMF4, and the low-frequency category corresponds to subsequences IMF5-IMF9 and the residual subsequence.
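The clustering step can be sketched with a one-dimensional K-means over the sample entropies. The entropy values below are invented to match the ranges described in the text (above 1.8, around 1, and below 0.6) and are not the values in Table 2; the evenly spaced centroid initialization is also an assumption made to keep the example deterministic.

```python
from typing import List, Sequence, Tuple


def kmeans_1d(values: Sequence[float], k: int = 3,
              max_iter: int = 100) -> Tuple[List[int], List[float]]:
    """Plain 1-D K-means with centroids initialized evenly between min and max."""
    lo, hi = min(values), max(values)
    centroids = [lo + (hi - lo) * c / (k - 1) for c in range(k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every value.
        new_labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, new_labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
        if new_labels == labels:
            break
        labels = new_labels
    return new_labels, centroids


# Hypothetical sample entropies for IMF1..IMF9 and the residual.
names = [f"IMF{i}" for i in range(1, 10)] + ["Residual"]
entropies = [2.10, 1.95, 1.85, 1.00, 0.50, 0.40, 0.30, 0.15, 0.05, 0.01]

labels, centroids = kmeans_1d(entropies, k=3)
# Centroids start in ascending order, so label 0 is the lowest-entropy cluster.
category = {0: "low-frequency", 1: "medium-frequency", 2: "high-frequency"}
for name, lab in zip(names, labels):
    print(name, category[lab])
```

With these illustrative inputs the grouping matches the one described for Table 3; with real entropies the result depends on the data.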

3.2.2. Optimization Results and Comparative Analysis Based on the Decomposition–Clustering Method

As described in Section 3.2.1, the time series of the options and spot markets were decomposed using CEEMDAN. Based on the sample entropy of the time series, the K-means clustering method was applied to classify the decomposed subsequences into three frequency categories: high-frequency, medium-frequency, and low-frequency. By aggregating the subsequences within each category at corresponding time points, three composite time series (high-frequency, medium-frequency, and low-frequency) were derived, as formulated in Formula (6).
$$\tilde{X}_{i,t}^{\varphi} = \sum_{X_{j,t}^{\varphi} \in \Phi_{i}^{\varphi}} X_{j,t}^{\varphi} \tag{6}$$
where the variable $\varphi$ identifies the time series: $\varphi = sp$ for the spot market and $\varphi = op$ for the options market, with $op$ taking the values 1–5 for ATM, ITM (maximum trading volume), ITM (maximum open interest), OTM (maximum trading volume), and OTM (maximum open interest) options, respectively, as defined in Table 3; $i = H, M, L$ correspond to the high-frequency, medium-frequency, and low-frequency categories, respectively; $X_{j,t}^{\varphi}$ denotes the subsequences obtained by applying CEEMDAN to decompose the time series corresponding to variable $\varphi$; $\Phi_{i}^{\varphi}$ represents the set of subsequences within a frequency-specific category corresponding to variable $\varphi$; and $\tilde{X}_{i,t}^{\varphi}$ indicates the aggregated time series derived from summing all subsequences in the respective frequency category associated with variable $\varphi$.
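Formula (6) amounts to a pointwise sum of the subsequences assigned to each frequency category. A minimal sketch follows; the function name, the dictionary layout, and the toy values are hypothetical.

```python
from typing import Dict, List


def aggregate_by_cluster(subsequences: Dict[str, List[float]],
                         labels: Dict[str, str]) -> Dict[str, List[float]]:
    """Sum subsequences pointwise within each category 'H', 'M', or 'L'."""
    length = len(next(iter(subsequences.values())))
    aggregated = {c: [0.0] * length for c in ("H", "M", "L")}
    for name, series in subsequences.items():
        if len(series) != length:
            raise ValueError("all subsequences must have the same length")
        category = labels[name]
        for t, value in enumerate(series):
            aggregated[category][t] += value
    return aggregated


# Toy decomposition with two time points per subsequence (illustrative only).
subsequences = {
    "IMF1": [1.0, 2.0],
    "IMF2": [3.0, 4.0],
    "IMF4": [5.0, 6.0],
    "Residual": [7.0, 8.0],
}
labels = {"IMF1": "H", "IMF2": "H", "IMF4": "M", "Residual": "L"}
aggregated = aggregate_by_cluster(subsequences, labels)
print(aggregated)
```

Because every subsequence belongs to exactly one category, the three aggregated series still sum to the original series at each time point.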
Further, based on the aggregated time series of the corresponding frequency, the values at day t − 1 are used to predict the values at day t for each frequency component. Specifically, the model uses the aggregated time series values of the CSI 300 index at a certain frequency from day t − 1, combined with the high-, medium-, and low-frequency aggregated time series values of a specific options category from the same day t − 1. These four input features were fed into a BP neural network to predict the aggregated time series values of the CSI 300 index at the corresponding frequency for day t, as formulated in Formulas (7)–(9).
$$\hat{y}_{H,t}^{op} = BP\left(\tilde{X}_{H,t-1}^{sp}, \tilde{X}_{H,t-1}^{op}, \tilde{X}_{M,t-1}^{op}, \tilde{X}_{L,t-1}^{op}\right) \tag{7}$$
$$\hat{y}_{M,t}^{op} = BP\left(\tilde{X}_{M,t-1}^{sp}, \tilde{X}_{H,t-1}^{op}, \tilde{X}_{M,t-1}^{op}, \tilde{X}_{L,t-1}^{op}\right) \tag{8}$$
$$\hat{y}_{L,t}^{op} = BP\left(\tilde{X}_{L,t-1}^{sp}, \tilde{X}_{H,t-1}^{op}, \tilde{X}_{M,t-1}^{op}, \tilde{X}_{L,t-1}^{op}\right) \tag{9}$$
where $BP(\cdot)$ represents price prediction based on the BP neural network; and $\hat{y}_{H,t}^{op}$, $\hat{y}_{M,t}^{op}$, and $\hat{y}_{L,t}^{op}$ represent the predicted values at day $t$ for the high-frequency, medium-frequency, and low-frequency aggregated time series of the CSI 300 index corresponding to variable $op$, respectively. The final prediction $\hat{y}_{t}^{op}$ of the CSI 300 index closing price is obtained by summing the predicted values from the high-frequency ($\hat{y}_{H,t}^{op}$), medium-frequency ($\hat{y}_{M,t}^{op}$), and low-frequency ($\hat{y}_{L,t}^{op}$) aggregated series, as expressed in Formula (10).
$$\hat{y}_{t}^{op} = \hat{y}_{H,t}^{op} + \hat{y}_{M,t}^{op} + \hat{y}_{L,t}^{op} \tag{10}$$
The predicted closing price $\hat{y}_{t}^{sp}$ is expressed by Formula (11) when predicting the CSI 300 index closing price based solely on spot market information.
$$\hat{y}_{t}^{sp} = BP\left(\tilde{X}_{H,t-1}^{sp}\right) + BP\left(\tilde{X}_{M,t-1}^{sp}\right) + BP\left(\tilde{X}_{L,t-1}^{sp}\right) \tag{11}$$
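The way Formulas (7)–(10) assemble the final forecast can be sketched as follows. The trained BP networks are replaced by stand-in callables, since fitting them is outside the scope of this illustration; the function name `predict_close`, the `Predictor` type, and all numeric values are hypothetical.

```python
from typing import Callable, Dict, Sequence

# A predictor maps a feature vector to a scalar forecast (stand-in for BP).
Predictor = Callable[[Sequence[float]], float]


def predict_close(sp_prev: Dict[str, float],
                  op_prev: Dict[str, float],
                  models: Dict[str, Predictor]) -> float:
    """Sum the per-frequency forecasts, following Formulas (7)-(10).

    sp_prev[f] : aggregated spot series value at day t-1 for frequency f
    op_prev[f] : aggregated option-implied series value at day t-1
    models[f]  : predictor for frequency f, with f in {'H', 'M', 'L'}
    """
    option_features = [op_prev["H"], op_prev["M"], op_prev["L"]]
    total = 0.0
    for freq in ("H", "M", "L"):
        # Four inputs: one spot component plus all three option components.
        features = [sp_prev[freq]] + option_features
        total += models[freq](features)
    return total


# Stand-in "persistence" models that return the spot component unchanged.
persistence: Predictor = lambda features: features[0]
models = {"H": persistence, "M": persistence, "L": persistence}

sp_prev = {"H": 1.5, "M": -2.0, "L": 4000.0}  # illustrative values
op_prev = {"H": 1.0, "M": -1.0, "L": 3990.0}  # illustrative values
print(predict_close(sp_prev, op_prev, models))
```

In practice each entry of `models` would be a separately trained network for its frequency component.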
Figure 4 shows the predicted price trends and prediction errors on the test set for the CSI 300 closing price after incorporating options market information and applying the time series decomposition and clustering techniques according to Formulas (6)–(11). In this way, prediction performance is optimized by integrating the decomposition–clustering method with options market information.
Figure 4a–j represent the trend comparison and prediction error situation of the optimized forecast result based on the decomposition–clustering method incorporating the implied spot price information of ATM call options, ITM call options with the maximum trading volume, ITM call options with the maximum open interest, OTM call options with the maximum trading volume, and OTM call options with the maximum open interest, respectively. As can be seen from Figure 4, the overall trends of the predicted and real values of the test set are similar, and the error fluctuation ranges are similar.
The effect of price prediction of the CSI 300 index based on the decomposition–clustering method is shown in Table 4.
As shown in Table 4, the prediction errors on the test set are lower than those in Table 1 after optimization using the decomposition–clustering method. The MSE is smallest when using information from in-the-money call options with the maximum trading volume: it is 724.0461, which is 343.9045 lower than the MSE for the same contract type before optimization (Table 1). The decomposition–clustering method therefore proves feasible and effective. Additionally, introducing options contract information alongside the decomposition–clustering method further improves prediction accuracy compared to predictions relying solely on spot market information: the MSE of 724.0461 for in-the-money call options with the maximum trading volume represents a reduction of 84.4749 relative to the spot-only prediction. This once again illustrates that the introduction of options information enhances the effectiveness of price forecasting. As previously discussed, the trading mechanisms in options markets, including margin trading and the ability to take long or short positions, allow investors to conveniently execute trades based on their price expectations for the underlying spot market. This trading activity drives price fluctuations in options, which in turn embed market expectations about future spot prices. This reflects the guiding role of derivative products in spot prices, so it is necessary to incorporate such information into price prediction frameworks. According to the MSE, the best price forecasting effect is obtained by introducing in-the-money call options with the maximum trading volume.

3.3. Optimization of Prediction Effect Based on Error Adjustment Method

3.3.1. Decomposition and Clustering of Prediction Error Information

Discrepancies between predicted and actual values are inevitable in financial forecasting. This section presents an approach that leverages error adjustment to enhance prediction accuracy, building upon the decomposition–clustering framework discussed in Section 3.2. In this section, the error is defined as the actual price ($\dot{y}$) minus the predicted CSI 300 stock index price optimized through the decomposition–clustering method of Section 3.2, as shown in Formula (12).
$$\varepsilon_t^{\beta} = \dot{y} - \hat{y}_t^{\beta} \tag{12}$$
where $\beta = sp$ or $op$; when $\beta = sp$, $\varepsilon_t^{sp}$ represents the error from price prediction based solely on spot market information; when $\beta = op$, $\varepsilon_t^{op}$ represents the error obtained after incorporating options market information for price prediction.
As described in Section 3.2, the error sequence $\varepsilon_t^{\beta}$ was decomposed using CEEMDAN to obtain several subsequences for each $\beta$, and the sample entropy of each subsequence was calculated, as shown in Figure 5 and Table 5. Figure 5 shows the subsequences obtained after processing the error with CEEMDAN when $\beta = sp$, that is, the error of the decomposition–clustering forecast that uses only spot market information. It can be seen from Figure 5 that the frequencies of all subsequences vary over time, with the frequency of IMF1 being the highest and the frequency decreasing sequentially from IMF1 to IMF9, while the residual subsequence reflects the average trend. The sample entropies of these subsequences are reported in Table 5.
As can be seen from Table 5, the sample entropies of the subsequences obtained via CEEMDAN for the prediction errors, which were optimized using the decomposition–clustering method, vary. This reflects differing complexity levels among the subsequences. The sample entropy of subsequence IMF1 is markedly higher than that of the other subsequences, at approximately 1.5. In contrast, the sample entropy values of subsequences IMF2 to IMF6 are relatively close, falling roughly between 0.3 and 0.85, while the sample entropy values of the remaining subsequences are significantly lower, all below 0.2.
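The sample entropy used above to gauge the complexity of each subsequence can be sketched in a few lines. The version below is a minimal, unoptimized illustration rather than the authors' implementation: the embedding dimension m = 2 and the tolerance of 0.2 times the standard deviation are commonly used defaults assumed here, and the function name `sample_entropy` and the two example series are hypothetical.

```python
import math
import random


def sample_entropy(x, m=2, r_factor=0.2):
    """Sample entropy of a sequence (assumed defaults: m=2, r=0.2*std)."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    r = r_factor * std

    def count_matches(length):
        # Use the same n - m starting points for both template lengths.
        templates = [x[i:i + length] for i in range(n - m)]
        matches = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                # Chebyshev distance between two templates.
                dist = max(abs(a - b) for a, b in zip(templates[i], templates[j]))
                if dist <= r:
                    matches += 1
        return matches

    b = count_matches(m)      # matches of length m
    a = count_matches(m + 1)  # matches of length m + 1
    if a == 0 or b == 0:
        return float("inf")   # undefined when no matches are found
    return -math.log(a / b)


# Illustrative inputs (not data from the study): a regular and an irregular series.
regular = [math.sin(2 * math.pi * i / 20) for i in range(200)]
rng = random.Random(0)
irregular = [rng.gauss(0.0, 1.0) for _ in range(200)]
```

A regular series such as `regular` should yield a lower value than an irregular one such as `irregular`, which mirrors how the low-frequency subsequences in Table 5 have much smaller entropies than IMF1.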
The K-means clustering algorithm was used to classify these subsequences into three categories based on the sample entropy of each subsequence: high-frequency, medium-frequency, and low-frequency, as described in Section 3.2.1 and shown in Table 6. As can be seen from Table 6, the high-frequency category corresponds to subsequence IMF1, the medium-frequency category corresponds to subsequences IMF2–IMF6, and the low-frequency category corresponds to subsequences IMF7–IMF9 and the residual subsequence.
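To make the clustering step concrete, the following sketch groups the ten sample entropies of the ATM options column of Table 5 into three clusters with a one-dimensional K-means. It is only an illustration: the evenly spaced initial centers, the function name `kmeans_1d`, and the dictionary layout are assumptions, and K-means results in general depend on the initialization, which this section does not specify.

```python
def kmeans_1d(values, k=3, max_iter=100):
    """Cluster scalars into k groups; label 0 is the lowest-valued group."""
    lo, hi = min(values), max(values)
    # Assumed initialization: k centers evenly spaced between min and max.
    centers = [lo + (hi - lo) * j / (k - 1) for j in range(k)]
    labels = None
    for _ in range(max_iter):
        # Assign every value to its nearest center.
        new_labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # Move each center to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:
                centers[j] = sum(members) / len(members)
    return labels, centers


names = ["IMF1", "IMF2", "IMF3", "IMF4", "IMF5",
         "IMF6", "IMF7", "IMF8", "IMF9", "Residual"]
# Sample entropies from the ATM options column of Table 5.
entropies = [1.481, 0.428, 0.794, 0.595, 0.588, 0.411, 0.182, 0.065, 0.029, 0.014]

labels, centers = kmeans_1d(entropies)
category = {0: "low", 1: "medium", 2: "high"}
groups = {"high": [], "medium": [], "low": []}
for name, lab in zip(names, labels):
    groups[category[lab]].append(name)
```

With these inputs and this particular initialization, `groups` places IMF1 in the high-frequency category, IMF2 to IMF6 in the medium-frequency category, and IMF7 to IMF9 plus the residual in the low-frequency category, which is the grouping described in the text. Other columns or initializations may settle on a different local optimum.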

3.3.2. Optimization Results and Comparative Analysis Based on the Error Adjustment Method

Based on the decomposition–clustering method for time series described in Section 3.2 and referring to Table 6, the aggregated time series of the error sequence $\varepsilon_t^{\beta}$ in high-frequency, medium-frequency, and low-frequency dimensions can be obtained and are denoted as $\tilde{\varepsilon}_H^{\beta}$, $\tilde{\varepsilon}_M^{\beta}$, and $\tilde{\varepsilon}_L^{\beta}$, respectively. In this study, for each frequency, the value of the aggregated error series on day t−1 was used to predict its value on day t with the BP neural network. Subsequently, the predicted values of the error's aggregated time series across the three frequencies were summed to obtain the predicted error on day t. On this basis, this predicted error was added to the CSI 300 price prediction value optimized using the decomposition–clustering method in Section 3.2, resulting in the CSI 300 price prediction optimized with the error adjustment method. This process is represented by Formulas (13) to (15).
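$$\hat{\varepsilon}_{i,t}^{\beta} = BP\left(\tilde{\varepsilon}_{i,t-1}^{\beta}\right) \tag{13}$$
$$\hat{\varepsilon}_t^{\beta} = \hat{\varepsilon}_{H,t}^{\beta} + \hat{\varepsilon}_{M,t}^{\beta} + \hat{\varepsilon}_{L,t}^{\beta} \tag{14}$$
$$\hat{Y}_t^{\beta} = \hat{y}_t^{\beta} + \hat{\varepsilon}_t^{\beta} \tag{15}$$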
where $i$ = H, M, or L represents high-frequency, medium-frequency, and low-frequency, respectively, as previously mentioned; $\hat{\varepsilon}_{i,t}^{\beta}$ represents the predicted value of the aggregated time series of errors at the corresponding frequency at time t; $\hat{\varepsilon}_t^{\beta}$ represents the predicted value of error $\varepsilon_t^{\beta}$; and $\hat{Y}_t^{\beta}$ represents the optimized prediction result based on the error adjustment method.
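The error adjustment step can be sketched as follows: predict each frequency-wise error series one step ahead, sum the three predictions, and add the total to the base forecast. In this sketch a simple persistence rule (tomorrow's value equals today's) stands in for the BP neural network purely to keep the example self-contained; it is not the model used in the study. The function names and the numbers are hypothetical.

```python
def persistence(series):
    """Stand-in one-step predictor (NOT the BP network): repeat the last value."""
    return series[-1]


def error_adjusted_forecast(base_forecast, error_components, predict_next=persistence):
    """Add the summed frequency-wise error predictions to the base forecast.

    error_components maps a frequency label ("H", "M", "L") to the history of
    the aggregated error series at that frequency, up to day t-1.
    """
    predicted_error = sum(predict_next(series) for series in error_components.values())
    return base_forecast + predicted_error, predicted_error


# Made-up numbers for illustration only.
components = {
    "H": [1.5, -2.0],   # high-frequency aggregated error
    "M": [4.0, 3.0],    # medium-frequency aggregated error
    "L": [10.0, 11.0],  # low-frequency aggregated error
}
adjusted, predicted_error = error_adjusted_forecast(3900.0, components)
```

Passing a different `predict_next` callable, for example a trained network's one-step prediction function, would leave the summation and adjustment logic unchanged.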
Based on Formulas (13) to (15), the CSI 300 stock index price predictions from Section 3.2 were revised using the error adjustment method, yielding further optimized results, as shown in Figure 6. Figure 6 displays only the price prediction results incorporating options information; for comparison, the prediction performance based solely on spot market information is presented in Table 7.
Figure 6 visualizes the optimized CSI 300 index price prediction performance after incorporating options information and applying the error adjustment method. Figure 6a–j correspond to ATM, ITM (maximum trading volume), ITM (maximum open interest), OTM (maximum trading volume), and OTM (maximum open interest) options, respectively, as defined in Section 3.2. As observed in Figure 6, the test set predictions for the CSI 300 closing price demonstrate overall strong performance, with the predicted values closely aligning with actual trends and exhibiting narrow error fluctuations.
Table 7 displays the test-set prediction results for the CSI 300 index price optimized with the error adjustment method incorporating options market information. As shown in Table 7, compared with Table 4, the error adjustment method achieves further optimization: in every scenario, the test-set prediction errors are lower than those obtained with decomposition–clustering optimization alone. Specifically, the prediction incorporating market information of out-of-the-money call options with maximum open interest attains the minimum mean squared error (MSE) of 399.7328 after error adjustment optimization. This is a reduction of 334.9825 relative to the corresponding result in Table 4 (MSE of 734.7153), which used the same options contract information with decomposition–clustering optimization only.
Focusing on the error-adjusted results in Table 7, the prediction based solely on spot market information yields the maximum MSE of 454.7686. After incorporating options market information, the MSE values corresponding to different types of options contracts all decrease, with the minimum MSE achieved by introducing OTM call options with maximum open interest, a reduction of 55.0358 compared to spot-only predictions. Predictions incorporating OTM call options with maximum trading volume exhibit a higher MSE of 427.7069, which still represents a reduction of 27.0617 compared to spot-only predictions. Consistent with Table 4, the results in Table 7 reaffirm that incorporating options market information effectively reduces prediction errors and enhances price forecasting performance.

3.4. Optimization of Prediction Effects Based on the Weighted Integration Method

As described in Section 3.3, the error adjustment method yields multiple same-day predicted values of the CSI 300 index price, one for each type of options market information. Weighted integration was applied to combine these predictions into a single forecast, to reflect the role of options market information in the prediction process, and to investigate whether such integration further enhances the prediction results. The primary task addressed here is determining how to assign weights to each predicted value of the CSI 300 index at the same time point, thereby generating a weighted integrated prediction. Prediction errors (i.e., differences between predicted and actual values) vary across individual predictions. Intuitively, a larger absolute prediction error indicates poorer performance and warrants a smaller weight. Accordingly, weights were assigned by taking the reciprocal of the absolute error for each prediction and then computing the proportion of each reciprocal to the sum of all reciprocals across predictions. This weighting method is formalized in Formula (16).
$$w_t^{op} = \frac{\left|\hat{e}_t^{op}\right|^{-1}}{\sum_{op} \left|\hat{e}_t^{op}\right|^{-1}} \tag{16}$$
where $op$ represents different options contracts, assigned integer values from 1 to 5, corresponding to ATM, ITM (maximum trading volume), ITM (maximum open interest), OTM (maximum trading volume), and OTM (maximum open interest) options, as defined in Section 3.2; furthermore, $\hat{e}_t^{op} = \dot{y} - \hat{Y}_t^{op}$ represents the prediction error of the CSI 300 index price corresponding to the options contract $op$.
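The reciprocal-of-absolute-error weighting can be illustrated with a short sketch. The function name `inverse_abs_error_weights` and the example errors are hypothetical, and the small floor `eps` that guards against division by zero is an assumption added here; the text does not say how a zero error is handled.

```python
def inverse_abs_error_weights(errors, eps=1e-12):
    """Weights proportional to 1/|error|, normalized to sum to 1.

    eps is an assumed safeguard for the case of an exactly zero error.
    """
    reciprocals = [1.0 / max(abs(e), eps) for e in errors]
    total = sum(reciprocals)
    return [r / total for r in reciprocals]


# Made-up same-day errors for three predictions (the study uses five contracts).
weights = inverse_abs_error_weights([10.0, -20.0, 40.0])
```

Here the reciprocals are 0.1, 0.05, and 0.025, so the prediction with the smallest absolute error receives 4/7 of the total weight and the one with the largest receives 1/7.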
The aforementioned process introduces a new problem in price prediction: the true closing price for day t + 1 remains unknown when forecasting the CSI 300 index price for day t + 1 using day t's spot and options market information. Consequently, the true prediction error for day t + 1 cannot be determined, making it impossible to calculate weights for day t + 1 directly or derive its weighted aggregated prediction. To address this, the weights for the next day's predictions were computed from historical weights, as formalized in Formula (17): each trading day of the test set receives weights averaged over the preceding test-set days. The weighted integrated prediction is then given by Formula (18).
$$W_{t+1}^{op} = \frac{\sum_{t=1}^{T_t} w_t^{op}}{T_t} \tag{17}$$
$$\hat{Y}_{t+1}^{W} = \sum_{op} \hat{Y}_{t+1}^{op} \times W_{t+1}^{op} \tag{18}$$
where $W_{t+1}^{op}$ denotes the weight for the prediction corresponding to options contract $op$ on day t + 1, calculated based on historical weights; $T_t$ represents the number of trading days included in the test set before day t + 1; $\hat{Y}_{t+1}^{op}$ represents the optimized prediction result after error adjustment corresponding to options contract $op$; and $\hat{Y}_{t+1}^{W}$ represents the CSI 300 index price prediction optimized through the weighted integration method.
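A minimal sketch of the historical-weight step is given below: the weight for the next day is the average of each contract's past daily weights, and the integrated forecast is the weighted sum of the next-day predictions. The function names and all numbers are hypothetical, and three contracts are used instead of five for brevity.

```python
def historical_weights(daily_weights):
    """Average each contract's weight over all past test-set days."""
    n_days = len(daily_weights)
    n_contracts = len(daily_weights[0])
    return [sum(day[c] for day in daily_weights) / n_days for c in range(n_contracts)]


def integrated_forecast(predictions, weights):
    """Weighted sum of the per-contract predictions for the next day."""
    return sum(p * w for p, w in zip(predictions, weights))


# Made-up daily weights for two past days and three contracts.
past = [
    [0.5, 0.3, 0.2],
    [0.3, 0.3, 0.4],
]
next_day_weights = historical_weights(past)
forecast = integrated_forecast([3900.0, 3910.0, 3920.0], next_day_weights)
```

Because each day's weights sum to one, their average does as well, so the integrated forecast always lies between the smallest and largest individual predictions.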
Five kinds of options market information were used in this study to make price predictions: at-the-money call options, in-the-money call options with the maximum trading volume, in-the-money call options with the maximum open interest, out-of-the-money call options with the maximum trading volume, and out-of-the-money call options with the maximum open interest. The weighted integration method is proposed so that the final price prediction reflects all five kinds of options market information comprehensively.
The weighted integration method was applied based on the prediction results optimized using the error adjustment method in Section 3.3 to derive the optimized closing price predictions for the CSI 300 index, as illustrated in Figure 7. Figure 7 displays the predicted price trends and prediction errors in the test set for the CSI 300 index price forecasts, shown in Figure 7a,b, respectively. As shown in Figure 7a, the optimized predictions using the weighted integration method exhibit favorable overall performance in the test set, with predicted values aligning closely with the actual trend. From Figure 7b, the prediction errors in the test set demonstrate relatively small fluctuations, predominantly concentrated within the range of (−20, 20).
Table 8 further demonstrates the CSI 300 index price prediction performance optimized with the weighted integration method using the mean absolute error (MAE), mean squared error (MSE), and mean absolute percentage error (MAPE) metrics. As shown in Table 8, the MAE is 15.5847, the MSE is 383.0989, and the MAPE is 0.0041. Compared with the price prediction results in Section 3.1, Section 3.2 and Section 3.3, all evaluation metrics exhibit reductions. The evaluation metrics of the prediction results optimized with the weighted integration method are smaller than those of the best error-adjusted prediction in Table 7 (with corresponding MSE, MAE, and MAPE of 399.7328, 16.1156, and 0.0042, respectively). This demonstrates that the further introduction of the weighted integration method reduces the values of the prediction evaluation metrics and enhances the CSI 300 index price forecasting performance.
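For reference, the three evaluation metrics can be computed as in the sketch below. MAPE is assumed here to be expressed as a fraction rather than a percentage, which is consistent with the magnitude of the values reported above; the function names and the toy inputs are hypothetical.

```python
def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, returned as a fraction (assumption)."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Toy values for illustration only.
actual = [100.0, 200.0]
predicted = [110.0, 190.0]
scores = (mae(actual, predicted), mse(actual, predicted), mape(actual, predicted))
```

For the toy inputs, both absolute errors are 10, so the MAE is 10, the MSE is 100, and the MAPE is the average of 0.10 and 0.05.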
Furthermore, other researchers have employed various models for CSI 300 index price forecasting. For instance, Chen and Yang [31] utilized multiple machine learning models, including PSO-SVR, PSO-LSTM, and PSO-SVR-GRNN, for CSI 300 index price prediction, achieving a prediction error range of [12,197.00, 21,744.65]. Similarly, Wan et al. [32] applied deep neural networks to forecast CSI 300 index prices, obtaining an MSE range of [1204.3884, 1981.9503]. Lin et al. [33] employed CEEMDAN combined with a BP neural network for CSI 300 index price prediction, reporting a prediction error of 715.0465. In contrast, our proposed combination forecasting framework achieves an MSE of 383.0989, indicating a better prediction effect.
The forecasting results presented in Table 8 are derived from the weighted integration outcomes of the combination method embedding options market information proposed in this study. Within the logical framework of this combination methodology, the predictive performance progressively improves as the sequential implementation of corresponding procedural steps advances. The attainment of enhanced forecasting accuracy holds significant practical implications for investment practices, as precise price predictions can provide investors with more informed benchmarks for executing entry and exit decisions. A basic timing buy signal can be established when the predicted closing price exceeds the corresponding day’s opening price, and this difference surpasses the prediction error of the price forecasting model. Improved prediction accuracy enables the generation of more precise and frequent buy signals, thereby reducing opportunity costs arising from holding cash positions. Developing more refined trading strategies is a promising direction for future research.
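The basic timing rule described above can be written as a one-line check. In this sketch the model's prediction error is represented by the MAE from Table 8, which is an assumption, since the text does not specify which error measure serves as the threshold; the function name `buy_signal` and the prices are hypothetical.

```python
def buy_signal(predicted_close, open_price, model_error):
    """True when the predicted gain over the open exceeds the model's error."""
    return (predicted_close - open_price) > model_error


MODEL_ERROR = 15.5847  # MAE from Table 8, used here as an assumed threshold

signal_a = buy_signal(3950.0, 3920.0, MODEL_ERROR)  # predicted gain of 30 points
signal_b = buy_signal(3930.0, 3920.0, MODEL_ERROR)  # predicted gain of 10 points
```

With this threshold, the first case triggers a signal and the second does not. A smaller model error lowers the threshold, which is why improved accuracy leads to more frequent signals.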

4. Summary

This study proposed a forecasting model for the CSI 300 stock index price. The research was structured around three key aspects. First, we analyzed how the introduction of options market data enhances the prediction accuracy of the CSI 300 stock index price. The results showed that incorporating options market information improves prediction performance compared to models relying solely on spot market data. Second, we examined the impact of different options contract information based on moneyness. The findings indicate that introducing various options contracts leads to differing degrees of improvement in prediction accuracy, with corresponding reductions in MSE values. Notably, prior to optimization, the best prediction performance was achieved by incorporating in-the-money call options with the maximum trading volume. Finally, within the framework of the combination method, we explored the effectiveness of integrating decomposition–clustering, error adjustment, and price-weighted integration techniques. The results demonstrated that these methods progressively enhance the prediction accuracy of the CSI 300 stock index price, underscoring the suitability of the combination method for optimizing price forecasts. In conclusion, this study proposed a combination method for CSI 300 index price forecasting that incorporates options market information. Within this predictive framework, the method integrates weighted predictions from different options contracts to generate the final forecast for the CSI 300 index. The methodology not only provides an effective complement to existing time series forecasting techniques but also offers a novel research perspective for asset price prediction in spot markets.
This study has certain limitations and can be extended in the following directions. First, textual information can be incorporated. The current research does not consider textual data. In the era of rapidly advancing information technology, vast amounts of online textual content (such as news articles, social media posts, and financial reports) reflect investor sentiments and may predict or influence stock price movements. Extracting and integrating such textual data into price prediction models could enhance forecasting accuracy. Second, diverse machine learning models can be explored. While this study proposes a hybrid framework for stock index price forecasting, the machine learning model (a BP neural network) serves only as the prediction tool within the combination framework. Future work could replace the BP neural network with other advanced models, such as SVMs, LSTMs, CNNs, RNNs, GRUs, transformers, or other deep learning architectures, to further optimize the hybrid framework. Third, futures market data can be integrated. Although options market-implied price information was leveraged in this study, futures, another critical derivative product, remain unexplored. Incorporating futures market data could provide complementary signals. Fourth, anomaly processing can be considered. Financial data often contain anomalies, missing values, and biases, which can significantly affect model performance. When models inadequately address these issues, prediction errors may increase. Our current framework does not specifically address anomaly treatment, potentially contributing to prediction errors. To further enhance model performance, future research should integrate anomaly detection and processing mechanisms within the forecasting framework to improve accuracy and practical applicability.
Finally, timing strategies based on price predictions can be developed. While the focus of this research is on price forecasting, future work could design timing strategies to translate accurate price predictions into actionable trading strategies (e.g., optimizing entry/exit points), thus maximizing returns.

Author Contributions

Conceptualization, Y.H. and X.S.; methodology, Y.H., X.S., Q.Z. and W.Z.; software, X.S. and W.Z.; formal analysis, Y.H., X.S., Q.Z. and W.Z.; investigation, Y.H., X.S., Q.Z. and W.Z.; writing—original draft preparation, X.S. and W.Z.; writing—review and editing, X.S.; supervision, Y.H. and X.S.; All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Patel, J.; Shah, S.; Thakkar, P.; Kotecha, K. Predicting stock and stock price index movement using trend deterministic data preparation and machine learning techniques. Expert Syst. Appl. 2015, 42, 259–268.
  2. You, S.D.; Liu, C.H.; Chen, W.K. Comparative study of singing voice detection based on deep neural networks and ensemble learning. Hum.-Centric Comput. Inf. Sci. 2018, 8, 34.
  3. Cao, H.; Lin, T.; Li, Y.; Zhang, H. Stock price pattern prediction based on complex network and machine learning. Complexity 2019, 2019, 4132485.
  4. Parray, I.R.; Khurana, S.S.; Kumar, M.; Altalbe, A.A. Time series data analysis of stock price movement using machine learning techniques. Soft Comput. 2020, 24, 16509–16517.
  5. Chen, W.; Zhang, H.; Mehlawat, M.K.; Jia, L. Mean–variance portfolio optimization using machine learning-based stock price prediction. Appl. Soft Comput. 2021, 100, 106943.
  6. Zhang, D.; Lou, S. The application research of neural network and BP algorithm in stock price pattern classification and prediction. Future Gener. Comput. Syst. 2021, 115, 872–879.
  7. Dezhkam, A.; Manzuri, M.T. Forecasting stock market for an efficient portfolio by combining XGBoost and Hilbert–Huang transform. Eng. Appl. Artif. Intell. 2023, 118, 105626.
  8. Cheng, C.H.; Wei, L.Y. A novel time-series model based on empirical mode decomposition for forecasting TAIEX. Econ. Model. 2014, 36, 136–141.
  9. Li, Q.; Bao, L. Enhanced index tracking with multiple time-scale analysis. Econ. Model. 2014, 39, 282–292.
  10. Tang, L.; Lin, Q. Stock selection based on a hybrid quantitative method. Open J. Stat. 2016, 6, 346–362.
  11. Liu, S.; Zhang, C.; Ma, J. CNN-LSTM neural network model for quantitative strategy analysis in stock markets. In Neural Information Processing: 24th International Conference, ICONIP 2017, Guangzhou, China, 14–18 November 2017; Proceedings, Part II 24; Springer International Publishing: Cham, Switzerland; pp. 198–206.
  12. Wei, Y.; Chaudhary, V. TST: An Effective Approach to Extract Trend Feature in Stock Time Series. In Proceedings of the 2018 International Conference on Advances in Computing, Communications and Informatics (ICACCI), Bangalore, India, 19–22 September 2018; IEEE: Piscataway, NJ, USA; pp. 120–125.
  13. Lin, Y.; Lin, Z.; Liao, Y.; Li, Y.; Xu, J.; Yan, Y. Forecasting the realized volatility of stock price index: A hybrid model integrating CEEMDAN and LSTM. Expert Syst. Appl. 2022, 206, 117736.
  14. Wu, J.; Dong, J.; Wang, Z.; Hu, Y.; Dou, W. A novel hybrid model based on deep learning and error correction for crude oil futures prices forecast. Resour. Policy 2023, 83, 103602.
  15. Wei, J.; Han, L.Y. Risk transmission between stock index futures and options markets—The evidences from Hangseng index derivative markets. Appl. Stat. Manag. 2014, 6, 1132–1140.
  16. Wu, G.W. The impact of stock index ETF options on the volatility of China stock market—An empirical analysis based on the high frequency data of SSE 50ETF options. China Econ. Trade Her. 2015, 14, 37–38.
  17. Wang, S.S.; Xu, T.T.; Wang, J.B.; Yu, Z. Comparative Analysis of Price Discovery in Three SSE 50 index Markets: the index Futures, ETF and ETF Options Markets. Oper. Res. Manag. Sci. 2017, 9, 127–136.
  18. Cui, H.; Fei, J.; Lu, X. Can the Implied Information of Options Predict the Liquidity of Stock Market? A Data-Driven Research Based on SSE 50ETF Options. J. Math. 2021, 2021, 9059213.
  19. Tao, L.B.; Zou, Y.; Pan, W.B. Determinants of Price Discovery in Options Market—Empirical Research Based on High Frequency Data of SSE 50 ETF. Rev. Investig. Stud. 2022, 7, 90–105.
  20. Wang, X.H.; Wu, Y.L. Study on the correlation between CSI 300 options and spot market prices. Secur. Futures China 2022, 2, 32–40.
  21. Ahn, K.; Bi, Y.; Sohn, S. Price discovery among SSE 50 index-based spot, futures, and options markets. J. Futures Mark. 2019, 39, 238–259.
  22. Goncalves-Pinto, L.; Grundy, B.D.; Hameed, A.; van der Heijden, T.; Zhu, Y. Why do option prices predict stock returns? The role of price pressure in the stock market. Manag. Sci. 2020, 66, 3903–3926.
  23. Patel, V.; Putniņš, T.J.; Michayluk, D.; Foley, S. Price discovery in stock and options markets. J. Financ. Mark. 2020, 47, 100524.
  24. Wang, X.D. Pricing research on CSI 300 stock index options based on B-S model and GARCH model. Financ. Eng. Risk Manag. 2023, 6, 48–55.
  25. Torres, M.E.; Colominas, M.A.; Schlotthauer, G.; Flandrin, P. A complete ensemble empirical mode decomposition with adaptive noise. In Proceedings of the 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic, 22–27 May 2011; IEEE: Piscataway, NJ, USA; pp. 4144–4147.
  26. Huang, N.E.; Shen, Z.; Long, S.R.; Wu, M.C.; Shih, H.H.; Zheng, Q.; Yen, N.C.; Tung, C.C.; Liu, H.H. The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proc. R. Soc. Lond. Ser. A Math. Phys. Eng. Sci. 1998, 454, 903–995.
  27. Wu, Z.; Huang, N.E. Ensemble empirical mode decomposition: A noise-assisted data analysis method. Adv. Adapt. Data Anal. 2009, 1, 1–41.
  28. Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536.
  29. Zhu, C.; Ma, X.; Zhang, C.; Ding, W.; Zhan, J. Information granules-based long-term forecasting of time series via BPNN under three-way decision framework. Inf. Sci. 2023, 634, 696–715.
  30. Zhang, Y.; Pan, G.; Chen, B.; Han, J.; Zhao, Y.; Zhang, C. Short-term wind speed prediction model based on GA-ANN improved by VMD. Renew. Energy 2020, 156, 1373–1388.
  31. Chen, J.; Yang, H. A CSI 300 index Prediction Model Based on PSO-SVR-GRNN Hybrid Method. Mob. Inf. Syst. 2022, 2022, 7419920.
  32. Wan, W.; Xu, Q.; Chen, H.; Chen, Q. Using Deep Learning Neural Networks and Stacking Ensemble Learning to Predict CSI 300 index. In Proceedings of the 2022 9th International Conference on Digital Home (ICDH), Guangzhou, China, 28–30 October 2022; IEEE: Piscataway, NJ, USA; pp. 81–86.
  33. Lin, Y.; Yan, Y.; Xu, J.; Liao, Y.; Ma, F. Forecasting stock index price using the CEEMDAN-LSTM model. N. Am. J. Econ. Financ. 2021, 57, 101421.
Figure 1. A logical framework diagram of prediction based on the combination method.
Figure 2. Price prediction based on options market information. (a,c,e,g,i) show the prediction results of the closing price, corresponding to ATM options, ITM options with the highest trading volume, ITM options with the highest open interest, OTM options with the highest trading volume, and OTM options with the highest open interest, respectively. (b,d,f,h,j) reveal the prediction errors, corresponding to ATM options, ITM options with the highest trading volume, ITM options with the highest open interest, OTM options with the highest trading volume, and OTM options with the highest open interest, respectively.
Figure 3. Decomposition results of the CSI 300 index closing price based on CEEMDAN.
Figure 4. Price prediction based on the decomposition–clustering method. (a,c,e,g,i) show the prediction results of the closing price, corresponding to ATM options, ITM options with the highest trading volume, ITM options with the highest open interest, OTM options with the highest trading volume, and OTM options with the highest open interest, respectively. (b,d,f,h,j) reveal the prediction errors, corresponding to ATM options, ITM options with the highest trading volume, ITM options with the highest open interest, OTM options with the highest trading volume, and OTM options with the highest open interest, respectively.
Figure 5. Decomposition of prediction errors based on CEEMDAN.
Figure 6. Price prediction incorporating options information based on the error adjustment method. (a,c,e,g,i) show the prediction results of the closing price, corresponding to ATM options, ITM options with the highest trading volume, ITM options with the highest open interest, OTM options with the highest trading volume, and OTM options with the highest open interest, respectively. (b,d,f,h,j) reveal the prediction errors, corresponding to ATM options, ITM options with the highest trading volume, ITM options with the highest open interest, OTM options with the highest trading volume, and OTM options with the highest open interest, respectively.
Figure 7. Price prediction incorporating options based on the weighted integration method. (a) Prediction results of the closing price; (b) Prediction errors.
Table 1. Comparison of prediction effects based on the options and spot market information.
| Information | MSE | MAE | MAPE |
|---|---|---|---|
| Spot information (only) | 1188.5568 | 25.9621 | 0.0068 |
| ATM options | 1076.4415 | 24.6099 | 0.0064 |
| ITM options (trading volume) | 1067.9506 | 24.6103 | 0.0064 |
| ITM options (open interest) | 1145.8858 | 25.5043 | 0.0066 |
| OTM options (trading volume) | 1112.8612 | 25.0586 | 0.0065 |
| OTM options (open interest) | 1170.6870 | 25.8819 | 0.0067 |
Table 2. Sample entropies corresponding to the options and spot markets.
Table 2. Sample entropies corresponding to the options and spot markets.
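| Subsequence | Spot Information | ATM Options | ITM Options (Trading Volume) | ITM Options (Open Interest) | OTM Options (Trading Volume) | OTM Options (Open Interest) |
| --- | --- | --- | --- | --- | --- | --- |
| IMF1 | 1.893 | 1.873 | 1.884 | 1.922 | 1.819 | 1.629 |
| IMF2 | 1.851 | 1.912 | 1.887 | 1.886 | 1.896 | 1.825 |
| IMF3 | 2.104 | 2.094 | 2.098 | 2.136 | 2.119 | 2.076 |
| IMF4 | 1.116 | 1.115 | 1.113 | 1.139 | 1.114 | 1.105 |
| IMF5 | 0.581 | 0.593 | 0.593 | 0.583 | 0.598 | 0.601 |
| IMF6 | 0.441 | 0.428 | 0.439 | 0.446 | 0.426 | 0.45 |
| IMF7 | 0.182 | 0.178 | 0.178 | 0.177 | 0.172 | 0.168 |
| IMF8 | 0.069 | 0.065 | 0.065 | 0.066 | 0.066 | 0.063 |
| IMF9 | 0.019 | 0.021 | 0.021 | 0.021 | 0.021 | 0.021 |
| Residual | 0.007 | 0.007 | 0.007 | 0.007 | 0.007 | 0.006 |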
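Sample entropy measures how irregular a series is: the less often a pattern of length m that repeats also repeats at length m + 1, the higher the value. The following is a minimal sketch of the usual definition, SampEn = -ln(A/B). The embedding dimension `m=2` and the tolerance `r = 0.2 * std` are common defaults assumed here for illustration; this section does not state the settings used in the study, and the function name `sample_entropy` is hypothetical.

```python
import numpy as np

def sample_entropy(series, m=2, r_factor=0.2):
    """Sample entropy with tolerance r = r_factor * std(series).

    m and r_factor are conventional defaults (an assumption), not
    parameters reported in the study.
    """
    x = np.asarray(series, dtype=float)
    n = len(x)
    r = r_factor * np.std(x)

    def count_matches(length):
        # Use the same n - m template start points for both lengths.
        templates = np.array([x[i:i + length] for i in range(n - m)])
        # Chebyshev distance between every pair of templates.
        dist = np.max(np.abs(templates[:, None, :] - templates[None, :, :]), axis=2)
        # Pairs within tolerance, excluding self-matches on the diagonal.
        return np.sum(dist <= r) - len(templates)

    b = count_matches(m)      # matches at length m
    a = count_matches(m + 1)  # matches at length m + 1
    if a == 0 or b == 0:
        return float("inf")
    return float(-np.log(a / b))

# A smooth sine wave versus white noise (synthetic data, not from the study).
rng = np.random.default_rng(0)
regular = np.sin(np.linspace(0, 8 * np.pi, 300))
noisy = rng.standard_normal(300)
se_regular = sample_entropy(regular)
se_noisy = sample_entropy(noisy)
print(se_regular, se_noisy)
```

The noisy series yields the larger value, which mirrors the pattern in Table 2, where the fast-oscillating IMF1 to IMF3 have much higher sample entropies than the slowly varying IMF7 to IMF9 and the residual. The pairwise distance matrix makes this version quadratic in memory, so it suits short series only.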
Table 3. Clustering results based on sample entropies corresponding to options and spot markets.
| Frequency | Spot Information | ATM Options | ITM Options (Trading Volume) | ITM Options (Open Interest) | OTM Options (Trading Volume) | OTM Options (Open Interest) |
| --- | --- | --- | --- | --- | --- | --- |
| High | IMF1, IMF2, IMF3 | IMF1, IMF2, IMF3 | IMF1, IMF2, IMF3 | IMF1, IMF2, IMF3 | IMF1, IMF2, IMF3 | IMF1, IMF2, IMF3 |
| Medium | IMF4 | IMF4 | IMF4 | IMF4 | IMF4 | IMF4 |
| Low | IMF5, IMF6, IMF7, IMF8, IMF9, residual | IMF5, IMF6, IMF7, IMF8, IMF9, residual | IMF5, IMF6, IMF7, IMF8, IMF9, residual | IMF5, IMF6, IMF7, IMF8, IMF9, residual | IMF5, IMF6, IMF7, IMF8, IMF9, residual | IMF5, IMF6, IMF7, IMF8, IMF9, residual |
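One way to use the grouping in Table 3 is to merge the members of each frequency group into a single component series before prediction. The sketch below assumes the members are simply summed, which is a common choice in decomposition-based forecasting but is not stated in this section. The `GROUPS` mapping mirrors Table 3, while the `imfs` array and the function name `aggregate_components` are placeholders introduced for illustration.

```python
import numpy as np

# Row indices into a (10, T) array: rows 0-8 are IMF1-IMF9, row 9 is the residual.
# The grouping follows Table 3; summing within a group is an assumption.
GROUPS = {
    "high": [0, 1, 2],
    "medium": [3],
    "low": [4, 5, 6, 7, 8, 9],
}

def aggregate_components(imfs, groups=GROUPS):
    """Sum the decomposed subsequences inside each frequency group."""
    imfs = np.asarray(imfs, dtype=float)
    return {name: imfs[rows].sum(axis=0) for name, rows in groups.items()}

# Placeholder subsequences standing in for real decomposition output.
rng = np.random.default_rng(0)
imfs = rng.standard_normal((10, 50))
components = aggregate_components(imfs)

# The groups partition all ten rows, so the three components add back up
# to the same series as the full set of subsequences.
reconstructed = sum(components.values())
print(np.allclose(reconstructed, imfs.sum(axis=0)))
```

Under this assumption, three series are modelled instead of ten, and no information is lost because the groups cover every subsequence exactly once.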
Table 4. Predictive effects based on the decomposition–clustering method.
| Information | MSE | MAE | MAPE |
| --- | --- | --- | --- |
| Spot information (only) | 808.5210 | 22.1915 | 0.0058 |
| ATM options | 737.7592 | 20.7718 | 0.0054 |
| ITM options (trading volume) | 724.0461 | 20.7630 | 0.0054 |
| ITM options (open interest) | 761.7536 | 21.4761 | 0.0056 |
| OTM options (trading volume) | 788.2541 | 21.8041 | 0.0057 |
| OTM options (open interest) | 734.7153 | 21.0732 | 0.0055 |
Table 5. Sample entropies corresponding to prediction errors.
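| Subsequence | Spot Information | ATM Options | ITM Options (Trading Volume) | ITM Options (Open Interest) | OTM Options (Trading Volume) | OTM Options (Open Interest) |
| --- | --- | --- | --- | --- | --- | --- |
| IMF1 | 1.501 | 1.481 | 1.493 | 1.493 | 1.549 | 1.507 |
| IMF2 | 0.444 | 0.428 | 0.401 | 0.401 | 0.355 | 0.397 |
| IMF3 | 0.741 | 0.794 | 0.807 | 0.807 | 0.850 | 0.818 |
| IMF4 | 0.577 | 0.595 | 0.609 | 0.609 | 0.590 | 0.578 |
| IMF5 | 0.579 | 0.588 | 0.586 | 0.586 | 0.585 | 0.587 |
| IMF6 | 0.315 | 0.411 | 0.380 | 0.380 | 0.420 | 0.455 |
| IMF7 | 0.129 | 0.182 | 0.141 | 0.141 | 0.176 | 0.169 |
| IMF8 | 0.048 | 0.065 | 0.061 | 0.061 | 0.095 | 0.070 |
| IMF9 | 0.023 | 0.029 | 0.030 | 0.030 | 0.033 | 0.024 |
| Residual | 0.009 | 0.014 | 0.014 | 0.014 | 0.018 | 0.000 |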
Table 6. Clustering results based on sample entropies corresponding to prediction errors.
| Frequency | Spot Information | ATM Options | ITM Options (Trading Volume) | ITM Options (Open Interest) | OTM Options (Trading Volume) | OTM Options (Open Interest) |
| --- | --- | --- | --- | --- | --- | --- |
| High | IMF1 | IMF1 | IMF1 | IMF1 | IMF1 | IMF1 |
| Medium | IMF2, IMF3, IMF4, IMF5, IMF6 | IMF2, IMF3, IMF4, IMF5, IMF6 | IMF2, IMF3, IMF4, IMF5, IMF6 | IMF2, IMF3, IMF4, IMF5, IMF6 | IMF2, IMF3, IMF4, IMF5, IMF6 | IMF2, IMF3, IMF4, IMF5, IMF6 |
| Low | IMF7, IMF8, IMF9, residual | IMF7, IMF8, IMF9, residual | IMF7, IMF8, IMF9, residual | IMF7, IMF8, IMF9, residual | IMF7, IMF8, IMF9, residual | IMF7, IMF8, IMF9, residual |
Table 7. Prediction effects based on the error adjustment method.
| Information | MSE | MAE | MAPE |
| --- | --- | --- | --- |
| Spot information (only) | 454.7686 | 17.0405 | 0.0045 |
| ATM options | 405.4575 | 15.7611 | 0.0041 |
| ITM options (trading volume) | 413.0152 | 16.1418 | 0.0042 |
| ITM options (open interest) | 426.1350 | 16.4313 | 0.0043 |
| OTM options (trading volume) | 427.7069 | 16.5095 | 0.0043 |
| OTM options (open interest) | 399.7328 | 16.1156 | 0.0042 |
Table 8. Prediction effect based on the weighted integration method.
| MSE | MAE | MAPE |
| --- | --- | --- |
| 383.0989 | 15.5847 | 0.0041 |
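The cumulative effect of the three optimization steps can be read off by comparing MSE values across Tables 1, 4, 7, and 8. The short calculation below does this against the spot-only BP baseline from Table 1. The choice of baseline and the helper name `relative_reduction` are illustrative; the MSE figures themselves are copied from the tables.

```python
# MSE values copied from Tables 1, 4, 7, and 8.
baseline_mse = 1188.5568  # BP network, spot information only (Table 1)
stage_mse = {
    "decomposition-clustering, spot only (Table 4)": 808.5210,
    "error adjustment, spot only (Table 7)": 454.7686,
    "weighted integration (Table 8)": 383.0989,
}
best_single_mse = 399.7328  # OTM options (open interest), Table 7

def relative_reduction(reference, value):
    """Fractional MSE reduction of `value` relative to `reference`."""
    return (reference - value) / reference

reductions = {
    stage: relative_reduction(baseline_mse, mse) for stage, mse in stage_mse.items()
}
gain_over_best_single = relative_reduction(best_single_mse, stage_mse["weighted integration (Table 8)"])

for stage, frac in reductions.items():
    print(f"{stage}: {frac:.1%}")
print(f"weighted integration vs best single model: {gain_over_best_single:.1%}")
```

On these figures, the MSE falls by about 32% after decomposition–clustering, about 62% after error adjustment, and about 68% after weighted integration, all relative to the spot-only baseline. Weighted integration also improves on the best single error-adjusted model in Table 7 by roughly 4%.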


Hu, Y.; Sui, X.; Zhang, Q.; Zhang, W. Research on Price Prediction of Stock Price Index Based on Combination Method with Introduction of Options Market Information. Information 2025, 16, 328. https://doi.org/10.3390/info16040328
