# Volatility Forecasting for High-Frequency Financial Data Based on Web Search Index and Deep Learning Model

## Abstract

## 1. Introduction

## 2. Methods and Empirical Procedure

#### 2.1. TCN Model

#### 2.2. Evaluation Criteria

#### 2.3. Empirical Procedures

## 3. Empirical Results

#### 3.1. Realized Volatility

#### 3.2. Construction of Investor Attention Factor

#### 3.2.1. Acquisition of the Baidu Index

#### 3.2.2. Screening of the Baidu Search Index

#### 3.2.3. Synthesis of Investor Attention Factor

#### 3.3. Selection of Parameters

#### 3.3.1. Inputs and Outputs for the Deep Learning Model

#### 3.3.2. Selection of Parameters for TCN Model

#### 3.4. Comparison of Model Prediction Accuracy

## 4. Discussion

#### 4.1. Construct the Attention Factor by Using the Cross-Correlation Coefficient

$t_{1}$ and $t_{2}$. It may result in future data being used to predict previous data, which is not reasonable.

#### 4.2. Construct the Attention Factor by Using the Correlation Coefficient with RV

#### 4.3. Limitations and Shortcomings

## 5. Conclusions and Extension

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest


Financial Investment | Travel | Car | Mortgage Loan |
---|---|---|---|
Consumption | Commerce | Credit Card | Bankruptcy |
Real Estate | Industry | Advertising | Inflation |
Education | Aviation | Cell Phone | Financial Crisis |
Job | Computer | Small and Medium-sized Enterprises | Luxury |
Insurance | Transaction | Lease | Derivative |
Shopping | E-Commerce | | |

Order | Coefficient | Order | Coefficient | Order | Coefficient | Order | Coefficient |
---|---|---|---|---|---|---|---|
−20 | 0.173 ** | −9 | 0.131 ** | 1 | 0.143 ** | 11 | 0.112 ** |
−19 | 0.147 ** | −8 | 0.133 ** | 2 | 0.155 ** | 12 | 0.115 ** |
−18 | 0.159 ** | −7 | 0.135 ** | 3 | 0.153 ** | 13 | 0.106 ** |
−17 | 0.159 ** | −6 | 0.132 ** | 4 | 0.146 ** | 14 | 0.108 ** |
−16 | 0.158 ** | −5 | 0.137 ** | 5 | 0.134 ** | 15 | 0.124 ** |
−15 | 0.158 ** | −4 | 0.144 ** | 6 | 0.130 ** | 16 | 0.117 ** |
−14 | 0.149 ** | −3 | 0.151 ** | 7 | 0.136 ** | 17 | 0.138 ** |
−13 | 0.169 ** | −2 | 0.155 ** | 8 | 0.131 ** | 18 | 0.145 ** |
−12 | 0.147 ** | −1 | 0.168 ** | 9 | 0.121 ** | 19 | 0.148 ** |
−11 | 0.133 ** | 0 | 0.168 ** | 10 | 0.119 ** | 20 | 0.146 ** |
−10 | 0.135 ** | | | | | | |
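
The order and coefficient pairs in the table above are cross-correlation coefficients at orders from −20 to 20. A minimal sketch of how such a coefficient could be computed is given below, assuming it is the Pearson correlation between the realized volatility and a shifted copy of the search index. The function names `pearson` and `cross_correlation`, the toy data, and the sign convention (a negative order pairs an earlier index value with the current volatility) are illustrative assumptions, not the authors' definitions.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def cross_correlation(index, rv, order):
    """Correlate index[t + order] with rv[t] over the days where both exist.

    Assumed convention: order < 0 pairs an earlier index value with rv[t].
    """
    if len(index) != len(rv):
        raise ValueError("series must have the same length")
    n = len(rv)
    if abs(order) > n - 2:
        raise ValueError("order leaves fewer than two overlapping days")
    if order < 0:
        return pearson(index[: n + order], rv[-order:])
    return pearson(index[order:], rv[: n - order])


# Toy data (not from the paper): rv repeats the index with a two-day delay.
index = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
rv = [0, 0] + index[:-2]

# Scan a small range of orders and keep the one with the largest |coefficient|.
coefficients = {k: cross_correlation(index, rv, k) for k in range(-3, 4)}
best_order = max(coefficients, key=lambda k: abs(coefficients[k]))
print(best_order, round(coefficients[best_order], 3))
```

Under this convention, restricting the scan to orders at or below zero would ensure that only current or past index values are paired with the volatility being explained.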

Keywords | Order | Coefficient | Keywords | Order | Coefficient |
---|---|---|---|---|---|
Financial Investment | −2 | 0.235 ** | Consumption | −4 | 0.066 ** |
Aviation | −4 | 0.228 ** | Mortgage Loan | −1 | 0.049 ** |
Credit Card | −7 | 0.196 ** | Computer | −3 | 0.044 ** |
Bankruptcy | −20 | 0.173 ** | Education | −6 | −0.044 ** |
Small and Medium-sized Enterprises | −6 | 0.132 ** | Advertising | −7 | 0.026 ** |
Insurance | −3 | 0.096 ** | Inflation | −2 | 0.022 ** |
Commerce | −3 | 0.088 ** | | | |

Name of Indicator | Explanation |
---|---|
Volume | Daily transaction volume |
Bias | (Closing price of the day − five-day average price)/five-day average price |
CDP | (Highest price of the previous day + lowest price of the previous day + 2 × closing price of the previous day)/4 |
DMA | Five-day moving average − 10-day moving average |
AR | (Closing price − opening price)/(opening price − lowest price) × 100 |
BR | (Highest price − closing price)/(closing price − lowest price) × 100 |
pctChg | Percentage price change (rise or fall) |
night | Opening price − closing price of the previous day |
RV_V | Volatility of intraday trading volume |
Baidu | Investor attention factor synthesized from the Baidu search index |
RV | Realized volatility |
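
The price-based indicators in the table above are simple arithmetic on daily open, high, low, and close prices. The sketch below follows the table's formulas literally. The function names, the toy prices, the choice to include the current day in the five-day and 10-day averages, and returning `None` when a denominator is zero are all assumptions made for illustration.

```python
def moving_average(values, t, window):
    """Mean of the `window` values ending at, and including, day t."""
    if t + 1 < window:
        raise ValueError("not enough history for the requested window")
    return sum(values[t - window + 1 : t + 1]) / window


def ratio_times_100(numerator, denominator):
    """Return 100 * numerator / denominator, or None if the denominator is zero."""
    if denominator == 0:
        return None
    return 100.0 * numerator / denominator


def daily_indicators(opens, highs, lows, closes, t):
    """Indicators for day t, following the formulas listed in the table."""
    if t < 9:
        raise ValueError("day t needs nine earlier days for the 10-day average")
    ma5 = moving_average(closes, t, 5)
    ma10 = moving_average(closes, t, 10)
    return {
        "bias": (closes[t] - ma5) / ma5,
        "CDP": (highs[t - 1] + lows[t - 1] + 2 * closes[t - 1]) / 4,
        "DMA": ma5 - ma10,
        "AR": ratio_times_100(closes[t] - opens[t], opens[t] - lows[t]),
        "BR": ratio_times_100(highs[t] - closes[t], closes[t] - lows[t]),
        "night": opens[t] - closes[t - 1],
    }


# Toy prices (not from the paper): eleven days of steadily rising closes.
closes = [float(c) for c in range(10, 21)]
opens = [c - 0.5 for c in closes]
highs = [c + 1.0 for c in closes]
lows = [c - 1.0 for c in closes]

today = daily_indicators(opens, highs, lows, closes, t=10)
print(today)
```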

Variable | volume | money | bias | DMA | CDP | AR | BR | pctChg | night | RV_V | Baidu | RV |
---|---|---|---|---|---|---|---|---|---|---|---|---|
volume | 1 | | | | | | | | | | | |
money | 0.969 ** | 1 | | | | | | | | | | |
bias | 0.194 ** | 0.168 ** | 1 | | | | | | | | | |
DMA | 0.196 ** | 0.204 ** | 0.324 ** | 1 | | | | | | | | |
CDP | 0.717 ** | 0.802 ** | 0.028 | 0.122 ** | 1 | | | | | | | |
AR | −0.018 | −0.020 | 0.064 ** | −0.009 | −0.031 | 1 | | | | | | |
BR | 0.006 | 0.006 | −0.115 ** | −0.002 ** | 0.016 | −0.011 | 1 | | | | | |
pctChg | 0.071 ** | 0.056 ** | 0.674 ** | 0.025 | −0.033 | 0.110 ** | −0.118 ** | 1 | | | | |
night | −0.007 | 0.006 | 0.374 ** | 0.095 ** | −0.058 ** | 0.007 | −0.010 | 0.459 ** | 1 | | | |
RV_V | −0.010 | −0.006 | −0.124 ** | −0.049 * | 0.034 | −0.004 | 0.077 ** | −0.157 ** | −0.042 * | 1 | | |
Baidu | 0.414 ** | 0.389 ** | 0.049 * | 0.054 ** | 0.441 ** | −0.001 | 0.009 | 0.022 | −0.042 * | 0.033 | 1 | |
RV | 0.519 ** | 0.494 ** | −0.276 ** | −0.243 ** | 0.305 ** | −0.016 | 0.022 | −0.129 ** | −0.163 ** | 0.111 ** | 0.215 ** | 1 |

Parameters | Value | Parameters | Value |
---|---|---|---|
Training set | 68% | Activation | ReLU |
Filters | 11 | Loss function | mse |
Convolution kernel | 2 | Batch size | 84 |
Validation set | 17% | Epoch | 50 |
Testing set | 15% | Metrics | mse, mae, rmse, mape, msle |
Optimizer | Adam | Window width | 20 days |
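
The table above gives a 20-day window and a 68%/17%/15% split into training, validation, and testing sets. A minimal sketch of how such inputs could be prepared is shown below. It assumes a one-step-ahead target, a single input series, and a split in time order without shuffling; none of these choices is confirmed by the table, and the function names `make_windows` and `chronological_split` are hypothetical.

```python
def make_windows(series, width=20):
    """Pair each run of `width` consecutive values with the value that follows it."""
    if width < 1 or len(series) <= width:
        raise ValueError("series must be longer than the window width")
    return [(series[i - width : i], series[i]) for i in range(width, len(series))]


def chronological_split(samples, train_pct=68, val_pct=17):
    """Split samples in time order; the remainder after train and validation is the test set."""
    n = len(samples)
    n_train = n * train_pct // 100  # integer arithmetic avoids float rounding
    n_val = n * val_pct // 100
    train = samples[:n_train]
    val = samples[n_train : n_train + n_val]
    test = samples[n_train + n_val :]
    return train, val, test


# Toy series (not from the paper): 200 daily values.
series = [float(day) for day in range(200)]
samples = make_windows(series, width=20)
train, val, test = chronological_split(samples)
print(len(train), len(val), len(test))
```

Keeping the split in time order means every validation and testing target lies after every training target, which avoids evaluating the model on days that precede its training data.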

Model | TCN + B | LSTM + B | GARCH-T | GARCH-N | FIGARCH-T | FIGARCH-N | ARFIMA | HAR-RV |
---|---|---|---|---|---|---|---|---|
TCN + B | 1.00 | | | | | | | |
LSTM + B | 0.00 | 1.00 | | | | | | |
GARCH-T | 0.00 | 0.00 | 1.00 | | | | | |
GARCH-N | 0.00 | 0.00 | 0.00 | 1.00 | | | | |
FIGARCH-T | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | | | |
FIGARCH-N | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | | |
ARFIMA | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | |
HAR-RV | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 |

Metric | TCN + B | TCN | LSTM + B | GARCH-T | GARCH-N | FIGARCH-T | FIGARCH-N | ARFIMA | HAR-RV |
---|---|---|---|---|---|---|---|---|---|
MSE | 0.254 | 0.347 | 0.393 | 1.738 | 1.603 | 1.901 | 1.683 | 0.427 | 0.500 |
RMSE | 0.504 | 0.589 | 0.627 | 1.318 | 1.266 | 1.379 | 1.297 | 0.653 | 0.707 |
MAE | 0.331 | 0.364 | 0.378 | 1.077 | 1.030 | 1.086 | 1.045 | 0.326 | 0.399 |
MAPE | 199.648 | 225.211 | 227.969 | 273.419 | 261.390 | 266.111 | 262.117 | 440.965 | 291.732 |
MSLE | 0.045 | 0.052 | 0.053 | 0.314 | 0.296 | 0.315 | 0.301 | 0.081 | 0.070 |
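
The error tables report five measures: MSE, RMSE, MAE, MAPE, and MSLE. The sketch below shows one common way to compute them. It assumes the standard textbook definitions, with MAPE expressed in percent and MSLE computed on log(1 + y) as in the Keras convention; the paper's exact definitions may differ, and the toy numbers are not from the paper.

```python
import math


def _check(y_true, y_pred):
    """Reject empty or mismatched inputs before computing any metric."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")


def mse(y_true, y_pred):
    _check(y_true, y_pred)
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    return math.sqrt(mse(y_true, y_pred))


def mae(y_true, y_pred):
    _check(y_true, y_pred)
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent; true values must be non-zero."""
    _check(y_true, y_pred)
    return 100.0 * sum(abs((a - b) / a) for a, b in zip(y_true, y_pred)) / len(y_true)


def msle(y_true, y_pred):
    """Mean squared error between log(1 + y) values; inputs must exceed -1."""
    _check(y_true, y_pred)
    return sum(
        (math.log1p(a) - math.log1p(b)) ** 2 for a, b in zip(y_true, y_pred)
    ) / len(y_true)


# Toy values (not from the paper).
y_true = [1.0, 2.0, 4.0]
y_pred = [1.0, 3.0, 2.0]
scores = {
    "MSE": mse(y_true, y_pred),
    "RMSE": rmse(y_true, y_pred),
    "MAE": mae(y_true, y_pred),
    "MAPE": mape(y_true, y_pred),
    "MSLE": msle(y_true, y_pred),
}
print(scores)
```

Because MAPE divides by the true value, days with very small realized volatility can dominate it, which may explain why it ranks models differently from the other four measures.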

Metric | TCN + B | TCN | LSTM + B | GARCH-T | GARCH-N | FIGARCH-T | FIGARCH-N | ARFIMA | HAR-RV |
---|---|---|---|---|---|---|---|---|---|
MSE | 0.270 | 0.372 | 0.435 | 1.876 | 1.724 | 1.913 | 1.784 | 0.637 | 0.602 |
RMSE | 0.519 | 0.610 | 0.660 | 1.370 | 1.313 | 1.383 | 1.336 | 0.798 | 0.776 |
MAE | 0.325 | 0.411 | 0.366 | 1.118 | 1.066 | 1.092 | 1.083 | 0.468 | 0.492 |
MAPE | 184.693 | 224.922 | 197.097 | 282.495 | 268.879 | 267.997 | 272.226 | 74.360 | 120.588 |
MSLE | 0.070 | 0.077 | 0.098 | 0.331 | 0.311 | 0.318 | 0.318 | 0.147 | 0.099 |

Metric | TCN + B | TCN | LSTM + B | GARCH-T | GARCH-N | FIGARCH-T | FIGARCH-N | ARFIMA | HAR-RV |
---|---|---|---|---|---|---|---|---|---|
MSE | 0.526 | 0.572 | 0.593 | 2.070 | 1.884 | 2.111 | 1.958 | 0.685 | 0.633 |
RMSE | 0.725 | 0.756 | 0.776 | 1.439 | 1.373 | 1.453 | 1.399 | 0.827 | 0.796 |
MAE | 0.465 | 0.478 | 0.500 | 1.183 | 1.123 | 1.153 | 1.143 | 0.574 | 0.458 |
MAPE | 221.514 | 241.448 | 248.170 | 305.385 | 288.811 | 288.172 | 293.086 | 151.824 | 72.207 |
MSLE | 0.090 | 0.099 | 0.101 | 0.366 | 0.341 | 0.349 | 0.349 | 0.129 | 0.144 |

Keywords | Coefficient | Keywords | Coefficient |
---|---|---|---|
Shopping | 0.298 ** | Education | 0.177 ** |
Financial Crisis | 0.280 ** | Bankruptcy | 0.168 ** |
E-Commerce | 0.241 ** | Small and Medium-sized Enterprises | 0.123 ** |
Financial Investment | 0.237 ** | Real Estate | 0.122 ** |
Aviation | 0.202 ** | Travel | 0.122 ** |
Credit Card | 0.190 ** | Cell Phone | 0.104 ** |

**Table 12.** Comparison between attention factor model based on cross-correlation coefficient and other models.

Metric | TCN + B | TCN | LSTM + B | GARCH-T | GARCH-N | FIGARCH-T | FIGARCH-N | ARFIMA | HAR-RV | TCN + C |
---|---|---|---|---|---|---|---|---|---|---|
MSE | 0.302 | 0.347 | 0.393 | 1.738 | 1.603 | 1.901 | 1.683 | 0.427 | 0.500 | 0.308 |
RMSE | 0.549 | 0.589 | 0.627 | 1.318 | 1.266 | 1.379 | 1.297 | 0.653 | 0.707 | 0.555 |
MAE | 0.347 | 0.364 | 0.378 | 1.077 | 1.030 | 1.086 | 1.045 | 0.326 | 0.399 | 0.335 |
MAPE | 209.204 | 225.211 | 227.969 | 273.419 | 261.390 | 266.111 | 262.117 | 440.965 | 291.732 | 218.455 |
MSLE | 0.045 | 0.052 | 0.053 | 0.314 | 0.296 | 0.315 | 0.301 | 0.081 | 0.070 | 0.049 |

Keywords | Order | Coefficient | Keywords | Order | Coefficient |
---|---|---|---|---|---|
Shopping | 5 | 0.333 ** | Bankruptcy | −20 | 0.173 ** |
Financial Crisis | 1 | 0.333 ** | Travel | 7 | 0.135 ** |
E-Commerce | 3 | 0.274 ** | Small and Medium-sized Enterprises | −6 | 0.132 ** |
Financial Investment | −1 | 0.235 ** | Real Estate | 4 | 0.127 ** |
Aviation | −4 | 0.228 ** | Cell Phone | 2 | 0.107 ** |
Credit Card | −7 | 0.196 ** | Car | 1 | 0.102 ** |

**Table 14.** The error comparison between the TCN model with attention factor based on the RV correlation coefficient and other models.

Metric | TCN + B | TCN | LSTM + B | GARCH-T | GARCH-N | FIGARCH-T | FIGARCH-N | ARFIMA | HAR-RV | TCN + N |
---|---|---|---|---|---|---|---|---|---|---|
RMSE | 0.549 | 0.589 | 0.627 | 1.318 | 1.266 | 1.379 | 1.297 | 0.653 | 0.707 | 0.605 |
MAE | 0.347 | 0.364 | 0.378 | 1.077 | 1.030 | 1.086 | 1.045 | 0.326 | 0.399 | 0.368 |
MAPE | 209.204 | 225.211 | 227.969 | 273.419 | 261.390 | 266.111 | 262.117 | 440.965 | 291.732 | 211.923 |
MSLE | 0.045 | 0.052 | 0.053 | 0.314 | 0.296 | 0.315 | 0.301 | 0.081 | 0.070 | 0.048 |

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Lei, B.; Zhang, B.; Song, Y. Volatility Forecasting for High-Frequency Financial Data Based on Web Search Index and Deep Learning Model. *Mathematics* **2021**, *9*, 320.
https://doi.org/10.3390/math9040320
