# New Dataset for Forecasting Realized Volatility: Is the Tokyo Stock Exchange Co-Location Dataset Helpful for Expansion of the Heterogeneous Autoregressive Model in the Japanese Stock Market?

## Abstract

## 1. Introduction

## 2. Literature Review

#### 2.1. Literature Review of Volatility Forecasting Models

#### 2.2. Literature Review of the Relationship between HFTs and Volatility

## 3. Materials and Methods

#### 3.1. NEEDS Tick Data File Preprocessing

#### 3.1.1. Realized Volatility Calculation

#### 3.1.2. Stock Full-Board Dataset Preprocessing

#### 3.2. TSE Co-Location Dataset Preprocessing

#### 3.3. Dataset Formation

#### 3.4. Methods

1. Create subsets of the training data by bootstrap sampling (random sampling with replacement).
2. Train a decision tree on each subset of the training data.
3. At each node of the tree, choose the best split from only m randomly selected variables and derive the split function.
4. Repeat steps 1, 2 and 3 to produce d decision trees.
5. For test data, aggregate the outputs of the d decision trees: take the most popular class by majority vote (classification) or average the outputs (regression).
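The steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: the Gini impurity criterion, the `max_depth` limit, the toy data and all function names (`fit_forest`, `predict_forest`, and so on) are hypothetical choices made here for readability.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (assumed split criterion)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y, m, rng):
    """Step 3: search only m randomly selected variables for the best split."""
    best = None  # (weighted impurity, variable index, threshold)
    for j in rng.sample(range(len(X[0])), m):
        # Exclude the largest value so the right-hand child is never empty.
        for threshold in sorted({row[j] for row in X})[:-1]:
            left = [lab for row, lab in zip(X, y) if row[j] <= threshold]
            right = [lab for row, lab in zip(X, y) if row[j] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, j, threshold)
    return best

def grow_tree(X, y, m, rng, depth=0, max_depth=5):
    """Step 2: grow one decision tree; leaves hold the majority class."""
    if depth == max_depth or len(set(y)) == 1:
        return Counter(y).most_common(1)[0][0]
    split = best_split(X, y, m, rng)
    if split is None:  # the sampled variables are all constant here
        return Counter(y).most_common(1)[0][0]
    _, j, threshold = split
    left = [i for i, row in enumerate(X) if row[j] <= threshold]
    right = [i for i, row in enumerate(X) if row[j] > threshold]
    return (
        j,
        threshold,
        grow_tree([X[i] for i in left], [y[i] for i in left], m, rng, depth + 1, max_depth),
        grow_tree([X[i] for i in right], [y[i] for i in right], m, rng, depth + 1, max_depth),
    )

def predict_tree(tree, row):
    """Walk from the root to a leaf; internal nodes are tuples, leaves are labels."""
    while isinstance(tree, tuple):
        j, threshold, left, right = tree
        tree = left if row[j] <= threshold else right
    return tree

def fit_forest(X, y, d=25, m=1, seed=0):
    """Steps 1 and 4: draw d bootstrap samples and grow one tree on each."""
    rng = random.Random(seed)
    forest = []
    for _ in range(d):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # with replacement
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx], m, rng))
    return forest

def predict_forest(forest, row):
    """Step 5 (classification): majority vote over the d trees."""
    votes = Counter(predict_tree(tree, row) for tree in forest)
    return votes.most_common(1)[0][0]

# Toy data (hypothetical): two variables, labels "Down" / "Up".
X_train = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.4], [0.4, 0.3],
           [0.6, 0.7], [0.7, 0.6], [0.8, 0.9], [0.9, 0.8]]
y_train = ["Down"] * 4 + ["Up"] * 4

forest = fit_forest(X_train, y_train, d=25, m=1, seed=0)
print(predict_forest(forest, [0.95, 0.95]))
```

For a regression target, step 5 would average the tree outputs instead of voting. In practice a library implementation is usually preferable to a hand-written one.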

## 4. Experimental Results

#### 4.1. Observation Period

#### 4.2. Experimental Results and Consideration

#### 4.2.1. RV Prediction Accuracy

#### 4.2.2. Analyzing Important Variables

#### 4.2.3. Examination of TSE Co-Location Dataset Importance

## 5. Discussion and Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest


**Figure 2.** Time series of the TSE Co-Location dataset and RV. In this figure, RV denotes $RV_{t}^{(d)}$, while Colo_C, Colo_Y and Colo_B denote the co-location ratios: the ratio of order quantity via the TSE Co-Location area to total order quantity (defined in Equation (5)), the ratio of execution quantity via the TSE Co-Location area to total execution quantity (defined in Equation (6)), and the ratio of value traded via the TSE Co-Location area to total value traded (defined in Equation (7)), respectively.

**Figure 4.** Changes in variable importance from the first-half period to the second-half period, sorted by importance in the first-half period.

**Figure 5.** Changes in variable importance from the first-half period to the second-half period, sorted by importance in the second-half period.

| HAR | Volume | TSE Co-Location | Stock Full-Board |
|---|---|---|---|
| RV_daily | market volume_daily | Colo_C_daily | Cum_Plus_daily |
| RV_weekly | market volume_weekly | Colo_Y_daily | Cum_Minus_daily |
| RV_monthly | market volume_monthly | Colo_B_daily | Cum_Plus_weekly |
| | | Colo_C_weekly | Cum_Minus_weekly |
| | | Colo_Y_weekly | Cum_Plus_monthly |
| | | Colo_B_weekly | Cum_Minus_monthly |
| | | Colo_C_monthly | |
| | | Colo_Y_monthly | |
| | | Colo_B_monthly | |
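The daily, weekly and monthly variables listed above follow the usual HAR pattern of aggregating one series over three horizons. The sketch below shows one common way to build such variables. It assumes the conventional HAR windows of 5 and 22 trading days and simple trailing averages; the window lengths, the function names and the sample series are assumptions made for illustration, not details taken from this article.

```python
def trailing_mean(series, window):
    """Average of the last `window` values up to and including each day.

    Returns None for days that do not yet have a full window of history.
    """
    out = []
    for t in range(len(series)):
        if t + 1 < window:
            out.append(None)
        else:
            out.append(sum(series[t + 1 - window:t + 1]) / window)
    return out

def har_features(rv, weekly=5, monthly=22):
    """Build (RV_daily, RV_weekly, RV_monthly) rows from a daily RV series.

    Assumes monthly >= weekly, so every kept row also has a weekly value.
    """
    week = trailing_mean(rv, weekly)
    month = trailing_mean(rv, monthly)
    return [
        {"RV_daily": rv[t], "RV_weekly": week[t], "RV_monthly": month[t]}
        for t in range(len(rv))
        if month[t] is not None
    ]

# Hypothetical daily RV series of 23 observations: 1.0, 2.0, ..., 23.0
rv = [float(v) for v in range(1, 24)]
rows = har_features(rv)
print(len(rows), rows[0])
```

The same helper could in principle be applied to the volume, co-location and full-board series to obtain their weekly and monthly counterparts, if they are aggregated in the same way.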

| Total Observation Period | Down | Up |
|---|---|---|
| training data | 848 | 841 |
| test data | 93 | 94 |

Total Observation Period

| No | Model | F-Measure (Random Forest) | F-Measure (Logistic) |
|---|---|---|---|
| I | HAR | 0.60 | 0.59 |
| II | HAR + Volume | 0.64 | 0.52 |
| III | HAR + TSE Co-Location | 0.63 | 0.53 |
| IV | HAR + Stock full board | 0.66 | 0.46 |
| V | HAR + Volume + TSE Co-Location + Stock full board | 0.68 | 0.46 |
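For readers unfamiliar with the metric in the table above, the F-measure is conventionally the harmonic mean of precision and recall. The sketch below computes it for a two-class Up/Down prediction. Which class is treated as positive, and whether the article averages over classes, is not stated in this section, so the `positive="Up"` default and the toy labels are assumptions.

```python
def f_measure(y_true, y_pred, positive="Up"):
    """F-measure (F1): harmonic mean of precision and recall for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no correct positive predictions
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical labels: 2 true positives, 1 false positive, 1 false negative.
y_true = ["Up", "Up", "Up", "Down", "Down"]
y_pred = ["Up", "Up", "Down", "Up", "Down"]
print(f_measure(y_true, y_pred))
```

Here precision and recall are both 2/3, so the F-measure is also 2/3.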

| Frequency | RV | Market Volume | Colo_C | Colo_Y | Colo_B | Cum_Plus | Cum_Minus | Average |
|---|---|---|---|---|---|---|---|---|
| daily | 1 | 1 | 1 | 1 | 3 | 1 | 2 | 1.4 |
| weekly | 3 | 2 | 2 | 3 | 2 | 1 | 1 | 2.0 |
| monthly | 2 | 2 | 2 | 2 | 1 | 2 | 2 | 1.9 |

| F-Measure | First Half Period | Second Half Period |
|---|---|---|
| Random Forest | 0.56 | 0.61 |
| Logistic | 0.54 | 0.39 |

| | First Half Period: Down | First Half Period: Up | Second Half Period: Down | Second Half Period: Up |
|---|---|---|---|---|
| training data | 400 | 387 | 445 | 457 |
| test data | 44 | 43 | 52 | 48 |

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Higashide, T.; Tanaka, K.; Kinkyo, T.; Hamori, S.
New Dataset for Forecasting Realized Volatility: Is the Tokyo Stock Exchange Co-Location Dataset Helpful for Expansion of the Heterogeneous Autoregressive Model in the Japanese Stock Market? *J. Risk Financial Manag.* **2021**, *14*, 215.
https://doi.org/10.3390/jrfm14050215

**Chicago/Turabian Style**

Higashide, Takuo, Katsuyuki Tanaka, Takuji Kinkyo, and Shigeyuki Hamori.
2021. "New Dataset for Forecasting Realized Volatility: Is the Tokyo Stock Exchange Co-Location Dataset Helpful for Expansion of the Heterogeneous Autoregressive Model in the Japanese Stock Market?" *Journal of Risk and Financial Management* 14, no. 5: 215.
https://doi.org/10.3390/jrfm14050215