Bayesian Recurrent Neural Network Models for Forecasting and Quantifying Uncertainty in Spatial-Temporal Data
Abstract
1. Introduction
2. Spatio-Temporal Recurrent Neural Network
2.1. Traditional Recurrent Neural Network
2.2. Bayesian Spatio-Temporal Recurrent Neural Network
2.3. BAST-RNN Prior Distributions
2.4. Dimension Reduction
3. Computation: Parameter Expansion MCMC
Algorithm 1. PX-MCMC algorithm.
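Appendix B gives the details of Algorithm 1. As a hedged illustration of the parameter-expansion idea only (not the BAST-RNN sampler itself), the sketch below applies the classic PX-DA scheme of Liu and Wu to a toy Bayesian probit model, where the expansion step rescales the latent variables by a drawn residual scale to improve mixing. The simulated data, variable names, and degrees of freedom are assumptions for illustration.

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(1)

# Simulated probit data (a hypothetical stand-in, not the paper's data).
n, p = 200, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.5, -1.0])
y = (X @ beta_true + rng.normal(size=n) > 0).astype(int)

XtX_inv = np.linalg.inv(X.T @ X)
H = X @ XtX_inv @ X.T  # hat matrix for the latent regression

def px_da_probit(n_iter=500):
    beta = np.zeros(p)
    draws = np.empty((n_iter, p))
    for it in range(n_iter):
        # 1. Data augmentation: z_i | beta, y_i is N(x_i'beta, 1)
        #    truncated to the half-line implied by y_i.
        mu = X @ beta
        lo = np.where(y == 1, -mu, -np.inf)
        hi = np.where(y == 1, np.inf, -mu)
        z = mu + truncnorm.rvs(lo, hi, random_state=rng)
        # 2. Parameter expansion: draw the expanded residual scale and
        #    rescale z, which breaks the slow mixing of plain DA.
        rss = z @ z - z @ (H @ z)
        g2 = rss / rng.chisquare(n)
        z = z / np.sqrt(g2)
        # 3. Conjugate update of beta given the rescaled latents.
        beta_hat = XtX_inv @ (X.T @ z)
        beta = rng.multivariate_normal(beta_hat, XtX_inv)
        draws[it] = beta
    return draws

draws = px_da_probit()
```

With a few hundred iterations the posterior mean recovers the signs and rough magnitudes of the generating coefficients; the same expand-then-rescale move is what PX-MCMC embeds inside a larger Gibbs scheme.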
4. Applications
4.1. Validation Measures and Alternative Models
4.2. BAST-RNN Implementation Details
4.3. Simulation: Multiscale Lorenz-96 Model
4.4. Application: Long-Lead Tropical Pacific SST Forecasting
4.5. Application: U.S. State-Level Unemployment Rate
5. Discussion and Conclusions
Author Contributions
Funding
Conflicts of Interest
Appendix A. Specification of Priors
Each element in the weight matrix is given the following prior distribution:
, for ,
where , 0.001, 0.20, and 0.20.
Each element in the weight matrix is given the following prior distribution:
, for ,
where , 0.0005, 0.20, and 0.025.
Each element in the weight matrix is given the following prior distribution:
, for ,
where 0.01, and 0.50.
Each element in the weight matrix is given the following prior distribution:
, for ,
where 0.5, 0.05, and 0.25.
Finally, , where , , where , , , where and .
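The functional forms of the priors above did not survive extraction; only the numeric hyperparameter values remain. As a hedged sketch only, the snippet below draws weights from a generic two-component (spike-and-slab) normal mixture of the kind used for Bayesian variable selection, reusing the values 0.20, 0.001, and 0.20 listed for the first weight matrix. The mixture form and the roles assigned to each number are assumptions, not the paper's exact specification.

```python
import numpy as np

rng = np.random.default_rng(0)

def spike_slab_draw(n, pi=0.20, sd_spike=0.001, sd_slab=0.20):
    """Draw n weights from a two-component normal mixture prior.

    With probability pi a weight comes from the wide 'slab' component;
    otherwise it comes from the narrow 'spike' concentrated near zero.
    Most weights are thus shrunk toward zero while a few stay large,
    inducing sparsity in the weight matrix.
    """
    in_slab = rng.random(n) < pi            # True -> slab, False -> spike
    sd = np.where(in_slab, sd_slab, sd_spike)
    return rng.normal(0.0, sd)

w = spike_slab_draw(10_000)
```

Roughly 80% of the draws sit within the spike (effectively zero at this scale), mirroring the sparsity such priors impose on RNN weight matrices.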
Appendix B. Details of Algorithm 1
Appendix C. Full-Conditionals for the BAST-RNN Model
- , for and .
- , for and .
- , for and .
- , for .
- , for and .
- , for and .
- .
Appendix D. Trace Plots for the BAST-RNN Model
Model | MSPE | CRPS |
---|---|---|
BAST-RNN | 13.08 | 154.37 |
E-QESN | 13.91 | 168.61 |
GQN | 14.85 | 172.50 |
Lin. DSTM | 15.11 | 166.60 |
Model | Overall MSPE | Niño 3.4 MSPE | CRPS | Niño 3.4 CRPS |
---|---|---|---|---|
BAST-RNN | 0.253 | 0.223 | 3.437 | 0.318 |
E-QESN | 0.272 | 0.319 | 3.455 | 0.408 |
GQN | 0.309 | 0.619 | 3.924 | 0.538 |
Lin. DSTM | 0.328 | 0.785 | 3.752 | 0.699 |
Model | MSPE | CRPS |
---|---|---|
BAST-RNN | 0.612 | 27.11 |
E-QESN | 0.965 | 36.66 |
GQN | 0.964 | 37.32 |
Lin. DSTM | 0.865 | 33.60 |
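The tables above compare models by MSPE and CRPS. For reference, a minimal sketch of both validation measures, computing the CRPS empirically from an ensemble (e.g., posterior predictive samples) via the standard identity CRPS = E|F − y| − ½·E|F − F′|; the function names are illustrative, not from the paper.

```python
import numpy as np

def mspe(y_true, y_pred):
    """Mean squared prediction error over all locations and times."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y_true - y_pred) ** 2))

def crps_ensemble(y_true, ensemble):
    """Empirical CRPS at each prediction point.

    ensemble has shape (n_members, n_points): each column is a set of
    forecast draws for one space-time location. Lower is better; the
    score rewards both calibration and sharpness.
    """
    ensemble = np.asarray(ensemble, float)
    y_true = np.asarray(y_true, float)
    # E|F - y|: mean absolute distance from members to the observation.
    term1 = np.mean(np.abs(ensemble - y_true), axis=0)
    # 0.5 * E|F - F'|: mean absolute distance between member pairs.
    term2 = 0.5 * np.mean(
        np.abs(ensemble[:, None, :] - ensemble[None, :, :]), axis=(0, 1)
    )
    return term1 - term2
```

A degenerate ensemble equal to the observations scores a CRPS of exactly zero, and summing (or averaging) the pointwise scores gives a single number comparable across models, as in the tables above.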
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
McDermott, P.L.; Wikle, C.K. Bayesian Recurrent Neural Network Models for Forecasting and Quantifying Uncertainty in Spatial-Temporal Data. Entropy 2019, 21, 184. https://doi.org/10.3390/e21020184