Bayesian Recurrent Neural Network Models for Forecasting and Quantifying Uncertainty in Spatial-Temporal Data
Abstract
1. Introduction
2. Spatio-Temporal Recurrent Neural Network
2.1. Traditional Recurrent Neural Network
2.2. Bayesian Spatio-Temporal Recurrent Neural Network
2.3. BAST-RNN Prior Distributions
2.4. Dimension Reduction
3. Computation: Parameter Expansion MCMC
Algorithm 1: PX-MCMC algorithm.

4. Applications
4.1. Validation Measures and Alternative Models
4.2. BAST-RNN Implementation Details
4.3. Simulation: Multiscale Lorenz-96 Model
4.4. Application: Long-Lead Tropical Pacific SST Forecasting
4.5. Application: U.S. State-Level Unemployment Rate
5. Discussion and Conclusions
Author Contributions
Funding
Conflicts of Interest
Appendix A. Specification of Priors
Each element in the weight matrix $\mathbf{W}$ is given the following prior distribution:  
${w}_{i,\ell}={\gamma}_{i,\ell}^{w}\,{\mathrm{TN}}_{[-{a}_{w},{a}_{w}]}(0,{\sigma}_{w,0}^{2})+(1-{\gamma}_{i,\ell}^{w})\,{\mathrm{TN}}_{[-{a}_{w},{a}_{w}]}(0,{\sigma}_{w,1}^{2})$, for ${\gamma}_{i,\ell}^{w}\sim \mathrm{Bernoulli}({\pi}_{w})$,  
where ${\sigma}_{w,0}^{2}={(1000)}^{2}$, ${\sigma}_{w,1}^{2}=0.001$, ${a}_{w}=0.20$, and ${\pi}_{w}=0.20$.  
Each element in the weight matrix $\mathbf{U}$ is given the following prior distribution:  
${u}_{i,r}={\gamma}_{i,r}^{u}\,{\mathrm{TN}}_{[-{a}_{u},{a}_{u}]}(0,{\sigma}_{u,0}^{2})+(1-{\gamma}_{i,r}^{u})\,{\mathrm{TN}}_{[-{a}_{u},{a}_{u}]}(0,{\sigma}_{u,1}^{2})$, for ${\gamma}_{i,r}^{u}\sim \mathrm{Bernoulli}({\pi}_{u})$,  
where ${\sigma}_{u,0}^{2}={(1000)}^{2}$, ${\sigma}_{u,1}^{2}=0.0005$, ${a}_{u}=0.20$, and ${\pi}_{u}=0.025$.  
Each element in the weight matrix ${\mathbf{V}}_{1}$ is given the following prior distribution:  
${v}_{1,k,i}={\gamma}_{1,k,i}^{v}\,\mathrm{Gau}(0,{\sigma}_{{v}_{1},0}^{2})+(1-{\gamma}_{1,k,i}^{v})\,\mathrm{Gau}(0,{\sigma}_{{v}_{1},1}^{2})$, for ${\gamma}_{1,k,i}^{v}\sim \mathrm{Bernoulli}({\pi}_{{v}_{1}})$,  
where ${\sigma}_{{v}_{1},0}^{2}=10$, ${\sigma}_{{v}_{1},1}^{2}=0.01$, and ${\pi}_{{v}_{1}}=0.50$.  
Each element in the weight matrix ${\mathbf{V}}_{2}$ is given the following prior distribution:  
${v}_{2,k,i}={\gamma}_{2,k,i}^{v}\,\mathrm{Gau}(0,{\sigma}_{{v}_{2},0}^{2})+(1-{\gamma}_{2,k,i}^{v})\,\mathrm{Gau}(0,{\sigma}_{{v}_{2},1}^{2})$, for ${\gamma}_{2,k,i}^{v}\sim \mathrm{Bernoulli}({\pi}_{{v}_{2}})$,  
where ${\sigma}_{{v}_{2},0}^{2}=0.5$, ${\sigma}_{{v}_{2},1}^{2}=0.05$, and ${\pi}_{{v}_{2}}=0.25$.  
Finally, $\boldsymbol{\alpha}\sim \mathrm{Gau}(\mathbf{0},{\sigma}_{\alpha}^{2}\mathbf{I})$, where ${\sigma}_{\alpha}^{2}={(0.10)}^{2}$; $\boldsymbol{\mu}\sim \mathrm{Gau}(\mathbf{0},{\sigma}_{\mu}^{2}\mathbf{I})$, where ${\sigma}_{\mu}^{2}=100$; $\delta \sim \mathrm{Unif}(0,1)$;  
and ${\sigma}_{\epsilon}^{2}\sim \mathrm{IG}({\alpha}_{\epsilon},{\beta}_{\epsilon})$, where ${\alpha}_{\epsilon}=1$ and ${\beta}_{\epsilon}=1$.  
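For concreteness, the following minimal sketch (illustrative Python, not the authors' code) draws elements of $\mathbf{W}$ and $\mathbf{V}_{1}$ from the mixture priors above using the stated hyperparameters; here $\gamma = 1$ selects the wider component, and the matrix size $n_h = 5$ is arbitrary.

```python
# Illustrative draws from the Appendix A spike-and-slab priors.
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(0)

def draw_w(pi_w=0.20, sd0=1000.0, sd1=np.sqrt(0.001), a_w=0.20):
    """One element of W: mixture of normals truncated to [-a_w, a_w]."""
    sd = sd0 if rng.random() < pi_w else sd1
    # scipy's truncnorm takes the bounds in standard-deviation units
    return truncnorm.rvs(-a_w / sd, a_w / sd, loc=0.0, scale=sd,
                         random_state=rng)

def draw_v1(pi_v1=0.50, sd0=np.sqrt(10.0), sd1=np.sqrt(0.01)):
    """One element of V1: mixture of untruncated Gaussians."""
    return rng.normal(0.0, sd0 if rng.random() < pi_v1 else sd1)

W = np.array([[draw_w() for _ in range(5)] for _ in range(5)])  # n_h = 5
```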
Appendix B. Details of Algorithm 1
Appendix C. Full Conditionals for the BAST-RNN Model
$[{w}_{i,\ell},{\gamma}_{i,\ell}^{w}\mid {\mathbf{Y}}_{1:T},{\tilde{\mathbf{x}}}_{1:T},{\tilde{\boldsymbol{\Theta}}}_{-\{{w}_{i,\ell},{\gamma}_{i,\ell}^{w}\}}]\propto {\displaystyle \prod_{t=1}^{T}}\,\mathrm{exp}\left(-\frac{{({\mathbf{Y}}_{t}-{\mathbf{g}}_{t})}^{\prime}({\mathbf{Y}}_{t}-{\mathbf{g}}_{t})}{2{\sigma}_{\epsilon}^{2}}\right)\times \left(\frac{{\gamma}_{i,\ell}^{w}\,\mathrm{exp}\left(-\frac{{w}_{i,\ell}^{2}}{2{\sigma}_{w,0}^{2}}\right)}{\Phi\left(\frac{{a}_{w}}{{\sigma}_{w,0}}\right)-\Phi\left(-\frac{{a}_{w}}{{\sigma}_{w,0}}\right)}+\frac{(1-{\gamma}_{i,\ell}^{w})\,\mathrm{exp}\left(-\frac{{w}_{i,\ell}^{2}}{2{\sigma}_{w,1}^{2}}\right)}{\Phi\left(\frac{{a}_{w}}{{\sigma}_{w,1}}\right)-\Phi\left(-\frac{{a}_{w}}{{\sigma}_{w,1}}\right)}\right)$, for $i=1,\dots ,{n}_{h}$ and $\ell =1,\dots ,{n}_{h}$.
$[{\alpha}_{i,\ell}\mid {\mathbf{Y}}_{1:T},{\tilde{\mathbf{x}}}_{1:T},{\tilde{\boldsymbol{\Theta}}}_{-{\alpha}_{i,\ell}}]\propto {\displaystyle \prod_{t=1}^{T}}\,\mathrm{exp}\left(-\frac{{({\mathbf{Y}}_{t}-{\mathbf{g}}_{t})}^{\prime}({\mathbf{Y}}_{t}-{\mathbf{g}}_{t})}{2{\sigma}_{\epsilon}^{2}}\right)\times \left(\frac{{\gamma}_{i,\ell}^{w}\,\mathrm{exp}\left(-\frac{{\left\{{t}_{{\alpha}_{i,\ell}}({\tilde{w}}_{i,\ell})\right\}}^{2}}{2{\sigma}_{w,0}^{2}}\right)}{\Phi\left(\frac{{a}_{w}}{{\sigma}_{w,0}}\right)-\Phi\left(-\frac{{a}_{w}}{{\sigma}_{w,0}}\right)}+\frac{(1-{\gamma}_{i,\ell}^{w})\,\mathrm{exp}\left(-\frac{{\left\{{t}_{{\alpha}_{i,\ell}}({\tilde{w}}_{i,\ell})\right\}}^{2}}{2{\sigma}_{w,1}^{2}}\right)}{\Phi\left(\frac{{a}_{w}}{{\sigma}_{w,1}}\right)-\Phi\left(-\frac{{a}_{w}}{{\sigma}_{w,1}}\right)}\right)$, for $i=1,\dots ,{n}_{h}$ and $\ell =1,\dots ,{n}_{h}$.
$[{u}_{i,r},{\gamma}_{i,r}^{u}\mid {\mathbf{Y}}_{1:T},{\tilde{\mathbf{x}}}_{1:T},{\tilde{\boldsymbol{\Theta}}}_{-\{{u}_{i,r},{\gamma}_{i,r}^{u}\}}]\propto {\displaystyle \prod_{t=1}^{T}}\,\mathrm{exp}\left(-\frac{{({\mathbf{Y}}_{t}-{\mathbf{g}}_{t})}^{\prime}({\mathbf{Y}}_{t}-{\mathbf{g}}_{t})}{2{\sigma}_{\epsilon}^{2}}\right)\times \left(\frac{{\gamma}_{i,r}^{u}\,\mathrm{exp}\left(-\frac{{u}_{i,r}^{2}}{2{\sigma}_{u,0}^{2}}\right)}{\Phi\left(\frac{{a}_{u}}{{\sigma}_{u,0}}\right)-\Phi\left(-\frac{{a}_{u}}{{\sigma}_{u,0}}\right)}+\frac{(1-{\gamma}_{i,r}^{u})\,\mathrm{exp}\left(-\frac{{u}_{i,r}^{2}}{2{\sigma}_{u,1}^{2}}\right)}{\Phi\left(\frac{{a}_{u}}{{\sigma}_{u,1}}\right)-\Phi\left(-\frac{{a}_{u}}{{\sigma}_{u,1}}\right)}\right)$, for $i=1,\dots ,{n}_{h}$ and $r=1,\dots ,{n}_{\tilde{x}}$.
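Because $\mathbf{g}_{t}$ depends on the hidden-layer weights nonlinearly through the RNN, the three full conditionals above are not available in closed form and are sampled via Metropolis-within-Gibbs steps (using the parameter expansion of Algorithm 1). The sketch below (an illustration, not the paper's code) evaluates the log full conditional of a single $w_{i,\ell}$; `forecast_means` is a hypothetical stand-in for the BAST-RNN forward pass returning $\mathbf{g}_{1},\dots,\mathbf{g}_{T}$.

```python
# Log full conditional for one hidden-layer weight (first display above).
import numpy as np
from scipy.stats import norm

def log_fc_w(w_il, gamma_il, Y, forecast_means, sigma2_eps,
             a_w=0.20, sd0=1000.0, sd1=np.sqrt(0.001)):
    if abs(w_il) > a_w:
        return -np.inf                       # outside the truncation region
    G = forecast_means(w_il)                 # (T, n_y) array of g_t values
    log_lik = -0.5 * np.sum((Y - G) ** 2) / sigma2_eps
    sd = sd0 if gamma_il else sd1            # active mixture component
    trunc = norm.cdf(a_w / sd) - norm.cdf(-a_w / sd)
    log_prior = -0.5 * (w_il / sd) ** 2 - np.log(trunc)
    return log_lik + log_prior
```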
$[\delta \mid {\mathbf{Y}}_{1:T},{\tilde{\mathbf{x}}}_{1:T},{\tilde{\boldsymbol{\Theta}}}_{-\delta}]\propto {\displaystyle \prod_{t=1}^{T}}\,\mathrm{exp}\left(-\frac{{({\mathbf{Y}}_{t}-{\mathbf{g}}_{t})}^{\prime}({\mathbf{Y}}_{t}-{\mathbf{g}}_{t})}{2{\sigma}_{\epsilon}^{2}}\right)\times {I}_{[0,1]}(\delta )$.
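The full conditional for $\delta$ is likewise non-conjugate, being the Gaussian likelihood restricted to $[0,1]$. One standard option (the proposal mechanism is an assumption here, not taken from the paper) is a random-walk Metropolis step that rejects proposals outside the unit interval:

```python
# Illustrative random-walk Metropolis update for delta; step size 0.05 is
# an arbitrary choice.
import numpy as np

def update_delta(delta, log_lik, rng, step=0.05):
    """log_lik(d) = -sum_t ||Y_t - g_t(d)||^2 / (2 * sigma2_eps)."""
    prop = delta + step * rng.standard_normal()
    if not 0.0 <= prop <= 1.0:
        return delta                         # I_[0,1] gives zero density
    if np.log(rng.random()) < log_lik(prop) - log_lik(delta):
        return prop
    return delta
```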
$[{\mu}_{1,k},{\mathbf{V}}_{1,k},{\mathbf{V}}_{2,k}\mid {\mathbf{Y}}_{1:T},{\tilde{\mathbf{x}}}_{1:T},{\tilde{\boldsymbol{\Theta}}}_{-\{{\mu}_{1,k},{\mathbf{V}}_{1,k},{\mathbf{V}}_{2,k}\}}]\sim \mathrm{Gau}\left({\mathbf{A}}_{k}^{-1}{\mathbf{b}}_{k},\;{\mathbf{A}}_{k}^{-1}\right),\quad {\mathbf{A}}_{k}=\frac{1}{{\sigma}_{\epsilon}^{2}}\sum_{t=1}^{T}{\mathbf{z}}_{t}{\mathbf{z}}_{t}^{\prime}+{\boldsymbol{\Sigma}}_{0,k}^{-1},\quad {\mathbf{b}}_{k}=\frac{1}{{\sigma}_{\epsilon}^{2}}\sum_{t=1}^{T}{\mathbf{z}}_{t}{Y}_{k,t}$, for $k=1,\dots ,{n}_{y}$, where ${\mathbf{z}}_{t}$ collects the intercept and the linear and squared hidden-state regressors of the output equation, and ${\boldsymbol{\Sigma}}_{0,k}$ is the prior covariance of $({\mu}_{1,k},{\mathbf{V}}_{1,k}^{\prime},{\mathbf{V}}_{2,k}^{\prime})^{\prime}$ implied by the current mixture indicators.
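This conditional is a standard conjugate Gaussian update and can be drawn directly. A sketch under the notation above, with all variable names illustrative:

```python
# Conjugate Gaussian draw for the k-th output row. Z is the T x (1 + 2*n_h)
# matrix stacking the regressors z_t', y_k is the k-th response series, and
# Sigma0_inv is the prior precision implied by the current indicators.
import numpy as np

def draw_output_row(Z, y_k, sigma2_eps, Sigma0_inv, rng):
    prec = Z.T @ Z / sigma2_eps + Sigma0_inv      # posterior precision A_k
    cov = np.linalg.inv(prec)
    mean = cov @ (Z.T @ y_k) / sigma2_eps         # A_k^{-1} b_k
    return rng.multivariate_normal(mean, cov)
```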
$[{\gamma}_{1,k,i}^{v}\mid {\mathbf{Y}}_{1:T},{\tilde{\mathbf{x}}}_{1:T},{\tilde{\boldsymbol{\Theta}}}_{-\{{\gamma}_{1,k,i}^{v}\}}]\propto \mathrm{Bernoulli}\left(\frac{{\varphi}^{{v}_{1}}({v}_{1,k,i}\mid {\gamma}_{1,k,i}^{v}=1)}{{\varphi}^{{v}_{1}}({v}_{1,k,i}\mid {\gamma}_{1,k,i}^{v}=1)+{\varphi}^{{v}_{1}}({v}_{1,k,i}\mid {\gamma}_{1,k,i}^{v}=0)}\right)$, for $i=1,\dots ,{n}_{h}$ and $k=1,\dots ,{n}_{y}$.
$[{\gamma}_{2,k,i}^{v}\mid {\mathbf{Y}}_{1:T},{\tilde{\mathbf{x}}}_{1:T},{\tilde{\boldsymbol{\Theta}}}_{-\{{\gamma}_{2,k,i}^{v}\}}]\propto \mathrm{Bernoulli}\left(\frac{{\varphi}^{{v}_{2}}({v}_{2,k,i}\mid {\gamma}_{2,k,i}^{v}=1)}{{\varphi}^{{v}_{2}}({v}_{2,k,i}\mid {\gamma}_{2,k,i}^{v}=1)+{\varphi}^{{v}_{2}}({v}_{2,k,i}\mid {\gamma}_{2,k,i}^{v}=0)}\right)$, for $i=1,\dots ,{n}_{h}$ and $k=1,\dots ,{n}_{y}$.
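Both indicator conditionals are exact Bernoulli draws. A sketch for ${\gamma}_{1,k,i}^{v}$, with the prior weights ${\pi}_{{v}_{1}}$ and $1-{\pi}_{{v}_{1}}$ written explicitly (the usual SSVS form; an assumption, since the displays above fold this weighting into ${\varphi}^{{v}_{1}}$):

```python
# Gibbs draw for one spike-and-slab indicator of V1.
import numpy as np
from scipy.stats import norm

def draw_gamma_v1(v, pi_v1=0.50, sd0=np.sqrt(10.0), sd1=np.sqrt(0.01),
                  rng=None):
    rng = rng or np.random.default_rng()
    slab = pi_v1 * norm.pdf(v, scale=sd0)          # gamma = 1 component
    spike = (1.0 - pi_v1) * norm.pdf(v, scale=sd1)  # gamma = 0 component
    return rng.random() < slab / (slab + spike)
```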
$[{\sigma}_{\epsilon}^{2}\mid {\mathbf{Y}}_{1:T},{\tilde{\mathbf{x}}}_{1:T},{\tilde{\boldsymbol{\Theta}}}_{-\{{\sigma}_{\epsilon}^{2}\}}]\propto \mathrm{IG}\left(\frac{T{n}_{y}}{2}+{\alpha}_{\epsilon},\;\frac{1}{2}\sum_{t=1}^{T}{({\mathbf{Y}}_{t}-{\mathbf{g}}_{t})}^{\prime}({\mathbf{Y}}_{t}-{\mathbf{g}}_{t})+{\beta}_{\epsilon}\right)$.
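The error variance is drawn in closed form from this inverse-gamma full conditional; for example:

```python
# Closed-form IG draw for sigma2_eps; resid is the T x n_y matrix of
# residuals Y_t - g_t. numpy supplies Gamma draws, so an IG(shape, rate)
# variate is obtained by inverting a Gamma(shape, scale = 1/rate) draw.
import numpy as np

def draw_sigma2_eps(resid, alpha_eps=1.0, beta_eps=1.0, rng=None):
    rng = rng or np.random.default_rng()
    T, n_y = resid.shape
    shape = T * n_y / 2.0 + alpha_eps
    rate = 0.5 * np.sum(resid ** 2) + beta_eps
    return 1.0 / rng.gamma(shape, 1.0 / rate)
```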
Appendix D. Trace Plots for the BAST-RNN Model
Model       MSPE    CRPS
BAST-RNN    13.08   154.37
EQESN       13.91   168.61
GQN         14.85   172.50
Lin. DSTM   15.11   166.60
Model       Overall MSPE   Niño 3.4 MSPE   Overall CRPS   Niño 3.4 CRPS
BAST-RNN    0.253          0.223           3.437          0.318
EQESN       0.272          0.319           3.455          0.408
GQN         0.309          0.619           3.924          0.538
Lin. DSTM   0.328          0.785           3.752          0.699
Model       MSPE    CRPS
BAST-RNN    0.612   27.11
EQESN       0.965   36.66
GQN         0.964   37.32
Lin. DSTM   0.865   33.60
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).