## 1. Introduction

## 2. Boltzmann Machine Learning

## 3. Boltzmann Machine Learning Based on Spatial Monte Carlo Integration Method

#### 3.1. Spatial Monte Carlo Integration Method

#### 3.2. Boltzmann Machine Learning Based on First-Order SMCI Method

## 4. Comparison of 1-SMCI Learning Method and MPLE

#### 4.1. Comparison from Asymptotic Point of View

**Theorem 1.**

**Theorem 2.**

#### 4.2. Numerical Comparison

## 5. Numerical Comparison with Other Methods

## 6. Conclusions

## Acknowledgments

## Conflicts of Interest

## Appendix A. Proof of Theorem 1

## Appendix B. Proof of Theorem 2

## References


**Figure 1.** Example of the neighboring regions: (**a**) when $C=\{13\}$, ${N}_{1}(C)=\{8,12,14,18\}$, ${N}_{2}(C)=\{3,7,9,11,15,17,19,23\}$, and ${R}_{2}(C)={N}_{1}(C)\cup {N}_{2}(C)$; (**b**) when $C=\{12,13\}$ and ${N}_{1}(C)=\{7,8,11,14,17,18\}$.
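The neighboring regions in this caption can be reproduced programmatically. The sketch below assumes the figure's setting of a $5\times 5$ square lattice with vertices numbered 1–25 row by row; the function names `neighbors`, `N1`, and `N2` are illustrative, not from the paper.

```python
def neighbors(v, L=5):
    """Nearest-neighbor vertices of v on an L x L square lattice, numbered 1..L*L row-wise."""
    r, c = divmod(v - 1, L)
    out = set()
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < L and 0 <= cc < L:
            out.add(rr * L + cc + 1)
    return out

def N1(C, L=5):
    """First neighboring region: vertices adjacent to C, excluding C itself."""
    return set().union(*(neighbors(v, L) for v in C)) - set(C)

def N2(C, L=5):
    """Second neighboring region: vertices adjacent to N1(C), outside both C and N1(C)."""
    n1 = N1(C, L)
    return set().union(*(neighbors(v, L) for v in n1)) - n1 - set(C)
```

For example, `N1({13})` returns `{8, 12, 14, 18}` and `N2({13})` returns `{3, 7, 9, 11, 15, 17, 19, 23}`, matching panel (**a**).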

**Figure 2.** The mean absolute errors (MAEs) for various $N$: (**a**) the case without the model error and (**b**) the case with the model error. Each plot shows the average over 200 trials. MPLE, maximum pseudo-likelihood estimation; 1-SMCI, first-order spatial Monte Carlo integration method.
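The MAE plotted in these experiments is presumably the mean of the absolute deviations between the generative parameters and their estimates; a minimal sketch (the function name and arguments are illustrative):

```python
def mean_absolute_error(true_params, est_params):
    """Mean absolute deviation between true parameters and their estimates."""
    assert len(true_params) == len(est_params)
    return sum(abs(t - e) for t, e in zip(true_params, est_params)) / len(true_params)
```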

**Figure 3.** Mean absolute errors (MAEs) versus the number of updates of the gradient ascent method: (**a**) $N=200$ and (**b**) $N=2000$. Each plot shows the average over 200 trials. RM, ratio matching.

**Table 1.**Real computational times of the four learning methods. The setting of the experiment is the same as that of Figure 3b.

| | MPLE | RM | MPF | 1-SMCI |
|---|---|---|---|---|
| time (s) | 0.08 | 0.1 | 0.04 | 0.26 |

© 2018 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).