# Average Contrastive Divergence for Training Restricted Boltzmann Machines


## Abstract


## 1. Introduction

## 2. Contrastive Divergence Algorithm

#### 2.1. Contrastive Divergence Algorithm

**Theorem 1.**

**Proof.**

#### 2.2. Contrastive Divergence Algorithm for RBMs

**Theorem 2.**

**Theorem 3.**

**Proof.**

**Corollary 1.**

## 3. Average Contrastive Divergence Algorithm

**Algorithm 1** ACD-$k$-$l$

**input:** RBM $({X}_{1},\cdots ,{X}_{m},{H}_{1},\cdots ,{H}_{n})$, training batch $S$

**output:** gradient approximations $\Delta {w}_{ij}$, $\Delta {b}_{j}$ and $\Delta {c}_{i}$ for $i=1,\cdots ,n$, $j=1,\cdots ,m$

Initialize $\Delta {w}_{ij}=\Delta {b}_{j}=\Delta {c}_{i}=0$ for $i=1,\cdots ,n$, $j=1,\cdots ,m$

for all $x\in S$ do

for $r=1,\cdots ,l$ do

${v}^{(0,r)}\leftarrow x$

for $t=0,\cdots ,k-1$ do

for $i=1,\cdots ,n$ do

Sample ${h}_{i}^{(t,r)}\sim p({h}_{i}\mid {v}^{(t,r)})$

end for

for $j=1,\cdots ,m$ do

Sample ${v}_{j}^{(t+1,r)}\sim p({v}_{j}\mid {h}^{(t,r)})$

end for

end for

end for

for $i=1,\cdots ,n$, $j=1,\cdots ,m$ do

$\Delta {w}_{ij}\leftarrow \Delta {w}_{ij}+p({H}_{i}=1\mid {v}^{(0)})\,{v}_{j}^{(0)}-\frac{1}{l}{\sum}_{r=1}^{l}p({H}_{i}=1\mid {v}^{(k,r)})\,{v}_{j}^{(k,r)}$

end for

for $j=1,\cdots ,m$ do

$\Delta {b}_{j}\leftarrow \Delta {b}_{j}+{v}_{j}^{(0)}-\frac{1}{l}{\sum}_{r=1}^{l}{v}_{j}^{(k,r)}$

end for

for $i=1,\cdots ,n$ do

$\Delta {c}_{i}\leftarrow \Delta {c}_{i}+p({H}_{i}=1\mid {v}^{(0)})-\frac{1}{l}{\sum}_{r=1}^{l}p({H}_{i}=1\mid {v}^{(k,r)})$

end for

end for

Here ${v}^{(0)}=x$ is the data sample at which all $l$ Gibbs chains start; the $l$ chains are run to completion before the negative-phase statistics are averaged into the gradient updates.
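As a concrete illustration, the steps of Algorithm 1 can be sketched in NumPy for a Bernoulli RBM. This is a minimal sketch under our own conventions, not the authors' implementation: the function name `acd_kl_gradients`, the array shapes, and the RNG setup are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def acd_kl_gradients(W, b, c, batch, k=1, l=5):
    """One ACD-k-l gradient estimate for a Bernoulli RBM (hypothetical helper).

    W : (n, m) weight matrix; b : (m,) visible biases; c : (n,) hidden biases.
    batch : (N, m) binary visible samples.
    For each sample, l independent k-step Gibbs chains are run from the data,
    and their negative-phase statistics are averaged, as in Algorithm 1.
    """
    dW = np.zeros_like(W)
    db = np.zeros_like(b)
    dc = np.zeros_like(c)
    for x in batch:
        # Negative-phase statistics averaged over the l chains.
        neg_h = np.zeros_like(c)    # mean of p(H=1 | v^(k,r))
        neg_v = np.zeros_like(b)    # mean of v^(k,r)
        neg_hv = np.zeros_like(W)   # mean of p(H=1 | v^(k,r)) v^(k,r)^T
        for _ in range(l):
            v = x.copy()            # v^(0,r) <- x
            for _ in range(k):      # k steps of blocked Gibbs sampling
                ph = sigmoid(W @ v + c)
                h = (rng.random(ph.shape) < ph).astype(float)
                pv = sigmoid(W.T @ h + b)
                v = (rng.random(pv.shape) < pv).astype(float)
            ph_k = sigmoid(W @ v + c)
            neg_h += ph_k / l
            neg_v += v / l
            neg_hv += np.outer(ph_k, v) / l
        # Positive phase uses the data sample v^(0) = x directly.
        ph0 = sigmoid(W @ x + c)
        dW += np.outer(ph0, x) - neg_hv
        db += x - neg_v
        dc += ph0 - neg_h
    return dW, db, dc
```

With $l=1$ this reduces to ordinary CD-$k$; larger $l$ averages several chains to reduce the variance of the negative-phase estimate at proportional extra cost.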

**Theorem 4.**

**Proof.**

**Theorem 5.**

**Proof.**

## 4. Experiments

#### 4.1. The Artificial Data

#### 4.2. The MNIST Task

## 5. Conclusions

## Acknowledgments

## Author Contributions

## Conflicts of Interest

## References


© 2016 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Ma, X.; Wang, X.
Average Contrastive Divergence for Training Restricted Boltzmann Machines. *Entropy* **2016**, *18*, 35.
https://doi.org/10.3390/e18010035
