# Average Contrastive Divergence for Training Restricted Boltzmann Machines

## Abstract

## 1. Introduction

## 2. Contrastive Divergence Algorithm

#### 2.1. Contrastive Divergence Algorithm

**Theorem 1.**

**Proof.**

#### 2.2. Contrastive Divergence Algorithm for RBMs

**Theorem 2.**

**Theorem 3.**

**Proof.**

**Corollary 1.**

## 3. Average Contrastive Divergence Algorithm

**Algorithm 1.** ACD-$k$-$l$

**Input:** RBM $(X_1,\cdots,X_m,H_1,\cdots,H_n)$, training batch $S$.

**Output:** gradient approximations $\Delta w_{ij}$, $\Delta b_j$ and $\Delta c_i$ for $i=1,\cdots,n$, $j=1,\cdots,m$.

Initialize $\Delta w_{ij}=\Delta b_j=\Delta c_i=0$ for $i=1,\cdots,n$, $j=1,\cdots,m$

for all $x\in S$ do

for $r=1,\cdots,l$ do

$v^{(0,r)}\leftarrow x$ (so $v^{(0)}=x$ for every chain $r$)

for $t=0,\cdots,k-1$ do

for $i=1,\cdots,n$ do

Sample $h_i^{(t,r)}\sim p(h_i\mid v^{(t,r)})$

end for

for $j=1,\cdots,m$ do

Sample $v_j^{(t+1,r)}\sim p(v_j\mid h^{(t,r)})$

end for

end for

end for

for $i=1,\cdots,n$, $j=1,\cdots,m$ do

$\Delta w_{ij}\leftarrow \Delta w_{ij}+p(H_i=1\mid v^{(0)})\,v_j^{(0)}-\frac{1}{l}\sum_{r=1}^{l}p(H_i=1\mid v^{(k,r)})\,v_j^{(k,r)}$

end for

for $j=1,\cdots,m$ do

$\Delta b_j\leftarrow \Delta b_j+v_j^{(0)}-\frac{1}{l}\sum_{r=1}^{l}v_j^{(k,r)}$

end for

for $i=1,\cdots,n$ do

$\Delta c_i\leftarrow \Delta c_i+p(H_i=1\mid v^{(0)})-\frac{1}{l}\sum_{r=1}^{l}p(H_i=1\mid v^{(k,r)})$

end for

end for
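The steps of Algorithm 1 can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation. It assumes binary units with logistic conditionals, $p(H_i=1\mid v)=\sigma(c_i+\sum_j w_{ij}v_j)$ and $p(V_j=1\mid h)=\sigma(b_j+\sum_i w_{ij}h_i)$, which is the usual form for a binary RBM. The names `acd_gradient`, `p_hidden`, `p_visible`, and `sample` are hypothetical helpers introduced only for this example.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def p_hidden(w, c, v):
    # p(H_i = 1 | v) for every hidden unit i; w[i][j] couples H_i and V_j (assumed layout).
    return [sigmoid(c[i] + sum(w[i][j] * v[j] for j in range(len(v))))
            for i in range(len(c))]


def p_visible(w, b, h):
    # p(V_j = 1 | h) for every visible unit j.
    return [sigmoid(b[j] + sum(w[i][j] * h[i] for i in range(len(h))))
            for j in range(len(b))]


def sample(probs, rng):
    # Draw independent Bernoulli samples with the given success probabilities.
    return [1 if rng.random() < p else 0 for p in probs]


def acd_gradient(w, b, c, batch, k, l, rng=None):
    """Sketch of ACD-k-l: average the negative phase over l Gibbs chains of k steps."""
    rng = rng or random.Random()
    n, m = len(c), len(b)
    dw = [[0.0] * m for _ in range(n)]
    db = [0.0] * m
    dc = [0.0] * n
    for x in batch:
        ph0 = p_hidden(w, c, x)              # positive phase, p(H_i = 1 | v^(0))
        neg_w = [[0.0] * m for _ in range(n)]
        neg_b = [0.0] * m
        neg_c = [0.0] * n
        for _ in range(l):                   # l independent chains, each started at x
            v = list(x)
            for _ in range(k):               # k steps of block Gibbs sampling
                h = sample(p_hidden(w, c, v), rng)
                v = sample(p_visible(w, b, h), rng)
            phk = p_hidden(w, c, v)          # p(H_i = 1 | v^(k,r))
            for i in range(n):
                neg_c[i] += phk[i] / l
                for j in range(m):
                    neg_w[i][j] += phk[i] * v[j] / l
            for j in range(m):
                neg_b[j] += v[j] / l
        for i in range(n):
            dc[i] += ph0[i] - neg_c[i]
            for j in range(m):
                dw[i][j] += ph0[i] * x[j] - neg_w[i][j]
        for j in range(m):
            db[j] += x[j] - neg_b[j]
    return dw, db, dc
```

With `l = 1` the sketch reduces to the ordinary CD-$k$ update, since the average over chains then contains a single sample. Passing a seeded `random.Random` instance as `rng` makes a run reproducible.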

**Theorem 4.**

**Proof.**

**Theorem 5.**

**Proof.**

## 4. Experiments

#### 4.1. The Artificial Data

#### 4.2. The MNIST Task

## 5. Conclusions

## Acknowledgments

## Author Contributions

## Conflicts of Interest

