# A Geometric Interpretation of Stochastic Gradient Descent Using Diffusion Metrics


## Abstract

## 1. Introduction

## 2. Continuous-Time SGD and the Diffusion Matrix

## 3. Diffusion Metrics and General Relativity

[…]$_{ij}$ are the coefficients of $D$, and $\delta_{wz}$ is the Kronecker delta.

[…]$_{D}f$, so that Equation (11) becomes

## 4. Conclusions

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

## Appendix A. Riemannian Geometry

- $\nabla_{fX} Y = f\,\nabla_{X} Y$ for all functions $f$ on $M$;
- $\nabla_{X}(fY) = df(X)\,Y + f\,\nabla_{X} Y$.
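
The two properties above can be checked numerically in the simplest setting. The sketch below is only an illustration under stated assumptions: it takes $M = \mathbb{R}^2$ with the flat connection, for which $\nabla_X Y$ is the directional derivative of $Y$ along $X$, and approximates derivatives by central finite differences. The names `nabla`, `scale`, and `axiom_errors`, and the fields `f`, `X`, `Y`, are hypothetical choices made for this example and do not come from the text.

```python
import math

H = 1e-5  # step size for central finite differences


def directional_derivative(field, p, v):
    """Central-difference derivative at p of field: R^2 -> R^k along the vector v."""
    plus = field((p[0] + H * v[0], p[1] + H * v[1]))
    minus = field((p[0] - H * v[0], p[1] - H * v[1]))
    return tuple((a - b) / (2 * H) for a, b in zip(plus, minus))


def nabla(X, Y):
    """Flat connection on R^2 (an assumption): derivative of Y at p along X(p)."""
    return lambda p: directional_derivative(Y, p, X(p))


def scale(f, X):
    """Pointwise product fX of a function f and a vector field X."""
    return lambda p: tuple(f(p) * c for c in X(p))


# Hypothetical function and vector fields, chosen only for illustration.
def f(p):
    return p[0] ** 2 + p[1]


def X(p):
    return (-p[1], p[0])


def Y(p):
    return (math.sin(p[0]), p[0] * p[1])


def axiom_errors(p):
    """Return the largest componentwise violation of each property at the point p."""
    base = nabla(X, Y)(p)

    # Property 1: nabla_{fX} Y = f * nabla_X Y
    lhs1 = nabla(scale(f, X), Y)(p)
    rhs1 = tuple(f(p) * c for c in base)

    # Property 2: nabla_X (fY) = df(X) Y + f * nabla_X Y
    lhs2 = nabla(X, scale(f, Y))(p)
    df_X = directional_derivative(lambda q: (f(q),), p, X(p))[0]
    rhs2 = tuple(df_X * y + f(p) * c for y, c in zip(Y(p), base))

    err1 = max(abs(a - b) for a, b in zip(lhs1, rhs1))
    err2 = max(abs(a - b) for a, b in zip(lhs2, rhs2))
    return err1, err2


if __name__ == "__main__":
    print(axiom_errors((0.7, -0.3)))
```

Both returned values should be small, limited only by the finite-difference error. On a curved manifold the same check would need the Christoffel symbols of the chosen connection, which this flat example deliberately omits.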

Architecture | $d = \lvert\text{Weights}\rvert$ | $N = \lvert\text{Data}\rvert$, CIFAR | $N = \lvert\text{Data}\rvert$, SVHN |
---|---|---|---|
ResNet | 1.7 M | 60 K | 600 K |
Wide ResNet | 11 M | 60 K | 600 K |
DenseNet (k = 12) | 1 M | 60 K | 600 K |
DenseNet (k = 24) | 27.2 M | 60 K | 600 K |

