# On the Convergence of Stochastic Process Convergence Proofs


## Abstract


## 1. Introduction

## 2. Main Result. Director Process and the Expected Direction Set

#### 2.1. Locally Bounded Stochastic Processes and Objective of the Work

**Definition 1.**

**Example 1.**

**Example 2.**

**Definition 2.**

**locally bounded by** $\varphi$ if there is a decomposition of 1-increments $(X,\gamma )$ with $\gamma$ satisfying the standard constraint and X locally and linearly bounded by $\varphi$.

**Example 3.**

**Example 4.**

**Example 5.**

**Bottou resemblance** and **C.3**, respectively, are not satisfied, because ${Z}_{t}^{\intercal}{G}_{1}{G}_{2}{Z}_{t}$ is possibly negative a.s.
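This failure can be checked concretely: even when $G_1$ and $G_2$ are both symmetric positive definite, their product is in general not symmetric, so the quadratic form ${Z}_{t}^{\intercal}{G}_{1}{G}_{2}{Z}_{t}$ can be negative. A minimal numerical sketch (the matrices below are illustrative choices, not taken from the example):

```python
import numpy as np

# Two symmetric positive definite matrices (illustrative choices).
G1 = np.array([[1.0, 0.0], [0.0, 100.0]])
G2 = np.array([[2.0, 1.0], [1.0, 1.0]])

# Both are PD: all eigenvalues strictly positive.
assert np.all(np.linalg.eigvalsh(G1) > 0)
assert np.all(np.linalg.eigvalsh(G2) > 0)

# Yet G1 @ G2 is not symmetric, and the quadratic form
# z^T (G1 G2) z can be negative for some z.
z = np.array([1.0, -0.5])
value = z @ G1 @ G2 @ z
print(value)  # -23.5
```

The positive definiteness of each factor is therefore not enough to sign the quadratic form of the product.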

#### 2.2. Main Result

**Theorem 1.**

- Z is locally bounded by ϕ;
- Z resembles $\nabla \varphi $.
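To make the two conditions concrete, here is a minimal simulation in the spirit of Theorem 1: a process whose expected increment direction equals $\nabla \varphi \left({Z}_{t}\right)$, run with Robbins-Monro step sizes. The objective $\varphi$, the noise model, and the step sizes are illustrative assumptions, not part of the theorem's statement.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative objective: phi(z) = ||z||^2 / 2, so grad phi(z) = z.
def grad_phi(z):
    return z

# A process Z_{t+1} = Z_t - gamma_t X_t whose expected direction is
# E[X_t | F_t] = grad phi(Z_t), i.e. Z "resembles" grad phi.
z = np.array([5.0, -3.0])
for t in range(1, 20001):
    gamma = 1.0 / t                       # Robbins-Monro step sizes
    x = grad_phi(z) + rng.normal(size=2)  # unbiased noisy direction
    z = z - gamma * x

print(np.linalg.norm(z))  # small: Z has drifted to the minimizer at 0
```

With these step sizes the iterate reduces to an average of the noise, so the distance to the minimizer shrinks like $1/\sqrt{t}$.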

#### 2.3. Expected Direction Set

**Definition 3.**

**Example 6.**

#### 2.4. Essential Expected Direction Set

**Definition 4.**

**Example 7.**

**Corollary 1.**

**Proposition 1.**

**Proof.**

**Corollary 2.**

## 3. Vector Field Half-Spaces and Stochastic Processes. Resemblance

**Definition 5.**

**Proposition 2.**

**Definition 6.**

**Proposition 3.**

**Proposition 4.**

**Proof.**

#### 3.1. The Half-Space of a Vector Field

**Definition 7.**

**Definition 8.**

#### 3.2. Resemblance between a Stochastic Process and a Vector Field

**Definition 9.**

## 4. Proof of Main Result. Reinterpretation of Convergence Theorems

#### 4.1. Resemblance to Conservative Vector Fields and Convergence

**Proof of Main Theorem 1.**

**Hessian norm bound**.

**Hessian norm bound**;

**learning rate constraints**. Apply it and deduce that the random variables $\psi \left({Z}_{t}\right)$ converge almost surely to a random variable (and so does $\varphi \left({Z}_{t}\right)$), and that:
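The learning rate constraints invoked here are the classical Robbins-Monro conditions $\sum_t \gamma_t = \infty$ and $\sum_t \gamma_t^2 < \infty$. A quick numerical sanity check for the standard choice $\gamma_t = 1/t$ (the step-size choice is illustrative):

```python
import math

# Robbins-Monro step sizes gamma_t = 1/t: the sum of steps must
# diverge (so the process can travel anywhere), while the sum of
# squared steps must converge (so the noise is eventually damped).
N = 10**6
s1 = sum(1.0 / t for t in range(1, N + 1))     # partial sum of gamma_t
s2 = sum(1.0 / t**2 for t in range(1, N + 1))  # partial sum of gamma_t^2

print(s1)  # grows like log(N): about 14.39 for N = 10^6
print(s2)  # stays below pi^2 / 6, approximately 1.6449
```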

#### 4.2. Reinterpretation of Bottou’s Convergence Theorem

**Corollary 3.**

- Z is locally bounded by ϕ;
- Z resembles $\nabla \varphi $.

**Proposition 5.**

**Bottou resemblance** of Theorem A1 and Proposition 5. Deduce from it that the algorithm Z of the theorem resembles the vector field $\nabla \varphi$.

**Corollary 4.**

**Bottou resemblance** holds.

#### 4.3. Reinterpretation of Sunehag’s Convergence Theorem

**Corollary 5.**

- Z is locally bounded by l;
- Z resembles $\nabla l$.

**Proposition 6.**

**C.1** and **Sunehag resemblance** to finish our objective with the following corollary.

**Corollary 6.**

**C.1** and **Sunehag resemblance** hold.

#### 4.4. Convergence of Process in Example 5

## 5. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## Appendix A. Convergence Theorems

**Theorem A1.** Let $l:{\mathbb{R}}^{k}\to \mathbb{R}$ be a function with a unique minimum $\overline{\eta}$ and let ${Z}_{t+1}={Z}_{t}-\gamma \left(t\right){X}_{t}$ be a stochastic process. Then Z converges to $\overline{\eta}$ almost surely if the following conditions hold:

**Theorem A2.** Let $l:{\mathbb{R}}^{k}\to \mathbb{R}$ be a twice differentiable cost function with a unique minimum $\overline{\eta}$ and let ${Z}_{t+1}={Z}_{t}-{\gamma}_{t}{B}_{t}Y\left({Z}_{t}\right)$ be a stochastic process, where ${B}_{t}$ is symmetric and depends only on information available at time t. Then Z converges to $\overline{\eta}$ almost surely if the following conditions hold:
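A minimal simulation in the spirit of Theorem A2: a variable-metric process ${Z}_{t+1}={Z}_{t}-{\gamma}_{t}{B}_{t}Y\left({Z}_{t}\right)$ on a quadratic cost. The cost, the noise model, and the preconditioner are illustrative assumptions (here ${B}_{t}$ is a fixed SPD matrix with eigenvalues bounded away from 0 and infinity; in general it may depend on the past).

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative cost: l(z) = z^T A z / 2 with A SPD, so grad l(z) = A z.
A = np.array([[3.0, 1.0], [1.0, 2.0]])

def noisy_grad(z):
    return A @ z + rng.normal(size=2)  # unbiased gradient estimate

# Symmetric positive definite preconditioner B_t (fixed here for
# illustration), with eigenvalues bounded away from 0 and infinity.
B = np.array([[1.0, 0.2], [0.2, 0.5]])

z = np.array([4.0, -4.0])
for t in range(1, 20001):
    gamma = 1.0 / t
    z = z - gamma * (B @ noisy_grad(z))

print(np.linalg.norm(z))  # small: Z has drifted toward the minimum at 0
```

The product $BA$ has strictly positive eigenvalues, so each expected step still makes progress toward the unique minimum despite the change of metric.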

**C.5** and modified (and relaxed) conditions **C.3** and **C.4** of the original statement. The proof follows directly from the original theorem's proof, so the modifications present no complications.

**Theorem A3.** Let $l:{\mathbb{R}}^{k}\to \mathbb{R}$ be a twice differentiable cost function with a unique minimum $\overline{\eta}$ and let ${Z}_{t+1}={Z}_{t}-{\gamma}_{t}{B}_{t}{Y}_{t}$ be a stochastic process, where ${B}_{t}$ is ${\mathcal{F}}_{t}$-measurable. Then Z converges to $\overline{\eta}$ almost surely if the following conditions hold:

**Theorem A4.** Let $(\Omega ,\mathcal{F},P)$ be a probability space and ${\mathcal{F}}_{1}\subseteq {\mathcal{F}}_{2}\subseteq \cdots$ a sequence of sub-σ-fields of $\mathcal{F}$. Let ${U}_{t},{\beta}_{t},{\epsilon}_{t}$ and ${\zeta}_{t}$, $t=1,2,\cdots$, be non-negative ${\mathcal{F}}_{t}$-measurable random variables, such that
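Theorem A4 is the Robbins-Siegmund almost-supermartingale lemma: in its standard form, $E\left[{U}_{t+1}\mid {\mathcal{F}}_{t}\right]\le (1+{\beta}_{t}){U}_{t}+{\epsilon}_{t}-{\zeta}_{t}$ with $\sum_t {\beta}_{t}<\infty$ and $\sum_t {\epsilon}_{t}<\infty$ a.s. implies that ${U}_{t}$ converges almost surely and $\sum_t {\zeta}_{t}<\infty$. A deterministic toy instance of the recursion (the sequences are chosen for illustration):

```python
# Deterministic instance of the Robbins-Siegmund recursion:
#   U_{t+1} = (1 + beta_t) * U_t + eps_t - zeta_t,
# with summable beta_t and eps_t, so U_t must converge
# and the accumulated progress sum(zeta_t) must stay finite.
u = 1.0
zeta_sum = 0.0
for t in range(1, 10**5 + 1):
    beta = 1.0 / t**2                 # summable multiplicative drift
    eps = 1.0 / t**2                  # summable additive noise
    zeta = min(u / 2, 1.0 / t**1.5)   # "progress" term, kept feasible
    u = (1 + beta) * u + eps - zeta
    zeta_sum += zeta

print(u, zeta_sum)  # both finite; u has settled near a limit
```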

## Appendix B. Proof of Corollary 1

**Proposition A1.**

**Proof.**

## Appendix C. Bottou’s Resemblance

**Proposition A2.**

**Proof.**

## Appendix D. Sunehag’s Resemblance

**Proof.**

## References

- Amari, S.I. Natural Gradient Works Efficiently in Learning. Neural Comput. **1998**, 276, 251–276.
- Thomas, P.S. GeNGA: A generalization of natural gradient ascent with positive and negative convergence results. In Proceedings of the 31st International Conference on Machine Learning (ICML 2014), Beijing, China, 21–26 June 2014; Volume 5, pp. 3533–3541.
- Sánchez-López, B.; Cerquides, J. Convergent Stochastic Almost Natural Gradient Descent. In Artificial Intelligence Research and Development, Proceedings of the 22nd International Conference of the Catalan Association for Artificial Intelligence, Mallorca, Spain, 23–25 October 2019; Volume 319, pp. 54–63.
- Bottou, L. Online Algorithms and Stochastic Approximations. In Online Learning and Neural Networks; Saad, D., Ed.; Cambridge University Press: Cambridge, UK, 1998; revised October 2012.
- Sunehag, P.; Trumpf, J.; Vishwanathan, S.V.N.; Schraudolph, N. Variable Metric Stochastic Approximation Theory. In Proceedings of Artificial Intelligence and Statistics, Clearwater, FL, USA, 16–19 April 2009; pp. 560–566.
- Lyapunov, A.M. The general problem of the stability of motion. Int. J. Control **1992**, 55, 531–534.
- Robbins, H.; Siegmund, D. A convergence theorem for non-negative almost supermartingales and some applications. In Optimizing Methods in Statistics; Rustagi, J.S., Ed.; Academic Press: Cambridge, MA, USA, 1971; pp. 233–257.
- Karlin, S.; Taylor, H.M. Elements of stochastic processes. In A First Course in Stochastic Processes, 2nd ed.; Academic Press: Boston, MA, USA, 1975; Chapter 1, pp. 1–44.
- Ross, S.M. Stochastic Processes, 2nd ed.; Wiley: New York, NY, USA, 1996.
- Bass, R.F. Stochastic Processes; Cambridge University Press: Cambridge, UK, 2011; Volume 33.
- Grimmett, G.; Stirzaker, D. Probability and Random Processes; Oxford University Press: Oxford, UK, 2020.
- Sánchez-López, B.; Cerquides, J. Dual Stochastic Natural Gradient Descent and convergence of interior half-space gradient approximations. arXiv **2021**, arXiv:2001.06744.
- Tao, T. An Introduction to Measure Theory; Graduate Studies in Mathematics; American Mathematical Society: Providence, RI, USA, 2011.
- Duchi, J.; Hazan, E.; Singer, Y. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. J. Mach. Learn. Res. **2011**, 12, 2121–2159.
- Zeiler, M.D. ADADELTA: An Adaptive Learning Rate Method. arXiv **2012**, arXiv:1212.5701.
- Kingma, D.P.; Ba, J.L. Adam: A Method for Stochastic Optimization. In Proceedings of the 3rd International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015.
- Robbins, H.; Monro, S. A Stochastic Approximation Method. Ann. Math. Stat. **1951**, 22, 400–407.

**Figure 4.** A stochastic process Z that $\epsilon$-resembles $\mathbb{X}$ at $\eta$ from T on, since the vector set $ED{S}_{Z}(\eta ,T)$ of all expected directions of Z at $\eta$ after time T is contained in ${H}_{\epsilon}\left(\mathbb{X}\right)\left(\eta \right)$.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Sánchez-López, B.; Cerquides, J.
On the Convergence of Stochastic Process Convergence Proofs. *Mathematics* **2021**, *9*, 1470.
https://doi.org/10.3390/math9131470
