A Randomized Distributed Kaczmarz Algorithm and Anomaly Detection
Abstract
1. Introduction
1.1. Notation
1.2. The Distributed Kaczmarz Algorithm
1.3. Related Work
1.4. Main Contributions
Algorithm 1 Randomized Tree Kaczmarz (RTK) algorithm.
Algorithm 2 Multiple Round Randomized Tree Kaczmarz (MRRTK) algorithm.
Algorithm 3 Multiple Round Randomized Tree Kaczmarz (MRRTKUS) algorithm with Unique Selection.
2. Randomization of the Distributed Kaczmarz Algorithm
2.1. Randomized Variants
2.1.1. Single Active Nodes
2.1.2. Multiple Active Nodes
2.2. Sampling Schemes
2.3. Accelerating the Convergence via Over-Relaxation
3. The RTK in the Presence of Noise
3.1. Convergence Rate in the Presence of Noise
3.2. Anomaly Detection in Distributed Systems of Noisy Equations
4. Numerical Experiments
4.1. The Test Equations
4.2. The Algorithms
- Standard Kaczmarz.
- Sequential block Kaczmarz, with several different numbers of blocks. The equations are divided into a small number of blocks, and the updates are performed as an orthogonal projection onto the solution space of each block, rather than each individual equation.
- Distributed Kaczmarz based on a binary tree as in [5].
- Distributed block Kaczmarz. This is distributed Kaczmarz based on a tree of depth 2 with a small number of leaves, where each leaf contains a block of equations.
- Random standard Kaczmarz; one equation at a time is randomly chosen.
- Random block Kaczmarz; one block at a time is randomly chosen. There is no difference between sequential and parallel random block Kaczmarz.
- Random distributed Kaczmarz based on a binary tree, for several kinds of random choices:
  - Generations, that is, all nodes at a randomly chosen distance from the root.
  - Families, that is, the children of a randomly chosen node; for a binary tree, these are pairs.
  - Subtrees, that is, the subtree rooted at a randomly chosen node. This is not an incomparable choice, so it is not covered by our theory.
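
As a rough illustration of the standard and random standard Kaczmarz items in the list above, the following Python sketch runs both on a small consistent underdetermined system. It is a minimal sketch under stated assumptions, not the code used for the experiments: the function name `kaczmarz_sweep`, the Gaussian test matrix, the uniform row sampling, and the parameter name `omega` for the relaxation parameter are all choices made here for illustration.

```python
import numpy as np

def kaczmarz_sweep(A, b, x, order, omega=1.0):
    """Apply relaxed Kaczmarz updates for the row indices listed in `order`."""
    for i in order:
        a = A[i]
        # Move x toward the hyperplane a . x = b[i]; omega = 1 is the exact projection.
        x = x + omega * (b[i] - a @ x) / (a @ a) * a
    return x

# Hypothetical test problem (not the test equations of Section 4.1).
rng = np.random.default_rng(0)
m, n = 8, 20
A = rng.standard_normal((m, n))
b = A @ rng.standard_normal(n)  # consistent by construction

x_seq = np.zeros(n)
x_rand = np.zeros(n)
for _ in range(1000):
    # Standard Kaczmarz: visit the equations in their natural order.
    x_seq = kaczmarz_sweep(A, b, x_seq, range(m))
    # Random Kaczmarz: m equations drawn uniformly with replacement (an assumption).
    x_rand = kaczmarz_sweep(A, b, x_rand, rng.integers(0, m, size=m))

res_seq = np.linalg.norm(A @ x_seq - b)
res_rand = np.linalg.norm(A @ x_rand - b)
print(res_seq, res_rand)
```

Because the random variant draws with replacement, a batch of `m` updates will typically miss some equations, which is the effect discussed for the underdetermined case in Section 4.3.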
4.3. Numerical Results
- Sequential block Kaczmarz and random block Kaczmarz for 255 blocks are identical to their standard Kaczmarz counterparts and are not shown.
- Sequential methods, including standard Kaczmarz and sequential block Kaczmarz, converge faster than parallel methods, such as binary trees or distributed block Kaczmarz. This is not surprising: in sequential methods, each step uses the results of the preceding step; in parallel methods, each step uses older data.
- The same reasoning explains why the Family selection is faster than Generations. Consider level 3, as an example. With Generations, we do 8 equations in parallel. With Families, we do four sets of 2 equations each, but each pair uses the result from the previous step.
- The block algorithm for a single block with relaxation parameter 1 converges in a single step, so the convergence factor is 0. At the other end of the spectrum, with 255 blocks of one equation each, the block algorithm becomes standard Kaczmarz. As the number of blocks increases, the convergence factor is observed to increase and approach the standard Kaczmarz value.
- Standard Kaczmarz, deterministic or random, converges precisely for relaxation parameters strictly between 0 and 2. By the results in [4], this is also true for sequential block Kaczmarz. Distributed Kaczmarz methods are guaranteed to converge for the same range of the relaxation parameter, but in practice they often converge for larger values as well, sometimes up to nearly 4. Random distributed methods appear to behave similarly.
- The observed convergence factors for random algorithms are comparable to those for their deterministic counterparts but slightly worse. We attribute this to the fact that in the underdetermined case, all equations are important; random algorithms on an m × n matrix do not usually include all equations in a set of m updates, while deterministic algorithms do. As pointed out in [8], there are types of equations where random algorithms are significantly faster than deterministic algorithms, but our sample equations are obviously not in that category.
- All algorithms converge faster in the overdetermined consistent case than in the underdetermined case. That is not surprising: since we have four times more equations than we actually need, one complete run through all equations is comparable to four complete run-throughs for the underdetermined case.
- For the same reason, the parallel block algorithm with four blocks (or fewer) and relaxation parameter 1 converges in a single step.
- We observe that random algorithms are still slower in the overdetermined case, even though the argument from the underdetermined case does not apply here.
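
The single-step convergence noted in the list above can be checked on a toy problem. The Python sketch below is a hedged illustration, not the experimental code: it assumes a consistent overdetermined system with four times as many equations as unknowns, split into four blocks that each determine the solution uniquely, and it assumes that the parallel variant combines the block results by an equal-weight average. The names `block_step`, `sequential_pass`, `parallel_pass`, and `factor` are hypothetical, and the "factor" here is simply the error ratio after one pass over all blocks.

```python
import numpy as np

def block_step(A_blk, b_blk, x, omega=1.0):
    """Relaxed orthogonal projection of x onto the solution set of one block."""
    # pinv(A_blk) @ residual is the minimum-norm correction (the projection step).
    return x + omega * (np.linalg.pinv(A_blk) @ (b_blk - A_blk @ x))

def sequential_pass(A, b, x, blocks, omega):
    # Each block starts from the result of the previous block.
    for idx in blocks:
        x = block_step(A[idx], b[idx], x, omega)
    return x

def parallel_pass(A, b, x, blocks, omega):
    # Assumption: all blocks start from the same x and are averaged with equal weights.
    return np.mean([block_step(A[idx], b[idx], x, omega) for idx in blocks], axis=0)

# Hypothetical test problem: 4n consistent equations in n unknowns.
rng = np.random.default_rng(1)
n = 6
x_true = rng.standard_normal(n)
A = rng.standard_normal((4 * n, n))
b = A @ x_true
blocks = np.array_split(np.arange(4 * n), 4)
x0 = np.zeros(n)

def factor(x_new):
    """Error after one pass divided by the initial error."""
    return np.linalg.norm(x_new - x_true) / np.linalg.norm(x0 - x_true)

seq_half = factor(sequential_pass(A, b, x0, blocks, 0.5))
par_half = factor(parallel_pass(A, b, x0, blocks, 0.5))
seq_one = factor(sequential_pass(A, b, x0, blocks, 1.0))
par_one = factor(parallel_pass(A, b, x0, blocks, 1.0))
print(seq_half, par_half, seq_one, par_one)
```

Under these assumptions each block step multiplies the error by one minus the relaxation parameter, so in exact arithmetic a sequential pass over four blocks scales the error by the fourth power of that number and a parallel pass by its absolute value. For relaxation parameters 0.5 and 1 this is consistent with the tabulated entries 0.0625, 0.5000, and 0.0000 for four blocks.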
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Kaczmarz, S. Angenäherte Auflösung von Systemen linearer Gleichungen. Bull. Int. Acad. Pol. Sci. Lett. Cl. Sci. Math. Nat. Ser. A Sci. Math. 1937, 35, 355–357.
2. Tanabe, K. Projection Method for Solving a Singular System of Linear Equations and its Application. Numer. Math. 1971, 17, 203–214.
3. Eggermont, P.P.B.; Herman, G.T.; Lent, A. Iterative Algorithms for Large Partitioned Linear Systems, with Applications to Image Reconstruction. Linear Alg. Appl. 1981, 40, 37–67.
4. Natterer, F. The Mathematics of Computerized Tomography; Teubner: Stuttgart, Germany, 1986.
5. Hegde, C.; Keinert, F.; Weber, E.S. A Kaczmarz Algorithm for Solving Tree Based Distributed Systems of Equations. In Excursions in Harmonic Analysis; Balan, R., Benedetto, J.J., Czaja, W., Dellatorre, M., Okoudjou, K.A., Eds.; Applied and Numerical Harmonic Analysis; Birkhäuser/Springer: Cham, Switzerland, 2021; Volume 6, pp. 385–411.
6. West, D.B. Introduction to Graph Theory; Prentice Hall, Inc.: Upper Saddle River, NJ, USA, 1996; p. xvi+512.
7. Hamaker, C.; Solmon, D.C. The angles between the null spaces of X rays. J. Math. Anal. Appl. 1978, 62, 1–23.
8. Strohmer, T.; Vershynin, R. A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl. 2009, 15, 262–278.
9. Zouzias, A.; Freris, N.M. Randomized extended Kaczmarz for solving least squares. SIAM J. Matrix Anal. Appl. 2013, 34, 773–793.
10. Needell, D.; Zhao, R.; Zouzias, A. Randomized block Kaczmarz method with projection for solving least squares. Linear Algebra Appl. 2015, 484, 322–343.
11. Needell, D.; Srebro, N.; Ward, R. Stochastic gradient descent, weighted sampling, and the randomized Kaczmarz algorithm. Math. Progr. 2016, 155, 549–573.
12. Cimmino, G. Calcolo approssimato per soluzioni dei sistemi di equazioni lineari. In La Ricerca Scientifica XVI, Series II, Anno IX 1; Consiglio Nazionale delle Ricerche: Rome, Italy, 1938; pp. 326–333.
13. Censor, Y.; Gordon, D.; Gordon, R. Component averaging: An efficient iterative parallel algorithm for large and sparse unstructured problems. Parallel Comput. 2001, 27, 777–808.
14. Necoara, I. Faster randomized block Kaczmarz algorithms. SIAM J. Matrix Anal. Appl. 2019, 40, 1425–1452.
15. Moorman, J.D.; Tu, T.K.; Molitor, D.; Needell, D. Randomized Kaczmarz with averaging. BIT Numer. Math. 2021, 61, 337–359.
16. Tsitsiklis, J.; Bertsekas, D.; Athans, M. Distributed asynchronous deterministic and stochastic gradient optimization algorithms. IEEE Trans. Autom. Control 1986, 31, 803–812.
17. Xiao, L.; Boyd, S.; Kim, S.J. Distributed average consensus with least-mean-square deviation. J. Parallel Distrib. Comput. 2007, 67, 33–46.
18. Shah, D. Gossip Algorithms. Found. Trends Netw. 2008, 3, 1–125.
19. Boyd, S.; Parikh, N.; Chu, E.; Peleato, B.; Eckstein, J. Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 2011, 3, 1–122.
20. Nedic, A.; Ozdaglar, A. Distributed subgradient methods for multi-agent optimization. IEEE Trans. Autom. Control 2009, 54, 48.
21. Johansson, B.; Rabi, M.; Johansson, M. A randomized incremental subgradient method for distributed optimization in networked systems. SIAM J. Optim. 2009, 20, 1157–1170.
22. Yuan, K.; Ling, Q.; Yin, W. On the convergence of decentralized gradient descent. SIAM J. Optim. 2016, 26, 1835–1854.
23. Sayed, A.H. Adaptation, learning, and optimization over networks. Found. Trends Mach. Learn. 2014, 7, 311–801.
24. Zhang, X.; Liu, J.; Zhu, Z.; Bentley, E.S. Compressed Distributed Gradient Descent: Communication-Efficient Consensus over Networks. In Proceedings of the IEEE INFOCOM 2019—IEEE Conference on Computer Communications, Paris, France, 29 April–2 May 2019; pp. 2431–2439.
25. Scaman, K.; Bach, F.; Bubeck, S.; Massoulié, L.; Lee, Y.T. Optimal algorithms for non-smooth distributed optimization in networks. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 3–8 December 2018; pp. 2740–2749.
26. Loizou, N.; Richtárik, P. Revisiting Randomized Gossip Algorithms: General Framework, Convergence Rates and Novel Block and Accelerated Protocols. arXiv 2019, arXiv:1905.08645.
27. Necoara, I.; Nesterov, Y.; Glineur, F. Random block coordinate descent methods for linearly constrained optimization over networks. J. Optim. Theory Appl. 2017, 173, 227–254.
28. Necoara, I.; Nesterov, Y.; Glineur, F. Linear convergence of first order methods for non-strongly convex optimization. Math. Progr. 2019, 175, 69–107.
29. Bertsekas, D.P.; Tsitsiklis, J.N. Parallel and Distributed Computation: Numerical Methods; Athena Scientific: Nashua, NH, USA, 1997; Available online: http://hdl.handle.net/1721.1/3719 (accessed on 1 December 2021).
30. Kamath, G.; Ramanan, P.; Song, W.Z. Distributed Randomized Kaczmarz and Applications to Seismic Imaging in Sensor Network. In Proceedings of the 2015 International Conference on Distributed Computing in Sensor Systems, Fortaleza, Brazil, 10–12 June 2015; pp. 169–178.
31. Herman, G.T.; Hurwitz, H.; Lent, A.; Lung, H.P. On the Bayesian approach to image reconstruction. Inform. Control 1979, 42, 60–71.
32. Hansen, P.C. Discrete Inverse Problems: Insight and Algorithms; Fundamentals of Algorithms; Society for Industrial and Applied Mathematics (SIAM): Philadelphia, PA, USA, 2010; Volume 7, p. xii+213.
33. Liu, J.; Wright, S.J.; Sridhar, S. An asynchronous parallel randomized Kaczmarz algorithm. arXiv 2014, arXiv:1401.4780.
34. Herman, G.T.; Lent, A.; Hurwitz, H. A storage-efficient algorithm for finding the regularized solution of a large, inconsistent system of equations. J. Inst. Math. Appl. 1980, 25, 361–366.
35. Chi, Y.; Lu, Y.M. Kaczmarz method for solving quadratic equations. IEEE Signal Process. Lett. 2016, 23, 1183–1187.
36. Crombez, G. Finding common fixed points of strict paracontractions by averaging strings of sequential iterations. J. Nonlinear Convex Anal. 2002, 3, 345–351.
37. Crombez, G. Parallel algorithms for finding common fixed points of paracontractions. Numer. Funct. Anal. Optim. 2002, 23, 47–59.
38. Nikazad, T.; Abbasi, M.; Mirzapour, M. Convergence of string-averaging method for a class of operators. Optim. Methods Softw. 2016, 31, 1189–1208.
39. Reich, S.; Zalas, R. A modular string averaging procedure for solving the common fixed point problem for quasi-nonexpansive mappings in Hilbert space. Numer. Algorithms 2016, 72, 297–323.
40. Censor, Y.; Zaslavski, A.J. Convergence and perturbation resilience of dynamic string-averaging projection methods. Comput. Optim. Appl. 2013, 54, 65–76.
41. Zaslavski, A.J. Dynamic string-averaging projection methods for convex feasibility problems in the presence of computational errors. J. Nonlinear Convex Anal. 2014, 15, 623–636.
42. Witt, M.; Schultze, B.; Schulte, R.; Schubert, K.; Gomez, E. A proton simulator for testing implementations of proton CT reconstruction algorithms on GPGPU clusters. In Proceedings of the 2012 IEEE Nuclear Science Symposium and Medical Imaging Conference Record (NSS/MIC), Anaheim, CA, USA, 27 October–3 November 2012; pp. 4329–4334.
43. Censor, Y.; Nisenbaum, A. String-averaging methods for best approximation to common fixed point sets of operators: The finite and infinite cases. Fixed Point Theory Algorithms Sci. Eng. 2021, 21, 9.
44. Censor, Y.; Tom, E. Convergence of string-averaging projection schemes for inconsistent convex feasibility problems. Optim. Methods Softw. 2003, 18, 543–554.
45. Haddock, J.; Needell, D. Randomized projections for corrupted linear systems. In Proceedings of the AIP Conference Proceedings, Thessaloniki, Greece, 25–30 September 2017; Volume 1978, p. 470071.
46. Borgard, R.; Harding, S.N.; Duba, H.; Makdad, C.; Mayfield, J.; Tuggle, R.; Weber, E.S. Accelerating the distributed Kaczmarz algorithm by strong over-relaxation. Linear Algebra Appl. 2021, 611, 334–355.
47. Needell, D. Randomized Kaczmarz solver for noisy linear systems. BIT 2010, 50, 395–403.

| Relaxation Parameter | 0.5 | 1 | 1.5 | 2 | 2.5 | 3 | 3.5 |
|---|---|---|---|---|---|---|---|
| deterministic | | | | | | | |
| standard | 0.8138 | 0.5428 | 0.5579 | | | | |
| sequential blocks | | | | | | | |
| 4 blocks | 0.7963 | 0.4926 | 0.5227 | | | | |
| 16 blocks | 0.8101 | 0.5362 | 0.5531 | | | | |
| 64 blocks | 0.8135 | 0.5448 | 0.5628 | | | | |
| parallel blocks | | | | | | | |
| 4 blocks | 0.9346 | 0.8947 | 0.8604 | 0.8276 | 0.7948 | 0.7616 | 0.7273 |
| 16 blocks | 0.9800 | 0.9636 | 0.9499 | 0.9379 | 0.9271 | 0.9173 | 0.9080 |
| 64 blocks | 0.9947 | 0.9897 | 0.9849 | 0.9804 | 0.9761 | 0.9720 | 0.9681 |
| 255 blocks | 0.9986 | 0.9973 | 0.9960 | 0.9947 | 0.9934 | 0.9922 | 0.9909 |
| binary tree | 0.9941 | 0.9903 | 0.9870 | 0.9841 | | | |
| random | | | | | | | |
| standard | 0.8440 | 0.7472 | 0.7039 | | | | |
| blocks | | | | | | | |
| 4 blocks | 0.8013 | 0.6736 | 0.6724 | | | | |
| 16 blocks | 0.8146 | 0.7162 | 0.7001 | | | | |
| 64 blocks | 0.8252 | 0.7393 | 0.7136 | | | | |
| binary tree | | | | | | | |
| family | 0.9055 | 0.8510 | 0.8133 | 0.7742 | 0.7692 | 0.7528 | 0.8178 |
| | (0.9099) | (0.8817) | (0.9099) | | | | |
| generation | 0.9940 | 0.9903 | 0.9874 | 0.9849 | | | |
| | (0.9985) | (0.9980) | (0.9985) | | | | |
| subtree | 0.9352 | 0.9017 | 0.8617 | 0.8735 | 0.9078 | | |

| Relaxation Parameter | 0.5 | 1 | 1.5 | 2 | 2.5 | 3 | 3.5 |
|---|---|---|---|---|---|---|---|
| deterministic | | | | | | | |
| standard | 0.4923 | 0.2564 | 0.2424 | | | | |
| sequential blocks | | | | | | | |
| 4 blocks | 0.0625 | 0.0000 | 0.0625 | | | | |
| 16 blocks | 0.4311 | 0.1863 | 0.1894 | | | | |
| 64 blocks | 0.4759 | 0.2131 | 0.2263 | | | | |
| 255 blocks | 0.4930 | 0.2588 | 0.2409 | | | | |
| parallel blocks | | | | | | | |
| 4 blocks | 0.5000 | 0.0000 | 0.5000 | | | | |
| 16 blocks | 0.9125 | 0.8696 | 0.8303 | 0.7919 | 0.7550 | 0.7195 | 0.6795 |
| 64 blocks | 0.9707 | 0.9483 | 0.9313 | 0.9178 | 0.9063 | 0.8958 | 0.8857 |
| 256 blocks | 0.9920 | 0.9845 | 0.9775 | 0.9709 | 0.9647 | 0.9590 | 0.9536 |
| 1023 blocks | 0.9980 | 0.9960 | 0.9940 | 0.9920 | 0.9901 | 0.9882 | 0.9864 |
| binary tree | 0.9889 | 0.9817 | 0.9758 | 0.9704 | | | |
| random | | | | | | | |
| standard | 0.5481 | 0.3421 | 0.2748 | | | | |
| blocks | | | | | | | |
| 4 blocks | 0.0625 | 0.0000 | 0.0625 | | | | |
| 16 blocks | 0.4451 | 0.2464 | 0.2291 | | | | |
| 64 blocks | 0.4690 | 0.2847 | 0.2579 | | | | |
| 256 blocks | 0.4812 | 0.2874 | 0.2675 | | | | |
| binary tree | | | | | | | |
| family | 0.7040 | 0.5401 | 0.4225 | 0.3356 | 0.2615 | 0.2769 | 0.4462 |
| | (0.6879) | (0.6073) | (0.6879) | | | | |
| generation | 0.9889 | 0.9809 | 0.9768 | 0.9709 | | | |
| | (0.9985) | (0.9980) | (0.9985) | | | | |
| subtree | 0.8349 | 0.7474 | 0.6486 | 0.6408 | 0.7103 | | |

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Keinert, F.; Weber, E.S. A Randomized Distributed Kaczmarz Algorithm and Anomaly Detection. Axioms 2022, 11, 106. https://doi.org/10.3390/axioms11030106