Block-Active ADMM to Minimize NMF with Bregman Divergences
Abstract
1. Introduction
1.1. Overview of the Matrix Factorization Algorithms
1.2. Nonnegative Matrix Factorization
- V is the original input matrix (approximated by the product WH);
- W is the feature matrix;
- H is the coefficient matrix;
- k is the rank of the low-rank approximation of V.
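As a small illustration of these roles, the sketch below builds V exactly from a hypothetical W and H and compares how many numbers each representation stores. The sizes (m = 4, n = 6, k = 2), the matrix entries, and the helper name `matmul` are assumptions chosen for the example, not values from the experiments.

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

# Hypothetical nonnegative factors: W is m x k, H is k x n.
W = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0],
     [2.0, 0.0]]
H = [[1.0, 0.0, 2.0, 1.0, 0.0, 3.0],
     [0.0, 1.0, 1.0, 0.0, 2.0, 1.0]]

# V is m x n and nonnegative because W and H are nonnegative.
V = matmul(W, H)

m, k, n = len(W), len(H), len(H[0])
entries_full = m * n            # numbers stored in V
entries_factored = k * (m + n)  # numbers stored in W and H together
```

Each column of V is a nonnegative combination of the columns of W, with the weights given by the matching column of H. The storage saving grows as k becomes small relative to m and n.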

1.2.1. Image Processing—Facial Feature Extraction
1.2.2. Contributions
- We present a coordinate descent approach, coupled with a new strategy for selecting coordinates, to solve the ADMM subproblems.
- Compared with the classic ADMM and multiplicative update methods, the proposed algorithm attains a notably lower error and converges better: it is more stable, follows smoother trajectories, and converges faster.
- We support the approach with a theoretical convergence analysis and with experiments on synthetic and real datasets in Section 6.
1.2.3. Discussion
- Compared with the classical ADMM, the proposed method in Algorithm 3 uses far fewer primal and dual decision variables. Specifically, the ADMM in Algorithm 1 introduces two new primal variables and three dual variables, while the proposed method in Algorithm 3 introduces no new primal variables and only one dual variable. This improves the efficiency of the algorithm.
- In Section 4, we introduce a new approach, termed the “block-active method”, for the problems formulated in (14). Our central result, Theorem 4, shows that under reasonable assumptions the proposed method converges to a stationary point, as defined in Equation (15), at a sublinear rate. That is, the error measure on the left-hand side of the bound in Theorem 4 decreases to zero as the iteration count k approaches infinity. This behavior is called sublinear convergence [22] because the error reduction per iteration diminishes as the iterations proceed. It stands in contrast to linear convergence, where the error shrinks by a constant factor at every iteration.
- NMF finds applications in tasks such as face recognition, document clustering, audio signal processing, and recommendation systems. The theoretical results do not depend on the application, so they carry over unchanged when NMF is applied to analogous optimization problems.
- The image resolution may or may not affect the results. Because NMF is a nonconvex optimization problem, finding a global minimum cannot be guaranteed in general. The quality of the solution that a method converges to depends on several factors, such as the initialization of W, H, and X and the learning rate. Thus, improving the resolution or quality of the data may or may not improve the result.
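The practical gap between sublinear and linear convergence mentioned above can be made concrete with a short sketch. It assumes, purely for illustration, an error bound of the form c / k for the sublinear case and c * rho**k for the linear case. The constants `c` and `rho` and both function names are hypothetical, and the exact bound of Theorem 4 is not reproduced here.

```python
def iterations_sublinear(tol, c=1.0):
    """Smallest k >= 1 with c / k <= tol (error decaying like 1/k)."""
    k = 1
    while c / k > tol:
        k += 1
    return k

def iterations_linear(tol, c=1.0, rho=0.5):
    """Smallest k >= 0 with c * rho**k <= tol (fixed shrink factor rho)."""
    k = 0
    err = c
    while err > tol:
        err *= rho  # the error shrinks by the same factor at every step
        k += 1
    return k
```

With `tol = 0.01` and the default constants, the sublinear bound needs 100 iterations and the linear bound needs 7.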
1.2.4. Paper Organization
2. NMF Problem and Previous Work
- Euclidean distance;
- Kullback–Leibler divergence;
- Itakura–Saito divergence.
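For reference, the three divergences can be sketched in their commonly used elementwise forms. The sketch assumes that the entries of V and of the product WH have been flattened into two equal-length lists of strictly positive numbers, here called `v` and `vhat` (hypothetical names). It also assumes the usual 1/2 scaling for the Euclidean case and the generalized form of the Kullback–Leibler divergence; the notation used elsewhere in this paper may differ.

```python
import math

def euclidean(v, vhat):
    """Half the squared Euclidean distance between v and vhat."""
    return 0.5 * sum((a - b) ** 2 for a, b in zip(v, vhat))

def kullback_leibler(v, vhat):
    """Generalized Kullback-Leibler divergence; entries must be positive."""
    return sum(a * math.log(a / b) - a + b for a, b in zip(v, vhat))

def itakura_saito(v, vhat):
    """Itakura-Saito divergence; entries must be positive."""
    return sum(a / b - math.log(a / b) - 1.0 for a, b in zip(v, vhat))
```

All three are zero when `v` equals `vhat` and positive otherwise. The Itakura–Saito divergence is unchanged when both arguments are multiplied by the same positive constant.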
3. Alternating Direction Method of Multipliers
| Algorithm 1: ADMM for NMF [31]. | 
|  | 
4. Block-Active Method
4.1. Block-Active Method
| Algorithm 2: Block-active method to minimize (14) | 
|  | 
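To illustrate the kind of subproblem solver discussed in this section, the sketch below applies plain cyclic coordinate descent to a nonnegativity-constrained quadratic. This is not Algorithm 2: the problem form (minimize 0.5 x'Qx - b'x subject to x >= 0, with Q symmetric positive definite) is an assumption made for the example, the names `Q`, `b`, and `sweeps` are hypothetical, and the coordinate-selection strategy of the block-active method is replaced by a simple cyclic order.

```python
def nonneg_coordinate_descent(Q, b, sweeps=200):
    """Minimize 0.5*x'Qx - b'x subject to x >= 0 by cyclic coordinate updates.

    Q is assumed symmetric positive definite, so every Q[i][i] > 0.
    """
    n = len(b)
    x = [0.0] * n  # feasible starting point
    for _ in range(sweeps):
        for i in range(n):
            # Partial derivative of the objective with respect to x[i].
            grad_i = sum(Q[i][j] * x[j] for j in range(n)) - b[i]
            # Exact one-dimensional minimizer, clipped at the bound x[i] >= 0.
            x[i] = max(0.0, x[i] - grad_i / Q[i][i])
    return x
```

For example, with `Q = [[2.0, 1.0], [1.0, 2.0]]` and `b = [1.0, -1.0]`, the unconstrained minimizer has a negative second coordinate, so the constrained solution keeps that coordinate at the bound and the sketch returns `[0.5, 0.0]`.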
4.2. Convergence Analysis of the BCD Method
- (1) x is a stationary point of (14) if and only if for all .
- (2) If x is not a stationary point, then there exists such that
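A stationarity test of this kind can be sketched for the simplest constrained case. The sketch assumes the feasible set is x >= 0 (an assumption; the constraints of (14) are not reproduced here), for which a feasible x is stationary exactly when every coordinate satisfies either x_i > 0 with zero partial derivative, or x_i = 0 with nonnegative partial derivative. The function name and the residual it returns are illustrative, not the quantity used in the theorem.

```python
def stationarity_residual(x, grad):
    """Return max_i |min(x_i, grad_i)| for a feasible x >= 0.

    grad is the gradient of the objective at x. The residual is zero
    exactly when x satisfies the first-order conditions for x >= 0.
    """
    return max(abs(min(xi, gi)) for xi, gi in zip(x, grad))
```

A zero residual means no feasible coordinate move decreases the objective to first order. A positive residual identifies at least one coordinate along which a feasible descent step exists.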
5. Block-Active ADMM
| Algorithm 3: Block active ADMM. | 
|  | 
6. Numerical Experiments
6.1. Synthetic Datasets
6.2. Real Datasets
- UMist (https://cs.nyu.edu/~roweis/data.html, accessed on 2 January 2022): This image dataset contains 575 images of 20 people, captured in poses ranging from profile to frontal views. All files are in the PGM format and are 256-level grayscale images.
- ORL (http://www.cad.zju.edu.cn/home/dengcai/Data/FaceData.html, accessed on 2 January 2022): The dataset was generated by a 2D imaging sensor and includes 400 images of 40 distinct individuals (10 images per person), with a depth of 256 gray levels per pixel. The photographs were taken on different occasions, with variations in lighting, facial expressions, and facial features.
- COIL (http://www.cad.zju.edu.cn/home/dengcai/Data/MLData.html, accessed on 2 January 2022): The dataset contains 7200 images of 100 objects. The images were captured on a motorized turntable against a black background. The dataset was used in a real-time recognition system that employed a sensor to detect the objects and display their angular pose.
- YaleB (http://www.cad.zju.edu.cn/home/dengcai/Data/FaceData.html, accessed on 10 February 2023): The dataset consists of image data generated by a 2D imaging sensor. It comprises 2414 images of size 192 × 168 pixels from 38 individuals. The images were taken under different lighting conditions and a variety of facial expressions.
- NIR (http://vcipl-okstate.org/pbvs/bench/Data/07/download.html, accessed on 10 February 2023): The dataset was created with a near-infrared (NIR) imaging sensor. It includes 3940 NIR face images of 197 persons. The images are 8-bit and uncompressed.
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A. Proof of Theorem 1
- (i)
- ⟹ Since x is a stationary point, then . For each , we have and , so that for all .
- ⟸ Suppose for all . Then,
 By the definition of , we have
- (a)
- If , then , so that .
- (b)
- If , then and , so that .
 Therefore, we have for all and so
 On the other hand, since H is a positive definite matrix, the definition of d implies that
 Therefore, we have . Moreover, from (a)–(b), we know for all . If for all , then so that x is a stationary point. If for and , then implies , so that x is a stationary point.
- (ii)
- Suppose x is not a stationary point. Consider two index sets
 Then, . Moreover, if , then for all . Since x is not a stationary point, then is nonempty.
 Let . If and , then
 If and , then define
 Note that here, is either ∞ or a positive number. Then, we define the direction as follows:
 Therefore, for all , we have
 Moreover, we have
 so that
 As a result, is a feasible descent direction, so that for any , we have .
Appendix B. Proof of Theorem 2
References
- Maćkiewicz, A.; Ratajczak, W. Principal components analysis (PCA). Comput. Geosci. 1993, 19, 303–342.
- Gottumukkal, R.; Asari, V.K. An improved face recognition technique based on modular PCA approach. Pattern Recognit. Lett. 2004, 25, 429–436.
- Moon, H.; Phillips, P.J. Computational and performance aspects of PCA-based face-recognition algorithms. Perception 2001, 30, 303–321.
- Perlibakas, V. Distance measures for PCA-based face recognition. Pattern Recognit. Lett. 2004, 25, 711–724.
- Platt, J.C.; Toutanova, K.; Yih, W.T. Translingual document representations from discriminative projections. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, Cambridge, MA, USA, 9–11 October 2010; pp. 251–261.
- Gomez, J.C.; Moens, M.F. PCA document reconstruction for email classification. Comput. Stat. Data Anal. 2012, 56, 741–751.
- He, X.; Cai, D.; Liu, H.; Ma, W.Y. Locality preserving indexing for document representation. In Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Sheffield, UK, 25–29 July 2004; pp. 96–103.
- Lee, D.D.; Seung, H.S. Learning the parts of objects by non-negative matrix factorization. Nature 1999, 401, 788–791.
- Cai, D.; He, X.; Han, J.; Huang, T.S. Graph regularized nonnegative matrix factorization for data representation. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 33, 1548–1560.
- Cai, D.; He, X.; Wang, X.; Bao, H.; Han, J. Locality preserving nonnegative matrix factorization. In Proceedings of the Twenty-First International Joint Conference on Artificial Intelligence, Pasadena, CA, USA, 11–17 July 2009.
- Wang, Y.; Jia, Y.; Hu, C.; Turk, M. Non-negative matrix factorization framework for face recognition. Int. J. Pattern Recognit. Artif. Intell. 2005, 19, 495–511.
- Guillamet, D.; Vitria, J. Non-negative matrix factorization for face recognition. In Proceedings of the Topics in Artificial Intelligence: 5th Catalonian Conference on AI, CCIA 2002, Castellón, Spain, 24–25 October 2002; pp. 336–344.
- Rajapakse, M.; Wyse, L. NMF vs. ICA for face recognition. In Proceedings of the 3rd International Symposium on Image and Signal Processing and Analysis, 2003, ISPA 2003, Rome, Italy, 18–20 September 2003; Volume 2, pp. 605–610.
- Chen, W.S.; Pan, B.; Fang, B.; Li, M.; Tang, J. Incremental nonnegative matrix factorization for face recognition. Math. Probl. Eng. 2008, 2008, 410674.
- Allab, K.; Labiod, L.; Nadif, M. A semi-NMF-PCA unified framework for data clustering. IEEE Trans. Knowl. Data Eng. 2016, 29, 2–16.
- Gaussier, E.; Goutte, C. Relation between PLSA and NMF and implications. In Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Salvador, Brazil, 15–19 August 2005; pp. 601–602.
- Hassan, N.; Ramli, D.A. A comparative study of blind source separation for bioacoustics sounds based on FastICA, PCA and NMF. Procedia Comput. Sci. 2018, 126, 363–372.
- Févotte, C.; Vincent, E.; Ozerov, A. Single-channel audio source separation with NMF: Divergences, constraints and algorithms. In Audio Source Separation; Springer: Berlin/Heidelberg, Germany, 2018; pp. 1–24.
- Javed, M.A.; Younis, M.S.; Latif, S.; Qadir, J.; Baig, A. Community detection in networks: A multidisciplinary review. J. Netw. Comput. Appl. 2018, 108, 87–111.
- Gao, T.; Olofsson, S.; Lu, S. Minimum-volume-regularized weighted symmetric nonnegative matrix factorization for clustering. In Proceedings of the 2016 IEEE Global Conference on Signal and Information Processing (GlobalSIP), Washington, DC, USA, 7–9 December 2016; pp. 247–251.
- Gillis, N. The why and how of nonnegative matrix factorization. In Regularization, Optimization, Kernels, and Support Vector Machines; Chapman & Hall: London, UK, 2014.
- Bertsekas, D.P. Nonlinear programming. J. Oper. Res. Soc. 1997, 48, 334.
- Lee, D.D.; Seung, H.S. Algorithms for non-negative matrix factorization. In Proceedings of the NIPS 2001 Conference (Advances in Neural Information Processing Systems 14), Vancouver, BC, Canada, 3–8 December 2001; pp. 556–562.
- Févotte, C.; Idier, J. Algorithms for nonnegative matrix factorization with the β-divergence. Neural Comput. 2011, 23, 2421–2456.
- Sra, S.; Dhillon, I.S. Generalized nonnegative matrix approximations with Bregman divergences. In Proceedings of the NIPS 2005 Conference (Advances in Neural Information Processing Systems 18), Vancouver, BC, Canada, 5–8 December 2005; pp. 283–290.
- Yang, Z.; Oja, E. Unified development of multiplicative algorithms for linear and quadratic nonnegative matrix factorization. IEEE Trans. Neural Netw. 2011, 22, 1878–1891.
- Lin, C.J. Projected gradient methods for nonnegative matrix factorization. Neural Comput. 2007, 19, 2756–2779.
- Cichocki, A.; Zdunek, R.; Amari, S.I. Hierarchical ALS algorithms for nonnegative matrix and 3D tensor factorization. In Proceedings of the International Conference on Independent Component Analysis and Signal Separation, London, UK, 9–12 September 2007; pp. 169–176.
- Cichocki, A.; Anh-Huy, P. Fast local algorithms for large scale nonnegative matrix and tensor factorizations. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. 2009, E92-A, 708–721.
- Hsieh, C.J.; Dhillon, I.S. Fast coordinate descent methods with variable selection for non-negative matrix factorization. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, CA, USA, 21–24 August 2011; pp. 1064–1072.
- Sun, D.L.; Févotte, C. Alternating direction method of multipliers for non-negative matrix factorization with the beta-divergence. In Proceedings of the 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Florence, Italy, 4–9 May 2014; pp. 6201–6205.
- Hong, M.; Luo, Z.Q.; Razaviyayn, M. Convergence analysis of alternating direction method of multipliers for a family of nonconvex problems. SIAM J. Optim. 2016, 26, 337–364.
- Kim, J.; Park, H. Fast nonnegative matrix factorization: An active-set-like method and comparisons. SIAM J. Sci. Comput. 2011, 33, 3261–3281.
- Boyd, S.; Parikh, N.; Chu, E. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers; Now Publishers Inc.: Norwell, MA, USA, 2011.
- Gao, T.; Chu, C. DID: Distributed incremental block coordinate descent for nonnegative matrix factorization. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32.
- Gao, T.; Lu, S.; Liu, J.; Chu, C. On the Convergence of Randomized Bregman Coordinate Descent for Non-Lipschitz Composite Problems. In Proceedings of the ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada, 6–11 June 2021; pp. 5549–5553.
- Gao, T.; Lu, S.; Liu, J.; Chu, C. Randomized Bregman coordinate descent methods for non-Lipschitz optimization. arXiv 2020, arXiv:2001.05202.
- Lin, C.J. On the convergence of multiplicative update algorithms for nonnegative matrix factorization. IEEE Trans. Neural Netw. 2007, 18, 1589–1596.







© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Li, X.; Tyagi, A. Block-Active ADMM to Minimize NMF with Bregman Divergences. Sensors 2023, 23, 7229. https://doi.org/10.3390/s23167229