Incremental Weak Subgradient Methods for Non-Smooth Non-Convex Optimization Problems
Abstract
1. Introduction
- Numerical stability: Weak subgradients mitigate the high variance of standard subgradients near non-differentiable points, enabling more stable and smoother convergence trajectories [2].
- Scalability: In large-scale problems like empirical risk minimization, they support efficient incremental or stochastic updates without requiring full subgradient computations [3].
- Adaptive step sizes: Their synergy with dynamic or diminishing step size schemes enhances both convergence speed and robustness in practice [4].
2. Preliminaries
- Let $f$ be lower semi-continuous at $\bar{x}$. Then, $f$ is weakly subdifferentiable at $\bar{x}$ if $f$ is lower locally Lipschitz at $\bar{x}$ and there exists a point at which $f$ is subdifferentiable.
- The function $f$ is weakly subdifferentiable at $\bar{x}$ if $f$ is lower Lipschitz at $\bar{x}$.
- The function $f$ is weakly subdifferentiable at $\bar{x}$ if $f$ is lower locally Lipschitz at $\bar{x}$ and bounded from below.
- The function $f$ is weakly subdifferentiable at $\bar{x}$ if $f$ is lower locally Lipschitz at $\bar{x}$ and there exist numbers $k \ge 0$ and $q$ such that $f(x) \ge -k\|x\| + q$ for all $x$.
- If $f$ is a positively homogeneous function bounded from below on some neighborhood of the origin, then $f$ is weakly subdifferentiable at the origin.
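For reference, these statements use the standard notion of a weak subgradient. A common formulation, restated here from the weak subdifferential literature cited in the references (the paper's own definition may use slightly different notation), is that a pair $(v, c) \in \mathbb{R}^n \times \mathbb{R}_{+}$ is a weak subgradient of $f$ at $\bar{x}$ if

```latex
f(x) \;\ge\; f(\bar{x}) + \langle v,\, x - \bar{x} \rangle - c\,\|x - \bar{x}\| \qquad \text{for all } x \in \mathbb{R}^n,
```

and the weak subdifferential $\partial^{w} f(\bar{x})$ is the set of all such pairs.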
3. Incremental Subgradient Method for Non-Smooth Non-Convex Optimization
Algorithm 1: Incremental weak subgradient update step.
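Since the body of the algorithm box is not reproduced here, the following is a minimal Python sketch of one incremental pass, assuming the update takes the usual incremental form $x \leftarrow x - \alpha\, v_i$, where $(v_i, c_i)$ is a weak subgradient of the component function $f_i$ at the current inner iterate. The oracle name `weak_subgrad` and the cyclic processing order are illustrative assumptions, not a transcription of the authors' pseudocode.

```python
import numpy as np

def incremental_pass(x, weak_subgrad, m, alpha):
    """One outer iteration of an incremental weak subgradient method (sketch).

    x            : current iterate (NumPy array)
    weak_subgrad : oracle such that weak_subgrad(i, x) returns a pair (v, c) with
                   f_i(y) >= f_i(x) + <v, y - x> - c * ||y - x|| for all y
    m            : number of component functions f_1, ..., f_m
    alpha        : step size used for every component during this pass
    """
    for i in range(m):            # process the components cyclically
        v, _c = weak_subgrad(i, x)
        x = x - alpha * v         # step along the weak subgradient direction
    return x
```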
3.1. Convergence Analysis for the Constant Step Size
3.2. Convergence Analysis for the Diminishing Step Size
3.3. Convergence Analysis for the Dynamic Step Size
- $f(x_n)$ is the current function value at iteration $n$;
- $f^{\mathrm{lev}}_n$ is the target level, with $f^{\mathrm{best}}_n$ being the best function value found so far;
- $\gamma_n$ is a control parameter;
- $\alpha_{\min}$ is a minimum step size threshold.
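Assembling the quantities above, a plausible form of the resulting dynamic step size rule is the following sketch, where $C$ denotes the weak subgradient bound used for normalization and $\delta_n > 0$ is the amount by which the target level undercuts the best value found so far; the paper's exact expression may differ.

```latex
\alpha_n \;=\; \max\!\left\{ \gamma_n \,\frac{f(x_n) - f^{\mathrm{lev}}_n}{C^{2}},\; \alpha_{\min} \right\},
\qquad
f^{\mathrm{lev}}_n \;=\; f^{\mathrm{best}}_n - \delta_n .
```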
Algorithm 2: Incremental weak subgradient method with dynamic step size.
- Problem-aware adaptation: The term $f(x_n) - f^{\mathrm{lev}}_n$ allows the step size to automatically adjust based on the gap between the current function value and our target level. This enables larger steps when far from the target (accelerating progress) and smaller, more careful steps as we approach it (preventing overshooting).
- Gradient normalization: The denominator normalizes the step size relative to the magnitudes of the weak subgradients, ensuring appropriate scaling regardless of the problem’s conditioning.
- Lower bound guarantee: The minimum step size prevents the algorithm from stalling in challenging regions of non-convex functions, such as plateaus or shallow local minima.
- Theoretical convergence: The parameter $\gamma_n$ ensures the method maintains theoretical convergence guarantees, while $\alpha_{\min}$ provides practical robustness.
- A minimum target level to prevent excessive step size reduction;
- Early termination based on a prescribed tolerance;
- Explicit definition of parameter C based on weak subgradient bounds;
- A minimum step size for numerical stability.
- Multiple local minima that can trap optimization algorithms;
- Saddle points where progress may temporarily stall;
- Regions of varying curvature requiring different step sizes;
- Plateaus and steep valleys that demand different exploration strategies.
- The algorithm terminates in a finite number of iterations with the optimality-based stopping test satisfied;
- The algorithm terminates in a finite number of iterations with the early-termination criterion on the target level satisfied;
- or neither test is ever met, and the generated sequence satisfies the asymptotic estimate established in the corresponding result.
- Lines 9–11: The algorithm terminates if the optimality-based stopping test is satisfied, which indicates that a necessary condition for optimality holds at the current iterate.
- Lines 12–14: The algorithm terminates if the target level parameter becomes sufficiently small, which is the early termination criterion; this stopping logic is sketched below.
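To make the preceding discussion concrete, the following Python sketch combines the dynamic step size with the minimum step size and minimum target level safeguards and with the two stopping tests described above. The function names, the halving update of the target level gap, and the use of the norm of the accumulated weak subgradient direction as the optimality test are illustrative assumptions, not a transcription of Algorithm 2.

```python
import numpy as np

def dynamic_step_method(x0, f, weak_subgrad, m, C,
                        gamma=1.0, delta0=1.0, delta_min=1e-6,
                        alpha_min=1e-6, eps=1e-8, max_iter=1000):
    """Incremental weak subgradient method with a dynamic step size (sketch)."""
    x = x0.copy()
    f_best = f(x)
    delta = delta0
    for _ in range(max_iter):
        f_lev = f_best - delta                          # current target level
        alpha = max(gamma * (f(x) - f_lev) / C**2, alpha_min)
        v_total = np.zeros_like(x)
        for i in range(m):                              # incremental pass over the components
            v, _c = weak_subgrad(i, x)
            x = x - alpha * v
            v_total += v
        fx = f(x)
        f_best = min(f_best, fx)
        if np.linalg.norm(v_total) <= eps:              # optimality-based test (lines 9-11)
            break
        if fx <= f_lev:                                 # target reached: tighten the level
            delta *= 0.5
        if delta <= delta_min:                          # early termination (lines 12-14)
            break
    return x
```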
4. Numerical Results
4.1. Benchmarking Optimization Methods
- Constant step: a fixed step size $\alpha_n \equiv \alpha > 0$;
- Diminishing step: a diminishing schedule $\alpha_n \to 0$ with $\sum_{n} \alpha_n = \infty$, determined by fixed constants;
- Dynamic step: An adaptive strategy based on an incremental weak subgradient framework, where the subgradient bound $C$ from definition (3) is not fixed a priori but adaptively estimated. Specifically, we maintain a vector $C = (C_1, \dots, C_m)$, where each $C_i$ stores the maximum observed norm of the weak subgradient for the corresponding component function $f_i$. During optimization, each $C_i$ is updated by replacing it with the maximum of its current value and the norm of the most recently computed weak subgradient of $f_i$, as in the sketch below.
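A minimal sketch of this adaptive bound estimation, under the assumption that each component bound starts at zero and tracks the running maximum of the weak subgradient norms observed for that component (the helper name `make_bound_tracker` is ours):

```python
import numpy as np

def make_bound_tracker(m):
    """Per-component adaptive bounds: C[i] holds the largest weak subgradient
    norm observed so far for component f_i (all entries start at zero)."""
    C = np.zeros(m)

    def update(i, v):
        # Called right after computing a weak subgradient v of f_i.
        C[i] = max(C[i], np.linalg.norm(v))
        return C                      # current estimate of the bound vector
    return C, update
```

In the dynamic step size rule, the current $C_i$ (or an aggregate built from the vector) then plays the role of the a priori bound $C$.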
- Rastrigin. This function is a non-smooth variant of the classical Rastrigin function, characterized by a multimodal landscape with many local minima.
- Rosenbrock. This is a non-smooth version of the Rosenbrock function, presenting a curved narrow valley with a challenging landscape.
- Smoothly clipped absolute deviation (SCAD)
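For reference, the SCAD penalty of Fan and Li, on which this benchmark is based, is commonly written as follows, with tuning parameters $\lambda > 0$ and $a > 2$; the specific non-smooth objective assembled from it in the experiments is not reproduced here.

```latex
p_{\lambda}(\theta) \;=\;
\begin{cases}
\lambda\,|\theta|, & |\theta| \le \lambda, \\[4pt]
\dfrac{2 a \lambda |\theta| - \theta^{2} - \lambda^{2}}{2(a-1)}, & \lambda < |\theta| \le a\lambda, \\[8pt]
\dfrac{\lambda^{2}(a+1)}{2}, & |\theta| > a\lambda .
\end{cases}
```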
4.2. Application of Incremental Weak Subgradient Methods to Classification
- Constant Step Size (Case 1): As established in Proposition 1, this method converges to a neighborhood of the optimal solution, but the neighborhood size depends on the fixed step magnitude. This limits fine-tuning near the optimum in the complex SCAD-regularized classification landscape.
- Diminishing Step Size (Case 2): While Proposition 2 guarantees asymptotic convergence, the rigid step reduction schedule cannot adapt to the varying curvature and difficulty regions in the non-convex objective function, resulting in suboptimal classification performance.
- Dynamic Step Size (Case 3): Theorem 3 demonstrates that our adaptive approach provides both theoretical convergence guarantees and superior adaptivity to problem characteristics. The method automatically adjusts step sizes based on the local geometry of the SCAD-regularized hinge loss, enabling effective navigation of the non-convex optimization landscape.
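As a concrete illustration of the objective compared across the three cases, the following sketch assembles the hinge loss of a linear classifier with a SCAD penalty and performs one incremental sweep over the samples. The parameter defaults, the per-sample subgradient-style directions, and the function names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def scad_penalty(w, lam=0.1, a=3.7):
    """Elementwise SCAD penalty (Fan and Li), summed over the weights."""
    t = np.abs(w)
    p = np.where(t <= lam, lam * t,
        np.where(t <= a * lam,
                 (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1)),
                 lam**2 * (a + 1) / 2))
    return p.sum()

def hinge_plus_scad(w, X, y, lam=0.1, a=3.7):
    """SCAD-regularized hinge loss of a linear classifier; labels y are in {-1, +1}."""
    margins = 1.0 - y * (X @ w)
    return np.maximum(margins, 0.0).mean() + scad_penalty(w, lam, a)

def incremental_sweep(w, X, y, alpha, lam=0.1, a=3.7):
    """One incremental pass using per-sample subgradient-style directions."""
    for xi, yi in zip(X, y):
        g = -yi * xi if 1.0 - yi * (xi @ w) > 0 else np.zeros_like(w)      # hinge part
        t = np.abs(w)
        dscad = np.where(t <= lam, lam,
                np.where(t <= a * lam, np.maximum(a * lam - t, 0.0) / (a - 1), 0.0))
        g = g + np.sign(w) * dscad                                          # SCAD part
        w = w - alpha * g
    return w
```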
- Model Simplicity: We focused on a linear model with hinge loss and SCAD regularization to emphasize optimization aspects. Applying the weak subgradient framework to more expressive models (e.g., kernels, neural networks, ensembles) could yield better classification results.
- Challenging Preprocessing: Our deliberately complex preprocessing—featuring outliers, correlated features, and uneven scaling—complicates optimization. Future work could explore robust or adaptive preprocessing techniques to preserve difficulty while improving performance.
- Adaptive Hyperparameters: Parameters such as the step size control parameter, the target level gap, and the minimum step size could be tuned dynamically during training using adaptive schemes or meta-learning, enhancing both convergence and generalization.
- Optimization Enhancements: Incorporating momentum, variance reduction, or second-order approximations may accelerate convergence and improve final accuracy by better exploiting the problem’s structure.
5. First Scalability Evaluation in a Parallel Setting
6. Conclusions
Author Contributions
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Nedić, A.; Bertsekas, D.P. Incremental subgradient methods for nondifferentiable optimization. SIAM J. Optim. 2001, 12, 109–138. [Google Scholar] [CrossRef]
- Hiriart-Urruty, J.-B.; Lemaréchal, C. Convex Analysis and Minimization Algorithms II; Springer: New York, NY, USA, 1993. [Google Scholar]
- Bertsekas, D.P. Convex Optimization Algorithms; Athena Scientific: Nashua, NH, USA, 2015. [Google Scholar]
- Polyak, B.T. Introduction to Optimization; Optimization Software, Inc.: New York, NY, USA, 1987. [Google Scholar]
- Liu, J.; Wright, S.J.; Ré, C.; Bittorf, V.; Sridhar, S. An asynchronous parallel stochastic coordinate descent algorithm. In Proceedings of the ICML 2014: 31st International Conference on Machine Learning, Beijing, China, 21–26 June 2014. [Google Scholar]
- Pan, T.Y.; Yang, G.; Zhao, J.; Ding, J. Smoothing Piecewise Linear Activation Functions Based on Mollified Square Root Functions. Math. Found. Comput. 2023, 7, 578–601. [Google Scholar] [CrossRef]
- Azimov, A.; Gasimov, R. On weak conjugacy, weak subdifferentials and duality with zero gap in nonconvex optimization. Int. J. Appl. Math. 1999, 1, 171–192. [Google Scholar]
- Dinc Yalcin, G.; Kasimbeyli, R. Weak subgradient method for solving nonsmooth nonconvex optimization problems. Optimization 2021, 70, 1513–1553. [Google Scholar] [CrossRef]
- Kasimbeyli, R.; Mammadov, M. On weak subdifferentials, directional derivatives, and radial epiderivatives for nonconvex functions. SIAM J. Optim. 2009, 20, 841–855. [Google Scholar] [CrossRef]
- Gaivoronski, A.A. Convergence analysis of parallel backpropagation algorithm for neural network. Optim. Methods Softw. 1994, 4, 117–134. [Google Scholar] [CrossRef]
- Grippo, L. A class of unconstrained minimization methods for neural network training. Optim. Methods Softw. 1994, 4, 135–150. [Google Scholar] [CrossRef]
- Bertsekas, D.P. A new class of incremental gradient methods for least squares problems. SIAM J. Optim. 1997, 7, 913–926. [Google Scholar] [CrossRef]
- Solodov, M.V.; Zavriev, S.K. Incremental gradient algorithms with stepsizes bounded away from zero. Comput. Optim. Appl. 1998, 11, 23–35. [Google Scholar] [CrossRef]
- Tseng, P. An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM J. Optim. 1998, 8, 506–531. [Google Scholar] [CrossRef]
- Shor, N.Z. Minimization Methods for Nondifferentiable Functions; Naukova Dumka: Kiev, Ukraine, 1979. [Google Scholar]
- Bertsekas, D.P. Nonlinear Programming. J. Oper. Res. Soc. 1997, 48, 334. [Google Scholar] [CrossRef]
- Goffin, J.L.; Kiwiel, K.C. Convergence of a simple subgradient level method. Math. Program. 1999, 85, 207–211. [Google Scholar] [CrossRef]
- Yang, D.; Wang, X. Incremental subgradient algorithms with dynamic step sizes for separable convex optimizations. Math. Methods Appl. Sci. 2023, 46, 7108–7124. [Google Scholar] [CrossRef]
- Wang, X.M. Subgradient algorithms on Riemannian manifolds of lower bounded curvatures. Optimization 2018, 67, 1–16. [Google Scholar] [CrossRef]
- Nesterov, Y. Primal-dual subgradient methods for convex problems. Math. Program. 2009, 120, 221–259. [Google Scholar] [CrossRef]
- Yao, Y.H.; Naseer, S.; Yao, J.C. Projected subgradient algorithms for pseudomonotone equilibrium problems and fixed points of pseudocontractive operators. Mathematics 2020, 8, 461. [Google Scholar] [CrossRef]
- Azimov, A.Y.; Gasimov, R.N. Stability and duality of nonconvex problems via augmented Lagrangian. Cybern. Syst. Anal. 2002, 38, 412–421. [Google Scholar] [CrossRef]
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. In Proceedings of the 3rd International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
- Duchi, J.; Hazan, E.; Singer, Y. Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 2011, 12, 2121–2159. [Google Scholar]
- Street, W.N.; Wolberg, W.H.; Mangasarian, O.L. Nuclear feature extraction for breast tumor diagnosis. In Proceedings of the IS&T/SPIE’s Symposium on Electronic Imaging: Science and Technology, San Jose, CA, USA, 31 January–5 February 1993; pp. 861–870. [Google Scholar]
- Fan, J.; Li, R. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 2001, 96, 1348–1360. [Google Scholar] [CrossRef]
- Wu, Z.; Xie, G.; Ge, Z.; De Simone, V. Nonconvex multi-period mean-variance portfolio optimization. Ann. Oper. Res. 2024, 332, 617–644. [Google Scholar] [CrossRef]
Problem | Method | Time [s] |
---|---|---|
Rastrigin | Constant | 0.061 |
 | Diminishing | 0.037 |
 | Dynamic | 0.036 |
 | Adam | 15.470 |
 | Adagrad | 16.378 |
 | Fminsearch | 1.267 |
 | Patternsearch | 22.766 |
Rosenbrock | Constant | 0.044 |
 | Diminishing | 0.021 |
 | Dynamic | 0.026 |
 | Adam | 3.056 |
 | Adagrad | 3.053 |
 | Fminsearch | 1.386 |
 | Patternsearch | 10.194 |
SCAD | Constant | 0.026 |
 | Diminishing | 0.020 |
 | Dynamic | 0.015 |
 | Adam | 2.146 |
 | Adagrad | 2.140 |
 | Fminsearch | 2.135 |
 | Patternsearch | 12.383 |
Method | Accuracy (%) | Conv. Time (s) | Rel. Improvement (%) |
---|---|---|---|
Case 1 | 59.64 | 1.43 | – |
Case 2 | 53.22 | 1.87 | −10.76 |
Case 3 | 71.93 | 1.21 | +17.84 |
Cores | Problem Dimension (n) | Execution Time [s] | Time per Variable [ms] |
---|---|---|---|
1 | 500 | 1.1911 | 2.38 |
2 | 1000 | 1.2654 | 1.27 |
4 | 2000 | 1.4477 | 0.72 |
8 | 4000 | 2.1513 | 0.54 |
12 | 6000 | 2.4738 | 0.41 |