Gradient Methods for Optimization

A special issue of Algorithms (ISSN 1999-4893). This special issue belongs to the section "Combinatorial Optimization, Graph, and Network Algorithms".

Deadline for manuscript submissions: closed (28 February 2023) | Viewed by 12964

Special Issue Editor


Dr. Zebang Shen
Guest Editor
Institute for Machine Learning, ETH Zurich, 8092 Zürich, Switzerland
Interests: optimal transport; optimization; machine learning

Special Issue Information

Dear Colleagues,

Because machine learning tasks are often cast as high-dimensional minimization problems that must be solved accurately and efficiently, optimization has become a cornerstone of modern AI research.

Among numerous competitors, gradient-based algorithms are the most successful, both theoretically and empirically, owing to their low per-iteration cost and fast convergence rates. Notable examples include gradient descent and its momentum-accelerated variants for convex optimization, the Frank–Wolfe algorithm and the Alternating Direction Method of Multipliers (ADMM) for constrained optimization, and stochastic gradient descent for non-convex optimization, to name just a few.
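To make the first of these concrete, here is a minimal, self-contained sketch of gradient descent with heavy-ball (Polyak) momentum on a toy quadratic; the objective, step size, and momentum coefficient are illustrative choices, not recommendations.

```python
import numpy as np

def momentum_gd(grad, x0, lr=0.1, beta=0.9, iters=500):
    """Gradient descent with heavy-ball (Polyak) momentum; illustrative settings only."""
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)
    for _ in range(iters):
        v = beta * v - lr * grad(x)   # accumulate a velocity term
        x = x + v                     # move along the velocity
    return x

# Toy strongly convex quadratic: f(x) = 0.5 x^T A x - b^T x, so grad f(x) = A x - b.
A = np.array([[3.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -2.0])
x_hat = momentum_gd(lambda x: A @ x - b, x0=np.zeros(2))
print(x_hat, np.linalg.solve(A, b))   # the two should agree to several digits
```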

Despite the huge success reported in previous studies, many fundamental problems remain open, such as the algorithmic bias of stochastic gradient methods, which requires further detailed investigation.
Moreover, emerging AI tasks have opened up new fields of optimization research: federated learning imposes novel privacy constraints on the optimization procedure, while the training of generative adversarial networks requires lifting the optimization domain to the more abstract probability manifold and benefits significantly from a better understanding of the more intricate min–max optimization.

We invite you to submit high-quality papers to the Special Issue on “Gradient Methods for Optimization”, with subjects covering the whole range from theory to algorithms. The following is a (non-exhaustive) list of topics of interest:

  1. Optimization methods and theories for convex, submodular, and non-convex problems.
  2. Optimization in more abstract domains such as the probability manifold.
  3. Optimization for min–max problems.
  4. Optimization methods for federated learning.

Dr. Zebang Shen
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Algorithms is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (6 papers)


Research

12 pages, 747 KiB  
Article
Extrinsic Bayesian Optimization on Manifolds
by Yihao Fang, Mu Niu, Pokman Cheung and Lizhen Lin
Algorithms 2023, 16(2), 117; https://doi.org/10.3390/a16020117 - 15 Feb 2023
Viewed by 1454
Abstract
We propose an extrinsic Bayesian optimization (eBO) framework for general optimization problems on manifolds. Bayesian optimization algorithms build a surrogate of the objective function by employing Gaussian processes and utilizing the uncertainty in that surrogate by deriving an acquisition function. This acquisition function represents the probability of improvement based on the kernel of the Gaussian process, which guides the search in the optimization process. The critical challenge for designing Bayesian optimization algorithms on manifolds lies in the difficulty of constructing valid covariance kernels for Gaussian processes on general manifolds. Our approach is to employ extrinsic Gaussian processes by first embedding the manifold onto some higher dimensional Euclidean space via equivariant embeddings and then constructing a valid covariance kernel on the image manifold after the embedding. This leads to efficient and scalable algorithms for optimization over complex manifolds. A simulation study and real data analyses are carried out to demonstrate the utility of our eBO framework by applying the eBO to various optimization problems over manifolds such as the sphere, the Grassmannian, and the manifold of positive definite matrices.
(This article belongs to the Special Issue Gradient Methods for Optimization)
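The key construction described in the abstract above, building the Gaussian-process kernel in an ambient Euclidean space after embedding the manifold, can be sketched as follows. The sphere is used because its embedding into R^3 is simply the inclusion map; the kernel choice, length-scale, jitter, and function names (extrinsic_rbf_kernel, gp_posterior) are illustrative assumptions, and the acquisition-function step of Bayesian optimization is omitted.

```python
import numpy as np

def extrinsic_rbf_kernel(X, Y, lengthscale=0.5):
    """Squared-exponential kernel evaluated on the embedded (ambient-space) points."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior(X_obs, y_obs, X_new, jitter=1e-6):
    """Standard GP regression formulas built on the extrinsic kernel above."""
    K = extrinsic_rbf_kernel(X_obs, X_obs) + jitter * np.eye(len(X_obs))
    K_s = extrinsic_rbf_kernel(X_obs, X_new)
    alpha = np.linalg.solve(K, y_obs)
    mean = K_s.T @ alpha
    var = 1.0 - np.einsum("ij,ij->j", K_s, np.linalg.solve(K, K_s))
    return mean, np.maximum(var, 0.0)

# Points on the unit sphere: the equivariant embedding into R^3 is just the inclusion.
rng = np.random.default_rng(0)
X_obs = rng.normal(size=(20, 3)); X_obs /= np.linalg.norm(X_obs, axis=1, keepdims=True)
y_obs = X_obs[:, 2]                        # toy objective: the height function on the sphere
X_new = rng.normal(size=(5, 3)); X_new /= np.linalg.norm(X_new, axis=1, keepdims=True)
mean, var = gp_posterior(X_obs, y_obs, X_new)
print(mean, var)   # the surrogate's predictions would feed an acquisition function
```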

25 pages, 620 KiB  
Article
Personalized Federated Multi-Task Learning over Wireless Fading Channels
by Matin Mortaheb, Cemil Vahapoglu and Sennur Ulukus
Algorithms 2022, 15(11), 421; https://doi.org/10.3390/a15110421 - 09 Nov 2022
Cited by 4 | Viewed by 2212
Abstract
Multi-task learning (MTL) is a paradigm to learn multiple tasks simultaneously by utilizing a shared network, in which a distinct header network is further tailored for fine-tuning for each distinct task. Personalized federated learning (PFL) can be achieved through MTL in the context of federated learning (FL) where tasks are distributed across clients, referred to as personalized federated MTL (PF-MTL). Statistical heterogeneity caused by differences in the task complexities across clients and the non-identically independently distributed (non-i.i.d.) characteristics of local datasets degrades the system performance. To overcome this degradation, we propose FedGradNorm, a distributed dynamic weighting algorithm that balances learning speeds across tasks by normalizing the corresponding gradient norms in PF-MTL. We prove an exponential convergence rate for FedGradNorm. Further, we propose HOTA-FedGradNorm by utilizing over-the-air aggregation (OTA) with FedGradNorm in a hierarchical FL (HFL) setting. HOTA-FedGradNorm is designed to have efficient communication between the parameter server (PS) and clients in the power- and bandwidth-limited regime. We conduct experiments with both FedGradNorm and HOTA-FedGradNorm using MT facial landmark (MTFL) and wireless communication system (RadComDynamic) datasets. The results indicate that both frameworks are capable of achieving a faster training performance compared to equal-weighting strategies. In addition, FedGradNorm and HOTA-FedGradNorm compensate for imbalanced datasets across clients and adverse channel effects.
(This article belongs to the Special Issue Gradient Methods for Optimization)
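As a rough illustration of the gradient-norm balancing idea mentioned in the abstract, the sketch below nudges per-task loss weights toward a common shared-layer gradient norm. It is a deliberately simplified, centralized caricature, not the FedGradNorm update itself; the exponent alpha, the normalization, the function name, and the example norms are assumptions.

```python
import numpy as np

def rebalance_task_weights(weights, grad_norms, alpha=0.5):
    """Nudge per-task loss weights so that tasks whose shared-layer gradients are
    unusually large get down-weighted and slow tasks get up-weighted."""
    grad_norms = np.asarray(grad_norms, dtype=float)
    target = grad_norms.mean()                        # common target gradient norm
    weights = weights * (target / (grad_norms + 1e-12)) ** alpha
    return weights * len(weights) / weights.sum()     # keep the weights summing to n_tasks

# Example: task 0 currently dominates the shared-layer gradient.
w = np.ones(3)
for _ in range(3):
    per_task_norms = [5.0, 1.0, 0.5]    # measured gradient norms (made-up numbers)
    w = rebalance_task_weights(w, per_task_norms)
print(w)   # task 0 ends up with the smallest weight, task 2 with the largest
```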

15 pages, 600 KiB  
Article
Fed-DeepONet: Stochastic Gradient-Based Federated Training of Deep Operator Networks
by Christian Moya and Guang Lin
Algorithms 2022, 15(9), 325; https://doi.org/10.3390/a15090325 - 12 Sep 2022
Cited by 3 | Viewed by 2267
Abstract
The Deep Operator Network (DeepONet) framework is a different class of neural network architecture that one trains to learn nonlinear operators, i.e., mappings between infinite-dimensional spaces. Traditionally, DeepONets are trained using a centralized strategy that requires transferring the training data to a centralized location. Such a strategy, however, limits our ability to secure data privacy or use high-performance distributed/parallel computing platforms. To alleviate such limitations, in this paper, we study the federated training of DeepONets for the first time. That is, we develop a framework, which we refer to as Fed-DeepONet, that allows multiple clients to train DeepONets collaboratively under the coordination of a centralized server. To achieve Fed-DeepONets, we propose an efficient stochastic gradient-based algorithm that enables the distributed optimization of the DeepONet parameters by averaging first-order estimates of the DeepONet loss gradient. Then, to accelerate the training convergence of Fed-DeepONets, we propose a moment-enhanced (i.e., adaptive) stochastic gradient-based strategy. Finally, we verify the performance of Fed-DeepONet by learning, for different configurations of the number of clients and fractions of available clients, (i) the solution operator of a gravity pendulum and (ii) the dynamic response of a parametric library of pendulums.
(This article belongs to the Special Issue Gradient Methods for Optimization)
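The federated training loop described above reduces, at its core, to averaging clients' first-order gradient estimates at a central server. The sketch below shows that pattern on a toy least-squares problem rather than a DeepONet; the client data, learning rate, and round count are illustrative assumptions.

```python
import numpy as np

def federated_gradient_step(theta, client_grads, lr):
    """One server round: average the clients' gradient estimates and take a step."""
    return theta - lr * np.mean(client_grads, axis=0)

# Toy setup: each client holds its own local least-squares data (illustrative only).
rng = np.random.default_rng(1)
theta_true = np.array([2.0, -1.0])
clients = []
for _ in range(4):
    X = rng.normal(size=(50, 2))
    clients.append((X, X @ theta_true + 0.01 * rng.normal(size=50)))

theta = np.zeros(2)
for _ in range(300):                                     # communication rounds
    grads = [X.T @ (X @ theta - y) / len(y) for X, y in clients]
    theta = federated_gradient_step(theta, grads, lr=0.1)
print(theta)   # approaches theta_true as rounds accumulate
```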

11 pages, 650 KiB  
Article
Accounting for Round-Off Errors When Using Gradient Minimization Methods
by Dmitry Lukyanenko, Valentin Shinkarev and Anatoly Yagola
Algorithms 2022, 15(9), 324; https://doi.org/10.3390/a15090324 - 09 Sep 2022
Cited by 2 | Viewed by 1817
Abstract
This paper discusses a method for taking into account rounding errors when constructing a stopping criterion for the iterative process in gradient minimization methods. The main aim of this work was to develop methods for improving the quality of the solutions for real applied minimization problems, which require significant amounts of calculations and, as a result, can be sensitive to the accumulation of rounding errors. However, this paper demonstrates that the developed approach can also be useful in solving computationally small problems. The main ideas of this work are demonstrated using one of the possible implementations of the conjugate gradient method for solving an overdetermined system of linear algebraic equations with a dense matrix.
(This article belongs to the Special Issue Gradient Methods for Optimization)
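For orientation, the sketch below applies the conjugate gradient method to the normal equations of a dense overdetermined system and stops once the residual reaches a machine-epsilon-based floor. The floor used here is a generic heuristic assumption, not the stopping criterion developed in the paper.

```python
import numpy as np

def cg_normal_equations(A, b, max_iter=1000):
    """Conjugate gradient on the normal equations A^T A x = A^T b of an
    overdetermined system, stopped once the residual reaches a round-off floor
    (a generic eps-based floor, not the criterion developed in the paper)."""
    x = np.zeros(A.shape[1])
    r = A.T @ b - A.T @ (A @ x)        # residual of the normal equations
    p = r.copy()
    # Below this level the residual is dominated by accumulated rounding errors.
    floor = np.finfo(float).eps * np.linalg.norm(A, "fro") ** 2 * np.linalg.norm(b)
    for _ in range(max_iter):
        Ap = A.T @ (A @ p)
        alpha = (r @ r) / (p @ Ap)
        x = x + alpha * p
        r_new = r - alpha * Ap
        if np.linalg.norm(r_new) <= floor:
            break
        beta = (r_new @ r_new) / (r @ r)
        p = r_new + beta * p
        r = r_new
    return x

rng = np.random.default_rng(2)
A = rng.normal(size=(200, 20))          # dense, overdetermined system
x_true = rng.normal(size=20)
b = A @ x_true
print(np.linalg.norm(cg_normal_equations(A, b) - x_true))   # should be tiny
```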

22 pages, 939 KiB  
Article
Federated Optimization of ℓ0-norm Regularized Sparse Learning
by Qianqian Tong, Guannan Liang, Jiahao Ding, Tan Zhu, Miao Pan and Jinbo Bi
Algorithms 2022, 15(9), 319; https://doi.org/10.3390/a15090319 - 06 Sep 2022
Viewed by 2006
Abstract
Regularized sparse learning with the ℓ0-norm is important in many areas, including statistical learning and signal processing. Iterative hard thresholding (IHT) methods are the state-of-the-art for nonconvex-constrained sparse learning due to their capability of recovering true support and scalability with large datasets. The current theoretical analysis of IHT assumes the use of centralized IID data. In realistic large-scale scenarios, however, data are distributed, seldom IID, and private to edge computing devices at the local level. Consequently, it is required to study the property of IHT in a federated environment, where local devices update the sparse model individually and communicate with a central server for aggregation infrequently without sharing local data. In this paper, we propose the first group of federated IHT methods: Federated Hard Thresholding (Fed-HT) and Federated Iterative Hard Thresholding (FedIter-HT) with theoretical guarantees. We prove that both algorithms have a linear convergence rate and guarantee for recovering the optimal sparse estimator, which is comparable to classic IHT methods, but with decentralized, non-IID, and unbalanced data. Empirical results demonstrate that the Fed-HT and FedIter-HT outperform their competitor, a distributed IHT, in terms of reducing objective values with fewer communication rounds and bandwidth requirements.
(This article belongs to the Special Issue Gradient Methods for Optimization)
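A schematic version of federated iterative hard thresholding is sketched below: each client takes thresholded gradient steps on its local data, and the server averages the local models and re-thresholds. This is a simplified sketch and not the exact Fed-HT or FedIter-HT updates from the paper; the sparsity level, step size, and synthetic data are assumptions.

```python
import numpy as np

def hard_threshold(v, k):
    """Keep the k largest-magnitude entries of v and zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def federated_iht(clients, k, rounds=50, local_steps=5, lr=0.2):
    """Schematic federated IHT: each client runs thresholded gradient steps locally,
    then the server averages the local models and thresholds again."""
    dim = clients[0][0].shape[1]
    theta = np.zeros(dim)
    for _ in range(rounds):
        local_models = []
        for X, y in clients:
            w = theta.copy()
            for _ in range(local_steps):
                grad = X.T @ (X @ w - y) / len(y)
                w = hard_threshold(w - lr * grad, k)
            local_models.append(w)
        theta = hard_threshold(np.mean(local_models, axis=0), k)
    return theta

# Synthetic sparse regression split across 5 clients (illustrative data only).
rng = np.random.default_rng(3)
theta_true = np.zeros(50)
theta_true[[3, 17, 42]] = [1.5, -2.0, 1.0]
clients = []
for _ in range(5):
    X = rng.normal(size=(100, 50))
    clients.append((X, X @ theta_true + 0.01 * rng.normal(size=100)))
print(np.nonzero(federated_iht(clients, k=3))[0])   # expected support: [3, 17, 42]
```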

31 pages, 793 KiB  
Article
ZenoPS: A Distributed Learning System Integrating Communication Efficiency and Security
by Cong Xie, Oluwasanmi Koyejo and Indranil Gupta
Algorithms 2022, 15(7), 233; https://doi.org/10.3390/a15070233 - 01 Jul 2022
Cited by 2 | Viewed by 2073
Abstract
Distributed machine learning is primarily motivated by the promise of increased computation power for accelerating training and mitigating privacy concerns. Unlike machine learning on a single device, distributed machine learning requires collaboration and communication among the devices. This creates several new challenges: (1) the heavy communication overhead can be a bottleneck that slows down the training, and (2) the unreliable communication and weaker control over the remote entities make the distributed system vulnerable to systematic failures and malicious attacks. This paper presents a variant of stochastic gradient descent (SGD) with improved communication efficiency and security in distributed environments. Our contributions include (1) a new technique called error reset to adapt both infrequent synchronization and message compression for communication reduction in both synchronous and asynchronous training, (2) new score-based approaches for validating the updates, and (3) integration with both error reset and score-based validation. The proposed system provides communication reduction, both synchronous and asynchronous training, Byzantine tolerance, and local privacy preservation. We evaluate our techniques both theoretically and empirically.
(This article belongs to the Special Issue Gradient Methods for Optimization)
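The communication-reduction side of the system described above can be illustrated with generic top-k gradient compression plus an error buffer on each worker. Classic error feedback is shown here; the paper's error reset is a related but distinct mechanism, and the class name, parameters, and toy objective below are illustrative assumptions.

```python
import numpy as np

def topk_compress(v, k):
    """Transmit only the k largest-magnitude coordinates; the rest are dropped."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

class Worker:
    """A worker with an error buffer: whatever compression drops is folded back
    into the next message (classic error feedback)."""
    def __init__(self, dim):
        self.error = np.zeros(dim)

    def compressed_update(self, grad, lr, k):
        corrected = lr * grad + self.error       # add back previously dropped mass
        msg = topk_compress(corrected, k)
        self.error = corrected - msg             # remember what was dropped this time
        return msg

# Toy run: two workers share the quadratic objective with grad f(x) = x - target.
target = np.array([1.0, -2.0, 3.0, 0.5])
workers = [Worker(4), Worker(4)]
x = np.zeros(4)
for _ in range(200):
    msgs = [w.compressed_update(x - target, lr=0.1, k=2) for w in workers]
    x = x - np.mean(msgs, axis=0)                # server applies the averaged update
print(x)   # close to target despite sending only 2 of 4 coordinates per step
```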
