1. Introduction
Optimization is a fundamental concept in various fields, ranging from engineering and economics to machine learning and logistics. In composite optimization, the objective function or the constraints are composed of multiple individual functions or constraints, which makes the problem more complex. Composite optimization problems [1,2,3,4] arise in diverse applications, such as signal processing, image reconstruction, and data analysis. The challenge in composite optimization lies in effectively optimizing these composite functions, often by means of iterative or gradient-based algorithms. Researchers in optimization continually develop new techniques and algorithms to solve composite optimization problems efficiently and to improve decision-making processes in various domains.
Solving composite optimization problems efficiently requires specialized algorithms that can exploit the problem's composite nature. Many popular methods, including the proximal gradient method, the alternating direction method of multipliers (ADMM), block coordinate descent, and primal–dual methods, can be applied to composite optimization problems [5,6,7,8,9,10,11]. The proximal gradient method is a widely used optimization algorithm for solving composite optimization problems. It combines gradient descent with proximal operators to handle composite objective functions in which the objective is the sum of a smooth and a nonsmooth component. By iteratively updating variables with gradient steps and proximal operators, proximal gradient methods can solve composite optimization problems efficiently. For example, Sahu et al. [8] proposed first-order methods, such as the proximal gradient method, that use forward–backward splitting techniques. They also derived convergence rates for the proposed formulations and showed that the speed of convergence of these algorithms is significantly better than that of the traditional forward–backward algorithm. In [9], Parikh and Boyd discussed many different interpretations of proximal operators and algorithms and described their connections to many other topics in optimization and applied mathematics.
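To make the forward–backward structure concrete, the following is a minimal sketch of the classical Euclidean proximal gradient (ISTA-type) iteration for F(x) = f(x) + g(x) with a smooth least-squares term and an l1 penalty. The problem data A, b, the penalty lam, and the fixed step size 1/L are illustrative assumptions, not taken from the paper.

```python
# Euclidean proximal gradient sketch: f(x) = 0.5*||Ax - b||^2 (smooth),
# g(x) = lam*||x||_1 (nonsmooth), prox of g given by soft-thresholding.
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau*||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def proximal_gradient(A, b, lam, num_iters=200):
    L = np.linalg.norm(A, 2) ** 2                    # Lipschitz constant of grad f
    x = np.zeros(A.shape[1])
    for _ in range(num_iters):
        grad = A.T @ (A @ x - b)                     # forward (gradient) step ingredient
        x = soft_threshold(x - grad / L, lam / L)    # backward (proximal) step
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A, b = rng.standard_normal((20, 50)), rng.standard_normal(20)
    x_hat = proximal_gradient(A, b, lam=0.1)
    print("nonzeros:", np.count_nonzero(x_hat))
```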
On the other hand, extending theoretical results and methods for optimization problems from Euclidean spaces to Riemannian manifolds has attracted significant attention in recent years; see, e.g., [12,13,14,15,16,17,18,19,20,21]. For instance, Riemannian proximal gradient methods have been developed in [22,23], which utilize the Riemannian metric and curvature information to iteratively minimize the objective function. In [22], Bento et al. presented the proximal point method for finding minima of a special class of nonconvex functions on Hadamard manifolds. The well-definedness of the sequence generated by the proximal point method was established, and its convergence to a minimizer was obtained. Based on the results of [22], Feng et al. proposed a monotone proximal gradient algorithm with a fixed step size on Hadamard manifolds in [23]. They also established a convergence theorem for the proposed method under a reasonable definition of the proximal gradient mapping on manifolds. Transferring algorithms from Euclidean spaces to Riemannian manifolds has several advantages: first, Riemannian manifolds capture the underlying geometry of the data space, allowing algorithms to leverage this structure for more accurate and efficient computations; second, Riemannian manifolds can model nonlinear and curved data more effectively than Euclidean spaces, enabling algorithms to better represent and process complex data distributions; third, algorithms designed for Riemannian manifolds are often better suited to curved surfaces or non-Euclidean spaces, leading to improved performance and results.
Composite optimization problems on Riemannian manifolds pose a unique computational challenge due to the non-Euclidean geometry of the manifold. In such problems, the objective function is composed of several terms defined on the Riemannian manifold, which requires specialized optimization techniques that respect the manifold structure. In this paper, we propose the proximal gradient method for composite optimization problems on Riemannian manifolds. The proximal gradient method on Riemannian manifolds performs a gradient descent step and a proximal operator step. For the gradient descent step on Riemannian manifolds, the algorithm computes the gradient of the smooth function with respect to the Riemannian metric at the current iteration. This gradient is then used to update the iteration in the direction that minimizes the smooth component of the objective function on Riemannian manifolds. After the gradient descent step, the algorithm applies a proximal operator to the current iteration on Riemannian manifolds. The proximal operator is a mapping that projects the updated iteration onto the set of feasible points on the manifold, taking into account the nonsmooth component of the objective function. By iteratively alternating between these two steps, the proximal gradient method aims to efficiently minimize the composite objective function on the Riemannian manifold while respecting the manifold’s geometry and constraints. This approach is particularly useful for optimization problems in machine learning, computer vision, and other fields where data lie on complex geometric structures.
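The two-step structure described above can be sketched schematically as follows. The helper names riemannian_grad, prox_tangent, and retract are placeholders to be supplied for a concrete manifold; they are illustrative assumptions, not the paper's notation.

```python
# Schematic sketch of one Riemannian proximal gradient iteration:
# a gradient step in the tangent space, a proximal correction, and a retraction back to M.
def proximal_gradient_step(x, riemannian_grad, prox_tangent, retract, L):
    """One forward-backward step on a manifold M.

    x               : current point on M
    riemannian_grad : x -> grad f(x), an element of the tangent space T_x M
    prox_tangent    : (x, v, step) -> proximal correction applied in T_x M
    retract         : (x, eta) -> R_x(eta), a retraction onto M
    L               : proximal parameter (e.g., a Lipschitz-type constant of f)
    """
    v = -riemannian_grad(x) / L          # forward step direction in T_x M
    eta = prox_tangent(x, v, 1.0 / L)    # backward (proximal) correction in T_x M
    return retract(x, eta)               # map the update back onto M
```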
The proximal gradient method is a powerful optimization algorithm commonly used to solve composite optimization problems in Euclidean spaces. However, extending this method to Riemannian manifolds involves several challenges due to the non-Euclidean geometry of these spaces. One approach is the Riemannian proximal gradient method, which combines ideas from optimization theory with Riemannian geometry to solve composite optimization problems on Riemannian manifolds efficiently. Since a Riemannian manifold, in general, does not have a linear structure, the usual techniques of Euclidean spaces cannot be applied directly, and new techniques have to be proposed. Our results contribute as follows. First, the proximal gradient method for composite optimization problems is introduced on Riemannian manifolds, and its convergence results are established, which generalizes some algorithmic results of [8,9] from Euclidean spaces to Riemannian manifolds. Second, some global convergence results of the proximal gradient method for composite optimization problems are proved using a backtracking procedure, which makes the algorithm more efficient. Furthermore, a sublinear convergence rate of the generated sequence of function values to the optimal value is established on Riemannian manifolds, and a complexity result of the proximal gradient method for the convex case is also obtained. Third, since the computation of exponential mappings and parallel transports can be quite expensive on manifolds, and many convergence results show that the nice properties of some algorithms hold for all suitably defined retractions and general vector transports on Riemannian manifolds [22,23], geodesics and parallel transports are replaced by retractions and general vector transports, respectively.
This work is organized as follows. In Section 2, some necessary definitions and concepts are provided for Riemannian manifolds. In Section 3, the proximal gradient method for composite optimization problems is presented for Riemannian manifolds. In Section 4, under some reasonable conditions, some convergence results of the proximal gradient method for composite optimization problems are provided for Riemannian manifolds.
2. Preliminaries
In this section, some standard definitions and results from Riemannian manifolds are recalled, which can be found in some introductory books on Riemannian geometry; see, for example, [24,25].
Let M be a finite-dimensional differentiable manifold and $x \in M$. The tangent space of M at x is denoted by $T_xM$ and the tangent bundle of M by $TM = \bigcup_{x \in M} T_xM$. Let $\langle \cdot, \cdot \rangle_x$ denote the inner product on $T_xM$ with the associated norm $\|\cdot\|_x$. If there is no confusion, then the subscript x is omitted. If M is endowed with a Riemannian metric g, then M is a Riemannian manifold. Given a piecewise smooth curve $\gamma : [a, b] \to M$ joining x to y, that is, $\gamma(a) = x$ and $\gamma(b) = y$, the length of $\gamma$ can be defined by $l(\gamma) = \int_a^b \|\gamma'(t)\|\,dt$. Minimizing this length functional over the set of all such curves, a Riemannian distance $d(x, y)$, which induces the original topology on M, is obtained.
A Riemannian manifold is complete if, for any $x \in M$, all geodesics emanating from x are defined for all $t \in \mathbb{R}$. By the Hopf–Rinow theorem [13], any pair of points $x, y \in M$ can be joined by a minimal geodesic. The exponential mapping $\exp_x : T_xM \to M$ is defined by $\exp_x v = \gamma_v(1)$ for each $v \in T_xM$, where $\gamma_v$ is the geodesic starting at x with velocity v, that is, $\gamma_v(0) = x$ and $\gamma_v'(0) = v$. It is easy to see that $\exp_x tv = \gamma_v(t)$ for each real number t.
The exponential mapping provides a local parametrization of M via $T_xM$. However, the systematic use of the exponential mapping may not be desirable in all cases. Some local mappings from $T_xM$ to M may reduce the computational cost while preserving the useful convergence properties of the considered method.
Definition 1 ([21]).
Given $x \in M$, a retraction is a smooth mapping $R_x : T_xM \to M$, such that
- (i) $R_x(0_x) = x$ for all $x \in M$, where $0_x$ denotes the zero element of $T_xM$;
- (ii) $DR_x(0_x) = \mathrm{id}_{T_xM}$, where $DR_x(0_x)$ denotes the derivative of $R_x$ at $0_x$, and id denotes the identity mapping.
It is well-known that the exponential mapping is a special retraction, and some retractions are approximations of the exponential mapping.
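As a small illustration of this relationship, the sketch below compares the exponential mapping and the simple projective retraction on the unit sphere, viewed as a submanifold of Euclidean space. Both formulas are the standard textbook ones; the sphere is used only as an illustrative manifold.

```python
# Exponential mapping vs. a retraction on the unit sphere S^{n-1} in R^n.
import numpy as np

def exp_sphere(x, v):
    """Exponential mapping: follow the great circle from x with tangent velocity v."""
    nv = np.linalg.norm(v)
    if nv < 1e-16:
        return x
    return np.cos(nv) * x + np.sin(nv) * (v / nv)

def retract_sphere(x, v):
    """Projective retraction: move in the ambient space, then renormalize back to the sphere."""
    y = x + v
    return y / np.linalg.norm(y)

if __name__ == "__main__":
    x = np.array([1.0, 0.0, 0.0])
    v = np.array([0.0, 0.1, 0.0])                      # a tangent vector at x (orthogonal to x)
    print(exp_sphere(x, v), retract_sphere(x, v))      # nearly identical for small v
```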
The parallel transport is often too expensive to compute in a practical method, so a more general vector transport can be considered, see, for example, [14,15], which is built upon the retraction R. A vector transport $\mathcal{T} : TM \oplus TM \to TM$, $(\eta_x, \xi_x) \mapsto \mathcal{T}_{\eta_x}\xi_x$, with the associated retraction R is a smooth mapping, such that, for all $(x, \eta_x)$ in the domain of R and all $\xi_x, \zeta_x \in T_xM$, (i) $\mathcal{T}_{\eta_x}\xi_x \in T_{R_x(\eta_x)}M$; (ii) $\mathcal{T}_{0_x}\xi_x = \xi_x$; and (iii) $\mathcal{T}_{\eta_x}$ is a linear mapping. Let $\mathcal{T}_S$ denote the isometric vector transport (see, for example, [15,20]) with R as the associated retraction. Then, it satisfies (i), (ii), (iii), and
$\langle \mathcal{T}_{S_{\eta_x}}\xi_x, \mathcal{T}_{S_{\eta_x}}\zeta_x \rangle_{R_x(\eta_x)} = \langle \xi_x, \zeta_x \rangle_x.$
In most practical cases, $\mathcal{T}_{\eta_x}$ exists for all $\eta_x \in T_xM$, and this assumption is made throughout the paper. Furthermore, let $\mathcal{T}_R$ denote the derivative of the retraction, i.e.,
$\mathcal{T}_{R_{\eta_x}}\xi_x = DR_x(\eta_x)[\xi_x] = \frac{d}{dt}R_x(\eta_x + t\xi_x)\Big|_{t=0}.$
Let $\mathcal{L}(M)$ denote a fiber bundle with base space $M \times M$, such that the fiber over $(x, y) \in M \times M$ is $\mathcal{L}(x, y)$, the set of all linear mappings from $T_xM$ to $T_yM$. From [21], it follows that a transporter $\mathcal{L}$ on M is a smooth section of the bundle $\mathcal{L}(M)$. Furthermore, $\mathcal{L}(x, y) \in \mathcal{L}(x, y)$, and $\mathcal{L}(x, x) = \mathrm{id}_{T_xM}$. Given a retraction R, for any $x \in M$ and $\eta_x \in T_xM$, the isometric vector transport $\mathcal{T}_S$ can be defined by
$\mathcal{T}_{S_{\eta_x}}\xi_x = \mathcal{L}\big(x, R_x(\eta_x)\big)\,\xi_x.$
In this paper, from the locking condition proposed by Huang [20],
$\mathcal{T}_{S_{\eta_x}}\eta_x = \beta_{\eta_x}\,\mathcal{T}_{R_{\eta_x}}\eta_x, \qquad \beta_{\eta_x} = \frac{\|\eta_x\|}{\|\mathcal{T}_{R_{\eta_x}}\eta_x\|},$
with $\beta_{\eta_x} = 1$ is required. In some manifolds, there exist retractions such that the above equality holds, e.g., the Stiefel manifold and the Grassmann manifold [20]. Furthermore, from the above results, it follows that $\mathcal{T}_{S_{0_x}} = \mathcal{T}_{R_{0_x}} = \mathrm{id}_{T_xM}$.
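For intuition, the sketch below shows, on the unit sphere with the projective retraction, two common transports: the transport by projection onto the new tangent space, and the differentiated retraction $\mathcal{T}_R$. Neither is claimed to be the particular isometric transport satisfying the locking condition above; they only illustrate the definitions, and all formulas are the standard ones for the sphere.

```python
# Vector transports on the unit sphere built on R_x(eta) = (x + eta)/||x + eta||.
import numpy as np

def retract_sphere(x, eta):
    y = x + eta
    return y / np.linalg.norm(y)

def transport_by_projection(x, eta, xi):
    """Transport xi in T_x S^{n-1} by projecting it onto the tangent space at y = R_x(eta)."""
    y = retract_sphere(x, eta)
    return xi - y * (y @ xi)

def transport_differentiated_retraction(x, eta, xi):
    """T_R: the derivative of the retraction, D R_x(eta)[xi]."""
    y = retract_sphere(x, eta)
    return (xi - y * (y @ xi)) / np.linalg.norm(x + eta)

if __name__ == "__main__":
    x = np.array([0.0, 0.0, 1.0])
    eta = np.array([0.2, 0.0, 0.0])        # a tangent vector at x
    xi = np.array([0.0, 0.3, 0.0])         # another tangent vector at x
    print(transport_by_projection(x, eta, xi))
    print(transport_differentiated_retraction(x, eta, xi))
```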
3. The Proximal Gradient Method
In this paper, the following composite optimization problem is studied on Riemannian manifolds:
$\min_{x \in M} \; F(x) := f(x) + g(x).$
Suppose the following assumption holds.
Assumption 1. - (i)
is proper, closed, and convex;
- (ii)
is proper and closed, is convex, , and f is —smooth over , that is, for any for some ;
- (iii)
The optimal set of (1) is nonempty and denoted by , and the optimal value of problem (1) is denoted by .
Remark 1. There are some special cases of problem (1).
- (i)
If $g \equiv 0$ and $\mathrm{dom}(f) = M$, then (1) reduces to the unconstrained smooth minimization problem on Riemannian manifolds, where f is an $L_f$-smooth function on Riemannian manifolds.
- (ii)
If $g = \delta_C$, the indicator function of C, where C is a nonempty closed and convex set on M, then (1) reduces to the problem of minimizing a differentiable function over a nonempty closed and convex set on Riemannian manifolds.
Let $\hat f = f \circ R$ and $\hat g = g \circ R$ denote the pullback of f and g through R, respectively. For any $x \in M$, let $\hat f_x = f \circ R_x$ and $\hat g_x = g \circ R_x$ denote the restriction of $\hat f$ and $\hat g$ to $T_xM$. From [21], it follows that
$\mathrm{grad}\,\hat f_x(0_x) = \mathrm{grad}\,f(x).$
For problem (1), it is natural to define the following iteration:
After some simple manipulation, (4) can be rewritten as
which, by the definition of the proximal operator, is the same as
Now, the proximal gradient method for composite optimization problems is introduced for Riemannian manifolds.
Let
. Then, the general update step of the proximal gradient method can be written as
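For concreteness, one standard way to write this update, using the pullback formulation common in the Riemannian proximal gradient literature [22,23], is sketched below; the precise display used in this paper may differ in notation and constants.

```latex
% A sketch of the general update step, assuming the tangent-space subproblem form
% used by Riemannian proximal gradient methods (the exact formulation is an assumption):
\eta_k \in \operatorname*{arg\,min}_{\eta \in T_{x_k}M}
  \Big\{ \big\langle \operatorname{grad} f(x_k), \eta \big\rangle_{x_k}
       + \tfrac{L_k}{2}\,\|\eta\|_{x_k}^2
       + \hat g_{x_k}(\eta) \Big\},
\qquad
x_{k+1} = R_{x_k}(\eta_k),
\qquad \hat g_{x_k} = g \circ R_{x_k}.
```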
Lemma 1 ([11]).
Let f be an L-smooth function over a given convex set D. Then, for any $x, y \in D$,
$f(y) \le f(x) + \langle \nabla f(x), y - x \rangle + \frac{L}{2}\|y - x\|^2.$
Lemma 2 ([11]).
Let f be a proper, closed, and convex function. Then, for any $x, u$, the following three claims are equivalent:
- (i) $u = \mathrm{prox}_f(x)$;
- (ii) $x - u \in \partial f(u)$;
- (iii) $\langle x - u, y - u \rangle \le f(y) - f(u)$ for any y.
Lemma 3. Suppose that f and g satisfy Assumption 1. Let . Then, for any and , the following inequality holds:where is the operator defined by Proof. Using the notation
. By Lemma 1, it follows that
By Lemma 2, since
, it follows that
which implies that
which, together with (6), implies that
Therefore,
□
Definition 2. Suppose that f and g satisfy Assumption 1. Then, for any , the gradient mapping is the operator defined by The update step of the proximal gradient method can be rewritten as
and
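For orientation, the Euclidean gradient mapping from [11], which Definition 2 generalizes to manifolds, is recalled below; the display is the Euclidean formula, not the Riemannian one.

```latex
% Euclidean gradient mapping (see [11]); T_L denotes the Euclidean prox-grad operator.
G_L(x) \;=\; L\big(x - T_L(x)\big),
\qquad
T_L(x) \;=\; \operatorname{prox}_{\frac{1}{L} g}\!\Big(x - \tfrac{1}{L}\nabla f(x)\Big).
```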
Theorem 1. Let f and g satisfy Assumption 1, and let . Then,
- (i)
for all , where ;
- (ii)
For , it holds that if and only if is a stationary point of problem (1).
Proof. It follows that
(ii)
if and only if
; from Lemma 2, the latter relation holds if and only if
that is,
This implies that
is a stationary point of
. Then,
is a stationary point of problem (1). □
Next, the monotonicity property with respect to the parameter L is obtained for Riemannian manifolds.
Theorem 2. Suppose that f and g satisfy Assumption 1, and . Then, for any , it holds that Proof. For any
and
, from Lemma 2, the following inequality holds:
Plugging
, and
into the last inequality, it follows that
or
Exchanging the roles of
and
yields the following inequality:
Multiplying the first inequality by
and the second by
and adding them, it follows that
That is,
Note that if
, then, by (8),
. Assume that
, and define
. Then, by (8), it follows that
This implies that
Therefore,
□
4. The Convergence Result
4.1. The Non-Convex Case
In this section, the convergence of the proximal gradient method is analyzed for Riemannian manifolds. Now, the backtracking procedure B1 is considered as follows.
The procedure requires three parameters , where , and . The choice of is performed as follows.
First,
is set to be equal to the initial
s. Then, while
we set
. In other words,
is chosen as
, where
is the smallest non-negative integer for which the condition
is satisfied.
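The sketch below illustrates such a backtracking choice of the parameter, assuming the three parameters of B1 are an initial value s > 0, a sufficient-decrease constant gamma in (0, 1), and a multiplier eta > 1, and that the test is the sufficient-decrease inequality used in the Euclidean procedure of [11]; the exact condition in B1 may differ. The callables F, prox_step, grad_map, and norm are placeholders to be supplied for a concrete manifold.

```python
# Backtracking sketch in the spirit of procedure B1: increase L until a
# sufficient-decrease test of the form F(x) - F(T_L(x)) >= (gamma/L)*||G_L(x)||^2 holds.
def backtracking_B1(x, F, prox_step, grad_map, norm, s=1.0, gamma=0.5, eta=2.0,
                    max_iter=50):
    L = s
    for _ in range(max_iter):
        x_plus = prox_step(x, L)          # candidate update T_L(x)
        G = grad_map(x, L)                # gradient mapping G_L(x)
        if F(x) - F(x_plus) >= (gamma / L) * norm(x, G) ** 2:
            return L, x_plus              # sufficient decrease achieved
        L *= eta                          # test failed: increase L and retry
    return L, prox_step(x, L)
```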
Lemma 4. Suppose that Assumption 1 holds. Let be the sequence generated by Algorithm 1 with a step size chosen by the backtracking procedure B1. Then, for any , where .
| Algorithm 1: The proximal gradient method for Riemannian manifolds. |
| Initialization: pick . |
| General step: for any , execute the following step: |
| and set , where . |
| Stopping criteria: . |
Proof. It follows from Lemma 3 that
If
, then
; hence, by (12), it follows that
holds. This implies that the backtracking procedure B1 must end when
. An upper bound on
can be computed: either
is equal to
s, or the backtracking procedure B1 is invoked, meaning that
did not satisfy the backtracking condition, which implies that
; so,
. That is,
This, together with (13), implies that
By Theorem 2, it follows that
This, together with (13), (14) and Theorem 2, implies that
where
.
□
Theorem 3. Suppose that Assumption 1 holds, and let be the sequence generated by Algorithm 1 with a step size chosen by the backtracking procedure B1. Then,
- (i)
The sequence is non-increasing;
- (ii)
;
- (iii)
All limit points of are stationary points of (1).
Proof. (i) By Lemma 4, it follows that
where
. From the above inequality, it follows that
.
(ii) Since the sequence
is non-increasing and bounded below, it converges. Thus,
which implies that
Summing the inequality
over
implies that
Since
, it follows that (ii) holds.
(iii) Let
be a limit point
. Then, there exists a subsequence
converging to
. From (15), it follows that
. It is easy to check that when
,
Since
and the right hand side of (16) goes to 0 as
, this implies that
. Therefore, from Theorem 1 (ii), it follows that
is a stationary point of (1). □
4.2. The Convex Case
In this section, suppose that f is convex on M. Under some conditions, some convergence results of the proximal gradient method for composite optimization problems are obtained for Riemannian manifolds.
Definition 3 ([11]).
A function f is called σ-strongly convex for a given $\sigma > 0$ if $\mathrm{dom}(f)$ is convex, and the following inequality holds for any $x, y \in \mathrm{dom}(f)$ and $\lambda \in [0, 1]$:
$f(\lambda x + (1 - \lambda)y) \le \lambda f(x) + (1 - \lambda)f(y) - \frac{\sigma}{2}\lambda(1 - \lambda)\|x - y\|^2.$
Lemma 5 ([11]).
Let f be a proper, closed, and σ-strongly convex function ($\sigma > 0$). Then,
$f(x) - f(x^*) \ge \frac{\sigma}{2}\|x - x^*\|^2 \quad \text{for all } x \in \mathrm{dom}(f),$
where $x^*$ is the unique minimizer of f.
Theorem 4. Suppose that f and g satisfy Assumption 1. For any and satisfying
it holds that
where
Proof. Consider the function
It is easy to check that
is
strongly convex, and
From Lemma 5, it follows that
By (18), it is easy to check that
This, together with (20), implies that
By the definition of
, it follows that
which is equivalent to
□
The following result is a direct result of Theorem 4.
Corollary 1. Suppose that f and g satisfy Assumption 1. For any , for whichit holds that Next, the backtracking procedure B2 for the case
f is convex is considered. The procedure requires two parameters
, where
and
. Define
. The choice
is obtained as follows. First,
is set to be
. Then, while
we set
. That is,
is chosen as
, where
is the smallest non-negative integer for which the condition
is satisfied. Under Assumption 1 and Lemma 1, it follows that
is obvious. For the inequality
, if
, the inequality (24) is not satisfied with
replacing
. By Lemma 1, it follows that
; so,
.
Next, an $O(1/k)$ rate of convergence of the generated sequence of function values to the optimal value is established for Riemannian manifolds. This rate of convergence is called a sublinear rate.
Theorem 5. Suppose that Assumption 1 holds, and f is convex on M. Let be the sequence generated by Algorithm 1 with the backtracking procedure B2. Then, for any and , there exists , such that Proof. For any
, it follows from (19) that
where the last inequality is obtained by the convexity of
f. From [
21], it is easy to check that there exists
, such that
Summing (27) over
and using
, where
, together with (28), implies that
Thus,
From (22), it follows that
for all
; so,
Therefore,
Let
. Then,
□
To derive the complexity result for the proximal gradient method for Riemannian manifolds, let us assume that $d(x_0, x^*) \le R$ for some optimal solution $x^*$ of (1) and some constant $R > 0$. For example, if $\mathrm{dom}(g)$ is bounded, then R might be taken as its diameter. In order to obtain an $\varepsilon$-optimal solution of (1), by (26), it is enough to require that the bound in (26) be at most $\varepsilon$. The following complexity result for the proximal gradient method is a direct consequence of Theorem 5.
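As a rough numerical illustration, suppose the bound (26) has the familiar Euclidean form $F(x_k) - F_{\mathrm{opt}} \le c\,L_f R^2 / k$ for some constant $c > 0$ depending on the backtracking parameters; this specific form is an assumption for illustration only.

```latex
% Illustrative arithmetic only; the constant c and the exact form of (26) are assumptions.
F(x_k) - F_{\mathrm{opt}} \le \frac{c\, L_f R^2}{k} \le \varepsilon
\quad\Longleftarrow\quad
k \ \ge\ \frac{c\, L_f R^2}{\varepsilon},
\qquad\text{e.g., } c = 2,\ L_f = 10,\ R = 1,\ \varepsilon = 10^{-3}
\ \Rightarrow\ k \ \ge\ 2 \times 10^{4}.
```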
Theorem 6. Suppose that Assumption 1 holds, and f is convex on M. Let be the sequence generated by Algorithm 1 with the backtracking procedure B2. For k satisfying
it holds that , where R is an upper bound on $d(x_0, x^*)$ for some optimal solution $x^*$ of (1).
Example 1. On the unit sphere $S^{n-1} = \{x \in \mathbb{R}^n : x^T x = 1\}$, considered as a Riemannian submanifold of $\mathbb{R}^n$, the inner product inherited from the standard inner product on $\mathbb{R}^n$ is given by
$\langle \xi, \eta \rangle_x = \xi^T \eta, \qquad \xi, \eta \in T_x S^{n-1}.$
The normal space is
$N_x S^{n-1} = \{ x\alpha : \alpha \in \mathbb{R} \},$
and the projections are given by
$P_x \xi = (I - x x^T)\xi, \qquad P_x^{\perp} \xi = x x^T \xi.$
From Section 4 in [21], it follows that $R_x(\eta) = \frac{x + \eta}{\|x + \eta\|}$. The tangent space to $S^{n-1}$, viewed as a subspace of $\mathbb{R}^n$, is
$T_x S^{n-1} = \{ z \in \mathbb{R}^n : x^T z = 0 \}.$
The function is considered as follows:
and
on the unit sphere $S^{n-1}$, viewed as a Riemannian submanifold of the Euclidean space $\mathbb{R}^n$. Furthermore,
and
whose restriction to $S^{n-1}$ is f. From Section 4 in [21], it follows that
It is easy to obtain
From Algorithm 1, pick , which is defined by (9),
and . Set , and . Let be the sequence generated by Algorithm 1. It is easy to check that all assumptions of Theorem 4 are satisfied; so, the sequence of function values converges to the optimal value sublinearly.
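The sketch below illustrates Algorithm 1 on the unit sphere in the special case g = 0 of Remark 1(i), so that the proximal step is trivial and the method reduces to a Riemannian gradient step followed by the projective retraction. The objective f(x) = 0.5 x^T A x, the test matrix A, and the fixed step size 1/L are illustrative assumptions, not the function used in Example 1.

```python
# Algorithm 1 sketch on the unit sphere with g = 0 (Remark 1(i)).
import numpy as np

def retract_sphere(x, eta):
    y = x + eta
    return y / np.linalg.norm(y)

def riemannian_grad(A, x):
    """Riemannian gradient of f(x) = 0.5 x^T A x on the sphere: project A x onto T_x S^{n-1}."""
    egrad = A @ x
    return egrad - x * (x @ egrad)

def algorithm1_sphere(A, x0, num_iters=500):
    L = np.linalg.norm(A, 2)               # a Lipschitz-type constant for the quadratic f
    x = x0 / np.linalg.norm(x0)
    for _ in range(num_iters):
        eta = -riemannian_grad(A, x) / L    # gradient step in the tangent space
        x = retract_sphere(x, eta)          # retraction back to the sphere (prox step trivial, g = 0)
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    B = rng.standard_normal((5, 5))
    A = B + B.T                             # symmetric test matrix
    x_star = algorithm1_sphere(A, rng.standard_normal(5))
    print("f(x*) =", 0.5 * x_star @ A @ x_star)   # approaches 0.5 * smallest eigenvalue of A
```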