Abstract
In this paper, we study biconvex optimization problems, introducing an adaptive Douglas–Rachford algorithm and presenting related convergence theorems in the setting of finite-dimensional real Hilbert spaces. It is worth noting that our approach to proving the convergence theorem differs significantly from those in the literature.
Keywords:
biconvex optimization problem; regularized optimization problem; Douglas–Rachford algorithm; adaptive algorithm
MSC:
65K05; 49M37; 90C26
1. Introduction
In science and engineering, convex optimization has been extensively studied and applied. For convex optimization problems, any local minimum is also a global minimum, simplifying the search for optimal solutions. In parameter estimation, particularly in the context of system identification, convexity ensures the convergence of estimates to the true parameters. However, not every identification problem can be formulated as a convex optimization problem. For instance, the identification of block-oriented nonlinear systems often leads to a biconvex optimization problem rather than a convex one. Unlike convex optimization, biconvex optimization may have numerous local minima. Nevertheless, it exhibits convex substructures, as a biconvex optimization problem can be divided into multiple convex optimization subproblems. These substructures can be effectively utilized to solve the entire biconvex optimization problem.
From the literature [1], we observe that many optimization problems are multi-convex programs, and many published studies on practical multi-convex programming focus on special practical models, like [2,3,4,5,6,7,8,9,10,11,12,13]. In particular, Wen, Yin, and Zhang [12] pointed out that multi-convex programming is NP-hard.
Let $H_1$ and $H_2$ be two real Hilbert spaces, and let $f:H_1\times H_2\to(-\infty,\infty]$ be an extended real-valued function. Then $f$ is called a biconvex function if $f(\cdot,y)$ is convex for each $y\in H_2$, and $f(x,\cdot)$ is convex for each $x\in H_1$. Thus, every convex function is also biconvex, but a biconvex function may still be nonconvex.
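For instance (a standard example, not specific to this paper), the bilinear function $h(x,y)=xy$ on $\mathbb{R}\times\mathbb{R}$ is biconvex but not convex:

```latex
h(x,y)=xy,\qquad h(t,-t)=-t^{2}.
```

For each fixed $y$, the map $x\mapsto xy$ is linear (hence convex), and similarly for each fixed $x$; yet joint convexity fails, since the restriction of $h$ to the line $(t,-t)$ is concave.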
The following is a type of biconvex optimization problem, also referred to as a block optimization problem:
$$(\mathrm{BOP})\qquad \min_{(x,y)\in H_1\times H_2}\ J(x,y):=h(x,y)+f(x)+g(y),$$
where $H_1$ and $H_2$ are real Hilbert spaces, $h:H_1\times H_2\to\mathbb{R}$ is a block biconvex function, and $f:H_1\to(-\infty,\infty]$ and $g:H_2\to(-\infty,\infty]$ are convex functions. Here, $f$ and $g$ are called the regularization functions of (BOP). In general, $f$ and $g$ may be assumed to be Fréchet differentiable. It is important to note that the objective $J$ can be nonconvex, even if $f$ and $g$ are convex functions.
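As a concrete instance of (BOP), in the spirit of the matrix factorization models of [12,13] (the data matrix $A$ and the quadratic regularizers below are illustrative choices, not taken from this paper), one may consider

```latex
\min_{X\in\mathbb{R}^{m\times r},\,Y\in\mathbb{R}^{r\times n}}
\;\underbrace{\tfrac{1}{2}\,\|A-XY\|_F^{2}}_{h(X,Y)}
\;+\;\underbrace{\tfrac{\alpha}{2}\,\|X\|_F^{2}}_{f(X)}
\;+\;\underbrace{\tfrac{\beta}{2}\,\|Y\|_F^{2}}_{g(Y)} .
```

Here $h$ is biconvex (a convex quadratic in $X$ for fixed $Y$, and in $Y$ for fixed $X$), and $f$ and $g$ are convex regularization functions, yet the overall objective is nonconvex.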
The standard approach to solving the biconvex optimization problem is the so-called Gauss–Seidel iteration scheme, popularized in the modern era under the name of alternating minimization. This method is also known as the block coordinate descent (BCD) algorithm.
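A minimal sketch of this alternating scheme, applied to the illustrative regularized factorization instance above (the closed-form block updates are specific to that quadratic example and are not part of this paper's algorithms):

```python
import numpy as np

# Alternating minimization (BCD) sketch for the illustrative biconvex objective
#   J(X, Y) = 0.5*||A - X @ Y.T||_F^2 + 0.5*alpha*||X||_F^2 + 0.5*beta*||Y||_F^2.
# Each block subproblem is convex (a ridge-regularized least-squares problem)
# and is solved exactly in closed form.

def alternating_minimization(A, r, alpha=0.1, beta=0.1, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    m, n = A.shape
    X = rng.standard_normal((m, r))
    Y = rng.standard_normal((n, r))
    for _ in range(iters):
        # X-step: minimize J(., Y) with Y fixed
        X = A @ Y @ np.linalg.inv(Y.T @ Y + alpha * np.eye(r))
        # Y-step: minimize J(X, .) with X fixed
        Y = A.T @ X @ np.linalg.inv(X.T @ X + beta * np.eye(r))
    return X, Y

# Usage: the objective value decreases monotonically along the iterations.
A = np.random.default_rng(1).standard_normal((20, 15))
X, Y = alternating_minimization(A, r=3)
print(0.5 * np.linalg.norm(A - X @ Y.T) ** 2)
```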
In 1992, the proximal BCD algorithm was proposed by Auslender [14] to relax the requirements of the (BCD) convergence theorem.
In 2013, Xu and Yin [13] gave the proximal linearized BCD.
In 2014, Bolte et al. [3] gave the proximal alternating linearized minimization (PALM) algorithm.
In 2019, Nikolova and Tan [10] gave the alternating structure-adapted proximal gradient descent (ASAP) algorithm.
In fact, the ASAP algorithm is equivalent to the following.
Therefore, the ASAP algorithm can be written in the following form.
On the other hand, the following is a well-known generalized convex optimization problem, together with the related Douglas–Rachford algorithm:
$$\min_{x\in H}\ f_1(x)+f_2(x),$$
where $H$ is a real Hilbert space, and $f_1,f_2:H\to(-\infty,\infty]$ are proper, lower semicontinuous, and convex functions (see Algorithm 1).
| Algorithm 1: ([15], Corollary 27.4) |
Let be generated by the following.
|
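A small numerical sketch of the Douglas–Rachford iteration in its standard form $y_n=\operatorname{prox}_{\gamma f_1}(x_n)$, $z_n=\operatorname{prox}_{\gamma f_2}(2y_n-x_n)$, $x_{n+1}=x_n+\lambda_n(z_n-y_n)$; the concrete choices of $f_1$ (a quadratic) and $f_2$ (an $\ell_1$ term), and hence the two proximal maps below, are illustrative assumptions rather than the setting of Algorithm 1:

```python
import numpy as np

# Illustrative problem: min_x f1(x) + f2(x) with
#   f1(x) = 0.5*||x - b||^2   (prox has a simple closed form)
#   f2(x) = mu*||x||_1        (prox is componentwise soft-thresholding)

def prox_f1(x, gamma, b):
    # prox of gamma*0.5*||. - b||^2: argmin_y 0.5*||y-b||^2 + (1/(2*gamma))*||y-x||^2
    return (x + gamma * b) / (1.0 + gamma)

def prox_f2(x, gamma, mu):
    # prox of gamma*mu*||.||_1: soft-thresholding at level gamma*mu
    return np.sign(x) * np.maximum(np.abs(x) - gamma * mu, 0.0)

def douglas_rachford(b, mu=0.5, gamma=1.0, lam=1.0, iters=200):
    x = np.zeros_like(b)
    for _ in range(iters):
        y = prox_f1(x, gamma, b)
        z = prox_f2(2.0 * y - x, gamma, mu)
        x = x + lam * (z - y)        # governing sequence update
    return prox_f1(x, gamma, b)      # shadow sequence approaches a minimizer

b = np.array([2.0, -0.3, 0.1, -1.5])
print(douglas_rachford(b))  # approximately sign(b)*max(|b|-mu, 0) = [1.5, 0., 0., -1.]
```

For this convex instance, the unique minimizer of $\tfrac12\|x-b\|^2+\mu\|x\|_1$ is the soft-thresholded vector, which the shadow sequence $\operatorname{prox}_{\gamma f_1}(x_n)$ approximates.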
Indeed, it is well-known that the Douglas–Rachford algorithm is widely used for convex optimization problems and, more generally, for problems involving convexity assumptions. Additionally, despite the lack of theoretical justification, the literature shows that the algorithm has been successfully applied to various practical nonconvex problems [16,17]. For further details on the Douglas–Rachford and Peaceman–Rachford algorithms, please refer to [18,19,20,21,22] and the references therein.
Inspired by the above work, we propose the adaptive Douglas–Rachford algorithm to study the biconvex optimization problem in finite-dimensional real Hilbert spaces.
Remark 1.
In Algorithm 2, if the parameter is given by Step 3-1, then this algorithm may be called the Douglas–Rachford algorithm. However, when the parameter is given by Step 3-2, we call this algorithm the adaptive Douglas–Rachford algorithm.
In this paper, we study the biconvex optimization problem and give an adaptive Douglas–Rachford algorithm and related convergence theorems in the setting of finite dimensional real Hilbert spaces. It is worth noting that our approach to proving the convergence theorem differs significantly from those in the literature.
| Algorithm 2: Adaptive Douglas–Rachford Algorithm |
|
2. Preliminaries
Let $H$ be a real Hilbert space with inner product $\langle\cdot,\cdot\rangle$ and norm $\|\cdot\|$. We denote the strong and weak convergence of $\{x_n\}_{n\in\mathbb{N}}$ to $x$ by $x_n\to x$ and $x_n\rightharpoonup x$, respectively. For each $x,y\in H$ and $\lambda\in[0,1]$, we have
$$\|\lambda x+(1-\lambda)y\|^{2}=\lambda\|x\|^{2}+(1-\lambda)\|y\|^{2}-\lambda(1-\lambda)\|x-y\|^{2}.$$
Definition 1.
Let $H$ be a real Hilbert space, $B:H\to H$ be a mapping, and $\rho>0$. Thus,
- (i)
- B is monotone if $\langle Bx-By,\,x-y\rangle\geq 0$ for all $x,y\in H$.
- (ii)
- B is ρ-strongly monotone if $\langle Bx-By,\,x-y\rangle\geq\rho\|x-y\|^{2}$ for all $x,y\in H$.
Definition 2.
Let $H$ be a real Hilbert space, and $B:H\to 2^{H}$ be a set-valued mapping with domain $D(B):=\{x\in H: Bx\neq\emptyset\}$. Thus,
- (i)
- B is monotone if $\langle u-v,\,x-y\rangle\geq 0$ for any $x,y\in D(B)$, $u\in Bx$, and $v\in By$.
- (ii)
- B is maximal monotone if its graph is not properly contained in the graph of any other monotone mapping.
- (iii)
- B is ρ-strongly monotone ($\rho>0$) if $\langle u-v,\,x-y\rangle\geq\rho\|x-y\|^{2}$ for all $x,y\in D(B)$, and all $u\in Bx$ and $v\in By$.
Definition 3.
Let $H$ be a real Hilbert space, and $f:H\to(-\infty,\infty]$ be a function. Thus,
- (i)
- f is proper if $\operatorname{dom}f:=\{x\in H: f(x)<\infty\}\neq\emptyset$.
- (ii)
- f is lower semicontinuous if $\{x\in H: f(x)\leq r\}$ is closed for each $r\in\mathbb{R}$.
- (iii)
- f is convex if $f(\lambda x+(1-\lambda)y)\leq\lambda f(x)+(1-\lambda)f(y)$ for every $x,y\in H$ and $\lambda\in(0,1)$.
- (iv)
- f is ρ-strongly convex ($\rho>0$) if $f(\lambda x+(1-\lambda)y)\leq\lambda f(x)+(1-\lambda)f(y)-\frac{\rho}{2}\lambda(1-\lambda)\|x-y\|^{2}$ for all $x,y\in H$ and $\lambda\in(0,1)$.
- (v)
- f is Gâteaux differentiable at $x\in H$ if there is $\nabla f(x)\in H$ such that $\lim_{t\to 0}\frac{f(x+tv)-f(x)}{t}=\langle\nabla f(x),\,v\rangle$ for each $v\in H$.
- (vi)
- f is Fréchet differentiable at x if there is $\nabla f(x)\in H$ such that $\lim_{\|v\|\to 0}\frac{f(x+v)-f(x)-\langle\nabla f(x),\,v\rangle}{\|v\|}=0$.
Remark 2.
Let $H$ be a real Hilbert space, and $f:H\to(-\infty,\infty]$ be a function. Then, for $\rho>0$, f is a convex function if and only if $f+\frac{\rho}{2}\|\cdot\|^{2}$ is a ρ-strongly convex function ([15], Proposition 10.6). Hence, it is easy to establish the relation between convex functions and strongly convex functions.
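For example (a standard illustration), a linear function $f(x)=\langle a,x\rangle$ with $a\in H$ is convex but not strongly convex, whereas $f+\frac{\rho}{2}\|\cdot\|^{2}$ is ρ-strongly convex; a direct computation gives

```latex
f(\lambda x+(1-\lambda)y)+\tfrac{\rho}{2}\|\lambda x+(1-\lambda)y\|^{2}
 = \lambda\Big(f(x)+\tfrac{\rho}{2}\|x\|^{2}\Big)
  +(1-\lambda)\Big(f(y)+\tfrac{\rho}{2}\|y\|^{2}\Big)
  -\tfrac{\rho}{2}\lambda(1-\lambda)\|x-y\|^{2},
```

so the inequality in Definition 3(iv) holds with equality in this case.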
Definition 4.
Let $f:H\to(-\infty,\infty]$ be a proper lower semicontinuous and convex function. Then the subdifferential of f is defined by
$$\partial f(x):=\{u\in H:\ f(y)\geq f(x)+\langle u,\,y-x\rangle\ \text{for all } y\in H\}$$
for each $x\in H$.
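A standard one-dimensional example: for $H=\mathbb{R}$ and $f(x)=|x|$,

```latex
\partial f(x)=
\begin{cases}
\{-1\}, & x<0,\\
[-1,\,1], & x=0,\\
\{+1\}, & x>0,
\end{cases}
```

so $\partial f(x)$ is a singleton exactly at the points where f is differentiable (compare Lemma 1(ii) below).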
Lemma 1
([15,23]). Let $f:H\to(-\infty,\infty]$ be a proper lower semicontinuous and convex function. Then the following is satisfied.
- (i)
- $\partial f:H\to 2^{H}$ is a set-valued maximal monotone mapping.
- (ii)
- f is Gâteaux differentiable at $x\in H$ if and only if $\partial f(x)$ consists of a single element. That is, $\partial f(x)=\{\nabla f(x)\}$.
- (iii)
- Suppose that f is Fréchet differentiable. Then f is convex if and only if $\nabla f$ is a monotone mapping.
Lemma 2
([15], Example 22.3(iv)). Let $\rho>0$, $H$ be a real Hilbert space, and $f:H\to(-\infty,\infty]$ be a proper lower-semicontinuous and convex function. If f is ρ-strongly convex, then $\partial f$ is ρ-strongly monotone.
Lemma 3
([15], Proposition 16.26). Let $H$ be a real Hilbert space, and $f:H\to(-\infty,\infty]$ be a proper lower semicontinuous and convex function. If $\{x_n\}$ and $\{u_n\}$ are sequences in H with $u_n\in\partial f(x_n)$ for all $n\in\mathbb{N}$, and $x_n\rightharpoonup x$ and $u_n\to u$, then $u\in\partial f(x)$.
Lemma 4
([24]). Let $H$ be a real Hilbert space, $B:H\to 2^{H}$ be a set-valued maximal monotone mapping, $\lambda>0$, and $J_{\lambda}^{B}$ be defined by $J_{\lambda}^{B}(x):=(I+\lambda B)^{-1}(x)$ for each $x\in H$. Then $J_{\lambda}^{B}$ is a single-valued mapping.
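In particular (a standard fact), taking $B=\partial g$ for a proper lower semicontinuous convex function g, the resolvent of Lemma 4 coincides with the proximal operator introduced in Definition 5 below:

```latex
J_{\tau}^{\partial g}(x)=(I+\tau\,\partial g)^{-1}(x)=\operatorname{prox}_{\tau g}(x),
\qquad x\in H,\ \tau>0 .
```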
Definition 5.
Let $\tau>0$, $H$ be a real Hilbert space, and $g:H\to(-\infty,\infty]$ be a proper lower-semicontinuous and convex function. Then the proximal operator of g with τ is defined by
$$\operatorname{prox}_{\tau g}(x):=\operatorname*{arg\,min}_{y\in H}\left\{g(y)+\frac{1}{2\tau}\|x-y\|^{2}\right\}$$
for each $x\in H$.
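A small numerical check of this definition (the choice $g=\|\cdot\|_1$ is an illustrative assumption, not taken from the paper): the proximal operator of the $\ell_1$-norm is componentwise soft-thresholding, which can be compared against a brute-force evaluation of the defining minimization in one dimension:

```python
import numpy as np

def prox_l1(x, tau):
    # closed-form proximal operator of g(y) = |y| (soft-thresholding)
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def prox_numeric(x, tau):
    # brute-force the defining minimization  argmin_y |y| + (1/(2*tau))*(x - y)**2
    grid = np.linspace(-5.0, 5.0, 200001)
    values = np.abs(grid) + (x - grid) ** 2 / (2.0 * tau)
    return grid[np.argmin(values)]

for x in (1.3, 0.2, -0.8):
    print(prox_l1(x, 0.5), prox_numeric(x, 0.5))  # pairs agree up to grid resolution
```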
Lemma 5.
Let $g:H\to(-\infty,\infty]$ be a proper, lower semicontinuous, and convex function. Assume $\tau>0$ and $p=\operatorname{prox}_{\tau g}(x)$ for some $x\in H$. Then we have
$$\langle x-p,\,z-p\rangle\leq\tau\big(g(z)-g(p)\big)$$
for each $z\in H$.
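For completeness, the inequality in Lemma 5 is the standard first-order characterization of the proximal operator: writing $p=\operatorname{prox}_{\tau g}(x)$, the optimality condition (Fermat's rule for convex functions) for the minimization defining $\operatorname{prox}_{\tau g}$ gives

```latex
0\in\partial g(p)+\tfrac{1}{\tau}\,(p-x)
\;\Longleftrightarrow\;
\tfrac{1}{\tau}\,(x-p)\in\partial g(p)
\;\Longrightarrow\;
g(z)\ \geq\ g(p)+\tfrac{1}{\tau}\,\langle x-p,\,z-p\rangle
\quad\text{for each } z\in H,
```

which rearranges to $\langle x-p,\,z-p\rangle\leq\tau\big(g(z)-g(p)\big)$.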
3. Main Results
We are interested in solving nonconvex minimization problems of the form
$$(\mathrm{BOP})\qquad \min_{(x,y)\in H_1\times H_2}\ J(x,y):=h(x,y)+f(x)+g(y),$$
where $H_1$ and $H_2$ are two real Hilbert spaces, $f:H_1\to(-\infty,\infty]$, $g:H_2\to(-\infty,\infty]$, $h:H_1\times H_2\to\mathbb{R}$, and $J:=h+f+g$.
Here, if is a solution of the problem (BOP), then
and this implies that
and
for all . That is,
This implies that
So, it is natural to give a condition:
for all and . Indeed, if this condition holds, then (4), (5), and (6) are equivalent.
Example 1.
Let be defined as
for all . Then h satisfies the condition .
Assumption 1.
Assume that:
- (i)
- f and g are proper lower semicontinuous and convex functions;
- (ii)
- h is continuous, and $h(\cdot,y)$ and $h(x,\cdot)$ are proper and convex functions for each $x\in H_1$ and $y\in H_2$;
- (iii)
- J is lower bounded;
- (iv)
- for all and .
For a fixed y, the partial subdifferential of $J(\cdot,y)$ at x is denoted by $\partial_x J(x,y)$. For a fixed x, the partial subdifferential of $J(x,\cdot)$ at y is denoted by $\partial_y J(x,y)$. Hence, for $(x,y)\in H_1\times H_2$, we have
$$\partial_x J(x,y)=\partial f(x)+\partial_x h(x,y)\quad\text{and}\quad \partial_y J(x,y)=\partial g(y)+\partial_y h(x,y).$$
Fermat’s rule, extended to nonconvex and nonsmooth functions, is given next.
Proposition 1
([25], Theorem 10.1). Let $f:H\to(-\infty,\infty]$ be a proper function. If f has a local minimum at $\bar{x}$, then $0\in\partial f(\bar{x})$.
Definition 6.
We say that $(\bar{x},\bar{y})$ is a critical point of J if $0\in\partial J(\bar{x},\bar{y})$. For simplicity, the set of the critical points of J is denoted by $\operatorname{crit}J$.
Remark 3.
We know
Hence, if the condition is satisfied, then if and only if is a solution of the problem .
Here, we consider the first part of Algorithm 2.
Remark 4.
In Algorithm 3, for each and , we set and . Thus, and . Further, if and are bounded, then and are bounded.
| Algorithm 3: Adaptive Douglas–Rachford Algorithm (Part 1) |
|
Let , and be a sequence in , and with , and and be given, and let , , , and be generated as
|
Proof.
Since and , we have
Since is maximal monotone, we know
and this implies that
Similarly, we have
and
Further, it is easy to see and are bounded when and are bounded. □
Lemma 6.
Let , , , and be generated from Algorithm 3. Then, for each and , and , we have
and
Proof.
Take any and , and let be fixed. First, it follows from and Lemma 5 that
and this implies that
Similar to (15), we have
Next, it follows from and Lemma 5 that
and this implies that
Here, we know
By (18) and (19), we have
Similar to (18), we have
Hence, we know
By (21) and (22), we have
Next, we have
By (15), (20), and (24), we have
Similar to (24), we have
By (16), (23), and (26),
So, we obtain the conclusion of Lemma 6. □
The following is the second part of Algorithm 2.
Remark 5.
In Algorithm 4, we know the sequence is chosen from the interval with
| Algorithm 4: Adaptive Douglas–Rachford Algorithm (Part 2) |
|
Theorem 1.
Let , , , and be generated from Algorithms 3 and 4, assume that the solution set of the problem (BOP) is nonempty, and assume that $H_1$ and $H_2$ are finite dimensional. Then there exist , , and a subsequence of such that , , , , and is a solution of the problem (BOP).
Proof.
Let be any solution of problem (BOP). By (26), (27), and the assumption, we have
Next, we consider the following cases.
Case 1: If , then we choose . Hence,
Case 2: If , then we set as
Case 2 (i): If , then we choose . Hence,
Case 2 (ii): If , then we set . Hence,
Case 2 (iii): If and , then we set and have the following.
Case 2 (iv): If and , then we set as
Thus, we have the following.
So,
and this implies that
Set as
Hence, we obtain the following from (30), (31), (32), (34), and (35).
Therefore, is nondecreasing, and exists, and this implies that , , , are bounded, and
So,
and
Since , , , and are bounded, there exist , , a subsequence of , a subsequence of , a subsequence of , and a subsequence of such that , , , and .
Theorem 2.
In Theorem 1, if and we further assume that f and g are ρ-strongly convex, then there exist , such that and , where is a solution of the problem .
Proof.
In Theorem 1, there exist , , a subsequence of such that , , , , and is a solution of the problem . Further, we have
and
Hence, if is a subsequence of such that and . Clearly, we know is a bounded subsequence of . So, without loss of generality, we may assume that and . Next, it follows from the proof of Theorem 1 that is a solution of the problem , and
and
By (45) and (47), and since f is ρ-strongly convex, we have
and this implies that
Since is convex, we have
By (50) and (51), we know . Similarly, we have . Therefore, we know and , and the proof is completed. □
Author Contributions
Conceptualization, M.-S.L. and C.-S.C.; methodology, M.-S.L.; formal analysis, C.-S.C.; resources, M.-S.L.; writing—original draft preparation, C.-S.C.; writing—review and editing, M.-S.L. All authors have read and agreed to the published version of the manuscript.
Funding
Chih-Sheng Chuang was supported by the National Science and Technology Council (NSTC 112-2115-M-415-001).
Data Availability Statement
The original contributions presented in the study are included in the article, further inquiries can be directed to the corresponding author.
Conflicts of Interest
The authors declare no conflicts of interest.
Abbreviations
The following abbreviations are used in this manuscript:
| BOP | Biconvex Optimization Problem (or Block Optimization Problem) |
| BCD | Block Coordinate Descent algorithm |
| PALM | Proximal Alternating Linearized Minimization |
| ASAP | Alternating Structure-Adapted Proximal gradient descent algorithm |
References
- Grant, M.; Boyd, S.; Ye, Y. Disciplined convex programming. In Global Optimization: From Theory to Implementation, Nonconvex Optimization and Its Applications; Liberti, L., Maculan, N., Eds.; Springer: Berlin/Heidelberg, Germany, 2006; pp. 155–210. [Google Scholar]
- Al-Shatri, H.; Li, X.; Ganesan, R.S.; Klein, A.; Weber, T. Maximizing the sum rate in cellular networks using multiconvex optimization. IEEE Trans. Wirel. Commun. 2016, 15, 3199–3211. [Google Scholar] [CrossRef]
- Bolte, J.; Sabach, S.; Teboulle, M. Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program., Ser. A 2014, 146, 459–494. [Google Scholar]
- Che, H.; Wang, J. A Two-Timescale Duplex Neurodynamic Approach to Biconvex Optimization. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 2503–2514. [Google Scholar] [CrossRef] [PubMed]
- Chiu, W.Y. Method of reduction of variables for bilinear matrix inequality problems in system and control designs. IEEE Trans. Syst. Man Cybern. Syst. 2017, 47, 1241–1256. [Google Scholar] [CrossRef]
- Fu, X.; Huang, K.; Sidiropoulos, N.D. On identifiability of nonnegative matrix factorization. IEEE Signal Process. Lett. 2018, 25, 328–332. [Google Scholar] [CrossRef]
- Gorski, J.; Pfeuffer, F.; Klamroth, K. Biconvex sets and optimization with biconvex functions: A survey and extensions. Math. Methods Oper. Res. 2007, 66, 373–407. [Google Scholar] [CrossRef]
- Hours, J.H.; Jones, C.N. A parametric multiconvex splitting technique with application to real-time NMPC. In Proceedings of the 53rd IEEE Conference on Decision and Control, Los Angeles, CA, USA, 15–17 December 2014; pp. 5052–5057. [Google Scholar]
- Li, G.; Wen, C.; Zheng, W.X.; Zhao, G. Iterative identification of block-oriented nonlinear systems based on biconvex optimization. Syst. Control Lett. 2015, 79, 68–75. [Google Scholar] [CrossRef]
- Nikolova, M.; Tan, P. Alternating structure-adapted proximal gradient descent for nonconvex nonsmooth block-regularized problems. SIAM J. Optim. 2019, 29, 2053–2078. [Google Scholar] [CrossRef]
- Shah, S.; Yadav, A.K.; Castillo, C.D.; Jacobs, D.W.; Studer, C.; Goldstein, T. Biconvex Relaxation for Semidefinite Programming in Computer Vision. In Computer Vision—ECCV 2016; Proceedings of the 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Leibe, B., Matas, J., Sebe, N., Welling, M., Eds.; Springer: Cham, Switzerland, 2016. [Google Scholar]
- Wen, Z.; Yin, W.; Zhang, Y. Solving a low-rank factorization model for matrix completion by a nonlinear successive over-relaxation algorithm. Math. Program. Comput. 2012, 4, 333–361. [Google Scholar] [CrossRef]
- Xu, Y.; Yin, W. A block coordinate descent method for regularized multiconvex optimization with applications to nonnegative tensor factorization and completion. SIAM J. Imaging Sci. 2013, 6, 1758–1789. [Google Scholar] [CrossRef]
- Auslender, A. Asymptotic properties of the fenchel dual functional and applications to decomposition problems. J. Optim. Theory Appl. 1992, 73, 427–449. [Google Scholar] [CrossRef]
- Bauschke, H.H.; Combettes, P.L. Convex Functions: Variants. In Convex Analysis and Monotone Operator Theory in Hilbert Spaces; Springer: Berlin, Germany, 2011; pp. 143–153. [Google Scholar]
- Elser, V.; Rankenburg, I.; Thibault, P. Searching with iterated maps. Proc. Natl. Acad. Sci. USA 2007, 104, 418–423. [Google Scholar] [CrossRef] [PubMed]
- Gravel, S.; Elser, V. Divide and concur: A general approach to constraint satisfaction. Phys. Rev. E 2008, 78, 036706. [Google Scholar] [CrossRef] [PubMed]
- Aragón Artacho, F.J.; Borwein, J.M. Global convergence of a non-convex Douglas–Rachford iteration. J. Glob. Optim. 2013, 57, 753–769. [Google Scholar] [CrossRef]
- Aragón Artacho, F.J.; Campoy, R. A new projection method for finding the closest point in the intersection of convex sets. Comput. Optim. Appl. 2018, 69, 99–132. [Google Scholar] [CrossRef]
- Bauschke, H.H.; Moursi, W.M. On the Douglas–Rachford algorithm. Math. Program. 2017, 164, 263–284. [Google Scholar] [CrossRef]
- Borwein, J.M.; Sims, B. The Douglas–Rachford Algorithm in the Absence of Convexity. In Fixed-Point Algorithms for Inverse Problems in Science and Engineering; Springer Optimization and Its Applications; Springer: New York, NY, USA, 2011; Volume 49, pp. 93–109. [Google Scholar]
- Eckstein, J.; Bertsekas, D.P. On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Program. 1992, 55, 293–318. [Google Scholar] [CrossRef]
- Butnariu, D.; Iusem, A.N. Totally Convex Functions. In Totally Convex Functions for Fixed Points Computation and Infinite Dimensional Optimization; Kluwer Academic Publishers: Dordrecht, The Netherlands, 2000; pp. 2–45. [Google Scholar]
- Marino, G.; Xu, H.K. Convergence of generalized proximal point algorithm. Comm. Pure Appl. Anal. 2004, 3, 791–808. [Google Scholar] [CrossRef]
- Rockafellar, R.T.; Wets, R.J.-B. Variational Analysis; Springer: New York, NY, USA, 1998. [Google Scholar]
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).