An Adaptive Proximal Bundle Method with Inexact Oracles for a Class of Nonconvex and Nonsmooth Composite Optimization

Abstract: In this paper, an adaptive proximal bundle method is proposed for a class of nonconvex and nonsmooth composite problems with inexact information. The composite problems are the sum of a finite convex function, known only through an inexact oracle, and a nonconvex function. For the nonconvex function, we design a convexification technique which ensures that the linearization errors of its augmented function are nonnegative. Then, the sum of the convex function and the augmented function is regarded as an approximation of the primal problem. For the approximate function, we adopt a disaggregate strategy and regard the sum of the cutting plane models of the convex function and the augmented function as a cutting plane model for the approximate function. On this basis, we give the adaptive nonconvex proximal bundle method. Meanwhile, for the convex function with inexact information, we utilize a noise management strategy and update the proximal parameter to reduce the influence of the inexact information. The method obtains an approximate solution. Two polynomial functions and six DC problems are used in the numerical experiments. The preliminary numerical results show that our algorithm is effective and reliable.


Introduction
Consider the following optimization problem:

min_{x ∈ R^N} ψ(x) := f(x) + h(x),  (1)

where f : R^N → R is a finite convex function and the function h is not necessarily convex. Hence the primal function (1) may be nonconvex; note also that f and h are not necessarily smooth. In this paper, we consider the case where the function h is easy to evaluate while the function f is much harder and more time-consuming to evaluate. Sums of two functions appear in many optimization problems, such as the Lasso problem in image processing and various problems in machine learning. Moreover, the composite form (1) can also be obtained from other problems, for instance by splitting techniques or in nonlinear programming. Concretely, when the function under consideration is rather complicated and difficult to evaluate, dividing the primal function into two functions f and h with relatively simple structures is one possible way to speed up calculations. Another way is the penalty strategy, which transforms a constrained problem into an unconstrained problem of this sum form.
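As a toy illustration of the structure in (1), the following sketch splits a composite objective into a convex part f and a nonconvex, nonsmooth part h. The concrete functions are hypothetical and chosen only so that h is lower-C² (a pointwise maximum of smooth functions), as assumed later in the paper.

```python
import numpy as np

# Hypothetical instance of problem (1): psi = f + h with
#   f(x) = 0.5*||x||^2            (finite, convex),
#   h(x) = sum_i (|x_i| - x_i^2)  (nonconvex, nonsmooth, lower-C2:
#                                  |t| - t^2 = max(t - t^2, -t - t^2)).

def f(x):
    return 0.5 * float(np.dot(x, x))

def h(x):
    return float(np.sum(np.abs(x) - x * x))

def psi(x):
    return f(x) + h(x)

x = np.array([1.0, -2.0])
val = psi(x)   # 0.5*5 + (1 - 1) + (2 - 4) = 2.5 - 2.0 = 0.5
```

Neither summand here is the pair used in the paper's experiments; the point is only that f is cheap and convex while h carries the nonconvexity.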
Note that splitting-type methods (see [1,2]) and alternating-type methods (see [3,4]) are two important classes of methods for composite optimization. When the functions f and h have special structures, these methods can be effective and enjoy good convergence results. However, if the functions do not have special structures, or are complex and difficult to evaluate, these methods may not be suitable for Problem (1). Moreover, alternating-direction-type methods must solve at least two subproblems at each iteration; if one of the subproblems is hard to solve, the effectiveness of the algorithm may suffer. It is therefore meaningful to seek other suitable methods for Problem (1) that do not rely on special structures.
In recent years, many scholars have devoted themselves to seeking effective methods for nonconvex and nonsmooth optimization problems, see [5][6][7][8][9][10][11][12][13][14][15][16][17][18][19][20]. Bundle methods are usually very effective for solving nonsmooth optimization problems [21][22][23][24][25][26]. Bundle methods use a "black box" to compute the objective value and one of its subgradients (not necessarily a particular one) at each iteration. Bundle techniques are therefore a possible class of effective methods for the composite problem (1). At present, proximal alternating linearization methods (see [4,[27][28][29]) are one effective kind of bundle method for certain composite problems. They need to solve two subproblems at each iteration, and the data involved are usually exact. When inexact oracles are involved, these methods may not be suitable and may even fail to converge.
In this paper, we design a proximal bundle method for the inexact composite problem (1) and update the proximal parameter μ to reduce the effects of the inexact information. In the following, we first present some settings in which inexact evaluations arise.
Inexact evaluations typically arise in stochastic programming and Lagrangian relaxation [30,31], where solving the underlying subproblems exactly is at the very least impractical and often impossible. In bundle methods, inexact information is obtained from inexact oracles, of which there are several types. In our work, we consider the upper oracle (see (2a)-(2c) below). Upper oracles may overestimate the corresponding function values and produce negative linearization errors even if the primal function is convex.
In this paper, we focus on a class of nonconvex and nonsmooth composite problems with inexact data. The design and convergence analysis of bundle methods for nonconvex problems with inexact function and subgradient evaluations are quite involved, and there are only a handful of papers on this topic, see [15,[32][33][34][35].
In this paper, we present a proximal bundle method with a convexification technique and a noise management strategy to solve the composite problem (1). Concretely, we design a "convexification" technique for the nonconvex function h to make the corresponding linearization errors nonnegative, and we adopt a noise management strategy for the inexact function f: if the error is "too" large and the testing condition (22) is not satisfied, we decrease the value of the proximal parameter μ to obtain a better iterative point. We summarize our work as follows:
• Firstly, we design the convexification technique for the nonconvex function h to ensure that the linearization errors of the augmented function φ_n = h + (η_n/2)‖· − x̄^k‖² are nonnegative. Although the augmented function φ_n may not be convex, nonnegative linearization errors can be obtained through the choice of the parameter η_n. Similar strategies can also be found in [10,11,15,16].
• Then, the sum of the functions f and φ_n is regarded as an approximation of the composite function (1). We construct cutting plane models for f and φ_n separately and regard their sum as the cutting plane model of the approximate function, which may be a better cutting plane model. It should be noted that, since inexact information is involved, the corresponding cutting plane model may not always lie below the function f.
• Although we design the cutting plane models for f and φ_n separately, only one quadratic programming (QP) subproblem needs to be solved at each iteration. By the construction of the cutting plane models, the QP subproblem is strictly convex and has a unique solution, which makes our algorithm more effective.
• In the method, we construct a noise management step to deal with the inexact function and subgradient values, whose errors are only required to be bounded and need not vanish. If the noise error is "too" large and the testing condition (22) is not satisfied, we decrease the value of μ to obtain a better iterative point.
• Two polynomial functions with twenty different dimensions and six DC (difference of convex) problems are used in the numerical experiments. In the exact case, our method is comparable with the method in [16] and attains higher precision. Among five different types of inexact oracles, the exact case has the best performance, and the vanishing-error cases generally perform better than the constant-error cases. We also apply our method to six DC problems, and the results show that our algorithm is effective and reliable.
The remainder of this paper is organized as follows. In Section 2, we review some definitions from variational analysis and some preliminaries for proximal bundle methods. Our proximal bundle method is given in Section 3. In Section 4, we present the convergence properties of the algorithm. Some preliminary numerical tests are reported in Section 5. In Section 6, we give some conclusions.

Preliminaries
In this section, we first review some concepts and definitions and then present some preliminaries for the proximal bundle method.

Preliminary
In this subsection, we recall concepts and results of variational analysis that will be used later in the paper. The definition of a lower-C^k function is given in Definition 10.29 in [36]. For completeness, we state it as follows: a function F is lower-C^k on an open set V if, on a neighborhood of each point of V, F has a representation F(x) = max_{t∈T} F_t(x), in which the functions F_t are of class C^k on V and the index set T is a compact space such that F_t(x) and all its partial derivatives through order k depend continuously not just on x but on (t, x) ∈ T × V.
If k = 2, F is a lower-C² function. Lower-C² functions have a special relationship with convexity, see Theorem 10.33 in [36]. We state an equivalent characterization as follows: a function F is lower-C² on an open set O ⊆ R^N if F is finite on O and, for any x ∈ O, there exists a threshold λ̄ ≥ 0 such that F + (λ/2)‖·‖² is convex on an open neighborhood V of x for all λ ≥ λ̄. In particular, if the function F is convex and finite-valued, then F is lower-C² with threshold λ̄ = 0.
In the following, we assume the nonconvex function h to be lower-C². Since f and h are both not necessarily smooth, the composite function (1) is also not necessarily smooth. For the proper convex function f, the usual subdifferential of convex analysis is used, denoted by ∂f(x) at the point x ∈ R^N (see [37]). For the proper and regular function h, we utilize the limiting subdifferential, also denoted by ∂h(x) at the point x (see [36]); it consists of all limits of regular (Fréchet) subgradients g_ν ∈ ∂̂h(x_ν) along sequences x_ν → x with h(x_ν) → h(x). In nonsmooth analysis, for the convex function f the ε-subdifferential at a point x̄^k is often used; it is defined by

∂_ε f(x̄^k) := { g ∈ R^N : f(x) ≥ f(x̄^k) + ⟨g, x − x̄^k⟩ − ε for all x ∈ R^N },

where ε ≥ 0. In the following, we present the inexact data for the function f and give some preliminaries for the proximal bundle method.

Inexact Information and Bundle Construction
Bundle methods are very effective for nonsmooth problems and rely on a "black box" that computes the function value and one subgradient at each iterative point; the returned subgradient is arbitrary. Along the iterative process, the generated points are divided into two types: null points, used essentially to increase the model's accuracy; and serious points, which significantly decrease the objective function (and also improve the approximate model's accuracy). The corresponding iterations are called null steps and serious steps, respectively. In the literature, serious points are sometimes called prox-centers or stability centers, denoted by x̄^{k(n)}. The sequence {x̄^{k(n)}} is thus a subsequence of the sequence {x^n}. For notational simplicity, we write x̄^k = x̄^{k(n)}.
For the function f, the oracle can only provide an inexact function value and one inexact subgradient at each iteration,

f̂_l := f_{x^l},  ĝ^f_l := g^f_{x^l},

with unknown but bounded inaccuracy. That is,

f̂_l = f(x^l) − θ_l,  (2a)
f(x) ≥ f̂_l + ⟨ĝ^f_l, x − x^l⟩ − ε_l  for all x ∈ R^N,  (2b)

and meanwhile

θ_l ≤ θ̄  and  ε_l ≤ ε̄.  (2c)
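The oracle described above can be sketched as follows; the convex function, the noise distribution and the bounds are all hypothetical stand-ins, chosen only to exhibit bounded, non-vanishing value and subgradient errors.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # hypothetical convex function, standing in for the expensive f in (1)
    return 0.5 * float(np.dot(x, x))

def inexact_oracle(x, theta_bar=0.01, eps_bar=0.01):
    """Return (f_hat, g_hat) with bounded, non-vanishing errors:
    the value may over- or under-estimate f(x) by at most theta_bar,
    and the subgradient (here the gradient x of f) is perturbed by
    componentwise noise of magnitude at most eps_bar."""
    value_noise = rng.uniform(-theta_bar, theta_bar)
    grad_noise = rng.uniform(-eps_bar, eps_bar, size=x.shape)
    return f(x) + value_noise, x + grad_noise

x = np.array([1.0, -2.0])
f_hat, g_hat = inexact_oracle(x)
```

Since the value can be overestimated, this simple wrapper already behaves like an upper oracle in the sense used above.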
According to (2a)-(2c), we have the following relationships:

f̂_l ≥ f(x^l) − θ̄,  (3)
ĝ^f_l ∈ ∂_{θ_l+ε_l} f(x^l).  (4)

Note that we only require θ_l + ε_l ≥ 0 to hold for each index l. The bundle for the function f is denoted by

B^f_n := { (x^l, f̂_l, ĝ^f_l) : l ∈ I_n }.

Now we present the cutting plane model of f built from the inexact information:

ϕ_n(x) := max_{l∈I_n} { f̂_k − e^k_{f,l} + ⟨ĝ^f_l, x − x̄^k⟩ },  (5)

where x̄^k is the current stability center, with index k(n) corresponding to its candidate point index, and e^k_{f,l} is the linearization error, which measures the difference between the cutting plane and the function value computed by the oracle at the current serious point, that is,

e^k_{f,l} := f̂_k − f̂_l − ⟨ĝ^f_l, x̄^k − x^l⟩.  (6)

In particular, note that the relation ϕ_n(x) ≤ f(x) does not necessarily hold, so the linearization error e^k_{f,l} may be negative. In fact, by (2a), (2b) and (6), e^k_{f,l} satisfies

e^k_{f,l} ≥ −(θ_{k(n)} + ε_l).  (7)
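A one-dimensional numerical check of the remark above: with exact data the linearization error (6) of a convex function is nonnegative, but an overestimating oracle value at a trial point near the center can push it below zero. The function, points and error size are arbitrary.

```python
# 1-D demo with hypothetical convex f(x) = x^2: exact data give a
# nonnegative linearization error (6), but an overestimating oracle
# value at the trial point makes it negative, as noted after (7).
f_exact = lambda x: x * x
g_exact = lambda x: 2 * x          # gradient of f

x_center = 0.0                     # stability center
x_l = 0.1                          # earlier trial point
overestimate = 0.05                # oracle reports f(x_l) + 0.05

f_hat_center = f_exact(x_center)            # exact value at the center
f_hat_l = f_exact(x_l) + overestimate       # overestimated value at x_l
g_hat_l = g_exact(x_l)                      # exact gradient at x_l

# linearization error (6) of the cut built at x_l, taken at the center
e_f_l = f_hat_center - (f_hat_l + g_hat_l * (x_center - x_l))
# exact data would give 0 - (0.01 - 0.02) = 0.01 >= 0;
# the overestimate shifts it to 0.01 - 0.05 = -0.04 < 0
```

The corresponding cut lies above f near the center, which is exactly why the model ϕ_n can overestimate f.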
Meanwhile, the cutting plane model ϕ_n may overestimate f at some points. By (2b), the following inequality holds:

ϕ_n(x) ≤ f(x) + max_{l∈I_n} ε_l  for all x ∈ R^N.  (8)

For the nonconvex function h, the linearization errors may be negative. In bundle methods, nonnegative linearization errors are very important for convergence. We therefore present a local "convexification" technique; similar techniques can also be found in [15,16,38]. The convexification parameter η_n is chosen such that

η_n ≥ η^min_n := max{ 0, max_{l∈I_n, ‖x^l − x̄^k‖² ≠ 0} ( −2 e^k_{h,l} / ‖x^l − x̄^k‖² ) },  (9)

where I_n ⊆ {0, 1, 2, ···} denotes an index set and e^k_{h,l} is the linearization error of h, defined by

e^k_{h,l} := h(x̄^k) − h(x^l) − ⟨g^l_h, x̄^k − x^l⟩,  with g^l_h ∈ ∂h(x^l).  (10)

The bundle for the function h is denoted by

B^h_n := { (x^l, h(x^l), g^l_h) : l ∈ I_n }.  (11)

Next, we introduce the augmented function φ_n of h, defined by

φ_n(x) := h(x) + (η_n/2) ‖x − x̄^k‖²,  (12)

where η_n ≥ η^min_n holds. Note that by the definition of φ_n we have h(x̄^k) = φ_n(x̄^k). By the subgradient calculus, there exists g^l_h ∈ ∂h(x^l) satisfying

g^l_φ := g^l_h + η_n (x^l − x̄^k) ∈ ∂φ_n(x^l).

Meanwhile, the linearization error of the function φ_n is

e^k_{φ,l} := φ_n(x̄^k) − φ_n(x^l) − ⟨g^l_φ, x̄^k − x^l⟩ = e^k_{h,l} + (η_n/2) ‖x^l − x̄^k‖².

By the choice of the convexification parameter η_n, e^k_{φ,l} ≥ 0 holds for all l ∈ I_n. In the following, we regard the sum of the functions f and φ_n as an approximate function for the composite function (1):

Ψ_n(x) := f(x) + φ_n(x).  (13)

For (13), we utilize the sum of the cutting plane models of f and φ_n as the cutting plane model. The cutting plane model of the augmented function φ_n is defined as

φ̌_n(x) := max_{l∈I_n} { φ_n(x^l) + ⟨g^l_φ, x − x^l⟩ }.  (14)

Its equivalent form is

φ̌_n(x) = max_{l∈I_n} { h(x̄^k) − e^k_{φ,l} + ⟨g^l_φ, x − x̄^k⟩ }.

Then, the cutting plane model for the approximate function Ψ_n is

Φ_n(x) := ϕ_n(x) + φ̌_n(x).

The new iterative point x^{n+1} is given by the following QP (quadratic programming) subproblem:

x^{n+1} := argmin_{x ∈ R^N} { Φ_n(x) + (μ_n/2) ‖x − x̄^k‖² },  (15)

where μ_n > 0 is the proximal parameter. Note that x^{n+1} is the unique solution to (15) by strong convexity. The following lemma shows the relation between the current stability center and the newly generated point. A similar conclusion, for the convex case, can be found in Lemma 10.8 in [39]; here we omit the proof.

Lemma 1. Let x^{n+1} be the unique solution to the QP subproblem (15), with proximal parameter μ_n > 0.
Then, we have

(i) G_n ∈ ∂Φ_n(x^{n+1}), where

G_n := Σ_{l∈I_n} α^l_1 ĝ^f_l + Σ_{l∈I_n} α^l_2 g^l_φ,  (16)

and α_1 = (α^1_1, ···, α^n_1) and α_2 = (α^1_2, ···, α^n_2) are simplex multipliers solving the dual problem of (15);
(ii) x^{n+1} = x̄^k − (1/μ_n) G_n.
In addition, the subgradient inequality Φ_n(x^{n+1}) + ⟨G_n, x − x^{n+1}⟩ ≤ Φ_n(x) holds for all x ∈ R^N.

In the following, we present the concept of the predicted descent. Concretely, the predicted descents for the functions f, φ_n and Ψ_n are

δ^f_{n+1} := f̂_k − ϕ_n(x^{n+1}),  δ^φ_{n+1} := h(x̄^k) − φ̌_n(x^{n+1}),  δ_{n+1} := δ^f_{n+1} + δ^φ_{n+1}.  (18)

The predicted descent is very important for the convergence of bundle methods. By the definitions of the functions φ_n and φ̌_n, we have δ^φ_{n+1} ≥ 0. Since inexact data enter the computation of f, the nonnegativity of δ^f_{n+1} cannot be guaranteed, and hence neither can the nonnegativity of δ_{n+1}.
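A tiny numerical check of the convexification rule (9): for a hypothetical lower-C² function h(x) = |x| − x², some linearization errors at the center are negative, but with η_n = η^min_n the augmented errors all become nonnegative. The trial points and the center are arbitrary.

```python
# Hypothetical lower-C2 example: h(x) = |x| - x^2 (max of two C2 functions).
h = lambda x: abs(x) - x * x

def g_h(x):
    # one limiting subgradient of h (sign convention at 0 picked arbitrarily)
    return (1.0 if x >= 0 else -1.0) - 2.0 * x

x_center = 0.5
points = [-1.0, 0.0, 1.5]          # earlier trial points x_l in the bundle

# linearization errors (10) of h at the center: negative values reveal
# the nonconvexity of h
e_h = [h(x_center) - (h(x_l) + g_h(x_l) * (x_center - x_l)) for x_l in points]

# smallest convexification parameter satisfying rule (9)
eta_min = max([0.0] + [-2.0 * e / (x_l - x_center) ** 2
                       for e, x_l in zip(e_h, points) if x_l != x_center])

# augmented errors e_phi,l = e_h,l + (eta/2)*|x_l - x_center|^2: all >= 0
e_phi = [e + 0.5 * eta_min * (x_l - x_center) ** 2
         for e, x_l in zip(e_h, points)]
```

For this instance eta_min works out to 2, matching the threshold λ̄ = 2 above which |x| − x² + (λ/2)(x − c)² is convex for any center c.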
Next, we give the aggregate linearization error, defined by

e_{n+1} := f̂_k + h(x̄^k) − Φ_n(x^{n+1}) − ⟨G_n, x̄^k − x^{n+1}⟩.  (19)

By item (ii) in Lemma 1 and the definition of δ_{n+1} in (18), a relationship (20) between e_{n+1}, δ_{n+1} and ‖G_n‖ holds, where R_n = μ_n + η_n. Next, we define the aggregate linearization of the approximate model Φ_n:

Φ^lin_n(x) := Φ_n(x^{n+1}) + ⟨G_n, x − x^{n+1}⟩.  (21)

Then the aggregate linearization error can also be expressed as the difference between the oracle value at the current serious point and the value of the aggregate linearization Φ^lin_n at that point, that is,

e_{n+1} = f̂_k + h(x̄^k) − Φ^lin_n(x̄^k).

Indeed, this follows directly from the definition of Φ^lin_n(x). By the convexity of the function Φ_n, the inequality Φ_n(x) ≥ Φ^lin_n(x) holds, so for any x ∈ R^N we have

f̂_k + h(x̄^k) − e_{n+1} + ⟨G_n, x − x̄^k⟩ ≤ Φ_n(x).

By (8), the following inequality holds under the condition φ̌_n(x) ≤ φ_n(x):

Φ_n(x) ≤ Ψ_n(x) + max_{l∈I_n} ε_l.

Note that the condition φ̌_n(x) ≤ φ_n(x) may fail to hold if the convexification parameter η_n is less than the threshold parameter ρ̄ (the function φ_n may then not be convex), but the choice of η_n still ensures the nonnegativity of e^k_{φ,l} for all l ∈ I_n. By the nonnegativity of e^k_{φ,l} and (7), the aggregate linearization error satisfies

e_{n+1} ≥ −( θ_{k(n)} + max_{l∈I_n} ε_l ).

Using the fact that x^{n+1} is the solution of the QP problem (15) and the definition of the predicted descent in (18), we have

δ_{n+1} ≥ (μ_n/2) ‖x^{n+1} − x̄^k‖² + min_{l∈I_n} e^k_{f,l} + min_{l∈I_n} e^k_{φ,l} ≥ (μ_n/2) ‖x^{n+1} − x̄^k‖² + min_{l∈I_n} e^k_{f,l},

where the second inequality follows from the nonnegativity of e^k_{φ,l}. By (5) with x = x̄^k, we have ϕ_n(x̄^k) − f̂_k = max_{l∈I_n} (−e^k_{f,l}). Note that if only "small" errors have been introduced into the model Φ_n, then min_{l∈I_n} e^k_{f,l} is only slightly negative and the predicted descent δ_{n+1} remains essentially nonnegative. Then, by (20) and (16), the noise test (22) can be written in equivalent forms in terms of ‖G_n‖, δ_{n+1} and μ_n. Next, we present an optimality measure; concretely, it is

V_n := max{ ‖G_n‖, e_{n+1} }.
By the above discussion, we arrive at the bound (24) on the optimality measure V_n. From the above inequalities, a smaller μ_n leads to a higher probability that inequality (22) holds. Based on that, we will update the parameter μ_n to reduce the effects of the errors.
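To make the subproblem concrete, here is a brute-force one-dimensional stand-in for the QP (15): the model is the maximum of two exact cuts of f(x) = |x| (the nonconvex part and its augmented model are omitted for simplicity), the prox center and μ are arbitrary, and a grid search replaces the QP solver used in practice (e.g. quadprog).

```python
import numpy as np

# Two exact cuts of f(x) = |x|, taken at x = 1 and x = -1:
# each cut is f(x_l) + g_l*(x - x_l) with g_l in the subdifferential.
cuts = [(1.0, 1.0), (-1.0, -1.0)]          # (trial point x_l, subgradient g_l)
model = lambda x: max(abs(xl) + gl * (x - xl) for xl, gl in cuts)

x_center, mu = 1.0, 1.0                     # prox center and proximal parameter
objective = lambda x: model(x) + 0.5 * mu * (x - x_center) ** 2

# grid search standing in for the QP solver; fine enough for the demo
grid = np.linspace(-2.0, 2.0, 40001)
values = np.array([objective(x) for x in grid])
x_next = float(grid[values.argmin()])       # unique minimizer (strong convexity)
```

Here model(x) = |x|, so the minimizer of |x| + 0.5 (x − 1)² is x = 0 with value 0.5, and the grid search recovers it to grid accuracy; a smaller μ would let the candidate point move farther from the center.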
In the next section, we will give our proximal bundle algorithm for the primal composite problem (1) with inexact information.

Algorithm
In this section, we present our adaptive bundle algorithm for the composite problem (1) with inexact information. To handle the inexact information, similarly to [17], we introduce a noise management step. Concretely, when condition (22) does not hold, μ_n is reduced in order to make δ_{n+2} > δ_{n+1} and to increase the probability that condition (22) holds.

Algorithm 1 (Nonconvex Nonsmooth Adaptive Proximal Bundle Method with Inexact Information for a Class of Composite Optimization)
Step 0 (Input and Initialization):

Step 1 (Model generation and QP subproblem):
Given the current proximal center x̄^k, the current bundles B^f_n and B^h_n with index set I_n, the current proximal parameter μ_n and convexification parameter η_n, and the current approximate models ϕ_n(x) and φ̌_n(x), solve the QP subproblem (15) to obtain the next iterative point x^{n+1} and the simplex multipliers (α_1, α_2). Then, compute G_n, δ_{n+1}, e_{n+1} and V_n.

Step 3 (Noise Management):
If relationship (22) does not hold, set N_MP = 1, μ_{n+1} = κμ_n, n := n + 1, and go to Step 1; otherwise, set N_MP = 0, declare the noise acceptable, and go to Step 4.
Step 5 (Update parameter): Apply the rule to compute η_{n+1}, where η^min_{n+1} is given by (9) with n replaced by n + 1.

Step 6 (Restart step):
If ψ(x^{n+1}) > ψ(x̄^k) + M_0 holds, then the objective increase is unacceptable; restart the algorithm by setting η_0 := η_n, μ_0 := τμ_n and R_0 := η_0 + μ_0, where i_k is the index of the serious points, and loop to Step 1. Otherwise, increase k by 1 in the case of a serious step. In all cases, increase n by 1 and loop to Step 1.
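The parameter bookkeeping of Steps 1, 3 and 6 can be sketched as a control-flow skeleton. The QP solve and the tests are replaced here by a scripted event trace, and the constant mirrors the role of κ ∈ (0, 1) (shrinking μ at noise management steps), so this is only an illustration of the loop structure, not of the full algorithm.

```python
KAPPA = 0.9        # shrink factor for mu in the noise management step

def run_skeleton(events, mu0=10.0):
    """Walk through a scripted trace of iteration outcomes.
    'noise'   -> Step 3: condition (22) failed, mu is decreased;
    'null'    -> null step: center and mu kept, model refined;
    'serious' -> serious step: proximal center moves."""
    mu, n_noise, n_serious = mu0, 0, 0
    for event in events:
        if event == 'noise':
            mu *= KAPPA
            n_noise += 1
        elif event == 'serious':
            n_serious += 1
        # a 'null' event changes no parameters in this sketch
    return mu, n_noise, n_serious

mu, n_noise, n_serious = run_skeleton(['noise', 'noise', 'null', 'serious'])
```

After two noise events, μ has been multiplied by κ² = 0.81, which is exactly the geometric decrease μ_n = κ^{n−l̄} μ_{l̄} used later in the proof of Lemma 2.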

Remark 1.
Note that Algorithm 1 does not state explicitly how the elements in the bundles are updated. The updating strategies differ for null steps and serious steps. When a serious step occurs, the newly generated point is taken as the new proximal center and all the corresponding linearization errors in the bundles must be updated. When a null step occurs, the proximal center remains unchanged and only the newly generated information is added to the bundles to improve the model's accuracy. As the iterations proceed, the number of elements in the bundles may grow so large that the efficiency of the algorithm is reduced. Then, the active-set technique (only the elements with active multipliers α^l_1 and α^l_2 are kept in the bundles) and the compression strategy can be adopted. With the compression strategy, the number of elements in the bundles can be reduced to as few as two: the aggregate information and the newly generated information. It should be noted that, although the compression strategy does not impair the convergence of the algorithm, it may affect the model's effectiveness if the number of elements kept in the bundles is too small.
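The compression rule described in the remark can be sketched as follows; the bundle entries are opaque placeholders, and both the function name and max_size are hypothetical.

```python
def compress_bundle(bundle, aggregate_cut, new_cut, max_size):
    """Append the new cut; if the bundle then exceeds max_size, keep only
    the aggregate information and the newest information (the minimal
    two-element bundle mentioned in Remark 1)."""
    bundle = bundle + [new_cut]
    if len(bundle) > max_size:
        bundle = [aggregate_cut, new_cut]
    return bundle

small = compress_bundle(['cut1'], 'agg', 'cut2', max_size=3)            # appends
big = compress_bundle(['cut1', 'cut2', 'cut3'], 'agg', 'cut4', max_size=3)
```

The aggregate cut summarizes the discarded elements, which is why convergence is unaffected even under this drastic compression.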
In the following, we analyze Algorithm 1 and show that it is well defined. If the algorithm loops forever, three situations may occur (the number of restart steps is finite, as shown in Lemma 3):
• an infinite loop of noise management between Step 1 and Step 3, driving μ_n → 0;
• a finite number of serious steps, followed by an infinite number of null steps;
• an infinite number of serious steps.
We first consider the case of an infinite loop of noise management.

Lemma 2. If an infinite loop between Step 1 and Step 3 of Algorithm 1 occurs, then the optimality measure V_n → 0.

Proof. Suppose an infinite loop between Step 1 and Step 3 begins at the iteration index l̄. According to the algorithm, this means that for all n ≥ l̄, neither the proximal center x̄^{k(n)} = x̄^{k(l̄)} nor the approximate models ϕ_n, φ̌_n, Φ_n change. Hence, when solving the QP subproblems (15) sequentially, only the parameter μ_n is updated. By the update strategy for μ_n, we have μ_n = κ^{n−l̄} μ_{l̄}, so μ_n → 0 as n → ∞ since κ ∈ (0, 1). Using (24), we obtain V_n → 0, and the proof is complete.
Note that if infinitely many noise management steps happen, there are only finitely many updates of the convexification parameter η_n; hence η_n eventually remains bounded. Before treating the last two cases, we show that Algorithm 1 performs only a finite number of restart steps. For that purpose, we make an assumption, which can also be found in [16].
Assumption 1. The set T := { x ∈ R^N : ψ(x) ≤ ψ(x^0) + M_0 } is compact.

By the definition of lower-C², the compactness of the set T and the finite covering theorem, there exists a threshold ρ̄ such that for all η ≥ ρ̄, the augmented function h + (η/2)‖· − x̄^k‖² is convex on T. The compactness of T also allows us to find Lipschitz constants for the functions f and h, named L_f and L_h respectively (by the local Lipschitz property of lower-C² functions and the finite covering theorem). The following lemma shows that the number of restart steps in Algorithm 1 is finite.

Lemma 3.
Suppose that only finitely many noise management steps occur, that Assumption 1 holds, and consider the sequence of iterative points {x^n} generated by Algorithm 1, where the index l_k ∈ I_n denotes the current proximal center index. Then there can be only a finite number of restart steps in Algorithm 1. Hence, eventually the sequence {x̄^k} lies entirely in T.

Proof. Firstly, the new iterative point x^{n+1} is well defined by the strong convexity of the QP subproblem (15). As the functions f and h are Lipschitz continuous on the compact set T with Lipschitz constants L_f and L_h respectively, ψ is also Lipschitz continuous on T, with Lipschitz constant L := L_f + L_h. By the Lipschitz continuity of ψ, there exists r > 0 such that for any x̄ ∈ {x : ψ(x) ≤ ψ(x^0)}, the open ball B_r(x̄) is contained in the compact set T (indeed, the choice r = M_0/L suffices). Note that there exists g^{l_k} ∈ ∂(ϕ_n + φ̌_n)(x̄^k), where l_k ∈ I_n; it also holds that g^{l_k} ∈ ∂ψ(x̄^k), whence ‖g^{l_k}‖ ≤ L.

In Algorithm 1, μ_n increases when restart steps and null steps with N_MP = 0 happen, so eventually the proximal parameter μ_n becomes large enough that 2L/μ_n < r holds. Noting that ψ(x̄^k) ≤ ψ(x^0) for any new serious point x̄^k generated by Algorithm 1 completes the proof.
Next, we focus on the update of the convexification parameter η_n. The following lemma shows that η_n eventually remains unchanged.

Lemma 4.
Suppose that there are only finitely many noise management steps and that Assumption 1 holds. Then there exists an iteration index n̄ such that for all n ≥ n̄ the convexification parameter stabilizes, i.e., η_n = η̄. Moreover, if η̄ ≥ ρ̄ holds, then for all n ≥ n̄ the augmented function φ_n is convex on T.

Proof. By the update rule for the convexification parameter in Algorithm 1, the sequence {η_n} is nondecreasing: either η_{n+1} = η_n or η_{n+1} = τη^min_{n+1} > τη_n. Suppose the sequence {η_n} does not stabilize; then there must be infinitely many iterations at which the convexification parameter is increased by a factor of at least τ, which leads to a contradiction: there exists an index ñ such that η_ñ ≥ ρ̄, so that h(x) + (η_ñ/2)‖x − x̄^{k(ñ)}‖² is convex on the compact set T. From this iteration on, we have e^k_{h,l} + (η_ñ/2)‖x^l − x̄^{k(ñ)}‖² ≥ 0 for all l ∈ I_ñ (the linearization error of a convex function is always nonnegative), so the parameter is never increased again.
The optimality measure in Algorithm 1 for inexact information differs from that in the exact case. The following lemma justifies the choice of V_n as the optimality measure and shows that the accumulation point is an approximate solution of the primal problem (1).

Lemma 5.
Suppose that there are only finitely many noise management steps and that Assumption 1 holds. Suppose that, for an infinite subset of iterations I ⊆ {0, 1, ···}, the sequence {V_λ}_{λ∈I} → 0 as I ∋ λ → ∞. Let {x̄^{k(λ)}}_{λ∈I} be the corresponding subsequence of serious points and let x̄_acc be one of its accumulation points. If η̄ ≥ ρ̄ holds, then x̄_acc is an approximate solution to the problem (13), in the sense that

Ψ_n(x̄_acc) ≤ Ψ* + lim sup_{λ∈I} θ_{k(λ)} + lim sup_{λ∈I} ( max_{l∈I_n} ε_l ),

where Ψ* is the optimal value of the function Ψ_n.
Note that, by the definition of Ψ_n, for large enough indices n we have Ψ_n(x) = ψ(x) + (η̄/2)‖x − x̄_acc‖² and Ψ_n(x̄_acc) = ψ(x̄_acc). By the above discussion,

ψ(x̄_acc) ≤ ψ* + (η̄/2)‖x* − x̄_acc‖² + lim sup_{λ∈I} θ_{k(λ)} + lim sup_{λ∈I} ( max_{l∈I_n} ε_l ),

where x* and ψ* are a local optimal solution and the corresponding optimal value, respectively. Then x̄_acc is an approximate solution to the primal problem (1). Several corollaries follow from Lemma 5 that are important for the convergence analysis. We state them here but omit the proofs.

Corollary 1.
(i) If for some iteration index λ, η_λ ≥ ρ̄ holds and the optimality measure satisfies V_λ = 0, then the serious point x̄^{k(λ)} is an approximate solution to problem (13), with accuracy bound (28). (ii) Suppose that the serious point sequence eventually stabilizes, i.e., there exists an index m such that for all λ ≥ m we have x̄^{k(λ)} = x̄^{k(m)}. If η_m ≥ ρ̄ holds, then x̄^{k(m)} is an approximate solution to the problem (13), with accuracy bound (29).

Note that if an infinite loop of noise management happens after some iteration l̄ and η_l̄ ≥ ρ̄ holds, then the proximal center remains unchanged, and according to (29) the last serious point x̄^{k(l̄)} is an approximate solution to problem (13). From the above lemmas and corollary, Algorithm 1 is well defined. In the next section, we study the last two cases separately.

Convergence Theory
In this section, we study separately the last two cases listed above. Similar proof techniques can be found in [13,16,17,38,40]. The following lemma treats the second case, i.e., finitely many serious steps followed by infinitely many null steps.

Lemma 6. Suppose that Assumption 1 holds and that, after some iteration n̄, η_n̄ ≥ ρ̄ holds and no serious step is declared in Algorithm 1. Then there exists a subsequence {x^n}_{n∈I_n} such that V_n → 0 as I_n ∋ n → ∞.
Proof. After the iteration n̄, no serious step is declared, so for n ≥ n̄ only noise management steps or null steps are performed. The serious point does not change, i.e., for all n ≥ n̄, x̄^{k(n)} = x̄^{k(n̄)}. For notational simplicity, we write x̄ := x̄^{k(n̄)}.
If the number of noise management steps is infinite, then μ_n → 0 as I_n ∋ n → ∞, and the argument of Lemma 2 shows that there exists a subsequence {x^{n+1}}_{n∈I_n} such that V_n → 0 as I_n ∋ n → ∞.
Suppose instead that there are only finitely many noise management steps. Since the number of restart steps is finite, there exists an iteration index n̂ such that (22) holds and only null steps occur for all n ≥ n̂. Consequently, {μ_n} is a nondecreasing sequence, since μ_{n+1} ∈ [γμ_n, μ_max] for all n > n̂; hence μ_n → μ̄ ≤ μ_max as n → ∞. In the following, we show δ_n → 0. Let P_n be the partial linearization of the QP model (15), that is,

P_n(x) := Φ^lin_n(x) + (μ_n/2) ‖x − x̄‖².

By Lemma 10.10 in [39], the rules used to select elements of the bundles guarantee that Φ^lin_n(x) ≤ Φ_{n+1}(x) holds, and by inequality (8) the values P_n(x^{n+1}) are bounded above. Evaluating P_n at x^{n+2} and using the fact that μ_{n+1} ≥ γμ_n, we can compare P_n and P_{n+1}. Furthermore, x^{n+1} is the unique minimizer of P_n, so ∇P_n(x^{n+1}) = 0, and by Taylor's expansion of the quadratic function P_n we get

P_n(x) = P_n(x^{n+1}) + (μ_n/2) ‖x − x^{n+1}‖².

Using the relations above, the fact μ_n ≥ μ_n̂ and (30), we obtain that the sequence {P_n(x^{n+1})}_{n≥n̂} is nondecreasing and bounded; hence its limit exists. Then the sequence of null steps {x^n} is bounded, and since {μ_n} is bounded, {G_{n+1}} is bounded by (16) (see [39]). Since for n > n̂ the serious step test is not satisfied, the definition of δ_{n+1}, the definition of the partial linearization and the quantity Ω := f̂_k + h(x̄) − P_n(x^{n+1}) yield, by (32), Theorem 1 in [16] and μ_n → μ̄ ≤ μ_max, that the right-hand side of the corresponding inequality vanishes as n → ∞, so Ω → 0. Hence δ_{n+1} → 0 as n → ∞, and by (24), V_n → 0 as n → ∞. The case of infinitely many serious points generated by Algorithm 1 is considered in the next lemma. For notational convenience, we denote by K the subset of iterations at which serious points are declared, and let x̄^k and x̄^{k*} be two successive serious points.

Lemma 7.
Suppose that an infinite sequence of serious steps is generated by Algorithm 1 and that Assumption 1 and η̄ ≥ ρ̄ hold. Then V_n → 0 as K ∋ n → ∞.
Proof. Since the serious points satisfy the descent condition (25), for two successive serious points x̄^k and x̄^{k*} the descent inequality gives

f̂_{k*} + h(x̄^{k*}) ≤ f̂_k + h(x̄^k) − m_1 δ_{k*}.

Rewriting the above inequality, we have

m_1 δ_{k*} ≤ ( f̂_k + h(x̄^k) ) − ( f̂_{k*} + h(x̄^{k*}) ).

Then the sequence {f̂_k + h(x̄^k)} is strictly decreasing. Summing this inequality over all serious steps, we deduce that the series Σ_k δ_k converges, and hence δ_k → 0. Since (22) holds, (24) then yields V_k → 0 as K ∋ k → ∞.

Theorem 2.
Suppose that Algorithm 1 loops forever, that there are infinitely many serious steps, and that η̄ ≥ ρ̄ holds. Then any accumulation point x̄_acc of the serious point sequence {x̄^k}_{k∈K} is an approximate solution of the problem (13), with the accuracy bound of Lemma 5.

Proof. The conclusion follows from Lemma 5 and Lemma 7.

Numerical Results
In this section, we consider two Ferrier polynomial functions (see [10,15,16]) and some DC (difference of convex) functions (see [41][42][43][44]). The section is divided into three parts. We code Algorithm 1 in MATLAB R2016 and run it on a PC with a 2.10 GHz CPU. The quadratic programming solver used for Algorithm 1 is quadprog.m, available in the Optimization Toolbox of MATLAB. Note that the choice of quadratic programming solver is not essential; any QP solver is acceptable.

Two Polynomial Functions
In this subsection, we first present two polynomial functions of the form of the objective function (1):

ψ_1(x) := Σ_{i=1}^N |ω_i(x)| + ‖x‖²/2  and  ψ_2(x) := Σ_{i=1}^N |ω_i(x)| + ‖x‖/2,  x ∈ R^N,

where the ω_i, i = 1, ···, N, are the Ferrier polynomials (see [10,15,16]). The above functions are nonconvex, nonsmooth, lower-C², and have 0 as their global minimizer. Denoting h(x) = Σ_{i=1}^N |ω_i(x)| and f(x) = ‖x‖²/2 or f(x) = ‖x‖/2, the above functions are clearly of the form (1). In the following, we adopt the initial point x^0 = [1, 1, . . . , 1] and consider the cases N ∈ {1, 2, . . . , 19, 20}. The parameters in this subsection are set as follows: m_1 = 0.01, κ = 0.9, R_0 = 10, M_0 = 10, τ = 2, γ = 2, μ_max = 10^20 and Tol = 10^−6. We also stop the algorithm when the iteration number exceeds 1000. First, we present the numerical results in Tables 1 and 2 for the case θ̄ = 0 and ε̄ = 0, that is, the exact case, and compare them with the results in [16]; we call the algorithm in [16] the RedistProx algorithm. Meanwhile, we adopt I_n = {0, 1, 2, ···}. Note that in the exact case we stop the process when δ_k ≤ Tol occurs, as in [16]. In the exact case, the linearization errors e^k_{f,i} and e^k_{φ,i} are nonnegative, so noise attenuation steps never happen; hence in the exact-case results NNA is always zero and we omit the NNA column in Tables 1 and 2. The columns of the tables have the following meanings: Dim: the dimension of the tested problem; NS: the number of serious steps; NNA: the number of noise attenuation steps; NF: the number of oracle function evaluations used; fk: the minimal function value found; δ_k: the value of δ_k at the final iteration; ψ*: the optimal value found; V_k: the value of V_n at the final iteration; RN: the number of restart steps; Nu: the number of null steps.
From Table 3, Algorithm 1 solves the two Ferrier polynomial functions successfully in higher dimensions with reasonably high accuracy. The parameters μ and η eventually remain unchanged in the exact case, as illustrated by Figure 1. Next, inexact data are considered: we study random noise in the function and subgradient values. We introduce two kinds of random noise in the MATLAB codes. The first case is θ_j = 0.01 * normrnd(0, 0.1) and ε_j = 0.01 * normrnd(0, 0.1, 1, dim). The call normrnd(0, 0.1, 1, dim) generates random numbers from the normal distribution with mean 0 and standard deviation 0.1, the scalars 1 and dim being the row and column dimensions. We take m_1 = 0.01, κ = 0.9, γ = 2, τ = 2, M_0 = 5 and R_0 = 1000 in this random error case. The algorithm stops when V_k ≤ Tol holds or the number of function evaluations exceeds 1000. The numerical results for this case are reported in Table 4. From Table 4, Algorithm 1 solves ψ_1(x) and ψ_2(x) successfully with reasonable accuracy under random errors. We also examine the parameters η and μ during the execution of Algorithm 1. Although the convexification parameter η_n eventually remains unchanged, the update of the proximal parameter μ_n is more involved: when a noise management step occurs, μ_n is decreased to reduce the impact of the noise; when the unacceptable-increase condition occurs, μ_n is increased to obtain a smaller step length. Figure 2 shows the variation of the parameters η and μ along NF for ψ_1(x) with N = 19 in the normal random error case. Next, we introduce the error case θ_j = 0.01 * unifrnd(0, 1) and ε_j = 0.01 * unifrnd(0, 1, 1, dim); the call unifrnd(0, 1, 1, dim) is analogous to the normrnd case. In this case, we adopt two Tol values and two initial proximal parameter values R_0 for different dimensions of the variables.
Concretely, we take m_1 = 0.01, κ = 0.9, τ = 2, γ = 2, M_0 = 5 and R_0 = 20, Tol = 10^−6 for N ∈ {1, 2, · · · , 8}. For N ∈ {9, 10, · · · , 20}, we take R_0 = 200, Tol = 10^−4 and keep the other parameters unchanged. We also take 1000 as the upper limit on the number of function evaluations. The algorithm stops when V_k ≤ Tol holds or the number of function evaluations exceeds 1000. The numerical results for this error case are reported in Table 5. Table 5. The numerical results of Algorithm 1 for ψ_1(x) and ψ_2(x) in the unifrnd error case. From Table 5, Algorithm 1 can solve ψ_1(x) and ψ_2(x) with reasonable accuracy in the 'unifrnd' random error case. For this inexact case, we also illustrate the variation of η_n and µ_n in Figures 3 and 4. The parameter η_n eventually becomes stable. Although the variation of the proximal parameter µ_n is complicated in the inexact case, the hypothesis on the upper limit for µ_n is reasonable, as illustrated by the numerical testing.
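The two random-noise oracles above are generated with MATLAB's normrnd and unifrnd. A NumPy sketch of the same construction is given below; the wrapper name and the test functions f and subgrad_f are hypothetical, only the noise distributions and the 0.01 scaling come from the text.

```python
import numpy as np

def inexact_oracle(f, subgrad_f, x, noise="normrnd", scale=0.01, rng=None):
    """Return noisy estimates of f(x) and a subgradient of f at x,
    mimicking 0.01*normrnd(0,0.1[,1,dim]) and 0.01*unifrnd(0,1[,1,dim])."""
    rng = np.random.default_rng(0) if rng is None else rng
    dim = x.size
    if noise == "normrnd":   # N(0, 0.1) noise, scaled by 0.01
        theta = scale * rng.normal(0.0, 0.1)
        eps = scale * rng.normal(0.0, 0.1, size=dim)
    else:                    # unifrnd: U(0, 1) noise, scaled by 0.01
        theta = scale * rng.uniform(0.0, 1.0)
        eps = scale * rng.uniform(0.0, 1.0, size=dim)
    return f(x) + theta, subgrad_f(x) + eps
```

For instance, with f(x) = ‖x‖²/2 one would pass f = lambda x: 0.5 * float(x @ x) and subgrad_f = lambda x: x.copy().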

Noise's Impact on Solution Accuracy
Errors come in different types. To analyze the impact of different noise types, we test five different types of inexact oracles: In the numerical experiment, the parameters involved are the same as those in the 'unifrnd' error case. We present the numerical results for the no-noise error case (exact values, NNE) in Table 6. In this test, for N = 2, we take Tol = 10^−5. In the exact case, the number NNA is always 0, so we omit the NNA columns in Table 6.
In the following, we present the numerical results for the constant noise error case in Table 7. The parameters are the same as those in the NNE case except for ψ_1(x) with N = 2 and N = 18. For ψ_1(x) with N = 2, we take Tol = 10^−1, R_0 = 200 and keep the other parameters unchanged. For ψ_1(x) with N = 18, we take Tol = 10^0, R_0 = 200 and keep the other parameters unchanged.
Next, Table 8 presents the results for the vanishing noise error case. The parameters remain unchanged except for ψ_1(x) with N = 2 and ψ_2(x) with N = 8. For ψ_1(x) with N = 2, we take Tol = 10^−1 and keep the other parameters unchanged. For ψ_2(x) with N = 8, we take R_0 = 10 and keep the other parameters unchanged. The results for ψ_2(x) with N = 19 should also be noted.
In the following, Table 9 presents the results for the constant subgradient noise error case (CGNE). The parameters remain unchanged except for ψ_1(x) with N = 18; in this case, we take Tol = 10^0 and R_0 = 500. Table 10 presents the results for the vanishing subgradient noise error case (VGNE). The parameters remain unchanged except for ψ_1(x) with N = 2, 7, 14, 18; in these cases, we take Tol = 10^−1, R_0 = 200 and keep the other parameters unchanged. Next, we compare the numerical performance under the different noise types. For the comparison, we adopt the formula Precision = |log10(|f_k|)| and regard the NNE case as a benchmark. The constant noise cases (CNE and CGNE) and the exact case (NNE) are shown in Figure 5. It is clear that the exact case has the best performance and that Algorithm 1 achieves reasonable accuracy in the constant noise cases. Meanwhile, the performance of the CGNE case is better than that of the CNE case. Similarly, Figure 6 reports the numerical performance for the vanishing noise cases (VNE and VGNE) and the exact case (NNE). From Figure 6, the performance of the VGNE case is comparable with that of the exact (NNE) case. Meanwhile, the performance of the vanishing error cases is generally better than that of the constant error cases.
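Since the global optimal value of the Ferrier polynomials is 0, the Precision measure used for the comparison simply counts the correct decimal digits of the final function value; a one-line sketch:

```python
import math

def precision(fk):
    """Precision = |log10(|fk|)|: for problems whose optimal value is 0,
    larger values mean the final function value fk is closer to 0."""
    return abs(math.log10(abs(fk)))

# e.g. a final value fk = 1e-6 yields Precision = 6
```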

Application to Some DC Problems
In this subsection, we test some unconstrained DC examples to illustrate the effectiveness of Algorithm 1. These examples come from [42][43][44]. A DC function has the form ψ(x) = f(x) − g(x), where f and g are convex. If we take h(x) = −g(x), these problems are in the form of (1).
Relevant information: x_0 = (3, 1, 3, 1)^T, x* = (7/3, 1/3, 0.5, 2)^T, ψ* = 11/6; Problem 5. Dimension: N = 2, 5, 05j, j = 1, 2, · · · , 20, Relevant information: x_0 = (1/N, 0, · · · , 0)^T, x* = (1/N, 1/N, · · · , 1/N)^T, ψ* = 0. Problem 6. Dimension: N = 2, 4. To assess the effectiveness of Algorithm 1, we compare it with the TCM algorithm, the NCVX algorithm and the penalty NCVX algorithm in [42]. The parameter values in Algorithm 1 are: m_1 = 0.01, κ = 0.9, τ = 2, γ = 2, M_0 = 5, R_0 = 10 and Tol = 10^−3. The results can be seen in Table 11; an asterisk (*) in Table 11 means the obtained value is not optimal. From Table 11, Algorithm 1 can successfully solve these DC problems, whereas the TCM algorithm cannot solve Problem 4, the NCVX algorithm cannot solve Problems 1 and 4, and the penalty NCVX algorithm cannot solve Problem 1. Hence, Algorithm 1 is reliable. Judging from the obtained function values and the numbers of function evaluations, Algorithm 1 is also effective. For the above DC problems, we consider the vanishing noise error (VNE) case and the exact (NNE) case. We again take 1000 as the upper limit on the number of function evaluations. The algorithm stops when V_k ≤ Tol holds or the number of function evaluations exceeds 1000. For the vanishing noise case, we set θ_i = min{0.01, ‖x − x*‖/100} and ε_i = min{0.01, ‖x − x*‖/100}, except for Problem 3. In Problem 3, the optimal solutions vary with the dimension, so we set θ_i = min{0.01, ‖x‖²/100} and ε_i = min{0.01, ‖x‖²/100}. Table 12 presents the results for the vanishing noise error case (VNE) and the exact case (NNE). The column Pr in Table 12 denotes the index of the problem. We also compute the Precision. However, the formula Precision = |log10(|f_k|)| is not suitable here since the optimal value is not 0. To deal with this, we take a_k = (f_k − f*)/f* and Precision = |log10(|a_k|)|. The numerical results are reported as follows.
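The DC-to-(1) reduction and the relative precision measure can both be illustrated in a few lines. The toy instance below is hypothetical (it is not one of Problems 1-6); only the decomposition h = −g and the formula a_k = (f_k − f*)/f* come from the text.

```python
import numpy as np

# A toy DC instance for illustration only:
# psi(x) = f(x) - g(x) with f(x) = ||x||^2 and g(x) = ||x||_1, both convex.
f = lambda x: float(x @ x)
g = lambda x: float(np.sum(np.abs(x)))
h = lambda x: -g(x)            # taking h = -g casts psi into form (1)
psi = lambda x: f(x) + h(x)

def relative_precision(fk, fstar):
    """Precision used when the optimal value fstar is nonzero:
    |log10(|a_k|)| with a_k = (fk - fstar)/fstar."""
    ak = (fk - fstar) / fstar
    return abs(np.log10(abs(ak)))
```

For example, a final value f_k = 1.001 against f* = 1 gives a_k = 10^−3 and hence Precision = 3, i.e. roughly three correct relative digits.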
From Table 12, Algorithm 1 can successfully solve the above DC problems with high precision and handles the VNE case with reasonable accuracy. Hence, Algorithm 1 is effective and reliable for the above DC problems. During the numerical experiment, we also track the variation of the parameters η and µ; both are bounded, and the parameter η eventually remains unchanged, as illustrated in Figures 7 and 8.

Conclusions
In this paper, we consider a special class of nonconvex and nonsmooth composite problems. Each problem is the sum of two functions: one is finite convex with inexact information and the other is nonconvex (lower-C²). For the nonconvex function, we utilize the convexification technique and adjust the parameter dynamically to ensure that the linearization errors of the augmented function are nonnegative, and we construct the corresponding cutting plane models. Then, we regard the sum of the convex function and the augmented function as an approximate function. For the convex function with inexact information, we construct the cutting plane model from its inexact information and note that this cutting plane model may not lie below the convex function. Then, the sum of the cutting plane models of the convex function with inexact information and the augmented function is regarded as a cutting plane model of the approximate function. Based on this, we design an adaptive proximal bundle method. Meanwhile, for the convex function with inexact information, we utilize the noise management strategy and adaptively update the proximal parameter to reduce the influence of inexact information. Two polynomial functions with five different inexact oracle types and six DC problems of various dimensions are used in the numerical experiments. The preliminary numerical results show that our algorithm is effective and reliable. Meanwhile, our method may also be applied to some constrained problems and stochastic programming in the future.