Convergence of Subtangent-Based Relaxations of Nonlinear Programs

Convex relaxations of functions are used to provide bounding information to deterministic global optimization methods for nonconvex systems. To be useful, these relaxations must converge rapidly to the original system as the considered domain shrinks. This article examines the convergence rates of convex outer approximations for functions and nonlinear programs (NLPs), constructed using affine subtangents of an existing convex relaxation scheme. It is shown that these outer approximations inherit rapid second-order pointwise convergence from the original scheme under certain assumptions. To support this analysis, the notion of second-order pointwise convergence is extended to constrained optimization problems, and general sufficient conditions for guaranteeing this convergence are developed. The implications for avoiding the cluster effect in branch-and-bound methods are discussed. An implementation of subtangent-based relaxations of NLPs in Julia is described and is applied to example problems for illustration.


Introduction
Many process engineering problems may be formulated and approached as optimization problems. Typical examples include seeking optimal operating conditions for combined cooling, heating, and power (CCHP) systems [1][2][3]; seeking optimal schedules for thermal power plants [4,5]; and maximizing recovered energy in waste-heat recovery systems [6,7], subject to thermodynamic and financial constraints.
Process models may exhibit significant nonconvexity, making them challenging to analyze, simulate, and optimize. For example, periodic trigonometric functions are often employed to estimate the total incident radiation in solar power generation systems [8]. In crude-oil scheduling problems, bilinear terms are typically used to model constraints involving compositions of mixtures transferred from holding tanks [9]. In the modeling of counterflow heat exchangers, logarithmic mean temperatures are generally used to describe the temperature difference between hot and cold streams [10]. Individual process units such as compressors [11] and hydroelectric generators [12] are often modeled using nonconvex correlations. Moreover, models of thermodynamic properties often introduce nonconvexity; such models include the Peng-Robinson equation of state [13], the Antoine equation, and descriptions of the dependence of heat capacity on temperature [7]. Physical property models involving empirical correlations may also introduce nonconvexities through fractional or logarithmic terms, such as models of heat transfer coefficients [14] and friction factors in fluid mechanics [11].
Due to the size and nonconvexity of chemical process models, stochastic global search algorithms [15] are often used in practice to optimize them; such methods include genetic algorithms.

This article makes three main contributions. First, sufficient conditions are developed under which piecewise-affine relaxations, constructed from subgradients of an underlying convex relaxation scheme, inherit that scheme's second-order pointwise convergence. These sufficient conditions are straightforward to check and, thereby, provide a useful and novel guarantee that a broad class of outer approximation methods will not introduce clustering into a branch-and-bound method for global optimization. Practical recommendations for constructing appropriate piecewise-affine relaxations are discussed. Second, these sufficient conditions are extended to nonlinear programs (NLPs) with nontrivial constraints. It is shown that these constraints can introduce pathological behavior; this behavior is shown, nontrivially, to be suppressed once sufficiently strong regularity assumptions are imposed. This development is, to our knowledge, the first discussion of second-order pointwise convergence for optimization problems with nontrivial constraints. Third, the discussed relaxation approaches are implemented in Julia [39], and it is shown through several numerical examples that even a simple branch-and-bound method based on the described piecewise-affine relaxations can compete with the state of the art.
The remainder of this article is structured as follows. Section 2 summarizes relevant established results and methods, including a recent result [36] concerning subtangent convergence. Section 3 extends this result to piecewise-affine relaxations based on subtangents, thus demonstrating their rapid convergence to the underlying system. The notion of second-order pointwise convergence is extended to constrained optimization problems in a useful sense, and sufficient conditions are developed under which this convergence is guaranteed. Practical considerations when constructing piecewise-affine relaxations are discussed. Section 4 places the results of this article in context with outer approximation methods [40][41][42] and recent subtangent-based tightening methods [34]. Section 5 presents an implementation of this article's approximation approach in the language Julia; this implementation is applied to various test problems for illustration.

Background
Relevant definitions and established results are summarized in this section. Throughout this article, scalars are denoted as lowercase letters (e.g., p), vectors are denoted as boldfaced lowercase letters (e.g., z), their components are denoted with subscripts (e.g., z_i), inequalities involving vectors are to be interpreted component-wise, sets are denoted as uppercase letters (e.g., Q), and the standard Euclidean norm ‖·‖ and inner product ⟨·, ·⟩ are employed.

Branch-and-Bound Optimization Using Convex Relaxations
Branch-and-bound methods [24,43,44] are deterministic global optimization algorithms that guarantee the location of a global minimum for a nonlinear program (NLP) to within a specified tolerance. These methods proceed by evaluating upper and lower bounds of the objective function, which are refined progressively as the domain is subdivided. Along the way, knowledge concerning the feasibility and the best bounds determined thus far may be used to exclude certain subdomains from consideration. Once the best bounds are equal up to the specified tolerance, the method terminates successfully.
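To fix ideas, a minimal branch-and-bound loop of this kind can be sketched in a few lines of Julia. This is an illustration only, not the article's EAGO-based implementation; the objective, the αBB parameter, and the tolerance below are placeholders, and the lower bound is obtained from the midpoint-subtangent construction discussed later in this article.

```julia
# Minimal univariate branch-and-bound sketch (illustration only).
f(x)  = x * sin(3x) + 0.1x^2                 # placeholder nonconvex objective on [-1, 3]
df(x) = sin(3x) + 3x * cos(3x) + 0.2x        # its derivative
alpha = 33.0                                 # assumed bound on max(0, -f'') over [-1, 3]

# alphaBB-style convex underestimator of f on [a, b]:
fC(x, a, b) = f(x) - 0.5 * alpha * (b - x) * (x - a)

# Valid lower bound on min f over [a, b]: minimize the affine subtangent of the
# (convex) relaxation fC at the interval midpoint.
function lower_bound(a, b)
    m = 0.5 * (a + b)
    return fC(m, a, b) - abs(df(m)) * 0.5 * (b - a)   # fC'(m) = f'(m) at the midpoint
end

# Valid upper bound: f evaluated at a few (feasible) sample points.
upper_bound(a, b) = minimum(f(x) for x in (a, 0.5 * (a + b), b))

function branch_and_bound(a, b; tol = 1e-4)
    stack = [(a, b)]                      # subdomains awaiting processing
    ubd = upper_bound(a, b)               # incumbent upper bound
    while !isempty(stack)
        (lo, hi) = pop!(stack)
        lower_bound(lo, hi) > ubd - tol && continue   # fathom this subdomain
        ubd = min(ubd, upper_bound(lo, hi))
        mid = 0.5 * (lo + hi)
        push!(stack, (lo, mid), (mid, hi))            # branch by bisection
    end
    return ubd                            # within tol of the global minimum
end

println(branch_and_bound(-1.0, 3.0))
```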
From here, only continuous problems are considered. If branch-and-bound is applied to such a minimization problem, the supplied upper bounds can be any local minima obtained by local NLP solvers. Furnishing appropriate lower bounds, on the other hand, requires global knowledge of the model and constraints. A common approach is to replace the objective function and constraints with convex relaxations, which are underestimators that are convex and, therefore, relatively easy to minimize to global optimality using local methods. Thus, the local optimization of a convex relaxation yields a lower bound for the original problem. Several established methods may be used to generate convex relaxations, such as McCormick relaxations [30,31], αBB relaxations [28,29], and relaxations obtained by Auxiliary Variable Methods [45]. Each of these relaxation schemes can be called a scheme of convex relaxations, as defined below.

Definition 1 (from Bompadre and Mitsos [26]). Let Q ⊂ R^n be a nonempty compact set and f : Q → R be a continuous function, and let IQ := {W ∈ IR^n : W ⊂ Q} denote the set of all nonempty interval subsets of Q. Suppose that, for each interval W ∈ IQ, a function f^C_W : W → R is convex and underestimates f on W. Then, the collection {f^C_W}_{W∈IQ} is a scheme of convex relaxations of f on Q.
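For concreteness, the following Julia sketch spells out one such scheme for the bilinear term f(x, y) = x·y, using the well-known McCormick convex envelope of a bilinear term; the box values are arbitrary placeholders, and this is an illustration rather than part of the article's implementation.

```julia
# A scheme of convex relaxations in the sense of Definition 1, illustrated for
# the bilinear term f(x, y) = x*y: each box W = [xL,xU] x [yL,yU] is mapped to
# the (convex) McCormick underestimator of x*y on that box.
function bilinear_scheme(xL, xU, yL, yU)
    return (x, y) -> max(yL * x + xL * y - xL * yL,
                         yU * x + xU * y - xU * yU)
end

fC = bilinear_scheme(-1.0, 2.0, 0.0, 3.0)   # relaxation on W = [-1,2] x [0,3]
@assert fC(0.5, 1.5) <= 0.5 * 1.5           # underestimates x*y at a sample point
```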

Convergence of Convex Relaxation Schemes
The following definition from Bompadre and Mitsos [26] formalizes a notion of the convex relaxations of a function approaching that function rapidly as the underlying subdomain shrinks. Here, w(W) denotes the width of an interval W ∈ IR^n, i.e., the length of its longest side.

Definition 2 (from Bompadre and Mitsos [26]). Let {f^C_W}_{W∈IQ} be a scheme of convex relaxations of f on Q. This scheme has pointwise convergence of order γ > 0 if there exists τ > 0 for which, for each W ∈ IQ,
\[ \sup_{z \in W} \left( f(z) - f^{\mathrm{C}}_W(z) \right) \le \tau \, (w(W))^{\gamma}. \]
Second-order pointwise convergence corresponds to γ = 2.
Second-order pointwise convergence is beneficial in global optimization to avoid cluster effects [25,27], in which a branch-and-bound method must divide excessively often on subdomains that include or are near a global minimum.

Subtangents of Convex Relaxations
Standard definitions of subtangents, subgradients, and subdifferentials for convex functions are as follows.
Definition 3 (adapted from Rockafellar [46]). Given a convex set Z ⊂ R^n and a convex function φ : Z → R, a vector s ∈ R^n is a subgradient of φ at ζ ∈ Z if
\[ \varphi(z) \ge \varphi(\zeta) + \langle s, \, z - \zeta \rangle \quad \text{for each } z \in Z, \]
in which case the affine mapping z ↦ φ(ζ) + ⟨s, z − ζ⟩ is a subtangent of φ at ζ. The subdifferential ∂φ(ζ) ⊂ R^n is the collection of all subgradients of φ at ζ.
Consider a particular convex relaxation scheme with second-order pointwise convergence. A recent result by Khan [36] showed that, although any convex relaxation dominates all of its subtangents, subtangents for this relaxation scheme will nevertheless inherit the second-order pointwise convergence under mild assumptions. This result is reproduced below; it employs the constructions in the following definition and assumption.

Definition 4 (from Khan [36]). For any interval W := [w^L, w^U] ∈ IR^n, denote the midpoint of W as w^mid := ½(w^L + w^U) ∈ W. For any α ∈ [0, +∞), define a centrally-scaled interval of W as
\[ s_{\alpha}(W) := \{ w^{\mathrm{mid}} \} + \alpha \left( W - \{ w^{\mathrm{mid}} \} \right), \]
where the addition and subtraction are in the sense of Minkowski.
Observe that w^mid ∈ s_α(W) regardless of the value of α.
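In code, the midpoint and the centrally-scaled interval of a box are straightforward to compute componentwise; a small Julia sketch with placeholder box data follows.

```julia
# Midpoint and centrally-scaled interval s_alpha(W) of Definition 4, for a box
# W = [lo, hi] in R^n represented componentwise.
midpoint(lo, hi) = 0.5 .* (lo .+ hi)

function centrally_scaled(lo, hi, alpha)
    m = midpoint(lo, hi)
    return (m .- alpha .* (m .- lo), m .+ alpha .* (hi .- m))
end

lo, hi = [0.0, -1.0], [2.0, 3.0]                 # placeholder box in R^2
slo, shi = centrally_scaled(lo, hi, 0.5)         # same midpoint, half the width
m = midpoint(lo, hi)
@assert all(slo .<= m) && all(m .<= shi)         # w_mid lies in s_alpha(W) for any alpha
```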

Assumption 1.
Consider a nonempty open set Z ⊂ R^n and a function f : Z → R whose gradient ∇f exists and is locally Lipschitz continuous on Z. Let Q ⊂ Z be a nonempty compact set, and suppose that a scheme of convex relaxations {f^C_W}_{W∈IQ} of f on Q exhibits second-order pointwise convergence.
Theorem 1 (adapted from Khan [36]). Suppose that Assumption 1 holds, and choose some α ∈ [0, 1). For each W ∈ IQ, choose some ζ_W ∈ s_α(W) and a subgradient σ_W ∈ ∂f^C_W(ζ_W), and consider the corresponding subtangent
\[ f^{\mathrm{C}}_{\mathrm{sub},W} : W \to \mathbb{R} : z \mapsto f^{\mathrm{C}}_W(\zeta_W) + \langle \sigma_W, \, z - \zeta_W \rangle. \]
Then, {f^C_sub,W}_{W∈IQ} is also a scheme of convex relaxations of f on Q with second-order pointwise convergence.
Observe that this theorem does not place any differentiability requirements on the underestimators f^C_W and does not require the twice-continuous differentiability of f. Choosing each ζ_W to be an element of s_α(W) rather than W is crucial; Khan [36] presents a counterexample in which second-order pointwise convergence is not obtained when this requirement is violated. Nevertheless, Khan [36] also shows that if the entire scheme of convex relaxations is uniformly continuously differentiable in a certain sense, then we may replace s_α(W) with W without affecting the conclusion of Theorem 1.
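To illustrate that no differentiability is needed, the following Julia sketch constructs a subtangent of the (nonsmooth) McCormick underestimator of the bilinear term x·y at the midpoint of a placeholder box; at that point both affine pieces of the underestimator are active, so any convex combination of their gradients is a valid subgradient.

```julia
# A subtangent of a nonsmooth convex relaxation (Theorem 1 needs no differentiability).
xL, xU, yL, yU = -1.0, 2.0, 0.0, 3.0                 # placeholder box W
fC(x, y) = max(yL * x + xL * y - xL * yL,            # McCormick underestimator of x*y on W
               yU * x + xU * y - xU * yU)

zeta  = (0.5 * (xL + xU), 0.5 * (yL + yU))           # midpoint of W (both pieces tie here)
sigma = 0.5 .* (yL, xL) .+ 0.5 .* (yU, xU)           # convex combination of the two piece gradients

subtangent(x, y) = fC(zeta...) + sigma[1] * (x - zeta[1]) + sigma[2] * (y - zeta[2])

# Check under-estimation of x*y on a grid over W:
pts = [(x, y) for x in range(xL, stop = xU, length = 21),
              y in range(yL, stop = yU, length = 21)]
@assert all(subtangent(x, y) <= x * y + 1e-12 for (x, y) in pts)
```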

New Convergence Results
This section extends the second-order pointwise convergence result of Theorem 1 to piecewise-affine outer approximations of functions and nonlinear programs (NLPs), as the underlying domain is reduced in size. This result complements an analysis by Rote [37], who considered the effect on convergence of including additional cutting planes rather than shrinking the domain. Sufficient conditions for a second-order pointwise convergence of NLPs with nontrivial constraints are also developed and presented.

Relaxations of Functions
To proceed, we must first strengthen Assumption 1.

Assumption 2.
Suppose that Assumption 1 holds and that the convex relaxations f C W are Lipschitz continuous on their respective domains.
Definition 5. Suppose that Assumption 2 holds, and consider some α ∈ [0, 1). For each interval W ∈ IQ, choose a point ζ_W ∈ s_α(W) and a finite subset B_W ⊂ W. Define a subgradient-cutting mapping
\[ f^{\mathrm{C}}_{\mathrm{cut},W} : W \to \mathbb{R} : z \mapsto \max_{\zeta \in \{\zeta_W\} \cup B_W} \left( f^{\mathrm{C}}_W(\zeta) + \langle \sigma_W(\zeta), \, z - \zeta \rangle \right), \]
where, in each case, σ_W(ζ) is a finite subgradient of f^C_W at ζ.
Observe that any subgradient-cutting mapping f^C_cut,W is convex on W, since it is a pointwise maximum of finitely many affine functions, and underestimates f on W, since each of these affine functions is a subtangent of f^C_W and, thus, underestimates f^C_W and f. The following result shows that schemes of subgradient-cutting mappings inherit second-order pointwise convergence.

Theorem 2. Suppose that Assumption 2 holds. Then any collection {f^C_cut,W}_{W∈IQ} of subgradient-cutting mappings, constructed as in Definition 5, is a scheme of convex relaxations of f on Q with second-order pointwise convergence.

Proof. By the observations above, {f^C_cut,W}_{W∈IQ} is a scheme of convex relaxations of f on Q. Moreover, each f^C_cut,W dominates the particular subtangent of f^C_W at ζ_W ∈ s_α(W), and the collection of these subtangents exhibits second-order pointwise convergence. The claim then follows from Theorem 1.
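A univariate Julia sketch of a subgradient-cutting mapping follows; the function, its αBB relaxation, and the linearization points are placeholders. The asserts check the two properties used in the argument above: the piecewise-affine mapping still underestimates f and dominates the midpoint subtangent.

```julia
# Subgradient-cutting mapping of Definition 5 (univariate sketch; the relaxation
# fC is smooth here, so its subgradients are just derivatives).
f(x)   = x^4 - 3x^2 + x              # placeholder nonconvex function on W = [-1.5, 1.5]
fC(x)  = x^4 + x - 6.75              # alphaBB relaxation with alpha = 6 (convex; underestimates f on W)
dfC(x) = 4x^3 + 1

a, b   = -1.5, 1.5
zetaW  = 0.5 * (a + b)               # midpoint: always one linearization point
BW     = [-1.2, -0.4, 0.9]           # additional, arbitrarily chosen linearization points

cuts    = [(fC(z), dfC(z), z) for z in vcat(zetaW, BW)]
fcut(x) = maximum(v + s * (x - z) for (v, s, z) in cuts)   # pointwise max of subtangents

xs = range(a, stop = b, length = 201)
@assert all(fcut(x) <= f(x) + 1e-12 for x in xs)                                 # underestimates f
@assert all(fcut(x) >= fC(zetaW) + dfC(zetaW) * (x - zetaW) - 1e-12 for x in xs) # dominates midpoint cut
```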

Relaxations of Constrained Optimization Problems
Established definitions and analyses concerning second-order pointwise convergence [26,27,35,36,38] have focused on applications in global optimization problems with only bound constraints. To extend this analysis, this section considers the second-order pointwise convergence of optimization problems with nontrivial inequality constraints and applies Theorem 2 to handle piecewise-affine relaxations. Equality constraints may be regarded similarly, though with some care. As Example 1 below will illustrate, nontrivial constraints introduce obstacles to analysis that are not present in the box-constrained case, and must be circumvented by enforcing additional assumptions.
This section considers optimization problems that are represented as constrained nonlinear programs (NLPs) satisfying the following assumption.

Assumption 3. Consider a nonempty open set Z ⊂ R^n, a nonempty compact set Q ⊂ Z, and continuous functions f : Z → R and g := (g_1, . . . , g_m) : Z → R^m. Suppose that schemes of convex relaxations {f^C_W}_{W∈IQ} of f on Q and {g^C_{i,W}}_{W∈IQ} of each g_i on Q exhibit second-order pointwise convergence and that, for each W ∈ IQ, the relaxations f^C_W and g^C_{i,W} are Lipschitz continuous on W. Let F denote the collection of all sets W ∈ IQ for which there exists z ∈ W that satisfies g(z) ≤ 0, and assume that F is nonempty.
Supposing that Assumption 3 holds, consider the following NLP for each W ∈ F. (Here, "subject to" is abbreviated as "s.t.".)
\[ v(W) := \min_{z \in W} \; f(z) \quad \text{s.t.} \quad g(z) \le 0 \qquad (2) \]
For each W ∈ F, Weierstrass's Theorem implies that the NLP (2) has at least one solution and has a finite optimal objective function value v(W). Replacing the objective function and constraints in (2) by the convex underestimators provided in Assumption 3, we obtain the following auxiliary NLP:
\[ v^{\mathrm{C}}(W) := \min_{z \in W} \; f^{\mathrm{C}}_W(z) \quad \text{s.t.} \quad g^{\mathrm{C}}_{i,W}(z) \le 0, \;\; i \in \{1, \ldots, m\}. \qquad (3) \]
Again, Weierstrass's Theorem implies that the convex NLP (3) has at least one solution and has a finite optimal objective function value v^C(W). Since the objective function and constraints of (2) were replaced in (3) by underestimators, (3) is a relaxation of (2) in the sense that v^C(W) ≤ v(W) for each W ∈ F. Such relaxations are commonly employed in deterministic methods for constrained global optimization [44]. Equality constraints in (2) may be relaxed analogously by replacing them with inequalities involving a convex underestimator and a concave overestimator; this is omitted here for simplicity.
We will explore conditions under which second-order pointwise convergence of the schemes of underestimators for f and g_i in Assumption 3 translates to second-order pointwise convergence of (3) to (2), in the following sense.

Definition 6. Suppose that Assumption 3 holds, and consider the optimal-value mappings v and v^C as defined above. Over all W ∈ F, the relaxed NLP (3) exhibits second-order pointwise convergence to the original NLP (2) if there exists τ > 0 for which, for each W ∈ F,
\[ v(W) - v^{\mathrm{C}}(W) \le \tau \, (w(W))^2. \]

Since branch-and-bound methods require only bounding and feasibility information to proceed, constrained second-order pointwise convergence in this sense plays the same role in eliminating clustering [27] as second-order pointwise convergence does in bound-constrained global optimization. Nontrivial constraints may also be leveraged in range-reduction techniques, though we will not consider these further.
In light of Theorem 2, the convex underestimators f^C_W and g^C_{i,W} in Assumption 3 may be chosen to be subgradient-cutting mappings. In this case, the relaxed NLP (3) may be rearranged to exploit its structure for efficiency. Suppose that points ζ_W ∈ s_α(W) and subsets B_W ⊂ W are chosen as in Definition 5, that analogous points ζ_{i,W} ∈ s_α(W) and subsets B_{i,W} ⊂ W are chosen for each constraint function g_i, and that σ_W(ζ) and σ_{i,W}(ζ) denote corresponding subgradients of f^C_W and g^C_{i,W} at ζ. (Since the relaxations f^C_W and g^C_{i,W} are Lipschitz continuous, such subgradients exist.) Then, for each W ∈ IQ, if subgradient-cutting mappings are employed as the estimating schemes in Assumption 3, the relaxed NLP (3) has the same optimal objective function value as the following linear program (LP):
\[
\begin{array}{rll}
\displaystyle \min_{z \in W,\; t \in \mathbb{R}} & t & \\[4pt]
\text{s.t.} & f^{\mathrm{C}}_W(\zeta) + \langle \sigma_W(\zeta), \, z - \zeta \rangle \le t, & \zeta \in \{\zeta_W\} \cup B_W, \\
& g^{\mathrm{C}}_{i,W}(\zeta) + \langle \sigma_{i,W}(\zeta), \, z - \zeta \rangle \le 0, & \zeta \in \{\zeta_{i,W}\} \cup B_{i,W}, \;\; i \in \{1, \ldots, m\}.
\end{array}
\qquad (4)
\]
Such an LP can generally be solved more efficiently than a typical NLP of a similar size; these relaxations will be applied to several numerical examples in Section 5 below.
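For illustration, the data of an LP of the form (4) can be assembled from precomputed relaxation values and subgradients as in the following Julia sketch. The function name assemble_lp and its arguments are hypothetical; in practice, the values and subgradients would be supplied by a relaxation library, as in Section 5.

```julia
# Assemble the outer-approximating LP (4) in the standard form
#   min c'*[z; t]  s.t.  A*[z; t] <= b,  z in W,
# from subgradient information of the relaxations (a sketch; the inputs are
# assumed to have been computed already by some relaxation scheme).
function assemble_lp(n, obj_pts, obj_vals, obj_subs, con_pts, con_vals, con_subs)
    rows = Vector{Vector{Float64}}()
    rhs  = Float64[]
    # Objective cuts:  fC(zeta) + sigma'(z - zeta) <= t
    for (zeta, v, sigma) in zip(obj_pts, obj_vals, obj_subs)
        push!(rows, vcat(sigma, -1.0))               # sigma' z - t <= sigma' zeta - v
        push!(rhs, sum(sigma .* zeta) - v)
    end
    # Constraint cuts: gC_i(zeta) + sigma'(z - zeta) <= 0, for each i and zeta
    for (zeta, v, sigma) in zip(con_pts, con_vals, con_subs)
        push!(rows, vcat(sigma, 0.0))
        push!(rhs, sum(sigma .* zeta) - v)
    end
    A = permutedims(reduce(hcat, rows))              # one row per cut
    c = vcat(zeros(n), 1.0)
    return c, A, rhs
end

# One objective cut and one constraint cut in R^2 (placeholder numbers):
c, A, b = assemble_lp(2, [[0.5, 0.5]], [-1.0], [[1.0, 2.0]],
                         [[0.5, 0.5]], [0.2], [[0.0, 1.0]])
```

The assembled (c, A, b) data, together with the box W, can then be passed to any LP solver; the implementation described in Section 5 solves the analogous LPs with CPLEX through JuMP.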
We might hope that the relaxed NLP (3) exhibits second-order pointwise convergence to (2) with no additional requirements beyond Assumption 3. However, the following counterexample shows that this is not always the case.

Example 1. Consider sets Z := R and Q := [−1, 0] ⊂ Z, a function f : Z → R for which f(z) ≡ z, and a function g : Z → R for which g(z) ≡ z². Consider the following schemes of convex estimators for f and g over all intervals W ∈ IQ:
\[ f^{\mathrm{C}}_W(z) := f(z) = z \qquad \text{and} \qquad g^{\mathrm{C}}_W(z) := g(z) - (w(W))^2 = z^2 - (w(W))^2. \]
Observe that f and g are convex and smooth, as are their convex underestimators, and that Assumption 3 is satisfied; in particular, both schemes exhibit second-order pointwise convergence, since f − f^C_W ≡ 0 and g − g^C_W ≡ (w(W))² on each W ∈ IQ. With these choices of functions and sets, consider the intervals W_ε := [−ε, 0] for ε ∈ [0, 1]. It is readily verified that each W_ε ∈ F and that the NLP (2) is trivially solved on W_ε for each ε ∈ [0, 1] to yield an optimal objective function value of v(W_ε) = 0, with a minimum of z* := 0 in each case.
Next, for each ε ∈ [0, 1], observe that
\[ g^{\mathrm{C}}_{W_\varepsilon}(z) = z^2 - \varepsilon^2 \le 0 \quad \text{for each } z \in W_\varepsilon, \]
and so the constraint g^C_{W_ε}(z) ≤ 0 is satisfied for each z ∈ W_ε. Hence, the relaxed NLP (3) is trivially solved on W_ε to yield v^C(W_ε) = f^C_{W_ε}(−ε) = −ε, with a minimum at z = −ε. Thus,
\[ v(W_\varepsilon) - v^{\mathrm{C}}(W_\varepsilon) = \varepsilon = w(W_\varepsilon) \quad \text{for each } \varepsilon \in [0, 1], \]
and so the relaxed NLP (3) does not exhibit second-order pointwise convergence to (2).
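A short numeric check of Example 1 (using the underestimator scheme as reconstructed above) makes the first-order gap visible:

```julia
# Numeric check of Example 1: the relaxation gap decays only linearly in the
# width of W_eps, ruling out second-order pointwise convergence.
f(z)  = z
g(z)  = z^2
gC(z, ep) = z^2 - ep^2            # relaxation of g on W_eps = [-ep, 0] (width ep)

for ep in (0.1, 0.01, 0.001)
    v  = 0.0                                              # original optimum: only z = 0 is feasible
    vC = minimum(f(z) for z in range(-ep, stop = 0.0, length = 1001)
                      if gC(z, ep) <= 0)                  # relaxed optimum (grid search)
    println("eps = $ep   gap = $(v - vC)")                # gap ~ eps, not eps^2
end
```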
The above example shows that nontrivial constraints may determine whether second-order pointwise convergence holds, even when schemes of convex underestimators for the objective and constraints are available with second-order pointwise convergence and even when the original NLP (2) is convex. This is essentially because, as in the above example, it is possible for a small perturbation of a nontrivial inequality constraint to change the corresponding feasible set significantly. In such cases, it is possible for the optimal objective function value of (3) to approach the optimal objective function value of (2) slowly as W shrinks, if at all. A nonconvexity of the components of g may present additional obstacles but is not the primary hindrance here.
A sufficient condition for the second-order pointwise convergence of (3) may, nevertheless, be obtained by strengthening Assumption 3 as follows. The extra requirements of this assumption are adopted from Shapiro [47], who used similar requirements to rule out pathological behavior in a perturbation analysis for NLPs. These requirements are essentially second-order sufficient conditions to ensure that the feasible set of (2) is somewhat stable under perturbations. Recall that no such additional assumptions were needed in the bound-constrained case explored in Theorem 2.
Assumption 4. Suppose that Assumption 3 holds and that the functions f and g are twice-continuously differentiable on Z. For each W ∈ F, express the bound constraints in the NLPs (2) and (3) as explicit inequality constraints, and append these to g. For each W ∈ F, let M(W) ⊂ W denote the optimal solution set for the NLP (2), and suppose that all of the following conditions are satisfied for each y ∈ M(W).
1. The NLP (2) satisfies the linear-independence constraint qualification (LICQ). That is, with I(y) denoting the subset of {1, . . . , m} for which g_i(y) = 0, the gradients ∇g_i(y) for i ∈ I(y) are linearly independent. Hence, as shown by Rockafellar [46], there exist unique multipliers µ_i(W) ≥ 0 (depending on W but not on y) for each i ∈ {1, . . . , m} for which
\[ \nabla f(y) + \sum_{i=1}^{m} \mu_i(W) \, \nabla g_i(y) = 0, \qquad \text{with } \mu_i(W) = 0 \text{ for each } i \notin I(y). \]
2. µ_i(W) > 0 for each i ∈ I(y).
3. Any vector w ∈ R^n that satisfies both of the following conditions:
• wᵀ∇g_i(y) = 0 for each i ∈ I(y), and
• wᵀ∇²_zz L(y, µ(W)) w = 0, where L(z, µ) := f(z) + Σ_{i=1}^m µ_i g_i(z) denotes the Lagrangian of (2),
is also an element of the linear space tangent to M(W) at y.

Theorem 3. If Assumption 4 holds, then the relaxed NLP (3) exhibits second-order pointwise convergence to (2) over all W ∈ F, as does the LP (4) based on subgradient-cutting mappings.
Proof. Under Assumption 4, Theorem 2 shows that the LP (4) is, in each case, equivalent to an instance of (3) (adopting different schemes of underestimators that nevertheless still satisfy Assumption 4). Hence, it suffices to show only that the relaxed NLP (3) exhibits second-order pointwise convergence to (2). By Assumption 4, there exists τ > 0 for which, for each W ∈ F,
\[ \sup_{z \in W} \left( f(z) - f^{\mathrm{C}}_W(z) \right) \le \tau \, (w(W))^2 \qquad \text{and} \qquad \sup_{z \in W} \left( g_i(z) - g^{\mathrm{C}}_{i,W}(z) \right) \le \tau \, (w(W))^2 \;\; \text{for each } i. \]
For each W ∈ F, consider the following variants of the NLP (2):
\[ v^{\dagger}(W) := \min_{z \in W} \; f(z) \quad \text{s.t.} \quad g_i(z) \le \tau \, (w(W))^2, \;\; i \in \{1, \ldots, m\}, \qquad (5) \]
and
\[ v^{\mathrm{C}\dagger}(W) := \min_{z \in W} \; f^{\mathrm{C}}_W(z) \quad \text{s.t.} \quad g_i(z) \le \tau \, (w(W))^2, \;\; i \in \{1, \ldots, m\}, \qquad (6) \]
and let v†(W) and v^{C†}(W) denote the respective optimal objective function values for (5) and (6). For each W ∈ F, choose minimizers ξ†(W) of (5) and ξ^{C†}(W) of (6). (Such a choice is always possible due to Weierstrass's Theorem and Lemma 1 in Section 5 of Filippov [48].) By comparing the feasible sets and objective functions of the constructed NLPs, observe that v(W) ≥ v^C(W) ≥ v^{C†}(W) for each W ∈ F. Hence, it suffices to establish the existence of τ_v > 0 for which, for each W ∈ F,
\[ v(W) - v^{\mathrm{C}\dagger}(W) \le \tau_v \, (w(W))^2. \]
Now, by construction of ξ†(W) and ξ^{C†}(W), and noting that (5) and (6) share the same feasible set, observe that f(ξ†(W)) ≤ f(ξ^{C†}(W)). Thus, for each W ∈ F,
\[ v^{\dagger}(W) - v^{\mathrm{C}\dagger}(W) = f(\xi^{\dagger}(W)) - f^{\mathrm{C}}_W(\xi^{\mathrm{C}\dagger}(W)) \le f(\xi^{\mathrm{C}\dagger}(W)) - f^{\mathrm{C}}_W(\xi^{\mathrm{C}\dagger}(W)) \le \tau \, (w(W))^2. \qquad (7) \]
Next, under Assumption 4, observe that (2) satisfies the hypotheses of Corollary 3.2 by Shapiro [47]. Hence, M is "upper Lipschitzian" in the sense of [47]; since Q is compact, and since ξ†(W) solves the perturbed NLP (5) whose constraint right-hand sides differ from those of (2) by τ(w(W))², this implies the existence of κ > 0 for which, for each W ∈ F,
\[ \operatorname{dist}\!\left( \xi^{\dagger}(W), \, M(W) \right) \le \kappa \, \tau \, (w(W))^2. \]
With k_f denoting a Lipschitz constant for f on Q, it follows that, for each W ∈ F,
\[ v(W) - v^{\dagger}(W) = \min_{y \in M(W)} f(y) - f(\xi^{\dagger}(W)) \le k_f \, \kappa \, \tau \, (w(W))^2. \qquad (8) \]
Adding Equations (7) and (8) yields
\[ v(W) - v^{\mathrm{C}\dagger}(W) \le (1 + k_f \, \kappa) \, \tau \, (w(W))^2 \]
for each W ∈ F, as required.
To our knowledge, this is the first result establishing sufficient conditions for second-order pointwise convergence for relaxations of NLPs with nontrivial constraints. While Assumption 4 is somewhat stringent, it crucially does not require each optimal solution set M(W) to be a singleton, as is typically assumed in quantitative sensitivity analyses for NLPs. Observe that the NLP considered in Example 1 does not satisfy the LICQ and, thus, does not satisfy Assumption 4.
The proof of Theorem 3 does not make use of the convexity of the relaxations f^C_W and g^C_{i,W} at all, beyond establishing the independence of the multipliers µ_i(W) from y in Assumption 4. It may be possible-though nontrivial-to exploit this convexity to weaken the hypotheses of Theorem 3. While any equality constraint h(z) = 0 may be represented as the pair ±h(z) ≤ 0 of inequality constraints, this transformation yields an NLP that can never satisfy the LICQ and, thus, cannot satisfy Assumption 4. One way to extend Theorem 3 to NLPs with equality constraints without violating the LICQ is to relax each equality constraint h(z) = 0 by replacing it with two weaker inequality constraints, h(z) ≤ ε and −h(z) ≤ ε, for some small ε > 0. Affine equality constraints may alternatively be eliminated by changing variables appropriately.

Constructing Subgradient-Cutting Mappings
Constructing subgradient-cutting mappings for functions, or outer-approximating LPs (4) for NLPs, involves several practical choices; this section presents some suggestions for handling them.
The simplest way to generate suitable points ζ W ∈ s α (W) is to choose ζ W to be the midpoint of the interval W. This choice is valid regardless of α and is straightforward to compute.
The sets B_W on which subgradients are evaluated may, in principle, be chosen in any manner; we suggest using points for which data is already available if possible or leveraging any prior knowledge concerning which points might be useful. In the absence of any such prior knowledge, Latin hypercube sampling (as described by Audet and Hare [49]) is a straightforward method for generating pseudo-random points that, in a sense, sample all of W. Intuitively, including more elements in B_W results in a larger LP (4) and demands more subgradient evaluations to set up but also yields a tighter relaxation (4) of (2). We consider the effect of the number of linearization points in B_W on several test problems in Section 5 below.
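For reference, a basic Latin hypercube sampler over a box can be written in a few lines of Julia; this is a generic sketch rather than the specific sampler of Reference [49].

```julia
# Latin hypercube sampling over a box W = [lo, hi] in R^n: each coordinate range
# is split into p equal strata, and the strata are paired up by independent
# random permutations so every stratum of every coordinate is sampled once.
using Random

function latin_hypercube(lo::Vector{Float64}, hi::Vector{Float64}, p::Int)
    n = length(lo)
    pts = Matrix{Float64}(undef, n, p)
    for i in 1:n
        strata = randperm(p)                        # which stratum each sample uses
        u = (strata .- rand(p)) ./ p                # one uniform draw inside each stratum
        pts[i, :] = lo[i] .+ u .* (hi[i] - lo[i])
    end
    return [pts[:, j] for j in 1:p]                 # p sample points in W
end

B_W = latin_hypercube([0.0, -1.0], [1.0, 1.0], 4)   # four linearization points
```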
As described earlier, several established relaxation schemes may be used to construct schemes of underestimators {f^C_W}_{W∈IQ} and {g^C_{i,W}}_{W∈IQ} with second-order pointwise convergence. Subgradients may then be computed using standard automatic differentiation tools [50] when all functions involved are continuously differentiable; in this case, subgradients coincide with gradients. Otherwise, if nonsmooth relaxations are employed, then dedicated nonsmooth variants of automatic differentiation [30,51] may be applied to compute subgradients efficiently.
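For instance, when a smooth relaxation is available, its gradient (and, hence, its unique subgradient) at a linearization point can be obtained with a forward-mode automatic differentiation package. The following Julia sketch assumes the ForwardDiff package is available and uses a placeholder relaxation; it is a generic illustration, not the article's EAGO-based pipeline.

```julia
using ForwardDiff

fC(z) = z[1]^2 + exp(z[2]) - z[1] * z[2] / 4      # placeholder smooth convex relaxation
zeta  = [0.5, 0.0]                                # a linearization point
sigma = ForwardDiff.gradient(fC, zeta)            # gradient = unique subgradient at zeta

subtangent(z) = fC(zeta) + sum(sigma .* (z .- zeta))
```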

Comparison with Established Relaxation Methods
In this section, we compare the subtangent-based approach of this article with established outer approximation (OA) methods [40][41][42][52] and a recent subgradient-based enhancement of the McCormick relaxations by Najman and Mitsos [34].
OA approximates nonlinear convex relaxations as the pointwise maximum of a collection of their affine supports [42]. There are several established schemes for constructing the affine supports in OA, such as interval bisection, slope bisection, maximum-error rules, and chord rules [42]. The convergence result of this paper applies to all of these schemes in principle, provided that they are based on an underlying relaxation scheme with second-order pointwise convergence and that a subdomain's midpoint is always chosen as one linearization point. This complements the convergence results in References [37,42], which show that, for any fixed subdomain, the piecewise-affine convex relaxations constructed by combining the linear supports converge quadratically to the dominating convex relaxations as the number of linearization points increases.

Najman and Mitsos [34] propose a tighter variant of McCormick relaxations of compositions of known intrinsic functions (which include the standard scientific calculator operations). This variant provides improved interval bounds for each factor of the composition compared with classic McCormick relaxations and, thus, typically results in tighter overall relaxations for the composite function. The improved bounds are obtained by constructing simple affine or piecewise-affine relaxations for each factor and minimizing or maximizing these relaxations; this process can be repeated based on the new bounds to obtain still tighter bounds for each factor. The resulting tighter overall relaxations are then approximated by their subtangents at the midpoints of intervals. In the current article's approach, on the other hand, the overall composite function is approximated by piecewise-affine relaxations; individual factors are not considered directly, and there is no requirement to use McCormick relaxations. Both approaches can be combined: the tight relaxations of the Najman-Mitsos method may be employed as the underlying convex relaxations whose subtangents are used in Theorem 2. Notably, the numerical results of Najman and Mitsos [34] suggest that it is not computationally worthwhile to select linearization points (such as the sets B_W we consider) optimally by using optimization solvers; based on this result, we do not recommend choosing the sets B_W by solving a nontrivial optimization problem.

Implementation and Examples
This section discusses an implementation of the outer-approximating LPs (4), which was used to illustrate the convergence features discussed in this article.

Implementation in Julia
A numerical implementation in Julia v0.6.4 [39] was developed to solve nonconvex NLPs of the form (2) to global optimality, by using outer-approximating LPs of the form (4) to obtain lower bounds in an overarching branch-and-bound method for deterministic global optimization. For simplicity, the central-scaling factor α is set to 0 in each case, so the midpoint of each interval W is always selected to be one of the linearization points at which the subgradients are evaluated. For each i ∈ {1, . . . , m}, the sets B_{i,W} of linearization points for the constraints are chosen analogously to the sets B_W.

This implementation uses EAGO v0.1.2 [53,54] to carry out a simple branch-and-bound method (without any range reduction) and to compute convex relaxations of nonconvex functions. JuMP v0.18.2 [55] is used as an interface with optimization solvers; CPLEX v12.8 is used to solve LPs, and IPOPT v3.12.8 [56] is used to solve NLPs. Convex relaxations and subgradients are computed automatically in EAGO using either the standard McCormick relaxations [30,31] or the differentiable McCormick relaxations [38], combined with the interval-refining algorithm of Najman and Mitsos [34]. Latin hypercube sampling, as described by Audet and Hare [49], is adopted to select linearization points pseudo-randomly while sampling the entire subdomains in question. All numerical results presented in the following section were obtained by running this implementation on a Dell Precision T3400 workstation with a 2.83 GHz Intel Core2 Quad CPU. One core and 512 MB of memory were dedicated to each job.

Convergence Illustration
A first numerical example illustrates the second-order convergence results of this article.

Example 2. As in Reference [38], consider the function plotted in Figure 1, along with a series of subgradient-cutting relaxations constructed as described in Definition 5. These relaxations were evaluated using the implementation described above on intervals of the form X(ε) := [0.5 − ε, 0.5 + ε], for each ε := 0.4(2^{−k}), where k ∈ {1, . . . , 20}. With φ_{X(ε)} denoting the relaxation of f constructed on X(ε), Figure 2 presents a log-log plot of the maximum discrepancy sup_{x∈X(ε)} (f(x) − φ_{X(ε)}(x)) against the width of the interval X(ε), together with a reference line with a slope of 2. The agreement between these two suggests that the convex relaxations φ_{X(ε)} exhibit second-order pointwise convergence to f as ε → 0⁺.
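The qualitative behavior in Figure 2 can be reproduced with a short stand-alone computation. The following Julia sketch uses a placeholder function and an αBB-style relaxation (not the function or relaxations of Reference [38]); the log-log slopes of the maximum gap of the midpoint subtangent against the interval width approach 2, consistent with second-order pointwise convergence.

```julia
# Empirical convergence-order check in the spirit of Example 2.
f(x)  = x * sin(2x) + 0.5x
df(x) = sin(2x) + 2x * cos(2x) + 0.5
alpha = 10.0                                  # assumed bound on max(0, -f'') near x = 0.5

# Subgradient-cutting relaxation on [a, b] built from the single subtangent of
# the alphaBB relaxation at the midpoint (Definition 5 with B_W empty):
function phi(x, a, b)
    m  = 0.5 * (a + b)
    vm = f(m) - 0.5 * alpha * (b - m) * (m - a)   # alphaBB value at the midpoint
    return vm + df(m) * (x - m)                   # its derivative at m equals f'(m)
end

widths, gaps = Float64[], Float64[]
for k in 1:15
    h = 0.4 * 2.0^(-k)
    a, b = 0.5 - h, 0.5 + h
    xs = range(a, stop = b, length = 401)
    push!(widths, b - a)
    push!(gaps, maximum(f(x) - phi(x, a, b) for x in xs))
end

slopes = diff(log.(gaps)) ./ diff(log.(widths))
println(round.(slopes; digits = 3))           # approaches 2, consistent with Figure 2
```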

Optimization Test Problems
The described implementation was applied to three nonconvex NLP instances from the MINLPLib benchmark library [57], to examine the performance of LP outer approximations in a global optimization setting and to consider the effect of including more linearization points in each set B_W. Small but relatively difficult nonconvex problems were chosen in each case. For comparison, these problems were also solved using the state-of-the-art global optimization solver BARON v18.5.8 [58,59] in GAMS. GAMS formulations for these problems were downloaded from the MINLPLib website [57] and were adapted to include bounds on any unbounded variables. These problems were considered to be solved to global optimality (by either our implementation or BARON) whenever the determined upper and lower bounds were equal to within either an absolute tolerance of 10^−6 or a relative tolerance of 10^−3. In each of these cases, our implementation (which does not employ range reduction) outperformed BARON with range reduction disabled; while not conclusive, this does suggest that piecewise-affine outer approximations of McCormick relaxations without auxiliary variables are competitive.

Example 3. The first considered NLP instance was bearing, which has 14 continuous variables, 10 equality constraints, and 3 inequality constraints.
Since finite upper bounds for some decision variables were not provided in MINLPLib, we set them to reasonable values for branch-and-bound purposes. The considered domains for the variables x are shown in Table 1. Applying our implementation in EAGO with lower bounds computed using the LP outer approximations (4), this problem is solved to a global minimum of t* = 1.931. Table 2 shows the impact on solution time of the number of linearization points (1 + |B_W|) whose subgradients are used to construct the subgradient-cutting mappings. As described above, these were chosen pseudo-randomly using Latin hypercube sampling. Among the numbers of linearization points considered, 8 was time-optimal for this example; we expect this is because too few linearization points yield looser relaxations and poorer lower bounds early in the branch-and-bound process, whereas too many linearization points yield larger LP outer approximations (4) for lower bounding, requiring excessive computational time to solve. Our implementation was run both with and without the Najman-Mitsos interval-refining algorithm [34]; Table 3 shows comparable solution times in each case.

This problem was also solved with BARON in GAMS, both with and without range reduction, with the results shown in Table 4. (Our implementation did not incorporate range reduction.) No solution was identified before the allocated time of 1000 s in either case. We were unable to determine why; this did not appear to be an issue of setting tolerances incorrectly (such as constraint satisfaction tolerances), and the lower-bounding LP statistics were not readily available. Perhaps the highly nonlinear logarithmic terms interfere with BARON's outer approximation methods in this case.
Using our implementation, the second test problem is solved to a global minimum of t* = −2.67 × 10^−6. Table 5 summarizes the results of our implementation, including the impact on solution time of the number of linearization points and of the interval-refining algorithm. No range reduction was employed. This case study also suggests that using multiple linearization points can reduce the required computational time. BARON could not solve this problem in 1000 s without range reduction, throwing an error suggesting the bounds were inappropriate, but did solve it with range reduction during preprocessing, obtaining the same optimal solution as our implementation. The corresponding statistics are displayed in Table 6.
Our Julia implementation solves the third test problem (11) to a global minimum of t* = −1161.34; the solution statistics are listed in Table 7. Qualitatively, the results display similar trends to Example 3. When range reduction was turned off in BARON, the upper and lower bounds did not converge by the end of the allocated 1000 s. (Recall that range reduction is not yet included in our implementation.) The best lower bound obtained by that time was −3085.43. With range reduction, BARON solved the problem rapidly. The solution statistics for BARON are presented in Table 8.

Conclusions and Future Work
This article shows that, under mild assumptions, if piecewise-affine convex relaxations of functions are constructed using subtangents of an original relaxation scheme, then the piecewise-affine relaxation scheme will inherit second-order pointwise convergence from the original scheme. A foundation for the second-order pointwise convergence of NLPs with nontrivial constraints is also provided, along with sufficient conditions motivated by Shapiro [47] for exhibiting this convergence. Combining these results, if outer-approximating LPs (4) are constructed from NLPs using subtangents, and the NLP-specific sufficient conditions for convergence are satisfied, then these LPs inherit second-order pointwise convergence. Ultimately, these results motivate using subtangents in practice: even though their use weakens a relaxation scheme, they do not weaken the scheme enough to introduce clustering into a branch-and-bound-style global optimization method when the developed sufficient conditions are satisfied. Moreover, their simplicity makes them easier to use in practice, as our implementation in Julia via EAGO demonstrates.
Future work will involve developing the Julia implementation further and extending the convergence analysis for NLPs to make better use of convexity. Moreover, second-order pointwise convergence does not address a branch-and-bound method's performance early in the branch-and-bound tree; we expect that developing tighter convex relaxation schemes will still be beneficial to the performance of global optimization algorithms.