Approximation of the Constant in a Markov-Type Inequality on a Simplex Using Meta-Heuristics

Markov-type inequalities are often used in numerical solutions of differential equations, and their constants improve error bounds. In this paper, the upper approximation of the constant in a Markov-type inequality on a simplex is considered. To determine the constant, the minimal polynomial and pluripotential theories were employed. They include a complex equilibrium measure that solves the extreme problem by minimizing the energy integral. Consequently, examples of polynomials of the second degree are introduced. Then, a challenging bilevel optimization problem that uses the polynomials for the approximation was formulated. Finally, three popular meta-heuristics were applied to the problem, and their results were investigated.


Introduction
Markov-type polynomial inequalities (or inverse inequalities) [1][2][3][4][5][6], similarly to Bernstein-type inequalities [1,[5][6][7][8][9], appear in many areas of applied mathematics, including numerical solutions of differential equations. Proper estimates of the optimal constants in both types of inequalities can help to improve the bounds of numerical errors. They are often used in the error analysis of variational techniques, including the finite element method and the discontinuous Galerkin method for solving partial differential equations (PDEs) [10]. Moreover, Markov's inequality can be found in constructions of polynomial meshes, as it is a representative sampling method, and is used as a tool for studying the uniform convergence of discrete least-squares polynomial approximations or for spectral methods for the solutions of PDEs [11][12][13]. Generally, finding the exact values of the optimal constants on a given compact set E in R^N is considered particularly challenging. In this paper, an approach to determining an upper estimate of the constant in a Markov-type inequality on a simplex is proposed. The method aims to provide a greater value than that estimated by L. Bialas-Ciez and P. Goetgheluck [4].
For an arbitrary algebraic polynomial P of N variables and degree not greater than n, Markov's inequality is expressed as

‖grad P‖_E ≤ C n² ‖P‖_E,  (1)

where ‖P‖_E = sup{|P(x)| : x ∈ E}, C > 0, and E is a convex compact subset of R^N (with non-void interior). The exponent 2 in (1) is optimal for a convex compact set E (with non-void interior) [2,4].
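As a quick numerical illustration of inequality (1) (a sketch, not part of the paper's argument: the quadratic P and the evaluation grid are arbitrary choices), the ratio ‖grad P‖_E/(n² ‖P‖_E) can be estimated on the standard simplex; any admissible constant C must be at least as large as this ratio:

```python
import numpy as np

# A sample quadratic (an arbitrary choice): P(x1, x2) = x1^2 - x2, degree n = 2.
def P(x1, x2):
    return x1**2 - x2

def grad_norm(x1, x2):
    # |grad P| = sqrt((2*x1)^2 + (-1)^2)
    return np.sqrt(4 * x1**2 + 1)

# Dense grid over the standard simplex {x1, x2 >= 0, x1 + x2 <= 1}.
t = np.linspace(0.0, 1.0, 400)
X1, X2 = np.meshgrid(t, t)
inside = X1 + X2 <= 1.0

sup_P = np.abs(P(X1[inside], X2[inside])).max()
sup_grad = grad_norm(X1[inside], X2[inside]).max()

n = 2
ratio = sup_grad / (n**2 * sup_P)
print(ratio)
```

For this P, the ratio equals √5/4 ≈ 0.559, well below the estimates of C discussed below, as expected for a single, arbitrarily chosen polynomial.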
In 1974, D. Wilhelmsen [14] obtained the upper estimate C ≤ 4/ω_E for the constant in Markov's inequality (1), where ω_E is the width of E. The width of E ⊂ R^N is defined as ω_E := inf{dist(Pl_1, Pl_2) | (Pl_1, Pl_2) ∈ G} [4], where G is the set of all possible couples of parallel planes Pl_1 and Pl_2 such that E lies between Pl_1 and Pl_2. Furthermore, for any convex compact set E ⊂ R^N, the author proposed the hypothesis that C = 2/ω_E. M. Baran [3,7] and Y. Sarantopoulos [15] independently proved that Wilhelmsen's conjecture is true for centrally symmetric convex compact sets. In turn, L. Bialas-Ciez and P. Goetgheluck [4] showed that the hypothesis is false in the simplex case and proved that the constant satisfies C > 2.842534.
The best constant in Markov's inequality (1) for a convex set was also studied by D. Nadzhmiddinow and Y. N. Subbotin [6] and by A. Kroó and S. Révész [16]. It is worth mentioning that, for arbitrary triangles in R², A. Kroó and S. Révész [16] generalized the result for a simplex given in [4]: starting from the admissible upper bound C ≤ √10/ω_E, they obtained the upper estimate C ≤ s(∆)/ω_E for the constant C in (1), where s(∆) := (2/a)√(a² + b² + 2ab cos γ), with ∆ being a triangle with the sides a, b, c and the angles α, β, γ, respectively, such that c ≤ b ≤ a and γ ≤ β ≤ α.
In the recent literature, many works have considered Markov inequalities. For example, S. Ozisik et al. [10] determined the constants in multivariate Markov inequalities on an interval, a triangle, and a tetrahedron with the L²-norm. They derived explicit expressions for the constants on all above-mentioned simplexes using orthonormal polynomials. It is worth noting that exact values of the constants are the key for the correct derivation of both a priori and a posteriori error estimates in adaptive computations. F. Piazzon and M. Vianello [11] used the approximation theory notions of a polynomial mesh and the Dubiner distance in a compact set to determine error estimates for the total degree of polynomial optimization on Chebyshev grids of the hypercube. Additionally, F. Piazzon and M. Vianello [12] constructed norming meshes for polynomial optimization by using a classical Markov inequality on a general convex body in R^N. A. Sommariva and M. Vianello [13] used discrete trigonometric norming inequalities on subintervals of the period. They constructed norming meshes with optimal cardinality growth for algebraic polynomials on the sections of a sphere, a ball, and a torus. Moreover, A. Kroó [17] studied the so-called optimal polynomial meshes, and O. Davydov [18] presented error bounds for the approximation in the space of multivariate piecewise polynomials admitting stable local bases. In particular, the bounds apply to all spaces of smooth finite elements, and they are used in the finite element method for general fully nonlinear elliptic differential equations of the second order. L. Angermann and Ch. Heneke [19] proved error estimates for the Lagrange interpolation and for discrete L²-projection, which were optimal on the elements and almost optimal on the element edges. In another study, M. Bun and J. Thaler [5] reproved several Markov-type inequalities from approximation theory by constructing explicit dual solutions to natural linear programs.
Overall, these inequalities are a basis for proofs of many approximated solutions that can be found in the literature.
In this paper, an approach to determining an approximation of the constant in Markov's inequality using introduced minimal polynomials is proposed. Then, the polynomials are used in the formulated bilevel optimization problem. Since many optimization techniques could be employed to find the constant, three popular meta-heuristics are applied and their results are investigated.
The rest of this paper is structured as follows. In Section 2, minimal polynomials are introduced. Then, in Section 3, their application in the approximation of the constant in a Markov-type inequality is considered. In Section 4, in turn, the approximation is defined as an optimization problem and the problem is solved using three meta-heuristics. Finally, in Section 5, the study is summarized and directions for future work are indicated.
Let E be a compact set in the complex plane that contains at least |α| + 1 points. By the classical theorem of Tonelli [26], among the monic polynomials of degree |α| there exists exactly one polynomial T_|α| that minimizes the supremum norm on E. T_|α| is called the Chebyshev polynomial of order |α|. It is defined by

‖T_|α|‖_E = inf{ ‖x^|α| + a_{|α|−1} x^{|α|−1} + ... + a_1 x + a_0‖_E },

where the infimum is taken with respect to (a_0, a_1, ..., a_{|α|−1}) ∈ C^|α|. In other words, x^|α| − T_|α| is the polynomial of best approximation to x^|α| on E. The polynomial T_|α| is called the minimal (monic) polynomial of degree |α|.
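For example, on E = [−1, 1] the minimal monic polynomial of degree 2 is the monic Chebyshev polynomial T_2(x) = x² − 1/2, with ‖T_2‖_E = 1/2. A short numerical check (the competing monic quadratics are arbitrary choices):

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 10001)

def sup_norm(a0, a1):
    # Supremum norm of the monic quadratic x^2 + a1*x + a0 on [-1, 1].
    return np.abs(x**2 + a1 * x + a0).max()

best = sup_norm(-0.5, 0.0)  # monic Chebyshev T_2(x) = x^2 - 1/2
print(best)                 # 0.5

# No perturbed monic quadratic does better:
for a0, a1 in [(-0.4, 0.0), (-0.5, 0.1), (-0.6, -0.2), (0.0, 0.0)]:
    assert sup_norm(a0, a1) >= best
```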
The complex equilibrium measure for the standard simplex in R^N is used in the calculations in the next section.

Estimation of the Constant C in a Markov-Type Inequality
Let S_2 be the standard simplex in R², i.e., S_2 = {(x_1, x_2) ∈ R² : x_1 ≥ 0, x_2 ≥ 0, x_1 + x_2 ≤ 1}. The normalized equilibrium measure of S_2, regarded as a subset of C², is a two-dimensional Lebesgue measure with a weight A(x_1, x_2) proportional to (x_1 x_2 (1 − x_1 − x_2))^{−1/2}. For continuous functions f, g on S_2, the scalar product is defined as

(f, g) = ∫∫_{S_2} f(x_1, x_2) g(x_1, x_2) A(x_1, x_2) dx_1 dx_2.

Then, a polynomial Q(x_1, x_2) = x_1 + x_2 − 1/2 is defined. It is the minimal polynomial for the simplex, and

P(x_1, x_2) = a_0 x_1² + a_1 x_2² + a_2 x_1 x_2 + a_3 x_1 + a_4 x_2 + a_5

for some real numbers a_0, a_1, a_2, a_3, a_4, and a_5. Thus, (P, Q) is a linear combination of the scalar products (x_1², Q), (x_2², Q), (x_1 x_2, Q), (x_1, Q), (x_2, Q), and (1, Q), whose values are given in Table 1. We want the polynomial P to be orthogonal to Q, so that (P, Q) = 0. Substituting the values given in Table 1 into this expression, a_0/7 + a_1/7 + a_2/21 + a_3/5 + a_4/5 + a_5/3 = 0 is obtained. Finally, a_5 = −3a_0/7 − 3a_1/7 − a_2/7 − 3a_3/5 − 3a_4/5.
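The orthogonality relation can be checked numerically (a sketch, not from the paper, under the assumption stated above that the weight is proportional to (x_1 x_2 (1 − x_1 − x_2))^{−1/2}: this is, up to normalization, the density of the Dirichlet(1/2, 1/2, 1/2) distribution, so (P, Q) = 0 is equivalent to E[P·Q] = 0 under that distribution; the coefficients a_0, ..., a_4 below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# Dirichlet(1/2, 1/2, 1/2) samples have density proportional to
# (x1 * x2 * (1 - x1 - x2))**(-1/2) on the simplex, i.e., to the weight A.
s = rng.dirichlet([0.5, 0.5, 0.5], size=1_000_000)
x1, x2 = s[:, 0], s[:, 1]

# Arbitrary coefficients a0..a4; a5 from the derived orthogonality relation.
a0, a1, a2, a3, a4 = 1.0, -2.0, 0.5, 3.0, -1.0
a5 = -3*a0/7 - 3*a1/7 - a2/7 - 3*a3/5 - 3*a4/5

P = a0*x1**2 + a1*x2**2 + a2*x1*x2 + a3*x1 + a4*x2 + a5
Q = x1 + x2 - 0.5

pq = float(np.mean(P * Q))  # Monte Carlo estimate of (P, Q) up to normalization
print(pq)                   # close to 0: P is orthogonal to Q
```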
Here, a counterexample to the Wilhelmsen hypothesis can be formulated, different from that discussed by L. Bialas-Ciez and P. Goetgheluck [4], to support the claim that the approach leads to a better approximation. Counterexample: Consider a polynomial P(x_1, x_2) of the above form with a suitable choice of the coefficients a_0, ..., a_5. It is easy to calculate that ‖P‖_{S_2} is reached at the point (13/14, 1/14), and its value is 297/392. Similarly, ‖grad P(0, 1)‖ = √58500/28. Hence, even this simple and far from optimal approximation yields a value greater than that of L. Bialas-Ciez and P. Goetgheluck [4]: ‖grad P‖_{S_2}/(4 ‖P‖_{S_2}) = 2.850293143.

Optimization Problem in a Markov-Type Inequality
The estimation of the constant C in Markov's inequality can be improved by considering a minimal polynomial and an equilibrium measure. Exemplary calculations of minimal polynomials and their corresponding Markov inequalities on the simplex are shown in Table 2. For the first example polynomial, the obtained constant is equal to the value reported by L. Bialas-Ciez and P. Goetgheluck [4]. For the next two polynomials, the constants are improved. However, a slight change in the coefficients decreases the constant, as shown in the fourth example. Therefore, in this paper, the polynomial is obtained as the solution of a nontrivial optimization problem in which an improvement of the estimate of the constant C is sought.
In this paper, the constant in Markov's inequality on the simplex for the proposed minimal polynomials is estimated. Assume that C(x, u) = ‖grad P(x, u)‖_{S_2}/(4 ‖P(x, u)‖_{S_2}), where P is one of the considered minimal polynomials P_1, P_2, P_3, P_4 and u = [a_0, a_1, a_2, a_3, a_4]. In order to determine the greatest value of the function C(x, u), an optimization problem is formulated. R. M. Aron and M. Klimek [27] gave explicit formulas for the supremum norms of quadratic polynomials. The vectors u = [a_0, a_1, a_2, a_3, a_4] and x = [x_1, x_2] are regarded as decision variables in the optimization problem. Consequently, û and x̂ denote the vectors that maximize C(x, u). Considering the relationship between P_1 and P_2, or P_3 and P_4, only the coefficients that maximize the corresponding C_1(x, u) and C_3(x, u) need to be found; they are then applied to the remaining polynomials. For the polynomials P_3 and P_4, a_3 = a_4 = 0. The calculation of Equation (2) requires finding the extrema of the polynomial P inside the simplex with the vertices (x_1, x_2) ∈ {(0, 0), (0, 1), (1, 0)}. Therefore, the following bilevel optimization problem is solved. Here, one optimization task is nested within the other, i.e., the decision variables of the upper problem are the parameters of the lower one, and the lower problem is a constraint of the upper problem, limiting the feasible u values to those that are lower-level optimal [28].
The problem is given by

û = argmax_u C(x̂, u) subject to x̂ = argmax_{x ∈ S_2} |P(x, u)|,

where argmax denotes the arguments of the function that maximize its value. Finally, the values of x̂ and û are found as the solutions to the optimization problem. In order to determine the greatest value of C, the nonlinearity of the considered objective function, as well as the existence of multiple local optima, must be taken into account. Here, different values of u can lead to the same value of C, and close values of u can result in a large difference between the corresponding values of C. This can be seen in Figure 1.

To solve the defined optimization problem, a suitable method should be determined. Therefore, in this paper, a hybrid approach is proposed. It is justified by the need to find the extrema of the polynomial P inside the simplex for the computation of C. Such extrema are found using the Interior Point Algorithm (IPT) [29,30], i.e., x̂ is found for an examined u from the upper level. The upper-level optimization is more difficult. Since the considered problem has not been formulated and solved in the literature, it cannot be determined in advance which method should be used to find the solution. Therefore, taking into account the possibly time-consuming computations, three popular meta-heuristics are used and compared: the Particle Swarm Optimization method (PSO) [31][32][33], the Genetic Algorithm (GA) [34,35], and the Simulated Annealing technique (SA) [36,37]. The application of population-based algorithms for solving different bilevel problems is popular (e.g., [28]).
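A minimal sketch of how one evaluation of the upper-level objective could be organized (not the paper's implementation: SciPy's 'trust-constr', an interior-point-type method, stands in for Matlab's IPT; u here carries a_5 explicitly, whereas in the paper a_5 is determined by the orthogonality relation; and, since the gradient of a quadratic is affine, ‖grad P‖ is a convex function whose maximum over the simplex is attained at a vertex, which justifies the vertex scan):

```python
import numpy as np
from scipy.optimize import minimize, LinearConstraint

# General quadratic from Section 2; u = [a0, a1, a2, a3, a4, a5].
def P(x, u):
    x1, x2 = x
    a0, a1, a2, a3, a4, a5 = u
    return a0*x1**2 + a1*x2**2 + a2*x1*x2 + a3*x1 + a4*x2 + a5

def grad_P(x, u):
    x1, x2 = x
    a0, a1, a2, a3, a4, _ = u
    return np.array([2*a0*x1 + a2*x2 + a3, 2*a1*x2 + a2*x1 + a4])

# Feasible set: x1 >= 0, x2 >= 0, x1 + x2 <= 1.
simplex = LinearConstraint(np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]),
                           lb=[0.0, 0.0, -1.0], ub=[np.inf, np.inf, np.inf])

def sup_abs_P(u):
    # Lower level: maximize |P| over the simplex from a few starting points
    # ('trust-constr' stands in for Matlab's interior-point algorithm).
    best = 0.0
    for x0 in [(0.2, 0.2), (0.6, 0.2), (0.2, 0.6), (1/3, 1/3)]:
        for sign in (1.0, -1.0):
            r = minimize(lambda x: -sign * P(x, u), x0,
                         method='trust-constr', constraints=[simplex])
            best = max(best, abs(P(r.x, u)))
    # Also check the vertices, where the maximum of |P| may occur.
    for v in [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]:
        best = max(best, abs(P(np.array(v), u)))
    return best

def C(u):
    # ||grad P|| is a convex function of x, so its maximum over the
    # simplex is attained at one of the three vertices.
    g = max(np.linalg.norm(grad_P(np.array(v), u))
            for v in [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)])
    return g / (4 * sup_abs_P(u))
```

For instance, for u = [0, 0, 0, 1, 1, −1/2] (i.e., P = Q), ‖P‖_{S_2} = 1/2 and ‖grad P‖ = √2, so C(u) = √2/2.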
The PSO is a global evolutionary method based on swarm intelligence. It finds the optimal solution by exploring the search space using a set of particles. Each particle represents a candidate solution to the problem. The PSO mimics the movement of a bird flock or a fish school; hence, the dynamics of the set of particles in the solution space are iteratively governed by the following equations [31]:

v_i(n + 1) = w v_i(n) + c_1 rnd_1 (pb_i − p_i(n)) + c_2 rnd_2 (pB − p_i(n)),
p_i(n + 1) = p_i(n) + v_i(n + 1),

where p_i(n) and v_i(n) are the position and the velocity of the i-th particle at time n, n = 1, 2, ..., N; pb_i is the best position of the i-th particle, while pB is the best position of the swarm at time n; w is the inertia weight; c_1 and c_2 are coefficients weighting the cognitive and social components of the swarm [31][32][33]; rnd_1 and rnd_2 are random numbers drawn from the range [0, 1]. The termination criterion of the algorithm is the number of iterations N or the violation of the minimal acceptable difference in the objective function C between consecutive iterations. Both criteria are commonly used in most heuristic optimization algorithms. Finally, the PSO returns û = pB(N).
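The update equations above can be sketched on a toy objective (a minimal sketch: the swarm size, iteration count, coefficient values, and objective are arbitrary choices, not the paper's settings):

```python
import numpy as np

def pso_maximize(f, dim, n_particles=30, n_iter=200, w=0.7, c1=1.4, c2=1.4, seed=0):
    """Minimal PSO implementing the velocity/position updates above."""
    rng = np.random.default_rng(seed)
    p = rng.uniform(-5, 5, size=(n_particles, dim))   # positions p_i(n)
    v = np.zeros_like(p)                              # velocities v_i(n)
    pb = p.copy()                                     # personal bests pb_i
    pb_val = np.array([f(q) for q in p])
    pB = pb[pb_val.argmax()].copy()                   # swarm best pB
    for _ in range(n_iter):
        rnd1 = rng.random((n_particles, dim))
        rnd2 = rng.random((n_particles, dim))
        v = w * v + c1 * rnd1 * (pb - p) + c2 * rnd2 * (pB - p)
        p = p + v
        val = np.array([f(q) for q in p])
        better = val > pb_val
        pb[better], pb_val[better] = p[better], val[better]
        pB = pb[pb_val.argmax()].copy()
    return pB

# Toy check: maximize f(x) = -(x1^2 + x2^2); the optimum is the origin.
u_hat = pso_maximize(lambda q: -(q**2).sum(), dim=2)
print(u_hat)  # close to [0, 0]
```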
The GA uses a population of individuals (called chromosomes), representing candidate solutions [34,35]. In the GA, at each generation (iteration), a population of individuals is assessed to select the best solutions, which are then used to create the population in the next generation. The assessment is based on the values of the objective function, while new individuals are obtained using crossover and mutation operators. The crossover combines the information carried by two selected parents in one solution by mixing values of their vector representations. The mutation, in turn, introduces random changes to the individual. After the predefined number of generations, the algorithm returns the best individualû.
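The two operators used in the experiments, scattered crossover and Gaussian mutation, can be sketched as follows (the mutation scale and the parent vectors are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)

def scattered_crossover(parent1, parent2):
    # Scattered crossover: each gene is copied from a randomly chosen parent.
    mask = rng.random(parent1.shape) < 0.5
    return np.where(mask, parent1, parent2)

def gaussian_mutation(child, scale=0.1):
    # Gaussian mutation: adds zero-mean Gaussian noise to every gene.
    return child + rng.normal(0.0, scale, size=child.shape)

# Five-dimensional individuals, matching u = [a0, a1, a2, a3, a4].
p1 = np.array([1.0, 1.0, 1.0, 1.0, 1.0])
p2 = np.array([-1.0, -1.0, -1.0, -1.0, -1.0])
child = gaussian_mutation(scattered_crossover(p1, p2))
print(child)  # each gene lies near +1 or -1
```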
The simulated annealing, unlike the population-based PSO and GA, uses much fewer resources and requires fewer objective function computations, since it processes only one solution at each iteration [36,37]. The SA mimics the process of heating a metal and slowly lowering its temperature. The algorithm starts from a random solution, which is iteratively modified. At each iteration, reflecting a temperature drop, the algorithm modifies the current solution and compares the values of the objective function before and after the modification. If a new solution is better or the difference between their values is within a predefined range, the modification is accepted. The acceptance of a slightly worse solution helps the method to escape local minima [36,37].
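The acceptance rule with an exponential cooling schedule can be sketched as follows (a minimal sketch: the step size, iteration count, cooling rate, and toy objective are arbitrary choices, not the paper's settings):

```python
import numpy as np

rng = np.random.default_rng(2)

def simulated_annealing_maximize(f, x0, n_iter=5000, t0=100.0):
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    best, f_best = x.copy(), fx
    for k in range(n_iter):
        t = t0 * 0.95**k                         # exponential cooling schedule
        cand = x + rng.normal(0.0, 0.5, size=x.shape)
        f_cand = f(cand)
        # Accept improvements always; accept worse moves with the
        # Metropolis probability exp(-|delta| / t).
        if f_cand >= fx or rng.random() < np.exp((f_cand - fx) / max(t, 1e-12)):
            x, fx = cand, f_cand
            if fx > f_best:
                best, f_best = x.copy(), fx
    return best

# Toy check: maximize -(x1^2 + x2^2) starting away from the optimum.
u_hat = simulated_annealing_maximize(lambda q: -float((q**2).sum()), [3.0, -3.0])
print(u_hat)  # near [0, 0]
```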
All of these methods are commonly used for solving nonlinear optimization problems with local optima. However, their suitability for a given problem is mostly determined experimentally. In this paper, they are used to determine û. However, in order to calculate C for any u, the lower-level optimum x̂ must be found each time. In this paper, it is obtained using the IPT algorithm [29,30]. Finally, the considered problem is solved using the hybrid approach [28]. In the IPT algorithm, an optimization problem is defined as

min_x f(x), subject to h(x) = 0 and g(x) ≤ 0,

where f(x) is the objective function to be minimized, and h and g are the equality and inequality constraints, respectively. The approximate problem that is actually solved is defined as

min_{x,s} f(x) − μ Σ_i ln(s_i), subject to h(x) = 0 and g(x) + s = 0,

where the s_i > 0 are slack variables and μ > 0 is a barrier parameter. As μ converges to zero, the sequence of approximate solutions converges to the minimum of f [29,30].
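The role of the barrier parameter μ can be illustrated on a one-dimensional problem (an illustration only, unrelated to the paper's objective): minimize f(x) = x² subject to x ≥ 1. The barrier subproblem minimizes x² − μ ln(x − 1); setting the derivative to zero gives 2x(x − 1) = μ, so its minimizer is x(μ) = (1 + √(1 + 2μ))/2, which approaches the constrained minimizer x = 1 as μ → 0:

```python
import numpy as np

# Barrier subproblem: minimize x**2 - mu*ln(x - 1) for x > 1.
# Its minimizer solves 2x(x - 1) = mu, i.e., x(mu) = (1 + sqrt(1 + 2*mu)) / 2.
mus = [1.0, 0.1, 0.01, 0.001]
path = [(mu, (1 + np.sqrt(1 + 2 * mu)) / 2) for mu in mus]
for mu, x_mu in path:
    print(mu, x_mu)  # x(mu) decreases towards 1 as mu decreases
```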
In the numerical experiments, the solution processed by the heuristic algorithms was represented by a five-dimensional vector of floating-point values, since u = [a_0, a_1, a_2, a_3, a_4], and the IPT used a two-dimensional vector x = [x_1, x_2]. To ensure a fair comparison of the meta-heuristics, each technique was run 50 times, and a maximum of 25,000 objective function calls was allowed in each run. The number of function calls was achieved by selecting a large-enough population size, aiming at the best performance of the PSO and GA. The implementations of the algorithms available in the Matlab Optimization Toolbox were used. The algorithms were run with their default parameters: the PSO (inertia weight range w ∈ [0.1, 1.1], c_1 = 1.4, c_2 = 1.4), the GA (EliteCount = 0.05 of the population size, CrossoverFraction = 0.8, the scattered crossover, Gaussian mutation, and stochastic uniform selection), and the SA (InitialTemperature = 100, the temperature at a given iteration equal to InitialTemperature · 0.95^iteration). Note that the values in u were not bounded (i.e., u ∈ R^5), and the GA and PSO, unlike the SA, did not require the specification of the starting point of the optimization, as the initial populations of solutions were selected randomly. For this reason, the SA started from u = [0, 0, 0, 0, 0]. As demonstrated in Table 3, the PSO was able to determine the best C for each polynomial. The algorithms explored the solution space in different ways, obtaining different results. To further show the difficulty of the optimization task, Figure 2 presents the convergence of the algorithms towards the optimum. Here, the progress of the set of solutions at each calculation of the objective function is reported. As presented, the approaches with the PSO and GA converge quickly, and most of the time is spent on the exploration of the solution space near the optimum.
The SA, unlike the population-based methods, processes one solution at each iteration, which influences its convergence rate. The results reveal that the PSO is better suited to the problem than the GA, since it determines a better C (see Table 3) and finds it with fewer function calls. Interestingly, the set of solutions in the first function call is more diverse for the PSO, which may contribute to its superior performance, as in the remaining computation time, it is more focused on escaping the local optima. Similarly, it can be seen that the solutions emerging in the run of the SA exhibit large variability. Note that the number of runs (50) and the maximal number of function calls in these experiments (25,000) are purposely greater than could be concluded from the convergence plots, in order to give the methods enough resources to find better solutions. However, the methods often terminate the computations if the change in the best function value is too small.

This experiment indicates that the proposed approach with the PSO can provide the best C. Therefore, in the second experiment, the parameters of this algorithm were set to conduct an even more thorough search of the solution space. By increasing the number of particles and generations, the maximal number of function calls was set to 25 million, and the method was run 100 times. Note that the estimated time to finish one run of the PSO with 25 million function calls in Matlab on the i7-6700K 4 GHz CPU with 64 GB RAM is 335 h. The calculations were run in parallel, and the methods were allowed to terminate the optimization if the change in the function value was small. The results are presented in Table 4. The second experiment improved the already obtained values for C_1 (or C_2) and C_3 (or C_4) at the fifth and eighth decimal places, respectively (see Table 3).
The run in which the best C 1 was obtained required 720 thousand function calls, and C 3 was found in a run after the objective function was calculated 1570 thousand times.

Conclusions
In this paper, it was demonstrated that the best constant C in Markov's inequality for the minimal polynomials on the simplex can be determined as the solution to a nested optimization problem. First, examples of the polynomials that allow estimation of the constant were proposed. Then, the optimization problem was defined and its solution was investigated. Since it could not be determined in advance which heuristic optimization algorithm should be used, three popular meta-heuristics were adopted and compared. Finally, the best value of the constant was obtained with a hybrid of the IPT and PSO.
Future work will focus on the application of polynomials of higher degrees to the considered problem. In addition, the use of the introduced bilevel optimization problem as a benchmark for comparing the performance of meta-heuristics seems to be a promising direction.