Using parameter elimination to solve discrete linear Chebyshev approximation problems

We consider discrete linear Chebyshev approximation problems in which the unknown parameters of a linear function are fitted by minimizing the least maximum absolute deviation of the approximation errors. Such problems find wide application in the solution of overdetermined systems of linear equations that appear in many practical contexts. The least maximum absolute deviation estimator is extensively used in regression analysis in statistics when the distribution of errors has bounded support. To derive a direct solution of the approximation problem, we propose an algebraic approach based on a parameter elimination technique. As a key component of the approach, an elimination lemma is proved that allows us to handle the problem by reducing it to a problem with one unknown parameter eliminated, together with a box constraint imposed on this parameter. We apply this result to derive direct solutions of problems of low dimension, formulated as linear regression problems with one and two parameters. Furthermore, we develop a procedure to solve multidimensional approximation (multiple linear regression) problems. The procedure is based on a direct solution method that comprises two phases: backward elimination and forward determination (substitution) of the unknown parameters. We describe the main components of the procedure and estimate its computational complexity. We implement symbolic computations in MATLAB to obtain exact solutions for two illustrative numerical examples.


Introduction
Discrete linear Chebyshev (minimax) approximation problems, where the errors of fitting the unknown parameters are measured by the Chebyshev (max, infinity, uniform or $L_\infty$) norm, are of theoretical interest and practical importance in many areas of science and engineering. Application of the Chebyshev norm leads to the least maximum absolute deviation of errors as the approximation criterion and dates back to Laplace's classical work [7, book 3, chap. V, §39] (see also [10,25]).
An important area of applications of the discrete linear Chebyshev approximation is the solution of overdetermined systems of linear equations [27,2,21] that appear in many practical contexts. The least maximum absolute deviation estimator is widely used in regression analysis in statistics when the distribution of errors has bounded support. Specifically, the Chebyshev estimator is known to be a maximum likelihood estimator if the error distribution is uniform [22,2,9,15]. Moreover, this estimator can be useful even if errors are not uniform, but controlled in some way and small relative to the observed values. Examples of particular applications include problems in nuclear physics [13,4], parameter estimation of dynamic systems [20,1] and finance [14].
To solve the Chebyshev approximation problem, a number of approaches are known that apply various iterative computational procedures to find numerical solutions (see the comprehensive overviews of algorithmic solutions given in [11,12,9,23]). For instance, the approximation problems under consideration can be reduced to linear programs and then solved numerically by computational algorithms available in linear programming, such as the simplex algorithm and its variations. For linear programming solutions as well as other related algorithms, one can consult the early works [28,26,22,27,2,29,24,3] as well as more recent publications [16,5,6,8].
Along with existing iterative algorithms that find use in applications, direct analytical solutions of the linear Chebyshev approximation problem are also of interest as an essential instrument of formal analysis and treatment of the problem. A useful algebraic approach to derive direct solutions of problems that involve minimizing the Chebyshev distance is proposed in [17,18,19]. The approach offers complete solutions of the problems in the framework of tropical (idempotent) algebra, which deals with algebraic systems with idempotent operations. The solutions obtained in terms of tropical algebra are then represented in the usual form.
In this paper, we reshape and adjust algebraic techniques implemented in the above-mentioned approach to develop a direct solution of the discrete linear Chebyshev approximation problem in terms of conventional algebra. As a key component of the proposed method, an elimination lemma is proved that allows us to handle the problem by reducing it to a problem with one unknown parameter eliminated and a box constraint imposed on this parameter. We apply this result to derive direct solutions of problems of low dimension, formulated as linear regression problems with one and two parameters.
Furthermore, we construct a procedure to solve multidimensional approximation (multiple linear regression) problems. The procedure is based on a direct solution method that comprises two phases: backward elimination and forward determination (substitution) of the unknown parameters. We estimate computational complexity and memory requirements of the procedure, and implement symbolic computations in the MATLAB environment to obtain exact solutions for two illustrative numerical examples.
The rest of the paper is organized as follows. In Section 2, we formulate the approximation problem of interest in both scalar and vector form. Section 3 presents the main result of the paper, which offers a reduction step to separate the problem into a problem of lower dimension by eliminating an unknown parameter, and a box constraint for the parameter eliminated. We apply the obtained result in Section 4 to derive direct explicit solutions for linear regression problems with one and two unknown parameters. In Section 5, we describe a computational procedure to solve linear approximation problems of arbitrary dimension and discuss its computational complexity. In Section 6, software implementation is discussed and numerical examples are given. Section 7 presents concluding remarks.

Linear Chebyshev approximation problem
We start with an appropriate notation, preliminary assumptions and formal representation of the discrete linear Chebyshev approximation problem.
Suppose that, given $X_{ij}, Y_i \in \mathbb{R}$ for all $i = 1, \ldots, M$ and $j = 1, \ldots, N$, where $M$ and $N$ are positive integers, we need to find the unknown parameters $\theta_j \in \mathbb{R}$ for all $j = 1, \ldots, N$ that solve the minimax problem

$$ \min_{\theta_1,\ldots,\theta_N} \ \max_{1\le i\le M} \Bigg| \sum_{j=1}^{N} X_{ij}\theta_j - Y_i \Bigg|. \tag{1} $$

Without loss of generality, we assume that for each $j = 1, \ldots, N$, there exists at least one $i$ such that $X_{ij} \ne 0$. Otherwise, if $X_{ij} = 0$ for some $j$ and all $i$, the parameter $\theta_j$ does not affect the objective function and thus can be removed.
Note that we can represent problem (1) in vector form by introducing the matrix and column vectors

$$ X = (X_{ij}), \qquad \theta = (\theta_j), \qquad Y = (Y_i). $$

With this matrix-vector notation and the Chebyshev norm, which is defined for any vector $V = (V_i)$ as

$$ \|V\| = \max_i |V_i|, $$

the approximation problem takes the form

$$ \min_{\theta} \ \|X\theta - Y\|. $$

To solve problem (1), we first show that the problem can be reduced to a problem of the same form, but with one fewer unknown parameter.
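To fix the notation in concrete form, the objective of problem (1) can be evaluated as follows. This is a minimal Python sketch (not part of the paper's MATLAB implementation); exact rational arithmetic via `fractions.Fraction` mirrors the exact solutions discussed later, and the helper name `chebyshev_error` is ours.

```python
from fractions import Fraction

def chebyshev_error(X, theta, Y):
    """Objective of problem (1): max_i |sum_j X[i][j]*theta[j] - Y[i]|,
    i.e. the Chebyshev norm of the residual vector X*theta - Y."""
    return max(abs(sum(x * t for x, t in zip(row, theta)) - y)
               for row, y in zip(X, Y))

# Three equations in two unknowns: an overdetermined system.
X = [[1, 0], [1, 1], [1, 2]]
Y = [Fraction(0), Fraction(1), Fraction(1)]
print(chebyshev_error(X, [Fraction(1, 4), Fraction(1, 2)], Y))  # 1/4
```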

Elimination lemma
The next result offers a reduction approach to the problem, which provides the basis for the proposed solution.
Lemma 1. Solving problem (1) is equivalent to solving the reduced problem

$$ \min_{\theta_1,\ldots,\theta_{N-1}} \ \max_{\substack{1\le i<k\le M\\ |X_{iN}|+|X_{kN}|\ne 0}} \frac{\Big| X_{iN}\Big(\sum_{j=1}^{N-1} X_{kj}\theta_j - Y_k\Big) - X_{kN}\Big(\sum_{j=1}^{N-1} X_{ij}\theta_j - Y_i\Big) \Big|}{|X_{iN}| + |X_{kN}|} \tag{2} $$

together with the box constraint

$$ \max_{i:\ X_{iN}\ne 0} \Bigg( \frac{Y_i - \sum_{j=1}^{N-1} X_{ij}\theta_j}{X_{iN}} - \frac{\mu}{|X_{iN}|} \Bigg) \le \theta_N \le \min_{i:\ X_{iN}\ne 0} \Bigg( \frac{Y_i - \sum_{j=1}^{N-1} X_{ij}\theta_j}{X_{iN}} + \frac{\mu}{|X_{iN}|} \Bigg), \tag{3} $$

where $\mu$ is the minimum of the objective function in problem (2).
Proof. To examine problem (1), we first introduce an auxiliary unknown parameter $\lambda$ to rewrite the problem as follows:

$$ \min \ \lambda \quad \text{s.t.} \quad \Bigg| \sum_{j=1}^{N} X_{ij}\theta_j - Y_i \Bigg| \le \lambda, \qquad i = 1,\ldots,M. $$

Note that the inequality constraint is readily represented as the system of inequalities

$$ -\lambda \le \sum_{j=1}^{N} X_{ij}\theta_j - Y_i \le \lambda, \qquad i = 1,\ldots,M, \tag{4} $$

which, in particular, puts the problem into the form of a linear program. Next, we continue rearrangement by solving for $\theta_N$ those inequalities in which $X_{iN} \ne 0$ to write

$$ \frac{Y_i - \sum_{j=1}^{N-1} X_{ij}\theta_j}{X_{iN}} - \frac{\lambda}{|X_{iN}|} \le \theta_N \le \frac{Y_i - \sum_{j=1}^{N-1} X_{ij}\theta_j}{X_{iN}} + \frac{\lambda}{|X_{iN}|}, $$

where the division by a negative $X_{iN}$ changes the sign of the differences and hence the sign of the entire right-hand side of each inequality in the system. As a result, for every pair of indices $i$ and $k$, the system includes both the inequality

$$ \theta_N \ge \frac{Y_i - \sum_{j=1}^{N-1} X_{ij}\theta_j}{X_{iN}} - \frac{\lambda}{|X_{iN}|} $$

and the inequality

$$ \theta_N \le \frac{Y_k - \sum_{j=1}^{N-1} X_{kj}\theta_j}{X_{kN}} + \frac{\lambda}{|X_{kN}|}. $$

After coupling the paired inequalities and considering that the equality $|X_{iN} X_{kN}| = |X_{iN}||X_{kN}|$ is valid, we rearrange the system as follows:

$$ \Bigg| X_{iN}\Big(\sum_{j=1}^{N-1} X_{kj}\theta_j - Y_k\Big) - X_{kN}\Big(\sum_{j=1}^{N-1} X_{ij}\theta_j - Y_i\Big) \Bigg| \le \lambda\,(|X_{iN}| + |X_{kN}|). $$

Furthermore, we combine the inequalities for all $1 \le i, k \le M$ and add the last inequality at (4) to replace the condition $X_{iN}, X_{kN} \ne 0$ by that in the form $|X_{iN}| + |X_{kN}| \ne 0$, and rewrite the system as one inequality

$$ \lambda \ge \max_{\substack{1\le i,k\le M\\ |X_{iN}|+|X_{kN}|\ne 0}} \frac{\Big| X_{iN}\Big(\sum_{j=1}^{N-1} X_{kj}\theta_j - Y_k\Big) - X_{kN}\Big(\sum_{j=1}^{N-1} X_{ij}\theta_j - Y_i\Big) \Big|}{|X_{iN}| + |X_{kN}|}. $$

We now observe that the term under the max operator is invariant under permutation of the indices $i$ and $k$, and is equal to zero if $i = k$. Therefore, we can replace the set of indices defined as $1 \le i, k \le M$ by that given by the condition $1 \le i < k \le M$, which represents the lower bound on $\lambda$ as the objective function of problem (2). Since the minimum of $\lambda$ is bounded from below by the expression on the right-hand side, we need to find the minimum of this expression with respect to $\theta_1, \ldots, \theta_{N-1}$, which leads to solving problem (2). With $\lambda$ set to this minimum $\mu$, the inequalities solved for $\theta_N$ take the form of the box constraint at (3).
To conclude this section, we note that the reduced problem at (2) has the same general form as problem (1) with the parameter θ N eliminated. This offers a potential for solving the problem under study by recurrent implementation of Lemma 1. We discuss application of the lemma to derive direct solutions of problems of low dimension and to develop a recursive procedure to solve problems of arbitrary dimension in what follows.
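A single reduction step of the lemma can be sketched in code. The helper below is our Python rendering of the pairwise elimination formulas (with zero rows dropped, as described above), given here as an illustration under the lemma's assumptions rather than as the authors' implementation; the name `eliminate_last` is hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def eliminate_last(X, Y):
    """One application of the elimination lemma: remove the last column
    (parameter) and return the data of the reduced problem. For each pair
    of rows i < k, the new residual has coefficients
    (X[i][n]*X[k][j] - X[k][n]*X[i][j]) / (|X[i][n]| + |X[k][n]|),
    which no longer involve the eliminated parameter."""
    n = len(X[0]) - 1                      # index of the last column
    Xr, Yr = [], []
    for i, k in combinations(range(len(X)), 2):
        d = Fraction(abs(X[i][n]) + abs(X[k][n]))
        if d == 0:                         # zero row: no contribution
            continue
        Xr.append([(X[i][n] * X[k][j] - X[k][n] * X[i][j]) / d
                   for j in range(n)])
        Yr.append((X[i][n] * Y[k] - X[k][n] * Y[i]) / d)
    return Xr, Yr

# Eliminating the only parameter of a constant-fit problem leaves a
# vector whose largest absolute entry is the minimum error.
Xr, Yr = eliminate_last([[1], [1], [1]], [0, 1, 4])
print(max(abs(y) for y in Yr))  # 2
```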

Solution of one-and two-parameter regression problems
We now apply the obtained result to derive direct, exact solutions to regression problems of low dimension. We start with one-parameter simple linear regression problems, which have well-known solutions, and then find a complete solution for a two-parameter linear regression problem.

One-parameter linear regression problems
Let us suppose that, for given explanatory (independent) variables $X_i \in \mathbb{R}$ and response (dependent) variables $Y_i \in \mathbb{R}$, we need to find the unknown regression parameter $\theta \in \mathbb{R}$ that solves the problem

$$ \min_{\theta} \ \max_{1\le i\le M} |X_i\theta - Y_i|. \tag{5} $$

A direct application of Lemma 1 with $N = 1$ and the substitution $X_{i1} = X_i$ for all $i = 1, \ldots, M$ and $\theta_1 = \theta$ yields the next results.

Proposition 2. The minimum error in problem (5) is equal to

$$ \mu = \max_{\substack{1\le i<k\le M\\ |X_i|+|X_k|\ne 0}} \frac{|X_i Y_k - X_k Y_i|}{|X_i| + |X_k|}, $$

and all solutions of the problem are given by the condition

$$ \max_{i:\ X_i\ne 0} \Big( \frac{Y_i}{X_i} - \frac{\mu}{|X_i|} \Big) \le \theta \le \min_{i:\ X_i\ne 0} \Big( \frac{Y_i}{X_i} + \frac{\mu}{|X_i|} \Big). $$

We now consider a special case of problem (5) in the form

$$ \min_{\theta} \ \max_{1\le i\le M} |\theta - Y_i|. \tag{6} $$

To handle the problem, we set $X_i = 1$ for all $i$ in the previous solution. As a result, we find the minimum error in problem (6) to be

$$ \mu = \frac{1}{2} \Big( \max_{1\le i\le M} Y_i - \min_{1\le i\le M} Y_i \Big). $$

The solution is given by the condition

$$ \max_{1\le i\le M} Y_i - \mu \le \theta \le \min_{1\le i\le M} Y_i + \mu, $$

which, after substitution of $\mu$, leads to the unique result

$$ \theta = \frac{1}{2} \Big( \max_{1\le i\le M} Y_i + \min_{1\le i\le M} Y_i \Big). $$

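The one-parameter solution admits a compact computational sketch. The Python helper below (our reconstruction, with the hypothetical name `one_param_fit`) evaluates the minimum error and the interval of all optimal $\theta$ for problem (5); setting all $X_i = 1$ reproduces the midrange solution of problem (6).

```python
from fractions import Fraction
from itertools import combinations

def one_param_fit(X, Y):
    """Minimum error mu and solution interval [lo, hi] for the
    one-parameter problem min_theta max_i |X_i*theta - Y_i|."""
    X = [Fraction(v) for v in X]
    Y = [Fraction(v) for v in Y]
    idx = range(len(X))
    mu = max((abs(X[i]*Y[k] - X[k]*Y[i]) / (abs(X[i]) + abs(X[k]))
              for i, k in combinations(idx, 2)
              if abs(X[i]) + abs(X[k]) != 0),
             default=Fraction(0))
    lo = max(Y[i]/X[i] - mu/abs(X[i]) for i in idx if X[i] != 0)
    hi = min(Y[i]/X[i] + mu/abs(X[i]) for i in idx if X[i] != 0)
    return mu, lo, hi

print(one_param_fit([1, 2, 1], [1, 2, 3]))  # mu = 4/3, unique theta = 5/3
print(one_param_fit([1, 1, 1], [0, 1, 4]))  # midrange case: mu = 2, theta = 2
```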
Two-parameter linear regression problem
We now turn to two-parameter problems, which can be solved by twofold application of Lemma 1. To avoid cumbersome calculations, we concentrate on a special case in which, given variables $X_i, Y_i \in \mathbb{R}$ for all $i = 1, \ldots, M$, our aim is to find the parameters $\theta_1, \theta_2 \in \mathbb{R}$ to achieve

$$ \min_{\theta_1,\theta_2} \ \max_{1\le i\le M} |\theta_1 + X_i\theta_2 - Y_i|. \tag{7} $$

Proposition 3. The minimum error in problem (7) is equal to

$$ \mu = \max \frac{|(X_i - X_k)(X_p Y_r - X_r Y_p) - (X_p - X_r)(X_i Y_k - X_k Y_i)|}{|X_i - X_k|\,(|X_p| + |X_r|) + |X_p - X_r|\,(|X_i| + |X_k|)}, \tag{8} $$

where the maximum is taken over all indices $1 \le i < k \le M$ and $1 \le p < r \le M$ for which the denominator does not vanish, and all solutions of the problem are given by the conditions

$$ \max_{\substack{1\le i<k\le M\\ X_i\ne X_k}} \Bigg( \frac{X_i Y_k - X_k Y_i}{X_i - X_k} - \frac{\mu\,(|X_i| + |X_k|)}{|X_i - X_k|} \Bigg) \le \theta_1 \le \min_{\substack{1\le i<k\le M\\ X_i\ne X_k}} \Bigg( \frac{X_i Y_k - X_k Y_i}{X_i - X_k} + \frac{\mu\,(|X_i| + |X_k|)}{|X_i - X_k|} \Bigg), \tag{9} $$

$$ \max_{i:\ X_i\ne 0} \Bigg( \frac{Y_i - \theta_1}{X_i} - \frac{\mu}{|X_i|} \Bigg) \le \theta_2 \le \min_{i:\ X_i\ne 0} \Bigg( \frac{Y_i - \theta_1}{X_i} + \frac{\mu}{|X_i|} \Bigg). \tag{10} $$

Proof. We apply Lemma 1 with $N = 2$, where we take $X_{i1} = 1$ and $X_{i2} = X_i$ for all $i = 1, \ldots, M$, to reduce problem (7) to the one-parameter problem

$$ \min_{\theta_1} \ \max_{1\le i<k\le M} \frac{|(X_i - X_k)\theta_1 - (X_i Y_k - X_k Y_i)|}{|X_i| + |X_k|} $$

and a box constraint for $\theta_2$ in the form of the double inequality at (10), where $\mu$ is the minimum in the one-parameter problem.
Furthermore, we note that the objective function in the problem does not change if we replace the condition 1 ≤ i < k ≤ M for indices over which the maximum is taken, by the extended condition 1 ≤ i, k ≤ M .
In a similar way as in Lemma 1, we first represent the one-parameter problem under consideration as

$$ \min \ \lambda \quad \text{s.t.} \quad \frac{|(X_i - X_k)\theta_1 - (X_i Y_k - X_k Y_i)|}{|X_i| + |X_k|} \le \lambda, \qquad 1 \le i, k \le M, \quad |X_i| + |X_k| \ne 0. \tag{11} $$

The inequality constraint in this problem is equivalent to the system of inequalities

$$ -\lambda\,(|X_i| + |X_k|) \le (X_i - X_k)\theta_1 - (X_i Y_k - X_k Y_i) \le \lambda\,(|X_i| + |X_k|). $$

After solving the inequalities for $\theta_1$, we rewrite the system as

$$ \frac{X_i Y_k - X_k Y_i}{X_i - X_k} - \frac{\lambda\,(|X_i| + |X_k|)}{|X_i - X_k|} \le \theta_1 \le \frac{X_i Y_k - X_k Y_i}{X_i - X_k} + \frac{\lambda\,(|X_i| + |X_k|)}{|X_i - X_k|}, \qquad X_i \ne X_k, $$

$$ |X_i Y_k - X_k Y_i| \le \lambda\,(|X_i| + |X_k|), \qquad X_i = X_k. $$

By combining these inequalities, we obtain the double inequality

$$ \max_{\substack{1\le i,k\le M\\ X_i\ne X_k}} \Bigg( \frac{X_i Y_k - X_k Y_i}{X_i - X_k} - \frac{\lambda\,(|X_i| + |X_k|)}{|X_i - X_k|} \Bigg) \le \theta_1 \le \min_{\substack{1\le i,k\le M\\ X_i\ne X_k}} \Bigg( \frac{X_i Y_k - X_k Y_i}{X_i - X_k} + \frac{\lambda\,(|X_i| + |X_k|)}{|X_i - X_k|} \Bigg). $$

Since the condition $|X_i| + |X_k| = 0$ implies $X_i = X_k$, the requirement $X_i \ne X_k$ makes the condition $|X_i| + |X_k| \ne 0$ superfluous, and we omit the latter. Observing that the terms under the max and min operators are invariant under permutation of the indices $i$ and $k$, we adjust the condition on the indices to write the box constraint for $\theta_1$ as (9).
The feasibility condition for the box constraint on $\theta_1$ takes the form of the inequality

$$ \max_{\substack{1\le i,k\le M\\ X_i\ne X_k}} \Bigg( \frac{X_i Y_k - X_k Y_i}{X_i - X_k} - \frac{\lambda\,(|X_i| + |X_k|)}{|X_i - X_k|} \Bigg) \le \min_{\substack{1\le p,r\le M\\ X_p\ne X_r}} \Bigg( \frac{X_p Y_r - X_r Y_p}{X_p - X_r} + \frac{\lambda\,(|X_p| + |X_r|)}{|X_p - X_r|} \Bigg). $$

As before, we represent this inequality as a system of inequalities for each $i, k$ and $p, r$, and then solve these inequalities for $\lambda$. After combining the solutions back into one inequality and adding the last inequality at (11), we obtain

$$ \lambda \ge \max \frac{|(X_i - X_k)(X_p Y_r - X_r Y_p) - (X_p - X_r)(X_i Y_k - X_k Y_i)|}{|X_i - X_k|\,(|X_p| + |X_r|) + |X_p - X_r|\,(|X_i| + |X_k|)}, $$

where the maximum is taken over all $1 \le i, k \le M$ and $1 \le p, r \le M$ for which the denominator does not vanish, and the expression on the right-hand side determines the minimum $\mu$.
Finally, we note that the fractional term under maximization is invariant with respect to interchanging indices i and k as well as p and r. Therefore, we can replace the conditions 1 ≤ i, k ≤ M and 1 ≤ p, r ≤ M by the conditions 1 ≤ i < k ≤ M and 1 ≤ p < r ≤ M , which yields the representation for µ in the form of (8).
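The closed-form error of Proposition 3 is straightforward to evaluate directly. The following Python sketch implements the fourfold-indexed maximum as reconstructed in this section (pairs whose denominator vanishes are skipped); the helper name `two_param_min_error` is ours.

```python
from fractions import Fraction
from itertools import combinations

def two_param_min_error(X, Y):
    """Minimum Chebyshev error for fitting theta1 + theta2*X_i to Y_i,
    via the maximum over pairs of index pairs in formula (8)."""
    X = [Fraction(v) for v in X]
    Y = [Fraction(v) for v in Y]
    pairs = list(combinations(range(len(X)), 2))
    mu = Fraction(0)
    for i, k in pairs:
        for p, r in pairs:
            num = abs((X[i] - X[k]) * (X[p]*Y[r] - X[r]*Y[p])
                      - (X[p] - X[r]) * (X[i]*Y[k] - X[k]*Y[i]))
            den = (abs(X[i] - X[k]) * (abs(X[p]) + abs(X[r]))
                   + abs(X[p] - X[r]) * (abs(X[i]) + abs(X[k])))
            if den != 0:
                mu = max(mu, num / den)
    return mu

print(two_param_min_error([0, 1, 2], [0, 1, 0]))  # 1/2: best line is 1/2 + 0*x
print(two_param_min_error([0, 1, 2], [2, 5, 8]))  # 0: data lie on 2 + 3*x
```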

General solution procedure
We now use Lemma 1 to derive a complete solution of problem (1) by performing a series of reduction steps, each eliminating an unknown parameter in the problem and determining a box constraint for this parameter. We observe that the elimination of a parameter from the objective function as described in Lemma 1 leaves the general form of the problem unchanged. Therefore, we can repeat the elimination step by step until the function has no more parameters and thus becomes a constant that shows the minimum of the objective function in the initial problem.
At the same time, together with the elimination of a parameter from the objective function, Lemma 1 offers a box constraint for the parameter, represented in terms of those parameters which are retained in the function. We see that the last constraint does not depend on any other parameters and thus is directly given by a double inequality with constant values on both sides. As a result, we can take the box constraints in the order from the last constraint to the first, which yields a system of double inequalities that completely determines the solution set of the problem.
We are now in a position to describe the solution procedure formally in more detail. The procedure follows a direct solution method that examines the unknown parameters in reverse order, starting from parameter $N$ and going backward to parameter 1. Let $n$ be the number of the parameter in the objective function at the current step of the procedure.
Initially, we take $n = N$ and set $M_n = M$. For all $i = 1, \ldots, M_n$ and $j = 1, \ldots, n$, we define

$$ X^n_{ij} = X_{ij}, \qquad Y^n_i = Y_i. $$

We also introduce the matrix-vector notation

$$ X^n = (X^n_{ij}), \qquad Y^n = (Y^n_i), \qquad \theta^n = (\theta_1,\ldots,\theta_n)^T. $$

For each $n = N, N-1, \ldots, 1$, the procedure produces a twofold outcome: the reduction of the current problem by eliminating an unknown parameter and the derivation of a box constraint for the eliminated parameter.

Elimination of parameters
Assuming that the norm sign in what follows stands for the Chebyshev norm, we start with eliminating the parameter $\theta_n$ from the problem

$$ \min_{\theta^n} \ \|X^n\theta^n - Y^n\|. $$

We now rewrite the representation at (2) to establish formulas for recalculating the objective function when changing to the reduced problem.
First, we consider the condition for indices in (2), which takes the form $1 \le i < k \le M_n$. We assume that the pairs of indices $(i, k)$ defined by the condition are listed in the order of the sequence

$$ (1,2),\ (1,3),\ \ldots,\ (1,M_n),\ (2,3),\ \ldots,\ (2,M_n),\ \ldots,\ (M_n-1,M_n). $$

It is not difficult to verify by direct substitution that each fixed pair $(i, k)$ in this sequence has the number (index) calculated as

$$ \frac{(i-1)(2M_n - i)}{2} + (k - i). $$

Furthermore, we use (2) to define, for all $i$ and $k$ such that $1 \le i < k \le M_n$ and for all $j = 1, \ldots, n-1$, the recurrent formulas

$$ X^{n-1}_{(i,k),j} = \frac{X^n_{in} X^n_{kj} - X^n_{kn} X^n_{ij}}{|X^n_{in}| + |X^n_{kn}|}, \qquad Y^{n-1}_{(i,k)} = \frac{X^n_{in} Y^n_k - X^n_{kn} Y^n_i}{|X^n_{in}| + |X^n_{kn}|}, \tag{12} $$

where $(i, k)$ is the row index given by the above enumeration, and the right-hand sides are set to zero if the denominator vanishes. Note that if $|X^n_{in}| + |X^n_{kn}| = 0$, then the above formulas produce a row of zeros that corresponds to a zero term, which does not contribute to the objective function. We assume that all such zero rows are removed, and the rest of the rows are renumbered (reindexed) to preserve continual enumeration.
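The row enumeration just described can be captured by a small helper. The closed-form index below is our reconstruction, cross-checked in the snippet against a direct enumeration of the pairs (Python; `pair_index` is a hypothetical name).

```python
from itertools import combinations

def pair_index(i, k, M):
    """1-based position of the pair (i, k), 1 <= i < k <= M, in the order
    (1,2), ..., (1,M), (2,3), ..., (2,M), ..., (M-1,M)."""
    return (i - 1) * (2 * M - i) // 2 + (k - i)

# Cross-check against the explicit sequence for M = 6.
M = 6
for pos, (i, k) in enumerate(combinations(range(1, M + 1), 2), start=1):
    assert pair_index(i, k, M) == pos
```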
We denote the number of nonzero rows by $M_{n-1}$ and observe that

$$ M_{n-1} \le \frac{M_n(M_n - 1)}{2}. $$

Finally, we take the numbers $X^{n-1}_{ij}$ and $Y^{n-1}_i$ with $i = 1, \ldots, M_{n-1}$ and $j = 1, \ldots, n-1$ to form the matrix and vector

$$ X^{n-1} = (X^{n-1}_{ij}), \qquad Y^{n-1} = (Y^{n-1}_i), $$

which completely determine the objective function in the reduced problem. Specifically, the reduced problem for $n = 1$ degenerates into the constant

$$ \mu = \|Y^0\| = \max_{1\le i\le M_0} |Y^0_i|, \tag{13} $$

which represents the minimum of the objective function in the initial problem.
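Chaining the recurrent formulas gives the whole backward phase. The following is our compact Python rendering (exact rationals; it assumes, as the paper does for the initial matrix, that each eliminated column has a nonzero entry), not the authors' MATLAB code.

```python
from fractions import Fraction
from itertools import combinations

def min_chebyshev_error(X, Y):
    """Backward elimination: apply the reduction formulas (12) until no
    parameters remain; the minimum error is the largest absolute entry
    of the resulting vector Y^0, as in (13)."""
    X = [[Fraction(v) for v in row] for row in X]
    Y = [Fraction(v) for v in Y]
    while X and X[0]:
        n = len(X[0]) - 1
        Xr, Yr = [], []
        for i, k in combinations(range(len(X)), 2):
            d = abs(X[i][n]) + abs(X[k][n])
            if d == 0:                     # zero row: drop it
                continue
            Xr.append([(X[i][n]*X[k][j] - X[k][n]*X[i][j]) / d
                       for j in range(n)])
            Yr.append((X[i][n]*Y[k] - X[k][n]*Y[i]) / d)
        X, Y = Xr, Yr
    return max((abs(y) for y in Y), default=Fraction(0))

print(min_chebyshev_error([[1, 0], [1, 1], [1, 2]], [0, 1, 0]))  # 1/2
```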

Derivation of box constraints
We take the double inequality at (3) and adjust it to write the box constraint for the parameter $\theta_n$ in the form

$$ \max_{i:\ X^n_{in}\ne 0} \Bigg( \frac{Y^n_i - \sum_{j=1}^{n-1} X^n_{ij}\theta_j}{X^n_{in}} - \frac{\mu}{|X^n_{in}|} \Bigg) \le \theta_n \le \min_{i:\ X^n_{in}\ne 0} \Bigg( \frac{Y^n_i - \sum_{j=1}^{n-1} X^n_{ij}\theta_j}{X^n_{in}} + \frac{\mu}{|X^n_{in}|} \Bigg), $$

where $\mu$ denotes the minimum value of the objective function in problem (1).
To represent this inequality constraint in vector form, we introduce the following notation. First, for all $i = 1, \ldots, M_n$ such that $X^n_{in} \ne 0$ and all $j = 1, \ldots, n-1$, we define

$$ T^n_{ij} = \frac{X^n_{ij}}{X^n_{in}}, \qquad L^n_i = \frac{Y^n_i}{X^n_{in}} - \frac{\mu}{|X^n_{in}|}, \qquad U^n_i = \frac{Y^n_i}{X^n_{in}} + \frac{\mu}{|X^n_{in}|}. \tag{14} $$

We note that all indices $i$ with $X^n_{in} = 0$ are not taken into account when calculating the maximum and minimum in the double inequality, and hence are excluded from the index set $1, \ldots, M_n$. Assuming that the rest of the indices are renumbered to preserve continual enumeration, we introduce the matrix and column vectors $T^n = (T^n_{ij})$, $L^n = (L^n_i)$, $U^n = (U^n_i)$.
With this matrix-vector notation, we write the box constraint, if $n > 1$, as the double inequality

$$ \max(L^n - T^n\theta^{n-1}) \le \theta_n \le \min(U^n - T^n\theta^{n-1}), \tag{15} $$

and, if $n = 1$, as the inequality

$$ \max L^1 \le \theta_1 \le \min U^1, \tag{16} $$

where the max and min symbols are thought of as operators that calculate the maximum and minimum over all elements of the corresponding vectors.

Solution algorithm
We summarize the above consideration in the form of a computational algorithm to solve problem (1) in a finite number of steps. The algorithm includes two sequential phases: backward elimination and forward determination (substitution) of the unknown parameters.
The backward elimination starts with $n = N$ by setting $M_n = M$ and

$$ X^N = X, \qquad Y^N = Y. $$

Furthermore, for each $n = N, \ldots, 1$, the matrix $X^n$ and vector $Y^n$ are used as described above to obtain the values of $X^{n-1}$ and $Y^{n-1}$ if $n > 1$, or the value of $Y^0$ if $n = 1$. As supplementary results, the matrices $T^n$ are also evaluated from the matrices $X^n$.
The backward elimination completes at $n = 1$ by calculating the minimum value of the objective function, given by $\mu = \|Y^0\|$.
The forward determination first uses the obtained minimum µ, matrix X 1 and vector Y 1 to calculate the vectors L 1 and U 1 and then evaluate the box constraint for the unknown θ 1 in the form of double inequality at (16). Then, for each n = 2, . . . , N , the vectors L n and U n are calculated from µ, X n and Y n to represent the box constraints for θ n as in (15).
Note that the bounds in the box constraint for the parameter θ 1 are explicitly defined by constants, whereas the bounds for each parameter θ n with n > 1 are defined as linear functions of the previous parameters θ 1 , . . . , θ n−1 . Therefore, we can first fix a value to satisfy the box constraint for θ 1 and then substitute this value into the box constraint for θ 2 to obtain explicit bounds given by constants. By repeating such calculations to fix a value for a parameter with explicit bounds and to evaluate bounds for the next parameter, we can successively determine a solution of the problem.
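The two phases can be combined into a single routine. The sketch below is our Python rendering under the same assumptions as before (not the authors' MATLAB implementation); for definiteness it fixes each parameter at the midpoint of its box, although any point of the box would do.

```python
from fractions import Fraction
from itertools import combinations

def solve_chebyshev(X, Y):
    """Backward elimination stores the reduced data for every level;
    forward determination then fixes theta_1, theta_2, ... in turn,
    each inside its box constraint."""
    X = [[Fraction(v) for v in row] for row in X]
    Y = [Fraction(v) for v in Y]
    levels = [(X, Y)]
    while X[0]:                            # eliminate down to zero columns
        n = len(X[0]) - 1
        Xr, Yr = [], []
        for i, k in combinations(range(len(X)), 2):
            d = abs(X[i][n]) + abs(X[k][n])
            if d == 0:
                continue
            Xr.append([(X[i][n]*X[k][j] - X[k][n]*X[i][j]) / d
                       for j in range(n)])
            Yr.append((X[i][n]*Y[k] - X[k][n]*Y[i]) / d)
        X, Y = Xr, Yr
        levels.append((X, Y))
    mu = max(abs(y) for y in Y)            # minimum of the objective
    theta = []
    for Xn, Yn in reversed(levels[:-1]):   # one-column level comes first
        n = len(Xn[0]) - 1
        bounds = [((y - sum(a*t for a, t in zip(row, theta))) / row[n],
                   mu / abs(row[n]))
                  for row, y in zip(Xn, Yn) if row[n] != 0]
        lo = max(c - w for c, w in bounds)
        hi = min(c + w for c, w in bounds)
        theta.append((lo + hi) / 2)        # midpoint of the box
    return mu, theta

# Minimum error 1/2 attained at theta1 = 1/2, theta2 = 0.
print(solve_chebyshev([[1, 0], [1, 1], [1, 2]], [0, 1, 0]))
```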

Computational complexity
The most computationally intensive and memory demanding component of the algorithm, which determines the overall rate of computational complexity and memory requirement, is the calculation of entries in the matrices X n and vectors Y n for all n = N − 1, N − 2, . . . , 1, and vector Y 0 by using (12). Though calculating one entry involves a few simple operations, the number of all entries grows very fast as M and N increase.
To derive a rough estimate for the number of entries in all matrices, we first evaluate the number of rows in each matrix. Assuming that the matrix $X^N = X$ has $M$ rows, we see that the number of rows in $X^{N-1}$ is bounded from above by

$$ M_{N-1} \le \frac{M(M-1)}{2} < \frac{M^2}{2}. $$

Recursive application of this estimate yields an upper bound for the number of rows in the matrices $X^{N-l}$ for each $l = 1, \ldots, N-1$ in the form

$$ M_{N-l} \le \frac{M^{2^l}}{2^{2^l - 1}}. $$

At the last step with $l = N$, we calculate the vector $Y^0$, in which the number of entries is no more than

$$ M_0 \le \frac{M^{2^N}}{2^{2^N - 1}}. $$

Since we have $n + 1$ columns in the matrix $X^n$ together with the vector $Y^n$, the overall number of entries in all steps can be estimated as

$$ C(N, M) = \sum_{l=1}^{N} (N - l + 1)\, \frac{M^{2^l}}{2^{2^l - 1}}. $$

Note that the actual number of entries to calculate is less than that given by the bound $C(N, M)$. As follows from (12), this number can be further reduced if the matrices $X^n$, or at least the initial matrix $X^N = X$, have many zero entries (sparse matrices). It is clear that, if a matrix has zero entries in a column other than the last one, the columns (and related parameters) can be renumbered to put the column with zero entries in the last place, where the zero entries can reduce computations. As an example of a case with good chances of producing sparse matrices $X^n$, one can consider problems where the initial matrix $X$ has entries that take values from the set $\{-1, 0, 1\}$.
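A quick tabulation of the bound is easy to sketch. The closed form below is our reconstruction from the row-count recursion $M_{n-1} \le M_n^2/2$; it reproduces the values $C(3, 4) = 600$ and $C(3, 10) = 783900$ quoted with the numerical examples.

```python
from fractions import Fraction

def entry_bound(N, M):
    """Upper bound C(N, M) on the number of entries computed by the
    backward elimination: sum of (N - l + 1) * M**(2**l) / 2**(2**l - 1)
    over l = 1, ..., N (exact rational arithmetic keeps odd M exact)."""
    return sum((N - l + 1) * Fraction(M**(2**l), 2**(2**l - 1))
               for l in range(1, N + 1))

print(entry_bound(3, 4), entry_bound(3, 10))  # 600 783900
```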
Finally, we observe that the calculations by formulas (12) can be performed for different entries quite independently, which offers strong potential for parallel implementation of the procedure on parallel and vector computers to provide more computational and memory resources and hence to extend applicability to problems of higher dimensions.

Software implementation and numerical examples
We conclude with comments on software implementation of the solution procedure and illustrative numerical examples of low dimensions that demonstrate the computational technique involved in the solution. For the sake of illustration, we concentrate on application to problems with exact input data given by integer (rational) numbers to find explicit rational solutions.
To obtain exact solutions, the procedure has been coded for serial symbolic computations in the MATLAB (Release R2020b) environment as a collection of functions that calculate all intermediate matrices and vectors as well as provide the overall functionality of the algorithm. The numerical experiments were conducted on a custom computer with a 4-core 8-thread Intel Xeon E3-1231 v3 CPU at 3.40GHz and 32GB of DDR3 RAM, running Windows 10 Enterprise 64-bit OS.
Example 1. Let us take $N = 3$ and $M = 4$, and define the matrix $X$ and vector $Y$ of the initial data. We start with the backward elimination phase by setting $X^3 = X$ and $Y^3 = Y$. We use (12) to calculate the entries in $X^2$ and $Y^2$, and then the entries in $X^1$ and $Y^1$. Next, we obtain the vector $Y^0$, which appears to have 105 entries and thus is not shown here to save space. Evaluating the maximum absolute entry of $Y^0$ as the minimum of the objective function according to (13) yields $\mu = 2/7$.
In parallel with evaluating the entries of $X^n$ and $Y^n$, we apply (14) to find the entries in the matrices $T^n$. Finally, we calculate the vectors $L^n$ and $U^n$. The forward determination (substitution) phase of the procedure involves application of (16) and then (15) to obtain the unique solution $\theta_1 = 1/3$, $\theta_2 = 5/21$, $\theta_3 = 16/21$.
The actual number of entries in the matrices X n and vectors Y n to calculate in the problem is 153, which is less than the upper bound given by C(3, 4) = 600. The computer time to obtain the exact solution by symbolic computation averages 2.3 sec.
Example 2. We now increase the problem size to $N = 3$ and $M = 10$. The matrices $X^n$ and vectors $Y^n$ involved in the calculations have 444280 entries (while the corresponding upper bound is $C(3, 10) = 783900$). The symbolic computations take about 52 min of computer time.

Conclusion
A direct computational technique has been proposed to solve discrete linear Chebyshev approximation problems, which find wide application in various areas, including the least maximum absolute deviation regression in statistics. First, we have shown that the problem under consideration can be reduced, by eliminating an unknown parameter, to a problem with a smaller number of unknowns together with a box constraint for the eliminated parameter. This result was used to obtain direct solutions to linear regression problems with one and two parameters. To solve approximation problems of arbitrary dimension, we have developed a direct solution algorithm that consists of two parts: backward elimination and forward substitution of the unknown parameters. We have estimated the computational complexity of the algorithm, discussed its MATLAB implementation intended to provide exact solutions by symbolic computations, and presented numerical examples.
Possible lines of further investigation can concentrate on modification and improvement of the algorithm to reduce computational complexity in solving problems in both exact (rational) and inexact (floating point) form. The development of parallel implementations of the algorithm to speed up calculations is also of interest.