Transformation and Linearization Techniques in Optimization: A State-of-the-Art Survey

Abstract: To formulate a real-world optimization problem, it is sometimes necessary to adopt a set of non-linear terms in the mathematical formulation to capture specific operational characteristics of that decision problem. However, the use of non-linear terms generally increases computational complexity of the optimization model and the computational time required to solve it. This motivates the scientific community to develop efficient transformation and linearization approaches for the optimization models that have non-linear terms. Such transformations and linearizations are expected to decrease the computational complexity of the original non-linear optimization models and, ultimately, facilitate decision making. This study provides a detailed state-of-the-art review focusing on the existing transformation and linearization techniques that have been used for solving optimization models with non-linear terms within the objective functions and/or constraint sets. The existing transformation approaches are analyzed for a wide range of scenarios (multiplication of binary variables, multiplication of binary and continuous variables, multiplication of continuous variables, maximum/minimum operators, absolute value function, floor and ceiling functions, square root function, and multiple breakpoint function). Furthermore, a detailed review of piecewise approximating functions and log-linearization via Taylor series approximation is presented. Along with a review of the existing methods, this study proposes a new technique for linearizing the square root terms by means of transformation. The outcomes of this research are anticipated to reveal some important insights to researchers and practitioners, who are closely working with non-linear optimization models, and assist with effective decision making.


Introduction
Many optimization problems in management science and operations research have been formulated in the non-linear programming form [1][2][3][4][5]. Due to their non-convex nature, there is no efficient method to locate the optimal solution for this kind of problem [6]. Finding a global optimum for a non-linear programming model in acceptable computational time is known as one of the major challenges in optimization theory [7]. In comparison with non-linear programming problems, linear forms simplify the solution process and require much less computation time [8]. Therefore, linear programming (LP) forms of the optimization models are often recommended rather than solving integer or non-linear forms [9,10]. The operations research techniques that have generally been adopted to solve optimization problems with non-linear terms can be classified into two broad groups: (i) transformations, in which the non-linear equations or functions are replaced by an exact equivalent LP formulation to create valid inequalities; and (ii) linear approximations, which find the equivalent of a non-linear function with the least deviation around the point of interest or separate straight-line segments.
Transformation into the LP model generally requires particular manipulations and substitutions in the original non-linear model along with the implementation of valid inequalities. After solving the modified problem, the optimal values of the initial decision variables can be easily determined by reversing the transformation. Furthermore, approximation of complex non-linear functions with simpler ones is recognized as one of the common operations research techniques. In mathematics, a linear approximation of a function is an approximation (more precisely, an affine function) that relies on a set of linear segments for calculation purposes. Linear approximations are often adopted by finite difference methods, such as piecewise or first-order methods, to solve non-linear optimization models. A linear approximation to a known curve, as illustrated in Figure 1, can be obtained by dividing the curve and using linear interpolations between the points.
Piecewise approximations play a major role in many areas of engineering and mathematics [11][12][13]. By adding extra variables and constraints, the piecewise linear approximation forms an alternative linear function that fits the original problem with a non-linear function. The specific aim of this technique is to estimate a one-variable single-valued function by a sequence of linear divisions. A piecewise linear approximation of the function f(x), which has been set on the interval [a, b], approximates a close function g(x) that is represented by a set of linear segments over the same interval, where g(x) can be represented as g(x) = α + β·x on each linear segment of [a, b] [14]. The new linearity allows the previous non-linear optimization problem to be solved by common LP approaches, which are much easier to use and more efficient than their non-linear counterparts [15]. Some examples of piecewise linear approximations have been provided in Gajjar and Adil [16], Geißler et al. [17], Sridhar et al. [18], Andrade-Pineda et al. [19], and Stefanello et al. [20].
Taylor's theorem approximates the output of a function f(x) around a given point, such as x = a, by providing a k-times differentiable function and a polynomial of degree k, which is known as the kth-order Taylor polynomial. In other words, the first-order Taylor polynomial provides a linear approximation of the real-valued function f(x) based on the value and slope of the function at x = a, given that f(x) is differentiable on [a, b] (or [b, a]) and that b is close to a. This means that for a given twice continuously differentiable function of one real variable, the first-order Taylor polynomial can be represented as follows: f(x) = f(a) + f'(a)·(x − a) + h(x)·(x − a), with lim_{x→a} h(x) = 0, where h(x)(x − a) is the error term for the approximation. By removing the remainder term, the linear approximation of f(x) for x near the point a (L_a(x)) becomes y = f(a) + f'(a)·(x − a), whose graph is a tangent line to the graph y = f(x) at the point (a, f(a)). Figure 2 provides an illustration of the example graph of f(x) = e^x (blue) with its linear approximation L_a(x) = 1 + x (red) at a = 0. As x tends to be closer to a, the error declines to zero much faster than f'(a)·(x − a), which makes L_a(x) ≈ f(x) a useful approximation.
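The tangent-line construction above is easy to verify numerically. The following sketch (Python; the sample points are illustrative) builds L_a(x) = f(a) + f'(a)(x − a) for the text's example f(x) = e^x at a = 0 and shows the error shrinking as x approaches a:

```python
import math

def linear_approximation(f, df, a):
    """Return L_a(x) = f(a) + f'(a)*(x - a), the first-order Taylor polynomial of f at a."""
    return lambda x: f(a) + df(a) * (x - a)

# f(x) = e^x at a = 0 gives L_0(x) = 1 + x, as in Figure 2.
L0 = linear_approximation(math.exp, math.exp, 0.0)

for x in (0.5, 0.1, 0.01):
    error = abs(math.exp(x) - L0(x))
    print(f"x = {x:>5}: e^x = {math.exp(x):.6f}, L0(x) = {L0(x):.6f}, error = {error:.6f}")
```

The printed errors decrease roughly quadratically in (x − a), which is the behaviour the text describes.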
The reformulation techniques, including transformation and linear approximation procedures, are able to generate representations of increasing strength, albeit at the cost of increasing the size of the problems. Given the recent advances in LP, these techniques enhance the solvability of the problems within various exact or heuristic solution approaches by incorporating a tighter representation. A significant number of studies on the transformation process with LP representations have been conducted in the past, including Meyer [21], Jeroslow and Lowe [22,23], Balas [24], Johnson [25], Wolsey [26], Sherali and Adams [27,28], and Williams [29]. The performance of solution algorithms is directly associated with the tightness or strength of the adopted LP representations. Development of tight equivalents for separable or polynomial non-linear integer programming formulations can be achieved by generating valid inequalities [27,28,[30][31][32][33][34][35][36].
The present study specifically concentrates on common operational research techniques, which have been widely used in the state-of-the-art approaches to convert non-linear components of optimization models into their LP equivalents. Although there are several sources that provide detailed information regarding various LP techniques [29,37], there is a lack of a holistic and comprehensive survey of the state-of-the-art transformation and linearization approaches for converting non-linear optimization models into their linearized forms. Along with the review of the existing methods, this study proposes a new technique for linearizing the square root terms by means of transformation to obtain tight LP relaxations. Furthermore, a new approach is presented to incorporate quadratic integers into LP mathematical formulations. The aforementioned contributions are expected to enhance the solvability of optimization models with non-linear terms and, ultimately, facilitate decision making. The remaining sections of this manuscript are arranged in the following order. Section 2 focuses on the existing transformation approaches for different scenarios, including the multiplication of binary variables, multiplication of binary and continuous variables, multiplication of continuous variables, maximum/minimum operators, absolute value function, floor and ceiling functions, square root function, and multiple breakpoint function. Section 3 discusses linear approximations, including piecewise approximating functions and log-linearization via Taylor series approximation. Section 4 concludes this study, summarizes the key outcomes from the conducted work, and proposes some directions and opportunities for future research.

Transformations
The techniques presented in this section are exact transformations of the original non-linear programs. In particular, the non-linear problems resulting from the multiplication of binary variables, multiplication of binary and continuous variables, multiplication of continuous variables, maximum/minimum operators, absolute value function, floor and ceiling functions, square root function, and multiple breakpoint function and their equivalent LP versions will be discussed in detail.
The multiplication of two binary variables x_i·y_j can be linearized exactly by introducing a new variable z_ij := x_i·y_j together with the constraints z_ij ≤ x_i, z_ij ≤ y_j, z_ij ≥ x_i + y_j − 1, and z_ij ∈ {0, 1}. Table 1 examines the validity of these constraints and lists all possible scenarios by varying the values of the binary variables x_i and y_j.
When binary variables have a power (x_i^p), one can, without loss of generality, omit the power p (x_i := x_i^p, since x_i ∈ {0, 1}), and the term can consequently be linearized using the same technique. The extension to products of more than two variables is straightforward. In general, the multiplication of binary variables x_{i_k}^p (k ∈ {1, . . . , K}, i_k ∈ I_k = {1, . . . , m_k}) for K ≥ 2 with different powers p can be linearized by replacing it with a new variable z_j (z_j := ∏_{k=1}^{K} x_{i_k}^p), where j = {i_1, . . . , i_K}, and adding the following constraints:

z_j ≤ x_{i_k}, ∀k ∈ {1, . . . , K}; z_j ≥ ∑_{k=1}^{K} x_{i_k} − (K − 1); z_j ∈ {0, 1}, ∀j ∈ ∪_{k=1}^{K} I_k.

The Modelling Guide [37] provides a hint for bounded variables by which the product of two continuous variables can be transformed into a separable form. We assume that the term x_1·x_2 must be converted. First of all, we define two new continuous variables y_1 and y_2 as follows:

y_1 = (x_1 + x_2)/2, y_2 = (x_1 − x_2)/2.

The product x_1·x_2 can now be replaced with the below separable function:

x_1·x_2 = y_1² − y_2²,

which can be linearized by using the technique of the piecewise approximation as stated in Section 3.1 [29]. Note that one can eliminate the non-linear function at the cost of having to approximate the objective. If l_1 ≤ x_1 ≤ u_1 and l_2 ≤ x_2 ≤ u_2, then the lower and upper bounds on y_1 and y_2 are:

(l_1 + l_2)/2 ≤ y_1 ≤ (u_1 + u_2)/2, (l_1 − u_2)/2 ≤ y_2 ≤ (u_1 − l_2)/2.

It should be noted that the product x_1·x_2 can be substituted with a single variable z whenever (i) one of the variables is not referenced in any other term except in the products of the above form, and (ii) the lower bounds l_1 and l_2 are non-negative. Suppose x_1 is such a variable (not used in any other terms). Then, the non-linear term x_1·x_2 can be replaced with z just by adding the following constraint:

l_1·x_2 ≤ z ≤ u_1·x_2.

Once the resulting mathematical formulation is solved in terms of z and x_2, it is required to calculate x_1 = z/x_2 whenever x_2 > 0; x_1 is undetermined when x_2 = 0. The extra constraints on z ensure that l_1 ≤ x_1 ≤ u_1 when x_2 > 0.
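Both identities above can be checked by brute force. The sketch below (Python; the numeric bounds and values are illustrative) enumerates the standard constraint set z ≤ x, z ≤ y, z ≥ x + y − 1 for a product of two binaries, and verifies the separable substitution x1·x2 = y1² − y2²:

```python
from itertools import product

# Product of two binaries: z = x*y is enforced by z <= x, z <= y, z >= x + y - 1.
for x, y in product((0, 1), repeat=2):
    feasible = [z for z in (0, 1) if z <= x and z <= y and z >= x + y - 1]
    assert feasible == [x * y]  # exactly one feasible z, equal to the product

# Product of two continuous variables: with y1 = (x1 + x2)/2 and y2 = (x1 - x2)/2,
# the bilinear term becomes separable: x1*x2 = y1**2 - y2**2.
x1, x2 = 3.7, -1.2
y1, y2 = (x1 + x2) / 2, (x1 - x2) / 2
assert abs((y1 ** 2 - y2 ** 2) - x1 * x2) < 1e-12
print("binary-product constraints and separable identity verified")
```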

Maximum/Minimum Operators
The maximum and minimum operators are viewed as explicit non-linear terms. These terms can also be linearized to efficiently solve the optimization model in which they are directly used [42]. Assume there is a general non-linear structure in the form of max i∈I {x i }, where I = {1, . . . , n}. This structure can be further converted to an equivalent mathematical model after adding a new continuous variable z, a set of new binary variables y i , and introducing supplementary constraint sets (22) to (25).
z ≥ x_i, ∀i ∈ I, (22)
z ≤ x_i + m·(1 − y_i), ∀i ∈ I, (23)
∑_{i∈I} y_i = 1, (24)
y_i ∈ {0, 1}, ∀i ∈ I, (25)

where m is a sufficiently large number. Constraints (22) assure that z is greater than or equal to all x_i. Constraints (23) and (24) are applied to prevent z from becoming infinite. In particular, constraints (23) and (24) are used to assure that for exactly one i, z has to be less than or equal to x_i. Constraints (23) and (24) do not have to be applied when the objective function is of a minimization type. However, constraints (23) and (24) are required when the objective function is of a maximization type and/or the considered optimization model has multiple objective functions. Constraints (25) denote the integrality restrictions on the values of the binary variables y_i. For the non-linear term min_{i∈I} {x_i}, where I = {1, . . . , n}, constraints (22) and (23) have to be altered as follows:

z ≤ x_i, ∀i ∈ I, (26)
z ≥ x_i − m·(1 − y_i), ∀i ∈ I, (27)

where z := min_{i∈I} {x_i}, and constraints (24) and (25) stay unaltered. In this case, constraints (24) and (27) do not have to be applied when the objective function is of a maximization type.
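As a sanity check, the big-M system for the maximum operator can be enumerated for a small instance. The sketch below (Python; the value of m and the data are illustrative) shows that the only value of z compatible with some binary selection y is max_i x_i:

```python
from itertools import product

def feasible_z_values(x, m=1000.0):
    """Smallest feasible z for each binary selection y satisfying
    z >= x_i, z <= x_i + m*(1 - y_i), and sum(y) = 1."""
    n = len(x)
    values = set()
    for y in product((0, 1), repeat=n):
        if sum(y) != 1:                                    # selection constraint
            continue
        lo = max(x)                                        # from z >= x_i
        hi = min(x[i] + m * (1 - y[i]) for i in range(n))  # from the big-M bounds
        if lo <= hi:
            values.add(lo)
    return values

print(feasible_z_values([4.0, 9.0, 2.0]))  # only the maximum, 9.0, survives
```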

Absolute Value Function
One of the special cases in non-linear programming is optimization problems using the absolute value function, to which it is extremely hard to apply standard optimization methods. Operating on the absolute value expression is relatively difficult because it is sometimes not a continuously differentiable function. However, it is possible to avoid these difficulties and solve the problem using LP procedures by simply manipulating the absolute values [43][44][45][46][47]. If a linear function appears inside an absolute value function, then we can alternatively use a valid inequality instead of the absolute value. This means that z = |f(x)| can be effectively converted to two linear expressions if the function f(x) is linear itself. For the simplest example, z = |x|, the function can practically be reformulated by combining two piecewise functions: z = x if x ≥ 0 and z = −x if x < 0. The aforementioned procedure is the basis of running LP formulations that contain the absolute value functions.

Absolute Value in Constraints
In the case |f(x)| ≤ z, we are able to reformulate the expression as the combination of f(x) ≤ z and −f(x) ≤ z. This relation is demonstrated in Figure 3 using a number line. The figure depicts that the two formulations, the above two linear functions, as well as the absolute value function, are equivalent. The same logic can be used to reformulate similar constraints that contain absolute value terms.



Absolute Value in the Objective Function
In certain cases, we can also reformulate the absolute value used in the objective function to become linear. A model can be easily transformed to be solved using LP when its objective function is a maximization in the form of −| f (x)| + g(y) or a minimization in the form of | f (x)| + g(y). Unfortunately, when the objective function of the model is a maximization in the form of | f (x)| + g(y) or a minimization in the form of −| f (x)| + g(y), the model cannot be turned into a standard linear form, and instead must be solved using mixed-integer LP. In the first two cases, as mentioned in Mangasarian [44] and Caccetta et al. [47], the model can be reformulated by substituting a new variable z with | f (x)| within the original objective function, and adding two extra constraints f (x) ≤ z and − f (x) ≤ z.
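The epigraph substitution described above can be illustrated with a grid search. In the sketch below (Python; f is a hypothetical linear function), the smallest feasible z for a fixed x is max(f(x), −f(x)) = |f(x)|, so minimizing z reproduces the direct minimization of |f(x)|:

```python
# Hypothetical linear function inside the absolute value.
f = lambda x: 2 * x - 3

grid = [i / 100 for i in range(-500, 501)]

# Direct minimization of |f(x)| over the grid.
direct = min(abs(f(x)) for x in grid)

# Reformulated: min z subject to f(x) <= z and -f(x) <= z; for each x the
# smallest feasible z is max(f(x), -f(x)).
reformulated = min(max(f(x), -f(x)) for x in grid)

assert direct == reformulated
print("optimal value:", direct)  # 0.0, attained at x = 1.5
```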

Minimizing the Sum of Absolute Deviations
The transformation approach for minimizing the sum of absolute deviations presented herein was introduced by Ferguson and Sargent [49]. Let the deviations be represented by x_i = b_i − ∑_{j=1}^{n} a_ij·y_j (i ∈ {1, . . . , m}), where b_i is an observation, a_ij are known coefficients, y_j are the decision variables, and x_i gives the deviation. The objective of the mathematical formulation aims to minimize the total deviation and can be formulated in the following basic form:

min ∑_{i=1}^{m} |x_i| (28)

and the linear constraints are

x_i = b_i − ∑_{j=1}^{n} a_ij·y_j, ∀i ∈ {1, . . . , m}. (29)

The absolute value function generates the non-linearity in this form. However, the model can be rewritten as an LP model by replacing x_i. To linearize the problem, x_i is substituted by x_i^+ − x_i^− (where x_i^+ and x_i^− are positive variables), and the model can be reformulated as:

min ∑_{i=1}^{m} (x_i^+ + x_i^−)
s.t. x_i^+ − x_i^− = b_i − ∑_{j=1}^{n} a_ij·y_j, ∀i ∈ {1, . . . , m},
x_i^+, x_i^− ≥ 0, ∀i ∈ {1, . . . , m}.

At the optimal solution, it can be proven that x_i^+·x_i^− = 0. Therefore, the model is reformulated to a linear programming form, as x_i, y_j ∈ R, ∀i ∈ {1, . . . , m}, ∀j ∈ {1, . . . , n}.
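The positive-part split used above can be checked directly: at the optimum, x_i^+ = max(x_i, 0) and x_i^− = max(−x_i, 0), which recover both the deviation and its absolute value. A minimal sketch (Python):

```python
def split(v):
    """Decompose v = v_plus - v_minus with v_plus, v_minus >= 0 and
    v_plus * v_minus = 0, as holds at the optimum of the LP reformulation."""
    return max(v, 0.0), max(-v, 0.0)

for v in (3.2, -1.7, 0.0):
    vp, vm = split(v)
    assert vp >= 0.0 and vm >= 0.0 and vp * vm == 0.0
    assert vp - vm == v          # recovers the deviation x_i
    assert vp + vm == abs(v)     # recovers |x_i|, the objective contribution
print("positive-part decomposition verified")
```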

Minimizing the Maximum of Absolute Values
In some cases, such as the evaluation of the maximum forecast error by applying the Chebyshev criterion, the problem aims to minimize the largest absolute deviation rather than the sum. Such a formulation can be expressed using Equations (41) to (43):

min max_{i∈{1,...,m}} |x_i| (41)
s.t. x_i = b_i − ∑_{j=1}^{n} a_ij·y_j, ∀i ∈ {1, . . . , m}, (42)
x_i, y_j ∈ R, ∀i ∈ {1, . . . , m}, ∀j ∈ {1, . . . , n}. (43)
where variable x_i indicates the deviation for the ith observation and y_j indicates the jth variable in the forecasting equation. The constraints have been described in the previous subsection. To solve this model, variable x is introduced, which satisfies the following two inequalities:

x ≥ b_i − ∑_{j=1}^{n} a_ij·y_j, ∀i ∈ {1, . . . , m}, (44)
x ≥ −(b_i − ∑_{j=1}^{n} a_ij·y_j), ∀i ∈ {1, . . . , m}. (45)
The inequalities (44) and (45) guarantee that x is greater than or equal to the largest |x_i|. Therefore, as stated by McCarl and Spreen [50], the original formulation can be represented in the following form:

min x (46)
s.t. x ≥ b_i − ∑_{j=1}^{n} a_ij·y_j, ∀i ∈ {1, . . . , m}, (47)
x ≥ −(b_i − ∑_{j=1}^{n} a_ij·y_j), ∀i ∈ {1, . . . , m}, (48)
x ≥ 0. (49)
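The minimax reformulation can be sanity-checked in one line: the smallest x satisfying both families of inequalities equals the largest absolute deviation. A sketch with illustrative deviations:

```python
# Chebyshev (minimax) criterion: the smallest x with x >= d_i and x >= -d_i
# for every deviation d_i is exactly max_i |d_i|.
deviations = [2.5, -7.1, 0.3, 6.9]
smallest_feasible_x = max(max(d, -d) for d in deviations)
assert smallest_feasible_x == max(abs(d) for d in deviations)
print("minimax value:", smallest_feasible_x)  # 7.1
```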

Floor and Ceiling Functions
The floor function is a mathematical function that takes a certain real number x as an input and returns the greatest integer that is less than or equal to x as an output. In a similar fashion, the ceiling function is a mathematical function that rounds x to the least integer that is greater than or equal to x. Let ⌊ ⌋ denote the floor integer function and consider the non-linear term ⌊f(x)⌋. The value of function f(x) can be represented as f(x) = y + r, where y is the integral part of f(x) and 0 ≤ r < 1. Therefore, the floor function ⌊f(x)⌋ can be linearized by replacing it with the integer variable y (y := ⌊f(x)⌋) and adding the following constraints:

y ≤ f(x), (50)
y ≥ f(x) − 1 + ε, (51)
y ∈ Z, (52)

where ε is a small positive tolerance. Equation (52) is the integrality constraint. A similar approach can be used to linearize the ceiling function ⌈f(x)⌉ by replacing it with the integer variable y (y := ⌈f(x)⌉) and adding the following constraints to the problem:

y ≥ f(x), (53)
y ≤ f(x) + 1 − ε. (54)
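The bounds behind this linearization follow from f(x) = y + r with 0 ≤ r < 1, and can be verified numerically (a sketch; the sample values are arbitrary):

```python
import math

def check_floor_ceiling_bounds(value):
    """The integer substitutes satisfy y <= f(x) < y + 1 for the floor
    and z - 1 < f(x) <= z for the ceiling."""
    y = math.floor(value)
    assert y <= value < y + 1
    z = math.ceil(value)
    assert z - 1 < value <= z

for v in (3.99, -2.01, 5.0, 0.4999):
    check_floor_ceiling_bounds(v)
print("floor and ceiling bounds verified")
```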

Square Root Function
A square root of a certain number x is defined in mathematics using another number y such that y 2 = x; alternatively, a number y that has a squared value (i.e., the resulting value of multiplying a given number by itself or y · y) of x [51]. Each real-valued non-negative number x has a unique non-negative square root (that is also referred to as the principal square root), which can be represented by mathematical notation √ x. The linearization of a square root function, to the authors' knowledge, has not yet been studied in the literature (or been applied in practice). Several studies addressing such a function only used the Taylor-series approximation method by writing the equation of the line tangent to the function at a given constant. Kwon and Draper [52] and Del Moral and Niclas [53] are examples of Taylor-series expansion algorithms developed for the square root function.
This section of the manuscript proposes a new technique for linearizing the square root terms. The suggested linearization method is based on some numerical and theoretical techniques in the area of mixed-integer programming that were presented by Rahil [38].
Consider the square root √f(x), which is an explicit non-linear term, named here radical. To linearize the radical term, it can be replaced with a new positive integer variable A as below:

A = ⌈radical⌉, (55)

where ⌈ ⌉ is the ceiling bracket sign that rounds radical up to the nearest integer that is greater than or equal to the expression. In Equation (55), radical is a positive real number, and A is a positive integer number. Thus, A replaces the fractional part of radical (if there is any fraction) and approximates it with a strictly lower than one unit error. This amount of error can be negligible, especially when radical is a very large number. By using Theorem 1, the value of A can be converted into its binary equivalent.

Theorem 1. Assume that w is an integer variable. Then,

w = ∑_{i=1}^{n−1} 2^{i−1}·y_i + (u − 2^{n−1} + 1)·y_n,

where y_i are binary variables, u denotes the upper bound of w, and n is set to ⌈log2(u + 1)⌉.
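One bounded binary expansion that is consistent with the remark below (no integer above 45 can be constructed) is w = ∑_{i=1}^{n−1} 2^{i−1}·y_i + (u − 2^{n−1} + 1)·y_n; this specific form is an assumption made for illustration. The sketch below (Python) enumerates all binary assignments for u = 45 and confirms that exactly the integers 0 to u are representable:

```python
import math

u = 45                              # upper bound from the text's remark
n = math.ceil(math.log2(u + 1))     # n = 6 binary variables
beta = u - 2 ** (n - 1) + 1         # coefficient of the last binary variable (assumed form)

representable = set()
for mask in range(2 ** n):
    y = [(mask >> i) & 1 for i in range(n)]    # y[0..n-2] plus the beta-weighted y[n-1]
    w = sum(2 ** i * y[i] for i in range(n - 1)) + beta * y[n - 1]
    representable.add(w)

assert representable == set(range(u + 1))      # exactly 0..45, no gaps
assert max(representable) == u                 # the bound u is never violated
print(f"n = {n}, beta = {beta}: integers 0..{u} are representable")
```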
It should also be noted that integer numbers greater than 45 cannot be constructed by Equation (58). Therefore, the upper bound condition, u, is never violated. Proof of Theorem 1 is provided in Appendix A. The upper bound of A can be determined by calculating the maximum of the function, f_max = max_x f(x), as follows. As shown in Equation (55), A = ⌈√f(x)⌉; hence, √f(x) ≤ A. To calculate the upper bound of the function f(x), we raise both sides of this inequality to the power of 2: f(x) ≤ A². At this point, we can use Theorem 2 to precisely simulate the quadratic term A².

Theorem 2.
Assume w is an integer variable that has an upper bound of u. Then, w² can be further linearized by expanding the square of its binary representation and replacing each bilinear product y_i·y_j with a new binary variable z_ij (z_ij := y_i·y_j), where y_i and z_ij are binary variables, β = (u − 2^n + 1), and n is set to ⌈log2(u + 1)⌉.
The proof of Theorem 2 is provided in Appendix B. The linearization of the quadratic integers is presented in Appendix C. By utilizing Theorems 1 and 2, we obtain the linearized equivalent of a square root by substituting √f(x) with A and adding constraints (61)–(68). To show how Equations (61)–(68) can be applied in a specific case, let us estimate the square root √x with an upper bound of 103. n is calculated accordingly as ⌈log2(103 + 1)⌉ = 7. The square root function can be linearized by replacing it with the integer variable A and adding the corresponding constraints to the problem.
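The claim that A = ⌈radical⌉ carries a strictly sub-unit error can be confirmed directly (a sketch; the sample arguments are arbitrary):

```python
import math

# A = ceil(sqrt(x)) over-approximates the radical with an error strictly
# below one unit, which becomes negligible when the radical is large.
for x in (2, 103, 10_000, 123_456_789):
    A = math.ceil(math.sqrt(x))
    assert 0 <= A - math.sqrt(x) < 1
    print(f"x = {x}: sqrt = {math.sqrt(x):.4f}, A = {A}")
```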

Multiple Breakpoint Function
In this section, we describe two linearization techniques for multiple breakpoint functions based on the information presented in Tsai [54] and Mirzapour Al-e-hashem et al. [55]. Suppose there is a general continuous multiple breakpoint function that can be defined as follows:

f(x) = a_i·x + b_i, if c_{i−1} ≤ x ≤ c_i, ∀i ∈ {1, . . . , n}, x ∈ R. (76)
Based on the methodology suggested by Tsai [54], the equivalent valid inequality of Equation (76) can be further simplified to the following form:

f(x) = ∑_{i=1}^{n} t_i·(a_i·x + b_i), ∑_{i=1}^{n} t_i = 1, t_i ∈ {0, 1}, ∀i ∈ {1, . . . , n}.

As can be seen, this formulation contains an explicit non-linear term t_i·x. As proven by Tsai [54], an equivalent linear equation for z = t·g(x); t ∈ {0, 1} can be reformulated as:

−m·t ≤ z ≤ m·t, g(x) − m·(1 − t) ≤ z ≤ g(x) + m·(1 − t),

where m is a sufficiently large number. Let us consider a_i·x + b_i as g(x); then, Tsai's technique can be rewritten as follows:

−m·t_i ≤ z_i ≤ m·t_i, a_i·x + b_i − m·(1 − t_i) ≤ z_i ≤ a_i·x + b_i + m·(1 − t_i), f(x) = ∑_{i=1}^{n} z_i.

Mirzapour Al-e-hashem et al. [55] proposed another linearization technique for a multiple breakpoint function. The authors have shown that the multiple breakpoint function f(x) can be linearized by introducing some binary variables t_i and also converting variable x to n independent variables x_i, where x = ∑_i x_i. Thus, Equation (76) can be rewritten as follows:

f(x) = ∑_{i=1}^{n} (a_i·x_i + b_i·t_i).

As proven by Mirzapour Al-e-hashem et al. [55], the linear equivalent mathematical structure of f(x) can be written by introducing new constraints as follows:

c_{i−1}·t_i ≤ x_i ≤ c_i·t_i, ∀i ∈ {1, . . . , n}, ∑_{i=1}^{n} t_i = 1, x = ∑_{i=1}^{n} x_i, t_i ∈ {0, 1}, x_i ∈ R, ∀i ∈ {1, . . . , n}.
For a non-continuous multiple breakpoint function of the same type (x ∈ R), Mirzapour's technique can be used by modifying Equation (95) accordingly, where m is a sufficiently large number.
Comparing the two aforementioned methodologies, the same number of binary and continuous variables is used. However, the method proposed by Mirzapour Al-e-hashem et al. [55] adds five new classes of constraints to the initial model, whereas Tsai's technique imposes seven new classes of constraints on the original model.
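The core of Tsai's technique, the big-M system for z = t·g(x), can be verified by enumerating a discretized range of z (a sketch; the value of m, the grid, and g(x) = 7.5 are illustrative):

```python
def feasible_z(t, g_val, m=1000.0, step=0.5):
    """Values of z on a grid satisfying the big-M system for z = t*g(x):
    -m*t <= z <= m*t and g - m*(1 - t) <= z <= g + m*(1 - t)."""
    zs = [k * step for k in range(-40, 41)]
    return [z for z in zs
            if -m * t <= z <= m * t
            and g_val - m * (1 - t) <= z <= g_val + m * (1 - t)]

# When t = 0 the system forces z = 0; when t = 1 it collapses to z = g(x).
assert feasible_z(0, 7.5) == [0.0]
assert feasible_z(1, 7.5) == [7.5]
print("big-M linearization of z = t*g(x) verified")
```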

Approximate Linearization Methods
Approaches to non-linear programming typically use approximation techniques that may be either iterative or non-iterative (i.e., those that require just one iteration). This section discusses the main approximation techniques that can be implemented for the majority of non-linear problems. The considered approximation techniques can be divided into two broad groups: (i) piecewise linear approximation techniques; and (ii) log-linearization via Taylor series approximation. For more information regarding alternative approximation techniques, interested readers can refer to Dembo [56], McCarl and Tice [57], and McCarl and Onal [58].

Piecewise Linear Approximation
In many studies over the past decades, piecewise linear approximation (PLA) techniques have been used to convert non-linear optimization models into linear or mixed-integer convex problems in order to obtain approximate global optimal solutions. To reformulate the original non-linear optimization problem, new binary and continuous variables along with extra constraints are generally introduced in the transformation process. These additional variables and constraint sets typically make the converted problem more tractable than the original one. The next sections of the manuscript present an overview of common PLAs and analyze their computational efficiency.

Formulations
Consider a general non-linear function f(x) of a single variable x defined on the interval [a_0, a_n]. Such a continuous function can be approximated by piecewise linear expressions, as is commonly done in the non-linear programming literature [59][60][61][62]. For comparisons of PLA techniques, interested readers can refer to the computational results reported by Li et al. [59] and Lin et al. [63].

Method 1
First, divide f(x) into n separate segments and define a_i (i ∈ {0, 1, . . . , n}) as the breakpoints of f(x), with a_0 < a_1 < . . . < a_n. Then, the non-linear function f(x) can be approximately linearized over the interval [a_0, a_n] as f(x) ≈ ∑_{i=0}^{n} f(a_i)·y_i, with x = ∑_{i=0}^{n} a_i·y_i, ∑_{i=0}^{n} y_i = 1, y_0 ≤ t_0, y_i ≤ t_{i−1} + t_i (∀i ∈ {1, . . . , n − 1}), y_n ≤ t_{n−1}, ∑_{i=0}^{n−1} t_i = 1, y_i ≥ 0, and t_i ∈ {0, 1}, where only two adjacent y_i are allowed to be non-zero. Figure 4 illustrates the piecewise linearization of f(x). The above terms include n extra binary variables t_0, t_1, . . . , t_{n−1} and n + 1 new continuous variables y_0, y_1, . . . , y_n. The number of these newly added variables grows with the number of breakpoints (n + 1), which can lead to an exponential increase in computational time. The more linear segments there are, the more accurate the approximation will be, at the expense of increased computational complexity.
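As a numerical illustration of Method 1 (not the MILP itself), the sketch below evaluates the convex-combination approximation: x is written as a weighted sum of the two breakpoints bracketing it, and f is approximated by the same weighting of its breakpoint values. The function f(x) = x² and the breakpoints are arbitrary illustrative choices:

```python
import bisect

def pla_eval(f, breakpoints, x):
    """Evaluate the Method-1 piecewise linear approximation of f at x.

    Emulates the convex-combination formulation: only the two weights
    y_i, y_{i+1} bracketing x are non-zero, and they sum to one.
    """
    i = bisect.bisect_right(breakpoints, x) - 1
    i = min(max(i, 0), len(breakpoints) - 2)   # clamp to a valid segment index
    a_i, a_next = breakpoints[i], breakpoints[i + 1]
    y_next = (x - a_i) / (a_next - a_i)        # weight on breakpoint a_{i+1}
    y_i = 1.0 - y_next                         # weight on breakpoint a_i
    return y_i * f(a_i) + y_next * f(a_next)

# Approximating f(x) = x**2 over [0, 4] with 5 breakpoints:
bps = [0.0, 1.0, 2.0, 3.0, 4.0]
approx = pla_eval(lambda v: v * v, bps, 2.5)   # segment [2, 3]: 0.5*4 + 0.5*9
```

The approximation is exact at every breakpoint and linear in between; adding breakpoints shrinks the gap to f at the cost of more variables in the MILP.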

Method 2
Li and Yu [64] introduced a method for the global optimization of non-linear mathematical models in which the objective function along with the constraint sets may be non-convex. First, a univariate mathematical function is formulated as a piecewise linear function using a sum of absolute expressions. Denote s_i (i ∈ {0, 1, . . . , n − 1}) as the slope of the line between a_i and a_{i+1}, computed using Equation (114): s_i = (f(a_{i+1}) − f(a_i))/(a_{i+1} − a_i). An equivalent piecewise linear form of the non-linear function f(x) can then be reformulated accordingly. If the successive slopes of the PLA are non-decreasing (s_0 ≤ s_1 ≤ . . . ≤ s_{n−1}), f(x) is convex; f(x) is a concave (non-convex) function where these slopes are decreasing. After linearization of the absolute term, Li and Yu [64] included additional binary variables t_i to convert the non-convex model to a piecewise linear form, where u is the upper bound of x. Compared to Method 1, which uses binary variables for all segments, the binary variables in this approach are only used to linearize the non-convex intervals of f(x). Thus, the second method generally employs fewer binary variables than Method 1.
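One standard sum-of-absolute-values form consistent with the slopes s_i defined above writes the PLA as f(a_0) + s_0·(x − a_0) + ∑_{i≥1} ((s_i − s_{i−1})/2)·(|x − a_i| + x − a_i); whether this matches Li and Yu's exact expression is an assumption here. A short sketch:

```python
def slopes(f, bps):
    """Chord slopes s_i between consecutive breakpoints, as in Eq. (114)."""
    return [(f(bps[i + 1]) - f(bps[i])) / (bps[i + 1] - bps[i])
            for i in range(len(bps) - 1)]

def pla_abs(f, bps, x):
    """Sum-of-absolute-values form of the piecewise linear approximation.

    The term (|x - a_i| + x - a_i) vanishes for x <= a_i and equals
    2*(x - a_i) for x > a_i, so each slope change s_i - s_{i-1} switches
    on exactly when x passes breakpoint a_i.
    """
    s = slopes(f, bps)
    val = f(bps[0]) + s[0] * (x - bps[0])
    for i in range(1, len(s)):
        val += (s[i] - s[i - 1]) / 2.0 * (abs(x - bps[i]) + x - bps[i])
    return val

bps = [0.0, 1.0, 2.0, 3.0]
# f(x) = x**2 gives chord slopes 1, 3, 5: non-decreasing, so the PLA is convex
val = pla_abs(lambda v: v * v, bps, 2.5)
```

When all slope differences s_i − s_{i−1} are non-negative, the absolute terms are convex and no binary variables are needed; binaries enter only where a slope difference is negative.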

Method 3
Another representation of the piecewise approximating function has been used in the studies conducted by Li and Tsai [65], Topaloglu and Powell [66], Padberg [67], Li [68], and Croxton et al. [69]. The equivalent function is formulated with big-M constraints, where m is a sufficiently large constant. When n + 1 breakpoints are used, this approach requires n extra binary variables and 4n new constraints to model a piecewise linear function.

Method 4
More linear pieces in the piecewise linear approximation enhance the accuracy of the approximating function, but the execution time is negatively affected. To decrease the number of new binary variables added during the approximation process, Li et al. [70] proposed a piecewise linearization technique whose number of binary variables is logarithmic in n. Consider the same continuous function f(x) expressed above, where x is assumed to be within the interval [a_0, a_n] with n + 1 breakpoints. Let k be a positive integer represented in the binary form k = ∑_{i=0}^{h−1} 2^i·y_i, where the y_i are binary variables, n − 1 denotes the upper bound of k, and h is set to ⌈log₂(n + 1)⌉. Considering a set A(k) ⊆ {0, 1, . . . , h − 1} derived from the binary representation of k, Li et al. [70] proposed expressions for the approximation of a univariate non-linear function. Given n + 1 breakpoints, this method requires h binary variables, 2n non-negative variables, and h·(n + 1) free-signed continuous variables. Although the method proposed by Li et al. [70] uses fewer binary variables, Vielma et al. [71] showed that this approach is not a theoretically or computationally superior representation method for piecewise linear functions.
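The segment-index encoding at the heart of Method 4 can be sketched as follows; the interpretation of A(k) as the set of bit positions at which y_i = 1 is an assumption based on the binary representation of k:

```python
import math

def binary_encoding(k, h):
    """Binary variables y_0..y_{h-1} with k = sum(2**i * y[i]).

    A(k) is taken here to be the set of positions i at which y_i = 1
    (an assumption about the notation used in the survey).
    """
    y = [(k >> i) & 1 for i in range(h)]
    A = {i for i in range(h) if y[i] == 1}
    return y, A

n = 7                                # n + 1 = 8 breakpoints
h = math.ceil(math.log2(n + 1))      # h = 3 binary variables suffice
y, A = binary_encoding(5, h)         # segment index k = 5 -> y = [1, 0, 1]
```

The point of the encoding is that h = ⌈log₂(n + 1)⌉ binaries distinguish all segments, instead of one binary per segment as in Methods 1 and 3.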

Method 5
To approximate non-linear functions of a single variable, Vielma and Nemhauser [72] developed a new expression that needs fewer variables and constraints than previous representation methods. Their formulation requires a logarithmic number of binary variables and constraints to express a piecewise approximating function. The piecewise linear function of f(x) with n + 1 breakpoints, where a_0 < a_1 < . . . < a_n, is represented using an injective labeling B(i) of the segments by binary vectors. To form a piecewise linear function with n pieces, this method applies ⌈log₂ n⌉ binary variables, n + 1 continuous variables, and 2 + 2·⌈log₂ n⌉ constraint sets. Method 5 induces a piecewise linear function with an independent branching scheme of logarithmic depth and constructs a tighter convex estimator, requiring fewer breakpoints to meet the feasibility and optimality tolerances. Experimental results from the literature [71,73] show that this method provides a significant computational advantage in the linearization process and outperforms other methods, especially when more breakpoints are used.
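The logarithmic formulation hinges on labeling the segments with binary vectors; a reflected binary Gray code is the labeling typically used with this method, so that adjacent segments differ in exactly one binary variable (the compatible-set construction of [72] is omitted here). A minimal sketch:

```python
def gray(i):
    """Reflected binary Gray code of i: consecutive codes differ in one bit."""
    return i ^ (i >> 1)

# Label n = 8 segments; adjacent labels differ in exactly one binary variable,
# which is what keeps the number of linking constraints logarithmic.
labels = [gray(i) for i in range(8)]
assert all(bin(labels[i] ^ labels[i + 1]).count("1") == 1 for i in range(7))
```

Because the labeling is injective, ⌈log₂ n⌉ binaries identify the active segment, and the single-bit adjacency means each binary variable needs only one pair of constraints to deactivate the continuous weights outside its half of the segments.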

PLA-Based Algorithms
Establishing the PLA generates its own optimization problem. To fit a curve exactly, the most accurate PLA would require an infinite number of segments, and the complexity associated with this procedure becomes similar to that of the initial non-linear model. In solving the new decision problem, the main goal is to determine the best approximation with the fewest linear pieces. There are a number of commercial solvers, such as MINOS and CONOPT used within the General Algebraic Modeling System (GAMS), that can solve these problems [74]. There are also several optimal search algorithms using PLAs to solve complicated models with both concave and convex objective functions [75][76][77][78]. The most prominent algorithms developed based on PLA techniques include the following:

• Approximating Planar Curves
PLAs are not restricted to two-dimensional cases but can be used to fit multidimensional planes and curves. Williams [79] presented the first algorithm that efficiently fits planar curves using the required number of straight-line segments. The proposed methodology for fitting straight lines to a planar curve is based on a geometric analysis of the curvature of the curve, followed by geometrically accurate and numerically stable calculations.

• Single Pass PLA Algorithm
Gritzali and Papakonstantinou [80] devised an algorithm for finding the different parts of a formulated waveform function and identifying its peak points. The points marked as peaks are those where the derivative of the function equals zero. This algorithm starts with a maximum point plotted on the piecewise curve and extends the PLA along the function so that all points on the waveform keep the same deviation from the piecewise approximation. Such an algorithm can be beneficial in real-time applications, such as an electrocardiogram readout, where peaks are important.

• Branch and Refine
The branch-and-refine algorithm is based on the well-known PLA techniques and is an effective way to find a global optimum for a non-linear problem. The algorithm applies PLAs to determine global lower bounds for mixed-integer non-linear optimization models, while the upper bounds of the problem are provided by feasible solutions. As the number of iterations increases, the number of segments increases, and the algorithm moves towards the global solution. For more information regarding the branch-and-refine algorithm, interested readers can refer to Leyffer et al. [81] and Gong and You [82].
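The refinement step at the heart of such algorithms can be sketched as a simple loop. This is a generic illustration, not the exact scheme of Leyffer et al. [81] or Gong and You [82]: starting from a coarse set of breakpoints, a new breakpoint is inserted where the chord deviates most from the function, until a tolerance is met.

```python
def refine_breakpoints(f, bps, tol=1e-3, max_iter=50):
    """Insert midpoints where the chord deviates most from f until tol is met."""
    bps = sorted(bps)
    for _ in range(max_iter):
        worst_gap, worst_mid = 0.0, None
        for a, b in zip(bps, bps[1:]):
            mid = 0.5 * (a + b)
            chord = 0.5 * (f(a) + f(b))     # PLA value at the segment midpoint
            gap = abs(chord - f(mid))       # deviation of the chord from f
            if gap > worst_gap:
                worst_gap, worst_mid = gap, mid
        if worst_gap <= tol:
            break                           # every segment is within tolerance
        bps.append(worst_mid)               # refine only the worst segment
        bps.sort()
    return bps

# Refining an initial two-segment PLA of f(x) = x**2 on [0, 4]:
pts = refine_breakpoints(lambda v: v * v, [0.0, 2.0, 4.0], tol=0.05)
```

Refining only the worst segment concentrates breakpoints where the curvature is largest, which is the same economy of pieces that branch-and-refine exploits when tightening its lower bounds.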

PLAs for Accuracy
The algorithm designed by Nishikawa [83] estimates the global and local asymptotic L2 errors of a piecewise continuous linear approximation in such a manner that a target error on the curve can be achieved. This is accomplished via a local error analysis, followed by using the key terms for the approximation. Numerical tests are then performed to verify that the target L2 error is attained.

Log-Linearization via Taylor Series Approximation
There exist many types of non-linear problems (e.g., dynamic stochastic general equilibrium models) for which no closed-form solutions exist, and solving them is challenging. The log-linearization approach can be applied in such cases. When using log-linearization, the non-linear equations characterizing the equilibrium are approximated with log-linear ones. The strategy is to first take the natural logs of the non-linear equations and then use the first-order Taylor approximation around the steady state to replace the logged difference equations with equations that are linear in the log-deviations of the variables. There are different ways to perform log-linearization [84][85][86]. One of the main approaches is to apply the Taylor series expansion, as suggested by Griffith and Stewart [87]. Taylor's theorem indicates that the first-order approximation of a function f(x), differentiable at a point x*, centered at x = x* can be represented as f(x) ≈ f(x*) + f′(x*)·(x − x*). For example, the function f(x) = ln(1 + x) can be approximated at x = 2 by the first-order Taylor polynomial ln(1 + x) ≈ ln(3) + (x − 2)/3. First-order Taylor approximations can also be used to convert equations with more than one endogenous variable to a log-deviation form. The first-order Taylor polynomial of the function f(x, y) at the steady-state values x = x* and y = y* gives f(x, y) ≈ f(x*, y*) + f_x(x*, y*)·(x − x*) + f_y(x*, y*)·(y − y*). This methodology can be used to log-linearize equations by taking the log-deviation around the steady-state value.
The log-deviation of the variable x from its steady state x* is defined as x̂ = ln(x/x*). The right-hand side of Equation (160) can be rewritten as ln(1 + (x − x*)/x*). Using the first-order Taylor polynomial mentioned in Equation (157), this log expression can be approximated at the steady state x = x* as (x − x*)/x*. Thus, we get the log-deviation of x about x* as x̂ ≈ (x − x*)/x*, i.e., the percentage deviation from the steady state. Consider the example of the Cobb-Douglas production function y_t = a_t·k_t^α·n_t^(1−α) and take the log of the function: ln y_t = ln a_t + α·ln k_t + (1 − α)·ln n_t.
Since ln y* = ln a* + α·ln k* + (1 − α)·ln n*, we can subtract the steady-state equation from the logged one, which results in ln y_t − ln y* = (ln a_t − ln a*) + α·(ln k_t − ln k*) + (1 − α)·(ln n_t − ln n*). For notational ease, the variables are expressed as percentage deviations about the steady state. Thus, applying the log-deviation stated in Equation (163) to this expression leads to ŷ_t = â_t + α·k̂_t + (1 − α)·n̂_t. The method of taking logs and then subtracting the steady-state equation is very convenient. However, it does not always work: it is only useful for multiplicative equations, where taking the log removes exponents and converts multiplication into addition, significantly simplifying the equation. Moreover, this method should not be used for equations that involve expectation terms, even when the equation is multiplicative [85], because taking the expectation of a logarithmic term is not the same as taking the log of the expectation term.
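These approximations are straightforward to check numerically. The sketch below verifies the first-order expansion of ln(1 + x) around x* = 2 and the Cobb-Douglas log-deviation identity ŷ_t = â_t + α·k̂_t + (1 − α)·n̂_t, which holds exactly here because taking logs of the production function involves no approximation; all numeric values are arbitrary illustrative choices:

```python
import math

# First-order Taylor approximation of f(x) = ln(1 + x) around x* = 2:
#   f(x) ~ ln(3) + (x - 2) / 3
def taylor_ln1p(x, x_star=2.0):
    return math.log(1.0 + x_star) + (x - x_star) / (1.0 + x_star)

# Log-deviation of a variable from its steady state: x_hat = ln(x / x*)
def log_dev(x, x_star):
    return math.log(x / x_star)

# Cobb-Douglas y = a * k**alpha * n**(1 - alpha): taking logs turns the
# products and exponents into sums, so the log-deviation identity is exact.
alpha = 0.3
a, k, n = 1.05, 10.4, 0.97          # current values (illustrative)
a_s, k_s, n_s = 1.0, 10.0, 1.0      # steady-state values (illustrative)
y = a * k ** alpha * n ** (1 - alpha)
y_s = a_s * k_s ** alpha * n_s ** (1 - alpha)
lhs = log_dev(y, y_s)
rhs = log_dev(a, a_s) + alpha * log_dev(k, k_s) + (1 - alpha) * log_dev(n, n_s)
```

Near the steady state, each log-deviation is also close to the percentage deviation, e.g. log_dev(1.05, 1.0) ≈ 0.0488 versus the 5% raw deviation.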

Conclusions
In this study, a comprehensive and holistic review of transformation and linearization techniques was provided to deal with non-linear terms in optimization models. The following groups of operations research techniques for solving optimization problems with non-linear terms were analyzed: (i) transformations in which the non-linear equations or functions are replaced by an exact equivalent linear programming (LP) formulation to create valid inequalities; and (ii) linear approximations which find the equivalent of a nonlinear function with the least deviation around the point of interest or separate straight-line segments. The existing transformation approaches for different scenarios were considered, including the multiplication of binary variables, multiplication of binary and continuous variables, multiplication of continuous variables, maximum/minimum operators, absolute value function, floor and ceiling functions, square root function, and multiple breakpoint function. As for linear approximations, the present survey provided a detailed review of piecewise approximating functions and log-linearization via Taylor series approximation.
The main advantages and disadvantages of using common transformation and linearization techniques were investigated as a part of this survey as well. Along with a review of the existing methods, this study proposed a new technique for linearizing the square root terms by means of transformation to obtain tight LP relaxations. Furthermore, a new approach was presented to incorporate quadratic integers into LP mathematical formulations. The aforementioned contributions are expected to enhance the solvability of optimization models with non-linear terms and, ultimately, facilitate decision making. Furthermore, the information presented in this survey can be used by scientists and practitioners, who often work with optimization models that have non-linear terms, and can assist with the selection of the appropriate transformation and linearization techniques. In conclusion, a detailed review of the relevant literature confirms that transformation methods were found to be efficient, as they ensure model feasibility and reduce computation time.
The scope of future research for this study can focus on additional techniques that decrease computational time even further after transformation of the original optimization model into its linear form (e.g., Lagrangian relaxation, Benders decomposition). Furthermore, a number of iterative optimization algorithms that directly rely on linearization techniques have been applied in the literature to different decision problems [88][89][90]. Another interesting research direction would be the investigation of changes in computational complexity due to the deployment of linearization techniques in such algorithms. Moreover, future research can compare the computational performance of various PLA techniques in more detail for different scenarios. Last but not least, future research can concentrate on the domains where transformation and linearization techniques have been applied the most, to discover additional tendencies and potential implementation challenges.

Data Availability Statement:
This is a survey study and all the data are available within the main body of the manuscript.

Conflicts of Interest:
The authors declare no conflict of interest.