Design of Optimal Controllers for Unknown Dynamic Systems through the Nelder–Mead Simplex Method

This paper presents an efficient method for designing optimal controllers. First, we established a performance index according to the system characteristics. To ensure that this performance index is applicable even when the state/output of the system is not within the allowable range, we added a penalty function. When a given controller is used, if the state/output of the system remains within the allowable range throughout the preset time interval, the penalty function value is zero. Conversely, if the system state/output leaves the allowable range before the preset termination time, the experiment/simulation is terminated immediately, and the penalty function value is proportional to the time difference between the preset termination time and the time at which the experiment was terminated. Then, we used the Nelder–Mead simplex method to search for the optimal controller parameters. The proposed method has the following advantages: (1) the dynamic equation of the system need not be known; (2) the method can be used regardless of the stability of the open-loop system; (3) the method can be used in nonlinear systems; (4) the method can be used in systems with measurement noise; and (5) the method can improve design efficiency.


Introduction
In the real world, all dynamic systems are nonlinear; linear systems exist only in theory. Many methods of analysis and control have been proposed for nonlinear systems [1,2]. Among these, Lyapunov's method can not only be used to analyze the stability of nonlinear systems; it is also often used to design feedback controllers. At present, many nonlinear control techniques based on Lyapunov theory have been proposed and applied to actual physical systems, such as Lyapunov redesign, backstepping, adaptive control, sliding mode control, etc. The recent developments and applications of the above methods are as follows.
Tavasoli and Enjilela utilized Lyapunov redesign to stabilize the vibration of a boundary-controlled flexible rectangular plate in the presence of exogenous disturbances [3]. Xu et al. applied output-feedback Lyapunov redesign to control a magnetic suspension system [4]. A backstepping controller was applied to control a quadrotor unmanned aerial vehicle [5], a microgyroscope [6], and multiphase motor drives [7]. Adaptive control is a control method that can adapt to parameter changes or initially uncertain controlled systems. Recently, Liu et al. developed a novel adaptive fault-tolerant control strategy to suppress the vibrations of a flexible panel [8]. Liang et al. proposed neural-network-based event-triggered adaptive control for nonaffine nonlinear multiagent systems with dynamic uncertainties [9]. Wang and Na presented parameter estimation and adaptive control for servo mechanisms with friction compensation [10]. Since the publication of the survey paper on sliding mode control in IEEE Transactions on Automatic Control in 1977 [11], sliding mode control has been extensively studied and used in practical applications due to its simplicity and robustness with respect to external disturbances and modeling uncertainties [12][13][14]. In addition, gain scheduling is used to design corresponding linear controllers for nonlinear systems at different operating points or regions [15,16]. Feedback linearization is based on the theory of differential geometry to find an appropriate transformation between the control input and state variables and to convert the nonlinear system into an equivalent linear system. This method has been used for induction motors [17], boost converters [18], an unmanned bicycle robot [19], etc.
However, these methods usually require information on the approximate/nominal dynamic equation of the system and on the upper bound of the uncertainty or disturbance. Moreover, real controllers are always limited in magnitude. Therefore, in the actual design of a controller, we usually adopt a trial-and-error method. However, during adjustment of the parameters, the state or output of the system may not remain within the allowable range. When this occurs, the experiment must be immediately terminated and relevant safety or protective measures must be initiated to prevent injury to the operator or damage to equipment. Therefore, designing a controller with good performance for real applications is difficult and time-consuming.
To overcome the aforementioned difficulties and complexity associated with designing actual controllers, we propose an efficient and optimized controller design method for nonlinear unstable systems. First, we establish a performance index (objective function) according to the characteristics of the system to be controlled. This performance index may include a time function and any measurable system state or output signal; the dynamic equation of the system need not be known in advance. Moreover, the performance index itself or the dynamic equation of the system can also contain non-differentiable terms. Most importantly, to ensure that the performance index is applicable even when the state or output of the system is not within the allowable range, we add a special penalty function. The penalty function is used to solve constrained optimization problems. It is used to convert constrained problems into unconstrained problems by introducing an artificial penalty for violating the constraint.
The implication of the penalty function proposed in this paper is as follows. We assume that for the same system, the time interval between the beginning and termination of each experiment or simulation is fixed. When we use a certain set of controller parameters to control the system, if the state or output of the system is within the allowable range within the preset termination time, the penalty function value is zero. Conversely, if the system state or output is not within the allowable range before the preset termination time, the experiment or simulation is terminated immediately, and the penalty function value is proportional to the time difference between the preset termination time and the time at which the experiment was terminated.
To keep the performance index as low as possible, we used the Nelder-Mead (N-M) simplex method (also known as the downhill simplex method) to search for the controller parameters. The original concept of the N-M simplex method was proposed by Spendley, Hext, and Himsworth [20], and it was further improved by Nelder and Mead [21]. The convergence properties of the N-M simplex method are discussed in references [22][23][24][25]. Because the N-M simplex method is easy to implement, it has been widely used to search for the minimum or maximum value of an objective function in a multidimensional parameter space. In particular, the N-M simplex method does not require the derivative of the objective function; therefore, it remains applicable even if the real system involves nondifferentiability or the objective function value contains noise. The N-M simplex method continues to be applied in different fields, such as parameter estimation [26,27], optimization of machining parameters [28], power plant optimization [29], and optimization of the production parameters for bread rolls [30].
To verify the feasibility of the proposed method, we adopted an inverted pendulum system with measurement noise as an example. Then, we employed the N-M simplex method to search for the controller parameters iteratively. The simulation results revealed that even if the initial controller parameters cannot stabilize the system, after the algorithm reaches the iterative termination condition we set in advance, the system is stable and exhibits good transient response performance.

Design of Optimal Controllers for Unknown Parameter Systems
Optimal control is a branch of optimization that deals with finding a controller for a dynamical system over a period of time such that an objective function is optimized [31,32]. An objective function is usually called a performance index in the field of control. The purpose of optimization is to obtain a parameter vector such that the objective function is at a minimum. However, in many cases, the choice of parameters is not arbitrary but subject to certain restrictive conditions. We term this the constrained optimization problem. A general constrained minimization problem may be written as follows:

minimize J(p) subject to g_i(p) ≤ 0, i = 1, …, m,

where p is the parameter vector and g_1, …, g_m describe the constraints. For a constrained optimization problem, we usually convert the constraints into a suitable penalty function and add this function to the original objective function. Thus, we transform a constrained optimization problem into an unconstrained problem; moreover, as the penalty weight grows, the solution of the unconstrained problem converges to the solution of the original constrained problem.
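As a minimal illustration of this conversion (with a hypothetical objective and constraint chosen for clarity, not taken from the paper), consider minimizing J(p) = p² subject to p ≥ 1. A quadratic penalty turns this into an unconstrained problem whose minimizer approaches the constrained solution p = 1 as the penalty weight grows:

```python
# Hypothetical constrained problem: minimize J(p) = p^2 subject to p >= 1.
def J(p):
    return p * p

def penalized(p, mu=1e6):
    # Penalty is zero when the constraint p >= 1 holds and grows
    # quadratically with the violation otherwise.
    violation = max(0.0, 1.0 - p)
    return J(p) + mu * violation ** 2

# Simple golden-section search over [0, 2] to minimize the penalized
# (unimodal) objective without using derivatives.
def golden_min(f, lo, hi, tol=1e-8):
    phi = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    c, d = b - phi * (b - a), a + phi * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            b, d = d, c
            c = b - phi * (b - a)
        else:
            a, c = c, d
            d = a + phi * (b - a)
    return (a + b) / 2

# The unconstrained minimizer approaches the constrained solution p = 1
# as the penalty weight mu grows.
p_star = golden_min(penalized, 0.0, 2.0)
```

With μ = 10⁶, the minimizer of the penalized objective is μ/(1 + μ) ≈ 0.999999, i.e., effectively the constrained solution.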
In the field of optimization control, the commonly used performance indices are as follows [31][32][33]:

1. Integral squared error (ISE), J = ∫₀^{T_f} e²(t) dt: the smaller the value of this index, the closer the error of the control system in the time interval [0, T_f] is to zero.

2. Integral absolute error (IAE), J = ∫₀^{T_f} |e(t)| dt: the meaning of this index is similar to that of the ISE.

3. Integral time-weighted absolute error (ITAE), J = ∫₀^{T_f} t|e(t)| dt: the smaller the value of this index, the closer the error of the control system in the time interval [0, T_f] is to zero and the faster the convergence.

4. Integral time-squared error (ITSE), J = ∫₀^{T_f} t e²(t) dt: the meaning of this index is similar to that of the ITAE.

5. Quadratic performance index, J = xᵀ(T_f) F x(T_f) + ∫₀^{T_f} (xᵀQx + uᵀRu) dt, where T_f, e, x = [x_1 x_2 ··· x_n]ᵀ, and u = [u_1 u_2 ··· u_m]ᵀ denote the terminal time, output error, system state, and control input, respectively. Additionally, F and Q are positive semidefinite matrices, and R is a positive definite matrix. When the control objective is to keep the state small, the control input not too large, and the final state as close to zero as possible, we can use this performance index.
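The four error-based indices above are straightforward to approximate from sampled data. The following sketch (an illustrative discretization, not the paper's code) computes all four from an error signal e(k) sampled with step dt:

```python
# Discrete-time approximations of the four error-based performance indices,
# computed from a sampled error signal e(k) over [0, T_f] with step dt.
def performance_indices(e, dt):
    ise = sum(ek ** 2 for ek in e) * dt                          # ~ integral of e^2
    iae = sum(abs(ek) for ek in e) * dt                          # ~ integral of |e|
    itae = sum(k * dt * abs(ek) for k, ek in enumerate(e)) * dt  # ~ integral of t|e|
    itse = sum(k * dt * ek ** 2 for k, ek in enumerate(e)) * dt  # ~ integral of t e^2
    return {"ISE": ise, "IAE": iae, "ITAE": itae, "ITSE": itse}

# Example: an error that decays geometrically toward zero.
dt = 0.01
e = [0.9 ** k for k in range(200)]
idx = performance_indices(e, dt)
```

Because the time weight t multiplies late-time errors, ITAE and ITSE penalize slow convergence more heavily than IAE and ISE do.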
The method proposed in this study does not require information on the dynamic equation of the system in advance, and it can use any of the above performance indices or other suitable performance indices. To ensure that the selected performance index is applicable even when the state or output of the system is not within the allowable range, we add a penalty function to the performance index. Before formally defining the penalty function, we first assume that, for the same system, the start time of each experiment (or simulation) is zero, and the terminal time is fixed and represented by T_f. If the state or output of the system remains within the allowable variation range until the terminal time is reached, the penalty function weight W_r is equal to zero. Conversely, if the state or output leaves the allowable variation range before reaching the terminal time, the experiment (or simulation) is terminated immediately; we denote this instant as T_r, and we then let the penalty function weight W_r be a sufficiently large positive constant. Then, we define the penalty function as

P = W_r (T_f − T_r),

where T_r = T_f (and W_r = 0) when the experiment runs to completion, so that P vanishes in that case. To design controller parameters such that the performance index reaches the minimum value, we use the N-M simplex method to search for the controller parameters.
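The penalty scheme can be sketched on a hypothetical unstable first-order plant dy/dt = y + u (chosen for illustration, not taken from the paper). The run lasts until T_f unless the output leaves the allowable range, in which case it stops at T_r and the penalty W_r(T_f − T_r) is added to the index:

```python
# Minimal sketch of the penalty scheme, using a hypothetical unstable
# first-order plant dy/dt = y + u under state feedback u = -k*y.
def evaluate(k_gain, Tf=2.0, dt=0.001, y_max=5.0, Wr=1e6):
    y, Jval, t = 1.0, 0.0, 0.0
    while t < Tf:
        if abs(y) > y_max:
            # Output left the allowable range: terminate at T_r = t and
            # penalize the remaining (unrun) portion of the experiment.
            return Jval + Wr * (Tf - t)
        u = -k_gain * y          # state feedback control
        Jval += y * y * dt       # ISE-type running cost
        y += (y + u) * dt        # Euler step of dy/dt = y + u
        t += dt
    return Jval                  # in range for the whole run: W_r = 0

J_stable = evaluate(k_gain=3.0)     # closed loop dy/dt = -2y stays in range
J_unstable = evaluate(k_gain=0.0)   # open loop diverges and is penalized
```

A destabilizing gain yields an index dominated by the penalty term, so the simplex search is steered away from such parameters even though their simulations never complete.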
The N-M simplex method proposed by Nelder and Mead [21] is used for solving N-dimensional unconstrained optimization problems of the form

minimize J(p), p ∈ Rᴺ,

where J(p) is defined as an objective function, which is usually called the performance index in the control field. After the form of the performance index is determined, the N-M simplex method generates a sequence of simplices, where each simplex is defined by N + 1 distinct vertices, namely, p_0, …, p_N, for which the corresponding function values are J_0, …, J_N. The points p_0, …, p_N are assumed to be sorted such that J_0 ≤ ··· ≤ J_{N−1} < J_N, and p̄ represents the centroid of the points p_0, …, p_{N−1}. In each iteration, the simplex transformations in the N-M simplex method are controlled by the parameters α, β, and γ, which should satisfy

α > 0, 0 < β < 1, γ > 1, and γ > α.

These parameters have typical values of α = 1, β = 0.5, and γ = 2. Substituting the coefficients α, γ, β, and −β for c in p̄ + c(p̄ − p_N) yields the reflection point p_r, expansion point p_e, outer contraction point p_c, and inner contraction point p_cc, respectively. The objective function values at these four points are denoted J_r, J_e, J_c, and J_cc, respectively. If none of the four points represents an improvement over the current worst point p_N, the algorithm shrinks the points p_1, …, p_N toward the best point p_0, thereby producing a new simplex. During the shrinking process, each p_i is replaced by p_0 + 0.5(p_i − p_0) for i = 1, …, N. A new iteration is automatically triggered after the shrinking process is complete. The iterative process continues until the specified termination criteria are satisfied (e.g., the maximum allowed number of iterations is reached or the function value J_0 falls below a preset value). We list the various vertices that may be tried during an iteration of the N-M simplex method in Table 1. The pseudo code of the N-M simplex method is shown in Algorithm 1.
Algorithm 1. The pseudo code of the N-M simplex method.
Define α = 1, β = 0.5, γ = 2
Choose initial p_0, …, p_N and calculate J_0, …, J_N
while termination conditions are not satisfied
  Sort p_0, …, p_N such that J_0 ≤ ··· ≤ J_N

The N-M simplex method is easy to implement; therefore, it has been widely used to solve unconstrained optimization problems in an N-dimensional parameter space. In particular, the N-M simplex method does not require the derivative of the objective function, so it remains applicable even if the real system has nondifferentiable terms or the objective function value contains noise. For the concept and detailed algorithm of the N-M simplex method, please refer to the literature [20][21][22][23][24][25][26].
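The iteration described above can be written as a compact, self-contained sketch (a generic implementation for illustration, not the paper's code), using the typical parameters α = 1, β = 0.5, γ = 2:

```python
# A compact sketch of the N-M simplex iteration with alpha=1, beta=0.5, gamma=2.
def nelder_mead(J, p_init, max_iter=500, tol=1e-10):
    alpha, beta, gamma = 1.0, 0.5, 2.0
    simplex = [list(p) for p in p_init]      # N + 1 vertices in R^N
    n = len(simplex) - 1
    for _ in range(max_iter):
        simplex.sort(key=J)                  # J0 <= ... <= JN
        best, worst = simplex[0], simplex[-1]
        if abs(J(worst) - J(best)) < tol:
            break
        # Centroid of all vertices except the worst one.
        cen = [sum(v[i] for v in simplex[:-1]) / n for i in range(n)]
        def trial(c):                        # cen + c * (cen - worst)
            return [cen[i] + c * (cen[i] - worst[i]) for i in range(n)]
        pr = trial(alpha)                    # reflection point
        if J(pr) < J(best):
            pe = trial(gamma)                # expansion point
            simplex[-1] = pe if J(pe) < J(pr) else pr
        elif J(pr) < J(simplex[-2]):
            simplex[-1] = pr                 # accept the reflection
        else:
            # Outer contraction if the reflection beats the worst point,
            # otherwise inner contraction.
            pc = trial(beta) if J(pr) < J(worst) else trial(-beta)
            if J(pc) < min(J(pr), J(worst)):
                simplex[-1] = pc
            else:
                # Shrink every vertex toward the best: p0 + 0.5 * (pi - p0).
                simplex = [best] + [
                    [best[i] + 0.5 * (v[i] - best[i]) for i in range(n)]
                    for v in simplex[1:]
                ]
    simplex.sort(key=J)
    return simplex[0]

# Usage: minimize J(p) = (p1 - 3)^2 + (p2 + 1)^2 from an arbitrary simplex.
p_min = nelder_mead(lambda p: (p[0] - 3) ** 2 + (p[1] + 1) ** 2,
                    [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
```

Note that only function values are used; no gradient of J is ever evaluated, which is what makes the method usable when J is computed from a noisy experiment.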
Because the N-M simplex method has the abovementioned characteristics, it is suitable for use in optimal controller design. If all the signals in the performance index are available, we need not know the dynamic equation of the system, and we can calculate the performance index values corresponding to each set of controller parameters. We then use the N-M simplex method to gradually find the optimal controller parameters that will allow the performance index to reach the minimum.

Numerical Simulation
Let us consider an inverted pendulum system (Figure 1). We assume that the length of the linear cart rail is 2 m and that its middle point is x = 0; the pendulum rod is rigid and massless, and all frictional forces in the system can be neglected. Under these assumptions, the entire pendulum mass is concentrated at the center of the pendulum ball. The control input u is subject to a saturation condition that limits its magnitude, |u| ≤ u_max. The dynamic equation of the inverted pendulum is the standard frictionless cart-pole model [34,35]:

(M + m)ẍ + m l θ̈ cos θ − m l θ̇² sin θ = u,
l θ̈ = g sin θ − ẍ cos θ,

where M, m, l, g, x, and θ denote the cart mass, pendulum mass, rod length, gravitational acceleration, cart position, and pendulum angle measured from the upright vertical, respectively.
The state variables of the system are defined as x_1 = θ, x_2 = θ̇, x_3 = x, and x_4 = ẋ. Solving the two equations above for θ̈ and ẍ, the dynamic equation of the inverted pendulum system can be rewritten in state-space form as

ẋ_1 = x_2,
ẋ_2 = ((M + m) g sin x_1 − cos x_1 (u + m l x_2² sin x_1)) / (l (M + m sin² x_1)),
ẋ_3 = x_4,
ẋ_4 = (u + m sin x_1 (l x_2² − g cos x_1)) / (M + m sin² x_1).
The state vector is defined as x = [x_1 x_2 x_3 x_4]ᵀ. In this example, we assume that the states x_1 = θ and x_3 = x can be measured but are disturbed by v_θ and v_x, respectively. Both v_θ and v_x are Gaussian noises with a mean value of zero and a standard deviation of 0.0001. The states x_2 = θ̇ and x_4 = ẋ cannot be measured directly and are estimated by backward differences, θ̇ ≅ (θ(k) − θ(k − 1))/∆t and ẋ ≅ (x(k) − x(k − 1))/∆t. The controller used in the inverted pendulum system is the state feedback controller u = kx = k_1x_1 + k_2x_2 + k_3x_3 + k_4x_4. The purpose of control is to fix the cart at the middle point of the rail and to maintain the angle between the inverted pendulum and the plumb line at 0. In addition, the initial state is x = [0.7 0 0 0]ᵀ; the sampling time ∆t is 0.002 s; the simulation termination time T_f is 2 s; the allowable variation range of the pendulum is −θ_max = θ_min ≤ θ ≤ θ_max, where θ_max = 0.7; the allowable range of the cart is −x_max = x_min ≤ x ≤ x_max, where x_max = 0.8; and T_r denotes the time at which θ or x moves out of the allowable range. Additionally, the discrete performance index consists of an error term accumulated over the samples plus the penalty term W_r(N_f − N_r), where N_f ≡ T_f/∆t = 1000, N_r ≡ T_r/∆t, and W_r is the weighting of the penalty function. When the output signal remains within the allowable range up to the termination time T_f, W_r is zero. Conversely, if the output signal leaves the allowable range before the time reaches T_f, then W_r is a positive constant much larger than the value of the error term of the performance index; the actual value used in this example is 10⁶. According to the above description, we define W_r in this example as follows:

W_r = 10⁶ if θ or x exceeds the allowable range at the N_r-th sampling, and W_r = 0 otherwise.

When the output θ or x is not within the allowable range, along with adding the penalty function to the performance index, we also immediately terminate (stop) the control.
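The evaluation of one set of gains can be sketched as follows. The sampling time, horizon, allowable ranges, and W_r follow the text above; the cart and ball masses, rod length, saturation limit, and the ISE-type running cost are assumptions for illustration, since the paper's numerical values and index formula are not reproduced here:

```python
import math
import random

# Assumed physical parameters (hypothetical, for illustration only).
M, m, L, g = 1.0, 0.1, 0.5, 9.8        # cart mass, ball mass, rod length, gravity
U_MAX = 20.0                            # assumed saturation limit |u| <= U_MAX
DT, TF = 0.002, 2.0                     # sampling time and horizon from the text
NF = int(TF / DT)                       # N_f = 1000
TH_MAX, X_MAX, WR = 0.7, 0.8, 1e6       # allowable ranges and penalty weight

def step(state, u):
    # Euler step of the frictionless cart-pole with a point-mass pendulum,
    # theta measured from the upright vertical.
    th, thd, x, xd = state
    s, c = math.sin(th), math.cos(th)
    tmp = (u + m * L * thd ** 2 * s) / (M + m)
    thdd = (g * s - c * tmp) / (L * (1.0 - m * c * c / (M + m)))
    xdd = tmp - m * L * thdd * c / (M + m)
    return [th + DT * thd, thd + DT * thdd, x + DT * xd, xd + DT * xdd]

def evaluate(k, th0=0.7, noise_std=0.0001, seed=0):
    # Penalized index for state feedback gains k = [k1, k2, k3, k4].
    rng = random.Random(seed)
    state = [th0, 0.0, 0.0, 0.0]
    prev = None                          # previous measured (theta, x)
    Jval = 0.0
    for n in range(NF):
        th_m = state[0] + rng.gauss(0.0, noise_std)   # measured theta
        x_m = state[2] + rng.gauss(0.0, noise_std)    # measured cart position
        if abs(th_m) > TH_MAX or abs(x_m) > X_MAX:
            # Out of the allowable range: stop at T_r = n*DT and add the
            # penalty W_r * (T_f - T_r).
            return Jval + WR * (TF - n * DT)
        # Backward-difference estimates of theta_dot and x_dot.
        thd_e = 0.0 if prev is None else (th_m - prev[0]) / DT
        xd_e = 0.0 if prev is None else (x_m - prev[1]) / DT
        u = k[0] * th_m + k[1] * thd_e + k[2] * x_m + k[3] * xd_e
        u = max(-U_MAX, min(U_MAX, u))   # input saturation
        Jval += (th_m ** 2 + x_m ** 2) * DT   # assumed ISE-type running cost
        prev = (th_m, x_m)
        state = step(state, u)
    return Jval                          # ran to T_f: penalty weight is zero
```

With zero gains and the initial angle θ(0) = 0.7, the pendulum falls, the run stops early, and the penalty dominates the index; exactly at the equilibrium the index vanishes. Such an `evaluate` function is what the simplex search minimizes over the gain vector k.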
Because the state feedback controller used in this example has four parameters, we first arbitrarily designed five sets of controller parameters as the five vertices of the initial simplex. The five sets of parameters and the corresponding performance indices are listed in Table 2, and the time responses of the system are shown in Figure 2. Among these, the parameters p_0 and p_1 keep the system state within the allowable safe range until the time reaches T_f; therefore, both penalty function values are 0, and the corresponding index values are small. When p_2, p_3, or p_4 is used, the corresponding states exceed the allowable range before the time reaches T_f. The penalty function therefore achieves the expected effect: the corresponding index values are much larger than those for p_0 and p_1. This result also shows that the earlier the simulation/experiment is interrupted, the larger the corresponding index value (representing a worse parameter set).

Table 2. Initial controller parameters for the inverted pendulum system.

Then, we used the above five sets of parameters as the five vertices of the initial simplex and set the maximum number of iterations for searching the optimal parameters to 50. The results obtained using the N-M simplex optimal search method are shown in Figures 3 and 4, where the resulting parameters are k_1 = −276.7, k_2 = −48.12, k_3 = −289.5, and k_4 = −122.1. The corresponding performance index is J = 2.292 × 10⁴, which is lower than those for the initial sets of controllers.
Obtaining a global optimal controller for nonlinear systems is difficult, especially when the dynamic equation of the system is unknown and the state or output signal includes measurement noise. Therefore, the results obtained in the above examples may be local optimal controllers based on specific initial conditions. In practical applications, the most important goal is to design a stable or robust controller effectively, not necessarily to obtain a global optimal controller.
To demonstrate the feasibility of this method, we simulated the above inverted pendulum system with the same state feedback controller, u = kx = [−276.7 −48.12 −289.5 −122.1]x. In this simulation, we used 50 different states as initial conditions.
The initial states were distributed around x(0) = 0. Figure 5 shows the time responses of these simulations. The results show that all 50 different initial states approached the equilibrium within 2 s.

Conclusions
In this study, we proposed a systematic method for designing optimal controllers for systems with unknown dynamic equations. First, we proposed an original performance index based on the characteristics and control aim of the controlled system. The performance index can include the state, output, error, or control input of the system. To ensure that this performance index was applicable even when the state or output of the system was not within the allowable safety range, we added a key penalty function. Then, we used the N-M simplex method to search for the optimal controller parameters iteratively. In addition to the ease of implementation of the N-M simplex method, another important advantage of the algorithm is that it only needs all the signals in the performance index to be available, without the need to know the dynamic equation of the system in advance.
To demonstrate the feasibility of the proposed method, we adopted an inverted pendulum system with measurement noise as the example. The simulation results showed that even if the initial controller parameters could not stabilize the system, after the algorithm reached the iterative termination condition, not only was the system stable but it also exhibited good transient response performance.
The optimal controller parameter search method proposed in this study has the following advantages: (1) the dynamic equation of the system need not be known; (2) the method can be used regardless of the stability of the open-loop system; (3) the method can be applied to both linear and nonlinear systems; (4) the method can be used in systems containing measurement noise; and (5) the systematic nature of the method can improve the design efficiency.