A Machine Learning Algorithm That Experiences the Evolutionary Algorithm’s Predictions—An Application to Optimal Control

Abstract: Using metaheuristics such as the Evolutionary Algorithm (EA) within control structures is a realistic approach for certain optimal control problems. They often predict the optimal control values over a prediction horizon using a process model (PM). The computational effort sometimes causes the execution time to exceed the sampling period. Our work addresses a new issue: whether a machine learning (ML) algorithm could "learn" the optimal behaviour of the couple (EA and PM). A positive answer is given by proposing datasets apprehending this couple's optimal behaviour and appropriate ML models. Following a design procedure, a number of closed-loop simulations provide the sequences of optimal control and state values, which are collected and aggregated in a data structure. For each sampling period, datasets are extracted from the aggregated data. The ML algorithm experiencing these datasets produces a set of regression functions. Replacing the EA predictor with the ML model, new simulations are carried out, proving that the state evolution is almost identical. The execution time decreases drastically because the PM's numerical integrations are totally avoided. The performance index equals the best-known value. In different case studies, the ML models succeeded in capturing the optimal behaviour of the couple (EA and PM) and yielded efficient controllers.


Introduction
Controlling a process subjected to a performance index is a usual task in process engineering. Theoretical control laws can be implemented in favourable situations where the process has certain mathematical properties. On the other hand, when the process has profound nonlinearities, or its model is uncertain, imprecise, or incomplete, using metaheuristic algorithms (EA, Particle Swarm Optimization, etc.) (see [1][2][3]) within an appropriate control structure could be a realistic solution. Control engineering has recorded many examples of using metaheuristics [4][5][6][7][8][9] owing to their robustness and capacity to cope with complex problems.
Generally speaking, the metaheuristic algorithm's role within a controller is to predict the optimal (quasi-optimal) control values. The predictor forecasts the optimal control sequence for a prediction horizon, and the controller decides the next optimal control value. A control structure adequate for this kind of controller is receding horizon control (RHC) [10][11][12]. It is used to solve optimal control problems (OCPs) and includes a process model (PM).
Because this work's main result is applied to the RHC, we recall hereafter the basic principles due to which this closed-loop structure controls the process optimally:
- The controller acquires the process's current state and makes optimal predictions to establish the current optimal control values.
- The controller embeds a PM (for example, a set of algebraic and differential equations) to compute the predictions via the PM's numerical integration.
- The controller organizes the shifting of the prediction horizon.
A possible organization of the receding prediction horizon is given in [10,13]. An RHC particular case is the well-known model predictive control (MPC), which minimizes the prediction errors at each sampling period. There are plenty of articles addressing MPC and covering different aspects, from which we recall a few: theoretical works in [14,15], tutorial reviews in [16], and surveys of industrial applications in [17][18][19].
Many works have integrated genetic algorithms (GAs), EAs, and other metaheuristics inside the RHC and successfully implemented real-time control structures. The book by Jayaraman and Siarry [6] describes many applications of this kind. Goggos and King [20] introduced the evolutionary predictive control technique. At every sampling moment, evolutionary algorithms generate and evaluate a family of optimal predictive controllers having different parameters, and the best performer is selected.
Other works make EAs or GAs fit into the MPC structure; emphasis is placed on the operators' definition. The authors of [21] proposed a specialized GA optimization method based on the Takagi-Sugeno model for fuzzy predictive control. Nonlinear MPC strategies are described in [22]; the paper proposes stochastic optimization algorithms associated with a polynomial-type process model.
The RHC was also used for flood control in [23]; later, paper [24] described a real-time flood control system using an EA and the RHC.
In previous works, the authors have studied implementing the prediction module using EAs [25,26]. The EAs provided a realistic solution due to the many possibilities for reducing the predictor's execution time. Certain control engineering aspects, such as the discretization of continuous signals and the time constants of the dynamic sub-systems, determine the choice of the sampling period (T). So, the value of T cannot be increased at will. Within an iterative process, the controller makes predictions for the process's evolution over a prediction horizon, hT (h variable). A larger value of h is desirable since it increases the chance of quasi-optimal behaviour along the control horizon.
On the other hand, the larger the value of h, the larger the prediction calculation time. However, the controller's execution time, including the EA predictions, cannot exceed the sampling period. Due to the large computational effort, this is the controller's most restrictive time constraint. So, decreasing the predictor's execution time is the challenge of this approach (EA + RHC), which is mainly appropriate for slow processes with large sampling periods.

A Machine Learning Algorithm Extending the Applicability of the EA Predictions
Extending the applicability of the EA predictions is a challenge, involving techniques and control structures that diminish the execution time [26]. One can consider that our work addresses the decrease in execution time, but the proposed machine learning (ML) task (see [27][28][29][30]) largely exceeds this topic.
This work proposes an interesting issue: whether a machine learning algorithm could "learn" the optimal behaviour of the couple (EA and PM). A positive answer would have very favourable consequences for the controller implementation. Substituting the couple (EA and PM) with an accurate ML model could cause the predictions' computation time to decrease significantly and make the controller's structure much simpler. In this context, the ML algorithm has to construct usable models experiencing datasets catching the optimal behaviour of the couple (EA and PM). Concretely, this work's main objective is to answer the above-mentioned question positively by proposing the following:

• Realistic datasets apprehending the optimal behaviour of the couple (EA and PM);
• Appropriate ML models.
Hence, this work will propose an ML model that can substitute the couple (EA and PM) inside the optimal controller while keeping the control performances. In other words, the ML model would be equivalent, in a certain sense, to the optimal behaviour of the EA plus PM; both entities are predictors [31]. According to our knowledge, this "intelligent equivalence" can be considered a new issue.
Before using the controller within the closed-loop system in real time, a simulation program must validate the designed controller via the process's evolution along the imposed control horizon; the process and the PM are considered identical. The simulation's results are usually sequences of control and state variables' values along the control horizon that can be stored or recorded. These data are a mark of the system's evolution, made up of the control profile (sequence of control values) and the state trajectory (sequence of state variables' values). Repeating the simulation many times, we can aggregate these data and generate datasets for the ML algorithm. The simulations are conducted offline, so there is no execution time constraint. The time constraint mentioned above exists only when the closed loop works in real time.
An important remark is that we do not need data from the real process to be included in the datasets. The predictor module predicts optimal trajectories using only the EA and PM, whose inter-influence must be captured by the datasets. When an accurate ML model replaces the initial predictor, the controller should behave quasi-identically within the simulations, which is our goal. The ML model only generalizes in real time, when the real process's states are used as initial states.
This work will ascertain the previous considerations and propose an approach to construct the datasets and the ML model, starting from the OCP to solve. Besides general considerations, we will apply the proposed methods to a specific OCP, exemplifying the proposed methods and algorithms to make the presentation easy to follow.
We will consider OCPs with a final cost that use predictions to exemplify the equivalence mentioned above and implement ML controllers. Section 2 recalls the general approach to solving OCPs using EAs developed in the authors' previous works [25,26]. The Park-Ramirez Problem (PRP) (see [26,32,33]) is a kind of benchmark problem already treated in this context, which is addressed as an example. This paper will partially report previous results for comparison and take over the EA predictor's implementation. Section 2 is mainly necessary because the optimal control using the EA will supply the datasets for ML. Although the aspects presented in this section are not among this paper's contributions, they introduce the notations and keep the discourse self-contained.
Section 3 answers the following three basic questions:
• What data do we need to capture the optimal behaviour of the couple (EA and PM)?
• How do we generate the datasets for the ML algorithm?
• What ML model can be used to design an appropriate controller (we will name it ML controller)?
These are general questions, each of which has subsumed aspects to clarify. Section 3.1, describing the proposed method, answers these questions succinctly; details will be given in the next sections. It also establishes a controller design procedure.
The starting point in our approach is that we have already solved the considered OCP, and the implemented controller has a prediction module using a specific EA. Section 3.2.1 describes an algorithm achieving the closed-loop simulation along the control horizon, devoted to the EA predictor. The simulation will record the sequence of optimal control values (the optimal control profile) and the optimal trajectory (sequence of states). The two sequences can be regarded as a series of couples (state, control value), each couple associated implicitly with a sampling period.
Repeating the simulation M times (e.g., M equals 200), data concerning these optimal evolutions of the closed loop are collected and aggregated into a data structure presented in Section 3.2.2. This aggregated data structure characterizes the optimal behaviour of the couple (EA and PM) globally, that is, for the whole control horizon. The extraction of datasets characterizing each sampling period from the aggregated data creates the premise to find a model of optimal behaviour for each sampling period. Section 3.2.3 describes how the aggregated data are split into datasets for each sampling period. Moreover, training and testing datasets are created.
In Section 3.3, the ML models are constructed according to an important choice. The ML model will be a set of regression functions; a linear regression function will be determined for each sampling period [34][35][36]. The first reason for this choice is the model's simplicity, which is important, especially when the control horizon is large. Secondly, the control law will be directly determined. This fact is appropriate for the controller implementation, which is now straightforward. Section 3.3.1 proposes simple models with linear terms for each state variable. In contrast, Section 3.3.2 uses the stepwise regression strategy to generate regression functions, which are allowed to include nonlinear terms such as interactions. The ML model succeeds in reproducing the optimal behaviour of the EA predictor with high accuracy.
The simulation of the closed-loop system is the way to test the generalization aptitude of the new predictor after its implementation. In its first part, Section 4 describes an algorithm that simulates the closed-loop system using the ML controller along the control horizon. The second part compares the simulation results to those previously obtained with the EA predictor for the PRP case [26]. The state evolutions are practically identical, and the performance index equals the best of the M evolutions, which is true for the two types of regression functions.
The simulation results (in the PRP case and other case studies not presented in this paper) proved that the proposed ML models succeeded in capturing the optimal behaviour of the couple (EA and PM) and engendered efficient controllers.
We consider that our work has the following findings:
• The interesting issue itself, which is to find an ML model experiencing the datasets generated by an EA (or another metaheuristic), trying to capture the latter's optimal behaviour.
• The dataset's construction as a dynamic trace of the EA predictor, aggregating the trajectories and CPs.
• The dataset extraction for each sampling period as a premise to find a temporally distributed ML model.
• The design procedure for the ML controller and all associated algorithms (simulation and models' construction algorithms).
• The outstanding decrease in the ML controller's execution time.
Special attention was paid to the implementation aspects such that the interested reader can find support to apprehend and eventually reproduce parts of this work or use it in other projects. With this aim in view, all algorithms used in this work are implemented, the associated scripts are attached as supplementary materials, and all the necessary details are given in Appendices A-E.

Optimal Control Using Evolutionary Algorithms
This section recalls the general approach to solving OCPs using EAs developed in previous papers [25,26]. The minimal elements presented here introduce the notations and keep the discourse self-contained.

Optimal Control Problems with a Final Cost
The structure of an OCP being well known, we consider in the sequel only the defining elements adapted to the problem taken as an example in this section. Rigorous mathematical details will be avoided to simplify the presentation.

Process Model
In our approach, the controller includes a process model constituted by algebraic and ordinary differential equations:

dX/dt = f(X(t), U(t)), X(0) = X0, (1)

where X is a vector with n state variables and U is a vector with m control variables. An example of a process model is Equation (7) from Section 2.1.4.

Constraints
There are many constraint types, but we mention only those used in the case study presented in this paper.

Control horizon: t ∈ [0, tfinal].
Bound constraints: Umin ≤ U(t) ≤ Umax, 0 ≤ t ≤ tfinal, where Umin and Umax are the technological bounds of the variable U(t). If T is the sampling period of the control system, we can divide the control horizon into H sampling periods: tfinal = H·T.

Cost Function
The problem is to determine the control function U(·) optimizing (max or min) a specific cost (objective) function J, whose general form is given below:

J = ∫ from 0 to tfinal of L(X(t), U(t)) dt + Jfinal(X(tfinal))

The function L determines the integral component (Lagrange term) of the function J, while Jfinal (Mayer term) rewards (or penalizes) the final state (in most cases).

Remark 1.
When the final cost is present and the controller makes predictions, whether or not there is an integral term, the prediction horizon must end at the final time, which involves the greatest computational complexity.
Given Remark 1, we consider only the final cost, a situation suited to our OCP (see Section 2.1.4):

J = Jfinal(X(tfinal))

The problem's solution is the control function U*(·) that engenders the cost function's optimal value. This value, J0, is called the performance index.

An Example of OCP with a Final Cost
The Park-Ramirez problem (PRP) is a kind of benchmark problem ([26,32,33]) that can exemplify a final cost OCP. The nonlinear process models a fed-batch reactor which produces secreted protein. This problem has been addressed in many works to study integration methods.

Bound constraints:
The control variable is subject to bound constraints of the form given in Section 2.1.2. An open-loop solution cannot be used in real time because the process and the PM have different dynamics (even when the differences are small); this would produce unpredictable efficiency. We want to generate a controlled optimal process (a closed-loop solution) starting from a given X0 whose final cost should be J0.

A Discrete-Time Solution Based on EAs
A control structure that can generate the optimal solution is the RHC ([25,26]). Its controller includes the process model and a prediction module (see Figure 1). The latter predicts, at each moment kT, the optimal control sequence until the final time. Then, the controller outputs this sequence's first element as the optimal value and inputs the next process state.
The prediction module using an EA proved to be a realistic solution due to the many possibilities to reduce its execution time [26].
To use an EA, we append to our OCP the discretization constraint:

U(t) = U(k), t ∈ [kT, (k+1)T), k = 0, 1, ..., H−1

So, the control variables are step functions. For the sake of simplicity, the time moment k·T will be denoted k in the sequel. For example, inside the sampling period [k, k+1), the control vector is U(k). Figure 1 shows the control structure using EAs.
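The discretization constraint can be illustrated by a short sketch (Python here, although the paper's scripts are in MATLAB; the horizon length and the control values below are arbitrary stand-ins):

```python
import numpy as np

T, H = 1.0, 15                       # sampling period and number of periods (H = 15 as in the PRP)
u_steps = np.linspace(0.0, 1.0, H)   # hypothetical control values U(0), ..., U(H-1)

def U(t):
    """Piecewise-constant (step) control: U(t) = U(k) for t in [kT, (k+1)T)."""
    k = min(int(t // T), H - 1)      # clamp at the last sampling period
    return u_steps[k]
```

For instance, U(0.5) returns U(0), while U(1.0) returns U(1): the control only changes at sampling instants.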
We name a "control profile" (CP) a complete sequence of H control vectors, U(0), U(1), ..., U(H−1). The EA yields candidate predictions over prediction horizons and evaluates the cost function J. For the sampling period [k, k + 1), a candidate prediction is a control sequence having the following structure:

V(k) = ⟨v(k), v(k+1), ..., v(H−1)⟩

The vector X(k) is the process's current state. It is also the initial state for the candidate prediction with H−k elements. This fact justifies the appellation "Receding Horizon Control". Using Equations (1) and (8), the EA also calculates the corresponding state sequence (with H−k+1 elements). At convergence, the EA returns (to the controller) the optimal prediction sequence, denoted V*(k). Finally, the first value of the sequence, v*(k), becomes the controller's best output, denoted U*(k), sent towards the process:

U*(k) = v*(k) (11)

Remark 2. The optimal control U*(k) is also a function of the current state X(k), which does not appear as a distinct argument in (11) to keep the notation simple and easy to follow. Nevertheless, this dependence is essential for the machine learning models as well.
The other values of the sequence V*(k) are forgotten, and the controller will treat the next sampling period in the same way. The controller equipped with the EA constructs the optimal CP for the given initial state X0 = X(0) and the entire control horizon, concatenating the optimal controls U*(k):

Ω(X0) = ⟨U*(0), U*(1), ..., U*(H−1)⟩ (12)

It forces the system to pass through a sequence of "optimal states", the optimal trajectory Γ(X0):

Γ(X0) = ⟨X(0), X(1), ..., X(H)⟩ (13)

These two sequences completely characterize the optimal evolution of the closed loop over the control horizon. Theoretically, the optimal cost function will reach the value J0 if the process and its model are identical. Practically, this value will be very close to J0, such that Ω(X0) is a quasi-optimal solution of our OCP.
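The receding-horizon mechanism described above can be sketched as follows. This is a hedged Python illustration: the process model, bounds, and final cost are toy stand-ins, and the EA is replaced by naive random search, which plays the same role of predicting a control sequence over [k, H) and keeping only its first element:

```python
import numpy as np

H, T = 15, 1.0           # control horizon (steps) and sampling period (illustrative)
U_MIN, U_MAX = 0.0, 2.0  # technological bounds (assumed values)

def pm_step(x, u):
    """One sampling period of a toy process model (stand-in for Eq. (1))."""
    return x + T * (-0.5 * x + u)

def final_cost(x):
    """Mayer-type final cost (stand-in): reward the final state."""
    return x

def predict(x, k, n_candidates=200, rng=np.random.default_rng(0)):
    """Random-search predictor over [k, H): returns the best candidate V(k)."""
    best_v, best_j = None, -np.inf
    for _ in range(n_candidates):
        v = rng.uniform(U_MIN, U_MAX, H - k)  # candidate control sequence
        xs = x
        for u in v:                           # integrate the PM along the horizon
            xs = pm_step(xs, u)
        j = final_cost(xs)
        if j > best_j:
            best_j, best_v = j, v
    return best_v

def control_loop(x0):
    """Closed-loop run: at each step keep only the first predicted control."""
    x, cp, traj = x0, [], [x0]
    for k in range(H):
        v = predict(x, k)
        u = v[0]             # U*(k): first element of the optimal prediction
        x = pm_step(x, u)    # the 'real' process (identical to the PM here)
        cp.append(u)
        traj.append(x)
    return np.array(cp), np.array(traj)

cp, traj = control_loop(x0=1.0)
```

The pair (cp, traj) corresponds to the sequences (12) and (13): the control profile and the quasi-optimal trajectory of one closed-loop run.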
The flowchart of the prediction module is drawn in Appendix A. The interested reader can find the main characteristics of the implemented EA in [26] or in the supplementary materials appended to this work. In our implementation, the EA's code is presented in the script RHC_Predictor_EA.m. The initial population is generated using the control variables' bounds. The cost function is coded within the file eval_PR_step.m. All scripts are included in the folder ART_Math, as are the other functions called by the predictor, implementing the EA's operators and the PM.

The General Description of the Proposed Method
The PRP and other problems of this kind allow the validation of the designed optimal controller by simulating the control loop.
The simulation of the control loop is an important design tool in this context, which can supply the sequences Ω(X0) and Γ(X0) (see (12) and (13)) that describe the quasi-optimal evolution of the loop. By repeating the control loop simulation M times, we obtain M different optimal (actually quasi-optimal) couples (CP-trajectory), even if the initial states were identical, due to the EA's stochastic character. Moreover, in the case of the PRP, the initial state can be perturbed to simulate the imperfect realization of the initial conditions when launching a new batch. Let us consider a lot of M simulations, illustrated in Figure 3. At step k of the control horizon, the controller must predict the optimal control output (sent towards the process) using its predictor module based on the EA and PM described before. Data concerning the same step have some common aspects:

• The initial process state is input data for the EA.
• The prediction horizon is the same.
• The PM is the same.
• The M simulations use the same EA.
The EA calculates and returns the optimal control vector U*(k), k = 0, ..., H−1. A dataset including the simulation results for step k can be constructed as follows: each of the M simulations contributes one example, the couple formed by the current state X(k) and the optimal control U*(k) (see Table 1). The dataset resembles a table due to the transposition operator. If M is big enough, this dataset collects an essential part of the EA's ability to predict optimal control values for step k. It would be useful to answer the question: How can we generalize this ability for other current state values the process could access at step k? A machine learning algorithm is the answer. For example, linear regression (see [34,36]) can construct a function using a dataset like that presented before.
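The dataset for a given step k can be assembled as in the following sketch (the recorded states and controls are random stand-ins here; in the actual procedure they come from the M closed-loop simulations):

```python
import numpy as np

M, n = 200, 5                          # number of simulations; n = 5 state variables (PRP)
rng = np.random.default_rng(1)

# Hypothetical stand-ins for the data recorded at step k in each of the M runs:
states_k = rng.normal(size=(M, n))     # the current states X(k)
controls_k = rng.uniform(size=(M, 1))  # the optimal controls U*(k) returned by the EA

# One row per simulation: [x1, ..., xn, u*] — the analogue of Table 1.
dataset_k = np.hstack([states_k, controls_k])
```

Each row is one example for the regression function of step k: the state components are the features and u* is the target.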
Remark 3. The linear regression function fk models how the EA determines the optimal prediction at step k. The set of functions Φ = {f0, f1, ..., fH−1} is the machine learning model of the couple (EA-PM). The behaviour of the EA, which, in turn, depends on the PM, is captured by the set of functions Φ.
Hence, it would be possible to successfully replace the EA-based predictor with this set of functions and obtain a faster controller.
Logically, a few steps lead us to a design procedure for the optimal controller.
Design procedure:
1. Implement a program to simulate the closed loop functioning over the control horizon (H) using the controller based on the EA. To simplify the presentation, we will call it ControlLoop_EA in the sequel. The output data are the quasi-optimal trajectory Γ(X0) and its associated control profile Ω(X0).
2. Repeat the module ControlLoop_EA M times to generate and save M quasi-optimal trajectories and their control profiles.
3. Extract, for each step k, datasets similar to Table 1 using the data saved at step 2.
4. Determine the machine learning model (for example, the set of functions Φ) experiencing the M trajectories and control profiles.
5. Implement the new controller based on ML. It will be called the ML controller in the sequel.
Simulation of the closed loop using the new controller:
6. Write a simulation program called ControlLoop_ML for the closed loop working with the new controller. This simulation will test the feasibility of the proposed method, the quality of the quasi-optimal solution, the performance index, and the execution time of the new controller.
The set of functions Φ can determine the optimal CP starting from a given state X0, following the transfer diagram from Figure 2 and applying the functions to the current state successively:

U*(k) = fk(X(k)), k = 0, 1, ..., H−1 (14)

Remark 4. The design procedure shows that the new controller is completely designed offline. No data collected online from the process are necessary. The result of this procedure is the new controller, the ML controller, which could be used in real time after a robustness analysis.

Dataset Generation for Machine Learning Model
The first two steps of the design procedure will be described in this section, trying to keep generality. Only some aspects will refer to the PRP, to simplify the presentation. At every moment k, the controller calls the predictor based on the EA, RHC_Predictor_EA. The latter returns the optimal control value U*(k), which is used by the function RHC_RealProcessStep to determine the next process state.
The optimal control values and the optimal states are stored in the matrices uRHC (H × m) and state (H × n), respectively, having the structure presented in Figure 5.
Figure 5. The matrices for the quasi-optimal trajectory and its CP.
Hence, the optimal CP and trajectory are described by the matrices uRHC and state, respectively, which are the images of the Ω(X0) and Γ(X0) sequences (see (12) and (13)).
In the case of the PRP, an example of matrices describing a quasi-optimal evolution is given in Figure 6. Notice that this time, m = 1 and the 16th state is the final one.
Figure 6. An example of matrices for the optimal trajectory and its CP (PRP case).

Aggregation of Datasets concerning M Optimal Evolutions of the Closed Loop
This subsection corresponds to step 2 of the design procedure. The learning process for the controller's optimal behaviour needs data from an important number of optimal evolutions. Practically, the program ControlLoop_EA.m will be executed repeatedly, M times (e.g., M = 200), in a simple loop described in the script LOOP_M_ControlLoop_EA.m. The objective is to create aggregate data structures and store the optimal trajectories and their CPs.
Figure 7 illustrates possible data structures: a cell array of M tables storing the trajectories and a matrix storing their CPs. The optimal CPs could also be stored in a cell array, but in the PRP case (m = 1), a matrix (M × H) can store these values more simply. We call it UstarRHC (200 × 15); on each line, it memorizes the transposed vector uRHC from Figure 6.
The performance index values are also stored in a column vector JMAT (M × 1) for different analyses. All these data structures could be saved in a file for later processing.

Extraction of Datasets Characterizing Each Sampling Period
This subsection details step 3 of the design procedure. For each sampling period k, a dataset similar to Table 1 is extracted using the data structures defined before; STATEi designates the i-th component of the cell array STATE.
Remark 5. Because the controller's optimal behaviour has repercussions on each sampling period, the learning of its optimal behaviour will be split at the level of each interval [k, k + 1).
After constructing the matrix SOCSK, we will convert it into a table, which seems more convenient for processing datasets for machine learning in some programming and simulation systems. The result is the table named datak, which has variables and properties. After that, this table's lines are split into the table datakTrain, with 140 examples for training (70%), and the table datakTest, with 60 data points for testing (30%). Finally, these tables are stored in cell #k of the DATAKTrain and DATAKTest arrays, respectively. They will be used by the machine learning algorithm later.
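The extraction and the 70%/30% split can be sketched in Python as follows (the aggregated structures are filled with random stand-ins; the names STATE, UstarRHC, DATAKTrain, and DATAKTest mirror the paper's MATLAB variables):

```python
import numpy as np

M, H, n = 200, 15, 5
rng = np.random.default_rng(3)

# Stand-ins for the aggregated structures of Section 3.2.2:
STATE = [rng.normal(size=(H + 1, n)) for _ in range(M)]  # M quasi-optimal trajectories
UstarRHC = rng.uniform(size=(M, H))                      # M control profiles (m = 1)

DATAKTrain, DATAKTest = [], []
for k in range(H):
    # SOCSK: one row per run -> [X(k) components, U*(k)]
    socsk = np.array([np.append(STATE[i][k], UstarRHC[i, k]) for i in range(M)])
    idx = rng.permutation(M)
    train, test = idx[:140], idx[140:]   # 70% / 30% split (140 / 60 rows)
    DATAKTrain.append(socsk[train])
    DATAKTest.append(socsk[test])
```

Cell #k of DATAKTrain and DATAKTest then holds the training and testing examples for the regression function fk.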

Construction of Machine Learning Models
This section covers step #4 of the design procedure. As stated by Remark 3, the set of functions Φ is the machine learning model for the optimal behaviour of the couple (PM-EA). The data points are couples (process state-optimal control value) related to moment k, which the function fk can learn.
Of course, another kind of machine learning model, globally treating the learning process (the PM-EA's control profile), could be addressed without splitting the learning at the level of sampling periods. The resulting model would be more complex and difficult to train and integrate into the controller.
In this work, we mainly chose multiple linear regression as a machine learning algorithm because of its simplicity. This characteristic is important, especially when H is large. Secondly, the linear regression functions fk for each sampling period are appropriate for a controller implementation; these functions directly give the control law (see Equation (14)). To emphasize this aspect, Algorithm 1 presents the general structure of the ML controller when using the set of functions Φ.
Algorithm 1. The structure of the controller's algorithm using linear regression functions.
1 Get the current value of the state vector, X(k);
2 Compute and output the control value U*(k) = fk(X(k)).
We can determine the set of functions Φ via multiple linear regression, considering different models containing an intercept, linear terms for each feature (predictor variable), products of pairs of distinct features (interactions), squared terms, etc. In other words, the resulting functions could be nonlinear as functions of the process states.
In our case study, we will also apply the strategy of stepwise regression that adds or removes features starting from a constant model.

Models with Linear Terms for Each State Variable
Remark 6. Our objective is not to find the "best" set of linear regression models but to prove that our approach works and this new model can replace the EA.
That is why, for a start, we adopt a very simple model such that each regression function fk is a simple linear model involving only linear terms for each state variable and an intercept. Each model is trained and tested separately, considering the datasets already prepared in the cell arrays DATAKTrain and DATAKTest. The construction of these models is presented by the pseudocode in Algorithm 2. The datasets for training and testing, described in Section 3.2.3, are now input data for this algorithm. The coefficients of each function fk, having the form (15), are stored in an output matrix called COEFF. The models are objects stored in an output cell array called MODEL (H × 1). Line #4 creates the model "mdl" using the function fitting_to_data, which fits the function (15) to the dataset "datakTrain". At line #6, the function get_the_coefficients extracts the six coefficients, which are then put in a line of the matrix COEFF.
The predicted values corresponding to datakTest are stored in the vector "uPred" by the function fpredict, to be compared with the experienced values. Details concerning the implementation of the fitting_to_data, get_the_coefficients, and fpredict functions are given in Appendix B. For the PRP, e.g., the coefficients are given in Table A1 (Appendix B). A comparison of the values in "datakTest" to the predicted values "uPred" is given in Figure 9 [34,36]. The blue line is the plot of the test values against themselves. The 60 predicted values are disposed along the blue line, at a distance smaller than 0.2 for most of them.
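The fitting step can be emulated with ordinary least squares; the sketch below (Python, whereas the paper uses MATLAB's fitting functions) fits the form (15), an intercept plus one linear term per state variable, and recovers known coefficients from synthetic data (the "true" coefficients are assumed values for the check only):

```python
import numpy as np

def fit_linear_model(data):
    """Fit u ≈ b0 + b1*x1 + ... + bn*xn by least squares.
    data: rows [x1, ..., xn, u]; returns the n+1 coefficients (intercept first)."""
    X, y = data[:, :-1], data[:, -1]
    A = np.hstack([np.ones((len(X), 1)), X])  # prepend the intercept column
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coeffs

# Exact-recovery check on synthetic, noise-free data:
rng = np.random.default_rng(5)
true = np.array([0.5, 1.0, -2.0, 0.3])        # b0, b1, b2, b3 (assumed)
X = rng.normal(size=(140, 3))
u = true[0] + X @ true[1:]
coeffs = fit_linear_model(np.hstack([X, u[:, None]]))
```

On noise-free linear data the estimated coefficients match the true ones; on the EA-generated datasets they would instead approximate the EA's state-to-control mapping at step k.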

Models Constructed via Stepwise Regression
This section will propose a more elaborate model that allows the possibility of including nonlinear terms such as interactions, that is, products of predictor variables (e.g., x1*x4).
As an example, for the PRP case, we will apply the strategy of stepwise regression, which adds or removes features starting from a constant model. This strategy is usually implemented by a function stepwise(T) returning a model that fits the dataset in the table T:

model ← stepwise(T)

The script GENERATE_ModelSW.m constructs the set of models using this function (see Appendix C). This script yields the regression functions given in Table 2, where the coloured terms are actually nonlinear. We can see how the function stepwise works and what statistical parameters validate the model.
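A simplified stand-in for the stepwise function can be written as a greedy forward selection over linear and pairwise-interaction terms (this is only one variant of stepwise regression; MATLAB's stepwiselm also removes terms and uses statistical criteria rather than a raw SSE tolerance):

```python
import numpy as np
from itertools import combinations

def stepwise(data, tol=1e-4):
    """Forward stepwise selection over linear and pairwise-interaction terms,
    starting from a constant model. Returns the selected terms as index tuples."""
    X, y = data[:, :-1], data[:, -1]
    n = X.shape[1]
    # Candidate terms: each feature and each product of two distinct features.
    terms = [(i,) for i in range(n)] + list(combinations(range(n), 2))
    cols = {t: X[:, t[0]] if len(t) == 1 else X[:, t[0]] * X[:, t[1]] for t in terms}

    def sse(selected):
        A = np.column_stack([np.ones(len(y))] + [cols[t] for t in selected])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        r = y - A @ coef
        return r @ r

    selected, best = [], sse([])
    improved = True
    while improved:
        improved = False
        for t in [t for t in terms if t not in selected]:
            s = sse(selected + [t])
            if s < best - tol:  # keep the candidate that best reduces the SSE
                best, best_t, improved = s, t, True
        if improved:
            selected.append(best_t)
    return selected

# Example (assumed synthetic data): u depends only on the product x1*x2.
rng = np.random.default_rng(7)
X = rng.normal(size=(100, 3))
u = X[:, 0] * X[:, 1]
selected = stepwise(np.hstack([X, u[:, None]]))
```

On this synthetic dataset the procedure picks exactly the interaction term, illustrating how nonlinear terms like those coloured in Table 2 can enter the model.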

Simulation of the Control-Loop System Equipped with the ML controller
To evaluate the measure in which the set of functions Φ succeeded in "learning" how the EA acts as an optimal predictor, we will refer to the simulation results of the control loop. The script ControlLoop_ML.m implements step #6 of the design procedure and is described by the pseudocode in Algorithm 3. In line #6, the function feval returns the value uML(k) that the model mdl predicts when the current state is X0 (also a local variable). Instead of using feval, one can use the product of the coefficients and the state variables. The function step_PP_RH(uML(k), X0) calculates the next state (at the moment k + 1) by integration of the state Equation (7). The codes for all the proposed functions are given in the folder ART_Math.
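The simulation loop of Algorithm 3 can be sketched as below (Python; the process step and the regression coefficients are hypothetical stand-ins for step_PP_RH and the COEFF matrix):

```python
import numpy as np

H, n = 15, 5
rng = np.random.default_rng(6)
COEFF = rng.normal(size=(H, n + 1))  # per-step coefficients [intercept, b1..bn] (stand-ins)

def step_process(u, x):
    """Stand-in for step_PP_RH(u, X0): one sampling period of the process model."""
    return x + 0.1 * (u - x)

def control_loop_ml(x0):
    """ControlLoop_ML sketch: evaluate f_k on the current state at every step (Eq. (14))."""
    x = np.asarray(x0, dtype=float)
    uML, traj = np.zeros(H), [x]
    for k in range(H):
        uML[k] = COEFF[k, 0] + COEFF[k, 1:] @ x  # U*(k) = f_k(X(k))
        x = step_process(uML[k], x)
        traj.append(x)
    return uML, np.array(traj)

uML, traj = control_loop_ml(np.ones(n))
```

Note that each control value costs one dot product; no numerical integration of the PM is performed inside the controller, which is the source of the speed-up discussed in Section 4.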
In the PRP case, we used ControlLoop_ML.m to simulate the control loop using the models constructed in Section 3.3.1. The ML controller yielded the CP drawn in Figure 11. This CP engenders the quasi-optimal state evolution depicted in Figure 12b, which can be compared with the typical evolution in Figure 12a, generated by the RHC endowed with an EA. Figure 12a was produced in the authors' previous work [21]. The resemblance between the two process responses is very high and proves that the set of functions f_k succeeded in emulating the optimal behaviour of the EA. Moreover, the performance index achieved by the CP in Figure 12b is very good (J = 32.0986) because it equals the best value (J0 = 32.0986) recorded in the M data points generated by the EA. Hence, the PRP's solution found by the ML model is also quasi-optimal.

The Controller's Execution Time
Because the initial motivation for this research is precisely the decrease in the EA's execution time, we compared the execution times of the two controllers, the EA and the ML controller. In practice, we compared the simulation times of the two closed loops (with the EA and the ML model), because the programs ControlLoop_EA.m and ControlLoop_ML.m were already written. The simulations were carried out with MATLAB R2023 on an Intel(R) Core(TM) i7-6700HQ CPU @ 2.60 GHz. The resulting times are 38 s and 0.08 s, respectively (see Appendix D). The average execution times of the controller are 38/H and 0.08/H seconds; H equals 15 in the PRP case.
Remark 7. The decrease in the controller's execution time is outstanding, all the more so because the ML controller preserves the evolution's accuracy. In previous research, we obtained a small decrease, but with an acceptable accuracy decrease as well.
How can this outstanding decrease in the controller's execution time be explained? The key is how the EA predictor works. The EA generates a population of solutions and then evaluates the cost function by numerically integrating the PM for each solution, over many generations. These thousands of numerical integrations of the PM are the most time-consuming part. None of this happens when the ML model works: only a single evaluation of a simple regression function. The EA's computational complexity is huge compared to these few calculations.
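A back-of-the-envelope count makes the gap concrete; the EA settings below are illustrative, not the paper's actual parameters:

```python
# PM integrations per closed-loop run: the EA integrates the PM once per
# individual per generation at every sampling period, while the ML
# controller only evaluates one regression formula per period.

pop_size, generations = 50, 100   # hypothetical EA settings
H = 15                            # sampling periods (PRP value)

ea_pm_integrations = pop_size * generations * H   # 75,000 integrations
ml_pm_integrations = 0                            # the PM is never integrated
ml_formula_evaluations = H                        # 15 dot products in total
```

Even with these modest EA settings, tens of thousands of PM integrations are replaced by fifteen formula evaluations, which is consistent with the 38 s versus 0.08 s figures reported above.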
The model Φ2 developed in Section 3.3.2 (with Φ1 denoting the model constructed in Section 3.3.1) leads us to a new ML controller. This time, we used the script ControlLoop_MLSW.m (see Appendix C) to simulate the closed-loop system. Figure 13 plots the resulting CP in blue. The CP in Figure 11 is also plotted in red to facilitate the comparison. The two CPs are practically identical, which entails the similarity of the state evolutions. Figure 14 confirms this and plots the state evolutions produced by the two controllers. This remarkable similarity arises because the models Φ1 and Φ2 emulate the same couple (EA and PM), and both do it very well. Consequently, the performance index is practically the same (J0 = 32.0986). Although the results are identical, the sets of regression functions are different. The function set Φ2 seems more appropriate for the controller's implementation because of its simpler formulas.
In real time, the biggest problem is the difference between the PM's state and the state of the real process. We can consider that the process is affected by a noise representing this difference. When the noise is large, the control loop could completely lose its functionality. Our goal is for the controller to keep acceptable performance over a significant noise range. Inside this range, the controller must reject this difference as much as possible, produce a quasi-optimal process evolution, and yield a performance index near J0 (see Appendix E).

Discussion
This paper positively answers the issue raised in Section 1: whether an ML algorithm could "learn" the optimal behaviour of the predictor based on an EA. The proof was made in the context of OCPs, giving rise to some findings:
1. The problem statement itself: to find an ML model that experiences the datasets generated by the couple (EA and PM), trying to capture its optimal behaviour. An EA is a product of computational intelligence. This link between two "intelligent" entities is interesting; further developments can be derived from this "equivalence". The same issue can be considered when another metaheuristic replaces the EA.
2. The dataset's construction as a dynamic trace of the EA predictor, by aggregating the trajectories and CPs. The number M is a procedure parameter established according to the process complexity. The dynamic trace must include couples (state, optimal control value) spread throughout the evolution space. In this way, the ML model can generalize well.
3. The dataset extraction for each sampling period, which is a premise for finding an ML model for each k.
4. The outstanding decrease in the ML controller's execution time.
5. The design procedure for the ML controller and all associated algorithms (simulation and model-construction algorithms).
Finding #3 is related to the fact that we are looking for an ML model comprising a set of regression functions Φ. We underline the motivation: its simplicity, especially when the control horizon has many sampling periods. If need be, obtaining the model during the sampling period is conceivable. On the other hand, a regression function for each sampling period means that the control law is directly implemented; the controller implementation is now straightforward. The degree to which the ML controller accurately reproduces the behaviour of the couple (EA and PM) is matched by another achievement of the controller: its execution time. Because the initial motivation for this research is precisely the decrease in the EA's execution time, this subject deserves more attention. The simulation time of the closed loop using the EA predictor for the PRP case is 38 s, while the same simulation using the ML controller takes 0.08 s. This outstanding achievement is due to three factors:

•	The ML model is split at the level of each sampling period; a single regression function f_k is the current model;
•	A regression function has a very simple expression, which is, in fact, just the control law;
•	The PM's numerical integration is totally avoided.
According to the authors' experience with this subject, a small decrease in execution time is usually obtained at the price of a decrease in prediction accuracy. In the case of the ML controller, remarkably, that does not happen; the accuracy is preserved.
As mentioned before, predictors based on metaheuristics are usually used for slow-process control. Owing to its small execution time, the ML controller largely extends the set of processes that can be controlled using an EA or another metaheuristic. The EA predictor must be implemented in the first phase of the controller's design, where it produces the datasets for training the ML algorithm. Finally, the ML predictor is used as a faster equivalent of the EA predictor in the control structure.
Special attention was paid to the implementation aspects of the PRP case. All algorithms used in this work are implemented, the associated scripts are attached as supplementary materials, and all the necessary details are given in Appendices A-E. Readers can thus find support to understand and eventually reproduce parts of this work.
Finally, the ML models and the controller must fulfil their task in real conditions, namely when the control system works with a real process. Although the design procedure does not depend on the real process, an interested reader may ask how to forecast the closed-loop behaviour when the real process and the PM differ. Some elements helping to simulate this situation are given at the end of Section 4 and in Appendix D.
To simplify the presentation, we applied the proposed methods only to the PRP case. The methods were also applied to other case studies not presented in this paper, with the same favourable conclusion: the proposed ML models succeeded in capturing the optimal behaviour of the couple (EA and PM) and yielded efficient controllers.
Obviously, another ML model that treats the learning process globally, without splitting it at the level of sampling periods, is conceivable. The resulting model would be more complex and more difficult to train and integrate into the controller, but it could be useful in other applications. We will investigate this direction in a future research project.

Conclusions
The ML algorithm presented in this paper is not in line with the control techniques developed within control systems theory, such as PID, adaptive, robust, nonlinear, and optimal control, MPC, RHC, etc. Their result is a controller that controls a process and fulfils the control objectives. The proposed ML model, stemming from computer science, is the "intelligent equivalent" of the optimal behaviour of the couple (EA and PM). To obtain such a model, we need the dataset capturing the couple's optimal behaviour. Consequently, we must implement the PM and the EA controller and simulate the closed loop many times to obtain the dataset. In other words, we must implement the control technique, RHC with EA predictions, as a sine qua non condition for constructing the ML model. Once available, the latter replaces the EA predictor to achieve the so-called ML controller.
Our proposed approach is suitable for solving our main problem, the decrease in the predictor's execution time, owing to the following aspects:
-	The controller's design procedure is feasible using only offline simulations of the closed loop (with the EA and PM). The real process states' evolution is not needed in this phase. All we need is a large dataset that captures the response of the EA predictor in different states and at different times, regardless of whether the state belongs to a real or a simulated process. The EA predictor acts in the same way because it uses only the PM in both situations.
-	The regression functions are expressed straightforwardly by simple formulas, which are actually the control laws for each sampling period.
-	The ML controller works very well in closed-loop mode; it generalizes accurately when the controller receives the real process states.
-	The controller's execution time decreases remarkably.
The simulations described and used in our work, except the one in Section 4, are part of the controller's design procedure. The result of this procedure, the ML controller, was tested in Section 4 to see whether it generalizes well in a closed-loop structure with a real process. This time, we conducted a simulation study considering the process model identical to the PM. The process evolutions in both cases (with the EA and ML controllers) were quasi-identical, proving that the ML model generalizes very accurately. The case when the PM is affected by an additive noise is also briefly addressed.
Obviously, the ML controller could be used successfully in a real-time control application.The PRP, used as an example, is not a theoretical problem but a real one.So far, many control techniques have been successfully implemented in real time, from numerical integration methods to RHC.As in many other chemical or biochemical processes, every batch must be optimally controlled (with the same parameters [27,28]).The authors have been involved in a real-time control application concerning microalgae growth [8] that could be addressed in a future project using the ML controller.
However, there is a more important challenge: the ML controller opens a new perspective on controlling fast processes (having small time constants) besides slow ones (such as biochemical processes). Even a small sampling period could be sufficient to evaluate a simple formula extracted from the ML model. So, the RHC structure with the EA and PM can be extended to a wider range of processes, but using the ML predictor in the implementation.
The main objective of the presented work was to prove that a machine learning algorithm could "learn" the optimal behaviour of the couple (EA and PM). The proposed algorithm is a multiple linear regression that helped us fulfil this objective; we proved that the optimal behaviour can be learned. The ML model is a set of regression functions, one for each sampling period. This choice was not made to avoid better, possibly harder-to-construct models. Its simplicity, materialized in a non-complex formula, makes it suitable, especially when the control horizon has many sampling periods. The second reason is that the regression function directly implements the control law for each sampling period. Computing the optimal control value involves simple and fast calculations. That is not the case for nonparametric techniques (support vector machines, decision trees, Gaussian process regression, neural networks, …), which must call prediction functions to return the optimal control values.
Our future work on this topic could take two directions. The first refers to regularising the proposed linear regression model (Ridge regression, Lasso regression, and Elastic Net). The second aims to construct a global ML model, not split at the level of each sampling period. We will use nonparametric learning models (especially Gaussian process regression and neural networks). The advantages mentioned above will be lost, but we expect to obtain global ML models covering the entire control horizon for more complex PMs.
Appendix A
The cost function values of the offspring are calculated inside the EA's operators. The variable F is used to stop the iterative process that converges towards the optimal solution. When the performance index is close to J0, the EA stops, considering that a very good solution has been found, and the best solution becomes the predicted control sequence.
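The stopping rule just described can be sketched as follows; the tolerance and the generation cap are hypothetical values, not the paper's EA settings:

```python
# Stop the EA once the best performance index is close enough to the known
# best value J0, or when the generation budget is exhausted.

def should_stop(best_j, j0, generation, tol=1e-3, max_gen=200):
    close_enough = best_j >= j0 * (1.0 - tol)   # within tol of the best value
    return close_enough or generation >= max_gen
```

With J0 = 32.0986 and tol = 1e-3, for instance, the EA would stop as soon as the best index exceeds about 32.066, or after 200 generations.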

Appendix B
The Models' Construction Script
Our implementation is based on the MATLAB system, in which "fitlm", "model.Coefficients", and "predict" correspond to fitting_to_data, get_the_coefficients, and fpredict (the functions proposed in Section 3.3.1), respectively. The algorithms in Figure 8 and Algorithm 1 are joined and implemented by the script Model_Construction.m. The coefficients found by the script Model_Construction.m are listed in the table below.

Appendix E
The ML Controller's Capacity to Reject Noises Affecting the Process State
The general problem of noise modelling depends on the process specificity and is beyond the scope of this work. Our immediate goal is to provide a simple technique for testing the existence of a noise range in which the controller keeps acceptable performance.
In our simulation, we considered that the noise is added to the PM's state variables. For our example, the noise n_i, equivalent to all influences, is added to the i-th component of the PM's state vector. We adopted the hypothesis that the noise is a random variable uniformly distributed in the interval [-L_i, L_i], where L_i is proportional to the state variable's absolute value; hence, the noise scales with the state and cannot annul its influence. We have chosen a constant proportion of 4%, which means an 8% interval for placing the noise. The controller keeps the regression models from Figure 10.
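This noise injection can be sketched in Python as follows; the 4% bound follows the text above, while the function name and state values are ours:

```python
import random

# Add a uniform random perturbation n_i in [-L_i, L_i] to each state
# variable, with L_i = 0.04 * |x_i| (a 4% bound, i.e. an 8% interval), so
# the noise scales with the state and vanishes when the state does.

def add_state_noise(x, fraction=0.04, rng=None):
    rng = rng or random.Random(0)   # fixed seed for reproducible simulations
    return [xi + rng.uniform(-fraction * abs(xi), fraction * abs(xi))
            for xi in x]
```

Calling this on the PM's state vector once per sampling period reproduces the perturbation used in ControlLoop_ML_noise.m-style simulations.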
These elements have been inserted into the control-loop simulation script (ControlLoop_ML_noise.m), which has been run many times. Table A2 presents three CPs and the associated performance indices, used to evaluate whether the control loop has a margin of robustness to noise. The state evolution for the first two simulations is presented in Figure A2. After the simulations, some conclusions can be drawn:
-	The controller succeeded in keeping the stability, but the state evolution changed its look compared to

Citation:
Mînzu, V.; Arama, I. A Machine Learning Algorithm That Experiences the Evolutionary Algorithm's Predictions-An Application to Optimal Control.
It will generate the transfer diagram drawn in Figure 2.

Figure 2. The state trajectory yielded by a control profile.

Figure 3. A set of M quasi-optimal trajectories produced by control-loop simulation using a controller based on the EA and PM.

3.2.1. The Simulation of the Closed-Loop System Based on EA Predictions
This subsection corresponds to step 1 of the design procedure. Figure 4 shows the flowchart of the simulation program for the closed loop (the script ControlLoop_EA.m).

Figure 4. The closed-loop simulation to generate a quasi-optimal trajectory and its CP.

Figure 7. STATE and UstarRHC: data structures representing the M optimal trajectories and CPs.

Figure 8 gives the flowchart of the data processing to obtain the datasets for training and testing the machine learning algorithm at the level of each moment k (datakTrain and datakTest). To save these datasets for each k, we use the cell arrays DATAKTrain (H × 1) and DATAKTest (H × 1).
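In Python, this per-period extraction could look like the following sketch; the array names mirror the paper's STATE and UstarRHC structures, while the sizes, the placeholder zero data, and the 80/20 split are hypothetical:

```python
# For each sampling period k, pair the M stored states with the M optimal
# control values, then split the rows into training and test parts
# (datakTrain / datakTest), collecting one dataset per k.

M, H, n = 100, 15, 2                     # trajectories, horizon, state size
STATE = [[[0.0] * n for _ in range(H + 1)] for _ in range(M)]   # M x (H+1) x n
UstarRHC = [[0.0] * H for _ in range(M)]                        # M x H

def dataset_for_k(k, test_share=0.2):
    rows = [(STATE[m][k], UstarRHC[m][k]) for m in range(M)]
    cut = int(M * (1 - test_share))
    return rows[:cut], rows[cut:]        # (datakTrain, datakTest)

DATAKTrain, DATAKTest = zip(*(dataset_for_k(k) for k in range(H)))
```

The result is one (train, test) pair per sampling period, ready to feed the per-k model construction described in Section 3.3.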

Figure 8. Preparing the training and testing datasets for machine learning at the level of each sampling period.

5. Wait for the next sampling period.

Figure 9. Predicted data and test data of the regression model for k = 4.

Figure 10. Construction of the regression model by the stepwise function for k = 2.

Figure 11. The CP achieved by the ML controller, linear regression version.

Figure 13. The CP achieved by the ML controller, stepwise regression version.

Figure 14. Comparison between the state evolutions involved by the two ML controllers.

Table 1. Dataset for step k, extracted from the STATE and UstarRHC data structures.

Table 2. The regression functions following the stepwise strategy.

Table A1. The coefficients of the linear regression f_k.

Table A2. Three simulations of the control loop with noise.