Global Optimization Algorithm Based on Kriging Using Multi-Point Inﬁll Sampling Criterion and Its Application in Transportation System

: Public trafﬁc has a great inﬂuence, especially with the background of COVID-19. Solving simulation-based optimization (SO) problem is efﬁcient to study how to improve the performance of public trafﬁc. Global optimization based on Kriging (KGO) is an efﬁcient method for SO; to this end, this paper proposes a Kriging-based global optimization using multi-point inﬁll sampling criterion. This method uses an inﬁll sampling criterion which obtains multiple new design points to update the Kriging model through solving the constructed multi-objective optimization problem in each iteration. Then, the typical low-dimensional and high-dimensional nonlinear functions, and a SO based on 445 bus line in Beijing city, are employed to test the performance of our algorithm. Moreover, compared with the KGO based on the famous single-point expected improvement (EI) criterion and the particle swarm algorithm (PSO), our method can obtain better solutions in the same amount or less time. Therefore, the proposed algorithm expresses better optimization performance, and may be more suitable for solving the tricky and expensive simulation problems in real-world trafﬁc problems.


Introduction
In general, simulation optimization methods can be divided into three categories [1]: simulation-based optimization, optimization-based simulation and the optimization of simulation. The simulation-based optimization is basically formed around the black-box concept; its objective function values are calculated through simulation models. The optimization-based simulation uses the optimization process to generate data which are linked to the simulation needed. The optimization of simulation usually focuses on the search of appropriate simulation parameters. Among them, the simulation-based optimization problem is common and worth studying. The objective functions of many simulationbased optimization (SO) problems do not have explicit mathematical expressions, and the evaluation process often relies on complex simulations. For example, Zheng et al. [2] constructed a simulation-based model to simulate the signal timing under uncertainties in the real world, then they optimized the output variables of the model. This SO problem is extremely common in engineering, but difficult to deal with [3], because of its high time-consuming features.
At present, the optimization algorithms can mainly be divided into two categories: gradient-based and non-gradient-based. The gradient-based algorithm is not suitable for the SO, because the objective functions of SO usually do not have explicit mathematical

Literature Reviews about KGO Used in Traffic Area
The Kriging-based global optimization (KGO), which is also called the Bayesian optimization based on the Gaussian process in the machine learning field, has some applications in the transportation system: Li et al. [29] incorporated the fastest-rising ideas in response surface analysis into Bayesian optimization in the field of machine learning and established a SA-BO algorithm model to improve optimization efficiency. In addition, the passenger flow simulation model and the simulation optimization based on the SA-BO algorithm constituted the overall bus schedule simulation optimization based on passenger flow big data. In the work of Lv and Zhao [30], the K-means clustering method was used to establish a logistics park location model, MATLAB software was used to iteratively calculate the established model, and the Bayesian discriminant method was introduced to analyze the reliability of the clustering results. Zhang et al. [31] proposed a deep learning-based multitask (MLT) learning model to predict network-wide traffic speed, and used Bayesian optimization to optimize the hyperparameters of MTL model with limited computational costs. Wang et al. [32] employed the Bayesian optimization for the SVR regression model, which is applied to predict traffic flow; on the basis of it, a novel regression framework for short-term traffic flow prediction with automatic parameter tuning is proposed. Gu et al. [33] extracted decision variables from three aspects: characteristics based on physical state, characteristics based on interactive perception, and characteristics based on road structure, so as to make the factors considered in the decision-making process of lane changing model more comprehensive. Then, in view of the many factors that exist in the decision-making process of free lane changing, for nonlinear problems, a support vector machine (SVM) decision model based on Bayesian optimization algorithm (BOA) is proposed. Tian and Zhang [34] conducted the accurate and effective identification and sorting of black spots in road traffic accidents, proposed an optimized Bayesian black spot identification method based on accident statistics and accident prediction models, and optimized the engineering practicability of this method. At the same time, an optimized empirical Bayesian blackspot sorting method is proposed from two aspects: the degree of the risk of accidents and the improvement space of safety management. In order to improve the estimation accuracy of the OD matrix, a layered optimization OD matrix estimation model based on the Bayesian method is proposed by Yu [35]. This model divided the OD matrix estimation into three optimization problems: (1) wardrop minimum variance optimization model, using it to obtain the path selection probability; (2) least squares optimization problem to obtain OD sample data; (3) maximum likelihood optimization problem to perform parameter estimation.

Kriging Model
Metamodel is also called a surrogate model; it is a simple model used to simulate the complex processes. Kriging is a kind of metamodel. The basic formula of the Kriging model [5] is: where, X is the surrogate model variable; y(X) is the unknown surrogate model; β is the regression coefficient; f (X) is the determined basis function; Z(X) is the error of random distribution, its mean is 0, and its variance is σ Z 2 The covariance is: where, x i and x j are any two sample points in the training sample. [R ij (θ, x i , x j )] is the correlation functions contained θ and represents the positional relationship between the training sample points. The relationship between the sample points is related to the distance, so the given function relationship is: where, n 1 is the number of design variables, and x i k , x j k are the k-th component of the training sample points x i and x j respectively, R k (θ, d k ) is thespatial correlation function (SCF).
If the training sample contains m1 sample points, the predicted value y(x) of x at any point within the range of the design variables is: The relevant parameters are the maximum likelihood estimates θ k , which can be obtained by solving Equation (6): where:

Expected Improvement (EI)
Suppose y min to be the minimum value of the response of the evaluated design point, the expression for the improvement at a point x: I(x) is: Its mathematical expectation can be written as: where, Φ is the cumulative distribution function of the standard normal distribution, and ϕ is the probability density function of the standard normal distribution, y(x) is the predicted value of the Kriging model, while s(x) is the predicted standard deviation of the Kriging model. According to the basic principle of the Kriging proxy model, for any unknown point x, the Kriging model provides the predicted value y(x) and the standard deviation s(x) of the predicted value. How to use the two aspects of information provided by the Kriging model to select the most potential point as the update point is the core issue of the infill sampling criterion. On the one hand, we can select the minimum value of Kriging model prediction value y(x) as the update point. On the other hand, we can select the maximum value of the standard deviation s(x) of the Kriging model as the update point. Selecting the minimum value of y(x), as the update point can fully explore the area near the current optimal solution and further improve the current optimal solution. However, such a search is concentrated in a local area, which may cause the search to fall into a certain local maximum of the original problem advantage. Selecting the maximum value of s(x) as the update point can explore the unknown area as much as possible, and choose the update point in the area where the sampling points are sparse, so that the search jumps out of the local area, but this search is very slow, and requires a large number of supplementary update points to find the optimal solution of the original problem. Since the EI criterion considers both of them, it is still widely used as an efficient method today.

A Multi-Point Infill Sampling Criterion Based on EI Criterion
Traditional infill sampling criteria such as EI criteria only search for single new design point in each iteration, which is inefficient. Moreover, according to Sobester et al. [27], sometimes the EI cannot balance exploitation and exploration. To solve these problems, Feng et al. [28] proposed a multi-point infill sampling criterion based on multi-objective optimization problem (MOP). The MOP can be written as: where, the set x is belong to the design space: D.
In most cases, the various sub-goals of the MOP are in conflict with each other; that is, the improvement of some sub-goals will cause the performance of other sub-goals to decrease. Hence, it is impossible for all sub-goals to reach the optimal rate at the same time. The ultimate goal of the MOP is to coordinate and compromise between each subgoal, so that each sub-goal is as optimal as possible. Therefore, there is a huge difference between the optimal solution of the MOP and the optimal solution of the single objective problem. In order to solve the MOP problem correctly, it is necessary to define the concept of its solution:   It is obvious that the essence of multi-objective optimization is to find a set of nondominated Pareto optimal solutions and their corresponding Pareto front. Hence, if we want to gain exploration-exploitation trade-off design points, solving a bi-objective optimization problem whose two objective functions measure local exploitation and global exploration respectively, is a feasible approach.
The EI can be divided into two parts: EI 1 (x) represents the local exploitation, while the EI 2 (x) represents the global exploration. However, the EI is hard to balance exploration and exploitation.

Solution Algorithm
In order to better solve the MOP (13), this paper employs the decomposition-based multi-objective evolutionary algorithm (MOEA/D) [36,37]. First, the MOP is decomposed into multiple scalar optimization sub-problems; each sub-problem hastens the search speed by exchanging the information of its respective solutions. In order to avoid the algorithm "premature", the exchange of solution information generally occurs between adjacent subproblems, and the adjacent sub-problems are usually determined by the Euclidean distance of the aggregation coefficient. This is because we assume that the closest aggregation coefficient produces the most excellent solutions are also similar. The solutions retained for each sub-problem are the best solutions for the corresponding aggregation coefficient so far. It can be seen that the basis is the decomposition strategy.
The so-called decomposing into multiple scalar optimization sub-problems refers to: instead of processing as a whole but decomposing one into a single-objective optimization problem. The decomposition is achieved through the polymerization method. Common aggregation methods are: weighted sum method, Chebyshev method, and penalty-based boundary crossing method, and can be seen as follows: Chebyshev decomposition method where, tch represents the Chebyshev decomposition method, z is the reference point, is the target number of the multi-objective optimization problem. The setting of the reference point can make the population distribution more uniform and improve the effect of the algorithm. Weighted sum decomposition method This method achieves the purpose of transforming multi-objective optimization into multiple single-objective optimization by multiplying the target vector with its corresponding weight vector.
Among them, ws represents the weighted sum decomposition method. In [30], it is pointed out that this method can achieve better results in multi-objective optimization problems with convex Pareto frontiers, but often cannot achieve better solutions for other situations.
Penalty-based boundary cross decomposition method where, z is the reference point and has the same meaning in the Chebyshev decomposition method. d 1 is the distance from the solution in the target space to the corresponding weight vector, d 2 is the distance from the foot of the solution in the target space to the corresponding weight vector and the reference point, and θ is the penalty factor. The penalty-based boundary cross decomposition method takes the linear sum of d 1 and d 2 through the penalty factor as the optimization goal. This method is more effective than the other two methods for multi-objective optimization problems with more than two objectives, and can produce more uniform solutions. However, this method is very sensitive to penalty parameters, and usually cannot handle different complex multi-objective optimization problems. In general, as the Chebyshev decomposition method is more suitable in solving both non-convex and convex problems, this paper adopts the Chebyshev decomposition method in the process of solving the bi-objective optimization problem (13), and the parameters of MOEA/D are consistent with the setting in literature [36].
The pseudo codes of MOEA/D are summarized in Algorithm 1: suppose EP = ∅ (The ∅ represents a empty set).

3.
Calculate the distance between each weight vector and the ownership vector, take the nearest T weight vectors of each weight vector, and store their index in B. For each i = 1, 2, . . . , N, Randomly or by other methods to generate initial population: Initialize reference point z.

7.
while the stop condition is not met 8.
Generate offspring: randomly select two indexes k and l from B(i), and use analog binary crossover operator to generate offspring individuals x* from x k and x l . 10.
Adjustment: if necessary (out of bounds, etc.), then adjust x*.

11.
Calculate the objective function value F(x*).
Update EP: First delete all target vectors dominated by F(x*) in EP, then add the F(x*) to EP.

27.
end % corresponds to the for in line 8 28. end % corresponds to the while in line 7 29. END Sustainability 2021, 13, 10645 8 of 17

Kriging-Based Global Optimization Based on Multi-Point Infill Sampling Criterion
The framework of the Kriging-based global optimization algorithm using multi-point infill sampling criterion proposed in this paper is summarized as: First, the design of experiment is used to obtain the initial sampling points. Then, in each iteration, the candidate sampling points which balancing exploration and exploitation are generated through solving the MOP in the [28]. To select high-quality sampling points from candidates, the kriging values of them are used. Finally, the optimization is conducting until the stopping condition is reached. The pseudo codes are shown in Algorithm 2: Use design of experiment (DOE) to select a small number V of initial design points: {p 1 , p 2 , . . . , p V } %According to the literature [5], the selection number Vis generally 5d or 11d-1, where d is the number of design variables.
Evaluate the response values R(p i ) of the design point p i 5. end 6.
while the given algorithm termination condition is not met (in actual engineering problems, it is generally judged whether a certain number of iterations has been reached) 7.
Use all known design points and their corresponding objective function values to construct a Kriging model. 8.
Solve the MOP through the decomposition-based multi-objective evolutionary algorithm (MOEA/D). 10.
Obtain end % corresponds to the while in line 6. 26.
output the optimal solution.

END
The flowchart is shown in Figure 1:

Numerical and Engineering Examples Based on the Multi-Point Infill Sampling Criterion
All the experiments are run on Matlab 2018a software in a computer with 8 G memory, Intel i5 CPU and Microsoft Windows 10.

Numerical Analysis
Taking two typical benchmark functions as examples, and KGO using the EI criterio (simply EI criterion) is applied to compare with KGO using the multi-point infill samplin criterion (simply multi-point infill sampling criterion) proposed. The performance evalu ation criteria are the size of the optimized value under the same number of iterations, th size of the optimized value and the number of iterations are used to measure the optim accuracy and time respectively. Hence, in the numerical analysis, the stable number iterations is set as the stopping condition. The contour of the function is shown in Figure 2:

Numerical and Engineering Examples Based on the Multi-Point Infill Sampling Criterion
All the experiments are run on Matlab 2018a software in a computer with 8 GB memory, Intel i5 CPU and Microsoft Windows 10.

Numerical Analysis
Taking two typical benchmark functions as examples, and KGO using the EI criterion (simply EI criterion) is applied to compare with KGO using the multi-point infill sampling criterion (simply multi-point infill sampling criterion) proposed. The performance evaluation criteria are the size of the optimized value under the same number of iterations, the size of the optimized value and the number of iterations are used to measure the optimal accuracy and time respectively. Hence, in the numerical analysis, the stable number of iterations is set as the stopping condition.

Six-Hump Camel Back Function (SC)
The contour of the function is shown in Figure 2: The initial sample points are 10 design points, and the results of the two methods are shown in Figure 3 and Table 1:

Hartman 6 Function (H6)
; 0 1 The initial sample points are 30 design points, and the results of the two points addition methods are shown in Figure 4 and Table 2: The initial sample points are 10 design points, and the results of the two methods are shown in Figure 3 and Table 1: The initial sample points are 10 design points, and the results of the two methods are shown in Figure 3 and Table 1: The initial sample points are 30 design points, and the results of the two points addition methods are shown in Figure 4 and Table 2:  The initial sample points are 30 design points, and the results of the two points addition methods are shown in Figure 4 and Table 2:   It can be seen from the above comparison results that the method proposed in this paper has better optimization results. This means that when using high-performance parallel computer, the method proposed in this paper can obtain better optimal solutions at the same time. On high-dimensional issues, the advantages of the multi-point infill sampling criterion are more obvious, and this feature is more in line with real-world engineering problems.

The Test of MOEA/D Parameters
In this section, the influence of MOEA/D parameters (number of neighbors T, the maximum number of iterations gen) is studied by SC function in Tables 3 and 4: It can be seen from the above comparison results that the method proposed in this paper has better optimization results. This means that when using high-performance parallel computer, the method proposed in this paper can obtain better optimal solutions at the same time. On high-dimensional issues, the advantages of the multi-point infill sampling criterion are more obvious, and this feature is more in line with real-world engineering problems.

The Test of MOEA/D Parameters
In this section, the influence of MOEA/D parameters (number of neighbors T, the maximum number of iterations gen) is studied by SC function in Tables 3 and 4:  Table 4. Ten average optimal results of different T.

Optimization Model and Process
In this paper, optimizing the corporate revenue in the simulation model of transportation system [38] is employed as the engineering case to verify the performance of multi-point infill sampling criterion. The simulation model of the transportation system is based on a data-driven timetable optimization method and details of its construction can be seen in the literature [38].
In the simulation model of the transportation system, the variables with decisionmaking value mainly include the maximum number of departures I, the minimum departure interval h min , and the maximum departure interval h max . This article intends to optimize these parameters to improve corporate revenue, and the optimization model constructed is as follows: Subject to: Among them, N p corresponds to the number of passengers under the timetable, and F p is the passenger fare. N v is the number of departures under the corresponding timetable, and C v is the operating cost coefficient for the vehicle to complete one service. T is the time interval between the first and the last bus, which needs to be determined according to the data selection range.

Data Description
The 445 bus line in Beijing city is selected in the case study. There are 19 bus stops on the route, and the collection time of GPS trajectory data and Smart Card data was from 1 to 30 November 2017. In the case of this paper, select the data between 17:30 and 18:30. The data information is shown in Tables 5 and 6: According to the actual condition of the 445 bus line, the value ranges of the three decision variables: maximum number of departures I, minimum departure interval h min and maximum departure interval h max are set as follows: Among them, I should be an integer, and h min and h max can be accurate to one decimal place. Their original values are set to be consistent in the literature [38].
The maximum passenger capacity of the bus is 50, the interval between the first and last buses T is 60 min, the fare of the 445 bus is 3 yuan/person, and the operating cost coefficient of the bus is 39.6 yuan/car.

Result Analysis
The particle swarm algorithm (PSO) and multi-point infill sampling criterion are applied to optimize corporate revenue respectively. Due to the long optimization time, the number of populations in PSO cannot be selected too large. The selections are 20, 50, 80 and 100 in this test. The maximum number of iterations is set to 20, the maximum particle velocity is set to 0.1, the minimum velocity is set to 0, and the maximum number of function evaluations is 1000. The optimization results are shown in Table 7 and Figures 5-8:    Since the time for evaluating corporate revenue is about 900 s, it is much longer than the running time of the optimization algorithm. Therefore, the number of iterations can be used to measure the optimization efficiency of the algorithm under the use of parallel computing technology, and the number of function evaluations can be used to measure the optimization efficiency of the algorithm under the inapplicable parallel technology. The results explain that the multi-point infill sampling criterion can get better optimization results, which is an increase of 7.2% compared to the best result of PSO, 17.5%, compared to the worst result of PSO, and even 45%, compared to the suboptimal value. In terms of optimization efficiency: under the premise of using parallel technology, the multi-point criterion can increase the efficiency by 50% on the basis of obtaining better optimization solutions; under the premise of not using the combination technology, the efficiency of this method is improved more obviously: 64%, 92.5%, 92.8%. It can be seen that, in dealing with expensive black box problems, the method proposed in this paper is far stronger than the PSO in terms of solution accuracy and optimization efficiency.        Since the time for evaluating corporate revenue is about 900 s, it is much longer than the running time of the optimization algorithm. Therefore, the number of iterations can be used to measure the optimization efficiency of the algorithm under the use of parallel computing technology, and the number of function evaluations can be used to measure the optimization efficiency of the algorithm under the inapplicable parallel technology. The results explain that the multi-point infill sampling criterion can get better optimization results, which is an increase of 7.2% compared to the best result of PSO, 17.5%, compared to the worst result of PSO, and even 45%, compared to the suboptimal value. In terms of optimization efficiency: under the premise of using parallel technology, the

Implications
Summary of Sections 4.1 and 4.2, our optimization method has its advantage in solving simulation-based optimization problems. In some developing countries with large populations like China, traffic resources such as urban roads are relatively limited and traffic congestion is getting worse. Public transportation is an important means to alleviate traffic congestion. In the decision-making stage, solving simulation-based optimization problems can provide a good reference for decision makers. Hence, our method has a positive impact on sustainable development, to a certain extent. Moreover, this method is board, not only for the simulation-based optimization in traffic area. In other areas like, for example, simulation model optimization of groundwater dredging, our method can also be tried. As a general method, simulation-based optimization is of great help to the decision-making of sustainable development.

Conclusions and Discussion
Simulation-based optimization is a common but difficult problem to deal with. In order to solve the expensive optimization problem efficiently, this study proposes a novel multi-point infill sampling method based on solving a multi-objective optimization problem to obtain exploration-exploitation trade-off points in each iteration, and then improve the performance of Kriging-based global optimization, which is also called a Bayesian optimization based on Gaussian process in the machine learning field. Moreover, the proposed method may deal with the real-world problems better. The main conclusions are shown as follows: 1.
Considering the disadvantages of EGO and EGO-MO, this paper proposes a Krigingbased global optimization using multi-point infill sampling criterion. The characteristic of comparison to the already existing research is that the multi-point infill sampling criterion uses the method of EGO-MO to generate candidate sampling points, and the Kriging predicted values are employed as judgment standard. In this way, the extra parameters required are greatly reduced.

2.
At present, in the field of transportation, there are a few research studies on how to deal with simulation-based optimization problems. Therefore, the method proposed in this paper has certain reference significance for other time-consuming optimization problems in the transportation field.
The core of our proposed algorithm is the multi-point infill sampling criterion. The criterion selects multiple exploration-exploitation trade-off points to update the Kriging model in each iteration. However, our criterion is mainly based on the traditional EI function-it is difficult to search the area, except the current minimum. Then, how to solve the constructed MOP better is also a problem worth investigating. In addition, the method in this paper is the lack of more practical applications. In the future, we will be more focused on the black-box optimization-combined time-consuming simulations of real-world traffic problems and continuously improve our algorithm in practical applications.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.