Two-Population Coevolutionary Algorithm with Dynamic Learning Strategy for Many-Objective Optimization

Because many-objective optimization problems are complex, existing many-objective optimization algorithms cannot solve all of them well, especially those with complex Pareto fronts. To address the shortcomings of existing algorithms, this paper proposes a coevolutionary algorithm based on a dynamic learning strategy. Evolution is realized mainly by applying the Pareto criterion and a non-Pareto criterion to two populations, respectively, and information exchange between the two populations is used to better explore the whole objective space. The dynamic learning strategy acts on the non-Pareto evolution to improve convergence and diversity. In addition, a dynamic convergence factor is proposed, which changes according to the evolutionary state of the two populations. Through these heuristic strategies, the proposed algorithm maintains the convergence and diversity of the final solution set. The proposed algorithm is compared with five state-of-the-art algorithms and two weight-sum-based algorithms on a many-objective test suite, and the results are measured by the inverted generational distance and hypervolume indicators. The experimental results show that, compared with the five state-of-the-art algorithms, the proposed algorithm achieves the best performance in 47 of the 90 cases evaluated by the two indicators. Compared with the weight-sum-based algorithms, it achieves the best performance in 83 of the 90 cases.


Introduction
In recent years, with the development of technology, more and more new problems have appeared in fields such as industry and control. These problems usually contain more than one objective function to be optimized, and the objective functions conflict with one another. Such problems are generally called multi-objective optimization problems (MOPs) or many-objective optimization problems (MaOPs). Generally, MOPs have two or three objectives, and MaOPs have more than three objectives [1]. A MaOP is defined as follows:

$$\min_{x \in X} F(x) = \left(f_1(x), f_2(x), \ldots, f_M(x)\right)^{T} \qquad (1)$$

where M is the number of objectives, $x = (x_1, x_2, \ldots, x_n)$ is the decision vector, and $X \subseteq \mathbb{R}^n$ is the n-dimensional real decision space. F maps the decision space to the objective space, and F(x) consists of M objective functions $f_i(x)$ (i = 1, 2, ..., M).
Generally, the optimal solutions of MaOPs are distributed on the Pareto front (PF), and the solutions on the PF represent trade-offs among all M objectives. Therefore, no single solution can be optimal on all objectives simultaneously. Solving a MaOP instead means obtaining a finite set of solutions that represents the whole PF well in terms of both convergence and diversity. Evolutionary algorithms (EAs) have unique advantages in solving MaOPs because they are population-based. Over years of development, researchers worldwide have proposed various many-objective evolutionary algorithms (MaOEAs) to solve MaOPs. Because different MaOPs have different characteristics, no general algorithm can currently solve all MaOPs perfectly.
MaOPs usually have more than three objectives, so the objective space cannot be visualized, and formulas for high-dimensional spaces often have to be derived by generalizing from cases with fewer objectives. For example, the grid-based evolutionary algorithm (GrEA) [2] derives its high-dimensional grid formula from a two-dimensional one. Moreover, as the number of objectives increases, the number of non-dominated individuals in the population increases exponentially. Studies have shown that almost all solutions in the population are non-dominated when M > 12 [3]. As a result, the selection pressure of algorithms based on non-dominated sorting decreases, and they cannot solve MaOPs well. In addition, the shape and density of the PF vary greatly among problems, which makes it very challenging to obtain a solution set with good convergence and diversity.
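The loss of selection pressure can be illustrated numerically. The sketch below is a toy experiment, not the study cited as [3]: it samples points uniformly at random in the unit hypercube and measures the fraction that are non-dominated (minimization) as the number of objectives M grows. The sample size, seed, and uniform sampling are arbitrary assumptions.

```python
import random

def dominates(a, b):
    """True if objective vector a Pareto-dominates b under minimization."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def nondominated_fraction(num_points, num_objectives, seed=1):
    """Fraction of uniformly random points in [0, 1]^M that no other point dominates."""
    rng = random.Random(seed)
    pts = [[rng.random() for _ in range(num_objectives)] for _ in range(num_points)]
    count = sum(1 for p in pts if not any(dominates(q, p) for q in pts if q is not p))
    return count / num_points

for m in (2, 3, 5, 8, 12):
    print(m, nondominated_fraction(100, m))
```

With 100 random points, only a handful are non-dominated for M = 2, while nearly all are for M = 12, so Pareto ranking alone can barely separate them.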
To overcome these difficulties, a number of MaOEAs have been proposed. For example, NSGA-III [5,6] builds on the fast and elitist multi-objective genetic algorithm (NSGA-II) [4] by introducing reference points that, together with the concept of dominance, guide the convergence of individuals. The knee point-driven evolutionary algorithm (KnEA) [7] uses knee points to guide convergence, and GrEA introduces a grid to choose the better of two non-dominated individuals. In addition, indicator-based algorithms have been proposed, such as indicator-based selection in multi-objective search (IBEA) [8] and the fast hypervolume-based algorithm (HypE) [9], which use the $I_{\epsilon+}$ [10] and hypervolume (HV) [11,12] indicators, respectively, as selection criteria. Selection then drives the whole population, or individual solutions, toward better indicator values. Finally, the multi-objective evolutionary algorithm based on decomposition (MOEA/D) [13] uses mathematical decomposition to split a MaOP into a number of scalar subproblems that are optimized simultaneously. Common decomposition approaches include the weighted sum approach, the Tchebycheff approach, and the penalty-based boundary intersection (PBI) approach. MOEA/D-PaS [14] combines decomposition with Pareto adaptive scalarizing methods to balance the selection pressure toward Pareto optimality against the algorithm's robustness to the PF shape.
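The three decomposition approaches named above can be written as scalarizing functions of an objective vector f, a weight vector w, and (for Tchebycheff and PBI) an ideal point z*. The sketch below follows the usual textbook forms; the PBI penalty theta = 5 is a common default, assumed here rather than taken from this paper.

```python
import math

def weighted_sum(f, w):
    """Weighted sum approach: g(f | w) = sum_k w_k * f_k."""
    return sum(wk * fk for wk, fk in zip(w, f))

def tchebycheff(f, w, z):
    """Tchebycheff approach: g(f | w, z*) = max_k w_k * |f_k - z*_k|."""
    return max(wk * abs(fk - zk) for wk, fk, zk in zip(w, f, z))

def pbi(f, w, z, theta=5.0):
    """Penalty-based boundary intersection: d1 + theta * d2."""
    diff = [fk - zk for fk, zk in zip(f, z)]
    norm_w = math.sqrt(sum(wk * wk for wk in w))
    # d1: length of the projection of (f - z*) onto the direction of w
    d1 = abs(sum(dk * wk for dk, wk in zip(diff, w))) / norm_w
    # d2: perpendicular distance from f to the ray from z* along w
    foot = [zk + d1 * wk / norm_w for zk, wk in zip(z, w)]
    d2 = math.sqrt(sum((fk - pk) ** 2 for fk, pk in zip(f, foot)))
    return d1 + theta * d2

print(weighted_sum([1, 2], [0.5, 0.5]), tchebycheff([1, 2], [0.5, 0.5], [0, 0]))
print(pbi([1, 0], [1, 1], [0, 0]))
```

Each subproblem in MOEA/D minimizes one of these functions for its own weight vector; the choice of function determines which parts of the PF the weights can reach.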
These MaOEAs also have disadvantages. Pareto-based algorithms converge slowly and may even fail to converge to the PF. Indicator-based algorithms often favor one or a few special regions of the PF; although they generally converge relatively fast, the diversity of their solution sets is usually poor. Decomposition-based MaOEAs depend heavily on the choice of decomposition approach: for example, when the weighted sum approach is applied to a problem whose PF is non-convex (for minimization), not all Pareto optimal vectors can be obtained [13]. Moreover, when the shape and density of the PF are complex and variable, traditional methods for generating weight vectors are unsuitable.
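The weighted-sum limitation can be checked on a small example. The sketch below assumes a hypothetical non-convex front (for minimization), the quarter circle f1^2 + f2^2 = 1 sampled at 101 points, and compares which points the weighted sum and the Tchebycheff approach (ideal point at the origin) select for five weight vectors.

```python
import math

# Hypothetical non-convex front (minimization): the quarter circle f1^2 + f2^2 = 1.
front = [(math.cos(t), math.sin(t)) for t in (k * (math.pi / 2) / 100 for k in range(101))]
weights = [(w1, 1.0 - w1) for w1 in (0.1, 0.3, 0.5, 0.7, 0.9)]

def weighted_sum(f, w):
    return w[0] * f[0] + w[1] * f[1]

def tchebycheff(f, w):
    return max(w[0] * f[0], w[1] * f[1])  # ideal point z* = (0, 0)

def points_found(scalarize):
    """Distinct front points that minimize the scalarizing function for some weight."""
    return {min(front, key=lambda f: scalarize(f, w)) for w in weights}

print(sorted(points_found(weighted_sum)))  # only the two extreme points
print(len(points_found(tchebycheff)))      # one interior point per weight vector
```

Because every weighted-sum contour line touches this front first at an end point, no weight vector recovers the interior of the front, whereas each Tchebycheff weight selects a different interior point.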
Existing multi-objective evolutionary algorithms (MOEAs) are still affected by the growing number of objectives, especially on MaOPs, and the complexity of the PF poses further challenges. A dynamic learning strategy, however, can focus on improving convergence in the early evolutionary stage, when the population converges poorly, and on improving diversity in the late evolutionary stage. In addition, coevolution is a promising way to improve the quality of individuals through cooperation (or competition) among multiple populations (or subpopulations). Through coevolution, key information about the population, such as its evolutionary state, can be obtained during the iterations of the algorithm and fed back flexibly to adjust the dynamic level of the dynamic learning strategy (discussed in Section 4). In conclusion, combining a dynamic learning strategy with a coevolution model is a promising way to improve both the convergence and the diversity of the population.
Mathematics 2021, 9, 420 3 of 34
The rest of this paper is organized as follows. Section 2 introduces related work on coevolutionary algorithms, other background, and the motivation for a two-population coevolutionary algorithm with a dynamic learning strategy (DL-TPCEA). Section 3 gives the background of MaOPs. Section 4 introduces the algorithm framework, process details, and parameter settings. Sections 5 and 6 present the experimental analysis and the conclusion, respectively.

Related Works
In biology, coevolution is the adaptive, reciprocal evolution of two interacting species over the course of evolution: a type of genetic evolution in which one species is influenced by another. At the biological level, coevolution has three major roles:
1. It promotes biological diversity;
2. It promotes the co-adaptation of species;
3. It maintains the stability of the biological community.
This idea has been successfully introduced into computer algorithms, and more and more researchers are paying attention to the performance improvements that coevolution brings to EAs. To handle increasingly complex problems, Potter et al. incorporated coevolution into EAs: their work extended the evolutionary paradigm of the time and described an architecture that evolves subcomponents as collections of cooperating species [15]. They then analyzed the robustness of cooperative coevolutionary algorithms (CCEAs), which provided a theoretical basis for the effectiveness of coevolutionary strategies [16]. Wiegand et al. used evolutionary game theory (EGT) models to help understand CCEAs and to analyze whether CCEAs are really suitable for optimization tasks [17]. One of these EGT models, the multi-population symmetric game, can be used to analyze and model coevolutionary algorithms. In this context, coevolution decomposes an evolving population into several small subpopulations that do not interfere with one another, and individuals are then optimized continuously through the cooperation of the subpopulations. The effectiveness of CCEAs has been verified on complex problems (or structures) [18,19].
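The cooperative scheme described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the architecture of [15]: variables are split into fixed groups, each subpopulation evolves only its own group by Gaussian mutation and truncation selection, and members are evaluated by plugging them into a shared context vector (the best-known full solution). The sphere test function, mutation step, and sizes are hypothetical choices.

```python
import random

def sphere(x):
    """Separable test function: sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)

def cooperative_coevolution(f, n, groups, pop_size=10, gens=50, seed=0):
    rng = random.Random(seed)
    # One subpopulation per group; each member holds values only for its group's variables.
    subpops = [[[rng.uniform(-5, 5) for _ in g] for _ in range(pop_size)] for g in groups]
    # Context vector: the best-known full solution shared by all subpopulations.
    context = [rng.uniform(-5, 5) for _ in range(n)]

    def evaluate(group, part):
        trial = list(context)
        for idx, val in zip(group, part):
            trial[idx] = val
        return f(trial)

    for _ in range(gens):
        for g, subpop in zip(groups, subpops):
            # Gaussian mutation produces offspring for this subcomponent only.
            offspring = [[v + rng.gauss(0, 0.5) for v in ind] for ind in subpop]
            merged = subpop + offspring
            merged.sort(key=lambda part: evaluate(g, part))
            subpop[:] = merged[:pop_size]  # elitist truncation selection
            # Cooperation: write an improving representative back into the context.
            if evaluate(g, subpop[0]) < f(context):
                for idx, val in zip(g, subpop[0]):
                    context[idx] = val
    return context

best = cooperative_coevolution(sphere, 6, [[0, 1], [2, 3], [4, 5]])
print(sphere(best))
```

The context vector is what lets the otherwise independent subpopulations cooperate: a subcomponent is only ever judged in combination with the current best parts of the others.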
The coevolution strategy of CCEAs groups the population, which suits large-scale optimization problems (LSOPs): the dimension of the decision variables in LSOPs is so high that grouping is currently a good solution. This was also the original application scenario of CCEAs. Yang et al. observed that traditional CCEAs can decompose and handle separable LSOPs but often fail on non-separable ones. They therefore introduced a random grouping scheme and adaptive weighting into problem decomposition and coevolution, and replaced the traditional evolutionary algorithm with a new differential evolution algorithm. With these improvements, the algorithm can effectively optimize 1000-dimensional non-separable problems [20]. In addition, a multilevel cooperative coevolution (MLCC) framework [21] was proposed for LSOPs. MLCC determines the group size used when decomposing the problem: it constructs a set of problem decomposers based on random grouping with different group sizes, and uses an adaptive mechanism that selects decomposers according to their historical performance, thereby self-adapting between levels.
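Random grouping and performance-based selection of a group size can be sketched as follows. The exponential weighting and the constant k = 7 in `pick_group_size` are assumptions for illustration and may not match the exact rule of MLCC [21]; the candidate sizes and performance records in the usage lines are hypothetical.

```python
import math
import random

def random_grouping(n, group_size, rng):
    """Randomly permute variable indices 0..n-1 and cut them into consecutive groups."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i:i + group_size] for i in range(0, n, group_size)]

def pick_group_size(candidates, performance, rng, k=7.0):
    """Pick a decomposer (group size) with probability that grows with its recorded
    performance; the exp(k * p) weighting is an assumed, illustrative rule."""
    weights = [math.exp(k * p) for p in performance]
    return rng.choices(candidates, weights=weights, k=1)[0]

rng = random.Random(42)
sizes = [5, 10, 25, 50, 100]
records = [0.1, 0.4, 0.2, 0.0, 0.0]  # hypothetical relative improvements per size
size = pick_group_size(sizes, records, rng)
groups = random_grouping(1000, size, rng)
print(size, len(groups))
```

Re-drawing the random permutation every cycle gives interacting variables repeated chances to land in the same group, which is why random grouping helps on non-separable problems.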
CCEAs have also been applied to optimization problems in other scenarios. Liu et al. used cooperative coevolution (CC) to speed up evolutionary programming (EP) [22]; their study showed that the time cost increased linearly with the problem dimension. CC has also been used for global optimization to find global optimal solutions [23]. Chen et al. proposed cooperative coevolution with variable interaction learning (CCVIL) [24], which initially treats all variables as independent and places each in its own group, and then merges groups whenever interactions between their variables are discovered during the iterations.
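The merge-on-interaction idea can be sketched with a union-find structure. The pairwise interaction test below is a differential-grouping style check (does the effect of changing x_i depend on x_j?), used here as a plainly swapped-in stand-in for CCVIL's own learning procedure; the test function and base point are hypothetical.

```python
def interacts(f, base, i, j, delta=1.0, eps=1e-9):
    """Non-separability check: compare the change caused by x_i with and without x_j shifted."""
    def shifted(indices):
        y = list(base)
        for idx in indices:
            y[idx] += delta
        return f(y)
    d1 = shifted([i]) - f(base)
    d2 = shifted([i, j]) - shifted([j])
    return abs(d1 - d2) > eps

def learn_groups(f, n, base):
    """Start with one group per variable and merge groups when an interaction is found."""
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for i in range(n):
        for j in range(i + 1, n):
            if find(i) != find(j) and interacts(f, base, i, j):
                parent[find(i)] = find(j)
    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())

# Hypothetical function: x0 and x1 interact, x2 is separable.
print(learn_groups(lambda x: x[0] * x[1] + x[2] ** 2, 3, [1.0, 1.0, 1.0]))
```

Pairwise testing costs O(n^2) function evaluations, which is one reason practical methods learn interactions incrementally during the search rather than up front.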
Beyond these problems, many researchers have recently begun to apply CC to MaOPs. Tan et al. effectively combined SPEA2 and CC to propose SPEA2-CC [25]. In experimental comparisons, SPEA2-CC performed significantly better than the original SPEA2 as the number of objectives increased, which supports the scalability of CC to MaOPs.
Several researchers have combined CC with the preferences of the decision maker to handle MaOPs, leading to the preference-inspired coevolutionary algorithm (PICEA) [26]. PICEA has been shown to handle not only MOPs but also MaOPs [27], and experiments showed that this preference-driven coevolutionary algorithm outperformed several other methods as measured by the hypervolume indicator. One defect of PICEA is that the obtained solutions are unevenly distributed on the PF, that is, their diversity is poor. To solve this problem, an improved fitness assignment method (PICEA-g) [28] was proposed that considers the density information of solutions. In addition, a preference-inspired coevolutionary algorithm using weight vectors (PICEA-w) [29] was proposed, in which the weight vectors coevolve with the candidate solutions during the search. Coevolution adaptively constructs appropriate weights during optimization and thus effectively guides the candidate solutions toward the PF.
Liang et al. proposed a multi-objective coevolutionary algorithm based on decomposition [30] that uses subpopulations to enhance individual objectives. Multiple subpopulations and an external archive are evolved with the differential evolution (DE) operator to improve each objective and to diversify the trade-offs among the archived solutions. When an objective is no longer being improved, the computing resources devoted to it are reallocated to the other objectives, and the external archive strengthens the trade-offs among all objectives. In another approach, the PF was approximated by parallel subpopulations [31]: the MaOP is first decomposed using uniformly distributed weight vectors, each subpopulation is associated with one weight vector and optimizes the corresponding subproblem, and elite individuals in the subpopulations are used to produce offspring. This not only enhances the diversity of the population but also accelerates convergence.
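Uniformly distributed weight vectors are commonly generated with the simplex-lattice (Das and Dennis) design: every vector whose entries lie in {0, 1/H, ..., 1} and sum to 1. The text does not state which generator [31] uses, so this sketch assumes the simplex-lattice design; it enumerates the vectors by "stars and bars", giving C(H + M - 1, M - 1) weights for M objectives and H divisions.

```python
from itertools import combinations
from math import comb

def simplex_lattice_weights(m, h):
    """All m-dimensional weight vectors with entries in {0, 1/h, ..., 1} summing to 1."""
    weights = []
    for bars in combinations(range(h + m - 1), m - 1):
        parts, prev = [], -1
        for b in bars:
            parts.append(b - prev - 1)  # stars between consecutive bars
            prev = b
        parts.append(h + m - 2 - prev)  # stars after the last bar
        weights.append([p / h for p in parts])
    return weights

W = simplex_lattice_weights(3, 4)
print(len(W), comb(4 + 3 - 1, 3 - 1))
```

Each weight vector can then be assigned to one subpopulation. The count grows combinatorially with M, which is one reason fixed lattices become awkward for many objectives and irregular fronts.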
Other studies have used new approaches to further improve the performance of CC on MaOPs. Shu et al. proposed a preference-inspired coevolutionary algorithm with goal vectors oriented by local principal component analysis (PCA), PICEA-g/LPCA [32]. PICEA-g/LPCA builds on PICEA-g and uses local PCA to extend its capability and improve convergence. In addition, a coevolutionary particle swarm optimization algorithm with a bottleneck objective learning (BOL) strategy [33] was proposed to meet the convergence and diversity challenges posed by a finite population size. In this algorithm, multiple subpopulations coevolve to maintain diversity, and the BOL strategy improves convergence on all objectives. An elitist learning strategy (ELS) helps the search escape local PFs, and a juncture learning strategy (JLS) explores regions that are missing from the PF.
As described in Section 1, the solution sets obtained by Pareto-based MOEAs are well distributed on the PF, but these methods generally converge slowly, and their performance declines as the number of objectives increases. Non-Pareto MOEAs show good convergence but poor diversity: their solution sets tend to converge to one or a few special regions of the PF, especially when the PF is highly irregular. Li et al. proposed the bi-criterion evolution (BCE) framework in 2015 [43], which performs well in many-objective optimization. In BCE, two populations evolve simultaneously, one using the Pareto criterion (PC) and the other a non-Pareto criterion (NPC), with the aim of exploiting the advantages of both approaches while compensating for their shortcomings. The two populations promote evolution through information exchange: the NPC population leads the PC population toward convergence, and the PC population compensates for the NPC population's loss of diversity. The framework includes two operations, population maintenance and individual exploration, which respectively preserve good non-dominated individuals and explore regions that the NPC population has not explored. Although BCE does not use the subpopulation coevolution of CC, the cooperation between its two populations can also be regarded as a form of CC.
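One direction of the information exchange can be sketched as follows: new solutions produced by the NPC population are merged into the PC population, which keeps only the non-dominated members. This is a simplified sketch; BCE's full population maintenance also prunes crowded individuals when the PC population grows too large, which is omitted here, and the sample objective vectors are hypothetical.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b under minimization."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def update_pc_population(pc, npc_new):
    """Keep the non-dominated members of PC united with new NPC solutions (duplicates dropped)."""
    merged = []
    for p in pc + npc_new:
        if p not in merged:
            merged.append(p)
    return [p for p in merged if not any(dominates(q, p) for q in merged)]

pc = [(1, 4), (3, 3)]
npc_new = [(2, 2), (4, 1)]
print(update_pc_population(pc, npc_new))
```

Here the well-converged NPC solution (2, 2) pushes out the PC member (3, 3), which is the sense in which the NPC population leads the PC population toward convergence.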
A dynamic learning strategy (DLS) takes the evolutionary state of the solution set into account during the iterations of the algorithm. The initial population of an MOEA is usually generated randomly without specific requirements: each solution takes values in the domain $[x_{min}, x_{max}]$ according to

$$x = x_{min} + rand \cdot (x_{max} - x_{min})$$

where rand is a random number drawn uniformly from [0, 1]. The initial population therefore converges very poorly; its individuals are simply random points in the solution space. DLS exploits this: in the initial stage of evolution, it devotes more computing resources to selecting convergence-related solutions so that the solutions converge rapidly. As the iterations proceed and the solutions approach the PF, the solution set must be kept diverse, so computational resources gradually shift toward diversity-related solutions to maintain a good distribution of the population on the PF. An effective combination of BCE and DLS may therefore yield good results, as confirmed by the experimental results in Section 5. This paper exploits the information exchange between the two coevolving populations and introduces DLS into the environmental selection of the NPC population to improve its evolution, with the cost value (CV) [44] as the indicator. The resulting algorithm, DL-TPCEA, is described in detail in Section 4.
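Random initialization can be sketched directly from the formula above. The sketch assumes rand is drawn uniformly from [0, 1] independently for each decision variable, which keeps every variable inside its bounds; a standard normal rand would not guarantee that without clipping or repair. The bounds and population size in the usage line are hypothetical.

```python
import random

def initialize_population(pop_size, x_min, x_max, rng):
    """Each variable: x = x_min + rand * (x_max - x_min), rand ~ U[0, 1)."""
    return [[lo + rng.random() * (hi - lo) for lo, hi in zip(x_min, x_max)]
            for _ in range(pop_size)]

pop = initialize_population(50, [0.0, -1.0], [1.0, 1.0], random.Random(3))
print(pop[0])
```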

The Background of MaOPs
At present, many single-objective optimization problems have become research foci, such as workshop scheduling problems [45][46][47][48][49] and numerical optimization problems [50,51]. Most of them can be solved by classical algorithms and their improved versions, such as the artificial bee colony algorithm (ABC) [47,48,52,53], particle swarm optimization (PSO) [51,54], monarch butterfly optimization (MBO) [55][56][57][58], ant colony optimization (ACO) [59,60], the krill herd algorithm (KH) [52][61][62][63][64], elephant herding optimization (EHO) [65][66][67], and other metaheuristic algorithms [68][69][70][71][72][73][74][75][76][77]. Many-objective problems, however, cannot be solved by single-objective techniques: because the objectives conflict, they cannot all be optimized simultaneously. MaOPs also have different characteristics, described in more detail below. Current MaOPs are mainly divided into the following categories.

(1) General MaOPs: As given in Equation (1), general MaOPs are problems with M conflicting objectives. The overall goal in solving them is to obtain a solution set that characterizes the PF, but several difficulties arise. Because of the larger number of objectives, MaOPs are harder to solve than MOPs. Low-dimensional problems are mainly solved by non-dominated sorting, as in NSGA-II [4] and the improved strength Pareto evolutionary algorithm (SPEA2) [78]. Non-dominated sorting is based on Pareto dominance, defined as follows for minimization. For two vectors $x_1$ and $x_2$ in the decision space X, $F(x_1)$ Pareto dominates $F(x_2)$, denoted $F(x_1) \succ F(x_2)$, if and only if $f_i(x_1) \le f_i(x_2)$ for every $i \in \{1, 2, \ldots, M\}$ and $f_j(x_1) < f_j(x_2)$ for at least one $j \in \{1, 2, \ldots, M\}$. If no $x \in X$ satisfies $F(x) \succ F(x^*)$, then $F(x^*)$ is called a Pareto optimal solution and $x^*$ a Pareto optimal point. The set of all Pareto optimal solutions is the PF mentioned above, and the set of all Pareto optimal points is called the Pareto set (PS).
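Non-dominated sorting, which assigns every individual the index of its non-dominated layer (front), can be sketched by repeatedly peeling off the current non-dominated set. This simple version costs O(M N^3) in the worst case and is meant for clarity only; NSGA-II's fast non-dominated sort achieves O(M N^2). The sample objective vectors are hypothetical.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b under minimization."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def nondominated_sort(objs):
    """Return the 1-based front index of each objective vector."""
    front_no = [0] * len(objs)
    remaining = set(range(len(objs)))
    level = 1
    while remaining:
        # Members of the current front: not dominated by any other remaining vector.
        current = [i for i in remaining
                   if not any(dominates(objs[j], objs[i]) for j in remaining if j != i)]
        for i in current:
            front_no[i] = level
        remaining -= set(current)
        level += 1
    return front_no

print(nondominated_sort([(1, 4), (2, 2), (4, 1), (3, 3), (4, 4)]))
```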
(2) Large-scale MaOPs: These problems involve high-dimensional decision variables. In general, MOPs whose decision variable dimension exceeds 100 are called large-scale MOPs (LSMOPs) [79]. The performance of MaOEAs degrades as the number of decision variables grows; for example, when a mutation operator is applied, the probability of producing good offspring decreases because of the large number of decision variables. Most existing work on LSMOPs classifies the decision variables and handles each class separately.
Ma et al. proposed a many-objective evolutionary algorithm based on decision variable analysis (MOEA/DVA) [80], which divides the decision variables into convergence-related and diversity-related variables through a decision variable analysis strategy and optimizes the two groups separately, so that both the convergence and the diversity of the population are well maintained. Zhang et al. [81] proposed an evolutionary algorithm based on decision variable clustering for large-scale many-objective optimization problems (LMEA). LMEA uses k-means clustering, with the angle between the solutions and the direction of convergence as the clustering feature, to divide the decision variables into convergence-related and diversity-related variables. LMEA further classifies the variables that MOEA/DVA leaves unclassified, promoting both convergence and diversity. In addition, Chen et al. [1] proposed an evolutionary algorithm based on the covariance matrix adaptation evolution strategy and scalable small subpopulations for large-scale many-objective optimization problems (S3-CMA-ES).
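The angle feature used for variable clustering can be sketched in a simplified form. LMEA itself samples several solutions, fits lines to perturbed points, normalizes the objectives, and clusters multi-dimensional features; this sketch instead perturbs each variable of one solution at two values, measures the angle between the resulting objective-space direction and the convergence direction (1, ..., 1), and splits the angles with a 1-D 2-means. The toy bi-objective problem is hypothetical.

```python
import math

def perturbation_angle(f, x, var, values):
    """Angle (degrees) between the objective-space shift caused by changing only x[var]
    and the convergence direction (1, ..., 1); uses the two extreme samples."""
    pts = []
    for v in values:
        y = list(x)
        y[var] = v
        pts.append(f(y))
    d = [b - a for a, b in zip(pts[0], pts[-1])]
    norm = math.sqrt(sum(c * c for c in d))
    if norm == 0.0:
        raise ValueError("perturbing this variable does not change the objectives")
    cos = abs(sum(d)) / (norm * math.sqrt(len(d)))
    return math.degrees(math.acos(min(1.0, cos)))

def split_two_means(angles, iters=20):
    """1-D 2-means; True marks the small-angle (convergence-related) cluster."""
    lo, hi = min(angles), max(angles)
    for _ in range(iters):
        near_lo = [a for a in angles if abs(a - lo) <= abs(a - hi)]
        near_hi = [a for a in angles if abs(a - lo) > abs(a - hi)]
        lo = sum(near_lo) / len(near_lo) if near_lo else lo
        hi = sum(near_hi) / len(near_hi) if near_hi else hi
    return [abs(a - lo) <= abs(a - hi) for a in angles]

def toy(x):
    # Hypothetical problem: x[0] moves along the front, x[1:] only scale the distance to it.
    g = 1.0 + sum(v * v for v in x[1:])
    return (g * x[0], g * (1.0 - x[0]))

x0 = [0.5, 0.3, 0.3]
angles = [perturbation_angle(toy, x0, k, [0.2, 0.8] if k == 0 else [0.0, 1.0]) for k in range(3)]
labels = split_two_means(angles)
convergence_vars = [k for k, c in enumerate(labels) if c]
diversity_vars = [k for k, c in enumerate(labels) if not c]
print(convergence_vars, diversity_vars)
```

A variable whose perturbation moves solutions along the convergence direction (small angle) mainly affects closeness to the PF, while one that moves them roughly perpendicular to it (angle near 90 degrees) mainly affects spread along the PF.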
All of the above approaches group the decision variables to handle large-scale many-objective optimization problems, and they have contributed greatly to this field.
(3) Dynamic MaOPs (DMaOPs): DMaOPs add time (environment) variation to general MaOPs and are described as follows:

$$\min_{x \in X} F(x, t) = \left(f_1(x, t), f_2(x, t), \ldots, f_M(x, t)\right)^{T}$$

where t represents time (environment) variation. When the time (environment) changes, the PF of the MaOP also changes; that is, the optimal solution set of the previous state is not necessarily optimal in the current state. The algorithm must therefore not only optimize multiple objectives but also respond to changes over time (environment): when the time (environment) changes, it should respond quickly and obtain the optimal solution set for the new environment.
Many effective algorithms have been proposed for DMaOPs. Liu et al. proposed a dynamic multi-population particle swarm optimization algorithm based on decomposition and prediction (DP-DMPPSO) [82], which uses an archive update mechanism based on objective space decomposition and a population prediction mechanism to accelerate convergence; the results show that it handles DMaOPs well. Many other dynamic multi-objective evolutionary algorithms (DMOEAs) also use various optimization strategies [83][84][85][86][87] to handle DMaOPs.
The main purpose of this paper is to solve general MaOPs with high-dimensional objective spaces by coevolving two populations, one with a Pareto-based method and one with a non-Pareto-based method. The two populations exploit each other's strengths and compensate for each other's weaknesses, which is a promising way to overcome the difficulty of optimizing in high-dimensional objective spaces. The details are given in Section 4.

The Framework of DL-TPCEA
In this section, the specific process of the dynamic learning strategy is introduced first, followed by DL-TPCEA, including all algorithmic details such as parameter control and the algorithm flow. Conventional MOEAs apply a fixed selection scheme throughout the run. For example, NSGA-II uses non-dominated sorting to select the non-dominated solutions in the population to control convergence, and then uses the crowding distance to choose among them to improve diversity. This method is time-consuming because of Pareto sorting and tends to converge poorly when the number of objectives is relatively large. DLS, in contrast, exploits the fast running speed and good convergence of indicator-based selection while further strengthening diversity, so as to balance the convergence and diversity of the solution set. A two-objective problem is used below as an example to illustrate the advantages of DLS over traditional static evolutionary strategies.
As shown in Figure 1a, after population initialization the individuals are distributed chaotically in the objective space; in other words, both the convergence and the diversity of the population are poor. For such a population, the priority is to make the individuals converge to the PF as quickly as possible, guided by an indicator-based method. For example, in a practical engineering problem, the individuals on the PF are those that minimize the cost. In this case, more computing resources should be allocated to convergence-related operations so that the population converges rapidly to the PF, while a small share is allocated to diversity-enhancing operations so that the diversity of the population does not become too poor.
After this operation, the individuals gradually move towards the PF in the objective space, as shown in Figure 1b. The convergence level of the whole population is still insufficient at this point, so high selection pressure is still needed to promote convergence. As the iteration proceeds, the individuals come close to the PF, as shown in Figure 1c. As noted in Section 1, indicator-based algorithms converge quickly but easily lose diversity. The example in Figure 1c shows individuals that have approached the PF under the guidance of the indicator but are concentrated in its central region. At this point, more computing resources should be shifted toward increasing diversity, for example by preferentially keeping individuals A and B in Figure 1c for the next generation. By adjusting the allocation of computational resources to the evolutionary state of the population, the population maintains good convergence and diversity, and the final solution set is distributed uniformly on the PF, as in Figure 1d.

The Details of DLS
The above is only a brief description of DLS; its specific process is explained in detail below. Suppose the population size of the MaOEA is N. In each iteration, N offspring are generated by crossover and mutation operators, and the parents and offspring together form a combined population, denoted $P_{2N}$. Environmental selection then chooses from $P_{2N}$ the N individuals most conducive to maintaining convergence and diversity, which form the population $P_{new}$ for the next iteration. This selection is performed by DLS.
As shown in Algorithm 1, the 2N individuals are first layered by non-dominated sorting (Line 1, Algorithm 1). Here, FrontNo records the index of the non-dominated layer in which each individual resides, and MaxFNo is the index of the last layer needed to fill $P_{new}$. MaxFNo satisfies

$$\sum_{i=1}^{MaxFNo-1} L_i < N \le \sum_{i=1}^{MaxFNo} L_i$$

where $L_i$ is the number of individuals in the ith non-dominated layer (i = 1, 2, ..., MaxFNo). The individuals in Layers 1 to MaxFNo-1 are selected into $P_{new}$ first (Line 2, Algorithm 1), and the remaining individuals are then selected from Layer MaxFNo.
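Given the layer sizes L_1, L_2, ... produced by non-dominated sorting, MaxFNo and the number of individuals still to be chosen from Layer MaxFNo can be computed by a running sum. This is a minimal sketch; the function name `max_front_and_remainder` is hypothetical, and the returned remainder corresponds to the quantity R_gen used by DLS.

```python
def max_front_and_remainder(layer_sizes, n):
    """Return (MaxFNo, remainder): MaxFNo is the first layer at which the cumulative
    count reaches n, and remainder is how many must still be chosen from that layer."""
    total = 0
    for k, size in enumerate(layer_sizes, start=1):
        if total + size >= n:
            return k, n - total
        total += size
    raise ValueError("fewer than n individuals in total")

print(max_front_and_remainder([3, 4, 5], 6))  # Layers 1 and 2 needed; 3 more from Layer 2
print(max_front_and_remainder([20], 10))      # all individuals non-dominated
```

The second call mirrors the many-objective case: when every individual lies in Layer 1, the whole selection is decided by DLS.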
For 2- or 3-objective problems (MOPs), the population is usually split into many layers, so Layer MaxFNo contains few individuals and few individuals are selected through DLS. As the number of objectives increases, however, the proportion of non-dominated individuals in the population grows; as noted in Section 1, almost all individuals are non-dominated when there are more than 12 objectives. Layer MaxFNo therefore becomes larger and may even contain the entire population. This greatly increases the role of DLS and makes it more useful for solving MaOPs.

Algorithm 1 Dynamic Learning Strategy
Next, the values of $C_n$ and $D_n$ are calculated (Lines 3-4, Algorithm 1); they are the numbers of convergence-related and diversity-related individuals that need to be preserved, respectively. These two numbers are the key to how DLS allocates computational resources dynamically across iterations.
$C_n$ is calculated by Equation (5), where gen is the current iteration number, maxgen is the maximum number of iterations, and $R_{gen}$ is the total number of individuals that must be selected from Layer MaxFNo at generation gen. The convergence factor $\alpha \in [0, 1]$ controls the convergence rate of the population. Experiments showed that $\alpha \approx 0.9$ gives the best performance: the population converges quickly without becoming trapped in local optima. The ceiling operator $\lceil \cdot \rceil$ rounds its argument up to the nearest integer greater than or equal to it. The number of diversity-related individuals to be preserved is then

$$D_n = R_{gen} - C_n.$$

Once $C_n$ and $D_n$ are known, a convergence-related indicator and a diversity-related indicator are calculated for the individuals in the population, and $R_{gen}$ individuals in Layer MaxFNo are retained according to the rules of DLS. In this paper, the cost value (CV) [44] is used as the convergence-related indicator. Let $F(x_i) = (f_1(x_i), f_2(x_i), \ldots, f_M(x_i))$ be the objective vector of individual $x_i$ ($i = 1, 2, \ldots, N$). The mutual evaluation of $x_i$ by $x_j$ is given by Equation (7), and the resulting value $CV_i$ of each individual by Equation (8). This indicator is not affected by the number of objectives, and Equations (7) and (8) show two of its properties: $x_i$ is non-dominated when $CV_i > 1$, and $x_i$ is dominated when $CV_i \le 1$. CV can therefore serve as the convergence-related indicator for selecting individuals in Layer MaxFNo: individuals with larger CV, that is, better convergence, are retained.
For the diversity-related indicator, the distance between individuals in the population is generally used as the evaluation criterion; for example, NSGA-II estimates density with the crowding distance, computed from the distances between neighboring individuals along each objective. In this paper, the $L_p$-norm-based distance is used to measure the distance between individuals. Experiments have shown that the $L_p$-norm-based distance performs better than the Euclidean distance, the Manhattan distance, and similar measures, especially for MaOPs [88]. The recommended value of the parameter $p$ is $1/M$. The $L_p$-norm-based distance is therefore selected as the diversity-related indicator in this paper.
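A minimal sketch of the $L_p$-norm-based distance, assuming the usual form $(\sum_k |a_k - b_k|^p)^{1/p}$ applied to objective vectors, with p = 1/M as recommended above. For p < 1 this is not a true norm (the triangle inequality fails), but it is still usable as a dissimilarity measure; the function name is hypothetical.

```python
from typing import Sequence


def lp_distance(a: Sequence[float], b: Sequence[float], p: float) -> float:
    """(sum_k |a_k - b_k|^p)^(1/p) between two objective vectors a and b."""
    if p <= 0:
        raise ValueError("p must be positive")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


M = 5
a = [0.2, 0.4, 0.1, 0.9, 0.5]
b = [0.3, 0.1, 0.1, 0.7, 0.6]
print(lp_distance(a, b, 1.0 / M))  # p = 1/M, as recommended for DLS
print(lp_distance(a, b, 2.0))      # Euclidean distance, for comparison
```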
Once both indicators have been calculated, individuals in Layer MaxFNo are selected into $P_{new}$ according to $C_n$ and $D_n$ until $P_{new}$ contains N individuals.
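The filling of $P_{new}$ from Layer MaxFNo can be sketched as below. The text states that individuals with larger CV are kept for the convergence part but does not spell out here how the $D_n$ diversity-related individuals are picked, so this sketch assumes a greedy rule: repeatedly add the candidate whose smallest $L_{1/M}$ distance to everything already kept is largest. CV values are taken as given input, and all names are hypothetical.

```python
from typing import List, Sequence


def lp_distance(a: Sequence[float], b: Sequence[float], p: float) -> float:
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def select_from_last_layer(
    objs: Sequence[Sequence[float]],   # objective vectors of Layer MaxFNo
    cv: Sequence[float],               # convergence indicator, larger is better
    kept: Sequence[Sequence[float]],   # objective vectors already in P_new
    c_n: int,
    d_n: int,
) -> List[int]:
    """Indices (into objs) of the c_n + d_n individuals retained from Layer MaxFNo."""
    if c_n + d_n > len(objs):
        raise ValueError("cannot keep more individuals than the layer holds")
    p = 1.0 / len(objs[0])             # p = 1/M
    by_cv = sorted(range(len(objs)), key=lambda i: cv[i], reverse=True)
    chosen = by_cv[:c_n]               # convergence-related: largest CV first
    reference = list(kept) + [objs[i] for i in chosen]
    remaining = by_cv[c_n:]

    def gap(i: int) -> float:
        # Smallest distance from candidate i to any individual kept so far
        return min((lp_distance(objs[i], r, p) for r in reference), default=float("inf"))

    for _ in range(d_n):               # diversity-related: greedy max-min distance (assumed rule)
        best = max(remaining, key=gap)
        chosen.append(best)
        reference.append(objs[best])
        remaining.remove(best)
    return chosen


objs = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5], [0.1, 0.9]]
cv = [0.5, 0.4, 0.9, 0.3]
print(select_from_last_layer(objs, cv, kept=[], c_n=1, d_n=1))
```

Because `remaining` stays ordered by CV, ties in distance are broken in favor of the better-converged candidate.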

The Framework of BCE
BCE maintains two populations: the NPC population, evolved with a non-Pareto criterion, and the PC population, evolved with the Pareto criterion. Any non-Pareto evolutionary criterion can be used directly for the NPC population. However, when the next generation is produced through competition, environmental selection draws individuals from both the NPC population and the PC population (NPC selection). For the PC population, the non-dominated individuals of the NPC and PC populations are retained (PC selection). Since the number of non-dominated solutions is not known in advance, a population maintenance operation removes individuals with poor diversity whenever the number of non-dominated individuals exceeds the predefined threshold N.
Because the NPC population converges relatively fast, its individuals can accelerate the convergence of the PC population during PC selection. Because the PC population has better diversity, the NPC population can reach unexplored areas of the PF through individual exploration, and the individuals of the NPC population in turn enhance the diversity of the PC population. In this way, the two populations promote each other's evolution, so that the final solution set has good convergence and diversity. The final output is the PC population.

PC Selection and NPC Selection
PC selection selects the non-dominated individuals from the union of the PC population and the new individuals produced by PC and NPC evolution. NPC selection follows the criterion of the NPC evolution and performs environmental selection on the union of the NPC population and the new individuals generated from the PC population. For example, if the NPC population evolves with an indicator-based algorithm, NPC selection keeps the individuals with better indicator values in the mixed population for the next generation.
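PC selection amounts to a non-dominated filter over the merged populations. A minimal sketch, assuming minimization and objective vectors stored as plain lists; duplicate vectors do not dominate each other and are therefore both kept. The names are hypothetical.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a Pareto-dominates b under minimization."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pc_selection(*populations: Sequence[Sequence[float]]) -> List[List[float]]:
    """Merge the given populations and keep only the non-dominated vectors."""
    merged = [list(v) for pop in populations for v in pop]
    return [
        v for i, v in enumerate(merged)
        if not any(dominates(u, v) for j, u in enumerate(merged) if j != i)
    ]


pc = [[1.0, 2.0], [2.0, 1.0]]
offspring = [[0.5, 3.0], [3.0, 3.0]]
print(pc_selection(pc, offspring))  # [3.0, 3.0] is dominated and dropped
```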
Some NPC algorithms update an individual using information from its parent, which is not possible here, because individuals coming from the PC population have no parent in the NPC population. Instead, individuals in the PC population are compared with individuals in the NPC population: if a PC individual is better than one or more NPC individuals according to the evolutionary criterion of the NPC population, it replaces that NPC individual (or a randomly chosen one of them).
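The replacement rule can be sketched with a generic comparator `better(a, b)` standing in for whatever NPC criterion is in use (a hypothetical parameter; for an indicator-based NPC evolution it would compare indicator values). Each PC individual is checked in turn against the current NPC population, so earlier replacements affect later comparisons; that ordering is an assumption of this sketch.

```python
import random
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def inject_pc_individuals(
    npc: Sequence[T],
    pc: Sequence[T],
    better: Callable[[T, T], bool],
    rng: Optional[random.Random] = None,
) -> List[T]:
    """Return a copy of npc in which each pc individual replaces one NPC individual it beats."""
    rng = rng or random.Random()
    result = list(npc)
    for cand in pc:
        beaten = [i for i, ind in enumerate(result) if better(cand, ind)]
        if beaten:
            # One beaten individual: replace it; several: replace a random one of them
            result[rng.choice(beaten)] = cand
    return result


# Toy criterion: a scalar fitness where smaller is better.
print(inject_pc_individuals([5.0, 3.0], [4.0, 10.0], lambda a, b: a < b))
```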

Population Maintenance
In PC selection, all non-dominated individuals are selected from a mixed population that includes the new individuals produced by PC and NPC evolution. The number of non-dominated individuals can therefore exceed the preset threshold N (the population size), especially when the number of objectives is large. An effective population maintenance procedure is thus needed so that the PC population keeps a representative set of individuals with good convergence and diversity.
Population maintenance preserves the quality of the population through a niche technique, a popular approach in EAs that assesses the crowding degree of each individual from the locations and number of individuals in its niche (in the objective space). The crowding degree $D(p)$ of individual p in population P is defined by Equations (9) and (10), where $d(p, q)$ is the Euclidean distance between individuals p and q and r is the niche radius. Because the objectives may have different scales, the objective values are first min-max normalized before distances are calculated. Individuals p and q lie in each other's niche when $d(p, q) < r$, and, as Equations (9) and (10) show, $D(p)$ ranges over [0, 1]. Otherwise, $R(p, q) = 1$ and q has no effect on the crowding degree of p. Within the niche, the larger the distance between the two individuals, the smaller the resulting crowding degree, meaning the individuals are less crowded. This makes $D(p)$ a good criterion for eliminating the most crowded individuals, which have poor diversity.
Because the population evolves continuously, a fixed niche radius r is inappropriate; r must reflect the evolutionary state of the population. In BCE, r is set to the average Euclidean distance from each individual to its k nearest individuals in the population, so that the niches of as many individuals as possible contain one or more other individuals. Setting k = 3 is recommended for better performance. Based on this crowding degree, the most crowded individual in the population (the non-dominated individuals selected by PC selection) is removed, and the crowding degrees are recalculated. This process is repeated until N non-dominated individuals remain.
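Population maintenance can be sketched as below. The crowding degree follows the form used in BCE as we understand it, $D(p) = 1 - \prod_{q \ne p} R(p, q)$ with $R(p, q) = d(p, q)/r$ when $d(p, q) < r$ and 1 otherwise; this matches the properties stated above (range [0, 1], no effect from individuals outside the niche) but should be checked against Equations (9) and (10). The sketch also assumes that the radius is computed once, on the normalized objectives, before elimination starts. Function names are hypothetical.

```python
import math
from typing import List, Sequence


def normalize(objs: Sequence[Sequence[float]]) -> List[List[float]]:
    """Min-max normalize each objective to [0, 1] (constant objectives map to 0)."""
    m = len(objs[0])
    lo = [min(o[k] for o in objs) for k in range(m)]
    hi = [max(o[k] for o in objs) for k in range(m)]
    return [[(o[k] - lo[k]) / (hi[k] - lo[k]) if hi[k] > lo[k] else 0.0
             for k in range(m)] for o in objs]


def niche_radius(pts: Sequence[Sequence[float]], k: int = 3) -> float:
    """Average distance from each individual to its k nearest neighbours."""
    total = 0.0
    for i, p in enumerate(pts):
        nearest = sorted(math.dist(p, q) for j, q in enumerate(pts) if j != i)[:k]
        total += sum(nearest) / len(nearest)
    return total / len(pts)


def crowding(i: int, alive: Sequence[int], pts: Sequence[Sequence[float]], r: float) -> float:
    """D(p) = 1 - prod R(p, q); only neighbours inside the niche (d < r) contribute."""
    prod = 1.0
    for j in alive:
        if j != i:
            d = math.dist(pts[i], pts[j])
            if d < r:
                prod *= d / r
    return 1.0 - prod


def maintain(objs: Sequence[Sequence[float]], n: int, k: int = 3) -> List[int]:
    """Indices of the n individuals kept after repeatedly removing the most crowded one."""
    if n < 1:
        raise ValueError("n must be at least 1")
    alive = list(range(len(objs)))
    if len(objs) <= n:
        return alive
    pts = normalize(objs)
    r = niche_radius(pts, k)
    while len(alive) > n:
        worst = max(alive, key=lambda i: crowding(i, alive, pts, r))
        alive.remove(worst)
    return alive


objs = [[0.0, 1.0], [0.01, 0.99], [0.5, 0.5], [1.0, 0.0]]
print(maintain(objs, 3))   # one of the two near-duplicate points is removed
```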

Individual Exploration
The NPC evolution in BCE usually exerts high selection pressure and therefore converges quickly. However, a typical NPC evolution tends to converge to one or a few regions of the PF rather than covering the entire PF, which leads to a lack of diversity because some areas of the PF are never explored. Individual exploration lets NPC evolution explore these unknown areas and thereby increases the diversity of the NPC population. It explores only some promising individuals in the PC population, not all of them, because the regions around some PC individuals have already been well explored by the NPC population. Promising individuals are typically those that NPC evolution has eliminated, has not sufficiently developed, or has never visited. From this point of view, the discussion focuses mainly on two types of individuals in the PC population:
(1) Individuals whose niche contains no NPC individual.
(2) Individuals whose niche contains only one NPC individual.
Individuals of the first type are not in the niche of any NPC individual. They are far from the NPC population in the objective space and are clearly not favored by the NPC criterion, yet they lie precisely in areas that NPC evolution has not explored. Individuals of the second type have one NPC individual in their niche, which is few given that k is set to 3. Such individuals are likely to lie in areas that NPC evolution has explored incompletely, so they are also worth exploring.
During individual exploration, the PC individuals of these two types are first marked and stored in set S (the set of individuals to be explored). A variation operator is then applied to the individuals in S, and all new individuals it generates are stored in set T (the set of new individuals produced by individual exploration) for the next PC selection. Any variation operator can be used, but the number of parents required by the chosen operator should be taken into account accordingly.
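Marking the set S can be sketched as follows, assuming normalized objective vectors, Euclidean distance, and a niche radius r supplied by the caller (for example, the dynamic radius discussed next). The variation step that turns S into T is left out because the operator is interchangeable. Names are hypothetical.

```python
import math
from typing import List, Sequence


def individuals_to_explore(
    pc: Sequence[Sequence[float]],
    npc: Sequence[Sequence[float]],
    r: float,
) -> List[int]:
    """Indices of PC individuals whose niche holds at most one NPC individual."""
    s = []
    for i, p in enumerate(pc):
        npc_in_niche = sum(1 for q in npc if math.dist(p, q) < r)
        if npc_in_niche <= 1:        # none (first type) or exactly one (second type)
            s.append(i)
    return s


pc = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
npc = [[0.1, 0.0], [0.0, 0.1], [1.2, 1.0], [9.0, 9.0]]
print(individuals_to_explore(pc, npc, r=0.5))
```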
The niche radius also matters here. A relatively small radius may cause all individuals in the PC population to be explored, since few NPC individuals fall into each individual's niche. Conversely, a large radius may leave all individuals unexplored. A dynamically varying radius, which changes with the size of the PC population, is therefore used.
As PC and NPC evolve, more and more non-dominated individuals are produced, and the selection pressure of PC selection gradually decreases. When the number of newly created non-dominated individuals exceeds the remaining capacity of the PC population, PC evolution slows down. Less individual exploration is then performed, which lets the high selection pressure of NPC evolution play a greater role. The dynamic niche radius is set as follows: where N is the PC population size, $N'$ is the size of the PC population before population maintenance, and $r_0$ is the base niche radius calculated as in population maintenance.
With fixed computational resources (function evaluations), this adaptive exploration based on the evolutionary state of the population is necessary. On the one hand, individual exploration compensates for the lack of diversity in the NPC population. On the other hand, when convergence is lacking, more computing resources can be given to NPC evolution, whose higher selection pressure accelerates convergence.
Figure 2 illustrates individual exploration on a 3-objective optimization problem. The triangle in each panel represents the Pareto front of the problem, and the points represent the individuals of a population in the objective space. Suppose the NPC population is as shown in Figure 2a and the PC population as shown in Figure 2b. Owing to the characteristics of the NPC population, its solution set may cover only part of the Pareto front: in Figure 2a, the individuals are concentrated on the left and top of the front, and none lie on the right. The PC population is distributed relatively evenly along the Pareto front, but its convergence is poor (some points have not converged to the front). Individual exploration explores the promising individuals in the PC population to promote the diversity of the NPC population. Several individuals marked in red in Figure 2b are promising even though they have not converged to the Pareto front, and exploring them can yield well-diversified solutions in regions that had not been explored before. Through continued individual exploration, the diversity of the PC population also improves and eventually reaches the state shown in Figure 2c, where convergence and diversity are well balanced, which is the purpose of individual exploration.

Two-Population Coevolutionary Algorithm with Dynamic Learning Strategy
From the above description, it can be seen that these strategies have great advantages and far-reaching significance in solving many-objective optimization problems. Next, we will introduce DL-TPCEA in the above context.

The Process of DL-TPCEA
Algorithm 2 gives the complete procedure of DL-TPCEA. Its inputs are the population size N, the number of objectives M, and the number of function evaluations (FEs); its output is the PC population. First, the parameters are set (Line 1, Algorithm 2), mainly the current iteration number gen and the maximum iteration number maxgen used in Equation (5). In addition, because DLS uses the $L_p$-norm-based distance for diversity maintenance, p is also initialized here, to the inverse of the number of objectives (1/M). The parameter setting in Line 5 of Algorithm 2 does the same in later iterations.
Before BCE proceeds, the populations of both evolutionary approaches (the PC population and the NPC population) are initialized (Lines 2-3, Algorithm 2). For the NPC population, N decision vectors of dimension D are generated randomly within the domain following a normal distribution, where D is the number of decision variables. The PC population is then obtained by applying PC selection to the NPC population, which ensures that the individuals stored in the PC population are always non-dominated.

Algorithm 2 Framework of DL-TPCEA
When the iterations begin, the individual exploration described in Section 4.2.3 is performed first (Line 6, Algorithm 2). It checks whether the PC population contains individuals in regions that the NPC population has not (fully) explored. If such individuals exist, they are stored in set S as described above; a variation operator is then applied to S, and the new individuals are stored in set T. The returned NewPC population consists of all individuals in T. ExRatio is a ratio coefficient representing the proportion of individuals to be explored, computed with Length(·), which denotes the size of a set or population. When ExRatio > 0, some individuals in the PC population need to be explored; the larger ExRatio, the more individuals need exploring, and ExRatio takes values in [0, 1). ExRatio is used to adjust the convergence factor of DLS dynamically (the dynamic convergence factor) when DLS later performs environmental selection. Because most individuals produced by individual exploration lie in areas that NPC evolution has not explored or has explored incompletely, the current iteration should pay more attention to them; that is, more diversity-related individuals should be selected so that NPC evolution explores these regions better. To this end, the convergence factor is scaled down according to the value of ExRatio in the current iteration. The detailed process is described in Section 4.3.2.
After individual exploration come the evolution of the NPC population (Lines 7-9, Algorithm 2) and of the PC population (Line 10, Algorithm 2). First, environmental selection applies the non-Pareto criterion to the mixed population of NewPC and NPC, and the selected individuals are stored in the NPC population. A variation operator is then applied to the NPC population to generate the NewNPC population, and the individuals that perform better under the non-Pareto criterion are selected from the mixed population of NPC and NewNPC. The PC population evolves through PC selection, which archives all non-dominated individuals of the original PC population, the NPC population, and the NewNPC population in the PC population. Population maintenance is then performed on the PC population if necessary (when Length(PC) > N).

Environmental Selection in NPC Evolution
Algorithm 3 shows the environmental selection in NPC evolution, which mainly uses DLS to select the NPC population. Before selection, however, the dynamic convergence factor α' must be set according to the current evolutionary state of the population. Because the number of individuals explored by individual exploration differs between iterations, so does ExRatio, and whenever individuals need to be explored, the convergence factor α should be scaled down. To respond to the number of individuals to be explored, the dynamic convergence factor is calculated as

$$\alpha' = \alpha - \omega \cdot f(\mathit{ExRatio}), \tag{13}$$

where f is a monotonically increasing function on [0, 1] (selected experimentally in Section 5.2) and ω is a dynamic scaling factor set to 0.1. This setting prevents the convergence factor from being scaled too far, because good convergence is maintained when the convergence factor is around 0.9. Since ExRatio lies in [0, 1) and α' = α = 0.9 when no individual needs to be explored, α' lies in (0.8, 0.9]. This allows DLS to work well even during individual exploration.
Once these parameters are set, the individuals in the candidate population are selected with DLS as in Algorithm 1. Here, however, the candidate population is produced by the BCE process rather than being a mixture of parents and their offspring, and the convergence factor is the dynamic convergence factor α', scaled according to the state of individual exploration.
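Following the construction described in Section 5.2 (a monotonically increasing function f on [0, 1], evaluated at ExRatio, scaled by ω and subtracted from α), the dynamic convergence factor can be sketched as below. The concrete mapping `sin(pi * x / 2)` is only an assumed example of a sine-based f; the exact sine form used in DL-TPCEA is not reproduced here, and the function names are hypothetical.

```python
import math
from typing import Callable


def dynamic_alpha(
    ex_ratio: float,
    f: Callable[[float], float],
    alpha: float = 0.9,
    omega: float = 0.1,
) -> float:
    """alpha' = alpha - omega * f(ExRatio), with ExRatio in [0, 1)."""
    if not 0.0 <= ex_ratio < 1.0:
        raise ValueError("ExRatio must lie in [0, 1)")
    return alpha - omega * f(ex_ratio)


def sine_map(x: float) -> float:
    # Assumed example of a sine-based mapping from [0, 1] onto [0, 1]
    return math.sin(math.pi * x / 2)


for r in (0.0, 0.25, 0.5, 0.75, 0.99):
    print(r, round(dynamic_alpha(r, sine_map), 4))
```

With no individuals to explore the factor stays at 0.9, and it approaches 0.8 as ExRatio approaches 1.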

Algorithm 3 Environmental Selection in NPC Evolution
Mathematics 2021, 9,420 15 of 31

Experiments
This section verifies the performance of DL-TPCEA experimentally. First, a series of experiments is used to determine the best form of the proposed dynamic convergence factor. Next, the role of individual exploration in the whole evolutionary process is analyzed experimentally. Finally, the performance of DL-TPCEA is validated against several state-of-the-art algorithms.

Parameter Setting
To bring out the full performance of MaOEAs on MaOPs with different numbers of objectives, different FEs and population sizes N are set for different numbers of objectives M. For the WFG [89,90] test suite, the number of decision variables D changes with M; as recommended, D = M + 9. The experiments use five groups of objective numbers: 3, 5, 8, 10, and 15. Because MOEA/D-PaS and NSGA-III both use reference points, the original reference points must be generated in a specific way; here, Das and Dennis's approach [91] is used to generate them on the hyperplane, and the other algorithms use the same initial population size to ensure fairness. The number of reference points follows the setting in NSGA-III [5,6], so the population size N is 91, 210, 156, 275, and 135, respectively. The corresponding number of FEs ranges from $10^4$ to $5 \times 10^4$. The detailed parameter settings are listed in Table 1. In the experiments on the dynamic convergence factor, α in the base DLS is set to the recommended value of 0.9, and various functions that increase monotonically on [0, 1] are used for the dynamic adjustment, as described in detail in Section 5.2. All compared algorithms use the parameter settings in Table 1 for each number of objectives.
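The population sizes above can be reproduced by counting Das and Dennis's simplex-lattice points: with H divisions and M objectives there are $\binom{H+M-1}{M-1}$ points. The sketch below assumes the division settings commonly used with NSGA-III (H = 12 and 6 for M = 3 and 5, and two layers (3, 2), (3, 2), (2, 1) for M = 8, 10, 15). It generates only a single-layer lattice; NSGA-III additionally shrinks the inner layer toward the centroid, which does not change the count. Function names are hypothetical.

```python
from math import comb
from typing import List, Tuple


def das_dennis(m: int, h: int) -> List[Tuple[float, ...]]:
    """All weight vectors with components i_k / h, i_k >= 0 integers summing to h."""
    points: List[Tuple[float, ...]] = []

    def build(prefix: List[int], left: int, dims_left: int) -> None:
        if dims_left == 1:                       # last component takes what is left
            points.append(tuple(x / h for x in prefix + [left]))
            return
        for i in range(left + 1):
            build(prefix + [i], left - i, dims_left - 1)

    build([], h, m)
    return points


def lattice_size(m: int, h: int) -> int:
    return comb(h + m - 1, m - 1)


DIVISIONS = {3: (12,), 5: (6,), 8: (3, 2), 10: (3, 2), 15: (2, 1)}
for m, hs in DIVISIONS.items():
    print(m, sum(lattice_size(m, h) for h in hs))   # 91, 210, 156, 275, 135
```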
All experiments were run on a PC with Windows 10 Enterprise, an Intel(R) Core(TM) i3-8100 CPU at 3.6 GHz, and 8 GB of RAM.

Experiments on Dynamic Convergence Factors
Section 4.3.2 introduced the dynamic convergence factor, which sets the convergence factor dynamically from the basic DLS and the state of individual exploration. Its purpose is to let DLS adapt better to the evolutionary state of the population and thereby obtain the best convergence factor setting. Since the best range of the convergence factor α is [0.8, 0.9], a monotonically increasing function on [0, 1] is evaluated at the proportional coefficient ExRatio, multiplied by the dynamic scaling factor ω, and subtracted from the original α, as shown in Equation (13). In this way, α' changes dynamically with ExRatio: when ExRatio is relatively large (more diversity-related solutions need to be explored), the convergence factor is reduced to better balance the convergence and diversity of the population.
To obtain the best performance of DL-TPCEA, the monotonically increasing function in Equation (13) must be chosen. Columns 3 through 11 of the first row of Table 2 list several common functions that increase monotonically on [0, 1], used for comparison; for example, the fourth column (Tan) corresponds to a tangent-based function. Column 2 corresponds to the original DLS with α set to 0.9. The last column is a control group in which, when ExRatio > 0, α is multiplied by a fixed value less than 1 (0.9 here). This amounts to a fixed transformation rather than a dynamic change through a function mapping of the proportional coefficient ExRatio, and it is included to compare the advantages and disadvantages of fixed transformations and function mappings.
Following the parameter settings in Table 1, each variant was run independently 30 times on each WFG test problem. The mean inverted generational distance (IGD) values over the 30 runs were ranked with the Friedman test (lower is better), and the results are presented in Table 2, whose last row gives the average of the five Friedman rankings. Dark gray marks the best result and light gray the second-best result. The best performance is obtained when the monotonically increasing function is the sine function, which achieves the best (smallest) value on the 3- and 15-objective WFG and the second-best result on the 10-objective WFG. Although it does not rank first on the 5- and 8-objective WFG, its values there are still relatively small. The sine function also has the best average Friedman result over the five experiments. The function $x^{1/3}$ performs second-best: it obtains the second-best results on the 5- and 8-objective WFG and the second-best average Friedman result.
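The Friedman ranking behind Table 2 can be sketched as below: on each problem instance the variants are ranked by mean IGD (rank 1 = smallest), tied values share the mean of their ranks, and the ranks are averaged over instances. This computes only the average ranks, not the Friedman statistic or its p-value, and the toy scores are made up for illustration, not taken from Table 2.

```python
from typing import List, Sequence


def average_ranks(scores: Sequence[Sequence[float]]) -> List[float]:
    """scores[problem][variant] -> average rank of each variant (lower IGD = better)."""
    n_var = len(scores[0])
    totals = [0.0] * n_var
    for row in scores:
        order = sorted(range(n_var), key=lambda a: row[a])
        pos = 0
        while pos < n_var:
            end = pos
            # Extend over a run of tied scores
            while end + 1 < n_var and row[order[end + 1]] == row[order[pos]]:
                end += 1
            shared = (pos + end) / 2 + 1        # mean of ranks pos+1 .. end+1
            for a in order[pos:end + 1]:
                totals[a] += shared
            pos = end + 1
    return [t / len(scores) for t in totals]


toy_igd = [            # rows: problem instances, columns: three variants (made-up numbers)
    [0.21, 0.25, 0.30],
    [0.40, 0.38, 0.45],
    [0.12, 0.12, 0.20],
]
print(average_ranks(toy_igd))
```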
When the monotone increasing function is x^M or x^{1/2}, the best rank is obtained on the 5- and 8-objective WFG suites, respectively. However, the average Friedman ranks of these two functions are not very good. It is worth noting that the original DLS obtained the best result on the 10-objective WFG suite, followed by the sine mapping. In addition, the fixed-transformation group achieved the second-best result on the 3-objective WFG suite. The average Friedman ranks of these two strategies differ little, but both are worse than that of the sine function.
Therefore, the sine function is finally selected to map ExRatio in the dynamic convergence factor. In the experiments in Section 5.3 below, the dynamic convergence factor in DL-TPCEA is calculated as shown in Equation (13).
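The ranking behind Table 2 can be sketched as follows: on each test instance the variants are ranked by mean IGD (rank 1 is the smallest value, and tied values share their average rank), and each variant's ranks are then averaged. This is a minimal sketch of Friedman-style average ranking only; it does not compute the Friedman test statistic or its p-value, and the function name `average_ranks` and the example numbers are hypothetical rather than data from Table 2.

```python
from typing import List


def average_ranks(scores: List[List[float]]) -> List[float]:
    """Average rank of each algorithm (column) over instances (rows).

    Lower scores are better (e.g. mean IGD); ties share the mean of their ranks.
    """
    n_alg = len(scores[0])
    totals = [0.0] * n_alg
    for row in scores:
        order = sorted(range(n_alg), key=lambda j: row[j])
        ranks = [0.0] * n_alg
        i = 0
        while i < n_alg:
            j = i
            # extend j over a run of tied values
            while j + 1 < n_alg and row[order[j + 1]] == row[order[i]]:
                j += 1
            shared = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
            for k in range(i, j + 1):
                ranks[order[k]] = shared
            i = j + 1
        for a in range(n_alg):
            totals[a] += ranks[a]
    return [t / len(scores) for t in totals]
```

For example, three variants on three instances with mean IGD rows `[0.20, 0.25, 0.30]`, `[0.40, 0.35, 0.50]`, and `[0.10, 0.10, 0.20]` receive average ranks of 1.5, 1.5, and 3.0.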
MOEA/D-PaS proposes a Pareto adaptive scalarizing (PaS) approximation method, which approximates the optimal p value of the commonly used scalarizing method. This is key to balancing the selection pressure toward Pareto optimality against the robustness of the algorithm to different PF geometries, and it guarantees that any solution along the PF can be found for some weight vector. PaS is combined with the decomposition-based algorithm MOEA/D to improve its ability to balance convergence and diversity.
NSGA-III is an improved version of NSGA-II [4]. The crowding distance operator that NSGA-II uses to maintain diversity is replaced by a diversity-preservation strategy guided by weight vectors. NSGA-III uses a set of pre-generated, uniformly distributed weight vectors to approximate the distribution of the PF. When selecting solutions, the candidate solutions with the shortest perpendicular distance to these weight vectors are preferred.
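The association step underlying NSGA-III's weight-vector guidance can be sketched as follows. The sketch assumes the objective vectors are already normalized and treats each weight vector as a reference line through the origin; it computes perpendicular distances and assigns each solution to its nearest line. The full NSGA-III niching procedure (niche counts and random tie-breaking) is omitted, and the function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def perpendicular_distance(f: Sequence[float], w: Sequence[float]) -> float:
    """Distance from objective vector f to the reference line through the origin along w."""
    w_norm_sq = sum(wi * wi for wi in w)
    # projection coefficient of f onto the direction w
    scale = sum(fi * wi for fi, wi in zip(f, w)) / w_norm_sq
    return math.sqrt(sum((fi - scale * wi) ** 2 for fi, wi in zip(f, w)))


def associate(points: Sequence[Sequence[float]],
              refs: Sequence[Sequence[float]]) -> List[Tuple[int, float]]:
    """For each normalized solution, return (index of nearest reference line, distance)."""
    result = []
    for f in points:
        dists = [perpendicular_distance(f, w) for w in refs]
        best = min(range(len(refs)), key=dists.__getitem__)
        result.append((best, dists[best]))
    return result
```

With reference lines along (1, 0), (0, 1), and (1, 1), the solutions (0.9, 0.1), (0.2, 0.8), and (0.5, 0.5) are associated with lines 0, 1, and 2, respectively.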
CMOPSO improves multi-objective particle swarm optimization (MOPSO [93]) by adding a competition mechanism. In each generation, CMOPSO selects particles through pairwise competitions, which makes its performance less dependent on the global and local best particles stored in an external archive.
Two_Arch2 uses two external archives: one promotes convergence (CA) and the other diversity (DA). The two archives use different selection principles: CA is indicator-based and DA is Pareto-based. In addition, Two_Arch2 proposes an Lp-norm-based diversity maintenance scheme to improve the diversity of the population.
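An Lp-norm-based diversity criterion can be illustrated with the sketch below. It assumes a fractional exponent p = 1/M (a value associated with Two_Arch2, assumed here rather than taken from this paper) and a simplified greedy max-min truncation seeded with the solution of smallest first objective; the seeding rule and the names `lp_distance` and `maxmin_select` are hypothetical, and this is not the exact Two_Arch2 procedure.

```python
from typing import List, Sequence


def lp_distance(a: Sequence[float], b: Sequence[float], p: float) -> float:
    """(sum |a_i - b_i|^p)^(1/p); for p < 1 this is a dissimilarity, not a true norm."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def maxmin_select(points: Sequence[Sequence[float]], k: int, p: float) -> List[int]:
    """Greedy truncation: seed with the smallest-f1 point (hypothetical choice), then
    repeatedly add the candidate whose minimum Lp distance to the chosen set is largest."""
    chosen = [min(range(len(points)), key=lambda i: points[i][0])]
    while len(chosen) < min(k, len(points)):
        best, best_d = -1, -1.0
        for i, pt in enumerate(points):
            if i in chosen:
                continue
            d = min(lp_distance(pt, points[j], p) for j in chosen)
            if d > best_d:
                best, best_d = i, d
        chosen.append(best)
    return chosen
```

With M = 2 (so p = 0.5) and points (0, 1), (0.1, 0.9), (0.5, 0.5), (1, 0), selecting three solutions returns indices 0, 3, and 2: the two extremes and the middle point, while the near-duplicate (0.1, 0.9) is dropped.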
DLEA mainly uses the DLS described in Section 4.1 to improve the balance between convergence and diversity in environmental selection, with two different indicators maintaining convergence and diversity, respectively. The convergence factor α in DLEA is fixed at 0.9, whereas DL-TPCEA introduces a dynamic convergence factor. Comparing these two algorithms therefore highlights the performance improvement brought by the dynamic convergence factor and by the coevolution of the two populations.
The five comparative algorithms have distinct characteristics and together cover approaches frequently used in many-objective optimization: decomposition-based, Pareto-based, indicator-based, external-archive-based, and weight-vector-based operators. Comparing DL-TPCEA against this diverse set therefore gives a more convincing assessment of its performance.
For DL-TPCEA and the five comparative algorithms, Tables 3 and 4 give the mean and standard deviation (in parentheses) of the HV values obtained on the five groups of WFG test suites, and Tables 5 and 6 give the corresponding IGD values. The Wilcoxon rank-sum test (α = 0.05) was used to test the significance of differences between the HV and IGD values of the six algorithms. The symbols −, +, and ≈ indicate that the indicator values of a comparative algorithm are significantly worse than, significantly better than, and similar to those of DL-TPCEA, respectively. In addition, for each test instance, the best (maximum) HV value and the best (minimum) IGD value are highlighted in gray.
Table 5. IGD results of six algorithms on benchmarks WFG1-WFG9 with 3, 5, and 8 objectives.
As shown in Tables 3 and 4, in terms of HV, DL-TPCEA was significantly better than the other five algorithms on 26 of the 45 test instances and performed similarly to them on two test instances. Specifically, DL-TPCEA generated higher HV values than MOEA/D-PaS, NSGA-III, CMOPSO, Two_Arch2, and DLEA on 39, 33, 39, 38, and 36 of the 45 test instances, respectively. As shown in Tables 5 and 6, in terms of IGD, DL-TPCEA was significantly better than the other five algorithms on 21 of the 45 test instances and performed similarly to them on one test instance. DL-TPCEA generated smaller IGD values than MOEA/D-PaS, NSGA-III, CMOPSO, Two_Arch2, and DLEA on 44, 31, 26, 29, and 29 of the 45 test instances, respectively. These results demonstrate that coevolution combined with the dynamic learning strategy, as in DL-TPCEA, is a promising way to approximate the PFs of the WFG problems. CMOPSO and DLEA performed better in terms of IGD than in terms of HV, indicating that these two algorithms also maintain a good trade-off between convergence and diversity. In addition, considering both HV and IGD, NSGA-III is also a strong algorithm.
Nevertheless, DL-TPCEA performed considerably better than these three MaOEAs in terms of both convergence and diversity.
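The −/+/≈ labels can be produced as in the sketch below, a minimal version of the two-sided Wilcoxon rank-sum test using the normal approximation without tie correction (an assumption; statistical packages may use exact or tie-corrected variants). The "better" direction is decided by comparing medians, which is also an assumption; `rank_sum_p` and `compare` are hypothetical names.

```python
import math
from statistics import median
from typing import Sequence


def rank_sum_p(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sided p-value of the Wilcoxon rank-sum test (normal approximation, no tie correction)."""
    values = sorted(list(a) + list(b))
    rank_of = {}  # average 1-based rank of each distinct value (handles ties)
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[j + 1] == values[i]:
            j += 1
        rank_of[values[i]] = (i + j) / 2 + 1
        i = j + 1
    n1, n2 = len(a), len(b)
    w = sum(rank_of[v] for v in a)  # rank sum of sample a
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def compare(other: Sequence[float], ours: Sequence[float],
            minimize: bool = True, alpha: float = 0.05) -> str:
    """'+' if other is significantly better than ours, '-' if significantly worse, else '≈'."""
    if rank_sum_p(other, ours) >= alpha:
        return "≈"
    if minimize:  # e.g. IGD
        other_better = median(other) < median(ours)
    else:  # e.g. HV
        other_better = median(other) > median(ours)
    return "+" if other_better else "-"
```

For IGD (`minimize=True`), a comparative algorithm whose 30 runs are all larger than DL-TPCEA's receives "−"; the same data read as HV values (`minimize=False`) receive "+".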
The superiority of DL-TPCEA can be explained as follows. Except for Two_Arch2, the comparative algorithms attempt to approximate the PFs of the WFG problems by balancing convergence and diversity within a single population. However, this balance becomes more difficult as the number of objectives increases, because more objectives lead to more severe conflicts among them and weaken the selection pressure of MOEAs compared with problems that have fewer objectives. As the number of objectives keeps increasing, these algorithms may produce solutions that favor only convergence or only diversity, without a compromise between the two over the whole PF. Two_Arch2, in contrast, uses two external archives to store convergence-related and diversity-related solutions, respectively, and promotes evolution between the two archives so that the population maintains a compromise between convergence and diversity. However, these two archives perform poorly on MaOPs, especially when the objective conflicts are severe. The proposed DL-TPCEA uses two coevolving populations whose individual shortcomings are compensated by BCE, which keeps a good balance between convergence and diversity, and it uses the dynamic learning strategy to further strengthen this balance. As a result, the performance of DL-TPCEA does not degrade because of the conflicts caused by an increasing number of objectives.
As the HV values in Tables 3 and 4 show, all algorithms except MOEA/D-PaS obtained zero HV on the 8-, 10-, and 15-objective WFG3. This results from computing HV with respect to a reference point set for each test instance: when an algorithm fails to obtain any candidate solution that dominates the reference point, the hypervolume enclosed by the non-dominated population and the reference point is zero. On these three test instances only MOEA/D-PaS obtained non-zero HV values, which indicates that MOEA/D-PaS has its own advantages on WFG3. On all other test instances, every algorithm obtained a non-zero HV value.
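Why a solution set can score exactly zero HV is easy to see in two objectives. The sketch below computes the hypervolume of a bi-objective minimization set by a sweep over the first objective; points that do not strictly dominate the reference point contribute nothing, so a set with no such point has HV = 0, as on the WFG3 instances above. The function name `hv2d` is hypothetical, and many-objective HV computation requires dedicated algorithms or Monte Carlo estimation rather than this 2D sweep.

```python
from typing import Sequence, Tuple


def hv2d(points: Sequence[Tuple[float, float]], ref: Tuple[float, float]) -> float:
    """Hypervolume of a 2-objective minimization set with respect to reference point ref."""
    # keep only points strictly better than the reference point in both objectives
    pts = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    volume, best_f2 = 0.0, ref[1]
    for f1, f2 in pts:  # ascending f1
        if f2 < best_f2:  # not dominated by earlier points: adds a new slab
            volume += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return volume
```

With reference point (3, 3), the set {(1, 2), (2, 1)} has HV 3, whereas the set {(4, 1), (1, 4)} has HV 0 because neither point dominates the reference point.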
The results also show that DL-TPCEA performed poorly mainly on WFG1 and WFG3. In terms of problem characteristics, WFG1 is convex and mixed, while WFG3 is linear and degenerate. DL-TPCEA may not find boundary individuals well on problems such as WFG1 and WFG3, which leads to poor performance. WFG2 is convex and disconnected, yet DL-TPCEA still obtained the best HV results on it, indicating that the two-population coevolution of DL-TPCEA can handle disconnected MaOPs. Finally, WFG4-WFG9 are concave, and DL-TPCEA also performs best on them overall. On the 10- and 15-objective WFG4-WFG9, however, DL-TPCEA performed worse than DLEA, indicating that the dynamic learning strategy plays a significant role in dealing with concave MaOPs.
To observe more intuitively how well the six algorithms balance convergence and diversity on the WFG test suite, Figures 3 and 4 show the parallel coordinates of the solution sets obtained on the 5-objective WFG2 and the 10-objective WFG9, respectively. In a parallel coordinates plot, the vertical axis represents the objective value, from which convergence information can be obtained: an algorithm converges well if its solutions lie within the range of the PF. The vertical spread of the lines also reflects diversity. The horizontal axis corresponds to the objectives and reveals the diversity information of a MaOEA: a good algorithm maintains solutions covering every objective, and the denser the lines, the better the diversity. Parallel coordinates of the solution sets therefore allow a clearer comparison of MaOEAs.
For the 5-objective WFG2, the PF range on each objective dimension m is [0, 2m] (m = 1, ..., M). As shown in Figure 3, although the solution sets obtained by all six algorithms successfully converged to the PF range on each objective dimension, their diversity differed significantly. MOEA/D-PaS and DLEA had the worst diversity: MOEA/D-PaS had poor diversity on the second objective, and DLEA on the second to fourth objectives. NSGA-III, CMOPSO, and Two_Arch2 performed better in diversity, but not as well as DL-TPCEA. Figure 3f shows that the solution set obtained by DL-TPCEA had good diversity on every objective dimension. This indicates that the output solution set of DL-TPCEA was better than those of the other five algorithms in terms of both convergence and diversity, which is consistent with the maximum HV value and minimum IGD value of DL-TPCEA on the 5-objective WFG2 in Tables 3 and 5.
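As a rough numeric counterpart to reading a parallel coordinates plot, the hypothetical helper below measures, for each objective m, what fraction of the WFG PF range [0, 2m] a solution set spans. It is not an indicator used in this paper, and a large span alone does not imply that solutions are spread evenly within the range; the density of lines in the plot carries that information.

```python
from typing import List, Sequence


def coverage_per_objective(solutions: Sequence[Sequence[float]]) -> List[float]:
    """Fraction of the WFG PF range [0, 2m] spanned by the set on each objective m (1-based)."""
    n_obj = len(solutions[0])
    coverage = []
    for m in range(1, n_obj + 1):
        values = [s[m - 1] for s in solutions]
        upper_pf = 2.0 * m
        lo = max(min(values), 0.0)       # clip the observed span to the PF range
        hi = min(max(values), upper_pf)
        coverage.append(max(hi - lo, 0.0) / upper_pf)
    return coverage
```

A value well below 1 on some objective corresponds to the kind of gap seen for MOEA/D-PaS on the second objective in Figure 3.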
As shown in Figure 4, the solution set obtained by DL-TPCEA was superior to those of the five comparative algorithms in terms of convergence and diversity. For the 10-objective WFG9, the PF range on each objective dimension m is [0, 2m] (m = 1, ..., M). MOEA/D-PaS converged to only a few solutions on the PF of the 10-objective WFG9, so it could not approximate the PF well. As shown in Figure 4c,e, the solution set obtained by CMOPSO had relatively poor diversity on the sixth objective, while DLEA had relatively poor diversity on the seventh and ninth objectives. NSGA-III and Two_Arch2 performed well, second only to DL-TPCEA in the convergence and diversity of their solution sets on the 10-objective WFG9. This is consistent with the HV and IGD values in Tables 4 and 6.
Figure 5 shows the IGD trajectories of the six algorithms on the 5-objective WFG test suite. The algorithm of each trajectory is identified in the legend at the bottom, with DL-TPCEA highlighted in red. Each subplot corresponds to one problem; its horizontal axis is the number of evaluations and its vertical axis is the IGD value. As Figure 5 shows, the IGD trajectory of DL-TPCEA generally declines fastest (except on WFG1 and WFG3), which indicates that DL-TPCEA converges very quickly. Moreover, the final IGD value obtained by DL-TPCEA is usually the minimum or close to it. On the 5-objective WFG1, the final IGD value of DL-TPCEA was second only to that of DLEA, and on WFG3 DLEA obtained the best IGD value while DL-TPCEA performed only at a mid-range level. On the other seven 5-objective WFG problems, DL-TPCEA performed very well. Overall, DL-TPCEA had the best performance among the six algorithms.
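The IGD values plotted in such trajectories can be computed as in the sketch below: IGD is the mean Euclidean distance from each point of a reference PF sample to its nearest obtained solution, so smaller is better. The snapshot list stands in for populations recorded at evaluation checkpoints, an assumption about how trajectories are collected; `igd` and `igd_trajectory` are hypothetical names.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def igd(reference_front: Sequence[Point], solutions: Sequence[Point]) -> float:
    """Mean Euclidean distance from each reference PF point to its nearest solution."""
    total = sum(min(math.dist(r, s) for s in solutions) for r in reference_front)
    return total / len(reference_front)


def igd_trajectory(reference_front: Sequence[Point],
                   snapshots: Sequence[Sequence[Point]]) -> List[float]:
    """IGD of each population snapshot, e.g. one snapshot per evaluation checkpoint."""
    return [igd(reference_front, pop) for pop in snapshots]
```

Because every reference point must be close to some solution, IGD penalizes both poor convergence and missing regions of the PF. This is why a fast-declining trajectory indicates progress on both convergence and diversity.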
Considering the HV and IGD values in Tables 3-6, the convergence and diversity of the solution sets in Figures 3 and 4, and the IGD trajectories in Figure 5, DL-TPCEA had the best performance among the six algorithms. DL-TPCEA has clear advantages in many-objective optimization, both in the convergence and diversity of the final solution set and in convergence speed. Table 7 shows the average running time of the six algorithms on the 3-, 5-, 8-, 10-, and 15-objective WFG1. The final entry gives the results of the Wilcoxon test (the smaller the value, the shorter the corresponding running time). NSGA-III has the shortest running time owing to its simple structure, whereas DL-TPCEA ranks only fourth, which is a shortcoming. Given its performance gains, however, the extra cost is worthwhile, especially for problems that require high accuracy.

Comparison Experiments of DL-TPCEA and Two Weight-Sum Based Algorithms
In this section, DL-TPCEA is compared with two weight-sum based algorithms. The weighted sum method is fast, simple in structure, and easy to apply. However, MaOPs in industrial production tend to have more complex PFs, so solving them with the weighted sum method alone may be of limited effectiveness. DL-TPCEA is therefore compared with weight-sum based approaches to verify the advantages of the proposed algorithm.
This paper provides two weight-sum based algorithms for comparison. The first, called WSEA, is a modification of the classic NSGA-II framework: its environmental selection starts with non-dominated sorting, and the remaining solutions are selected by the weighted sum method from the front MaxFNo described in Section 4.1.2. The environmental selection of the second algorithm, called WSEA2, selects individuals only by the weighted sum method. Since the problems are minimization problems, the N individuals with the smallest weighted sums are passed to the next generation. Both algorithms use crossover and mutation operators to generate offspring. Because no objective is preferred, the weight of every objective is set to 1/M in both algorithms; to remove the influence of differing objective scales, each objective is normalized before the weighted sum is calculated. DL-TPCEA is compared with these two algorithms, and the results are shown in Tables 8-11.
As the results in Tables 8-11 show, DL-TPCEA obtained the best results on all instances except for the HV results of seven instances on WFG1 and WFG3. Regardless of whether HV or IGD is used, DL-TPCEA performs best among the three algorithms. This also shows that a single weighted sum criterion cannot cope well with the complexity of MaOPs. The reason is that, without objective preferences, the magnitude of the weighted sum alone does not guarantee that the solutions in the population converge to the PF. A small weighted sum may only mean that the individual has a small value on some objectives; whether the individual is non-dominated remains unknown.
In addition, the weighted sum is a linear aggregation of the objectives, so minimizing it can only reach solutions on the convex parts of the PF. Since different MaOPs have PFs with different characteristics, it is difficult to apply this method to all problems.
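The two environmental selections described for WSEA and WSEA2 can be sketched as follows. Normalization here is min-max over the current population, which is an assumption since the exact bounds are not stated; crossover, mutation, and the surrounding NSGA-II loop are omitted, and all function names are hypothetical.

```python
from typing import List, Sequence

Vec = Sequence[float]


def dominates(a: Vec, b: Vec) -> bool:
    """Pareto dominance for minimization."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def weighted_sums(pop: Sequence[Vec]) -> List[float]:
    """Equal weights 1/M on min-max normalized objectives (normalization bounds assumed)."""
    m = len(pop[0])
    lo = [min(p[i] for p in pop) for i in range(m)]
    hi = [max(p[i] for p in pop) for i in range(m)]
    norm = [[(p[i] - lo[i]) / (hi[i] - lo[i]) if hi[i] > lo[i] else 0.0
             for i in range(m)] for p in pop]
    return [sum(v) / m for v in norm]


def wsea2_select(pop: Sequence[Vec], n: int) -> List[int]:
    """WSEA2-style: keep the n individuals with the smallest weighted sum."""
    ws = weighted_sums(pop)
    return sorted(range(len(pop)), key=ws.__getitem__)[:n]


def nondominated_fronts(pop: Sequence[Vec]) -> List[List[int]]:
    """Simple (quadratic per front) non-dominated sorting."""
    remaining = list(range(len(pop)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(pop[j], pop[i]) for j in remaining if j != i)]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts


def wsea_select(pop: Sequence[Vec], n: int) -> List[int]:
    """WSEA-style: fill by fronts, then pick from the last front by smallest weighted sum."""
    ws = weighted_sums(pop)
    chosen: List[int] = []
    for front in nondominated_fronts(pop):
        if len(chosen) + len(front) <= n:
            chosen.extend(front)
        else:  # the front that overflows (MaxFNo in the text)
            chosen.extend(sorted(front, key=ws.__getitem__)[: n - len(chosen)])
            break
    return chosen
```

With the population (0, 1), (1, 0), (0.4, 0.4), (0.5, 0.45) and N = 2, WSEA2 keeps indices 2 and 3 even though (0.5, 0.45) is dominated by (0.4, 0.4), while WSEA keeps only non-dominated individuals (indices 2 and 0). This illustrates the earlier point that a small weighted sum does not imply non-dominance.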
In terms of problem features, WFG1 is convex and mixed, while WFG3 is linear and degenerate. WSEA obtained the best HV results on several instances of these two problems, indicating that the weighted sum method is promising for such problems. However, the IGD values obtained by WSEA and WSEA2 on these instances are very poor, which indicates that their convergence ability is weak. Because HV is strongly affected by the boundary individuals in the population, DL-TPCEA may obtain lower HV values on these problems if it does not find the boundary individuals well. Nevertheless, combining the results of the two indicators, DL-TPCEA performed best in 83 of the 90 instances, a clear advantage. These results reflect the limitations of using the weighted sum method to solve MaOPs and show the advantages of DL-TPCEA.
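The limitation of linear aggregation on concave fronts can be checked on a simple stand-in for the spherical PFs of WFG4-WFG9: the quarter circle f = (cos t, sin t), t in [0, π/2]. For any positive weights, w1 cos t + w2 sin t is a concave function of t on this interval, so its minimum lies at an endpoint and the weighted sum never prefers interior trade-off solutions. The grid search and the name `ws_minimizer_on_quarter_circle` are illustrative assumptions.

```python
import math


def ws_minimizer_on_quarter_circle(w1: float, w2: float, samples: int = 1001) -> float:
    """Angle t in [0, pi/2] minimizing w1*f1 + w2*f2 on the front f = (cos t, sin t)."""
    best_t, best_val = 0.0, float("inf")
    for k in range(samples):
        t = (math.pi / 2) * (k / (samples - 1))  # exact endpoints 0 and pi/2
        val = w1 * math.cos(t) + w2 * math.sin(t)
        if val < best_val:
            best_t, best_val = t, val
    return best_t
```

For weights such as (0.1, 0.9), (0.5, 0.5), or (0.9, 0.1), the minimizer is always t = 0 or t = π/2, that is, one of the two extreme points of the front.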

Conclusions
In recent years, various MOEAs have been proposed to handle MaOPs with different characteristics. However, these MOEAs also have their own disadvantages. For example, MOEAs that rely on reference vectors cannot represent the characteristics of the whole PF well when generating the reference vectors, which degrades their performance. This paper made full use of the advantages of DLS in many-objective optimization (better maintenance of convergence and diversity) and proposed DL-TPCEA by combining DLS with the BCE framework. The combination of the two strategies allows the entire decision space to be explored more thoroughly. At the same time, the convergence factor in DLS is adapted according to the evolutionary state of the populations in BCE, yielding a dynamic convergence factor that makes better use of this evolutionary state. This combination greatly improves the performance of DL-TPCEA. Compared with five state-of-the-art MOEAs, DL-TPCEA shows significant advantages. Finally, to verify its performance advantage over weight-sum based algorithms, DL-TPCEA was compared with two such algorithms, and the results showed that DL-TPCEA still had significant advantages.
In addition, the original DLS used the Iε+ indicator to maintain individual convergence and a diversity maintenance mechanism based on the Lp-norm distance to maintain diversity. In this paper, the CV indicator is used to maintain individual convergence, and comparing CV with Iε+ is a direction for future research. There are also many other effective strategies for maintaining convergence and diversity that this paper does not compare; future work can start from this point and improve on them within the DL-TPCEA framework to achieve better results. We used the dynamic convergence factor to combine DLS and BCE more effectively, but more effective ways of combining them may be found in the future. For the choice of the initial value of the dynamic convergence factor, the suggestions in [94] can also be consulted to obtain a better initial value.