Diversity Maintenance for Efficient Robot Path Planning

Path planning is present in many areas, such as robotics, video games, and unmanned autonomous vehicles. In the case of robots, it is a primary low-level prerequisite for the successful execution of high-level tasks. It is a known and difficult problem, especially in terms of finding optimal paths for robots working in complex environments. Recently, population-based methods for multi-objective optimization, i.e., swarm and evolutionary algorithms, have been successfully applied to different path planning problems. Since the nature of the problem is hard for optimization algorithms, population-based algorithms may be expected to benefit from some form of diversity maintenance. However, the advantages and potential traps of implementing specific diversity maintenance methods in an evolutionary path planner have not been clearly spelled out and experimentally demonstrated. In this paper, we fill this gap and compare three diversity maintenance methods and their impact on the evolutionary planner for problems of different complexity. Crowding, fitness sharing, and novelty search are tailored to fit specific problems, implemented, and tested in two scenarios: a mobile robot operating in a 2D maze, and a 3 degrees of freedom (DOF) robot operating in a 3D environment with obstacles. Results indicate that novelty search outperforms the other two methods for problem domains of higher complexity.


Introduction
Robotics expands from traditional industrial environments towards coworking and coexisting with humans in domains like medicine, education, leisure, and the general service domain. Apart from the ethical aspects of robots sharing an environment with humans, there is still a plethora of technical issues limiting the robots' ability to fully immerse in collaboration with humans.
One such important problem is the limited ability of robots to perform efficiently in unstructured environments. Such environments require robots to constantly re-plan, adapt, and optimize their actions. Furthermore, if an objective like time, distance, or collision avoidance is critical to the success of the robot's operation, then its ability to plan and optimize according to the required objectives becomes of paramount importance.
For the above-mentioned reasons, path planning, i.e., finding trajectories optimized under a set of often conflicting criteria, is a problem of intense interest in the scientific community. Reliable, safe, and timely path planning is certainly a stepping-stone towards robots reaching their full potential in collaboration with humans.
There are many important contributions to path planning in recent years. A comprehensive historical survey can be found in [1]. Maintaining diversity on the genotypic level alone does not significantly increase the effectiveness of the search. Despite the diversity explicitly enforced in the population, the bias imposed by the fitness function remains the primary parameter directing the search. Therefore, individuals in a population, or combinations of steps that could lead to the solution of a complex problem, can diminish from the current population and become extinct in an evolutionary sense, and the search thus fails to find high-quality solutions.
Motivated by such results, the novelty search method was proposed [22], in which the search process is driven primarily by phenotypic diversity. It was shown to scale better with complex search domains than traditional objective-based fitness functions. The explanation is that this method does not depend exclusively on the fitness function to identify the building blocks that lead to a solution; it relies on the idea that already discovered innovations are building blocks for further evolutionary innovations. To summarize, genotypic diversity maintenance methods promote the creation of random raw material, from which the objective function selects the most promising candidates, while novelty search discovers structure in the domain, thereby making progress possible even when fitness is uninformative.
In this paper, we implement the two most commonly used forms of diversity maintenance, fitness sharing and crowding, and compare them to novelty search on the problem of path planning for mobile and 3 DOF robots. Finally, the three methods are evaluated against a base evolutionary algorithm without any diversity maintenance.

Materials and Methods
Two scenarios are analyzed in this paper: a two-dimensional case, with a mobile robot navigating through a maze, and a three-dimensional case with an RRR robot. In the first scenario, the robot is represented as a dimensionless point in the space. The second case considers a model of a robot with three degrees of freedom, as seen in Figure 1. The three joints of the robot are revolute, allowing the robot to reach an arbitrary position with its tool center point, provided the point falls within the workspace of the robot.

Three versions of the algorithm are compared to test how different population diversity maintenance (DM) methods influence performance, and then compared to the base algorithm, which does not include any DM. The three DM methods are novelty search, fitness sharing, and crowding.
To keep the test results comparable, every part of the algorithms other than the population diversity maintenance is identical. Each algorithm was tested in both the two-dimensional and three-dimensional environment setups, with the same obstacle placement and the same start and goal positions. Examples of the 2D and 3D environments used to test the algorithms are shown in Figure 2.


Population
The matrix in Equation (1) represents the whole population. Each row describes one individual member, i.e., one potential solution, in this case, it is one trajectory. The initial population consists of m random individuals with n real numbers in the range [−5, 5] on the position of each allele in the gene. So, the population consists of m genes of length n.
The trajectory is, in theory, a polynomial function of arbitrary order, and every member of the population represents a polynomial function, while each allele matches one of its coefficients. The length of the gene represents the order of the polynomial function. For the three-dimensional environment, the trajectory is composed of two different polynomial functions, in the x-y and x-z planes, respectively:

P(x) = a_n x^n + a_{n-1} x^{n-1} + ... + a_1 x + a_0
Equation (2) defines one row of the matrix, this is one individual or candidate solution to the problem. As defined with Equation (1) there are m potential solutions that form the population. As for the population size, defined by m, in our simulation, the population size is set to 50 individuals. The dimension n defines the order of the polynomial used to describe a trajectory.
Its size is experimentally determined and set to n = 7. If the order of the polynomial is set too low, there is not enough flexibility, or inflection points available to evolve a trajectory that avoids obstacles. Too high an order increases processing time significantly, but without any positive impact on the quality of the end solution. The order used in our simulations is determined through trial and error, but might potentially also be a part of the optimization or evolutionary process.
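The encoding described above can be sketched as follows. This is a minimal illustration, not the paper's actual code: the helper names, the random seed, and the sampling interval are assumptions, while m = 50, n = 7, and the allele range [−5, 5] are taken from the text.

```python
import numpy as np

rng = np.random.default_rng(42)

m, n = 50, 7  # population size and polynomial order, as in the text
# each row is one gene of length n: the coefficients of one trajectory
population = rng.uniform(-5.0, 5.0, size=(m, n))

def trajectory(coeffs, xs):
    """Evaluate the polynomial trajectory P(x) at the sample points xs."""
    return np.polyval(coeffs, xs)

# sample the first individual's trajectory on an assumed interval [0, 1]
xs = np.linspace(0.0, 1.0, 100)
ys = trajectory(population[0], xs)
```

For the 3D case, one such coefficient vector would be kept per plane (x-y and x-z), evaluated with the same routine.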

Fitness Function
Members of the population are evaluated according to their fitness values F. Fitness F is calculated using the fitness function, and it depends on the length of the trajectory between the starting and goal positions, and on the penalty for the collisions with the obstacles.
Penalty values are 90 for collisions with vertical obstacles and 100 for collisions with horizontal obstacles, depending on the observed plane. The constant C is arbitrarily chosen so that C > 0. The form of the objective function defined in Equation (3) performed well for the given examples in terms of discriminating among candidates, forcing them to simultaneously shorten their lengths and avoid obstacles. The fitness of the best individual is bounded by the value C/length, which maps the whole population onto a finite scale. Balancing these parameters is not an easy task, since one can dominate the other and thereby limit the evolutionary potential of the candidates. Moreover, the penalty approach enables the population to actually evolve over time, as opposed to simply removing an individual that performs poorly in the current generation. The values used in this study were determined experimentally.
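Since Equation (3) is not reproduced here, the following is only a hedged sketch of a fitness function consistent with the description: collision penalties of 90 and 100, and a collision-free individual bounded by C/length. Adding the penalties to the length in the denominator is an assumption.

```python
def fitness(length, n_vertical_hits, n_horizontal_hits, C=100.0):
    """Sketch of a fitness in the spirit of Equation (3).

    Penalties from the text: 90 per vertical-obstacle collision and 100
    per horizontal one. Adding them to the trajectory length in the
    denominator is an assumption; it bounds a collision-free individual
    by C / length, as the text requires.
    """
    penalty = 90.0 * n_vertical_hits + 100.0 * n_horizontal_hits
    return C / (length + penalty)
```

With this form, any collision sharply reduces fitness without removing the individual outright, so penalized candidates can still evolve.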

Diversity Maintenance
The idea is to force the population to maintain diversity during selection or replacement. The two most commonly used methods are fitness sharing, where the fitness value of each individual is adjusted before selection to separate individuals into niches in proportion to the niche fitness, and crowding, where individuals are distributed uniformly among niches using distance-based selection. A niche is defined as a subset of individuals from the population whose mutual distance is below a given threshold, making a niche a group of similar individuals. The distance metric and the niche size are explained later in the text.
It should be noted that both methods use global parent selection and there is nothing that prevents the recombination of parents from different niches.
The novelty search is different in that it actually does not rely solely on the fitness function, but rather on a different metric-the novelty of an individual in the population. Novelty is combined with fitness in this study by copying the best, so-called elite member, from each generation directly to the next one. This way, an evolutionary trace or memory is preserved.

Fitness Sharing
This method is based on the idea that "sharing" individual fitness values before selection controls the number of individuals in each niche. Every possible pairing of individuals i and j is considered, and the distance between them is calculated. The fitness F of each individual i is then adapted according to the number of individuals falling within some pre-specified distance σ_share, using a power-law sharing function:

F'(i) = F(i) / Σ_j sh(d(i, j)), with sh(d) = 1 − (d/σ_share)^γ for d ≤ σ_share, and sh(d) = 0 otherwise.

The sharing function sh(d) is a function of the distance. The constant γ determines the shape of the sharing function; it was set to γ = 1 to make the function linear in this example. The last parameter is the share radius σ_share, which defines how many niches can be maintained. Common values are in the range of 5 to 10 [16]; in this algorithm, a value of 10 was taken to promote more niches and consequently larger diversity.
Phenotypic distance is preferred, but genotypic distance can also be used as a measure of diversity, e.g., Hamming distance for binary representations. The distance d is defined here as the Euclidean distance, see Equation (6), calculated from one individual to every other member of the population, including itself. With fitness sharing, members with lower fitness values have an increased probability of becoming parents compared with the basic algorithm. This way we have a larger gene pool, which directly increases the diversity of the population.
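The sharing computation above can be sketched as follows. This is an illustrative implementation of the standard sharing formula with the parameters from the text (σ_share = 10, γ = 1); the function names are not from the paper.

```python
import numpy as np

def shared_fitness(F, pop, sigma_share=10.0, gamma=1.0):
    """Fitness sharing: divide each raw fitness by its niche count.

    sh(d) = 1 - (d / sigma_share)**gamma for d < sigma_share, else 0.
    gamma = 1 gives the linear sharing function used in the text.
    """
    # pairwise Euclidean distances between all individuals (self included)
    d = np.linalg.norm(pop[:, None, :] - pop[None, :, :], axis=-1)
    sh = np.where(d < sigma_share, 1.0 - (d / sigma_share) ** gamma, 0.0)
    niche_count = sh.sum(axis=1)  # >= 1, because sh(0) = 1 for self
    return F / niche_count
```

An individual alone in its niche keeps its raw fitness, while individuals in crowded niches have their fitness divided among similar neighbors, which spreads parent selection across niches.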

Crowding
This method relies on the fact that offspring are likely to be similar to their parents. In this algorithm, the parent population is evaluated and then randomly paired. Each pair produces two offspring by recombination, after which the offspring are mutated and evaluated. If parents are denoted as p and offspring as o, the distances d(p1, o1), d(p1, o2), d(p2, o1), and d(p2, o2) between parents and offspring are calculated. After that, competition pairs are identified.
If Equation (7) holds, the competition is between p1 and o1, and between p2 and o2; otherwise, it is between p1 and o2, and between p2 and o1. There is no competition between offspring; each offspring competes only with the most similar parent. The competition is based on fitness: individuals with higher fitness values stay in the population and losers are discarded.
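The pairing and replacement step can be sketched as follows. Equation (7) is not reproduced in the text; the condition below assumes the usual deterministic-crowding form, pairing so that the total parent-offspring distance is minimized. Names are illustrative.

```python
import numpy as np

def crowding_replacement(p1, p2, o1, o2, fit):
    """Deterministic crowding: each offspring competes with its
    closest parent; the fitter of the two survives.

    The pairing condition assumes the usual form of Equation (7):
    if d(p1,o1) + d(p2,o2) <= d(p1,o2) + d(p2,o1), o1 competes with
    p1 and o2 with p2; otherwise the pairing is swapped.
    """
    d = lambda a, b: np.linalg.norm(a - b)
    if d(p1, o1) + d(p2, o2) <= d(p1, o2) + d(p2, o1):
        pairs = [(p1, o1), (p2, o2)]
    else:
        pairs = [(p1, o2), (p2, o1)]
    # fitness-based competition: the winner of each pair survives
    return [o if fit(o) > fit(p) else p for p, o in pairs]
```

Because each offspring only ever displaces its most similar parent, niches are preserved without any explicit niche radius parameter.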

Novelty Search
This method differs from most EAs in that, instead of tending to converge, novelty is a divergent evolutionary technique. It directly rewards novel behavior instead of progress towards fixed fitness, and thus introduces a constant pressure for finding new and original individuals.
The main idea is that instead of rewarding only the performance of an individual on an objective, novelty search rewards diverging from prior behaviors. It is usually used in combination with fitness function as an additional measure for evaluating solutions [23][24][25], with the purpose of preserving a set of solutions for the next generation.
In this paper, we implemented such an approach that utilizes both novelty and fitness function, by copying only one, the best, individual from the current generation directly to the next generation. This enables us to keep the memory trace of quality individuals trans-generationally.
Tracking novelty of the solution requires a small change to any evolutionary algorithm, aside from adding a novelty metric along with a fitness function. This metric evaluates how far away new individuals are from the rest of the population and its predecessors in so-called behavior space, or space of individual behaviors.
It should measure the sparseness of a point in behavior space, with denser clusters of visited points being less novel and thus receiving a smaller reward. A method for measuring the sparseness of a point is the average distance to its k nearest neighbors. If this value is large, the region is sparse; if it is small, the region is dense and hence the novelty level is low. The sparseness ρ at a point x is defined as

ρ(x) = (1/k) Σ_{i=1..k} dist(x, µ_i),

where µ_i is the i-th nearest neighbor of x with respect to the distance metric dist, a domain-dependent measure of the distance between two individuals. If the novelty is above a certain threshold, the individual is included in a permanent archive of prior solutions in behavior space. The current generation together with the archive forms the explored search space, and in this way the search gradient is directed towards novelty instead of towards a specific objective of a fitness function. The procedure of selecting and creating offspring remains the same as in a standard evolutionary algorithm, but now more novel, instead of fitter, individuals have higher chances of being selected into the parent population.
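The sparseness measure and the archive update can be sketched as follows; the function names, k, and the threshold value are illustrative, not from the paper.

```python
import numpy as np

def sparseness(x, others, k=15):
    """Novelty of behavior x: mean distance to its k nearest neighbors
    among the current population plus the archive (`others`)."""
    dists = np.sort([np.linalg.norm(x - o) for o in others])
    return float(np.mean(dists[:k]))

def maybe_archive(x, others, archive, threshold, k=15):
    """Add x to the permanent archive of prior behaviors if its
    sparseness exceeds the novelty threshold."""
    if sparseness(x, others, k) > threshold:
        archive.append(x)
    return archive
```

Selection then favors individuals with high sparseness, so the population is continually pushed into unvisited regions of behavior space.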

Selection
The parent selection process depends on the diversity maintenance method used. The novelty algorithm evaluates the current population and the archive, attempting to maximize diversity. The fitness sharing algorithm uses the roulette wheel, but instead of relative fitness values, each member is assigned a shared fitness value. For the crowding algorithm, instead of the roulette wheel, every member of the population is randomly paired with another random member to form parent pairs.
Survivor selection uses age-based replacement for the novelty and fitness sharing algorithms: the offspring replace their parents, with elitism preserving one elite member in the population. The crowding algorithm uses fitness-based replacement, where the offspring compete with the most similar parent for their place in the population.
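The roulette-wheel selection and the elitist age-based replacement can be sketched as follows. This is an illustrative rendering, not the paper's code; placing the elite in slot 0 of the new population is an arbitrary choice.

```python
import numpy as np

def roulette_select(shared_F, rng):
    """Roulette-wheel parent selection: each member is drawn with
    probability proportional to its (shared) fitness value."""
    p = shared_F / shared_F.sum()
    return rng.choice(len(shared_F), size=len(shared_F), p=p)

def elitist_replace(parents, offspring, F_parents):
    """Age-based replacement with elitism: offspring replace their
    parents, and the single best parent is copied into the next
    generation (here, into slot 0)."""
    new_pop = offspring.copy()
    new_pop[0] = parents[np.argmax(F_parents)]
    return new_pop
```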

Variation
The variation operators used in the algorithms are arithmetic recombination and nonuniform mutation. During recombination, each pair of parent genes has a separate probability of 70% of successful recombination. If recombination occurs, the new genes have the values

z1_i = α x_i + (1 − α) y_i,  z2_i = (1 − α) x_i + α y_i,

where x_i and y_i are the parent gene values and α is a random real number in the range [0, 1]. Mutation happens on each gene separately with a probability of 50%. If mutation occurs, the targeted gene value changes by a random amount scaled by β, where rand is a random number in the range [0, 1] and β represents the desired intensity of mutation. The value of β is set to 0.5 in every algorithm.
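The two operators can be sketched as follows. The recombination is standard arithmetic (convex) crossover as described; the symmetric mutation step β·(2·rand − 1) is an assumption, since the paper's mutation equation is not reproduced here.

```python
import numpy as np

def vary(x, y, rng, p_rec=0.7, p_mut=0.5, beta=0.5):
    """Arithmetic recombination followed by per-gene mutation.

    Recombination (70% per gene pair): the children receive the convex
    combinations alpha*x_i + (1-alpha)*y_i and (1-alpha)*x_i + alpha*y_i.
    Mutation (50% per gene): the gene shifts by beta*(2*rand - 1); this
    symmetric form is an assumption, not the paper's exact equation.
    """
    c1, c2 = x.copy(), y.copy()
    for i in range(len(x)):
        if rng.random() < p_rec:
            alpha = rng.random()
            c1[i] = alpha * x[i] + (1 - alpha) * y[i]
            c2[i] = (1 - alpha) * x[i] + alpha * y[i]
        for c in (c1, c2):
            if rng.random() < p_mut:
                c[i] += beta * (2.0 * rng.random() - 1.0)
    return c1, c2
```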

Results
The Matlab environment was used to run the simulations. In order to compare the different versions of the algorithm, each was simulated 1000 times. The testing results are presented using histograms and convergence diagrams, shown in Figures 3 and 4 for the 2D and 3D cases, respectively.

Each histogram represents how many times an algorithm found a trajectory in a particular length range. Convergence diagrams show the convergence of the average length of the collision-free trajectories discovered in each generation.
Better performance corresponds to histograms with bars grouped in the left region, meaning shorter, and thus fitter, individuals are found more frequently, compared to histograms whose values are grouped on the right side.
The average length itself is not of crucial importance since a number of poor performers might influence the quality of the whole population. At the same time, the algorithm might find a very fit, even the best individual, while the average quality is low. The average length is included in the figures in order to illustrate the overall performance of the algorithm.
In the table presented above, the bold fields denote the best performance for the given measure and diversity maintenance method. By simple summation of the bold fields, novelty search outperforms the other approaches.
It is important to note, though, that not all fields have the same importance. For instance, column 5 shows the least fit individual in the population, which is certainly less important than the best solution found by a method. Simple counting of bold fields is thus not the ultimate measure of the quality of a method; it should rather be used as an indication of quality and interpreted carefully based on additional analysis of the data provided in Table 1. Comparing the test results of the novelty and fitness sharing 2D algorithms, we can see that they are similarly effective. Both algorithms found a collision-free trajectory in every simulation, the lengths of the discovered trajectories lie in a narrow span, and the convergence diagrams are similar.
Unlike the previous two algorithms, the crowding 2D algorithm performed less consistently in terms of the quality of results. It failed to find collision-free trajectories in 1% of the simulations, the discovered trajectories are approximately 13% longer on average, and convergence to the optimal solution is significantly slower.
Regarding the 3D versions of the algorithms, the novelty algorithm gave the best results, with very fast convergence. The results of the fitness sharing algorithm are also comparably good, as in the 2D example.
In 7.4% of the simulations, however, the fitness sharing algorithm got trapped in suboptimal regions of the search space, which degraded the quality of the solutions. The lengths of those trajectories, not shown in the histogram, are in the range [4, 30]. If those simulations were declared unsuccessful and omitted from the calculation of the average length, the average length would be 1.5659, which is close to that of the novelty version of the algorithm.
In this scenario, the crowding algorithm was able to find feasible solutions to the path planning problem without getting trapped in suboptimal regions, but the average length is still larger than that of the trajectories provided by the novelty algorithm.
We can also conclude that the basic algorithm, the one that does not include any DM, is of lesser quality than the other three instances of the algorithm where DM is included. This is an expected outcome, since diversity is very important in an evolutionary context.

Simulation
Tracking of the trajectory is simulated in both the 2D and 3D environments. In order to implement the tracking, the inverse kinematics of the 3 DOF robot is solved. The dimensions used to solve the problem trigonometrically are shown in Figure 1.
The process of the evolution of trajectories is illustrated in Figure 5. Fifty individuals are evolved over a number of generations, searching for the best solution for the given environment. The whole population is illustrated on the left. In the middle, the best individual from the current population is shown. The right side presents fitness vs. time, or generations, for the best individual from the current population. The trajectory is obtained with the novelty 3D algorithm, since this algorithm gave the best results for this problem. The optimized polynomial function of the trajectory is discretized into 100 sections, giving the points that make up the polynomial curve. Each point has coordinates that represent the external coordinates of the robot position. This way, the simulation comes down to point-to-point robot positioning, which can be easily implemented on industrial robots. Figure 6 shows the robot following the trajectory at the beginning, after 30%, after 60%, and at the end of the process.
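The discretization step can be sketched as follows; the function name and the sampling interval endpoints are illustrative, while the count of 100 points comes from the text.

```python
import numpy as np

def discretize(coeffs, x_start, x_goal, n_points=100):
    """Sample the optimized polynomial trajectory at n_points waypoints,
    yielding (x, y) external coordinates for point-to-point positioning."""
    xs = np.linspace(x_start, x_goal, n_points)
    ys = np.polyval(coeffs, xs)
    return np.column_stack([xs, ys])
```

Each returned row is one waypoint; feeding these waypoints to the inverse kinematics in sequence reproduces the point-to-point tracking described above.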

Conclusions
The results show that diversity maintenance methods have an impact on the overall fitness of the end solution, which is expected. The basic version of the algorithm performed less successfully than all three versions with diversity maintenance included.
Both for the two-dimensional and three-dimensional environment, the novelty algorithm gave solutions that are consistent and converge fast towards the optimal solution.
The fitness sharing algorithm gave comparably good results for the two-dimensional environment but had certain problems with the three-dimensional one. The algorithm got trapped in a local optimum in 7.4% of the simulations.
Even though it found collision-free trajectories, their lengths were such that those simulations could be considered unsuccessful. With those outlying solutions dismissed, the fitness sharing algorithm's results come close to those of the novelty algorithm. The crowding algorithm gave evidently the lowest quality results of the tested algorithms, for both problem domains.
Looking at the results of this and past research, the reason for the difference in performance is not straightforward to explain. It originates from the diversity maintenance methods, which include the parent selection models. The novelty and fitness sharing algorithms rely on more stochastic models when choosing the parent population, which results in greater freedom in the exploration of the environment.
The crowding algorithm is based on the random pairing of parents from the population; offspring are then compared to the parents, and the fitter individuals are preserved. When the population is sparsely distributed in a large search space, this can become a limitation, since good parents might not be selected to create offspring, and thus their genetic material disappears in the evolutionary process.
Diversity maintenance is not straightforward to implement and requires a significant amount of experimental work to tune the parameters for the algorithm to work efficiently.
It is then a question if this additional work is justified if the increase of the end solution quality is not significant.
The main conclusions drawn from this study regarding the application of different diversity maintenance methods to path planning are that novelty search equals fitness sharing for simpler path planning scenarios, while the strength of novelty search grows with the complexity of the environment and the path to be found. Also, diversity maintenance methods generally positively influence the quality of solutions to the path planning problem when compared to the basic algorithm.