# The Effect of Multi-Generational Selection in Geometric Semantic Genetic Programming


## Abstract


## 1. Introduction

- By selecting uniformly among the last k generations;
- By selecting among all the generations with a decreasing probability (i.e., with a geometric distribution).

## 2. Related Works

## 3. Geometric Semantic GP with Multi-Generational Selection

#### 3.1. Geometric Semantic Genetic Programming

**Semantic Crossover**. Let ${T}_{1}$ and ${T}_{2}$ be two functions from ${\mathbb{R}}^{n}$ to $\mathbb{R}$ representing two GP trees and let $R:{\mathbb{R}}^{n}\to [0,1]$ be a randomly generated tree. Then the semantic crossover between ${T}_{1}$ and ${T}_{2}$ using the random tree R is defined as:
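In the standard GSGP formulation of Moraglio et al. [2], and with the symbols defined above, the operator can be written as below. The name $T_{XO}$ is notation assumed here for illustration.

$$T_{XO}(x) = R(x)\,T_{1}(x) + \bigl(1 - R(x)\bigr)\,T_{2}(x), \qquad x \in {\mathbb{R}}^{n}.$$

Since $R(x)\in [0,1]$, each output of $T_{XO}$ is a convex combination of the corresponding outputs of ${T}_{1}$ and ${T}_{2}$, so it lies between them.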

**Semantic Mutation**. Let $T:{\mathbb{R}}^{n}\to \mathbb{R}$ be the function defined by a GP tree, $R:{\mathbb{R}}^{n}\to \mathbb{R}$ be a randomly generated tree, and $m\in {\mathbb{R}}_{+}$ be a positive real number, called the mutation step. Then, the semantic mutation of T using the random tree R is defined as:
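With the symbols introduced above (a single random tree R and the mutation step m), a natural reading of this definition is the additive form below. Both the form and the name $T_{M}$ are assumptions made here for illustration.

$$T_{M}(x) = T(x) + m\,R(x), \qquad x \in {\mathbb{R}}^{n}.$$

For comparison, the formulation of Moraglio et al. [2] perturbs T with the difference of two random trees, $T + m\,({R}_{1} - {R}_{2})$, which keeps the perturbation centred around zero.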

- The initial population is composed of standard GP trees;
- Each successive generation is not composed of trees; rather, each individual is a structure containing the random trees used in crossover and mutations and pointers or references to the individuals in the previous populations. This solves the problem of an exponential space blowup;
- Evaluation can be performed bottom-up, saving the intermediate results from the initial population and combining them following the application of crossover and mutations.
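The three points above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of [3] or [4]: the `Individual` class, its field names, and the additive mutation form `T + m*R` are hypothetical, and semantics are stored as plain lists of outputs on the fitness cases.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Semantics = List[float]  # outputs of an individual on the fitness cases


@dataclass
class Individual:
    semantics: Semantics                          # cached, never recomputed
    parents: Tuple["Individual", ...] = ()        # references to older individuals
    random_semantics: Optional[Semantics] = None  # outputs of the random tree R


def semantic_crossover(t1: Individual, t2: Individual, r: Semantics) -> Individual:
    """Pointwise convex combination R*T1 + (1 - R)*T2, with R in [0, 1]."""
    sem = [ri * a + (1.0 - ri) * b
           for ri, a, b in zip(r, t1.semantics, t2.semantics)]
    return Individual(sem, (t1, t2), r)


def semantic_mutation(t: Individual, r: Semantics, m: float) -> Individual:
    """Assumed additive form T + m*R (hypothetical, see the lead-in)."""
    sem = [a + m * ri for ri, a in zip(r, t.semantics)]
    return Individual(sem, (t,), r)


# Generation 0: semantics obtained by evaluating ordinary GP trees once.
a = Individual([1.0, 2.0])
b = Individual([3.0, 6.0])

# Later generations only combine cached semantics and keep references.
child = semantic_crossover(a, b, [0.5, 0.25])
mutant = semantic_mutation(child, [1.0, -1.0], 0.5)
```

Each offspring costs time and space linear in the number of fitness cases, because the parents are referenced rather than copied and their cached outputs are reused.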

#### 3.2. Multi-Generational Selection for GSGP

**Algorithm 1.** The pseudocode of the multi-generational (tournament) selection algorithm, where P is a two-dimensional array of individuals with n rows (generations), $P[i][j]$ is the j-th individual in the i-th generation, f is the fitness function, $t\in \mathbb{N}$ is the tournament size, and D is a distribution.

Step | Comment |
---|---|
**function** Multi-generational-selection(P, n, f, t, D) | |
Tournament $\leftarrow \emptyset$ | ▹ Individuals selected for the tournament |
**for** $1\le i\le t$ **do** | ▹ Repeat for the tournament size t |
$j\leftarrow n-\text{extract from } D$ | ▹ Select the generation |
$k\leftarrow \text{uniform random integer between } 1 \text{ and } \lvert P[j]\rvert$ | ▹ Select the individual |
Tournament $\leftarrow$ Tournament $\cup \{P[j][k]\}$ | ▹ Add the individual to the tournament |
**end for** | |
best $\leftarrow \operatorname{arg\,max}_{x\in \mathrm{Tournament}} f(x)$ | ▹ Find the best individual in the tournament |
**return** best | |
**end function** | |
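Algorithm 1 can be sketched in Python as follows. This is an illustrative sketch, not the authors' code. The helper names (`uniform_last_k`, `geometric`), the parameterisation of the geometric distribution, and the clamping of offsets that would reach before the first generation are all assumptions. As in the pseudocode, f is maximised; for an error measure such as RMSE, pass the negated error.

```python
import random
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
OffsetSampler = Callable[[random.Random], int]  # draws "how many generations back"


def uniform_last_k(k: int) -> OffsetSampler:
    """Uniform choice among the last k generations (offsets 0..k-1)."""
    return lambda rng: rng.randrange(k)


def geometric(p: float) -> OffsetSampler:
    """Geometric offsets: P(offset = i) = (1 - p)**i * p, with 0 < p <= 1."""
    def draw(rng: random.Random) -> int:
        offset = 0
        while rng.random() >= p:
            offset += 1
        return offset
    return draw


def multi_generational_selection(
    P: Sequence[Sequence[T]],
    f: Callable[[T], float],
    t: int,
    D: OffsetSampler,
    rng: random.Random,
) -> T:
    n = len(P)
    tournament: List[T] = []
    for _ in range(t):
        # Assumption: offsets reaching before generation 0 are clamped.
        offset = min(D(rng), n - 1)
        generation = P[n - 1 - offset]   # 0-indexed: offset 0 is the newest
        tournament.append(rng.choice(generation))
    return max(tournament, key=f)        # arg max, as in Algorithm 1


rng = random.Random(0)
history = [[10.0], [20.0], [30.0]]       # three generations, one individual each
newest = multi_generational_selection(history, lambda x: x, 4, lambda r: 0, rng)
```

A sampler that always returns 0 recovers standard tournament selection on the current population, while `uniform_last_k(k)` and `geometric(p)` correspond to the two strategies listed in the Introduction.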

#### 3.2.1. Uniform Multi-Generational Selection

#### 3.2.2. Geometric Multi-Generational Selection

## 4. Experimental Setting

#### 4.1. Dataset

Human oral bioavailability (**%F**) measures the percentage of the initial drug dose that effectively reaches the systemic blood circulation. This is an essential pharmacokinetic task because oral administration is usually the preferred way of supplying drugs to patients, and because %F is a representative measure of the quantity of the active principle that can actually exert its biological effect [25].

Protein-plasma binding level (**%PPB**) characterizes the distribution of a drug in the human body. Specifically, it corresponds to the percentage of the initial drug dose that binds to plasma proteins; this measure is fundamental, as blood circulation is the major vehicle of drug distribution in the human body [26].

Median oral lethal dose (**LD50**) concerns the harmful effects produced by the distribution of a drug in the human body: it measures the dose required to kill half the members of a tested population after a specified time. It is expressed in milligrams of drug per kilogram of body mass of the test animals (cavies) [26].

Airfoil self-noise (**air**) concerns the self-generated noise of airfoil blade sections, measured in aerodynamic and acoustic wind-tunnel tests [27].

The concrete dataset (**conc**) [28] characterizes the value of the slump flow of concrete, given as inputs concrete components such as cement, fly ash, slag, water, coarse aggregate, and fine aggregate.

Yacht hydrodynamics (**yac**) measures the hydrodynamic performance of sailing yachts starting from their dimensions and velocity.

#### 4.2. Experimental Study

## 5. Results and Discussion

## 6. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## Abbreviations

Abbreviation | Meaning |
---|---|
%F | Human oral bioavailability |
%PPB | Protein-plasma binding level |
air | Airfoil self-noise |
conc | Concrete compressive strength |
Gp | Geometric multi-generational selection with parameter p |
EA | Evolutionary algorithms |
GA | Genetic algorithms |
GP | Genetic programming |
GSGP | Geometric semantic genetic programming |
LD50 | Median oral lethal dose |
RMSE | Root-mean-square error |
Uk | Uniform multi-generational selection with parameter k |
yac | Yacht hydrodynamics |

## References

1. Koza, J.R. Genetic programming as a means for programming computers by natural selection. Stat. Comput. **1994**, 4, 87–112.
2. Moraglio, A.; Krawiec, K.; Johnson, C.G. Geometric semantic genetic programming. In Proceedings of the International Conference on Parallel Problem Solving from Nature, Taormina, Italy, 1–5 September 2012; pp. 21–31.
3. Vanneschi, L.; Castelli, M.; Manzoni, L.; Silva, S. A new implementation of geometric semantic GP and its application to problems in pharmacokinetics. In Proceedings of the European Conference on Genetic Programming, Vienna, Austria, 3–5 April 2013; pp. 205–216.
4. Castelli, M.; Manzoni, L. GSGP-C++ 2.0: A geometric semantic genetic programming framework. SoftwareX **2019**, 10, 100313.
5. Vanneschi, L.; Silva, S.; Castelli, M.; Manzoni, L. Geometric semantic genetic programming for real life applications. In Genetic Programming Theory and Practice XI; Springer: Berlin/Heidelberg, Germany, 2014; pp. 191–209.
6. Louis, S.; Li, G. Augmenting genetic algorithms with memory to solve traveling salesman problems. In Proceedings of the Joint Conference on Information Sciences, Nagoya, Japan, 23–29 August 1997; pp. 108–111.
7. Wiering, M. Memory-based memetic algorithms. In Proceedings of the Benelearn’04 - Thirteenth Belgian-Dutch Conference on Machine Learning, Brussels, Belgium, 8–9 January 2004; pp. 191–198.
8. Yang, S. Genetic Algorithms with Memory- and Elitism-Based Immigrants in Dynamic Environments. Evol. Comput. **2008**, 16, 385–416.
9. Cao, Y.; Luo, W. Novel Associative Memory Retrieving Strategies for Evolutionary Algorithms in Dynamic Environments. In Lecture Notes in Computer Science, Proceedings of the Advances in Computation and Intelligence - 4th International Symposium, ISICA 2009, Huangshi, China, 23–25 October 2009; Cai, Z., Li, Z., Kang, Z., Liu, Y., Eds.; Springer: Berlin/Heidelberg, Germany, 2009; Volume 5821, pp. 258–268.
10. Castelli, M.; Manzoni, L.; Vanneschi, L. The effect of selection from old populations in genetic algorithms. In Companion Material Proceedings, Proceedings of the 13th Annual Genetic and Evolutionary Computation Conference, GECCO 2011, Dublin, Ireland, 12–16 July 2011; Krasnogor, N., Lanzi, P.L., Eds.; ACM: New York, NY, USA, 2011; pp. 161–162.
11. Castelli, M.; Manzoni, L.; Vanneschi, L. A Method to Reuse Old Populations in Genetic Algorithms. In Progress in Artificial Intelligence, Proceedings of the 15th Portuguese Conference on Artificial Intelligence, EPIA 2011, Lisbon, Portugal, 10–13 October 2011; Lecture Notes in Computer Science; Antunes, L., Pinto, H.S., Eds.; Springer: Berlin/Heidelberg, Germany, 2011; Volume 7026, pp. 138–152.
12. Augusto, D.A.; Barbosa, H.J.C. Symbolic Regression via Genetic Programming. In Proceedings of the 6th Brazilian Symposium on Neural Networks (SBRN 2000), Rio de Janeiro, Brazil, 22–25 November 2000; pp. 173–178.
13. Seront, G. External concepts reuse in genetic programming. In Proceedings of the AAAI Symposium on Genetic programming, MIT/AAAI, Cambridge, MA, USA, 10–12 November 1995; pp. 94–98.
14. Jaskowski, W.; Krawiec, K.; Wieloch, B. Knowledge reuse in genetic programming applied to visual learning. In Proceedings of the Genetic and Evolutionary Computation Conference, GECCO 2007, London, UK, 7–11 July 2007; pp. 1790–1797.
15. Pei, W.; Xue, B.; Shang, L.; Zhang, M. Reuse of program trees in genetic programming with a new fitness function in high-dimensional unbalanced classification. In Proceedings of the Genetic and Evolutionary Computation Conference Companion (GECCO 2019), Prague, Czech Republic, 13–17 July 2019; pp. 187–188.
16. Bi, Y.; Xue, B.; Zhang, M. A Divide-and-Conquer Genetic Programming Algorithm With Ensembles for Image Classification. IEEE Trans. Evol. Comput. **2021**, 25, 1148–1162.
17. Castelli, M.; Manzoni, L.; Silva, S.; Vanneschi, L.; Popovič, A. The influence of population size in geometric semantic GP. Swarm Evol. Comput. **2017**, 32, 110–120.
18. Sipper, M.; Moore, J.H. Conservation machine learning. BioData Min. **2020**, 13, 9.
19. Sipper, M.; Moore, J.H. Conservation machine learning: A case study of random forests. Sci. Rep. **2021**, 11, 3629.
20. Castelli, M.; Trujillo, L.; Vanneschi, L.; Silva, S.; Z-Flores, E.; Legrand, P. Geometric Semantic Genetic Programming with Local Search. In Proceedings of the Genetic and Evolutionary Computation Conference, GECCO 2015, Madrid, Spain, 11–15 July 2015; pp. 999–1006.
21. Castelli, M.; Manzoni, L.; Mariot, L.; Saletta, M. Extending Local Search in Geometric Semantic Genetic Programming. In Lecture Notes in Computer Science, Proceedings of the Progress in Artificial Intelligence - 19th EPIA Conference on Artificial Intelligence, EPIA 2019, Vila Real, Portugal, 3–6 September 2019; Oliveira, P.M., Novais, P., Reis, L.P., Eds.; Springer: Berlin/Heidelberg, Germany, 2019; Volume 11804, pp. 775–787.
22. Vanneschi, L.; Castelli, M.; Silva, S. A survey of semantic methods in genetic programming. Genet. Program. Evolvable Mach. **2014**, 15, 195–214.
23. Moraglio, A.; Poli, R. Topological interpretation of crossover. In Proceedings of the Genetic and Evolutionary Computation Conference, Seattle, WA, USA, 26–30 June 2004; pp. 1377–1388.
24. McDermott, J.; White, D.R.; Luke, S.; Manzoni, L.; Castelli, M.; Vanneschi, L.; Jaskowski, W.; Krawiec, K.; Harper, R.; De Jong, K.; et al. Genetic programming needs better benchmarks. In Proceedings of the 14th Annual Conference on Genetic and Evolutionary Computation, Philadelphia, PA, USA, 7–11 July 2012; pp. 791–798.
25. Archetti, F.; Lanzeni, S.; Messina, E.; Vanneschi, L. Genetic programming and other machine learning approaches to predict median oral lethal dose (LD 50) and plasma protein binding levels (% PPB) of drugs. In Proceedings of the European Conference on Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics, Valencia, Spain, 11–13 April 2007; pp. 11–23.
26. Archetti, F.; Lanzeni, S.; Messina, E.; Vanneschi, L. Genetic programming for human oral bioavailability of drugs. In Proceedings of the 8th Annual Conference on Genetic and Evolutionary Computation, Seattle, WA, USA, 8–12 July 2006; pp. 255–262.
27. Brooks, T.F.; Pope, D.S.; Marcolini, M.A. Airfoil Self-Noise and Prediction; Technical Report; NASA: Washington, DC, USA, 1989.
28. Castelli, M.; Vanneschi, L.; Silva, S. Prediction of high performance concrete strength using genetic programming with geometric semantic genetic operators. Expert Syst. Appl. **2013**, 40, 6856–6862.
29. Pietropolli, G.; Manzoni, L.; Paoletti, A.; Castelli, M. Combining Geometric Semantic GP with Gradient-Descent Optimization. In Proceedings of the European Conference on Genetic Programming (Part of EvoStar), Madrid, Spain, 20–22 April 2022; pp. 19–33.

**Figure 1.** A representation of the effect of multi-generational selection on the convex hull where geometric semantic crossover can generate new individuals.

**Figure 2.** A visual representation of how GSGP can be implemented in an efficient way, sharing subtrees between individuals. At the top, the standard implementation where the parents can be selected only from the previous generation is shown. At the bottom, parents can be selected uniformly at random from the previous two populations. Notice how no additional storage is required.

**Figure 3.** Box-plots of the RMSE on the training set over 100 independent runs of the considered benchmark dataset for all the proposed methods. (**a**) Air; (**b**) %F; (**c**) conc; (**d**) %PPB; (**e**) LD50; (**f**) yac.

**Figure 4.** Box-plots of the RMSE on the test set over 100 independent runs of the considered benchmark dataset for all the proposed methods. (**a**) Air; (**b**) %F; (**c**) conc; (**d**) %PPB; (**e**) LD50; (**f**) yac.

**Table 1.** Principal characteristics of the considered datasets: the number of variables, the number of instances, and the domain.

Dataset | Variables | Instances | Area |
---|---|---|---|
airfoil | 6 | 1503 | Physics |
bioav | 242 | 359 | Pharmacokinetic |
concrete | 9 | 1030 | Physics |
ppb | 627 | 131 | Pharmacokinetic |
toxicity | 627 | 234 | Pharmacokinetic |
yacht | 7 | 308 | Physics |

**Table 2.** Experimental settings.

Parameter | Value |
---|---|
Population size | 100 |
Number of generations | 100 |
Number of runs | 100 |
Max. initial depth | 4 |
Crossover rate | $0.9$ |
Mutation rate | $0.3$ |
Mutation step | $0.1$ |
Selection method | Tournament of size 4 |
Elitism | Best individuals survive |

**Table 3.** Fitness values obtained by selecting the ancestors with uniform multi-generational selection. The values in bold are the best results obtained.

Dataset | Set | GSGP | U2 | U5 | U10 | U20 | U50 | U100 |
---|---|---|---|---|---|---|---|---|
air | train | 34.43 | 33.89 | 32.02 | 33.28 | 32.85 | 34.93 | 34.93 |
air | test | 34.44 | 34.01 | 31.83 | 33.26 | 33.12 | 34.72 | 37.51 |
%F | train | 41.92 | 41.78 | 40.37 | 41.13 | 40.72 | 43.00 | 43.53 |
%F | test | 42.17 | 42.54 | 41.27 | 42.10 | 41.36 | 43.56 | 43.94 |
conc | train | 9.54 | 9.41 | 9.15 | 9.27 | 9.36 | 9.82 | 10.08 |
conc | test | 9.52 | 9.49 | 9.21 | 9.31 | 9.35 | 9.69 | 10.07 |
%PPB | train | 36.62 | 38.26 | 29.48 | 31.02 | 30.82 | 44.80 | 51.94 |
%PPB | test | 255.51 | 243.46 | 335.42 | 371.03 | 298.03 | 206.88 | 148.83 |
LD50 | train | 2183.65 | 2183.17 | 2165.20 | 2171.09 | 2165.36 | 2199.38 | 2243.06 |
LD50 | test | 2262.15 | 2233.41 | 2250.19 | 2242.84 | 2240.93 | 2274.87 | 2280.51 |
yac | train | 13.71 | 13.77 | 13.04 | 13.24 | 13.18 | 14.19 | 14.44 |
yac | test | 13.55 | 13.69 | 12.99 | 13.12 | 13.00 | 14.11 | 14.27 |

**Table 4.** Fitness values obtained by selecting the ancestors with geometric multi-generational selection. The values in bold are the best results obtained.

Dataset | Set | GSGP | G0.25 | G0.50 | G0.75 |
---|---|---|---|---|---|
air | train | 34.43 | 37.48 | 32.97 | 40.76 |
air | test | 34.44 | 32.93 | 40.63 | 43.51 |
%F | train | 41.92 | 41.16 | 44.60 | 44.92 |
%F | test | 42.17 | 41.49 | 44.63 | 44.73 |
conc | train | 9.54 | 9.35 | 10.58 | 10.86 |
conc | test | 9.52 | 9.37 | 10.48 | 10.76 |
%PPB | train | 36.62 | 32.06 | 57.37 | 58.73 |
%PPB | test | 255.51 | 300.03 | 119.20 | 106.33 |
LD50 | train | 2183.65 | 2176.51 | 2234.56 | 2264.43 |
LD50 | test | 2262.15 | 2216.47 | 2258.10 | 2305.45 |
yac | train | 13.71 | 13.37 | 14.55 | 14.73 |
yac | test | 13.55 | 13.30 | 14.52 | 14.65 |

**Table 5.** p-values returned by the Wilcoxon rank-sum test under the null hypothesis that the median errors on the test set obtained with classical GSGP are equal to the errors obtained with the methods introduced in this paper. Highlighted in bold are the p-values below $0.05$ where the direction of the difference shows an improvement with respect to standard GSGP.

Dataset | U2 | U5 | U10 | U20 | U50 | U100 | G0.25 | G0.50 | G0.75 |
---|---|---|---|---|---|---|---|---|---|
airfoil | 0.158 | **0.000** | **0.000** | **0.000** | 0.280 | 0.000 | **0.000** | 0.000 | 0.000 |
bioav | 0.741 | **0.000** | **0.001** | **0.000** | 0.000 | 0.000 | **0.007** | 0.000 | 0.000 |
concrete | 0.763 | **0.001** | **0.042** | 0.445 | 0.001 | 0.000 | 0.557 | 0.000 | 0.000 |
ppb | 0.000 | **0.000** | **0.000** | **0.000** | 0.000 | 0.000 | **0.000** | 0.000 | 0.000 |
toxicity | 0.783 | **0.049** | 0.365 | 0.128 | 0.281 | 0.001 | 0.275 | 0.001 | 0.000 |
yacht | 0.135 | **0.000** | **0.000** | **0.000** | 0.000 | 0.000 | **0.001** | 0.000 | 0.000 |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Castelli, M.; Manzoni, L.; Mariot, L.; Menara, G.; Pietropolli, G.
The Effect of Multi-Generational Selection in Geometric Semantic Genetic Programming. *Appl. Sci.* **2022**, *12*, 4836.
https://doi.org/10.3390/app12104836
