# Population Dynamics in Genetic Programming for Dynamic Symbolic Regression


## Abstract


## 1. Introduction

- How can changing training data be modeled within the context of a dynamic optimization problem?
- Can a dynamic symbolic regression problem be effectively solved using genetic programming?
- Which variants of genetic programming prove effective for dynamic symbolic regression problems?
- How do the population dynamics of GP evolve in the context of dynamic symbolic regression problems?

#### 1.1. Genetic Programming for Symbolic Regression

#### 1.2. Dynamic Optimization Problems

- Runtime clocks, where the epoch changes based on the algorithm’s elapsed duration;
- Generational clocks, which trigger an epoch change after a predetermined number of generations;
- Evaluation clocks, which initiate an epoch change after a predefined number of solution evaluations.
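As a minimal sketch, the three clock variants above could be implemented as follows. All class and method names (`GenerationalClock`, `should_advance`, etc.) are illustrative and not taken from the paper:

```python
import time


class GenerationalClock:
    """Triggers an epoch change every `interval` generations."""
    def __init__(self, interval):
        self.interval = interval

    def should_advance(self, generation):
        return generation > 0 and generation % self.interval == 0


class EvaluationClock:
    """Triggers an epoch change after every `interval` solution evaluations."""
    def __init__(self, interval):
        self.interval = interval
        self.evaluations = 0

    def count(self, n=1):
        self.evaluations += n

    def should_advance(self, generation):
        if self.evaluations >= self.interval:
            self.evaluations -= self.interval  # carry over the surplus
            return True
        return False


class RuntimeClock:
    """Triggers an epoch change after `seconds` of elapsed wall-clock time."""
    def __init__(self, seconds):
        self.seconds = seconds
        self.start = time.monotonic()

    def should_advance(self, generation):
        if time.monotonic() - self.start >= self.seconds:
            self.start = time.monotonic()  # restart the timer
            return True
        return False
```

A common `should_advance` interface lets the evolutionary loop stay agnostic of which clock drives the epoch changes.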

#### 1.3. Open-Ended Evolutionary Algorithms

## 2. Materials and Methods

#### 2.1. Dynamic Symbolic Regression

#### 2.2. Benchmark Data

#### 2.3. Population Dynamics

- Variable frequency, measuring the relative occurrence of a variable within the entire population.
- Variable impact, measuring the importance of a variable, based on Breiman’s permutation feature importance [1].
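The two metrics can be sketched as follows. This is an illustrative simplification, not the paper's exact procedure: individuals are reduced to the set of variables they reference, and the scoring function is supplied by the caller.

```python
import random


def variable_frequency(population, variable):
    """Fraction of individuals whose expression references `variable`.

    Each individual is represented here simply as the set of variable
    names occurring in its tree.
    """
    hits = sum(1 for individual in population if variable in individual)
    return hits / len(population)


def permutation_impact(model, X, y, column, score, rng=None):
    """Breiman-style permutation importance: the drop in score after
    shuffling one input column, which breaks its association with the
    target while preserving its marginal distribution."""
    rng = rng or random.Random(0)
    baseline = score(model, X, y)
    shuffled = [row[:] for row in X]          # copy rows, keep X intact
    values = [row[column] for row in shuffled]
    rng.shuffle(values)
    for row, v in zip(shuffled, values):
        row[column] = v
    return baseline - score(model, shuffled, y)
```

Averaging `permutation_impact` over all individuals in the population would yield a population-level impact per variable, analogous to the frequency metric.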

- Increasing the frequency of variables gaining importance due to different scenarios;
- Decreasing the frequency of variables no longer relevant due to a change in the scenario.

#### 2.4. Experiment Setup

#### 2.5. Experiment Variation

#### 2.5.1. Faster Epoch Changes

#### 2.5.2. Mutation Rate

## 3. Results

#### 3.1. Base Experiment Results

#### 3.2. Faster Epoch Results

#### 3.3. Mutation Rates Results

- Never: The variable is consistently present in the population at all times.
- Momentarily: The variable drops out of the population at a specific time but is later reintroduced, typically through mutation.
- Permanently: The variable drops out of the population at a specific time but is not reintroduced, remaining absent for the rest of the run.
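A hypothetical helper that assigns one of these three categories to a variable, given its per-generation frequency series (zero meaning the variable is absent from the population in that generation):

```python
def extinction_category(frequencies):
    """Classify a variable's extinction behaviour as 'never',
    'momentarily', or 'permanently' from its frequency series."""
    if all(f > 0 for f in frequencies):
        return "never"
    # First generation in which the variable is absent.
    first_absent = next(i for i, f in enumerate(frequencies) if f == 0)
    # Last generation in which the variable is present.
    last_present = max((i for i, f in enumerate(frequencies) if f > 0),
                       default=-1)
    # Reintroduced (e.g. via mutation) if it reappears after first dying out.
    return "momentarily" if last_present > first_absent else "permanently"
```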

## 4. Discussion

## 5. Conclusions

## Supplementary Materials

## Author Contributions

## Funding

## Data Availability Statement

## Conflicts of Interest

## References

1. Breiman, L. Random forests. Mach. Learn. **2001**, 45, 5–32.
2. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. **1995**, 20, 273–297.
3. Koza, J.R. Genetic programming as a means for programming computers by natural selection. Stat. Comput. **1994**, 4, 87–112.
4. Poli, R.; Langdon, W.B.; McPhee, N.F. A Field Guide to Genetic Programming; Lulu Enterprises UK Ltd.: Egham, UK, 2008. Available online: http://gpbib.cs.ucl.ac.uk/gp-html/poli08_fieldguide.html (accessed on 29 November 2023).
5. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016.
6. Holland, J.H. Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence; MIT Press: Cambridge, MA, USA, 1992.
7. Macedo, J.; Costa, E.; Marques, L. Genetic programming algorithms for dynamic environments. In Proceedings of the Applications of Evolutionary Computation: 19th European Conference, EvoApplications 2016, Porto, Portugal, 30 March–1 April 2016; Proceedings, Part II; pp. 280–295.
8. Yin, Z.; Brabazon, A.; O’Sullivan, C.; O’Neill, M. Genetic programming for dynamic environments. In Proceedings of the International Multiconference on Computer Science and Information Technology, Wisła, Poland, 15–17 October 2007; pp. 437–446.
9. Quade, M.; Abel, M.; Shafi, K.; Niven, R.K.; Noack, B.R. Prediction of dynamical systems by symbolic regression. Phys. Rev. E **2016**, 94, 012214.
10. O’Neill, M.; Nicolau, M.; Brabazon, A. Dynamic environments can speed up evolution with genetic programming. In Proceedings of the 13th Annual Conference Companion on Genetic and Evolutionary Computation, Dublin, Ireland, 12–16 July 2011; pp. 191–192.
11. Pearson, K. LIII. On lines and planes of closest fit to systems of points in space. Lond. Edinb. Dublin Philos. Mag. J. Sci. **1901**, 2, 559–572.
12. Virgolin, M.; Pissis, S.P. Symbolic regression is NP-hard. arXiv **2022**, arXiv:2207.01018.
13. Nocedal, J.; Wright, S.J. Numerical Optimization, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 2006.
14. Banzhaf, W.; Nordin, P.; Keller, R.E.; Francone, F.D. Genetic Programming: An Introduction: On the Automatic Evolution of Computer Programs and Its Applications; Morgan Kaufmann Publishers Inc.: Cambridge, MA, USA, 1998.
15. Perkis, T. Stack-based genetic programming. In Proceedings of the First IEEE Conference on Evolutionary Computation, IEEE World Congress on Computational Intelligence, Orlando, FL, USA, 27–29 June 1994; pp. 148–153.
16. O’Neill, M.; Ryan, C. Grammatical evolution. IEEE Trans. Evol. Comput. **2001**, 5, 349–358.
17. McConaghy, T. FFX: Fast, scalable, deterministic symbolic regression technology. In Genetic Programming Theory and Practice IX; Springer: Berlin/Heidelberg, Germany, 2011; pp. 235–260.
18. Kommenda, M.; Burlacu, B.; Kronberger, G.; Affenzeller, M. Parameter identification for symbolic regression using nonlinear least squares. Genet. Program. Evolvable Mach. **2020**, 21, 471–501.
19. McKay, R.I.; Hoai, N.X.; Whigham, P.A.; Shan, Y.; O’Neill, M. Grammar-based genetic programming: A survey. Genet. Program. Evolvable Mach. **2010**, 11, 365–396.
20. Affenzeller, M.; Wagner, S. Offspring selection: A new self-adaptive selection scheme for genetic algorithms. In Adaptive and Natural Computing Algorithms, Proceedings of the International Conference, Coimbra, Portugal, 21–23 March 2005; Springer: Vienna, Austria, 2005; pp. 218–221.
21. Hornby, G.S. ALPS: The age-layered population structure for reducing the problem of premature convergence. In Proceedings of the 8th Annual Conference on Genetic and Evolutionary Computation, Seattle, WA, USA, 8–12 July 2006; pp. 815–822.
22. Deb, K.; Pratap, A.; Agarwal, S.; Meyarivan, T. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. **2002**, 6, 182–197.
23. Zitzler, E.; Laumanns, M.; Thiele, L. SPEA2: Improving the strength Pareto evolutionary algorithm. TIK Rep. **2001**, 103, 1–22.
24. Nguyen, T.T.; Yang, S.; Branke, J. Evolutionary dynamic optimization: A survey of the state of the art. Swarm Evol. Comput. **2012**, 6, 1–24.
25. Yazdani, D.; Omidvar, M.N.; Cheng, R.; Branke, J.; Nguyen, T.T.; Yao, X. Benchmarking continuous dynamic optimization: Survey and generalized test suite. IEEE Trans. Cybern. **2020**, 52, 3380–3393.
26. Li, C.; Yang, S.; Nguyen, T.T.; Yu, E.L.; Yao, X.; Jin, Y.; Beyer, H.; Suganthan, P.N. Benchmark Generator for CEC 2009 Competition on Dynamic Optimization; Technical Report, 2008. Available online: https://bura.brunel.ac.uk/bitstream/2438/5897/2/Fulltext.pdf (accessed on 29 November 2023).
27. Yang, S. Non-stationary problem optimization using the primal-dual genetic algorithm. In Proceedings of the 2003 Congress on Evolutionary Computation, Canberra, ACT, Australia, 8–12 December 2003; Volume 3, pp. 2246–2253.
28. Yazdani, D.; Cheng, R.; Yazdani, D.; Branke, J.; Jin, Y.; Yao, X. A survey of evolutionary continuous dynamic optimization over two decades—Part B. IEEE Trans. Evol. Comput. **2021**, 25, 630–650.
29. Tinós, R.; Whitley, D.; Howe, A. Use of explicit memory in the dynamic traveling salesman problem. In Proceedings of the 2014 Annual Conference on Genetic and Evolutionary Computation, Vancouver, BC, Canada, 12–16 July 2014; pp. 999–1006.
30. Hansknecht, C.; Joormann, I.; Stiller, S. Dynamic shortest paths methods for the time-dependent TSP. Algorithms **2021**, 14, 21.
31. Branke, J. Evolutionary Optimization in Dynamic Environments; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2012; Volume 3.
32. Strąk, Ł.; Skinderowicz, R.; Boryczka, U.; Nowakowski, A. A self-adaptive discrete PSO algorithm with heterogeneous parameter values for dynamic TSP. Entropy **2019**, 21, 738.
33. Chen, Q.; Ding, J.; Yang, S.; Chai, T. A novel evolutionary algorithm for dynamic constrained multiobjective optimization problems. IEEE Trans. Evol. Comput. **2019**, 24, 792–806.
34. Karder, J.; Werth, B.; Beham, A.; Wagner, S.; Affenzeller, M. Analysis and handling of dynamic problem changes in open-ended optimization. In Proceedings of the International Conference on Computer Aided Systems Theory, Las Palmas de Gran Canaria, Spain, 20–25 February 2022; pp. 61–68.
35. Brest, J.; Zamuda, A.; Boskovic, B.; Maucec, M.S.; Zumer, V. Dynamic optimization using self-adaptive differential evolution. In Proceedings of the 2009 IEEE Congress on Evolutionary Computation, Trondheim, Norway, 18–21 May 2009; pp. 415–422.
36. Alza, J.; Bartlett, M.; Ceberio, J.; McCall, J. On the elusivity of dynamic optimisation problems. Swarm Evol. Comput. **2023**, 78, 101289.
37. Branke, J. Memory enhanced evolutionary algorithms for changing optimization problems. In Proceedings of the 1999 Congress on Evolutionary Computation (CEC99), Washington, DC, USA, 6–9 July 1999; Volume 3, pp. 1875–1882.
38. Yu, E.; Suganthan, P.N. Evolutionary programming with ensemble of explicit memories for dynamic optimization. In Proceedings of the 2009 IEEE Congress on Evolutionary Computation, Trondheim, Norway, 18–21 May 2009; pp. 431–438.
39. Blanco Abello, M.; Michalewicz, Z. Implicit memory-based technique in solving dynamic scheduling problems through response surface methodology—Part II: Experiments and analysis. Int. J. Intell. Comput. Cybern. **2014**, 7, 143–174.
40. Morris, R. Genetic Algorithms with Implicit Memory; 2011. Available online: https://dora.dmu.ac.uk/server/api/core/bitstreams/e28b68b7-67ae-47a3-9b0c-7269ad77e2eb/content (accessed on 29 November 2023).
41. Winkler, S.M.; Affenzeller, M.; Kronberger, G.; Kommenda, M.; Burlacu, B.; Wagner, S. Sliding window symbolic regression for detecting changes of system dynamics. In Genetic Programming Theory and Practice XII; Springer: Cham, Switzerland, 2015; pp. 91–107.
42. Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. **2001**, 29, 1189–1232.
43. Levenberg, K. A method for the solution of certain non-linear problems in least squares. Q. Appl. Math. **1944**, 2, 164–168.
44. Hansen, N.; Auger, A.; Ros, R.; Mersmann, O.; Tušar, T.; Brockhoff, D. COCO: A platform for comparing continuous optimizers in a black-box setting. Optim. Methods Softw. **2021**, 36, 114–144.

**Figure 1.** In traditional static data modeling, the machine learning algorithm processes the training data once, yielding a single output model.

**Figure 2.** In scenarios where data evolve over time, the entire machine learning algorithm may need to be re-run to obtain a newly adapted model.

**Figure 3.** In dynamic modeling, the training data change while the machine learning algorithm runs, enabling continuous model output that adjusts to evolving data or improves as more effective models are discovered.

**Figure 5.** Hidden states over epochs for the three terms, a, b, and c, for the Friedman-based benchmarks.

**Figure 6.** Mean and best quality of the population, measured as Pearson’s $R^{2}$, over time. Light-shaded lines represent individual runs; darker lines represent mean values across all runs. Subplots are organized column-wise by the algorithms indicated at the top and the instances noted on the right.

**Figure 7.** Variable frequencies over time, averaged across all runs for each generation, with distinct variables represented by different colors. Subplots are arranged column-wise by algorithm and row-wise by benchmark instance.

**Figure 8.** Variable impacts of the populations over time, averaged across all runs for each generation, with distinct variables represented by different colors. Subplots are arranged column-wise by algorithm and row-wise by benchmark instance.

**Figure 9.** Variable frequency and impact plotted over time for instance W1, differentiated by color. Subplots are organized column-wise by algorithm and row-wise by the variables of the benchmark instance. Because variable frequency and variable impact generally span different value ranges, they are displayed on separate y-axis scales for better alignment.

**Figure 10.** Quality of the best solution candidate for each generation, with the different epoch speeds shown in different colors. Subplots are arranged column-wise by algorithm and row-wise by benchmark instance. Because the configurations entail different numbers of generations across the speeds, particularly for the Friedman-based benchmarks, the x-axis was normalized by the total number of generations.

**Figure 11.** Variable frequencies over time for the F1 instance, depicted for different epoch speeds. The x-axis is normalized to accommodate the varying maximum number of generations across epoch speeds. Subplots are organized column-wise by algorithm and row-wise by variable.

**Figure 12.** Mean quality of the current best solution per generation (red) and mean quality of the population per generation (blue), averaged over all generations, for different mutation rates on the x-axis. Each dot represents the average qualities of a single run; the overlaid box indicates the distribution over all runs. Subplots are arranged column-wise by algorithm and row-wise by instance.

**Figure 13.** Variable frequency of a specific variable, x3, from the W1 benchmark, tracked over time for various mutation rates. Color distinguishes whether the variable becomes extinct during the run. Subplots are organized column-wise by algorithm and row-wise by mutation rate.

| Name | Type | Train. Size | Test Size | Features | Complexity |
|---|---|---|---|---|---|
| W1 | Winkler et al. [41] | 100 | 10 | 3 | low |
| W2 | Winkler et al. [41] | 100 | 10 | 8 | low |
| W3 | Winkler et al. [41] | 100 | 10 | 10 | medium |
| F1 | Friedman [42] | 1000 | 100 | 3 | high |
| F2 | Friedman [42] | 1000 | 100 | 6 | high |

| | Parameter | W1 | W2 | W3 | F1 | F2 |
|---|---|---|---|---|---|---|
| GA and ALPS | Epoch Clock (Generational Interval) | 1 | 1 | 1 | 150 | 250 |
| | Max Tree Length | 25 | 50 | 70 | 70 | 100 |
| | Creator | Probabilistic Tree Creation (all instances) | | | | |
| | Mutation | Change Node Type, Full Tree Shaker, One Point Shaker, Remove Branch, Replace Branch [4] (all instances) | | | | |
| | Mutation Probability | 15% (all instances) | | | | |
| | Crossover | Subtree Swapping [4] (all instances) | | | | |
| | Crossover Probability | 100% (all instances) | | | | |
| | Function Set | $+,-,*,/,\mathit{var},\mathit{const}$ (all instances) | | | | |
| | Local Numeric Opt. | Levenberg–Marquardt, 10 iterations [43] (all instances) | | | | |
| GA | Population Size | 100 | 200 | 500 | 200 | 500 |
| | Elites * | 1 | 1 | 1 | 1 | 1 |
| | Selector | Prop. | Tour. (k = 3) | Tour. (k = 4) | Tour. (k = 3) | Tour. (k = 4) |
| | Maximum Generations | 700 | 700 | 700 | 1500 | 2000 |
| ALPS | Max Layers | 10 | 10 | 10 | 10 | 10 |
| | Population Size (per Layer) | 100 | 100 | 100 | 100 | 100 |
| | Elites * (per Layer) | 1 | 1 | 1 | 1 | 1 |
| | Selector | Generalized Rank (Pressure = 4) (all instances) | | | | |
| | Age Gap | 20 | 20 | 20 | 20 | 20 |
| | Aging Scheme | Poly | Poly | Poly | Poly | Poly |
| | Maximum Generations | 700 | 700 | 700 | 1500 | 2000 |

**Table 3.** Experiment parameters for analyzing the impact of fast epoch changes. The table lists only the modifications from the base experiments, with the three values separated by “/” indicating the configurations for Normal speed and the Fast2 and Fast3 variants.

| | Parameter | W1 | W2 | W3 | F1 | F2 |
|---|---|---|---|---|---|---|
| GA | Epoch Clock (Generational Interval) | 1/1/1 | 1/1/1 | 1/1/1 | 150/75/50 | 250/100/67 |
| | Population Size | 100/50/33 | 200/100/67 | 500/250/167 | 200/200/200 | 500/500/500 |
| | Maximum Generations | 700/700/700 | 700/700/700 | 700/700/700 | 1500/750/500 | 2000/1000/670 |
| ALPS | Epoch Clock (Generational Interval) | 1/1/1 | 1/1/1 | 1/1/1 | 150/75/50 | 250/100/67 |
| | Population Size (per Layer) | 100/50/33 | 100/50/35 | 100/50/33 | 100/100/100 | 100/100/100 |
| | Maximum Generations | 700/700/700 | 700/700/700 | 700/700/700 | 1500/750/500 | 2000/1000/670 |

**Table 4.** Parameters for the experiment analyzing varied mutation rates. The table contains only modifications made to the base experiments.

| Parameter | W1–F2 (all instances) |
|---|---|
| Mutation Probability | 15% / 10% / 5% / 1% / 0.5% / 0.1% / 0.05% / 0.01% / 0.005% / 0.001% / 0.0% |


© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Fleck, P.; Werth, B.; Affenzeller, M.
Population Dynamics in Genetic Programming for Dynamic Symbolic Regression. *Appl. Sci.* **2024**, *14*, 596.
https://doi.org/10.3390/app14020596
