Using Genetic Algorithms for Research Software Structure Optimization
Abstract
1. Introduction
- To improve maintainability (easier to understand, test, and modify parts of the software system),
- To enhance modularity (reduce coupling and increase cohesion),
- To better adapt to change (support new features), and
- To reduce technical debt (clean up legacy structures that no longer make sense).
- Based on the recovered software architectures of existing research software [3], we show that genetic algorithms allow to generate recommendations for restructuring this research software.
- To evaluate these recommendations for restructuring the ESMs, we collected feedback from domain experts who are involved in climate research as research software engineers. This feedback is also used for a guided, interactive optimization process.
2. Our Application Domain: Earth System Models
3. Reverse Engineering the Software Architecture
4. Genetic Optimization to Generate Restructuring Recommendations
4.1. Genetic Algorithms for Multi-Objective Optimization
4.2. Software Quality Metrics as Fitness Functions
- StrCoup:
- Structural coupling is defined as the average fan-out of components, i.e., the average number of edges connecting a program unit in the current component with a program unit in another component.
- LStrCoh:
- Lack of structural cohesion is defined as the average number of pairs of unrelated units in a component, i.e., the number of pairs where both u and v are program units, and there is no edge connecting u and v in the coupling graph.
5. Optimization of MITgcm Global Ocean
5.1. Unguided Optimization of MITgcm Global Ocean
.- 1.
- The modularization opt-1238 achieves a coupling degree of zero, at the price of very low cohesion (recall that LStrCoh measures the lack of cohesion, so high values indicate low cohesion). Manual inspection of this modularization shows that opt-1238 merges components to obtain a modularization that consists of two unrelated components (and hence obtains coupling degree zero).
- 2.
- An even more extreme modularization is opt-1260, which merges all components into a single one—with the “benefit” of fewer components, but at the price of a very low cohesion.
- 3.
- A third “extreme” modularization is opt-180, which achieves full cohesion (i.e., zero lack of cohesion) while still improving the coupling value over the original modularization (1.72 instead of 2.84). However, a manual inspection of this solution shows that it distributes 96 units into 83 components, which are then mostly one-element components. Such components are cohesive, but obviously this is not a good way to structure a software system.
denote optimization results that introduce at most a single additional component compared to the original modularization of the software, while blue star-shaped symbols
represent results that introduce three additional components, and the small blue dot
stands for optimizations introducing at least 15 and at most 29 additional components (the complete list of symbols is given in the legend of the plots). In this way, the plots illustrate not only the (expected) tradeoff between the quality metrics LStrCoh and StrCoup, but also between the conflicting goals of improving these metrics and introducing a few new components.5.2. Guided, Interactive Optimization of MITgcm Global Ocean
6. Optimization of UVic
6.1. Unguided Optimization of UVic
6.2. Guided, Interactive Optimization of UVic
7. Discussion
- The pure dynamic analysis produces significantly better modularizations than the optimizations based on the other analysis methods that we used for reverse engineering. This may be a specific observation for our application domain of research software: if we use the scientific model for a specific experiment, we can optimize the software architecture for this specific use case. Only those parts of the software that are actually used in the experiment are represented in the reconstructed architecture, which is then optimized.
- The guided, interactive optimization provides good modularizations. Thus, we can recommend feeding expert feedback into the optimization process.
8. Threats to Validity
9. Related Work
10. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| CMIP | Coupled Model Intercomparison Project |
| ESM | Earth System Model |
| IPCC | Intergovernmental Panel on Climate Change |
| LStrCoh | Lack of Structural Cohesion |
| MITgcm | MIT General Circulation Model |
| NSGA | Non-dominated Sorting Genetic Algorithm |
| PalMod | Paleoclimatic Modeling |
| SOLAS | Surface Ocean-Lower Atmosphere Study |
| StrCoup | Structural Coupling |
| UVic | University of Victoria Earth System Climate Model |
References
- Candela, I.; Bavota, G.; Russo, B.; Oliveto, R. Using Cohesion and Coupling for Software Remodularization: Is It Enough? ACM Trans. Softw. Eng. Methodol. 2016, 25, 1–28. [Google Scholar] [CrossRef]
- Felderer, M.; Goedicke, M.; Grunske, L.; Hasselbring, W.; Lamprecht, A.L.; Rumpe, B. Investigating Research Software Engineering: Toward RSE Research. Commun. ACM 2025, 68, 20–23. [Google Scholar] [CrossRef]
- Hasselbring, W.; Jung, R.; Schnoor, H. Software Architecture Evaluation of Earth System Models. J. Softw. Eng. Appl. 2025, 18, 113–138. [Google Scholar] [CrossRef]
- Weaver, A.J.; Eby, M.; Wiebe, E.C.; Bitz, C.M.; Duffy, P.B.; Ewen, T.L.; Fanning, A.F.; Holland, M.M.; MacFadyen, A.; Matthews, H.D.; et al. The UVic earth system climate model: Model description, climatology, and applications to past, present and future climates. Atmosphere-Ocean 2001, 39, 361–428. [Google Scholar] [CrossRef]
- Artale, V.; Calmanti, S.; Carillo, A.; Dell’Aquila, A.; Herrmann, M.; Pisacane, G.; Ruti, P.M.; Sannino, G.; Struglia, M.V.; Giorgi, F.; et al. An atmosphere-ocean regional climate model for the Mediterranean area: Assessment of a present climate simulation. Clim. Dyn. 2010, 35, 721–740. [Google Scholar] [CrossRef]
- Hasselbring, W.; Druskat, S.; Bernoth, J.; Betker, P.; Felderer, M.; Ferenz, S.; Hermann, B.; Lamprecht, A.L.; Linxweiler, J.; Prat, A.; et al. Multi-Dimensional Research Software Categorization. Comput. Sci. Eng. 2025, 27, 59–68. [Google Scholar] [CrossRef]
- Johanson, A.; Hasselbring, W. Software Engineering for Computational Science: Past, Present, Future. Comput. Sci. Eng. 2018, 20, 90–109. [Google Scholar] [CrossRef]
- Jung, R.; Gundlach, S.; Hasselbring, W. Software Development Processes in Ocean System Modeling. Int. J. Model. Simul. Sci. Comput. 2022, 13, 2230002. [Google Scholar] [CrossRef]
- Bréviére, E.H.; Bakker, D.C.; Bange, H.W.; Bates, T.S.; Bell, T.G.; Boyd, P.W.; Duce, R.A.; Garcon, V.; Johnson, M.T.; Law, C.S.; et al. Surface ocean-lower atmosphere study: Scientific synthesis and contribution to Earth system science. Anthropocene 2015, 12, 54–68. [Google Scholar] [CrossRef]
- Pahlow, M.; Chien, C.T.; Arteaga, L.A.; Oschlies, A. Optimality-based non-Redfield plankton–ecosystem model (OPEM v1. 1) in UVic-ESCM 2.9–Part 1: Implementation and model behaviour. Geosci. Model Dev. 2020, 13, 4663–4690. [Google Scholar] [CrossRef]
- Chien, C.T.; Pahlow, M.; Schartau, M.; Oschlies, A. Optimality-based non-Redfield plankton–ecosystem model (OPEM v1. 1) in UVic-ESCM 2.9–Part 2: Sensitivity analysis and model calibration. Geosci. Model Dev. 2020, 13, 4691–4712. [Google Scholar] [CrossRef]
- Mengis, N.; Keller, D.P.; MacDougall, A.H.; Eby, M.; Wright, N.; Meissner, K.J.; Oschlies, A.; Schmittner, A.; MacIsaac, A.J.; Matthews, H.D.; et al. Evaluation of the University of Victoria Earth System Climate Model version 2.10 (UVic ESCM 2.10). Geosci. Model Dev. 2020, 13, 4183–4204. [Google Scholar] [CrossRef]
- Stocker, T.F.; Qin, D.; Plattner, G.; Tignor, M.M.B.; Allen, S.K.; Boschung, J.; Nauels, A.; Xia, Y.; Bex, V.; Midgley, P.M. Climate Change 2013—The Physical Science Basis: Working Group I Contribution to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change; Cambridge University Press: Cambridge, UK, 2014. [Google Scholar] [CrossRef]
- Hasselbring, W. Software Architecture: Past, Present, Future. In The Essence of Software Engineering; Springer: Cham, Switzerland, 2018; pp. 169–184. [Google Scholar] [CrossRef]
- Reussner, R.; Goedicke, M.; Hasselbring, W.; Vogel-Heuser, B.; Keim, J.; Märtin, L. Managed Software Evolution; Springer: Cham, Switzerland, 2019. [Google Scholar] [CrossRef]
- Verdecchia, R.; Kruchten, P.; Lago, P. Architectural Technical Debt: A Grounded Theory. In Software Architecture; Springer: Cham, Switzerland, 2020; pp. 202–219. [Google Scholar] [CrossRef]
- Druskat, S.; Eisty, N.U.; Chisholm, R.; Chue Hong, N.; Cocking, R.C.; Cohen, M.B.; Felderer, M.; Grunske, L.; Harris, S.A.; Hasselbring, W.; et al. Better Architecture, Better Software, Better Research. Comput. Sci. Eng. 2025, 27, 45–57. [Google Scholar] [CrossRef]
- Schnoor, H.; Hasselbring, W. Comparing Static and Dynamic Weighted Software Coupling Metrics. Computers 2020, 9, 24. [Google Scholar] [CrossRef]
- Deb, K.; Pratap, A.; Agarwal, S.; Meyarivan, T. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. 2002, 6, 182–197. [Google Scholar] [CrossRef]
- Jung, R.; Schnoor, H.; Hasselbring, W. Replication Package for: Software Architecture Evaluation of Earth System Models (Restructuring Part); Software Engineering Group-Kiel University: Kiel, Germany, 2025. [Google Scholar] [CrossRef]
- Praditwong, K.; Harman, M.; Yao, X. Software Module Clustering as a Multi-Objective Search Problem. IEEE Trans. Softw. Eng. 2011, 37, 264–282. [Google Scholar] [CrossRef]
- Mkaouer, W.; Kessentini, M.; Shaout, A.; Koligheu, P.; Bechikh, S.; Deb, K.; Ouni, A. Many-objective software remodularization using NSGA-III. ACM Trans. Softw. Eng. Methodol. (TOSEM) 2015, 24, 1–45. [Google Scholar] [CrossRef]
- Sudhakar, G.; Nithiyanandam, S. DOOSRA—Distributed Object-Oriented Software Restructuring Approach using DIM-K-means and MAD-based ENRNN classifier. IET Softw. 2023, 17, 23–36. [Google Scholar] [CrossRef]
- Maikantis, T.; Tsintzira, A.A.; Ampatzoglou, A.; Arvanitou, E.M.; Chatzigeorgiou, A.; Stamelos, I.; Bibi, S.; Deligiannis, I. Software architecture reconstruction via a genetic algorithm: Applying the move class refactoring. In Proceedings of the 24th Pan-Hellenic Conference on Informatics, Athens, Greece, 20–22 November 2020; pp. 135–139. [Google Scholar] [CrossRef]
- Cortellessa, V.; Diaz-Pace, J.A.; Di Pompeo, D.; Frank, S.; Jamshidi, P.; Tucci, M.; van Hoorn, A. Introducing Interactions in Multi-Objective Optimization of Software Architectures. ACM Trans. Softw. Eng. Methodol. 2025, 34, 1–39. [Google Scholar] [CrossRef]
- Pourasghar, B.; Izadkhah, H.; Akhtari, M. Beyond Cohesion and Coupling: Integrating Control Flow in Software Modularization Process for Better Code Comprehensibility. ACM Trans. Softw. Eng. Methodol. 2025, 34, 1–29. [Google Scholar] [CrossRef]
- Levenshtein, V.I. Binary codes capable of correcting deletions, insertions and reversals. Sov. Phys. Dokl. 1966, 10, 707–710. [Google Scholar]
- Ankerst, M.; Breunig, M.M.; Kriegel, H.P.; Sander, J. OPTICS: Ordering points to identify the clustering structure. SIGMOD Rec. 1999, 28, 49–60. [Google Scholar] [CrossRef]
- Navarro, G. A guided tour to approximate string matching. ACM Comput. Surv. 2001, 33, 31–88. [Google Scholar] [CrossRef]
- Mancoridis, S.; Mitchell, B.; Chen, Y.; Gansner, E. Bunch: A clustering tool for the recovery and maintenance of software system structures. In Proceedings of the IEEE International Conference on Software Maintenance—1999 (ICSM’99), ‘Software Maintenance for Business Change’ (Cat. No.99CB36360), Oxford, UK, 30 August–3 September 1999; pp. 50–59. [Google Scholar] [CrossRef]































Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Schnoor, H.; Hasselbring, W.; Jung, R. Using Genetic Algorithms for Research Software Structure Optimization. Software 2025, 4, 26. https://doi.org/10.3390/software4040026
Schnoor H, Hasselbring W, Jung R. Using Genetic Algorithms for Research Software Structure Optimization. Software. 2025; 4(4):26. https://doi.org/10.3390/software4040026
Chicago/Turabian StyleSchnoor, Henning, Wilhelm Hasselbring, and Reiner Jung. 2025. "Using Genetic Algorithms for Research Software Structure Optimization" Software 4, no. 4: 26. https://doi.org/10.3390/software4040026
APA StyleSchnoor, H., Hasselbring, W., & Jung, R. (2025). Using Genetic Algorithms for Research Software Structure Optimization. Software, 4(4), 26. https://doi.org/10.3390/software4040026

