Agent-Based Models of Sexual Selection in Bird Vocalizations Using Generative Approaches
Abstract
1. Introduction
2. The Dataset and Generative Audio Models
2.1. The Field Recording and the Dataset
2.2. Variational Autoencoder Training
2.3. Text-to-Audio Model Fine-Tuning
3. Evolutionary Models
3.1. The Evolutionary Model Based on the VAE
3.2. The Evolutionary Model Based on the Text-to-Audio Model
The bird song of a Blue-and-white Flycatcher, which is {word1}, {word2}, and {word3}.
Please partially modify the following three words to describe a bird song. The output must consist of exactly three words formatted as: word1, word2, word3. Do not include explanations, introductions, or additional symbols. The original description is: {text gene}.
Generate three words to describe a birdsong. These words can be positive, negative, or neutral, and should represent different aspects of the sound. Use the format: word1, word2, word3. Please make sure that you do not add explanations or introductions. For example: sweet, lilting, vibrant.
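The three prompts above appear to define the text-gene operations of the text-to-audio evolutionary model. The third asks the LLM for an initial gene of three descriptive words, the second asks it to partially modify a parent gene, and the first turns a gene into the prompt given to the text-to-audio model. The Python sketch below shows one way to fill these templates and to enforce the required `word1, word2, word3` format on the LLM's reply. It is a minimal sketch under assumptions. `llm` is a hypothetical callable that sends a prompt to the language model and returns its text reply. The validation rule (single alphabetic words, hyphens and apostrophes allowed, a trailing period tolerated), the retry count, and the fallback of keeping the parent gene after repeated malformed replies are illustrative choices, not the paper's documented procedure.

```python
import re
from typing import Callable, Optional, Tuple

# Prompt texts copied from the model description above.
TTA_TEMPLATE = (
    "The bird song of a Blue-and-white Flycatcher, "
    "which is {word1}, {word2}, and {word3}."
)
MUTATION_TEMPLATE = (
    "Please partially modify the following three words to describe a bird song. "
    "The output must consist of exactly three words formatted as: word1, word2, word3. "
    "Do not include explanations, introductions, or additional symbols. "
    "The original description is: {text gene}."
)
INIT_PROMPT = (
    "Generate three words to describe a birdsong. These words can be positive, "
    "negative, or neutral, and should represent different aspects of the sound. "
    "Use the format: word1, word2, word3. Please make sure that you do not add "
    "explanations or introductions. For example: sweet, lilting, vibrant."
)

TextGene = Tuple[str, str, str]
# Assumption: a valid "word" is alphabetic, optionally with hyphens or apostrophes.
WORD = re.compile(r"[A-Za-z][A-Za-z'-]*")


def tta_prompt(gene: TextGene) -> str:
    """Fill the text-to-audio prompt with the three words of a text gene."""
    return TTA_TEMPLATE.format(word1=gene[0], word2=gene[1], word3=gene[2])


def mutation_prompt(gene: TextGene) -> str:
    """Insert the parent gene, rendered as 'w1, w2, w3', into the mutation prompt."""
    # str.replace is used because the placeholder name contains a space.
    return MUTATION_TEMPLATE.replace("{text gene}", ", ".join(gene))


def parse_gene(reply: str) -> Optional[TextGene]:
    """Return the three words if the reply has the form 'word1, word2, word3', else None."""
    parts = [p.strip() for p in reply.strip().rstrip(".").split(",")]
    if len(parts) != 3 or not all(WORD.fullmatch(p) for p in parts):
        return None
    return (parts[0], parts[1], parts[2])


def query_gene(llm: Callable[[str], str], prompt: str, max_tries: int = 3) -> Optional[TextGene]:
    """Query the (hypothetical) LLM callable until a well-formed gene comes back."""
    for _ in range(max_tries):
        gene = parse_gene(llm(prompt))
        if gene is not None:
            return gene
    return None


def init_gene(llm: Callable[[str], str]) -> Optional[TextGene]:
    """Ask the LLM for an initial text gene; None if every reply is malformed."""
    return query_gene(llm, INIT_PROMPT)


def mutate(gene: TextGene, llm: Callable[[str], str]) -> TextGene:
    """Ask the LLM to partially modify a gene; keep the parent if every reply is malformed."""
    child = query_gene(llm, mutation_prompt(gene))
    return child if child is not None else gene


if __name__ == "__main__":
    def fake_llm(prompt: str) -> str:
        # Hypothetical stand-in for a real LLM call: always returns a fixed reply.
        return "sweet, trilling, bright."

    parent = ("sweet", "lilting", "vibrant")
    print(tta_prompt(parent))
    print(mutate(parent, fake_llm))
```

Run as a script, the demo prints the filled text-to-audio prompt for the parent gene and then the offspring `('sweet', 'trilling', 'bright')` parsed from the stub reply. Keeping the parent when the model keeps returning malformed text is only one possible policy; discarding the offspring or drawing a fresh gene with the initialization prompt would be equally reasonable alternatives.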
4. Quantitative Analysis
4.1. Diversity
4.2. Affinity
4.3. BirdNET Classification as a Virtual Bird Song Expert
5. Results
5.1. An Evolutionary Experiment Based on the VAE Model
5.2. The Evolutionary Experiment Based on the Text-to-Audio Model
5.3. Species-Specific Acoustic Characteristics in Evolved Songs
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Full form |
|---|---|
| VAE | Variational Autoencoder |
| TTA | Text-to-Audio |
| LLM | Large Language Model |
| ACI | Acoustic Complexity Index |
| UMAP | Uniform Manifold Approximation and Projection |
| KDE | Kernel Density Estimation |
| BAWF | Blue-and-white Flycatcher |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhao, H.; Arita, T.; Suzuki, R. Agent-Based Models of Sexual Selection in Bird Vocalizations Using Generative Approaches. Appl. Sci. 2025, 15, 10481. https://doi.org/10.3390/app151910481
Zhao H, Arita T, Suzuki R. Agent-Based Models of Sexual Selection in Bird Vocalizations Using Generative Approaches. Applied Sciences. 2025; 15(19):10481. https://doi.org/10.3390/app151910481
Chicago/Turabian Style: Zhao, Hao, Takaya Arita, and Reiji Suzuki. 2025. "Agent-Based Models of Sexual Selection in Bird Vocalizations Using Generative Approaches" Applied Sciences 15, no. 19: 10481. https://doi.org/10.3390/app151910481
APA Style: Zhao, H., Arita, T., & Suzuki, R. (2025). Agent-Based Models of Sexual Selection in Bird Vocalizations Using Generative Approaches. Applied Sciences, 15(19), 10481. https://doi.org/10.3390/app151910481

