Prompt’s Evolution for Language Model-Driven Data Generation
Abstract
1. Introduction
2. Related Work
2.1. Automated Prompt Optimization
Gradient-Based Prompt Optimization
2.2. Metaheuristic-Based Prompt Optimization
3. Problem
4. Evolutionary Model
4.1. Large Language Model (LLM)
4.2. Agents
4.3. Genetic Algorithm
4.3.1. Initial Population
- r represents the role component, defining the persona or perspective the LLM should adopt during text generation (e.g., “As a medical professional…”, “In the style of a journalist…”, “From an academic viewpoint…”). This component influences the tone, vocabulary, and overall framing of the generated text, aligning it with the stylistic requirements of the reference dataset. The set of possible roles, R, is predefined based on an analysis of the reference text characteristics or on domain expert input.
- t represents the topic component (or keywords), specifying the central subject matter or theme of the text to be generated. This component directly anchors the LLM’s output to the content domains present in the reference dataset. The set of possible topics, T, can be dynamically derived from a reference text.
- a represents the action component, describing what the selected role is doing, i.e., what the speaker is trying to communicate. The set of possible actions, A, is likewise derived from the reference text.
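The (role, topic, action) triple can be represented as a minimal chromosome. The sketch below is our own illustration; the class and field names are assumptions, not taken from the paper:

```python
import random
from dataclasses import dataclass

@dataclass
class Prompt:
    """One GA individual: a structured prompt with role, topic, and action components."""
    role: str    # r: persona the LLM adopts
    topic: str   # t: subject matter anchoring the output to the reference data
    action: str  # a: what the selected role is communicating

    def as_seed(self) -> str:
        # Structured seed later handed to the LLM for refinement into a fluent prompt.
        return f"Role: {self.role} | Topic: {self.topic} | Action: {self.action}"

# Illustrative role set R; in practice it comes from reference-text analysis.
ROLES = ["As a medical professional", "In the style of a journalist",
         "From an academic viewpoint"]

individual = Prompt(role=random.choice(ROLES),
                    topic="climate change impacts",
                    action="explain the implications")
```

Keeping the three components as separate fields (rather than one free-form string) is what makes the component-boundary crossover described later straightforward.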
- Reference Text Selection: A text snippet x is randomly selected from the reference dataset D. This ensures that the initial prompts are directly grounded in the actual data characteristics we aim to replicate.
- Component Instantiation: For the selected text snippet x:
- A role component r is randomly selected from the predefined set R.
- A topic component t is extracted or inferred from x. This could involve methods such as named entity recognition to identify key entities, topic modeling to identify dominant themes, or simple keyword extraction. In our model, topic extraction from the reference text is delegated to the LLM through an agent, and the resulting topic is integrated into the prompt structure.
- An action component a is extracted from the reference text, again by the LLM through a dedicated agent.
- Prompt Refinement: The instantiated components (r, t, a) are fed into the LLM as an initial seed. The LLM is then prompted to refine this structured input into a coherent, syntactically correct, and semantically richer prompt that incorporates the essence of the selected reference text x. This step leverages the LLM’s natural language understanding and generation capabilities to produce a human-readable and effective prompt from the structured components. For instance, if x is a news article about climate change and the components are (As a scientist, Climate change impacts, Explain the implications), the LLM might generate the prompt: “As a climate scientist, thoroughly explain the long-term implications of climate change as observed in recent research.” This LLM-guided refinement ensures that the initial prompts are not merely concatenated keywords but semantically fluid, actionable instructions for subsequent text generation.
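The initialization procedure above can be sketched as follows. The agent and refinement functions are stubs standing in for real LLM calls, and all names and prompt wordings are our own illustrations, not the paper's implementation:

```python
import random

def extract_topic(snippet, llm):
    """Topic Agent: ask the LLM for the central theme of the snippet."""
    return llm(f"Give a short topic phrase for this text: {snippet}")

def extract_action(snippet, llm):
    """Action Agent: ask the LLM what the speaker is trying to communicate."""
    return llm(f"In a few words, what is the speaker doing here? {snippet}")

def refine(role, topic, action, llm):
    """Prompt Refinement: let the LLM turn the structured seed into a fluent prompt."""
    return llm(f"Combine into one coherent instruction. "
               f"Role: {role}; Topic: {topic}; Action: {action}")

def initial_population(reference_texts, roles, llm, size):
    population = []
    for _ in range(size):
        x = random.choice(reference_texts)       # Reference Text Selection
        r = random.choice(roles)                 # Role: random draw from the role set
        t = extract_topic(x, llm)                # Topic: inferred from x by an agent
        a = extract_action(x, llm)               # Action: inferred from x by an agent
        population.append(refine(r, t, a, llm))  # Prompt Refinement via the LLM
    return population

# Stub LLM so the sketch runs without a model; replace with a real client.
stub_llm = lambda query: "stub response"
population = initial_population(["We're rationing food."], ["As a journalist"],
                                stub_llm, size=4)
```

Passing the LLM in as a callable keeps the GA logic independent of any particular model backend (e.g., a locally served Llama model via Ollama, as used in the experiments).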
4.3.2. Evolution
Fitness Calculation
- Text Generation: Each prompt p is provided as input to the LLM, which then generates a corresponding text output, denoted y_p.
- Similarity Measurement: The generated text y_p is compared against a randomly selected text snippet x from the reference dataset D. The similarity is quantified using the BERTScore metric [32]. BERTScore leverages contextual embeddings from pre-trained BERT models to compute a robust similarity score between two texts, capturing both semantic and syntactic alignment. The fitness function for a given prompt p is thus defined as F(p) = BERTScore(y_p, x). This allows us to control the semantic coherence of the generated samples throughout the evolutionary process. While coherence is critical, it is equally important that the generated dataset maintains a certain level of diversity. This process is repeated for each individual in the current population to obtain its fitness value.
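The fitness evaluation can be sketched with the generator and similarity measure injected as callables. In a real run the similarity callable would wrap BERTScore (e.g., the F1 score returned by the bert_score package); here, for a self-contained example, a toy exact-match similarity is used instead:

```python
import random

def fitness(prompt, reference, generate, similarity):
    """F(p): similarity between the LLM's output for prompt p and a reference snippet."""
    y = generate(prompt)             # Text Generation step
    return similarity(y, reference)  # Similarity Measurement step (BERTScore in the paper)

def evaluate(population, references, generate, similarity, rng):
    """Score each individual against a randomly drawn reference snippet."""
    return [fitness(p, rng.choice(references), generate, similarity)
            for p in population]

# Toy similarity for demonstration only; swap in a BERTScore wrapper for real use.
exact_match = lambda a, b: 1.0 if a == b else 0.0
scores = evaluate(["hello"], ["hello"], lambda p: p, exact_match, random.Random(0))
```

Because the reference snippet is redrawn at each evaluation, the fitness signal is stochastic; this is consistent with the goal of matching the dataset's overall characteristics rather than any single snippet.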
Genetic Operators
- Crossover: This operator combines genetic material from two parent prompts to create offspring. Two parent individuals, p_1 and p_2, are selected using tournament selection. A crossover point is then randomly determined, and the components beyond that point are interchanged between the parents to produce two offspring; for example, a single-point cut after the role component swaps the parents’ topic and action components. More complex crossover mechanisms, such as two-point or uniform crossover, can also be applied to exchange multiple components or specific tokens within components. The key is that the exchange occurs at predefined component boundaries or within semantically coherent token sets, preserving the prompt’s overall structure and meaning. Crossover is applied to the selected parent individuals subject to a crossover probability p_c.
- Mutation: This operator introduces random variations into a single individual to explore new areas of the search space and prevent premature convergence. An individual is selected using tournament selection, and a random component (r, t, or a) and a specific token within it are chosen for mutation. We implement an LLM-driven semantic mutation operator to introduce novel variations while preserving contextual plausibility. When an individual is selected for mutation, a single token is randomly sampled. This token, along with its surrounding contextual phrase, is transferred to a dedicated Mutation Agent. This agent formulates a query to the LLM, tasking it to generate a list of semantically similar tokens that are valid and coherent within that specific context. A token from the LLM’s response is then selected to replace the original token, completing the mutation. This approach ensures that mutations constitute intelligent, context-aware explorations of the solution space rather than simple stochastic noise, thereby preventing the introduction of non-viable or nonsensical data artifacts. Mutation is applied to the selected individual subject to a mutation probability p_m.
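The two operators, together with the tournament selection they rely on, can be sketched as follows. The `mutation_agent` callable stands in for the paper's LLM-backed Mutation Agent, and all function names are our own assumptions:

```python
import random

def tournament(population, fitnesses, k, rng):
    """Tournament selection: return the fittest of k randomly drawn individuals."""
    contenders = rng.sample(range(len(population)), k)
    return population[max(contenders, key=lambda i: fitnesses[i])]

def crossover(parent_a, parent_b, p_c, rng):
    """Single-point crossover over (role, topic, action) component tuples,
    applied with probability p_c."""
    if rng.random() >= p_c:
        return parent_a, parent_b                  # no crossover: parents pass through
    point = rng.randint(1, len(parent_a) - 1)      # cut at a component boundary
    return (parent_a[:point] + parent_b[point:],
            parent_b[:point] + parent_a[point:])

def semantic_mutation(individual, p_m, mutation_agent, rng):
    """LLM-driven semantic mutation: replace one token with a context-aware
    alternative proposed by the Mutation Agent."""
    if rng.random() >= p_m:
        return individual
    tokens = individual.split()
    i = rng.randrange(len(tokens))                   # sample a single token
    context = " ".join(tokens[max(0, i - 3):i + 4])  # surrounding contextual phrase
    candidates = mutation_agent(tokens[i], context)  # LLM proposes similar tokens
    tokens[i] = rng.choice(candidates)               # replace, completing the mutation
    return " ".join(tokens)
```

Note that both offspring always contain exactly the six parental components rearranged, never duplicated, because the exchange happens at component boundaries.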
4.4. Replacement/Update
5. Experiments
5.1. Number of Generations
5.2. Population Size
5.3. Elitism
5.4. Tournament Size
5.5. Crossover and Mutation
5.6. Optimal Configuration
- Reference Text 1: “We’re rationing food. Only one store open and it’s chaos”
- Range [0.9–0.8]:
- Fitness: 0.8696. Generated Data: “We’re rationing PCR tests. Only one testing center open and it’s chaos”
- Fitness: 0.8537. Generated Data: “As leaders, we’re rationing masks. Only one distribution center open and it’s chaos”
- Range (0.8–0.7]:
- Fitness: 0.7952. Generated Data: “We’re prioritizing medical masks. Only one production facility open and it’s a struggle”
- Fitness: 0.7864. Generated Data: “We’re prioritizing resources. Only three hospitals open and it’s a struggle”
- Range (0.7–0.6]:
- Fitness: 0.6995. Generated Data: “We’re adapting our emergency protocols. Only one distribution center open and it’s chaos, but we’ll ensure a fair share for all”
- Fitness: 0.6806. Generated Data: “We’re struggling to get by. Food is scarce, only one market open, and tensions are running high”
- Reference Text 2: “We’re organizing a cleanup drive in the park at 9 AM”
- Range [0.9–0.8]:
- Fitness: 0.8672. Generated Data: “We’re organizing a vaccination campaign in the park at 9 AM.”
- Fitness: 0.8432. Generated Data: “We’re organizing a mandatory cleaning and inoculation event at 9 AM, ensuring a thorough cleanup drive in the park outside”
- Range (0.8–0.7]:
- Fitness: 0.7986. Generated Data: “As a responsible organization, we’re organizing a morning cleanup drive in the park at 9 AM, ensuring public safety during our neighborhood lockdown efforts”
- Fitness: 0.72048. Generated Data: “As part of our efforts to boost community resilience during lockdowns, we’re organizing a morning cleanup drive in the park at 9 AM to promote unity among residents and maintain public safety”
- Range (0.7–0.6]:
- Fitness: 0.6980. Generated Data: “Join us as we come together to boost resilience in our residential district during the shutdown. We’re organizing a community park cleanup drive at 9 AM to promote public safety and community pride”
- Fitness: 0.6674. Generated Data: “As a responsible organization, we’re mobilizing a community cleanup drive in our residential district’s central park at 9 AM to promote public safety, foster neighborhood connections, and enhance overall resilience amid the pandemic”
- Reference Text 3: “Please conserve water as our treatment facilities are still offline”
- Range [0.9–0.8]:
- Fitness: 0.8084. Generated Data: “Please follow water conservation measures as our treatment facilities remain offline until further notice”
- Range (0.8–0.7]:
- Fitness: 0.7636. Generated Data: “Please follow our visual guidelines and conserve water as our treatment facilities are still offline, ensuring a safe journey during this COVID-19 incident.”
- Fitness: 0.7484. Generated Data: “Please prioritize water conservation as our facilities are still offline, vital updates on essential services will be shared regularly”
- Range (0.7–0.6]:
- Fitness: 0.6996. Generated Data: “Conserve water during this critical quarantine period as our treatment facilities remain offline. Essential protocols for tracing and communication are in place”
- Fitness: 0.6599. Generated Data: “As a responsible organization, we urge you to conserve water amid the ongoing pandemic crisis and lockdown. Our treatment facilities are still offline, and every drop counts in maintaining essential response efforts”
6. Discussion
6.1. Parameter Selection and Influence on Convergence
6.2. Quantitative Analysis: Diversity and Semantics
- Reference text 1: “We’re rationing food. Only one store open and it’s chaos”. Table 2 presents the diversity and similarity results for reference text 1 and the generated texts. Based on both metrics, we can observe that the generated texts preserve semantics while introducing diversity. For example, one generated text retains the “rationing” and “chaos” concepts and the core sentence structure but adds a new introductory phrase, “As leaders”, which slightly lowers its similarity.
- Reference text 2: “We’re organizing a cleanup drive in the park at 9 AM”. Table 3 presents the diversity and similarity results for reference text 2 and the generated texts. Here the behavior is more stable across both metrics: semantics are better preserved than for reference text 1, although diversity is limited for some of the generated texts. In this scenario, one text offers a better balance between the metrics: it preserves the core action, “cleanup drive”, but frames it within a new “public safety/lockdown” context, so the meaning of the action is kept while the surrounding purpose shifts.
- Reference text 3: “Please conserve water as our treatment facilities are still offline”. Table 4 presents the diversity and similarity results for reference text 3 and the generated texts. The behavior resembles that of reference text 1, with some generated texts presenting a good balance between the metrics. The best-preserving text retains the precise cause-and-effect relationship established in the reference, “conserve water as our treatment facilities are still offline”, and adds context (visual guidelines, COVID-19) without disrupting the primary instruction.
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- García, S.; Ramírez-Gallego, S.; Luengo, J.; Benítez, J.M.; Herrera, F. Big data preprocessing: Methods and prospects. Big Data Anal. 2016, 1, 9. [Google Scholar] [CrossRef]
- Russo Russo, G.; Cardellini, V.; Lo Presti, F. Hierarchical auto-scaling policies for data stream processing on heterogeneous resources. ACM Trans. Auton. Adapt. Syst. 2023, 18, 1–44. [Google Scholar] [CrossRef]
- Hidalgo, N.; Wladdimiro, D.; Rosas, E. Self-adaptive processing graph with operator fission for elastic stream processing. J. Syst. Softw. 2017, 127, 205–216. [Google Scholar] [CrossRef]
- Russo, G.R.; D’Alessandro, E.; Cardellini, V.; Presti, F.L. Towards a Multi-Armed Bandit Approach for Adaptive Load Balancing in Function-as-a-Service Systems. In Proceedings of the 2024 IEEE International Conference on Autonomic Computing and Self-Organizing Systems Companion (ACSOS-C), Aarhus, Denmark, 16–20 September 2024; IEEE: New York, NY, USA, 2024; pp. 103–108. [Google Scholar]
- Wladdimiro, D.; Arantes, L.; Sens, P.; Hidalgo, N. PA-SPS: A predictive adaptive approach for an elastic stream processing system. J. Parallel Distrib. Comput. 2024, 192, 104940. [Google Scholar] [CrossRef]
- Haidar, M.A.; Rezagholizadeh, M. Textkd-gan: Text generation using knowledge distillation and generative adversarial networks. In Proceedings of the Advances in Artificial Intelligence: 32nd Canadian Conference on Artificial Intelligence, Canadian AI 2019, Kingston, ON, Canada, 28–31 May 2019; Springer: Berlin/Heidelberg, Germany, 2019; pp. 107–118. [Google Scholar]
- Zhang, Y.; Gan, Z.; Fan, K.; Chen, Z.; Henao, R.; Shen, D.; Carin, L. Adversarial feature matching for text generation. In Proceedings of the 34th International Conference on Machine Learning PMLR, Sydney, Australia, 6–11 August 2017; pp. 4006–4015. [Google Scholar]
- Nie, W.; Narodytska, N.; Patel, A. Relgan: Relational generative adversarial networks for text generation. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
- Yu, L.; Zhang, W.; Wang, J.; Yu, Y. Seqgan: Sequence generative adversarial nets with policy gradient. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; Volume 31. [Google Scholar]
- Xu, J.; Ren, X.; Lin, J.; Sun, X. Dp-gan: Diversity-promoting generative adversarial network for generating informative and diversified text. arXiv 2018, arXiv:1802.01345. [Google Scholar]
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. In Proceedings of the Advances in Neural Information Processing Systems 27 (NIPS 2014), Montreal, QC, Canada, 8–13 December 2014; pp. 2672–2680. [Google Scholar]
- Salimans, T.; Goodfellow, I.; Zaremba, W.; Cheung, V.; Radford, A.; Chen, X. Improved techniques for training gans. In Proceedings of the Advances in Neural Information Processing Systems 29 (NIPS 2016), Barcelona, Spain, 5–10 December 2016; pp. 2234–2242. [Google Scholar]
- Zar, M.H.F.; Akhtar, N.; Ahmad, M.; Naeem, U.; Shafique, M.; Kim, J. A review on metaheuristic-based synthetic data generation methods. IEEE Access 2022, 10, 47514–47533. [Google Scholar]
- Eiben, A.E.; Hinterding, R.; Michalewicz, Z. Parameter setting in evolutionary algorithms. IEEE Trans. Evol. Comput. 1999, 3, 124–141. [Google Scholar] [CrossRef]
- Shin, T.; Razeghi, Y.; Logan, R.L., IV; Wallace, E.; Singh, S. AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, 16–20 November 2020; pp. 4222–4235. [Google Scholar]
- Li, X.L.; Liang, P. Prefix-Tuning: Optimizing Continuous Prompts for Generation. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Online, 1–6 August 2021; Volume 1, pp. 4582–4597. [Google Scholar]
- Pryzant, R.; Iter, D.; Li, J.; Lee, Y.T.; Zhu, C.; Zeng, M. Automatic prompt optimization with “gradient descent” and beam search. arXiv 2023, arXiv:2305.03495. [Google Scholar] [CrossRef]
- Tong, Z.; Ding, Z.; Wei, W. EvoPrompt: Evolving Prompts for Enhanced Zero-Shot Named Entity Recognition with Large Language Models. In Proceedings of the 31st International Conference on Computational Linguistics, Abu Dhabi, United Arab Emirates, 19–24 January 2025; pp. 5136–5153. [Google Scholar]
- Saletta, M.; Ferretti, C. Exploring the prompt space of large language models through evolutionary sampling. In Proceedings of the Genetic and Evolutionary Computation Conference, Melbourne, VIC, Australia, 14–18 July 2024; pp. 1345–1353. [Google Scholar]
- Tran, K.D.; Bui, D.V.; Luong, N.H. Evolving Prompts for Synthetic Image Generation with Genetic Algorithm. In Proceedings of the 2023 International Conference on Multimedia Analysis and Pattern Recognition (MAPR), Quy Nhon, Vietnam, 5–6 October 2023; pp. 1–6. [Google Scholar] [CrossRef]
- Wong, M.; Ong, Y.S.; Gupta, A.; Bali, K.K.; Chen, C. Prompt Evolution for Generative AI: A Classifier-Guided Approach. In Proceedings of the 2023 IEEE Conference on Artificial Intelligence (CAI), Santa Clara, CA, USA, 5–6 June 2023; pp. 226–229. [Google Scholar] [CrossRef]
- Pan, H.; Lu, H.; Gao, T.; Shen, M.; Zhang, W.; Lin, Z.; Cao, L.; Xiao, J.; Liu, Z.; Wen, M.; et al. Plum: Prompt Learning using Metaheuristics. arXiv 2023, arXiv:2311.08585. [Google Scholar]
- Zhang, J.; Huang, Z.; Hu, M.; Deng, Y.; Cai, S. EvoPrompting: Language Model Based Prompt Tuning for Few-Shot Learning. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore, 6–10 December 2023; pp. 301–313. [Google Scholar]
- Chen, A.; Dohan, D.; So, D. Evoprompting: Language models for code-level neural architecture search. Adv. Neural Inf. Process. Syst. 2023, 36, 7787–7817. [Google Scholar]
- Sécheresse, X.; Guilbert-Ly, J.Y.; de Torcy, A.V. GAAPO: Genetic Algorithmic Applied to Prompt Optimization. arXiv 2025, arXiv:2504.07157. [Google Scholar] [CrossRef]
- Fernando, C.; Banarse, D.; Michalewski, H.; Osindero, S.; Rocktäschel, T. Promptbreeder: Self-referential self-improvement via prompt evolution. arXiv 2023, arXiv:2309.16797. [Google Scholar]
- Gao, Z.; Gholami, A.; Gu, B.; Mao, Y.; Ma, X.; Zou, T.; Yu, R.; Lu, X.; Chen, P.; Shi, H.; et al. Evolutionary Computation and Large Language Models: A Survey of Methods, Synergies, and Applications. arXiv 2025, arXiv:2407.03073. [Google Scholar] [CrossRef]
- Wladdimiro, D.; Arantes, L.; Sens, P.; Hidalgo, N. A multi-metric adaptive stream processing system. In Proceedings of the 2021 IEEE 20th International Symposium on Network Computing and Applications (NCA), Boston, MA, USA, 23–26 November 2021; IEEE: New York, NY, USA, 2021; pp. 1–8. [Google Scholar]
- Wladdimiro, D.; Pagliari, A.; Brum, R.C. Toward Stream Processing Efficiency Leveraging Cloud Burstable Instances. In Proceedings of the 2025 IEEE International Conference on Cloud Engineering (IC2E), Rennes, France, 23–26 September 2025; IEEE: New York, NY, USA, 2025; pp. 217–224. [Google Scholar]
- Touvron, H.; Lavril, T.; Izacard, G.; Martinet, X.; Lachaux, M.; Lacroix, T.; Rozière, B.; Goyal, N.; Hambro, E.; Azhar, F.; et al. Llama: Open and efficient foundation language models. arXiv 2023, arXiv:2302.13971. [Google Scholar] [CrossRef]
- Ollama. Ollama: Run LLMs Locally. 2024. Available online: https://ollama.com/ (accessed on 1 June 2025).
- Zhang, T.; Kishore, V.; Wu, F.; Weinberger, K.Q.; Artzi, Y. BERTScore: Evaluating Text Generation with BERT. In Proceedings of the International Conference on Learning Representations, Addis Ababa, Ethiopia, 26–30 April 2020. [Google Scholar]
- Shukla, A.; Pandey, H.M.; Mehrotra, D. Comparative review of selection techniques in genetic algorithm. In Proceedings of the 2015 International Conference on Futuristic Trends on Computational ANALYSIS and Knowledge Management (ABLAZE), Greater Noida, India, 25–27 February 2015; IEEE: New York, NY, USA, 2015; pp. 515–519. [Google Scholar]
- Razali, N.M.; Geraghty, J. Genetic algorithm performance with different selection strategies in solving TSP. In Proceedings of the World Congress on Engineering, Hong Kong, China, 6–8 July 2011; International Association of Engineers: Hong Kong, China, 2011; Volume 2, pp. 1–6. [Google Scholar]
- Lamsal, R. Coronavirus (COVID-19) Tweets Dataset, IEEE Dataport. 2020. Available online: https://ieee-dataport.org/open-access/coronavirus-covid-19-tweets-dataset (accessed on 1 June 2025).
- Niwattanakul, S.; Singthongchai, J.; Naenudorn, E.; Wanapu, S. Using of Jaccard coefficient for keywords similarity. In Proceedings of the International Multiconference of Engineers and Computer Scientists, Hong Kong, 13–15 March 2013; Volume 1, pp. 380–384. [Google Scholar]
- Tata, S.; Patel, J.M. Estimating the selectivity of tf-idf based cosine similarity predicates. ACM Sigmod Rec. 2007, 36, 7–12. [Google Scholar] [CrossRef]








| Number of Generations | Total Exec. Time (hh:mm:ss) |
|---|---|
| 50 | 3:04:21 |
| 100 | 5:28:06 |
| 150 | 8:42:06 |
| Generated Text | Jaccard Similarity | TF-IDF Cosine Similarity |
|---|---|---|
|  | 0.6250 | 0.4923 |
|  | 0.5882 | 0.4774 |
|  | 0.4211 | 0.2867 |
|  | 0.3889 | 0.2626 |
|  | 0.3462 | 0.2668 |
|  | 0.3043 | 0.2467 |
| Generated Text | Jaccard Similarity | TF-IDF Cosine Similarity |
|---|---|---|
|  | 0.7143 | 0.6162 |
|  | 0.6000 | 0.5159 |
|  | 0.5000 | 0.5136 |
|  | 0.3750 | 0.3618 |
|  | 0.3871 | 0.3787 |
|  | 0.3143 | 0.3378 |
| Generated Text | Jaccard Similarity | TF-IDF Cosine Similarity |
|---|---|---|
|  | 0.4118 | 0.3618 |
|  | 0.4348 | 0.4557 |
|  | 0.3810 | 0.3502 |
|  | 0.3333 | 0.3163 |
|  | 0.2812 | 0.2824 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Hidalgo, N.; Saez, P.; Meneses, N.; Reyes, V.; Rosas, E. Prompt’s Evolution for Language Model-Driven Data Generation. Appl. Sci. 2025, 15, 12911. https://doi.org/10.3390/app152412911

