Review and Mapping of Search-Based Approaches for Program Synthesis

Saber, Takfarinas; Tao, Ning

doi:10.3390/info16050401

by Takfarinas Saber ¹,*,† and Ning Tao ²,†

¹ Lero the Research Ireland Centre for Software, School of Computer Science, University of Galway, H91 TK33 Galway, Ireland
² School of Computer Science, University College Dublin, D04 C1P1 Dublin, Ireland
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.

Information 2025, 16(5), 401; https://doi.org/10.3390/info16050401

Submission received: 25 March 2025 / Revised: 1 May 2025 / Accepted: 11 May 2025 / Published: 14 May 2025

(This article belongs to the Section Information Applications)

Abstract

Context: Program synthesis tools reduce software development costs by generating programs that perform tasks depicted by some specifications. Various methodologies have emerged for program synthesis, among which search-based algorithms have shown promising results. However, the proliferation of search-based program synthesis tools utilising diverse search algorithms and input types and targeting various programming tasks can overwhelm users seeking the most suitable tool. Objective: This paper contributes to the ongoing discourse by presenting a comprehensive review of search-based approaches employed for program synthesis. We aim to offer an understanding of the guiding principles of current methodologies by mapping them to the required type of user intent, the type of search algorithm, and the representation of the search space. Furthermore, we aim to map the diverse search algorithms to the type of code generation tasks in which they have shown success, which would serve as a guideline for applying search-based approaches for program synthesis. Method: We conducted a literature review of 67 academic papers on search-based program synthesis. Results: Through analysis, we identified and categorised the main techniques with their trends. We have also mapped and shed light on patterns connecting the problem, the representation and the search algorithm type. Conclusions: Our study summarises the field of search-based program synthesis and provides an entry point to the acumen and expertise of the search-based community on program synthesis.

Keywords: program synthesis; automated programming; search-based algorithm; heuristic; metaheuristic; survey

1. Introduction

In this digital era, programming skills are increasingly vital for many jobs. As only a fraction of the labour force hones such skills, governments and companies are putting in place costly and time-consuming upskilling programs. Automated code generation based on user intent, known as program synthesis, is inching closer to becoming a viable alternative for organisations or individuals that strive to enhance their efficiency with minimum effort and cost.

Extensive research has previously been conducted on program synthesis as a means to facilitate the work of programmers by providing them with a range of tools and techniques to generate programming code based on their high-level intentions. Recent studies have proposed various approaches to tackle program synthesis tasks, including machine learning (ML) techniques, Large Language Models (LLMs), and search-based techniques. These algorithms vary in programming languages and target problems, ranging from simple tasks (e.g., symbolic regression [1,2,3,4,5,6,7], string manipulation [8,9,10,11], and binary transformation [12,13,14]), to more complex challenges (e.g., robot path-finding [6,15], algebraic calculations [14,16,17,18,19,20,21], and intricate real-world programming problems). Particularly, Saha et al. [22] proposed an algorithm to generate an ML pipeline using a corpus and a human-written pipeline. Poliansky et al. [23] utilised genetic programming (GP) and context-oriented behavioural programming (COBP) in the Tic-Tac-Toe game. Beltramelli introduced pix2code [24], which leverages Convolutional Neural Networks (CNNs) to generate web development interface code (HTML/CSS) from screenshots of the graphical user interface. The AlphaCode developer team harnessed large-scale sampling and transformer language models to address previously unsolved competitive programming challenges [25].

Several methods have been proposed for program synthesis, each with its strengths and weaknesses. For instance, ML techniques and LLMs rely on a large corpus of training data. While they have been shown to successfully generate human-like programs, they often struggle to produce correct code due to ambiguous task specifications and complex programming syntax (e.g., generating known buggy code [26]). Moreover, the generated code suffers from a lack of trust due to documented risks (e.g., producing code with known vulnerabilities [27,28]). The potential for LLMs to generate flawed code poses a significant and growing risk to software and its stakeholders. Furthermore, their generative nature associated with the small size of their prompts limits their ability to fix erroneous code with iterative LLM prompting [29,30].

Search-based program synthesis (SBPS) algorithms, on the other hand, require less training data and can be constrained to search for programs that adhere to predefined specifications, such as grammar files [18,19]. However, SBPS approaches often struggle to scale to complex tasks and often produce programs with poor readability and maintainability. To address these limitations, recent work has explored combining LLM-based and SBPS techniques [18,31,32].

In this work, we review the literature that utilises search-based algorithms for program synthesis. After identifying and selecting the relevant works, we classify them along various dimensions, including search algorithm type, types of user intent, and representation type of the search space. Furthermore, we analyse the datasets and the target tasks addressed by each approach—shedding light on their efficiency and guiding adoption by non-domain experts. We also examine the connections between these dimensions to identify patterns. Lastly, we conclude by summarising the observed challenges in the SBPS field.

This study holds timely and significant importance. With the emergence of generative AI in automated code generation and growing efforts to integrate LLMs into search-based techniques, there is a critical need to bridge these communities (e.g., [33,34,35]). Our work provides a crucial entry point to the search-based community’s expertise, fostering collaborative synergy between these complementary approaches.

The rest of the paper is structured as follows: Section 2 provides an overview of the background and work related to our study. Section 3 offers a comprehensive exposition of the methodology and framework employed in conducting our review. Subsequently, in Section 4, we embark on a detailed examination and discussion of the review results, facilitating a comparative analysis of the identified research. In Section 5, we outline the challenges faced by the SBPS field. Finally, in Section 6, we conclude our work with the key takeaways and implications of our study while also delineating directions for prospective future research endeavours.

2. Background and Related Work

In this section, we describe the background of our study in two parts: program synthesis in automated code generation and search-based program synthesis. We also discuss surveys related to program synthesis. Finally, we briefly summarise recent approaches for LLM-based program synthesis.

2.1. Program Synthesis in Automated Code Generation

Automated code generation is a programming methodology that creates computer programs based on user specifications. This approach frequently employs artificial intelligence (AI) techniques to enhance productivity and reduce costs by replacing or assisting human developers in the manual composition of code segments. Its application spans various domains within computer science and software engineering, such as the development of Integrated Development Environments (IDEs), automatic program repair, and program synthesis.

Tools employed in automated code generation can be categorised into three primary groups based on their resulting code: code template generation [36], code repair [37], and program synthesis [38]:

Code template generation automates the creation of code by using predefined structures with customisable placeholders. These templates act as blueprints, where placeholders (such as class names, variables, or parameters) are dynamically replaced with actual values based on user input or configuration files. This method is widely used in frameworks and tools that generate boilerplate code, such as web application scaffolding, API endpoints, or database models. While it speeds up development by eliminating repetitive tasks, it lacks flexibility for highly customised logic, requiring manual adjustments if the templates do not fully match the desired output. Common tools include Jinja2, Yeoman, and IDE-based snippet generators.
Code repair focuses on identifying and fixing errors, inefficiencies, or vulnerabilities in existing code. It relies on static analysis, machine learning, or rule-based systems to detect issues—such as syntax errors, security flaws, or performance bottlenecks—and then suggests or automatically applies corrections. This approach is particularly useful in debugging, refactoring, and maintaining software, as it reduces manual effort while improving code quality. However, automated fixes can sometimes introduce new bugs if the repair logic is flawed or if the system misinterprets the developer’s intent. Tools like GitHub Copilot (https://github.com/features/copilot), SonarQube (https://www.sonarsource.com/sem/products/sonarqube/), and linters with autofix capabilities (e.g., ESLint for JavaScript) employ this technique.
Program synthesis generates executable code from high-level specifications, such as input–output examples, natural language descriptions, or formal constraints. Instead of relying on predefined templates, it uses techniques like symbolic reasoning, search algorithms, or machine learning to derive code that meets the given requirements. This method is powerful for creating complex logic from minimal instructions, making it useful for tasks like algorithm design. The nature of user specifications varies across different techniques, with search-based algorithms (e.g., evolutionary approaches) often leveraging input–output test cases and LLMs generating code based on prompts expressed in natural language. Emerging tools, including Microsoft PROSE and OpenAI Codex, demonstrate their potential, though program synthesis remains less mature than template-based or repair-based approaches.
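To make the role of input–output test cases concrete, the following minimal sketch scores candidate programs by the number of examples they satisfy, which is the kind of signal a search-based synthesiser can use to rank candidates. The target function, the examples, and the three candidates are hypothetical and chosen only for illustration; they are not taken from any surveyed tool.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[int, int]  # (input, expected output)


def passed_cases(program: Callable[[int], int], examples: List[Example]) -> int:
    """Count how many input-output examples the candidate program satisfies."""
    passed = 0
    for x, expected in examples:
        try:
            if program(x) == expected:
                passed += 1
        except Exception:
            # A crashing candidate simply fails that example.
            pass
    return passed


# Hypothetical specification: the user intends f(x) = 2x + 1.
EXAMPLES: List[Example] = [(0, 1), (1, 3), (2, 5), (5, 11)]

# Hypothetical candidates that a search procedure might propose.
CANDIDATES: Dict[str, Callable[[int], int]] = {
    "x + 1": lambda x: x + 1,
    "2 * x": lambda x: 2 * x,
    "2 * x + 1": lambda x: 2 * x + 1,
}

SCORES = {name: passed_cases(fn, EXAMPLES) for name, fn in CANDIDATES.items()}
BEST = max(SCORES, key=SCORES.get)
```

A candidate that passes every example is accepted as a solution with respect to the examples only; nothing in this sketch guarantees that it generalises beyond them.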

2.2. Search-Based Program Synthesis

SBPS (a subfield of search-based software engineering) is a software development technique that uses search algorithms and optimisation approaches to address programming tasks in creating and maintaining software systems. Various search techniques can be applied to SBPS, although metaheuristic search algorithms are the most popular. Metaheuristic search algorithms include biologically inspired approaches such as genetic algorithms and swarm intelligence, as well as non-biological algorithms such as Local Search and Tabu Search. This field focuses on analysing and enhancing vast solution spaces related to software engineering problems. SBPS treats software engineering problems as optimisation challenges that aim to find optimal or near-optimal solutions within the solution space. This approach enables the automation of certain software engineering tasks, enhances the efficiency of problem-solving processes, and provides insights into the trade-offs inherent in complex decision spaces. Applications of SBPS span a wide range of software engineering activities, including test case generation, code optimisation, requirement engineering, software maintenance, and project management.

2.3. Reviews Related to Search-Based Program Synthesis

Given its age and importance, automated code generation has garnered numerous surveys. Batouta et al. [39] performed a tertiary and systematic mapping review of research in automation and code generation. Therefore, we do not aim to discuss all related reviews. Instead, we attempt to highlight those that are the closest to ours.

Sobania et al. [38] surveyed the evolutionary program synthesis approaches from 2015 to 2020 that were specifically evaluated on the PSB1 [40] benchmark dataset, which enabled them to compare the performance of the different algorithms.

Olmo et al. [41] surveyed swarm-based automatic programming studies, i.e., studies at the intersection between program synthesis and the use of swarm intelligence as a search technique.

Bodik and Jobstmann [42] surveyed algorithmic program synthesis research as an introduction to a journal special issue and analysed the application of these technologies. The survey is highly informative. However, it is relatively high-level as it attempted to canvass the program synthesis field and provide an overview of it. Furthermore, the work is relatively dated. The authors divided their review into reactive synthesis (i.e., concerned with automata-theoretic techniques with an infinite input stream) and functional synthesis (i.e., producing programs that consume finite inputs).

Gulwani et al. [43] surveyed and provided an overview of state-of-the-art approaches to program synthesis. This work also aims to provide a general overview of program synthesis, its applications, common approaches (particularly enumeration search, constraint solving, stochastic search, and deduction-based programming by examples), and general principles of such approaches (e.g., bias, oracle-guided inductive search, and optimisation).

The work by Alur et al. [44] is the closest work to our study. The authors discuss the use of search-based approaches to program synthesis. The authors focused specifically on syntax-guided synthesis (SyGuS) using four applications: synthesis from logical specifications, programming by examples, program transformation, and automatic inference of program invariants. Given the publication setting (i.e., a communications magazine), the authors did not delve deeply into the search approaches.

In our work, we review program synthesis based on different types of user intent. Furthermore, we survey all search-based program synthesis techniques without focusing on a particular one, delving into their representation of the search space and the types of code generation problems at which they are the most successful.

2.4. LLM-Based Program Synthesis

LLMs have shown promise in various software engineering tasks [45] such as requirements engineering [46], software development [47] and software maintenance [48,49]. While not directly the concern of our survey, we also briefly describe relevant studies that utilise LLMs for program synthesis.

Austin et al. [50] investigate the ability of LLMs to generate Python code (https://www.python.org/) based on task descriptions. They evaluate LLM performance on datasets such as Mostly Basic Programming Problems and MathQA-Python, showing that while LLMs are effective at generating correct programs in some cases, they struggle with ambiguous or complex problems. This highlights the need for better prompt engineering and error-handling mechanisms. Nijkamp et al. [51] present an open-source LLM called CodeGen, specifically trained for code generation. CodeGen demonstrates how large-scale models can handle program synthesis by generalising from natural language task descriptions. It emphasises the importance of multi-turn interactions, where the model iteratively refines its output based on feedback, showing improvements over single-turn solutions. Li et al. [52] explore the integration of LLMs into traditional enumerative synthesis frameworks. They use techniques like counterexample-guided inductive synthesis, where LLMs are prompted to generate code solutions iteratively, improving accuracy through feedback loops. This approach combines the generative capabilities of LLMs with formal methods to solve more complex synthesis tasks.

Sobania et al. [53] compare the program synthesis performance of GitHub Copilot and GP. They evaluate GitHub Copilot on a standard set of program synthesis benchmark problems and compare the results to those reported in the GP literature. The study finds that while both methods perform similarly, GitHub Copilot offers more readable and practically usable solutions for programmers, whereas GP is still in development, requiring significant test data and being time-consuming. The authors suggest that future GP research should focus on improving usability, execution time, and code readability.

Wang et al. [54] explore a novel approach for enhancing LLMs to generate highly structured languages, such as domain-specific languages, with minimal data. The authors introduce grammar prompting, a technique where an LLM uses a BNF grammar to enforce syntactic constraints during generation. This method helps LLMs handle complex, structured tasks like semantic parsing, AI planning, and molecule generation, which traditional prompting methods struggle to address effectively.

Hemberg et al. [32] introduce LLM_GP, a framework that utilises LLMs to evolve code by integrating LLMs as a core component of the evolutionary process. Unlike traditional GP, where evolutionary operators act on data structures such as syntax trees or linear sequences of program instructions, LLM_GP replaces these mechanisms with LLM-driven prompt engineering. This approach prompts the LLM to execute evolutionary operations, such as generating initial solutions, selecting and modifying programs, and applying crossover or mutation based on pre-trained patterns and completion capabilities. This novel integration of LLMs allows evolutionary processes to leverage the vast knowledge contained within the LLMs, thus transforming the evolutionary operators into a process that is both guided by and dependent on the language model’s learned representations.

Tao et al. [34] attempt to address the lack of trust in LLM-generated code due to documented risks (e.g., code with known and risky vulnerabilities) by utilising predefined restricted Backus–Naur Form (BNF) grammars, which are considered ‘safe’, to restrict the search space and avoid bad programs. They show that while LLMs perform well in generating correct programs, they often fail to produce code that adheres to the grammars. To solve this, the authors leverage LLM-generated programs in a multi-objective grammar-guided GP in two parts: (i) as seeds to the evolution and (ii) as targets to new similarity objectives.

3. Methodology

This section details the three main steps of this study: (i) definition of research questions, (ii) paper search, and (iii) paper screening and selection. This study is conducted in compliance with the PRISMA guidelines and any registration information.

3.1. Definition of Research Questions

Our review aims to identify the existing research on SBPS algorithms by reviewing published literature.

The research questions (RQs) formulated for this study are as follows:

RQ1: What are the main techniques and trends in SBPS?
RQ2: What are the guiding principles of SBPS algorithms?
– RQ2.1: What type of user intent/input is used to guide the SBPS search process?
– RQ2.2: What search space representation is used by SBPS algorithms?
RQ3: What are the types of programming tasks targeted by each SBPS algorithm?

3.2. Search for Relevant Papers

Finding an accurate search string for our study proved challenging due to the vast array of subdomains within the automated programming field and the large number of techniques under the umbrella of search-based techniques. Given this diversity, and instead of naming all potential keywords (i.e., all synonyms of program synthesis and all search-based algorithms), we opted to start our search with a query that only includes the main terms in our study, albeit with a high chance of returning a large number of false positives.

To conduct our comprehensive review, we executed keyword-based queries on 24 March 2024 within the Scopus digital library. Scopus, renowned for its collection of peer-reviewed publications from top software engineering journals and conferences, indexes research papers from several esteemed sources including IEEE Xplore, ACM Digital Library, ScienceDirect (Elsevier), and Springer.

To search the Scopus digital library, we use the following search string:

This approach facilitated the identification of relevant studies, providing a foundation for our exploration of the nuanced landscape of automated programming. The utilisation of such a methodologically robust strategy ensures the inclusion of diverse perspectives and insights from reputable sources, contributing to the scholarly rigour and validity of our study.

3.3. Paper Screening and Selection

We retrieved 1002 publications by running the search string. After the implementation of the search string and the establishment of clear inclusion and exclusion parameters (outlined in Table 1), a meticulous screening process was executed on the retrieved papers, delineated in the PRISMA flow diagram in Figure 1.

In the initial phase, 671 papers were excluded based on their titles and abstracts as they did not align with the predefined criteria. The subsequent phase involved an in-depth examination of the remaining 331 studies through a full-text reading process, culminating in the exclusion of an additional 167 papers. The resulting subset of 164 publications exhibited an exclusive focus on SBPS. Notably, 95 studies were omitted from this subset as (i) secondary studies, (ii) minor incremental improvements of a primary study, or (iii) survey papers.

Our full process enabled us to identify 69 papers contributing to novel search-based program synthesis algorithms.
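The screening counts reported in this subsection can be re-derived step by step. The short sketch below only replays the arithmetic stated in the text; the variable names are ours.

```python
# Counts as stated in Section 3.3.
retrieved = 1002               # publications returned by the search string
excluded_title_abstract = 671  # excluded after title and abstract screening
excluded_full_text = 167       # excluded after full-text reading
excluded_non_primary = 95      # secondary, incremental, or survey papers

after_title_abstract = retrieved - excluded_title_abstract
after_full_text = after_title_abstract - excluded_full_text
final_selection = after_full_text - excluded_non_primary
```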

4. Analysis

In this section, we present an in-depth analysis of our review in an attempt to answer our research questions, focusing on (i) the trends and techniques of SBPS, (ii) the guiding principles of each SBPS algorithm (type of user intent and representation of the search space), and (iii) the type of problem target.

4.1. Main SBPS Techniques and Trends (RQ1)

We start by analysing the type of SBPS techniques. Our survey identified a range of techniques for SBPS, which we have categorised into the following categories:

Uninformed denotes an algorithmic approach that systematically explores a given search space in a predetermined order, devoid of any heuristic strategies or domain-specific knowledge. Such algorithms adhere to a straightforward exploration pattern, sequentially examining potential solutions without incorporating any informed guidance or optimisation criteria.
Heuristic is an AI-based technique that is built for efficient search through large search spaces instead of using traditional exhaustive search methods. This type of algorithm often uses certain heuristics or rules to guide the search process towards the solution.
Metaheuristic is a higher-level search strategy that is designed to find optimal solutions for a wider range of problems instead of focusing on one particular task. This type of algorithm is often inspired by natural phenomena or human behaviour to guide the search. Metaheuristics often guide the search process using an iterative process in an attempt to reach better solutions.
Other encompasses search-based program synthesis algorithms that cannot be classified within the previously defined categories. For instance, our survey identified a subset of papers that employ a pre-built database search approach for iterative program synthesis. Notably, these algorithms do not adhere to standard search techniques, and we have grouped them under this distinct category.
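As an illustration of the uninformed category, the following minimal sketch enumerates arithmetic expressions in order of increasing size and returns the first one that satisfies all input–output examples, without any heuristic guidance. The tiny expression language, the examples, and the function names are assumptions made for this illustration and do not correspond to a specific surveyed system.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

# Hypothetical examples, consistent with x * (x + 2).
EXAMPLES: List[Tuple[int, int]] = [(1, 3), (2, 8), (3, 15)]
LEAVES = ["x", "1", "2"]
OPERATORS = ["+", "*"]


def satisfies(expr: str) -> bool:
    """Check the expression against every input-output example."""
    return all(eval(expr, {"__builtins__": {}}, {"x": x}) == y for x, y in EXAMPLES)


def enumerate_programs(max_size: int) -> Optional[str]:
    """Enumerate expressions by size (number of nodes), smallest first."""
    by_size: Dict[int, List[str]] = {1: list(LEAVES)}
    for size in range(1, max_size + 1):
        if size > 1:
            by_size[size] = []
            # A binary node uses one unit of size; split the rest between children.
            for left_size in range(1, size - 1):
                right_size = size - 1 - left_size
                for left, right in product(by_size[left_size], by_size[right_size]):
                    for op in OPERATORS:
                        by_size[size].append(f"({left} {op} {right})")
        # Test candidates in a fixed, predetermined order.
        for expr in by_size[size]:
            if satisfies(expr):
                return expr
    return None
```

Heuristic and metaheuristic approaches differ from this sketch mainly in how the next candidate is chosen: by a cost estimate or a learned model in the former case, and by iterative improvement of existing candidates in the latter.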

Table 2 provides a comprehensive summary of the identified approaches, classifying them according to the above algorithmic categories. Examination of the data underscores a notable predilection for the application of metaheuristics, evident in the identification of 33 papers employing them as their primary search strategy.

A variety of metaheuristic approaches were identified in our survey, with evolutionary approaches being the most used among them (in 25 papers). Analysis of the evolutionary approaches shows that despite sharing the evolution framework, such approaches differ significantly in their architectures (e.g., linear GP and grammatical evolution) and strategies (e.g., linear, push, and tree-based GP). Additionally, combining GP with other techniques emerged as a prevalent strategy among various evolutionary approaches in six papers. Correia et al. [95,96] proposed tackling the program synthesis problem as a model-finding problem using a synthesiser called Alloy*. They further expanded their synthesiser to tackle complex synthesis problems by integrating it with a genetic programming module. Arcuri et al. [16] proposed co-evolutionary program synthesis. This system evolves the program using a genetic algorithm and co-evolves it with a population of unit tests. The fitness of each program is calculated using the unit tests, and the unit tests are in turn evolved against the programs. Virgolin et al. [97] combined a model-based evolutionary algorithm called Gene-pool Optimal Mixing Evolutionary Algorithm with a tree-based genetic programming algorithm. Poliansky et al. [23] proposed genetic programming in conjunction with context-oriented behavioural programming.

The tree-based GP approach in the evolutionary approach category was utilised in six papers. Igwe and Pillay [89] investigated using a tree-based GP approach for program synthesis. Fernandes et al. [91] proposed a tree-based GP approach called Higher-order Typed GP with grammar that supports higher-order functions, parametric polymorphism and parametric types. Hosseini Amini et al. [3] proposed a tree-based GP called Rule-Centered GP, which uses evolutionary rules to help the evolution process. Xu et al. [90] used tree-based GP with Lexicase selection for tackling job shop scheduling problems. Islam et al. [92] proposed a new mutation operator for tree-based GP that uses Monte Carlo simulation to expand and evaluate programs repeatedly. Moreover, Tao et al. [21] proposed MOG3P, a multi-objective tree-based GP system, by expanding objectives to code similarity to better guide the search process.
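Lexicase selection, mentioned above, chooses a parent by filtering the population on one test case at a time in random order, keeping only the individuals that are best on each case. The sketch below shows the standard form of the operator on a plain error matrix; it is a generic illustration, not the implementation used in any of the cited papers, and the function name is ours.

```python
import random
from typing import List, Sequence


def lexicase_select(errors: Sequence[Sequence[float]], rng: random.Random) -> int:
    """Return the index of one parent chosen by lexicase selection.

    errors[i][j] is the error of individual i on test case j (lower is better).
    """
    candidates: List[int] = list(range(len(errors)))
    cases: List[int] = list(range(len(errors[0])))
    rng.shuffle(cases)  # each selection event uses a fresh random case order
    for case in cases:
        best = min(errors[i][case] for i in candidates)
        # Keep only the individuals that are best on this case.
        candidates = [i for i in candidates if errors[i][case] == best]
        if len(candidates) == 1:
            break
    # Break any remaining tie uniformly at random.
    return rng.choice(candidates)
```

Because the case order changes at every selection event, individuals that excel on a few cases can be selected even when their aggregate error is poor, which tends to preserve specialists in the population.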

Four papers in our survey were identified as using grammatical evolution (GE). Kim et al. [4] proposed a new approach for GE that uses probabilistic modelling. They used probabilistic grammar and a new mapping process to create new individuals from the distribution of grammars. Schweim et al. [84] proposed multiple GE variants to use human-generated programs to guide the search. Chennupati et al. [85] presented a multi-core GE algorithm to generate parallel sorting programs automatically, whereas Lopes and Costa [6] combined GE with the Artificial Regulatory Network model.
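For readers unfamiliar with GE, the sketch below illustrates the conventional genotype-to-phenotype mapping: a list of integer codons selects grammar productions for the leftmost non-terminal using the modulo rule. The toy grammar, the wrapping limit, and the choice to consume a codon only when a non-terminal has more than one production are assumptions of this sketch; the surveyed variants (e.g., the probabilistic mapping of Kim et al. [4]) differ from it.

```python
from typing import Dict, List, Optional

# Hypothetical toy grammar: non-terminals map to lists of productions.
GRAMMAR: Dict[str, List[List[str]]] = {
    "<expr>": [["<expr>", "<op>", "<expr>"], ["<var>"], ["<const>"]],
    "<op>": [["+"], ["*"]],
    "<var>": [["x"]],
    "<const>": [["1"], ["2"]],
}


def map_genotype(codons: List[int], start: str = "<expr>", max_wraps: int = 2) -> Optional[str]:
    """Map integer codons to a program string, or None if mapping does not finish."""
    symbols = [start]
    output: List[str] = []
    used = 0
    limit = len(codons) * (max_wraps + 1)  # codon reads allowed, including wraps
    while symbols:
        symbol = symbols.pop(0)  # always expand the leftmost symbol
        if symbol not in GRAMMAR:
            output.append(symbol)  # terminal symbol
            continue
        choices = GRAMMAR[symbol]
        if len(choices) == 1:
            production = choices[0]  # no codon needed for a single production
        else:
            if used >= limit:
                return None  # invalid individual: ran out of codons
            production = choices[codons[used % len(codons)] % len(choices)]
            used += 1
        symbols = production + symbols
    return " ".join(output)
```

For example, under these assumptions the genotype `[0, 1, 1, 2, 3]` maps to `x * 2`, while the genotype `[3]` keeps selecting the recursive production and is rejected once the wrapping limit is reached.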

Three papers reported using Code Building Genetic Programming (CBGP). Pantridge and Spector [88] proposed CBGP, a program synthesis system that supports generic functions and polymorphic types. It generates graphs that can be translated into programs in human-readable source code. A deeper exploration of the capability of CBGP is presented in [87]. They further formalised the method using the type theory algorithm and compared the approach with other GP methods in [86].

For other evolutionary search methods, Krawiec et al. [93] proposed counterexample-driven genetic programming (CDGP). They produce counterexamples for failed tests during evolution and use them to drive the search process. Helmuth et al. [14] proposed a human-driven genetic programming system to make the system easier for non-experts to use. They also used counterexamples to reduce the amount of training data required and, with it, the training cost. Moreover, Ahmad and Helmuth [40,82,83] used PushGP, a GP system that uses a stack-based language (i.e., Push), to test program synthesis performance on two proposed program benchmark suites and two novel initialisation techniques (i.e., Lexicase Seeding and Pareto Seeding). Finally, Serruto and Alfaro [94] used many-objective linear GP to generate assembly language programs. They decomposed the program into segments and evolved them simultaneously, allowing these segments to collaborate during the process.

Now, we briefly describe other non-evolutionary metaheuristic methods in selected publications. Four papers were found using Local Search to generate code. Nguyen et al. [77] used Iterated Local Search to tackle dynamic job shop scheduling problems. Their key idea is to perform multiple Local Searches, each starting from a modification of the best existing program. Rosin [78] proposed a Delayed Acceptance Hill Climbing method, which updates the current best candidates after a period of gathering additional candidate programs using Local Search. Bornholt et al. [79] presented a program synthesis framework combining global and Local Search. The global search coordinates the activities of the Local Search, while the Local Search explores different candidate solutions. Feser et al. [80] performed bottom-up enumeration synthesis and then used Local Search to fix these programs.
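The general shape of Iterated Local Search can be sketched as follows: a hill climber improves a program until no single-instruction change helps, and the outer loop restarts the climber from a perturbed copy of the best program found so far. The instruction set, the examples, and the error measure are hypothetical and serve only to make the sketch runnable; this is not the algorithm of any cited paper.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical straight-line instruction set acting on a single integer.
OPS: Dict[str, Callable[[int], int]] = {
    "inc": lambda v: v + 1,
    "dbl": lambda v: v * 2,
    "neg": lambda v: -v,
    "nop": lambda v: v,
}
# Hypothetical examples, consistent with 2 * (x + 1).
EXAMPLES: List[Tuple[int, int]] = [(0, 2), (1, 4), (3, 8)]


def run(program: List[str], x: int) -> int:
    for name in program:
        x = OPS[name](x)
    return x


def error(program: List[str]) -> int:
    """Total absolute error over the examples (0 means all examples pass)."""
    return sum(abs(run(program, x) - y) for x, y in EXAMPLES)


def hill_climb(program: List[str]) -> List[str]:
    """Accept improving single-instruction changes until none remains."""
    current = list(program)
    improved = True
    while improved:
        improved = False
        for i in range(len(current)):
            for name in OPS:
                neighbour = current[:i] + [name] + current[i + 1:]
                if error(neighbour) < error(current):
                    current, improved = neighbour, True
    return current


def iterated_local_search(length: int, restarts: int, rng: random.Random) -> List[str]:
    names = list(OPS)
    best = hill_climb([rng.choice(names) for _ in range(length)])
    for _ in range(restarts):
        perturbed = list(best)
        perturbed[rng.randrange(length)] = rng.choice(names)  # perturb the best program
        candidate = hill_climb(perturbed)
        if error(candidate) <= error(best):
            best = candidate
    return best
```

The result is a local optimum with respect to single-instruction changes; whether it also satisfies every example depends on the starting point and the perturbations drawn.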

Swarm intelligence (SI) is utilised in two papers: Hara et al. [1] proposed parallel Ant Programming (AP) using genetic operators of GP to tackle the premature convergence problem of existing AP systems. Nekoei et al. [2] proposed artificial bee colony expression programming (ABCEP), which combines artificial bee colony programming and expression programming to tackle weak convergence and high locality.

Mahanipour and Nezamabadi-pour [5] applied the gravitational search algorithm in program synthesis. Golafshani [81] used biogeography-based optimisation (BBO) for program synthesis problems. BBO is a new evolutionary algorithm that is inspired by biogeography science.

In our survey, heuristic search approaches were utilised in 14 studies, with diverse heuristic search methods being employed in selected publications. Three papers were found to use the A* algorithm: Jin et al. [65,66] applied the A* algorithm to synthesise a data transformation program. Lee et al. [8] used a syntax-guided synthesis (SyGuS) framework by applying a probabilistic model on grammar. They used the A* algorithm to find the resulting program efficiently. Two papers reported using the divide-and-conquer search method: Cropper [70] combined divide-and-conquer search with constraint-driven search to learn optimal, recursive, and large programs. Chen et al. [71] proposed a tool called Facon to generate programs in domain-specific languages based on input–output examples. They applied the divide-and-conquer principle to tackle the stability issue.
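To illustrate how A* can drive synthesis, the sketch below searches for the shortest sequence of operations that turns an input value into a target value. Each search state is the value reached so far, the cost is the number of operations used, and the heuristic is a lower bound on the operations still needed. The two-operation language and the heuristic are assumptions of this sketch; the cited systems use far richer languages and cost models.

```python
import heapq
from typing import Callable, Dict, List, Optional

# Hypothetical operation set: increment and double.
OPS: Dict[str, Callable[[int], int]] = {
    "inc": lambda v: v + 1,
    "dbl": lambda v: v * 2,
}


def lower_bound(value: int, target: int) -> int:
    """Admissible heuristic: for value >= 1 no operation more than doubles it,
    so at least this many steps are needed to reach the target."""
    steps = 0
    while value < target:
        value *= 2
        steps += 1
    return steps


def a_star(start: int, target: int) -> Optional[List[str]]:
    """Return a shortest operation sequence from start to target, or None."""
    if start < 1:
        raise ValueError("this sketch assumes a start value of at least 1")
    frontier = [(lower_bound(start, target), 0, start, [])]
    best_cost = {start: 0}
    while frontier:
        _, cost, value, program = heapq.heappop(frontier)
        if value == target:
            return program
        if cost > best_cost.get(value, cost):
            continue  # stale queue entry
        for name, op in OPS.items():
            nxt = op(value)
            if nxt > target:
                continue  # values only grow, so the target is unreachable from here
            new_cost = cost + 1
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                priority = new_cost + lower_bound(nxt, target)
                heapq.heappush(frontier, (priority, new_cost, nxt, program + [name]))
    return None
```

Because the heuristic never overestimates the remaining cost, the first program popped at the target is a shortest one in this toy setting.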

For other heuristic approaches, one paper was found for each type. Liu et al. [67] proposed a probability-based approach to synthesise Java programs. They reduced the search space with the knowledge from open-source code repositories. Wong et al. [68] introduced Language for Abstraction and Program Search, a technique that uses natural language information to guide the neurally guided search models. Yoon et al. [69] proposed a bidirectional inductive synthesis method that uses iterative forward–backwards abstract interpretation. Cui and Zhu [72] used a gradient-descent-based method to learn the probability distribution over the space of all possible programs. Hua et al. [73] proposed execution-driven sketching, a backtracking search approach to synthesise Java programs. Miltner et al. [12] used lenses with priority queues to transform different data representations bidirectionally. Yuan et al. [74] proposed a trace-guided approach using version space algebra to tackle ambiguity and generalisation challenges for synthesising recursive programs. Herrmann et al. [75] used expert rule tree search to generate computer vision programs. Osera and Zdancewic [76] used the proof-theoretic search technique to synthesise recursive functions that process algebraic data types.

Surprisingly, we found 16 papers using an uninformed search algorithm to perform program synthesis. Seven papers were identified using the enumerative search method: Polozov and Gulwani [56] proposed a data-driven domain-specific deduction method for inductive synthesis. It efficiently combined deductive inference with enumerative search to learn solution programs. Zhang et al. [57] proposed interpretable program synthesis, which uses enumerative search and enables users to interact with and guide the search process. Valizadeh and Berger [13] proposed a data-parallel algorithm using an enumerative search for regular expression inference. Guria et al. [10] proposed a lightweight and language-agnostic program synthesiser that uses enumerative search. Feng et al. [58] proposed a component-based synthesis algorithm that combines type-directed enumerative search and SMT-based deduction. Feser et al. [59] presented a program synthesis approach utilising inductive generalisation, deduction and enumerative search. Polikarpova et al. [60] proposed an SBPS approach for synthesising recursive functions based on a polymorphic refinement type specification. Three papers utilised best-first search: Ameen and Lelis [9] proposed best-first bottom-up search (BEE) to reduce the information loss problem of cost-guided bottom-up search. Chen et al. [61] used best-first search to synthesise network specifications in a declarative logic programming language. Cropper and Dumancic [15] used an example-dependent loss function to guide the best-first search to learn large programs. Two papers were identified that used a top-down search: Ye et al. [62] used a top-down search in a neural model to find a solution program that satisfies the user’s natural language intent and input–output examples. Bowers et al. [7] proposed a corpus-guided synthesis algorithm that performs a top-down search to generate library functions from a corpus of domain-specific languages. For the other uninformed search algorithms, only one paper was identified for each. Barke et al. [63] proposed using partial solutions from the synthesis process to build the model instead of training the model before the code generation process. They used a bottom-up search guided by a probabilistic model. Heule et al. [55] utilised random search to automatically generate models for opaque code, i.e., executable code whose source is unavailable. Guerra et al. [17] presented Hoogle*, a type-directed component-based program synthesiser that can handle constants and λ-abstractions better than the previous version. Ren et al. [64] used a breadth-first search to generate nuclear power software source code automatically.
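
As an illustration of enumerative search, the sketch below enumerates expressions bottom-up by increasing size until one is consistent with the input–output examples. It is a minimal sketch under stated assumptions: the toy DSL (`BINARY_OPS`, the terminals `x` and `1`) and the function name `bottom_up_synthesise` are hypothetical, and the pruning of expressions that behave identically on the examples is a generic device rather than the technique of any specific surveyed paper.

```python
from itertools import product

# Hypothetical toy DSL: integer expressions over a single input x.
BINARY_OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}


def bottom_up_synthesise(examples, max_size=7):
    """Enumerate expressions by increasing size until one fits all examples."""
    inputs = [x for x, _ in examples]
    target = tuple(y for _, y in examples)
    seen = set()   # output signatures already produced (observational equivalence)
    bank = {}      # size -> list of (expression text, outputs on the example inputs)

    def admit(size, expr, outputs):
        """Store a new expression unless an equivalent one exists; report success."""
        if outputs in seen:
            return False
        seen.add(outputs)
        bank.setdefault(size, []).append((expr, outputs))
        return outputs == target

    # Size 1: terminals.
    for expr, outputs in (("x", tuple(inputs)), ("1", tuple(1 for _ in inputs))):
        if admit(1, expr, outputs):
            return expr

    # Larger sizes: combine two smaller expressions with a binary operator.
    for size in range(2, max_size + 1):
        for left_size in range(1, size - 1):
            right_size = size - 1 - left_size
            pairs = product(bank.get(left_size, []), bank.get(right_size, []))
            for (l_expr, l_out), (r_expr, r_out) in pairs:
                for symbol, fn in BINARY_OPS.items():
                    outputs = tuple(fn(a, b) for a, b in zip(l_out, r_out))
                    expr = f"({l_expr} {symbol} {r_expr})"
                    if admit(size, expr, outputs):
                        return expr
    return None
```

For the examples `[(1, 3), (2, 5), (3, 7)]` the function returns an expression equivalent to 2x + 1. Without the equivalence pruning, the number of candidates grows much faster with expression size.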

We found five papers using problem-specific search methods to generate the program. Fix et al. [99] used a combinatorial evolution approach to open-ended programming. They stored code blocks in a database and iteratively combined them to generate programs. Saha et al. [22] proposed AutoML to generate an ML pipeline based on a corpus of human-written pipelines. Similarly, they built a database from the corpus and iteratively constructed the pipeline using the knowledge learned from the database. Shimonaka et al. [100] proposed a reuse-based code generation technique that utilises the signature of the Java method and test cases. First, it constructs the database using the Java source files. Then, it uses a method extractor, searcher and processor to generate a Java program iteratively. Liu et al. [101] proposed a framework called ITAS, which can synthesise programs iteratively using API knowledge from the internet. Liu et al. [102] proposed API recommendations via a general search engine.

Figure 2 shows the number of publications identified for each SBPS category by year. In the following analysis, we separated the evolutionary approaches from the other metaheuristic algorithms for a more detailed examination. The total number of research publications in this domain exhibits a consistent upward trajectory, indicating a growing interest in SBPS within the research community.

Analysing the trends for each category reveals the following insights: (i) GP exhibits a solid presence each year, showcasing its dominant position in this field. The number of publications leveraging GP shows an increasing trend by year. (ii) The uninformed search method was popular in 2015, and the number of publications reported using it rapidly increased in 2023. (iii) The diversity of algorithms has grown since 2020.

4.2. Guiding Principles of SBPS Algorithms (RQ2)

In this subsection, we analyse the search details of each SBPS algorithm in terms of required user input (intent) and representation of each solution.

4.2.1. Analysis of SBPS Algorithm Input Type (RQ2.1)

In this subsection, we analyse the input type of the selected SBPS algorithms. Programming by example (PBE) is a programming paradigm that synthesises computer programs from user-provided input–output examples. We introduce a new type of user interaction in program synthesis, Programming by Instruction (PBI), which performs program synthesis based on the user’s textual instruction. This textual instruction can be a natural language task description, a programming rule, an incomplete code snippet or another form of user specification in a textual format.

Figure 3 illustrates the distribution of studies in our survey based on user input type. Notably, the PBE approach emerges as the predominant choice for synthesis algorithms, demonstrating consistent prevalence across all years. This preference can be attributed to its ability to guide the search towards correct solutions precisely. Specifically, a program is considered correct if it successfully passes all training and test input–output cases. The continued presence of PBI since 2020, independently or in conjunction with PBE, underscores the increasing popularity of textual intent, particularly in the context of the prevailing Large Language Model paradigm.
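
The correctness criterion used in PBE can be sketched as follows. This is a minimal illustration, assuming candidate programs are Python callables and examples are `(inputs, expected_output)` pairs; the names `passes_all` and `is_correct` are hypothetical.

```python
def passes_all(program, examples):
    """Return True when the program reproduces every expected output."""
    for inputs, expected in examples:
        try:
            if program(*inputs) != expected:
                return False
        except Exception:
            # A candidate that crashes on an example does not satisfy it.
            return False
    return True


def is_correct(program, train_examples, test_examples):
    """A program counts as correct only if it passes training AND test cases."""
    return passes_all(program, train_examples) and passes_all(program, test_examples)


# Intended behaviour: add two integers.
train = [((2, 2), 4), ((0, 0), 0)]
test = [((1, 3), 4), ((5, 2), 7)]

add = lambda a, b: a + b
mul = lambda a, b: a * b  # fits the training cases by coincidence
```

Here `mul` passes both training cases but fails the held-out test cases, which is why the test set matters for detecting programs that merely overfit the training examples.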

4.2.2. Representation of SBPS Search Space (RQ2.2)

We now analyse the representation type of the search space in the SBPS algorithm. Among the studies selected, diverse representation methods, such as tree, linear code sequences, and more, have been identified. We start by introducing these representations and aim to gain insights into the connection between the representation type and the search algorithm type, as well as the trend of each representation type, along with the published year.

The tree representation (e.g., Abstract Syntax Tree) is one of the most popular approaches for forming the search space in SBPS, where solutions are structured hierarchically in a tree format. Nodes within the tree correspond to distinct code parts, and the hierarchy information and connections of tree nodes capture the program’s logical flow. In search-based program synthesis, trees enable systematic exploration by organising code into nested expressions, making it easier to apply grammar-guided mutations or constraint-based pruning. The main strength of the tree representation lies in preserving syntactic and semantic relationships, which helps maintain correctness during synthesis. However, deep or unbalanced trees can lead to a combinatorial explosion in the search space, making synthesis inefficient for complex programs. Techniques like genetic programming often use trees, but they require careful handling to avoid invalid or bloated solutions.
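
A minimal sketch of a tree-structured search space is given below, assuming a hypothetical function set (`add`, `mul`, `neg`) and terminal set (`x`, `1`). It shows how a grammar-respecting subtree mutation keeps candidates syntactically valid, and how a depth budget is one simple way to limit bloat.

```python
import random
from dataclasses import dataclass
from typing import Tuple

ARITY = {"add": 2, "mul": 2, "neg": 1}   # hypothetical function set
TERMINALS = ("x", "1")                    # hypothetical terminal set


@dataclass(frozen=True)
class Node:
    label: str
    children: Tuple["Node", ...] = ()


def random_tree(rng, max_depth):
    """Grow a random, grammar-respecting tree no deeper than max_depth."""
    if max_depth <= 0 or rng.random() < 0.3:
        return Node(rng.choice(TERMINALS))
    label = rng.choice(sorted(ARITY))
    kids = tuple(random_tree(rng, max_depth - 1) for _ in range(ARITY[label]))
    return Node(label, kids)


def depth(node):
    """Number of edges on the longest root-to-leaf path."""
    return 1 + max(depth(c) for c in node.children) if node.children else 0


def is_valid(node):
    """Every function node has the right arity and every leaf is a terminal."""
    if not node.children:
        return node.label in TERMINALS
    return (ARITY.get(node.label) == len(node.children)
            and all(is_valid(c) for c in node.children))


def evaluate(node, x):
    """Interpret the tree as an integer expression over the input x."""
    if node.label == "x":
        return x
    if node.label == "1":
        return 1
    args = [evaluate(c, x) for c in node.children]
    if node.label == "add":
        return args[0] + args[1]
    if node.label == "mul":
        return args[0] * args[1]
    return -args[0]  # "neg"


def subtree_mutation(node, rng, max_depth):
    """Replace one randomly chosen subtree with a freshly grown one."""
    if not node.children or rng.random() < 0.3:
        return random_tree(rng, max_depth)
    i = rng.randrange(len(node.children))
    kids = list(node.children)
    kids[i] = subtree_mutation(kids[i], rng, max_depth - 1)
    return Node(node.label, tuple(kids))
```

Because the replacement subtree is grown with the remaining depth budget, a parent of depth at most `max_depth` always yields a child of depth at most `max_depth`.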

Another way to represent the program in the search space is using a linear code sequence, where programs are directly presented in the search space using a sequence of code blocks or code segments. In linear code sequence representation, each element in the sequence corresponds to a specific part of the code, and the sequential order dictates the program’s execution flow. This representation type is straightforward, improving the solution program’s interpretability during the search process. This format simplifies execution, making it efficient for superoptimisation and just-in-time synthesis. Since linear code lacks explicit structure, it reduces the search space compared to trees or graphs, but this also means high-level semantics must be reconstructed.
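
The following minimal sketch illustrates a linear code sequence, assuming a hypothetical three-register machine in which each instruction is a tuple `(operation, destination, source_a, source_b)`. The sequential order of the list is the execution order.

```python
# Hypothetical instruction set for a tiny register machine.
OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}


def run(program, x, n_registers=3):
    """Execute instructions in order; register 0 holds the input and the result."""
    regs = [x] + [0] * (n_registers - 1)
    for op, dst, src_a, src_b in program:
        regs[dst] = OPS[op](regs[src_a], regs[src_b])
    return regs[0]


# Computes x * x + x.
program = [
    ("mul", 1, 0, 0),  # r1 = r0 * r0
    ("add", 0, 1, 0),  # r0 = r1 + r0
]
```

For `x = 3` the program leaves 12 in register 0. A search operator on this representation can replace, insert or delete a single instruction without parsing any nested structure.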

Rule-based representation stands out as a declarative approach, encoding solutions as sets of rules or logical expressions. This method captures complex conditions and actions and offers a more abstract, high-level representation, which makes it well suited to domain-specific synthesis (e.g., SQL queries, parser generation). Rules help prune invalid candidates early, improving synthesis efficiency. However, they are less flexible for general-purpose programming, as not all computations can be easily expressed as rules. Tools like FlashMeta demonstrate how rule-based synthesis can excel in constrained domains but may struggle with programs requiring complex, unstructured logic.

Graph representation is designed for problems where dependencies between program elements are crucial. Solutions are modelled as nodes and edges in a graph, effectively representing relationships and dependencies between different parts of the program. This representation helps enforce correct program behaviour by ensuring that generated code follows logical execution paths. Graphs are particularly useful for path-sensitive synthesis, where the solver must respect data and control dependencies. However, dynamically constructing and traversing graphs during synthesis can be computationally expensive, limiting scalability.

In specific scenarios within evolutionary algorithms, such as grammatical evolution (GE) [103], solutions are encoded as strings of symbols or characters. This type of representation is called string representation. The sequence and arrangement of symbols within the string directly mirror the program’s structure. The term “genome” is often used to refer to the encoded string, encapsulating the genetic information that determines the solution. The corresponding program or solution derived from this genome is called the “phenotype”; it represents the expressed functionality of the genetic information encoded in the string. This encoding mechanism provides a concise and human-readable way to capture the essential elements of a program’s structure within a string format. Strings treat code as plain text, making them efficient and flexible, but also the least structured representation.
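
The genome-to-phenotype mapping can be sketched as follows. This is a simplified illustration in the style of grammatical evolution, assuming the genome is a sequence of integer codons and a hypothetical grammar `GRAMMAR`; each codon selects a production for the leftmost nonterminal via a modulo operation, and the genome may be re-read (wrapped) a limited number of times.

```python
# Hypothetical grammar: nonterminal -> list of alternative productions.
GRAMMAR = {
    "<expr>": [["<expr>", "<op>", "<expr>"], ["<var>"]],
    "<op>": [["+"], ["*"]],
    "<var>": [["x"], ["1"]],
}


def map_genome(genome, start="<expr>", max_wraps=2):
    """Map integer codons (genome) to program text (phenotype), or None on failure."""
    symbols = [start]   # pending symbols, leftmost first
    output = []         # terminals emitted so far
    used = 0            # number of codons consumed
    limit = len(genome) * (max_wraps + 1)
    while symbols:
        symbol = symbols.pop(0)
        if symbol not in GRAMMAR:
            output.append(symbol)
            continue
        if used >= limit:
            return None  # ran out of codons: the individual is invalid
        rules = GRAMMAR[symbol]
        codon = genome[used % len(genome)]
        used += 1
        # The codon value modulo the number of alternatives picks the production.
        symbols = list(rules[codon % len(rules)]) + symbols
    return " ".join(output)
```

For example, the genome `[0, 1, 0, 1, 1, 1]` maps to the phenotype `x * 1`, whereas the genome `[0]` keeps expanding `<expr>` and is rejected once the wrapping limit is reached. For simplicity, this sketch consumes one codon for every nonterminal, including those with a single production.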

Lastly, stack representation organises solutions for each data type into a stack in a last-in, first-out (LIFO) manner. This type of representation was introduced with the Push language [104,105], which was specifically designed for solving program synthesis problems. Stacks’ deterministic execution makes them suitable for GP, where programs are evolved via mutations and crossovers. However, the low-level nature of stack-based code makes it difficult to synthesise high-level logic, and translating results back to readable source code can be challenging.
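
A minimal sketch of a typed-stack interpreter in the spirit of Push is shown below. The instruction names (`int.add`, `int.mul`, `int.lt`, `int.dup`) are hypothetical and do not reproduce the actual Push instruction set. The sketch assumes that an instruction lacking arguments on its stack is skipped as a no-op, so any sequence of tokens is an executable program.

```python
def run_push(program, x):
    """Run a flat program over one stack per data type; x is the initial integer."""
    stacks = {"int": [x], "bool": []}
    ints, bools = stacks["int"], stacks["bool"]
    for token in program:
        if isinstance(token, bool):          # check bool first: bool is a subclass of int
            bools.append(token)
        elif isinstance(token, int):
            ints.append(token)
        elif token in ("int.add", "int.mul", "int.lt") and len(ints) >= 2:
            b, a = ints.pop(), ints.pop()    # LIFO: the top of the stack is the second operand
            if token == "int.add":
                ints.append(a + b)
            elif token == "int.mul":
                ints.append(a * b)
            else:
                bools.append(a < b)
        elif token == "int.dup" and ints:
            ints.append(ints[-1])
        # Anything else (unknown token or missing arguments) is a no-op.
    return stacks
```

With `x = 3`, the program `["int.dup", "int.mul", 1, "int.add"]` leaves 10 on the integer stack (3 * 3 + 1), while `["int.add"]` alone leaves the stack unchanged because only one integer is available.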

Each representation type comes with its own set of strengths and is often chosen based on the specific requirements and characteristics of the program synthesis task. Table 3 provides an overview of the representation types identified in the surveyed papers, along with the corresponding papers.

As shown in Table 3, tree representation dominates with 26 publications, twice as many as the next most common representation type. One reason for this dominance is that evolutionary search algorithms often use trees to represent the search space, and evolutionary algorithms are the most used algorithms among the selected papers. This further underscores the effectiveness of combining tree representation with evolutionary algorithms for tackling program synthesis tasks. Rule-based representation follows with 13 publications, then linear code sequence with 12 and string representation with 9. While less prevalent, other representation types include graph (six papers) and stack (three papers). These alternative representations contribute to the diversity of approaches explored in the surveyed literature.

Table 4 summarises the algorithm types corresponding to each representation type. We observed a general preference for representation type for each algorithm type:

The most common approach for representing search space for the metaheuristic approaches is a tree representation, which was reported in 14 out of 33 publications.
The same holds for non-evolutionary metaheuristic approaches, where tree representation is also the most used representation.
The representation types for heuristic algorithms are evenly distributed between trees, rule-based, linear code sequences and graphs. However, no selected paper reported a heuristic algorithm using string or stack representation.
SBPS approaches using an uninformed search algorithm are more likely to use a tree- or rule-based representation. No paper has been reported using an uninformed search with graph, string, or stack representation.
For the “Other” SBPS algorithms, linear code sequence is the only representation used to synthesise programs (all five studies in this category reported using linear code sequence as a representation method).
Rule-based representation is mainly utilised in the uninformed search algorithms, while string and stack representations are utilised in the evolutionary metaheuristic approach.

Figure 4 visually represents the distribution of studies across different representation types by year.

A clear upward trend is observed over the decade, with a significant spike in 2023, marking the highest total number of studies in any given year. Tree-based representations consistently appear throughout the years and dominate particularly in 2015 and 2023, indicating sustained and growing interest. Linear code sequence representation experienced a rise from 2020. However, this representation was not utilised in any selected research in 2023. The rule-based representation showed a steady presence from 2015 to 2023, with a spike in 2018. The graph representation was first used in 2017 and has shown frequent presence in recent years. The string representation was distributed randomly along the selected period and did not show a clear pattern.

Overall, the trends highlight the dynamic nature of representation choices in SBPS. Tree, linear code sequence, and rule-based representations emerge as popular choices, each with specific strengths and applications. Other representations, such as stack, show sporadic usage, indicating their suitability for particular scenarios or research contexts. This distribution suggests an evolution in the field’s methodological focus, with a recent surge in complex, hierarchical representation models.

4.3. Type of Task Targeted by Each SBPS Algorithm (RQ3)

Going further, we analysed the targeted problem type for each publication and the dataset used for the experiment. By examining the addressed problem type and the dataset, we aim to understand the research context and methodology adopted in SBPS.

Table 5 shows the problem types and datasets employed in the selected publications. Moreover, Figure 5 provides a visual overview of the number of dataset usages (excluding datasets with a single unique usage) for each problem type.

We observed a diverse spectrum of program synthesis approaches spanning various application domains. These domains range from simple problems such as algebraic calculations, symbolic regression, etc., to more challenging applications such as robot path-finding and high-level coding tasks in diverse languages like Python and Java. Notably, some studies extend their investigations to real-world complex problems, targeting ML pipeline generation and synthesising feature constructions to preprocess the ML data.

We categorised the target problem of selected publications into six types:

Symbolic Regression: This type of problem aims to discover mathematical expressions or symbolic representations that model the relationship within a given dataset.
String Manipulation: This involves the task of generating or transforming strings based on specific rules or requirements.
Circuit Transformation: This problem type targets automatically modifying or optimising electronic circuits based on a digital specification of certain circuits.
Array/Vector Transformation: Similar to other transformation tasks, this problem type aims to manipulate or transform elements within arrays or vectors.
General Coding: This category of problem type generally aims to solve general coding tasks using high-level programming languages like Python, Java, etc.
Others: In this problem type, we have collected relatively challenging real-world problems which cannot be included in previous categories.

The largest group of selected studies targets general coding tasks (29), followed by other relatively challenging problems (21) and array/vector transformation (18). The remaining manipulation/transformation-type problems are fairly evenly distributed: 9 studies target symbolic regression, 10 string manipulation, and 6 circuit transformation.

Furthermore, we also observed patterns in the selection of datasets across different problem domains. Notably, datasets from the SyGuS competition are prominently featured in studies addressing challenges in string manipulation, circuit transformation and array/vector transformation problems. Interestingly, there is a notable absence of standard, commonly used benchmarks for symbolic regression in these studies, suggesting a preference for creating custom benchmarks tailored to the intricacies of this problem type. For high-level coding tasks, the General Program Synthesis Benchmark Suite (PSB1) [40] emerges as a promising benchmark, employed in seven studies to evaluate the effectiveness of algorithms. Helmuth and Kelly published their updated benchmark, General Program Synthesis Benchmark Suite (PSB2) [83], in 2021. However, three studies [21,86,91] published after the release of the new version did not use it, choosing instead to compare their algorithms with other approaches on PSB1. Additionally, the benchmark first used in SyPet [106] has gained popularity as a dataset for Java programming tasks. Another interesting observation is the use of a user study to evaluate a code generation system in [57], showcasing a unique and user-centred approach in the selected research landscape.

We now describe some of the interesting problems collected in the “other” category, since the problems in this category are relatively challenging compared to those in other categories. Cropper and Dumancic [15] used an ILP system with best-first search to address the problem of learning to draw ASCII art. Saha et al. [22] proposed a technique that can generate an ML pipeline for a predictive task on a new dataset, while Mahanipour et al. [5] focused on feature construction, which is an important preprocessing step in ML tasks. Poliansky et al. [23] demonstrated their approach in a game of Tic-Tac-Toe. Chen et al. [61] aimed to synthesise network specifications.

Going further, we study the relationship between the classification of problem type and the employed search algorithm type. Table 6 illustrates the type of problem addressed by each of the selected publications based on their corresponding search algorithms. Furthermore, Figure 6 provides a visual distribution of these works.

At least one publication was found for the evolutionary approach applied in each category of problems, with a primary focus on general coding tasks (16 out of 34 studies). This highlights the broad applicability of the evolutionary approach across various problem domains. Similarly, SBPS algorithms using “other” search algorithms show a strong presence in general coding problems. We noticed that these problem-specific techniques generally target relatively harder problems, since no studies were found that tackle transformation or manipulation problems. The remaining metaheuristic, heuristic, and uninformed search algorithms exhibit an even distribution across the problem domains.

Finally, we analyse the relationship between the representation and target problem types, aiming to gain insight into which representation is more suitable for various domains. Table 7 shows the representation and problem type of each selected publication. Furthermore, Figure 7 provides a visual representation of the distribution of works based on their representation and addressed problem.

Algorithms using tree and string representations are applied to every identified problem type. However, tree representations focus more on solving relatively challenging target problems, i.e., general coding tasks and other real-world programming tasks, while SBPS approaches with string representation are evenly distributed. Linear code sequences and graph representations are mainly employed in harder problem types, such as array/vector transformation, general coding, and other challenging real-world tasks. Rule-based representations are mainly used to tackle string manipulation and other real-world tasks. Stack representation is utilised in general coding and other challenging programming tasks, showing potential for solving harder problems.

5. Existing Challenges in SBPS

In this section, we summarise and discuss the observations and existing challenges from the results of our previous research questions analysis, thereby attempting to provide insight into the direction of future work.

5.1. Bridging Theory and Practice

We have noticed that many studies report their algorithms being evaluated on relatively straightforward problems, i.e., transformation and manipulation, which limits their applicability to real-world programming scenarios. This reveals a significant hurdle in the early developmental stages of the field of SBPS, where most studies do not evaluate problems beyond the theoretical domain, such as real-world applications or tasks that transcend the predefined boundaries of training and testing scopes. Among the six categories of problem types that we collected in Table 5, more than half (symbolic regression, string manipulation, circuit transformation and array/vector transformation, and algebra calculation in generic coding) are considered theoretical problems, which make up approximately 47% (43 out of 91) of the problems we analysed. Even other benchmarks in the general coding category often contain many basic-level tasks. For example, PSB1 is a dataset with 29 problems selected from elementary-level programming courses, making it considerably easier than real-world problems.

The challenge of bridging the gap between theoretical frameworks and real-world applicability is evident in the field of SBPS. Looking ahead, finding ways to overcome these challenges and making SBPS techniques that are more applicable to real-world problems will be crucial for the future of this field. Thus, it is important to recognise and overcome the hurdles at this early stage to guide the field toward a more robust and impactful future.

5.2. Advancing Algorithms: Tools, Strategies, and Evolution

We observed that numerous studies demonstrate their implementation with a simple prototype, with the primary objective of showcasing the effectiveness of the proposed ideas through the employment of comparatively elementary principles and strategies. Nonetheless, they mentioned that better tools and strategies are needed to move these algorithms towards better performance. The academic community agrees that we urgently need stronger tools, like improved search strategies and fine-tuning of parameters. We realise that while the first versions of these algorithms may show promise in theory, their full potential remains untapped unless we use more advanced methods. To apply the SBPS field to complex problems, researchers and practitioners must recognise and embrace these acknowledged needs as opportunities for future progress. By directly tackling these challenges, the academic community can significantly contribute to improving and refining algorithmic approaches, making sure they can be used effectively in a wide range of situations.

5.3. Absence of a Common Benchmark

Our comprehensive survey reveals significant variability in programming tasks and datasets, which makes it challenging to compare the strength of each algorithm. A lack of standardised benchmarks persists across selected publications, even within the same coding tasks, such as symbolic regression. This inconsistency raises concerns about the reproducibility and generalisability of findings, as researchers may inadvertently tailor their approaches to specific datasets or problem subsets, leading to inflated performance claims. Without a unified benchmark, it becomes difficult to assess whether improvements in synthesis techniques stem from genuine algorithmic advancements or simply from favourable task selection.

Although several studies have employed the SyGuS competition dataset as an evaluation benchmark, the specific SyGuS problems differ each year, and the analysed studies often select problems from distinct versions of the benchmark. This ad hoc selection introduces bias, as different problem sets may vary in complexity, domain, or required search strategies, making cross-study comparisons unreliable. Furthermore, the lack of consistency in evaluation metrics—such as success rate, computational efficiency, or solution quality—further exacerbates the problem, as some studies may emphasise different performance aspects without justification.

One well-constructed benchmark, PSB1, has been utilised in six studies. However, considering the 28 studies identified in the general coding task category, the utilisation rate remains notably low. The under-adoption of existing benchmarks suggests either a lack of awareness, a preference for custom evaluation setups, or a misalignment between benchmark tasks and real-world synthesis challenges. This fragmentation hinders cumulative progress, as researchers cannot build directly upon prior work without re-evaluating previous methods on new, often incompatible, datasets.

The absence of standardisation also has broader implications for the field. It slows down innovation by making it difficult to identify the most promising research directions, as conflicting results may arise from methodological differences rather than true performance disparities. Additionally, funding agencies and industry stakeholders may struggle to assess the maturity and applicability of synthesis techniques, potentially limiting investment and adoption.

We recommend that future research endeavours prioritise adopting common problem sets and datasets for program synthesis tasks. Such standardisation will facilitate more meaningful comparisons, improve reproducibility, and drive advancements in the field. Establishing consensus on evaluation protocols—including problem selection, performance metrics, and baseline comparisons—should be a key focus for the community to ensure rigorous and transparent progress in search-based program synthesis.

5.4. Computational Challenges in Search-Based Program Synthesis

Our survey findings underscore the computational overhead inherent in SBPS. This computational cost increases rapidly during the search process, primarily because of the growing size of the search space and the expense of fitness evaluation. The expansion of the search space, a consequence of the intricate nature of the programming landscape, contributes significantly to the escalating computational demands. Future work can aim to reduce the search space without degrading the performance of the search algorithms. Moreover, the expense of fitness evaluation is a critical determinant of the higher computational costs observed. Most search-based algorithms guide their search using fitness evaluation, which has to be applied to every candidate program considered during the search. Future work can therefore also aim to improve the efficiency of fitness evaluation.
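
As one simple illustration of reducing fitness-evaluation cost, the sketch below caches fitness values so that a candidate generated more than once is executed only once. It is an assumption-laden sketch and not a technique reported by the surveyed papers: candidates are assumed to be Python expression strings over a variable `x`, fitness is assumed to be the total absolute error on the examples, and `make_cached_fitness` is a hypothetical helper.

```python
def make_cached_fitness(examples):
    """Return a fitness function (lower is better) that memoises its results."""
    cache = {}
    stats = {"evaluations": 0}  # counts how often a candidate is actually executed

    def fitness(expr):
        if expr not in cache:
            stats["evaluations"] += 1
            error = 0.0
            for x, expected in examples:
                try:
                    error += abs(eval(expr, {"__builtins__": {}}, {"x": x}) - expected)
                except Exception:
                    error = float("inf")  # crashing candidates get the worst fitness
                    break
            cache[expr] = error
        return cache[expr]

    return fitness, stats


fitness, stats = make_cached_fitness([(1, 2), (2, 4), (3, 6)])
scores = [fitness(e) for e in ["x + x", "x * x", "x + x", "x / 0", "x + x"]]
```

In this run, five fitness requests trigger only three executions. The cache is keyed on program text, so semantically equivalent but textually different programs are still evaluated separately.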

6. Conclusions

In pursuing a comprehensive understanding of the contemporary landscape of program synthesis through search-based algorithms, we surveyed the existing body of literature. This study revealed a considerable corpus of works dedicated to applying search-based algorithms in the realm of program synthesis. Notably, there is a discernible upward trajectory in the annual publication rate, indicative of this research domain’s growing interest and significance.

The selected studies underwent a thorough analysis, enabling the categorisation of these works based on the employed search techniques, user intent type, and representations of the search space. This categorisation was complemented by trend analysis, offering insights into the evolving methodologies and approaches within the field. Furthermore, the review encompassed the collection of targeted problem types and datasets from each identified study, facilitating an in-depth examination of the empirical foundations underpinning these investigations. We also analysed the relationships between the categorised attributes, including target problem types and algorithm types, to provide navigation guidelines for this large field. In particular, we found the following:

SBPS continues to attract an increasing and significant amount of novel work targeting various problems, from simple transformation/manipulation problems to more challenging programming and real-world problems.
Programming by example is the dominant way to guide the search of SBPS approaches, whereas Programming by Instruction has become popular recently.
Approaches utilising evolutionary and uninformed search algorithms that leverage tree search spaces have been the most attractive in recent years, particularly the following:
– Symbolic regression tasks are mostly tackled with metaheuristics, whereas string manipulation tasks are mostly tackled with uninformed algorithms. General coding tasks attracted a wide range of techniques; however, evolutionary methods are the most used.
– Tree and string representations are utilised to tackle all kinds of problems, while linear code sequence and graph representations are utilised more for challenging problems.

These findings are of high and timely importance. As we witness a new research community (i.e., generative AI) taking on the program synthesis challenge, our study will provide an entry point to the acumen and expertise of the search-based community and help build synergy for collaboration between the two communities.

Author Contributions

Conceptualization, methodology, investigation, data curation, writing: T.S. and N.T. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by Science Foundation Ireland grant 13/RC/2094_P2 to Lero—the Science Foundation Ireland Research Centre for Software (www.lero.ie).

Acknowledgments

This publication has emanated from research supported in part by a grant from Taighde Éireann Research Ireland under Grant number 13/RC/2094_P2. For the purpose of open access, the author has applied a CC BY public copyright licence to any Author Accepted Manuscript version arising from this submission.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Hara, A.; Kushida, J.-i.; Tanabe, S.; Takahama, T. Parallel Ant Programming using genetic operators. In Proceedings of the 2013 IEEE 6th International Workshop on Computational Intelligence and Applications (IWCIA), Hiroshima, Japan, 13 July 2013; pp. 75–80.
Nekoei, M.; Moghaddas, S.A.; Mohammadi Golafshani, E.; Gandomi, A.H. Introduction of ABCEP as an automatic programming method. Inf. Sci. 2021, 545, 575–594.
Hosseini Amini, S.M.H.; Abdollahi, M.; Amir Haeri, M. Rule-centred genetic programming (RCGP): An imperialist competitive approach. Appl. Intell. 2020, 50, 2589–2609.
Kim, H.T.; Kang, H.K.; Ahn, C.W. A Conditional Dependency Based Probabilistic Model Building Grammatical Evolution. IEICE Trans. Inf. Syst. 2016, E99.D, 1937–1940.
Mahanipour, A.; Nezamabadi-Pour, H. GSP: An automatic programming technique with gravitational search algorithm. Appl. Intell. 2019, 49, 1502–1516.
Lopes, R.L.; Costa, E. GEARNet: Grammatical Evolution with Artificial Regulatory Networks. In Proceedings of the 15th Annual Conference on Genetic and Evolutionary Computation, GECCO ’13, Kaohsiung, Taiwan, 6–8 October 2023; pp. 973–980. [Google Scholar]
Bowers, M.; Olausson, T.X.; Wong, L.; Grand, G.; Tenenbaum, J.B.; Ellis, K.; Solar-Lezama, A. Top-Down Synthesis for Library Learning. Proc. ACM Program. Lang. 2023, 7, 1182–1213. [Google Scholar] [CrossRef]
Lee, W.; Heo, K.; Alur, R.; Naik, M. Accelerating Search-Based Program Synthesis Using Learned Probabilistic Models. In Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2018, Philadelphia, PA, USA, 18–22 June 2018; pp. 436–449. [Google Scholar]
Ameen, S.; Lelis, L.H. Program synthesis with best-first bottom-up search. J. Artif. Intell. Res. 2023, 77, 1275–1310. [Google Scholar] [CrossRef]
Guria, S.N.; Foster, J.S.; Van Horn, D. Absynthe: Abstract Interpretation-Guided Synthesis. Proc. ACM Program. Lang. 2023, 7, 1584–1607. [Google Scholar] [CrossRef]
Yuan, Y.; Banzhaf, W. Iterative genetic improvement: Scaling stochastic program synthesis. Artif. Intell. 2023, 322, 103962. [Google Scholar] [CrossRef]
Miltner, A.; Fisher, K.; Pierce, B.C.; Walker, D.; Zdancewic, S. Synthesizing Bijective Lenses. Proc. ACM Program. Lang. 2017, 2, 1–30. [Google Scholar] [CrossRef]
Valizadeh, M.; Berger, M. Search-Based Regular Expression Inference on a GPU. Proc. ACM Program. Lang. 2023, 7, 1317–1339. [Google Scholar] [CrossRef]
Helmuth, T.; Frazier, J.G.; Shi, Y.; Abdelrehim, A.F. Human-Driven Genetic Programming for Program Synthesis: A Prototype. In Proceedings of the Companion Conference on Genetic and Evolutionary Computation, GECCO ’23 Companion, Lisbon, Portugal, 15–19 July 2023; pp. 1981–1989. [Google Scholar]
Cropper, A.; Dumančić, S. Learning large logic programs by going beyond entailment. arXiv 2020, arXiv:2004.09855. [Google Scholar]
Arcuri, A.; Yao, X. Co-evolutionary automatic programming for software development. Inf. Sci. 2014, 259, 412–432. [Google Scholar] [CrossRef]
Botelho Guerra, H.; Ferreira, J.F.; Costa Seco, J. Hoogle: Constants and Lambda-abstractions in Petri-net-based Synthesis using Symbolic Execution. In Leibniz International Proceedings in Informatics (LIPIcs), Proceedings of the 37th European Conference on Object-Oriented Programming (ECOOP 2023), Seattle, WA, USA, 17–21 July 2023; Ali, K., Salvaneschi, G., Eds.; Dagstuhl: Wadern, Germany, 2023; Volume 263, pp. 4:1–4:28. [Google Scholar]
Tao, N.; Ventresque, A.; Saber, T. Program synthesis with generative pre-trained transformers and grammar-guided genetic programming grammar. In Proceedings of the 2023 IEEE Latin American Conference on Computational Intelligence (LA-CCI), Recife-Pe, Brazil, 29 October–1 November 2023; pp. 1–6. [Google Scholar]
Tao, N.; Ventresque, A.; Saber, T. Assessing similarity-based grammar-guided genetic programming approaches for program synthesis. In Proceedings of the OLA, Syracuse, Italy, 18–20 July 2022; Springer: Berlin/Heidelberg, Germany, 2022. [Google Scholar]
Tao, N.; Ventresque, A.; Saber, T. Many-objective Grammar-guided Genetic Programming with Code Similarity Measurement for Program Synthesis. In Proceedings of the 2023 IEEE Latin American Conference on Computational Intelligence (LA-CCI), Recife-Pe, Brazil, 29 October–1 November 2023. [Google Scholar]
Tao, N.; Ventresque, A.; Saber, T. Multi-objective Grammar-guided Genetic Programming with Code Similarity Measurement for Program Synthesis. In Proceedings of the 2022 IEEE Congress on Evolutionary Computation (CEC), Padua, Italy, 18–23 July 2022; pp. 1–8. [Google Scholar]
Saha, R.K.; Ura, A.; Mahajan, S.; Zhu, C.; Li, L.; Hu, Y.; Yoshida, H.; Khurshid, S.; Prasad, M.R. SapientML: Synthesizing Machine Learning Pipelines by Learning from Human-Writen Solutions. In Proceedings of the 44th International Conference on Software Engineering, ICSE ’22, Pittsburgh, PA, USA, 25–27 May 2022; pp. 1932–1944. [Google Scholar]
Poliansky, R.; Sipper, M.; Elyasaf, A. From Requirements to Source Code: Evolution of Behavioral Programs. Appl. Sci. 2022, 12, 1587. [Google Scholar] [CrossRef]
Beltramelli, T. pix2code: Generating code from a graphical user interface screenshot. In Proceedings of the ACM SIGCHI Symposium on Engineering Interactive Computing Systems, Paris, France, 19–22 June 2018; pp. 1–6. [Google Scholar]
Li, Y.; Choi, D.; Chung, J.; Kushman, N.; Schrittwieser, J.; Leblond, R.; Eccles, T.; Keeling, J.; Gimeno, F.; Dal Lago, A.; et al. Competition-Level Code Generation with AlphaCode. Science 2022, 378, 1092–1097. [Google Scholar] [CrossRef]
Jesse, K.; Ahmed, T.; Devanbu, P.T.; Morgan, E. Large language models and simple, stupid bugs. In Proceedings of the 2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR), Melbourne, Australia, 15–16 May 2023; pp. 563–575. [Google Scholar]
Asare, O.; Nagappan, M.; Asokan, N. Is github’s copilot as bad as humans at introducing vulnerabilities in code? Empir. Softw. Eng. 2023, 28, 129. [Google Scholar] [CrossRef]
Schuster, R.; Song, C.; Tromer, E.; Shmatikov, V. You autocomplete me: Poisoning vulnerabilities in neural code completion. In Proceedings of the USENIX Security 21, Virtual, 11–13 August 2021; pp. 1559–1575. [Google Scholar]
Stechly, K.; Marquez, M.; Kambhampati, S. GPT-4 Doesn’t Know It’s Wrong: An Analysis of Iterative Prompting for Reasoning Problems. arXiv 2023, arXiv:2310.12397. [Google Scholar]
Krishna, S.; Agarwal, C.; Lakkaraju, H. Understanding the Effects of Iterative Prompting on Truthfulness. arXiv 2024, arXiv:2402.06625. [Google Scholar]
Pinna, G.; Ravalico, D.; Rovito, L.; Manzoni, L.; De Lorenzo, A. Enhancing Large Language Models-Based Code Generation by Leveraging Genetic Improvement. In Proceedings of the European Conference on Genetic Programming (Part of EvoStar), Aberystwyth, UK, 3–5 April 2024; Springer: Berlin/Heidelberg, Germany, 2024; pp. 108–124. [Google Scholar]
Hemberg, E.; Moskal, S.; O’Reilly, U.M. Evolving Code with A Large Language Model. arXiv 2024, arXiv:2401.07102. [Google Scholar] [CrossRef]
Hemberg, E.; Jorgensen, S.; O’Reilly, U.M. Survey of Genetic Programming and Large Language Models. In Genetic Programming Theory and Practice XXI; Springer: Berlin/Heidelberg, Germany, 2025; pp. 67–86. [Google Scholar]
Tao, N.; Ventresque, A.; Nallur, V.; Saber, T. Grammar-obeying program synthesis: A novel approach using large language models and many-objective genetic programming. Comput. Stand. Interfaces 2025, 92, 103938. [Google Scholar] [CrossRef]
Tao, N.; Ventresque, A.; Nallur, V.; Saber, T. Enhancing Program Synthesis with Large Language Models Using Many-Objective Grammar-Guided Genetic Programming. Algorithms 2024, 17, 287. [Google Scholar] [CrossRef]
Mittapalli, J.S.; Arthur, M.P. Survey on template engines in Java. ITM Web Conf. 2021, 37, 01007. [Google Scholar] [CrossRef]
Monperrus, M. Automatic software repair: A bibliography. ACM Comput. Surv. (CSUR) 2018, 51, 1–24. [Google Scholar] [CrossRef]
Sobania, D.; Schweim, D.; Rothlauf, F. A comprehensive survey on program synthesis with evolutionary algorithms. IEEE Trans. Evol. Comput. 2022, 27, 82–97. [Google Scholar] [CrossRef]
Batouta, Z.I.; Dehbi, R.; Talea, M.; Hajoui, O. Automation in code generation: Tertiary and systematic mapping review. In Proceedings of the 2016 4th IEEE International Colloquium on Information Science and Technology (CiSt), Tangier, Morocco, 24–26 October 2016; pp. 200–205. [Google Scholar]
Helmuth, T.; Spector, L. General program synthesis benchmark suite. In Proceedings of the GECCO 2015, Madrid, Spain, 11–15 July 2015. [Google Scholar]
Olmo, J.L.; Romero, J.R.; Ventura, S. Swarm-based metaheuristics in automatic programming: A survey. WIREs Data Min. Knowl. Discov. 2014, 4, 445–469. [Google Scholar] [CrossRef]
Bodík, R.; Jobstmann, B. Algorithmic program synthesis: Introduction. Int. J. Softw. Tools Technol. Transf. 2013, 15, 397–411. [Google Scholar] [CrossRef]
Gulwani, S.; Polozov, O.; Singh, R. Program synthesis. Found. Trends® Program. Lang. 2017, 4, 1–119. [Google Scholar] [CrossRef]
Alur, R.; Singh, R.; Fisman, D.; Solar-Lezama, A. Search-based program synthesis. Commun. ACM 2018, 61, 84–93. [Google Scholar] [CrossRef]
Hou, X.; Zhao, Y.; Liu, Y.; Yang, Z.; Wang, K.; Li, L.; Luo, X.; Lo, D.; Grundy, J.; Wang, H. Large language models for software engineering: A systematic literature review. ACM Trans. Softw. Eng. Methodol. 2024, 33, 1–79. [Google Scholar] [CrossRef]
Hemmat, A.; Sharbaf, M.; Kolahdouz-Rahimi, S.; Lano, K.; Tehrani, S.Y. Research directions for using LLM in software requirement engineering: A systematic review. Front. Comput. Sci. 2025, 7, 1519437. [Google Scholar] [CrossRef]
Chen, L.; Guo, Q.; Jia, H.; Zeng, Z.; Wang, X.; Xu, Y.; Wu, J.; Wang, Y.; Gao, Q.; Wang, J.; et al. A survey on evaluating large language models in code generation tasks. arXiv 2024, arXiv:2408.16498. [Google Scholar]
Zhang, Z.; Saber, T. Machine Learning Approaches to Code Similarity Measurement: A Systematic Review. IEEE Access 2025, 13, 51729–51764. [Google Scholar] [CrossRef]
Zhang, Z.; Saber, T. Exploring the Boundaries Between LLM Code Clone Detection and Code Similarity Assessment on Human and AI-Generated Code. Big Data Cogn. Comput. 2025, 9, 41. [Google Scholar] [CrossRef]
Austin, J.; Odena, A.; Nye, M.; Bosma, M.; Michalewski, H.; Dohan, D.; Jiang, E.; Cai, C.; Terry, M.; Le, Q.; et al. Program synthesis with large language models. arXiv 2021, arXiv:2108.07732. [Google Scholar]
Nijkamp, E.; Pang, B.; Hayashi, H.; Tu, L.; Wang, H.; Zhou, Y.; Savarese, S.; Xiong, C. Codegen: An open large language model for code with multi-turn program synthesis. arXiv 2022, arXiv:2203.13474. [Google Scholar]
Li, Y.; Parsert, J.; Polgreen, E. Guiding enumerative program synthesis with large language models. In Proceedings of the International Conference on Computer Aided Verification, Montreal, QC, Canada, 24–27 July 2024; pp. 280–301. [Google Scholar]
Sobania, D.; Briesch, M.; Rothlauf, F. Choose your programming copilot: A comparison of the program synthesis performance of github copilot and genetic programming. In Proceedings of the GECCO 2022, Boston, MA, USA, 9–13 July 2022. [Google Scholar]
Wang, B.; Wang, Z.; Wang, X.; Cao, Y.; A Saurous, R.; Kim, Y. Grammar prompting for domain-specific language generation with large language models. Adv. Neural Inf. Process. Syst. 2024, 36. [Google Scholar]
Heule, S.; Sridharan, M.; Chandra, S. Mimic: Computing Models for Opaque Code. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2015, Bergamo, Italy, 30 August–4 September 2015; pp. 710–720. [Google Scholar]
Polozov, O.; Gulwani, S. FlashMeta: A Framework for Inductive Program Synthesis. In Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2015, Portland, OR, USA, 13–14 June 2015; pp. 107–126. [Google Scholar]
Zhang, T.; Chen, Z.; Zhu, Y.; Vaithilingam, P.; Wang, X.; Glassman, E.L. Interpretable program synthesis. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, Online, 8–13 May 2021; pp. 1–16. [Google Scholar]
Feng, Y.; Martins, R.; Van Geffen, J.; Dillig, I.; Chaudhuri, S. Component-based synthesis of table consolidation and transformation tasks from examples. SIGPLAN Not. 2017, 52, 422–436. [Google Scholar] [CrossRef]
Feser, J.K.; Chaudhuri, S.; Dillig, I. Synthesizing data structure transformations from input-output examples. SIGPLAN Not. 2015, 50, 229–239. [Google Scholar] [CrossRef]
Polikarpova, N.; Kuraj, I.; Solar-Lezama, A. Program synthesis from polymorphic refinement types. SIGPLAN Not. 2016, 51, 522–538. [Google Scholar] [CrossRef]
Chen, H.; Wu, C.; Zhao, A.; Raghothaman, M.; Naik, M.; Loo, B.T. Synthesizing Formal Network Specifications From Input-Output Examples. IEEE/ACM Trans. Netw. 2023, 31, 994–1009. [Google Scholar] [CrossRef]
Ye, X.; Chen, Q.; Dillig, I.; Durrett, G. Optimal neural program synthesis from multimodal specifications. arXiv 2020, arXiv:2010.01678. [Google Scholar]
Barke, S.; Peleg, H.; Polikarpova, N. Just-in-time learning for bottom-up enumerative synthesis. Proc. ACM Program. Lang. 2020, 4, 1–29. [Google Scholar] [CrossRef]
Ren, H.; Mo, W.; Zhao, G.; Ren, D.; Liu, S. Breadth First Search Based COSINE Software Code Framework Automation Algorithm. In Proceedings of the ASME Power Conference, Baltimore, MD, USA, 28–31 July 2014; American Society of Mechanical Engineers: New York, NY, USA, 2015; Volume 56604, p. V001T07A003. [Google Scholar]
Jin, Z.; Anderson, M.R.; Cafarella, M.; Jagadish, H.V. Foofah: Transforming Data By Example. In Proceedings of the SIGMOD ’17: 2017 ACM International Conference on Management of Data, Chicago, IL, USA, 14–19 May 2017; pp. 683–698. [Google Scholar]
Jin, Z.; Anderson, M.R.; Cafarella, M.; Jagadish, H.V. Foofah: A Programming-By-Example System for Synthesizing Data Transformation Programs. In Proceedings of the SIGMOD ’17: 2017 ACM International Conference on Management of Data, Chicago, IL, USA, 14–19 May 2017; pp. 1607–1610. [Google Scholar]
Liu, B.B.; Dong, W.; Liu, J.X.; Zhang, Y.T.; Wang, D.Y. Prosy: Api-based synthesis with probabilistic model. J. Comput. Sci. Technol. 2020, 35, 1234–1257. [Google Scholar] [CrossRef]
Wong, C.; Ellis, K.M.; Tenenbaum, J.; Andreas, J. Leveraging language to learn program abstractions and search heuristics. In Proceedings of the International Conference on Machine Learning, Virtual, 18–24 July 2021; pp. 11193–11204. [Google Scholar]
Yoon, Y.; Lee, W.; Yi, K. Inductive program synthesis via iterative forward-backward abstract interpretation. Proc. ACM Program. Lang. 2023, 7, 1657–1681. [Google Scholar] [CrossRef]
Cropper, A. Learning logic programs though divide, constrain, and conquer. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 22 February–1 March 2022; pp. 6446–6453. [Google Scholar]
Chen, H.; Wang, A.; Loo, B.T. Towards Example-Guided Network Synthesis. In Proceedings of the 2nd Asia-Pacific APNet ’18, Workshop on Networking, Beijing, China, 1–3 August 2018; pp. 65–71. [Google Scholar]
Cui, G.; Zhu, H. Differentiable synthesis of program architectures. Adv. Neural Inf. Process. Syst. 2021, 34, 11123–11135. [Google Scholar]
Hua, J.; Khurshid, S. EdSketch: Execution-Driven Sketching for Java. In Proceedings of the 24th ACM SIGSOFT International SPIN Symposium on Model Checking of Software, SPIN 2017, Santa Barbara, CA, USA, 13–14 July 2017; pp. 162–171. [Google Scholar]
Yuan, Y.; Radhakrishna, A.; Samanta, R. Trace-Guided Inductive Synthesis of Recursive Functional Programs. Proc. ACM Program. Lang. 2023, 7, 860–883. [Google Scholar] [CrossRef]
Herrmann, M.; Mayer, C.; Radig, B. Automatic generation of image analysis programs. Pattern Recognit. Image Anal. 2014, 24, 400–408. [Google Scholar] [CrossRef]
Osera, P.M.; Zdancewic, S. Type-and-example-directed program synthesis. SIGPLAN Not. 2015, 50, 619–630. [Google Scholar] [CrossRef]
Nguyen, S.; Zhang, M.; Johnston, M.; Tan, K.C. Automatic Programming via Iterated Local Search for Dynamic Job Shop Scheduling. IEEE Trans. Cybern. 2015, 45, 1–14. [Google Scholar] [CrossRef]
Rosin, C.D. Stepping stones to inductive synthesis of low-level looping programs. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; pp. 2362–2370. [Google Scholar]
Bornholt, J.; Torlak, E.; Grossman, D.; Ceze, L. Optimizing Synthesis with Metasketches. In Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL ’16, St. Petersburg, FL, USA, 20–22 January 2016; pp. 775–788. [Google Scholar]
Feser, J.; Dillig, I.; Solar-Lezama, A. Inductive Program Synthesis Guided by Observational Program Similarity. Proc. ACM Program. Lang. 2023, 7. [Google Scholar] [CrossRef]
Golafshani, E.M. Introduction of Biogeography-Based Programming as a new algorithm for solving problems. Appl. Math. Comput. 2015, 270, 1–12. [Google Scholar] [CrossRef]
Ahmad, H.; Helmuth, T. A Comparison of Semantic-Based Initialization Methods for Genetic Programming. In Proceedings of the Genetic and Evolutionary Computation Conference Companion, GECCO ’18, Kyoto, Japan, 15–19 July 2018; pp. 1878–1881. [Google Scholar]
Helmuth, T.; Kelly, P. PSB2: The second program synthesis benchmark suite. In Proceedings of the Genetic and Evolutionary Computation Conference, Lille, France, 10–14 July 2021; pp. 785–794. [Google Scholar]
Schweim, D.; Hemberg, E.; Sobania, D.; O’Reilly, U.M.; Rothlauf, F. Using Knowledge of Human-Generated Code to Bias the Search in Program Synthesis with Grammatical Evolution. In Proceedings of the Genetic and Evolutionary Computation Conference Companion, GECCO ’21, Lille, France, 10–14 July 2021; pp. 331–332. [Google Scholar]
Chennpati, G.; Azad, R.M.A.; Ryan, C. On the Automatic Generation of Efficient Parallel Iterative Sorting Algorithms. In Proceedings of the Companion Publication of the 2015 Annual Conference on Genetic and Evolutionary Computation, GECCO Companion ’15, Madrid, Spain, 11–15 July 2015; pp. 1369–1370. [Google Scholar]
Pantridge, E.; Helmuth, T.; Spector, L. Functional Code Building Genetic Programming. In Proceedings of the Genetic and Evolutionary Computation Conference, GECCO ’22, Boston, MA, USA, 9–13 July 2022; pp. 1000–1008. [Google Scholar]
Pantridge, E.; Helmuth, T. Solving Novel Program Synthesis Problems with Genetic Programming using Parametric Polymorphism. In Proceedings of the Genetic and Evolutionary Computation Conference, GECCO ’23, Lisbon, Portugal, 15–19 July 2023; pp. 1175–1183. [Google Scholar]
Pantridge, E.; Spector, L. Code Building Genetic Programming. In Proceedings of the 2020 Genetic and Evolutionary Computation Conference, GECCO ’20, Cancún, Mexico, 8–12 July 2020; pp. 994–1002. [Google Scholar]
Igwe, K.; Pillay, N. Automatic programming using genetic programming. In Proceedings of the 2013 Third World Congress on Information and Communication Technologies (WICT 2013), Hanoi, Vietnam, 15–18 December 2013; pp. 337–342. [Google Scholar]
Xu, M.; Mei, Y.; Zhang, F.; Zhang, M. Genetic Programming with Lexicase Selection for Large-scale Dynamic Flexible Job Shop Scheduling. IEEE Trans. Evol. Comput. 2023, 28, 1235–1249. [Google Scholar] [CrossRef]
Fernandes, M.C.; de França, F.O.; Francesquini, E. HOTGP–Higher-Order Typed Genetic Programming. arXiv 2023, arXiv:2304.03200. [Google Scholar]
Islam, M.; Kharma, N.N.; Grogono, P. Expansion: A Novel Mutation Operator for Genetic Programming. In Proceedings of the IJCCI, Seville, Spain, 18–20 September 2018; pp. 55–66. [Google Scholar]
Krawiec, K.; Blkadek, I.; Swan, J. Counterexample-Driven Genetic Programming. In Proceedings of the Genetic and Evolutionary Computation Conference, GECCO ’17, Berlin, Germany, 15–19 July 2017; pp. 953–960. [Google Scholar]
Serruto, W.F.; Alfaro, L. Many-Objective Cooperative Co-evolutionary Linear Genetic Programming Applied to the Automatic Microcontroller Program Generation. Int. J. Adv. Comput. Sci. Appl. 2019, 10. [Google Scholar] [CrossRef]
Correia, A.; Iyoda, J.; Mota, A. A family of multi-concept program synthesisers in Alloy*. Sci. Comput. Program. 2021, 201, 102536. [Google Scholar] [CrossRef]
Correia, A.; Iyoda, J.; Mota, A. Combining model finder and genetic programming into a general purpose automatic program synthesizer. Inf. Process. Lett. 2020, 154, 105866. [Google Scholar] [CrossRef]
Virgolin, M.; Alderliesten, T.; Witteveen, C.; Bosman, P.A.N. Scalable Genetic Programming by Gene-Pool Optimal Mixing and Input-Space Entropy-Based Building-Block Learning. In Proceedings of the Genetic and Evolutionary Computation Conference, GECCO ’17, Berlin, Germany, 15–19 July 2017; pp. 1041–1048. [Google Scholar]
Liventsev, V.; Härmä, A.; Petković, M. Neurogenetic programming framework for explainable reinforcement learning. In Proceedings of the Genetic and Evolutionary Computation Conference Companion, Lille, France, 10–14 July 2021; pp. 329–330. [Google Scholar]
Fix, S.; Probst, T.; Ruggli, O.; Hanne, T.; Christen, P. Automatic Programming As An Open-Ended Evolutionary System. Int. J. Comput. Inf. Syst. Ind. Manag. Appl. 2022, 14, 204–212. [Google Scholar]
Shimonaka, K.; Higo, Y.; Matsumoto, J.; Naito, K.; Kusumoto, S. Towards automated generation of Java methods: A way of automated reuse-based programming. In Proceedings of the 2018 IEEE 12th International Workshop on Software Clones (IWSC), Campobasso, Italy, 20 March 2018; pp. 30–36. [Google Scholar]
Liu, J.; Dong, W.; Liu, B. Boosting Component-Based Synthesis with API Usage Knowledge. In Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering, ASE ’20, Virtual, 21–25 December 2020; pp. 91–97. [Google Scholar]
Liu, J.; Liu, B.; Dong, W.; Zhang, Y.; Wang, D. How Much Support Can API Recommendation Methods Provide for Component-Based Synthesis? In Proceedings of the 2020 IEEE 44th Annual Computers, Software, and Applications Conference (COMPSAC), Madrid, Spain, 13–17 July 2020; pp. 872–881. [Google Scholar]
Ryan, C.; Collins, J.J.; Neill, M.O. Grammatical evolution: Evolving programs for an arbitrary language. In Proceedings of the Genetic Programming: First European Workshop, EuroGP’98, Paris, France, 14–15 April 1998; Proceedings 1. pp. 83–96. [Google Scholar]
Spector, L.; Robinson, A. Genetic programming and autoconstructive evolution with the push programming language. Genet. Program. Evolvable Mach. 2002, 3, 7–40. [Google Scholar] [CrossRef]
Pantridge, E.; Spector, L. PyshGP: PushGP in python. In Proceedings of the GECCO 2017, Berlin, Germany, 15–19 July 2017. [Google Scholar]
Feng, Y.; Martins, R.; Wang, Y.; Dillig, I.; Reps, T.W. Component-based synthesis for complex APIs. In Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages, Paris, France, 15–21 January 2017; pp. 599–612. [Google Scholar]

Figure 1. PRISMA flow diagram of the literature search process.

Figure 2. Number of studies by category and year.

Figure 3. Number of studies by input type and year.

Figure 4. Number of studies by representation and year.

Figure 5. Number of dataset usages (larger than one) per problem type.

Figure 6. Distribution of number of works per type of SBPS algorithm used for each addressed problem.

Figure 7. Distribution of number of works per type of problem addressed by each representation type.

Table 1. Inclusion and exclusion criteria.

Criteria	Description
Inclusion (IC1)	The work focuses on program synthesis using a search-based algorithm.
Exclusion (EC1)	The work is not written in English.
Exclusion (EC2)	The work was published before January 2013.
Exclusion (EC3)	The work is a secondary study.
Exclusion (EC4)	The work is a minor incremental improvement of the approach.

Table 2. Algorithms used for SBPS in selected publications.

Category	Subcategory	Variant	Papers	Count	Category Total
Uninformed	Random search		[55]	1	16
	Enumerative search		[10,13,56,57,58,59,60]	7
	Best-first search		[9,15,61]	3
	Depth-first search		[17]	1
	Top-down search		[7,62]	2
	Bottom-up search		[63]	1
	Breadth-first search		[64]	1
Heuristic	A* search		[8,65,66]	3	14
	Probability-based search		[67]	1
	Neurally guided search		[68]	1
	Bidirectional search		[69]	1
	Divide and conquer search		[70,71]	2
	Gradient-descent search		[72]	1
	Backtracking search		[73]	1
	Lenses with priority queue		[12]	1
	Trace-guided search		[74]	1
	Expert rule tree search		[75]	1
	Proof search		[76]	1
Metaheuristic	Local search		[77,78,79,80]	4	34
	Gravitational search		[5]	1
	Swarm intelligence		[1,2]	2
	Biogeography-based programming		[81]	1
	Evolution	Push GP	[40,82,83]	3
		Grammatical Evolution	[4,6,84,85]	4
		Code-building GP	[86,87,88]	3
		Tree-based GP	[3,21,89,90,91,92]	6
		Counterexample-guided GP	[14,93]	2
		Linear GP	[11,94]	2
		GP combined with other technique	[16,23,95,96,97,98]	6
Other	Database-based iterative search		[22,99,100]	3	5
	API search		[101,102]	2

Table 3. Representation used for SBPS in selected papers.

Representation	Count	Papers
Tree	26	[1,2,3,7,9,11,16,17,21,23,57,58,62,63,64,69,74,76,77,80,81,89,90,91,92,93]
Rule-based	13	[8,10,12,15,23,56,59,60,61,71,72,73,78]
Linear code sequence	12	[22,55,68,70,75,79,95,96,98,100,101,102]
Graph	6	[65,66,67,86,87,88]
String	9	[4,5,6,13,14,84,85,94,97]
Stack	3	[40,82,83]

Table 4. Representation used for SBPS with algorithm type.

Representation	Uninformed	Heuristic	Metaheuristic (Non-Evolution)	Metaheuristic (Evolution)	Other
Tree	[7,9,17,57,58,62,63,64]	[69,76]	[1,2,77,80]	[3,11,16,21,23,89,90,91,92,93]
Rule-based	[10,15,56,59,60,61]	[8,72,73]	[78]	[23]
Linear code sequence	[55]	[70,75]	[79]	[95,96,98]	[22,99,100,101,102]
Graph		[65,66,67]		[86,87,88]
String			[5]	[4,6,14,84,85,94,97]
Stack				[40,82,83]

Table 5. Problem type with the used datasets in selected publications.

Problem Type	Dataset	Total per Problem Type	Count per Dataset	Papers
Symbolic Regression	Custom Benchmark	9	9	[1,2,3,4,5,6,7,92,97]
String Manipulation	SyGuS	10	5	[8,9,10,11,63]
	Custom Benchmark		5	[12,13,14,15,62]
Circuit Transformation	SyGuS	6	3	[8,63,69]
	Custom Benchmark		3	[1,4,94]
Array/Vector Transformation	OpenAI Gym toolkit	18	1	[98]
	Sorting		2	[16,85]
	SyGuS		9	[8,9,11,63,69,79]
	Custom Benchmark		6	[17,59,60,76,95,97]
General Coding	Apache dataset (Java)	29	1	[100]
	SyPet (Java)		3	[67,101,102]
	PSB1		7	[21,40,78,82,86,88,91]
	PSB2		1	[83]
	Algebra Calculation		6	[14,16,17,93,95,96]
	Computer Vision		1	[75]
	Array.prototype (JavaScript)		1	[55]
	java.util (Java)		1	[73]
	Custom Benchmark		8	[7,14,70,74,84,87,89,102]
Other	ASCII Art	21	1	[15]
	Path Finding		2	[6,15]
	ML Pipeline		1	[22]
	Custom Real-World Problem		4	[3,4,71,72]
	Game of Tic-Tac-Toe		1	[23]
	Data Transformation		5	[10,58,59,65,66]
	Job Shop Scheduling		2	[77,90]
	User Study		1	[57]
	Feature Construction		1	[5]
	Network Analysis		1	[61]
	Inverse Constructive Solid Geometry		1	[80]
	Nuclear Power Software Development		1	[64]

Table 6. Target problem types per SBPS algorithm type.

Problem	Uninformed	Heuristic	Metaheuristic (Non-Evolution)	Metaheuristic (Evolution)	Other
Symbolic Regression			[1,2,5]	[3,4,6,92]
String Manipulation	[7,9,10,15,62,63]	[8]		[11,14]
Circuit Transformation	[63]	[8,69]	[1]	[4]
Array/Vector Transformation	[9,17,59,60,63]	[8,69,76]	[78,79]	[11,16,85,95,98]
General Coding	[7,17,55]	[67,70,73]	[78]	[14,16,21,40,82,83,84,86,87,88,89,91,93,95,96,97]	[99,100,101,102]
Other	[10,15,57,58,59,61,64]	[65,66,72,75]	[5,77,80]	[3,4,6,23,90,94]	[22]

Table 7. Target problem type assessed for SBPS with representation type.

Problem	Tree	Rule-Based	Linear Code Sequence	Graph	String	Stack
Symbolic Regression	[1,2,3,5,69,92]				[4,5,6]
String Manipulation	[7,9,62,63]	[8,10,11,12,15]			[13,14]
Circuit Transformation	[1,63,69]	[8]			[4,94]
Array/Vector Transformation	[9,11,16,17,63,76]	[59]	[8,60,79,95,98]	[79]	[85,97]
General Coding	[7,16,17,21,74,89,91,93]	[73,78]	[55,70,75,95,96,100,101,102]	[67,86,87,88]	[14,84]	[40,83]
Other	[3,4,5,15,23,57,58,64,77,80,90]	[15,23,59,61,71,72,78]	[22,70]	[65,66]	[5,6]	[82]

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
