Cognification of Program Synthesis—A Systematic Feature-Oriented Analysis and Future Direction

Abstract: Program synthesis is defined as a software development step that aims at achieving an automatic process of code generation satisfying given high-level specifications. Various program synthesis applications have been built on Machine Learning (ML) and Natural Language Processing (NLP) based approaches. Recently, there have been remarkable advancements in the Artificial Intelligence (AI) domain, including a notable rise in advanced ML techniques. Deep Learning (DL), for instance, is a currently attractive research field that has led to advances in both ML and NLP. With this advancement, there is a need to gain greater benefits from these approaches to cognify synthesis processes for the next generation of model-driven engineering (MDE) frameworks. In this work, a systematic domain analysis is conducted to explore the extent to which the automatic generation of code can be enabled via the next generation of cognified MDE frameworks that support recent DL and NLP techniques. After identifying critical features that might be considered when distinguishing synthesis systems, it becomes possible to introduce a conceptual design for future program synthesis/MDE frameworks. By searching different research database sources, 182 articles related to program synthesis approaches and their applications were identified. After defining research questions, structuring the domain analysis, and applying inclusion and exclusion criteria on the classification scheme, 170 of the 182 articles were considered in a three-phase systematic analysis guided by the research questions. This analysis is introduced as a key contribution. The results are documented using feature diagrams as a comprehensive feature model of program synthesis showing alternative techniques and architectures. The achieved outcomes serve as motivation for introducing a conceptual architectural design of the next generation of cognified MDE frameworks.


Introduction
Since the early days of computer science, the automatic generation of correct, complete, and executable program code from high-level logical specifications has been a grand ambition. Program synthesis is defined as the automatic process of constructing executable programs that satisfy a given high-level specification. It is also considered a type of code translation that lowers the abstraction level of program code [1], unlike compilers, which accept only correctly written high-level source code, extract certain low-level facts through a direct translation of the syntax, and generate platform-specific machine code to run.
Synthesizers, on the other hand, can accept various types of high-level specifications, such as logic formulas, domain-specific languages, grammars, natural language, and even partial program source code. Instead of applying a direct translation into low-level executable code, a suitable search technique is applied over some program space [2] in order to arrive at the solution in the target language (the final code) or a corresponding logical proof.
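This search-based view can be made concrete with a minimal, self-contained sketch: a toy synthesizer that enumerates arithmetic expressions and returns one consistent with a handful of input/output examples. The grammar, operators, and examples below are invented for illustration and do not correspond to any specific framework discussed in this survey.

```python
# A minimal sketch of enumerative program synthesis (all names hypothetical).
from itertools import product

# High-level specification given as input/output examples (the "user intent").
examples = [(1, 3), (2, 5), (3, 7)]  # target behavior: f(x) = 2*x + 1

def grow(exprs):
    """Program space: produce the next generation of candidate expressions."""
    new = []
    for a, b in product(exprs, repeat=2):
        new.append(("+", a, b))
        new.append(("*", a, b))
    return exprs + new

def evaluate(expr, x):
    """Interpret a candidate expression on a concrete input."""
    if expr == "x":
        return x
    if isinstance(expr, int):
        return expr
    op, a, b = expr
    va, vb = evaluate(a, x), evaluate(b, x)
    return va + vb if op == "+" else va * vb

def synthesize(examples, depth=2):
    """Search the program space for an expression satisfying all examples."""
    exprs = ["x", 1, 2]
    for _ in range(depth):
        exprs = grow(exprs)
        for e in exprs:
            if all(evaluate(e, x) == y for x, y in examples):
                return e
    return None

print(synthesize(examples))
```

At depth 2 the search finds an expression equivalent to 2*x + 1. Real synthesizers tame the combinatorial blowup of this naive enumeration with pruning, typed grammars, or learned search guidance.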

Definition of Research Questions
One of the main goals of this work is not only to add recent investigations that bring clarity to the domain of program synthesis overall, but also to explore possible design choices and alternative techniques related to the four core dimensions of synthesis approaches. Eight research questions (RQs) that guide the domain analysis and contribute to the proposal of a conceptual design were selected after briefly analyzing related works and previous reviews.
The output of the analysis is provided to readers at the end as an architectural design of a promising cognified MDE code generation tool. Figure 1 demonstrates the model of domain analysis adopted in this work. The questions addressed are as follows:
RQ1: What was reported in the peer-reviewed literature about paradigms and approaches to program synthesis with respect to the software engineering domain between 2003 and 2019?
RQ2: What are the main characteristics (features) of each program synthesis approach reported in the peer-reviewed research literature?
RQ3: What are the common alternatives and trends in describing the user intent?
RQ4: What are the common alternatives for describing the program space?
RQ5: What are the common alternative approaches and trends used to deal with the synthesis problem?
RQ6: What are the common trending techniques that are applied over the program space for solving the synthesis problem?
RQ7: What are the spectra and trends of the program synthesis applications?
RQ8: What kinds of search strategies were used for each application of program synthesis between 2009 and 2019?
Computers 2020, 9, x FOR PEER REVIEW 4 of 41

Strategy of Domain Analysis
In order to cover all possible alternative designs and techniques in the core dimensions of program synthesis (user intent, program space, search techniques, and synthesis applications), the domain analysis is organized into three levels of reviewing and evaluating research papers. The analysis phases are listed as follows:
1. The first level of analysis identifies the differences between paradigms of program synthesis in terms of the synthesizing approach adopted in each paradigm. This level does not compare the detailed features of the program synthesis approach and its applications.
2. The second level compares different program synthesis approaches in terms of the three main dimensions, namely, user intent, search space, and search techniques.
3. The third level identifies the applications of program synthesis from a software engineering perspective.
The resulting alternatives are demonstrated and documented using several feature diagrams.


Conduction of the Domain Analysis
A guided domain analysis of program synthesis was applied via a systematic literature review of the existing program synthesis approaches and frameworks, drawing on research database sources available in the globally recognized academic collections Scopus and ISI Web of Science, namely, IEEE Xplore, ACM Digital Library, ScienceDirect, and Springer, covering 2003-2019. The search covered two main directions (focuses): synthesis approaches and techniques (between 2003 and 2019) and their applications (between 2009 and 2019). This was done to ensure the identification of a wide range of relevant research publications and proceedings that discuss program synthesis topics as well as their applications in formal software development methods. As can be clearly seen in Figure 2, despite the broad time range chosen for this review and analysis, the number of considered publications noticeably increased in the most recent years, especially from 2009, reflecting the active and growing research interest in the software engineering domain.

The conducted search strategy was based on technical keywords with various search strings related to the domain of program synthesis, its paradigms, approaches, and applications. The search was executed in the selected research databases using these strings and keywords. The retrieved papers were inspected manually, one by one, to check whether or not they were relevant to the research focus. Table 1 provides the detailed terms as utilized in the search process. It is worth mentioning that the total number of selected publications was 182 papers before considering the research questions (RQs), inclusion criteria (IC), exclusion criteria (EC), and classification scheme (CS).

Inclusion and Exclusion Criteria
All resulting publications from the search strategy were initially reviewed, and only three categories of proceedings were considered: conferences, workshops, and symposiums, besides journal articles and reports. From the 182 papers, 12 papers that were abstracts, position papers, tool demos, posters, or non-English articles were excluded. In addition, two papers that were published before 2009 and related to the applications of program synthesis were eliminated, and four papers that were published before 2003 and related to the approaches and techniques of program synthesis were also eliminated, as illustrated previously in Figure 2. The following figure (Figure 3) shows the types of research publications included in the domain analysis.


Classification Scheme
According to our research questions and inclusion and exclusion criteria, the 170 selected publications were initially classified, based on their main topics, into categories belonging to three aspects: paradigms of program synthesis, features of program synthesis approaches (techniques), and applications of program synthesis (Table 2).
For the features of program synthesis aspect, the same key dimensions were used to present the categories of program synthesis. It is worth mentioning that the classification scheme evolved after the data extraction from the selected articles and papers had started, which led to the grouping and splitting of some categories. The final classification scheme for this work is illustrated in Table 2. Table 3 summarizes the information about synthesis techniques and their application domains collected from the extracted publications with respect to the defined RQs, IC, EC, and classification scheme (CS).


Data Extraction Strategy

Paradigms of Program Synthesis
Since the 1960s, the problem of constructing the executable code of a desired program from higher-level descriptions has been studied. Two main synthesis paradigms have been distinguished over time, namely the inductive approach and the deductive approach. The inductive approach aims to derive the final program from some traces at a high level of specification, whereas the deductive approach aims to construct the final program from a type of specification that expresses a relationship between the input and output of a desired program [91,96,97,99,100,174].
The deductive program synthesis paradigm represents the large umbrella of the classic old-school synthesis approach under which many synthesis frameworks have, over time, fallen. Researchers who are interested in formal software engineering methods, where various activities are based on deductive reasoning (reasoning from the general to the specific), support the automation of development processes such as: generating code from domain-specific languages or Unified Modeling Language (UML) diagrams (transforming a given low-level platform-specific model into executable code); code optimization; theorem proving; model checking (proving that a given intermediate or low-level model meets a given specification); static code analysis; test case generation (automatically producing inputs that cause a code to fail to meet its high-level specifications); code verification (proving that a given code meets a given specification); code transformation (transforming a given code into an equivalent and more efficient one); logic programming (executing a code written in logic); and more [96,174-177].
On the other hand, the inductive program synthesis paradigm denotes another, wider scope, which includes a variety of synthesis frameworks that have been emerging with the rise of machine learning and artificial intelligence techniques over the most recent decades. These kinds of synthesis systems are based on inductive reasoning (reasoning from the specific to the general). They work, statistically, with a specific amount of data about a problem, such as I/O examples, test cases, computation traces of code, and desirable/undesirable behaviors of a code, to identify general patterns in the data, and then they generate more complete programs [91,96,100].
In order to answer the first research question (RQ1) adequately, we first had to distinguish between the inductive approaches and the deductive ones, which fall within the scope of the inductive synthesis paradigm and the deductive synthesis paradigm, respectively. This was achieved by applying the first level of domain analysis to research papers belonging to all subcategories associated with the first aspect. Going back to Table 3, it was found that, of the 12 selected research papers, four were grouped under the deductive paradigm of program synthesis and discussed various approaches, whereas the rest of the publications (eight papers) were grouped as publications of the inductive paradigm of program synthesis and its related approaches (Figure 4).

The detailed results of the domain analysis in this part of the survey are documented using a feature diagram (Figure 5). The diagram illustrates all possible inductive-based and deductive-based program synthesis approaches covered in the domain analysis. The following subsections describe, in brief, the synthesis paradigm features illustrated in Figure 5.

Inductive Program Synthesis Paradigm
This feature groups some characteristics for distinguishing approaches that follow the inductive paradigm of synthesis. As mentioned earlier, synthesizers that fall under this paradigm learn, statistically, from different forms of data, such as I/O examples, test cases, or code traces, using a suitable search algorithm or machine learning (ML) technique to obtain the final programs. The learner is considered a critical part of the synthesizer design in inductive synthesis frameworks. The details of possible search techniques and learning methods, as well as the types of examples that might be given to the synthesis system, are discussed and documented in Section 6.
Synthesis frameworks, based on the type of data provided for learning, have their own characteristics that might be used to classify inductive synthesis frameworks. This is documented in the lower layers of the feature diagram illustrated in Figure 5.
• Example-Guided: Unlike synthesis approaches that rely completely on deductive techniques to assemble user-intended programs, here the synthesizer is required to learn from a small number of examples.
• Oracle-Guided: Frameworks under this subcategory use a querying system (oracle) as part of the synthesizer design. They rely on the oracle to answer (interactively) queries produced by a learner, such as the counterexample-guided approaches discussed in [95].
• Component-Based: The process of assembling (loop-free) programs avoids the use of formal specifications and replaces them with a collection of existing functions/methods, composed to provide the building blocks required for obtaining the implementation details. The component functions are interchangeable with a set of library functions provided by an application programming interface (API) [92,93]. User intent can be described via a set of input/output examples, as in [94], or even as a set of test cases, as in the work presented in [92]. In addition, the FrAngel approach [178] is able to synthesize loops and other control structures using a desired signature.
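The oracle-guided (counterexample-guided) interaction described above can be sketched as a small learner/verifier loop. The specification, candidate pool, and verification domain below are invented for illustration; practical systems typically use SMT solvers rather than exhaustive checking.

```python
# A hypothetical sketch of a counterexample-guided (oracle-guided) loop.

# Full specification, known only to the oracle: f(x) = |x|.
def spec(x):
    return abs(x)

# Finite candidate space the learner searches over.
candidates = [
    lambda x: x,
    lambda x: -x,
    lambda x: x * x,
    lambda x: x if x >= 0 else -x,
]

def oracle(candidate, domain=range(-10, 11)):
    """Verifier: return a counterexample input, or None if correct on the domain."""
    for x in domain:
        if candidate(x) != spec(x):
            return x
    return None

def cegis():
    examples = [(1, 1)]  # start from a single seed example
    while True:
        # Learner: pick any candidate consistent with all examples so far.
        consistent = [c for c in candidates
                      if all(c(x) == y for x, y in examples)]
        if not consistent:
            return None
        candidate = consistent[0]
        # Oracle: verify, or supply a counterexample to refine the search.
        cex = oracle(candidate)
        if cex is None:
            return candidate
        examples.append((cex, spec(cex)))

f = cegis()
```

Here the learner's first guess (the identity function) matches the seed example; the oracle refutes it with a negative input, and the enlarged example set forces the learner onto the correct candidate.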



Deductive Program Synthesis Paradigm
This feature groups some subfeatures used to distinguish approaches that follow the deductive paradigm of program synthesis. The frameworks classified under this paradigm allow a user to provide clear statements that specify a program without describing how it should be implemented. A proof system is then used to produce the final code from the given specifications. There are many formal proof methods that can be adopted in a program synthesis framework based on the deductive paradigm. Each method has its own characteristics that might be used to classify synthesis frameworks. This is documented using the optional operator in the lower layers of Figure 5.
• Theorem-Proving: This subfeature groups the deductive approaches that treat the derivation of the final program from its specification as the problem of proving a mathematical or logical theorem. The given specification normally describes the relation between input and output without explaining the recommended way of implementing and computing it. It is worth mentioning that when trying to construct a program with recursive or iterative loops, the process of applying theorem-proving becomes more complex [172]. In this technique, for any input object of the program, the existence of an output that meets the conditions is proved formally by one or more theories, such as:
Mathematical Induction: A proof method based on the principle of mathematical induction.
Predicate Logic Extension: A formal proof method based on logical theories.

• Transformation Rules: This subfeature groups the deductive approaches that rely on the direct application of transformation or program-rewriting rules to the specification of a desired program [172]. The program derivation process, in this instance, is regarded not as a process of proving a theorem but as a sequence of transformational steps [171]. There are three mechanisms for expressing transformation rules [30,135], as follows:
Declarative Rules: Each transformation rule is designed as a relation between the source and the target, without going into operational detail about how the relation can be achieved. Such rules can be implemented using transformation languages such as Query/View/Transformation (QVT) Relations.
Imperative Rules: Each transformation rule is specified and designed as a number of operational mapping steps required to obtain the target from the source, showing how the transformation itself is performed. Such rules can be implemented using any object-oriented programming (OOP) language, such as Java or C++.
Hybrid Rules: A combination of both declarative and imperative rules. As represented in Figure 5, the mandatory (at least one) notation applies to the features of transformation rules.

Features of Program Synthesis
This section aims to answer research questions RQ2, RQ3, RQ4, and RQ5. There are three main perspectives that must be considered when investigating, classifying, analyzing, designing, or constructing a program synthesis system, namely, users (developers, programmers), programs of interest, and search techniques [101]. Dealing with these perspectives helps researchers and developers to draw a comprehensive view of program synthesis frameworks and systems. In previous works presented in [101,102,135], three dimensions of program synthesis that tackle these perspectives were discussed, namely, user intent, search space, and search technique. In the following subsections, these dimensions are reintroduced as top-level features that reflect critical points of variation among program synthesis systems in the feature model (Figure 6).
After applying the second level of the domain analysis during the literature survey, the results were documented using several feature diagrams, which are presented in the following subsections. These subsections explain and discuss the features of the program synthesis approaches demonstrated in Figure 6. The mandatory notation is used in this feature diagram (FD), as every program synthesis system must consist of all three subfeatures to generate the final code from the high-level user intent specifications.

Developer & User Intent Specifications
Describing intent, i.e., the specifications of the desired program [102], is the first significant dimension and relates to users or developers. When applying the adopted data extraction strategy, which is based on the proposed RQs, IC, EC, and classification scheme, it was found that the total number of papers grouped under the Features and Techniques category with a focus on user intent was 50. Publications belonging to reference groups 1 through 10 are included in Table 3.
According to the domain analysis conducted on these groups of papers to begin answering RQ2, there are different ways of expressing user intent in the various program synthesis approaches, such as input/output examples, formal specifications, logical relations or formulas, demonstrations, test cases, partial programs, (restricted) natural language, and traces [101,102]. Figure 7 shows the distribution of publications over the various methods for expressing user intent. After reviewing the selected 50 papers and evaluating the publication dates and the methods of expressing user intent, RQ3 was fully answered, and the changes in user intent trends between 2003 and 2019 are summarized in Figure 8.
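Two of these intent forms can be contrasted in a minimal sketch: the same target function given as input/output examples, and as a partial program whose single unknown constant (a "hole") is filled by search. All names and values here are hypothetical.

```python
# Hypothetical sketch: user intent as examples vs. as a partial program.

# Intent as input/output examples:
io_examples = [(0, 5), (1, 7), (2, 9)]   # target: f(x) = 2*x + 5

# Intent as a partial program: the structure is fixed, a constant is unknown.
def make_candidate(hole):
    return lambda x: 2 * x + hole   # "2 * x + ??" with ?? to be synthesized

def complete_sketch(examples, hole_values=range(-10, 11)):
    """Fill the hole with the first constant consistent with every example."""
    for h in hole_values:
        f = make_candidate(h)
        if all(f(x) == y for x, y in examples):
            return h
    return None

print(complete_sketch(io_examples))   # → 5
```

The partial program drastically shrinks the search space compared with searching over whole programs, which is why sketch-style intent is attractive when the user already knows the rough program shape.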

Common features of these forms of user intent were derived after reviewing each paper; the mechanisms for describing user intent were named and determined during the second level of domain analysis as syntax-based, semantics-based, symbolic-based, and example-based mechanisms. The feature diagram in Figure 9 illustrates these mechanisms.
Frameworks that fall under the syntax-based subfeature focus on different formats of syntax provided to the synthesis framework as user intent to produce the target code [179]. The mandatory (at least one) notation is used here to indicate that any combination of subfeatures is possible. The following points briefly highlight, compare, and distinguish program synthesis approaches in terms of the subfeatures illustrated in Figure 9.
• Natural Languages: Synthesis frameworks in this subcategory use natural language to describe the intended program. Synthesizers then produce program code from the NL description using a learning algorithm such as reinforcement learning or maximum marginal likelihood. The approaches presented in [116,139] and [19] are examples of synthesis frameworks that use an NL description to express user intent. Other frameworks, such as Tellina, adopt Recurrent Neural Networks to translate a program described in natural language into an executable program [15].
• Program Sketches: In frameworks such as that presented in [25], a user writes an incomplete program (a program with holes or missing details), and a synthesizer then derives the low-level implementation details from the sketch by filling all given holes based on previously specified assertions.
• Domain-Specific Languages: A Domain-Specific Language (DSL) is a restricted subset of a programming language designed to be understood and adopted in a particular domain. Similar in structure to general-purpose programming languages (like C++ and Java), a DSL is a set of typed and annotated symbol definitions that form the DSL terminology [97]. These symbols can be either terminals or non-terminals defined using high-level specification rules (e.g., a context-free grammar). Each rule describes the transformation of a non-terminal into other non-terminal or terminal tokens of the language. All possible transformation operators and source symbols (tokens) are typed and located on the right-hand side of the rules, and every symbol in the grammar is annotated with a corresponding output.
The PROgram Synthesis using Examples (PROSE) approach [31] is an example that falls under the deductive synthesis paradigm (explained previously in Section 5), where the synthesis problem is solved using transformation rules and version space algebra, as in FlashExtract [44] and FlashMeta [32]. Additionally, the solver-aided DSL Rosette, which is based on a theorem proving technique, is another DSL-based approach designed for solving the synthesis problem [33].
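The sketch-based idea above can be illustrated with a minimal, hypothetical example (not the Sketch tool itself, which uses constraint solving rather than brute force): a program with a single integer hole is completed by enumerating candidate hole values until the user-supplied assertions hold.

```python
# Minimal sketch-filling illustration (a toy, not the Sketch tool): a "sketch"
# is a function with an integer hole; the synthesizer enumerates candidate
# hole values until every assertion (input/expected-output pair) passes.

def make_sketch(hole):
    # Sketch: double(x) = x << ?? ; the hole is the unknown shift amount.
    def double(x):
        return x << hole
    return double

def fill_hole(assertions, candidates=range(8)):
    """Return the first hole value whose completed program satisfies all assertions."""
    for h in candidates:
        prog = make_sketch(h)
        if all(prog(x) == expected for x, expected in assertions):
            return h
    return None

# The assertions play the role of the user's specification.
spec = [(1, 2), (5, 10), (21, 42)]
print(fill_hole(spec))  # → 1 (shift-by-one implements doubling)
```

Real sketch-based synthesizers replace this enumeration with symbolic constraint solving, but the contract is the same: holes in, assertion-satisfying completions out.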
In addition, some synthesis frameworks focus on learning their synthesizers from different forms of examples given as user intent, instead of a syntactic representation of the desired code. These frameworks are classified under the example-based category (Figure 9). The examples can be I/O examples, counterexamples, traces, or a combination of these.
• Input/Output Examples: Programming by examples, as done in [16,39,41,44,47], is a common approach in which a user expresses the desired code behavior using a set of I/O example pairs, and the synthesis tool constructs an executable implementation from these examples. This kind of synthesis approach provides an interactive interface between the user and the synthesizer that allows the user to keep providing input/output example pairs until the desired program is reached, as in the approaches provided in [41,44,47].
• Counterexamples: Synthesizers in frameworks under this subfeature adopt the so-called counterexample-guided inductive synthesis strategy to produce candidate implementations from concrete examples of program behavior, whether correct or not [27]. In some approaches, such as [94,95], the synthesizer acts as a verifier or oracle that validates the candidate implementation and generates counterexamples from its context, which are fed back to the synthesizer in the following iteration. The counterexamples are thus used iteratively, instead of new knowledge-free I/O examples being generated for each solving iteration [27].
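The counterexample-guided loop described above can be sketched in a few lines. In this toy setup (an assumed illustration, not a specific published tool), the search space is affine programs f(x) = a*x + b, the learner fits the examples seen so far, and the "verifier" checks candidates over a bounded input domain and returns a failing input.

```python
# Counterexample-guided inductive synthesis (CEGIS) sketch over a tiny space
# of affine programs f(x) = a*x + b. Assumes the target lies in the space.

from itertools import product

def learner(examples, coeff_range=range(-5, 6)):
    # Inductive step: return any (a, b) consistent with all examples so far.
    for a, b in product(coeff_range, coeff_range):
        if all(a * x + b == y for x, y in examples):
            return a, b
    return None

def verifier(candidate, spec, domain=range(-20, 21)):
    # Bounded verification: return a counterexample input, or None if valid.
    a, b = candidate
    for x in domain:
        if a * x + b != spec(x):
            return x
    return None

def cegis(spec, seed_input=0):
    examples = [(seed_input, spec(seed_input))]
    while True:
        cand = learner(examples)
        cex = verifier(cand, spec)
        if cex is None:
            return cand
        examples.append((cex, spec(cex)))  # feed the counterexample back

print(cegis(lambda x: 3 * x - 2))  # → (3, -2)
```

Each iteration adds exactly one informative counterexample rather than a batch of knowledge-free I/O pairs, which is the defining trait of the strategy.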
Moreover, frameworks that fall under the symbolic-based (computational) approach treat program synthesis as a computational problem. Constraints, logic formulas, finite state machines, and context-free grammars are examples of symbolic notations used to solve various computational problems in computer science; they can be adopted in symbolic-guided synthesis frameworks as representations of the synthesis problem from which to obtain the target code.
• Logic Formulas: The use of logical formulas is one of the classic methods for expressing high-level specifications of programs. Two kinds of specifications are considered for describing programs: semantic specifications and syntactic specifications. Frameworks in this subcategory use logic to describe semantic specifications, and grammar (e.g., a context-free grammar) to describe the constraints of syntactic specifications. Together, the grammar and syntactic constraints provide a comprehensive template for the desired program; using a template benefits the synthesizer by reducing the program search space [50,85].
• Constraints: Approaches under this subfeature use a formal language such as a context-free (attribute) grammar to describe rich, structured constraints over the desired programs. This kind of synthesis approach tackles the difficult problem of learning, from the provided data, a rich set of constraints that the synthesized program must satisfy. The work presented in [54] is an example of this kind of synthesis framework.
• Finite Automata (FA): Frameworks in this subcategory allow a user to describe the desired program partially using finite state machines (FSMs) or, in some approaches, Extended FSMs with execution specifications and invariants, to construct an FSM skeleton of the program. The synthesizer then completes the FSM skeleton from the supplied specifications and invariants using an inference technique [55]. In the TRANSIT tool [55], for instance, a computation-guided synthesis approach is adopted for reactive systems, where each process is expressed as an Extended FSM. The description of a process consists of a collection of internal state variables, control states, and the transitions between them.
The synthesis approach works by specifying these transitions through a set of guard conditions and an update code, inferring expressions from symbolic forms of functions, variables, and examples (concolic snippets) to achieve consistent system behavior.
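The FSM-skeleton-completion idea can be sketched as a toy illustration (a hypothetical setup, not the TRANSIT tool): a turnstile skeleton leaves the guard on one transition unknown, and the synthesizer selects, from a pool of candidate guards, one that reproduces a set of example traces.

```python
# Toy FSM-skeleton completion in the spirit of EFSM-based synthesis
# (hypothetical names, not TRANSIT). The LOCKED -> UNLOCKED guard is the
# "hole"; candidates are tried until every example trace is reproduced.

CANDIDATE_GUARDS = {
    "event == 'coin'": lambda event: event == "coin",
    "event == 'push'": lambda event: event == "push",
    "True": lambda event: True,
}

def run_fsm(guard, events):
    """Simulate the turnstile with `guard` on LOCKED -> UNLOCKED; return the state trace."""
    state, trace = "LOCKED", ["LOCKED"]
    for ev in events:
        if state == "LOCKED" and guard(ev):
            state = "UNLOCKED"
        elif state == "UNLOCKED" and ev == "push":
            state = "LOCKED"
        trace.append(state)
    return trace

def complete_skeleton(example_traces):
    """Return the name of the first candidate guard consistent with every example trace."""
    for name, guard in CANDIDATE_GUARDS.items():
        if all(run_fsm(guard, events) == trace for events, trace in example_traces):
            return name
    return None

examples = [
    (["coin", "push"], ["LOCKED", "UNLOCKED", "LOCKED"]),
    (["push", "coin"], ["LOCKED", "LOCKED", "UNLOCKED"]),
]
print(complete_skeleton(examples))  # → event == 'coin'
```

Real EFSM tools infer guard and update expressions symbolically rather than picking from a fixed pool, but the consistency check against observed behavior is the same.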
In addition, there are other frameworks that involve program synthesis activities, such as program analysis and debugging. They use semantic information about a program, such as a bug report or memory addresses, instead of actual program syntax or a formal representation of it (e.g., a grammar). These frameworks are categorized under the semantics-based subcategory shown in Figure 9. Execution traces of a program are known to contain rich semantic information about the code; their use has therefore become widely accepted in the domains of program analysis and synthesis, with remarkable results [57]. According to [57], many learning processes of (learning-based) synthesizers [58,59,152] are improved when using generated execution traces [62]. This is because execution traces have been widely used for program analysis [63,64], where the traces are given as input to identify detailed technical characteristics of a program. Trace information may contain significant detail about the program, including dependencies, control flows (paths), values, memory addresses, and the inter-relationships between them. Reverse engineering techniques and tools are used to analyze traces and understand all possible scenarios and dynamic behaviors related to the code [64].
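Execution-trace collection of the kind described can be prototyped with Python's built-in tracing hook; the sketch below records line events and local-variable snapshots for later analysis (a simplification of what real trace-based tools capture).

```python
import sys

def collect_trace(func, *args):
    """Record (line_number, locals snapshot) events while func runs."""
    events = []

    def tracer(frame, event, arg):
        # Only record "line" events occurring inside the traced function.
        if event == "line" and frame.f_code is func.__code__:
            events.append((frame.f_lineno, dict(frame.f_locals)))
        return tracer

    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(None)  # always detach the hook
    return result, events

def gcd(a, b):
    while b:
        a, b = b, a % b
    return a

result, trace = collect_trace(gcd, 12, 18)
print(result)          # 6
print(len(trace) > 0)  # the trace exposes control flow and variable values
```

Each recorded event pairs a control-flow point with concrete values, which is exactly the kind of semantic signal trace-driven analysis and learning-based synthesis consume.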

Search Space of the Program
The search space of a program is considered to be the domain of programs over which the desired program will be searched. Expressiveness and efficiency are two significant characteristics that must be considered by search space developers when designing the search space. On one hand, the expressiveness of the space should be adequate to describe all programs that users require. On the other hand, the space should be designed with a good degree of restrictiveness to allow it to perform an efficient search [148]. Reaching this balance between expressiveness and efficiency allows developers to create a good code synthesizer.
When applying the adopted strategy of data extraction, it was found that the total number of papers grouped under the program search space category was six; these are listed as group 16 (Table 3). According to the analysis conducted on this group to continue answering RQ4, the search space can be expressed in a variety of ways, for example, as a subset of an existing programming language, a domain-specific language, a context-free grammar, a deterministic/non-deterministic FA, or logics [84,148].
During the second level of systematic domain analysis on the search space, it was found that templates are widely used across almost all kinds of program synthesis approaches [85,87–89]. Templates are a common technique that enables developers (users) to provide high-level insights about target programs to a synthesis framework, using a generic programming or meta-programming feature available in some programming languages, such as C++, to create a template of the desired program. Template-based synthesis approaches can reduce the search problem and optimize solving performance. The details of possible solver types are covered later, where the synthesis frameworks are categorized by their adopted search strategies [88]. Creating templates using programming languages, formal specification languages (e.g., Z, Petri Nets, or Abstract Syntax Trees (ASTs)), or logic is a critical and difficult task, as the solver needs to translate the template back into a form appropriate for formal reasoning, such as logic or grammar [88], and then produce the complete target code.
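The expressiveness/efficiency trade-off is easy to see in a grammar-defined search space. The sketch below (an illustrative toy, not any specific framework's DSL) enumerates arithmetic expressions from a tiny context-free grammar up to a bounded depth and searches that space for a program consistent with some I/O pairs.

```python
# Enumerating a grammar-defined search space (toy example). The grammar
#   E -> x | 1 | (E + E) | (E * E)
# defines the space; bounding the depth keeps the search tractable, at the
# cost of expressiveness -- the central trade-off discussed above.

def enumerate_exprs(depth):
    if depth == 0:
        return ["x", "1"]
    smaller = enumerate_exprs(depth - 1)
    exprs = list(smaller)
    for left in smaller:
        for right in smaller:
            exprs.append(f"({left} + {right})")
            exprs.append(f"({left} * {right})")
    return exprs

def synthesize(io_pairs, max_depth=2):
    """Return the first expression in the space consistent with the I/O pairs."""
    for expr in enumerate_exprs(max_depth):
        if all(eval(expr, {"x": x}) == y for x, y in io_pairs):
            return expr
    return None

# The space grows rapidly with depth; restricting it keeps the search efficient.
print(len(enumerate_exprs(1)), len(enumerate_exprs(2)))  # 10 210
print(synthesize([(2, 5), (3, 7)]))  # finds an expression computing 2*x + 1
```

A more expressive grammar (more operators, deeper nesting) covers more user programs but blows up the enumeration, which is exactly why practical frameworks restrict the space or add templates.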
After reviewing the selected six papers, it was found that the search space of a program can be expressed using four alternatives, namely, programming languages, logic, grammar, and domain-specific languages. From that, RQ4 was completely answered. At the completion of this level of the domain analysis on the search space, the results were documented using a feature diagram (Figure 10). Some language combinations may be used together to form the final search space template; that is why the mandatory (at least one) notation is used in Figure 10.

Search Strategy
As mentioned earlier, the program synthesis problem is defined as a problem of finding an executable program that satisfies some high-level specifications and constraints. The process of searching over a program space to solve this problem is considered one of the three critical dimensions of any program synthesis approach. There are various search techniques and algorithms that might be adopted when designing code synthesizers based on whether the user intent specification is expressed via examples, partial program code, example pairs, or formal specifications [84,148].
This section answers both RQ5 and RQ6. To answer RQ5, the data extraction strategy was applied first, including all publications belonging to reference groups 11, 12, 13, 14, and 15 (Table 3). The total number of publications considered under the search strategy category at this step was 21. These papers were evaluated based on the methods adopted for dealing with the synthesis problem and its variations over the period between 2005 and mid-2019. It was found that the program synthesis problem is tackled from different perspectives as five kinds of computational problem, namely, the verification problem, the constraint satisfaction (solving) problem, the machine approximation problem, the combinatorial optimization problem, and the learning (statistical) problem (Figure 11).
The alternative search techniques used for solving the synthesis problem are documented in the feature diagram in Figure 12. Additionally, Figure 13 summarizes the changes in handling the synthesis problem between 2005 and mid-2019; a remarkable increase in the adoption of machine learning (ML) and related techniques, as well as optimization techniques, as search techniques for solving synthesis problems can be observed. Secondly, to answer RQ6, the data extraction strategy was applied once more, including all publications belonging to reference groups 11, 12, 13, 14, 15, 18, 19, 20, 21, and 22 (Table 3).

The total number of publications considered under the search strategy category at this step was 95, after eliminating publications that did not mainly cover search techniques. These papers were evaluated based on the technique adopted for solving the synthesis problem. Figure 14 demonstrates the distribution of publications over the solving techniques used to search the program space. It is worth mentioning that the findings represented in Figure 14 were also used to answer RQ8, as described in the following section (Section 7). The following subsections compare and distinguish program synthesis approaches based on the features of the search technique, as demonstrated in the top-level feature diagram (Figure 12).
Figure 14. Distribution of publications on the solving techniques used for solving the synthesis problem.
• Verification: Verification can be defined as the process of checking that a program satisfies a high-level specification on all inputs drawn from a very large or infinite set. Program synthesis frameworks in the verification subcategory encode the synthesis problem as a verification problem to be solved using existing verification tools. This kind of program synthesis follows the correct-by-construction philosophy of the old school of program design from the 1970s and 1980s. In the program verification approach discussed in [60], code statements are encoded as logical facts with guards to be examined; the tool then infers invariants and program statements until the program is synthesized automatically, and the proof of correctness for the generated conditions is produced theoretically using reasoning techniques. According to [65], the second phase of the common counterexample-guided inductive program synthesis approach is based on iterative verification: a verification step is performed on the candidate program to discover a counterexample input that violates the specification, and this process continues until the candidate either passes the verification check or fails a synthesis check [65].
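The verification step can be sketched concretely for a finite domain (a toy example with assumed names, standing in for what SMT-backed verifiers do symbolically): a candidate program is checked against a reference specification on every input, and the first disagreement is returned as a counterexample.

```python
# Bounded verification sketch: check a candidate program against a reference
# specification on every input from a finite domain, reporting the first
# counterexample -- the role verification plays inside synthesis loops.

def spec(x):
    # Reference specification: x is a positive power of two.
    return x > 0 and (x & (x - 1)) == 0

def candidate(x):
    # Candidate implementation proposed by a synthesizer (buggy at x == 0).
    return (x & (x - 1)) == 0

def verify(candidate, spec, domain=range(256)):
    for x in domain:
        if candidate(x) != spec(x):
            return x  # counterexample violating the specification
    return None       # candidate passes the bounded verification check

print(verify(candidate, spec))  # → 0 (the candidate wrongly accepts zero)
```

Real verifiers reason symbolically over very large or infinite input sets instead of enumerating them, but the pass/counterexample contract is identical.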
• Approximation: Traditional program synthesis frameworks generally produce programs that merely meet the specification, with no guarantee of optimality. Some synthesis approaches instead treat program synthesis as an approximation problem, which results in the discovery of approximate solutions close to the optimal one. Synthesis approaches in this category aim to automatically produce optimal programs that approximately meet a desired correctness specification under certain attributes, for example, the fastest program [66,67]. In [67], a collection of large problems, whose candidate programs have a search space too large and hard to explore, is introduced as the PARROT benchmark suite. A sketch-based program synthesis approach is used to solve the PARROT problems, resulting in more efficient program solutions with a reasonable level of accuracy for all seven problems, whereas the syntax-guided synthesis framework (SyGuS) fails to solve any PARROT problem [67].
• Combinatorial Optimization: Combinatorial optimization is a type of mathematical optimization that reflects, in general terms, the process of selecting the best value that satisfies some given criterion from some available alternatives [74]. It is considered a field of theoretical computer science that solves discrete optimization problems by finding an optimal solution from a finite set of possibilities [74]. In the optimization process, an original program is given, and a search technique is applied over the program space to find another functionally equivalent program that satisfies some constraints for performance enhancement [69]. This improves search performance by reducing the size of the search space. Although various real-world problems, such as finding variable assignments that satisfy constraints, partitioning graphs, coloring graphs, and more, can be solved numerically by combinatorial optimization, most of these problems are often subject to uncertainty [74]. This led to the emergence of two widely adopted resolution methods for combinatorial optimization, namely, Stochastic Optimization and Deterministic Optimization (Figure 15). In some approaches, both techniques may be used together. This is illustrated in the FD as a mandatory (at least one) subfeature.

• Stochastic Optimization: The Stochastic Optimization method involves solving combinatorial optimization problems that involve uncertainties, whereas the deterministic one focuses on finding solutions by evaluating a finite set of discrete variables. For each method, several efficient algorithms have been designed and successful search techniques have been adopted for solving many real-world problems, including program synthesis (demonstrated in the detailed feature diagram in Figure 16). According to the domain analysis, the Evolutionary Algorithm (Genetic Programming), the Dynamic Programming Algorithm (e.g., the Viterbi algorithm), and Simulated Annealing are stochastic algorithms used in several program synthesis frameworks as techniques for seeking the target code constructs from the high-level specifications [68,155].

o Dynamic Programming: Dynamic Programming is an optimization technique that simplifies complex problems by breaking them down into many overlapping subproblems. The solutions of these simpler subproblems are combined to provide an optimal solution to the complicated problem. In a nested problem structure, a relation between the value of the larger problem and the values of the subproblems is specified, and each computed value of a subproblem's solution is used recursively to find the overall optimal solution to the problem [70]. It is worth mentioning that the synthesis problem must be described in a way in which its solution is constructed from solutions to overlapping subproblems [71].
Many implementation algorithms that improve the overall performance of the optimization process are based on dynamic programming, such as the divide-and-conquer algorithm [70] and the linear-time dynamic programming algorithms presented in [71].

o Simulated Annealing: The principles of simulated annealing were inspired by the physical annealing of solids. In physics, defects in solids are removed by first heating the solid to a high temperature and then transforming it into a crystalline material by a slow cooling process. At the highest temperature, the material is at its highest (maximum) energy state, whereas the minimum energy state is the frozen state [72].
Simulated annealing was introduced into the domain of computer science as a probabilistic strategy for solving combinatorial optimization problems with a large search space. For example, simulated annealing is used, via the Real-Time Software System Generator (RT-Syn) framework [72], to minimize the related resource costs of software applications, including design and maintenance, by synthesizing the implementation detail of the design. When considering program synthesis as an optimization problem, some crucial implementation decisions must be involved during problem resolution, such as data structures, control flows, and algorithms. In simulated annealing, the program space is treated as a configuration space that encompasses all legal decisions.
Iteratively, a random current feasible design with some perturbations (a move set) is proposed. In the end, this move set must be able to reach all feasible designs in the design space. In each iteration, a cost function is used to measure the goodness of the current design in order to find the best design that can be reached. The last characteristic of the simulation is the cooling schedule, which mimics the cooling process of materials in physics. Moves in the high-energy state that decrease the cost function gradually are accepted to produce a suboptimal solution, whereas a quick decrease results in a near-optimal solution to the problem [72].

o Evolutionary Algorithm: An evolutionary algorithm is a kind of generic population-based optimization that is inspired by biological evolution mechanisms, such as reproduction, mutation, recombination, and selection. In biology, changes in characteristics, or evolution, occur when evolutionary mechanisms and genetic recombination react to these changes, resulting in different characteristics becoming more common or hidden in the population in the following generations [96,169]. In the domain of computer science, an algorithm for solving an optimization problem is applied to a population of individuals, where fitness functions are used iteratively over the population to improve the quality of the final solution [73]. Genetic Programming (GP) is a kind of evolutionary algorithm that uses genetic operations, namely, mutation, crossover, and selection, to evolve its populations iteratively until the best solutions to a given optimization problem are achieved [154]. It performs better than exhaustive search when the problem (program) space is too broad, because the search over the space is guided by the measures produced by the fitness function [96,169].
GP is considered one of the common techniques applied in the domain of program synthesis and automatic program repair [73,96,154,155,169].
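As a rough illustration of how a fitness function guides an evolutionary search, the following sketch evolves a small arithmetic expression tree toward the target y = x² + x. It is a mutation-only simplification (crossover omitted), and all parameters (population size, mutation rate, grammar) are arbitrary choices for the example.

```python
import random

random.seed(0)

# Mutation-only GP sketch (crossover omitted for brevity): evolve an
# expression tree over {x, 1, +, *} toward the target y = x*x + x.
OPS = ("+", "*")

def rand_tree(depth=3):
    if depth == 0 or random.random() < 0.3:
        return random.choice(("x", 1))
    return (random.choice(OPS), rand_tree(depth - 1), rand_tree(depth - 1))

def evaluate(tree, x):
    if tree == "x":
        return x
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    l, r = evaluate(left, x), evaluate(right, x)
    return l + r if op == "+" else l * r

def fitness(tree):  # lower is better: squared error on sample points
    return sum((evaluate(tree, x) - (x * x + x)) ** 2 for x in range(-3, 4))

def mutate(tree):   # replace a random subtree with a fresh random one
    if random.random() < 0.3 or not isinstance(tree, tuple):
        return rand_tree(2)
    op, left, right = tree
    if random.random() < 0.5:
        return (op, mutate(left), right)
    return (op, left, mutate(right))

pop = [rand_tree() for _ in range(50)]
initial_best = min(fitness(t) for t in pop)
for _ in range(100):
    pop.sort(key=fitness)
    survivors = pop[:10]                      # elitist selection
    pop = survivors + [mutate(random.choice(survivors)) for _ in range(40)]
final_best = min(fitness(t) for t in pop)
```

Because the fittest individuals always survive (elitism), the best error can only stay equal or improve across generations, which is the guidance effect the paragraph above attributes to the fitness function.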

• Deterministic Optimization: On the other hand, the simpler alternative to the stochastic method is the Deterministic Optimization method, which is used in some synthesis frameworks as an implemented search technique (Figure 16). Exhaustive Enumeration is considered to be a very general search-based problem-solving technique in which all possible alternatives are examined during the problem resolution process in order to find the optimal solution to the problem [68]. The brute-force algorithm is a common technique of exhaustive enumeration optimization, as noted during the conducted domain analysis. In this optimization, three kinds of input are given: a formal representation of candidate expressions E, a logical specification and constraints S, and a finite set of examples X.
The targeted problem to be solved must satisfy the following (First-Order Predicate Logic with equality) formal rule:

∃ e ∈ E : ∀ x ∈ X. solve(e, s, x) ≠ FALSE (1)

However, the rapid (exponential) growth of the search space, which occurs due to the program's size or other reasons, is a crucial problem that deterministic-based synthesizers may face, even when a powerful optimization technique is adopted. To address this problem, the synthesis framework must be improved to guide the search, using a weighted directed graph and a decision tree in the approaches mentioned in [55,69], respectively. A probabilistic model is used as guidance for the search-based synthesizer.
The model takes a set of program tokens, including terminal and non-terminal ones, and produces a probability for each production rule. A weighted directed graph with a sentential form for each node and a calculated weight for each edge is then derived from the model. The enumeration search based on this improved structure reduces the search effort by following the shortest path from the source node via graph-search algorithms such as Dijkstra's algorithm [69].
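The enumerate-and-check strategy described above can be made concrete with a small sketch: given candidate expressions E (a toy grammar over x, 1, 2, +, *) and a specification given as examples X, a brute-force enumerator tries programs in order of increasing size and returns the first consistent one. The grammar, size bound, and helper names are all illustrative choices, not the setup of any cited framework.

```python
from itertools import product

# Brute-force enumerative synthesis sketch: enumerate candidate expressions
# E up to a size bound and return the first one consistent with the examples X.
def candidates(size):
    if size == 1:
        yield from ("x", "1", "2")           # terminals of the toy grammar
    else:
        for ls in range(1, size - 1):        # split remaining size (1 for the operator)
            for l, r in product(candidates(ls), candidates(size - 1 - ls)):
                yield f"({l} + {r})"
                yield f"({l} * {r})"

def synthesize(examples, max_size=7):
    for size in range(1, max_size + 1):      # smallest programs first
        for expr in candidates(size):
            if all(eval(expr, {"x": x}) == y for x, y in examples):
                return expr
    return None

# Specification given purely as input/output examples: y = 2*x + 1
program = synthesize([(0, 1), (1, 3), (2, 5)])
```

Enumerating by increasing size is what makes the returned program the smallest consistent one; the exponential blow-up as `max_size` grows is precisely the problem that the probabilistic guidance discussed above is meant to mitigate.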
• Constraint Solving: The theory behind constraint-solving program synthesis begins by expressing the semantics of a given program in some logic formulas. Instead of compiling the program into low-level executable machine code, it is compiled into logical constraints (formulas) as an intermediate representation of the given program. A solver-based strategy is then applied via solver-aided verification or synthesis tools to solve the condition satisfaction problem by proving the correctness of the given program. When such a constraint is unsatisfied, an automatically generated test tries to find an input that makes the program fail (if one exists). Here, the program synthesis problem is treated as the Constraint Satisfaction Problem (CSP). The CSP can be defined as a set of mathematical questions about objects that must satisfy a number of constraints. Intensive research on solving the CSP has been conducted in the artificial intelligence (AI) and operations research domains.
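To make the CSP framing concrete, the following sketch solves a small graph-coloring instance (one of the classic CSPs) by plain backtracking. The graph, domains, and function names are invented for the example; real CSP engines add constraint propagation and variable-ordering heuristics on top of this basic loop.

```python
# Minimal CSP backtracking sketch: graph 3-coloring as a constraint
# satisfaction problem (variables = nodes, constraint = adjacent nodes differ).
def solve_csp(variables, domains, conflicts, assignment=None):
    assignment = assignment or {}
    if len(assignment) == len(variables):
        return assignment                     # every variable assigned: solved
    var = next(v for v in variables if v not in assignment)
    for value in domains[var]:
        if all(not conflicts(var, value, other, val)
               for other, val in assignment.items()):
            result = solve_csp(variables, domains, conflicts,
                               {**assignment, var: value})
            if result is not None:
                return result
    return None                               # dead end: backtrack

edges = {("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")}

def different_if_adjacent(v1, c1, v2, c2):
    # A conflict exists when two adjacent nodes share a color.
    return ((v1, v2) in edges or (v2, v1) in edges) and c1 == c2

nodes = ["A", "B", "C", "D"]
coloring = solve_csp(nodes, {v: ["red", "green", "blue"] for v in nodes},
                     different_if_adjacent)
```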
The feature diagram shown in Figure 17 classifies program synthesis frameworks with respect to those approaches that solve the synthesis problem as a CSP using theorem provers (logical reasoning techniques). The solving approach can be achieved by adopting either a Boolean Satisfiability Problem (SAT) solver, a Satisfiability Modulo Theories (SMT) solver, or a combination of both. A common solving strategy based on logical reasoning first reduces the second-order search problem to (first-order) constraint solving. A solver (SAT or SMT) is then used to solve the constraint problem. The solving-based tool can be integrated within some program synthesis approaches, like syntax-based synthesis, as discussed in [78]. This is expressed in the following FD by the mandatory (at least one) notation. Solvers can be implemented using two strategies, namely, solver-aided programming and an algorithmic-based approach. In the algorithmic-based approach, the written implementation is often complex and hard to understand, with an informal correctness proof. It is normally described using a high-level specification language (e.g., Hoare logic) supported by a theorem proving system (e.g., Isabelle [74]) to prove its correctness [79].
Another approach for implementing solvers is adopting an appropriate solver-aided domain-specific language (e.g., Rosette [33,48]) and tools. DSLs are used to package the insights and knowledge of domain experts and allow other people who are interested in that application domain to effectively solve problems in that domain [30]. Rosette is a solver-aided DSL that is built on top of a programmable programming language called Racket to enable the development of tools that are based on program verification and synthesis concepts [33,48].
Unlike the algorithmic-based approach, where a compiler must be built from a programming language into the constraint-solving system, which is an extremely hard task, a DSL simplifies the task: instead of building these special kinds of compilers, one builds an interpreter for the DSL, or just a library or an API when using an embedded DSL. The interpreter of the language requires a so-called symbolic virtual machine to translate the given program semantics into constraints. When using a solver-aided DSL, the synthesis framework becomes simpler and better, as the translation from the language into constraints is obtained automatically [33,48].
o Boolean Satisfiability Problem (SAT solver): The Boolean Satisfiability Problem (SAT) can be defined as the problem of checking whether or not a formula expressed in Boolean logic is satisfiable. SAT solving is considered the cornerstone of several software engineering applications, such as system design, model checking, hardware debugging, pattern generation, and software verification [33]. The SAT problem was the first problem proven to be NP-complete; despite this worst-case complexity, modern solvers handle instances involving thousands of variables and millions of constraints [33,75]. Several program synthesis frameworks use a SAT solver, implemented based on an algorithmic approach using C++ or Python, to resolve the synthesis problem. For instance, the SKETCH framework utilizes the SAT solving technique in a counterexample-guided iteration that interacts with a verifier to check the candidate program against the specification and generates counterexamples until the final program that meets the complete specifications is found [27,94]. Additionally, SAT solving and a so-called gradient-based numerical optimization technique are combined for solving program synthesis problems in the Real Synthesis (REAS) framework [76]. The search space in REAS is explored using the SAT solver to solve constraints on discrete variables, fixing the set of Boolean expressions that appear in the program structure. This allows better tolerance of approximation errors, which leads to efficient approximation results. The REAS technique is implemented within the SKETCH framework. The end user, a programmer, writes their program with a set of unknowns using the high-level SKETCH language to express the intent. These unknowns are Boolean expressions (constraints) that need to be solved [76].
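At toy scale, a SAT instance can be checked by brute force over all assignments; the sketch below encodes clauses as lists of signed integer literals, the conventional CNF encoding. It only shows what "satisfiability" asks; production solvers such as those used inside SKETCH rely on far smarter DPLL/CDCL search rather than exhaustive enumeration.

```python
from itertools import product

# Brute-force SAT check sketch: a CNF formula is a list of clauses, each a
# list of literals (positive int = variable, negative int = its negation).
def satisfiable(cnf, n_vars):
    for bits in product([False, True], repeat=n_vars):
        assign = {i + 1: bits[i] for i in range(n_vars)}
        # A clause is satisfied when at least one literal evaluates to True.
        if all(any(assign[abs(l)] == (l > 0) for l in clause) for clause in cnf):
            return assign
    return None  # unsatisfiable

# (x1 OR x2) AND (NOT x1 OR x3) AND (NOT x2 OR NOT x3)
cnf = [[1, 2], [-1, 3], [-2, -3]]
model = satisfiable(cnf, 3)
```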
o Satisfiability Modulo Theories (SMT solver): Satisfiability Modulo Theories (SMT) is a technique used to find satisfying solutions for First-Order Logic (FOL) with equality formulas. FOL formulas include the Boolean operations of Boolean logic but also richer expressions than plain variables, including functions, predicates, and constants; the adoption of SAT solvers for a program synthesis problem sometimes requires such richer logic formulas. Thus, in SMT formulas, some propositional variables of the SAT formula are replaced with first-order predicates, which are Boolean functions over some variables [77]. SMT solvers have emerged as a useful tool for verification, symbolic execution, theorem proving, and program synthesis approaches. There are many available SMT solvers, such as Z3 and the Cooperating Validity Checker (CVC4), that are used for solving the program synthesis problem. These frameworks are implemented based on an algorithmic approach using general-purpose programming languages [77].
For instance, Z3 is an SMT solver implemented in C++ and produced by Microsoft Research to tackle software analysis and verification problems. It works as a reasoning engine that proves the correctness of programs or discovers their errors by analyzing the verification conditions. Additionally, Z3 acts as a test case generation tool, producing new test cases with different behaviors from the execution traces of the program [77]. Several techniques integrate SMT solving tools with various synthesis approaches. According to [78], the syntax-guided approach to program synthesis allows users to provide hints to guide the synthesizer toward solutions to its synthesis conjectures. Moreover, SMT solvers are used to solve synthesis conjectures. The CVC4 SMT solver is, as shown in the work presented in [78], extended with some capabilities to make it efficient for synthesis conjectures using two embedded techniques, namely, Quantifier Instantiation and Syntax-Guided Enumeration.
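The relationship between a propositional skeleton and its first-order predicates can be mimicked at toy scale. In the sketch below, each propositional variable stands for an integer predicate, a truth assignment satisfying the Boolean skeleton is enumerated, and a brute-force "theory check" searches a small domain for a witness. All predicates and names here are invented for illustration; real SMT solvers such as Z3 or CVC4 replace both loops with efficient decision procedures.

```python
from itertools import product

# Toy "SMT-style" check: propositional variables p, q, r abstract
# first-order predicates over an integer x.
atoms = {"p": lambda x: x > 2, "q": lambda x: x < 5, "r": lambda x: x % 2 == 0}
skeleton = lambda p, q, r: p and q and r   # (x > 2) AND (x < 5) AND even(x)

def smt_check(domain=range(-10, 11)):
    for values in product([False, True], repeat=3):
        if not skeleton(*values):
            continue                       # boolean skeleton unsatisfied
        wanted = dict(zip(atoms, values))
        for x in domain:                   # brute-force "theory solver"
            if all(atoms[a](x) == wanted[a] for a in atoms):
                return x                   # theory-consistent witness found
    return None

witness = smt_check()
```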

• Machine Learning: Machine learning (ML) is an application of the artificial intelligence (AI) branch of computer science that enables a machine to learn from a massive amount of data without being explicitly programmed. In the context of software engineering, ML techniques have brought great advances in program synthesis, in which they may be used to create automated tools with better code comprehension ability that help developers understand and modify their code using knowledge extraction or recognition techniques [81]. Thus, the synthesis problem is introduced here as a machine learning problem. Developers who are interested in following this approach to solve the synthesis problem face a variety of independent choices, expressed in the FD by some optional notations. Different learning techniques are used to guide the synthesis search and automatically decompose the synthesis problem, such as deep learning, neural networks, reinforcement learning, and version space learning. These learning styles are illustrated in Figure 18.
o Version Space Learning: Version Space Learning is commonly used in programming-by-demonstration (PBD) synthesis applications. In the PBD approach, a programmer demonstrates how to perform a task, and the system learns an appropriate representation of the procedure of that task. Version Space is considered to be a logical approach to machine learning where the concepts of learning are described using some logical language. The learning process can be seen as a search over the space of hypotheses that map from a complex object into a binary classification. Different learning algorithms might be used to search over the space. This space of hypotheses is a set of disjunctive logical formulas, which can be defined as

∀ Hm, Hn : HYPOTHESIS · (Hm ∨ Hn) (2)

In this approach, the learning algorithm uses a number of training examples to restrict the hypothesis space. Each hypothesis inconsistent with a given example is removed from the space. This refinement process of the hypothesis space is called the candidate elimination algorithm.
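The elimination step described above can be sketched with the version space represented as an explicit, finite set of attribute-conjunction hypotheses, where "?" matches any value; each training example removes every inconsistent hypothesis. The two-attribute concept space is a toy example, not one drawn from the cited works.

```python
from itertools import product

# Version-space sketch: hypotheses are conjunctions over two attributes
# (sky, temp); "?" matches anything. Candidate elimination is simplified
# here to filtering an explicitly enumerated finite hypothesis space.
SKY, TEMP = ("sunny", "rainy", "?"), ("warm", "cold", "?")

def matches(h, x):
    return all(hv == "?" or hv == xv for hv, xv in zip(h, x))

version_space = set(product(SKY, TEMP))
examples = [(("sunny", "warm"), True),    # positive example
            (("rainy", "cold"), False)]   # negative example
for x, label in examples:
    # Keep only hypotheses whose prediction agrees with the example's label.
    version_space = {h for h in version_space if matches(h, x) == label}
```

After both examples, only the hypotheses consistent with every observation survive, which is exactly the refinement the candidate elimination algorithm performs (practical implementations track boundary sets rather than enumerating the whole space).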
According to [62], an extended version space with algebraic operations is used for learning the synthesis approach from execution traces of programs, in addition to the inputs and outputs of programs. Algebraic operators, such as the union, intersection, join, and transformation operators are used to construct complex version spaces. This allows an efficient, exhaustive search of the program's space that is consistent with the training execution traces. The designed learner is able to recognize the control structures of a program, such as IF and WHILE statements, as well as an array data structure. In the evaluation of this approach, it was shown to provide correct results from a small number of training examples.
Additionally, the SMARTedit framework was introduced as a PBD application based on version space algebra [180]. Reusable version space components have been designed, besides the version space algebra, for the domain of text editing, supporting a subset of the Emacs command language. Any type of mapping between inputs and outputs, including Boolean values and structured objects, is considered.
o Reinforcement Learning: Reinforcement learning (RL) is a subfield of machine learning that aims to teach an agent how to perform a specific task and achieve a goal in an uncertain, potentially complex environment. Many RL applications have emerged with the rapid advancement of games technology and robotics. In the context of program synthesis, reinforcement learning algorithms are applied within various frameworks to maximize the likelihood of generating semantically correct programs, as well as to tackle program aliasing issues, where different programs may satisfy a given specification [19,139].
Other supervised machine learning techniques show clear limitations when dealing with program aliasing, for example. With the synthesis approach presented in [139], the process of generating any program consistent with the given I/O examples is directly encouraged by using policy-gradient reinforcement learning instead of only optimizing the maximum likelihood. Furthermore, a syntax checker is used to prune the space of possible programs, which helps to generate better programs.
Reinforcement learning has also been used in approaches that map natural language into executable programs, such as the approach presented in [19], where RL is integrated with the maximum marginal likelihood (MML) paradigm. The result is a new learning algorithm that can be applied to a neural semantic parser and has shown significant results. It deals with spurious program bias by adopting an exploration strategy based on approximating the policy gradients of both RL and MML, which guide the exploration task [19].
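The policy-gradient idea of [139] can be illustrated with a toy sketch: a softmax policy over a tiny, enumerated program space is updated with REINFORCE, rewarding any sampled program that is consistent with the I/O examples. The program space, examples, and hyperparameters below are invented for illustration; the actual systems use neural policies that emit program tokens:

```python
import math, random

random.seed(0)

# Toy program space: each "program" is a function from input to output.
programs = [
    ("add1", lambda x: x + 1),
    ("double", lambda x: 2 * x),
    ("square", lambda x: x * x),
]

# I/O specification: only "add1" satisfies both pairs.
examples = [(1, 2), (3, 4)]

def consistent(prog, examples):
    return all(prog(x) == y for x, y in examples)

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

# REINFORCE: reward any sampled program that satisfies the spec, rather
# than maximizing the likelihood of one reference program.
logits = [0.0, 0.0, 0.0]
lr = 0.5
for _ in range(200):
    probs = softmax(logits)
    i = random.choices(range(len(programs)), weights=probs)[0]
    reward = 1.0 if consistent(programs[i][1], examples) else 0.0
    for j in range(len(logits)):
        indicator = 1.0 if j == i else 0.0
        # Gradient of log softmax, scaled by the reward.
        logits[j] += lr * reward * (indicator - probs[j])

best = max(range(len(programs)), key=lambda j: logits[j])
print(programs[best][0])  # add1 -- the policy concentrates on it
```

Because the reward is 1 for any consistent program, the update never penalizes one semantically correct solution in favour of another, which is exactly the aliasing-friendly behaviour described above.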
Neural-Network-Based Learning: A neural network can be defined as an interconnected group of artificial neurons that use a mathematical or computational model for information processing. Neural networks are used to solve AI problems by building classification and prediction systems. According to the domain analysis, neural-network-based approaches to program synthesis have gained increasing attention from the software engineering research community, reflecting the popularity of NNs in machine learning in recent years. Several recent research works have introduced neural-network-based frameworks and approaches to program synthesis from I/O examples [57].
Deep Learning: As mentioned earlier in this paper, deep learning (DL) can be defined as a branch of machine learning in which the architecture of a learning approach consists of multiple layers of data processing units. A variety of synthesis frameworks adopt deep learning techniques, such as deep neural networks (convolutional and recurrent NNs) and deep reinforcement learning [116]. The RobustFill framework [82], for instance, is a neural program synthesis framework based on RNNs that encodes variable-length sets of input/output (I/O) example pairs. A domain-specific language (DSL) is used in RobustFill to express the collection of transformation rules for different textual operations, such as substring extraction, constant strings, and text conversions. The adopted DSL can express complex textual expressions (strings) by employing an effective regular-expression extraction technique; a program in the DSL takes a given string as input and returns another string as output. The synthesis system is trained with a number of I/O examples and has been shown to achieve 92% accuracy.
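The flavour of such a string-transformation DSL can be sketched as follows; the operation names and combinators here are a simplified invention for illustration, not RobustFill's published DSL:

```python
# A toy string-transformation DSL in the spirit of FlashFill/RobustFill.
# The operations and names are illustrative only.

def const(s):
    """Emit a constant string, ignoring the input."""
    return lambda _inp: s

def substr(start, end):
    """Extract a substring of the input by position."""
    return lambda inp: inp[start:end]

def token(index, sep=" "):
    """Extract the index-th separator-delimited token of the input."""
    return lambda inp: inp.split(sep)[index]

def concat(*ops):
    """A DSL program is a concatenation of primitive operations."""
    return lambda inp: "".join(op(inp) for op in ops)

# Example program: "<last name>, <first initial>." from "First Last".
prog = concat(token(1), const(", "), substr(0, 1), const("."))
print(prog("Ada Lovelace"))  # Lovelace, A.
```

A synthesizer in this setting searches the space of such compositions for a program whose outputs match every given I/O pair.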
It is worth mentioning that, during the conducted domain analysis, various deep learning techniques were found to be adopted in different program synthesis applications, such as DeepCom [126] and CRAIC [127] for code commenting, the CDE-Model [118] for code summarization, DeepRepair [156] for code repair, and RobustFill [82] and DLPaper2Code [145] for code translation and generation.
Further comparison and details of these frameworks are beyond the scope of this paper. For each mentioned framework, only the kind of DL technique adopted was extracted to be used in developing the lower level of the above-mentioned feature diagram (Figure 18).

Applications of Program Synthesis
This section aims to answer RQ7 and RQ8 through a concise review and evaluation of a number of publications that discuss the applications of program synthesis from the software engineering domain perspective. When applying the data extraction technique based on the RQs, IC, EC, and classification scheme, the total number of papers grouped together under the applications of program synthesis category was 71. All publications belonged to reference groups 18, 19, 20, 21, and 22 (Table 3).
According to the analysis conducted on these groups of publications to answer RQ7, there are various modern applications built on top of synthesis frameworks, namely, code completion, code repair, code suggestion, code transformation, code summarization, code documentation, and code generation. The following figure (Figure 19) shows the distribution of papers on the different modern program synthesis applications.
In order to completely answer RQ7, changes in program synthesis applications between 2013 and 2019 are shown in Figure 20. Remarkable growth in the number of publications related to code documentation, code repair, and code summarization can be observed in the last two years, in contrast to code completion and code transformation/generation. Answers to RQ8 were derived from the findings of RQ6 and RQ7. Various ML/DL techniques, such as neural networks (NNs), recurrent neural networks (RNNs), long short-term memory networks (LSTMs), and reinforcement learning, have been used in various publications to solve the program synthesis problem across the covered applications, representing the highest number of selected papers, 27 out of 71 (38%). The percentages of studies related to each application category were as follows: 80% for code transformation/generation, 50% for code summarization, 50% for code completion, 30% for code documentation, and only 17.8% for code repair.

In addition to this, it was also found that different Natural Language Processing (NLP) techniques, such as n-gram and pattern-based (nano and micro) techniques, have been used in various publications to solve the synthesis problem in all categories of applications, representing the second highest number of selected papers, with 11 out of 71 (15.5%). The most popular types were code summarization papers (with 50%) and code completion papers (with 23%). Figure 21 demonstrates the distribution of program synthesis applications.
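As a minimal illustration of the n-gram techniques mentioned above (a generic sketch with an invented toy corpus, not any specific surveyed system), a token-level bigram model can rank likely next tokens for code completion:

```python
from collections import Counter, defaultdict

# Train a bigram model over tokenized code (illustrative toy corpus).
corpus = [
    "for i in range ( n ) :",
    "for j in range ( m ) :",
    "if x in values :",
]

bigrams = defaultdict(Counter)
for line in corpus:
    toks = line.split()
    for a, b in zip(toks, toks[1:]):
        bigrams[a][b] += 1

def complete(prev_token, k=2):
    """Suggest the k most likely next tokens after prev_token."""
    return [tok for tok, _ in bigrams[prev_token].most_common(k)]

print(complete("in"))     # ['range', 'values']
print(complete("range"))  # ['(']
```

Real code-completion systems use much longer contexts and larger corpora, but the ranking-by-frequency principle is the same.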

Architectural Design of a Suggested Code Generation Framework
The resulting features of program synthesis approaches from the conducted analysis are discussed in this section by highlighting the future direction for the design of a code generation framework that is based on a cognified code synthesis phase. It is commonly known that the traditional code generation phase focuses on generating final executable code from platform-specific (design) models (PSMs) using efficient approaches and techniques. This is unlike the code synthesis stage, which aims to produce executable code from higher specifications (user intent) using sophisticated search techniques and descriptive high-level user intent specifications.
In the proposed work, it is recommended that a learning-based synthesis step, supporting both machine learning and NLP, be considered in the design of the proposed code generation framework to make it cognified. The following subsections characterize the recommended features of a synthesis tier and explain the architectural design conceptually, without going into depth on the implementation details or design choices, for example, how to implement the chosen machine learning and NLP algorithms to solve the synthesis problem.


Concepts
The suggested cognified code synthesis tier can be described as a combination of four subfields of computer science, namely, model-driven engineering (MDE), natural language processing (NLP), computer vision (CV), and artificial intelligence (AI), as illustrated in Figure 22. Figure 23 demonstrates the overall architectural design of the cognified generation framework. It shows that there are two main transformation phases, namely, the code synthesis phase and the code generation phase. Both phases are performed by the synthesis engine component. The code synthesis phase aims to produce a platform-specific design model from a higher and more abstract model expressed using non-technical domain expert knowledge (the Intent model), rather than producing the final executable code. There are four core components in the proposed synthesis engine (Figure 24) that are responsible for orchestrating and executing all required transformations at the synthesis phase to produce the design model: the editor, the code recognizer, the model recognizer, and the transformation engine. The code generation phase, on the other hand, consists of a number of domain-specific generators (agents) that are responsible for generating the final executable code for a target environment. The key component responsible for this process is the code generation engine.

Architecture of the Synthesis Engine
With respect to the variety of user intents covered in the proposed classification system (Section 6), here, the user intent is described using some hand-drawn models containing graphical and textual characteristics. The textual details (features) might be constraints, such as Object Constraint Language (OCL), or a textual DSL, whereas the graphical details reflect the actual graphical notation of some modelling languages, such as UML.
The visual and textual characteristics of models, i.e., the inputs to the synthesis engine, are recognized by the synthesis system via two recognizers (Figure 24). The ModelRecognizer is used for detecting the graphical features of the hand-drawn models through an appropriate AI system (machine/deep learning) and computer vision techniques, whereas the CodeRecognizer is used to capture the textual features on the models through AI and NLP techniques.

Recommended Search Technique
According to the presented classification system (Section 6), the search techniques of the proposed synthesis engine can be categorized as machine learning-based search techniques. Many advanced deep neural networks, especially convolutional neural networks (CNNs) and recurrent neural networks (RNNs), have been used for code generation, as well as object detection and text analysis. In the proposed design, it is suggested that these search techniques be implemented via the two recognizer components. The suggested recognizers therefore need to be supplied with model and code repositories for training purposes.

Language Model (Program Space)
It is suggested that the domain-specific language (DSL) approach be used in this work [97]. It is common for the abstract and concrete syntax and semantics of an external DSL or domain-specific modeling language (DSML) to be defined via various strategies: grammar-based approaches (e.g., the Backus-Naur form (BNF)), metamodeling, and the UML profiling approach. The details of each strategy are outside the scope of this paper. The metamodeling approach was selected at this stage of development. Indeed, it is important to mention that the design language must be expressive enough to capture various real-world domain features. At the same time, the language must also be restrictive enough to describe the problem domain precisely, and it must be cognified, using structured examples for training, in the future.
Firstly, to represent the characteristics captured in the hand-drawn models, two levels of DSL that form intermediate representations of models were used, namely, the Intent model and the Design model. The user intent model was expressed via a DSL that is closely related to the hand-drawn model domain, whereas the design model was described using a low-level language that is close to the domain of the implementation platform but remains platform-independent.
The UML class diagram (Figure 25) demonstrates, at the metamodel level, a snapshot of the concepts used to define the Intent model; a detailed description of the full version of this DSL, along with its semantics, is outside the scope of this work. This partial metamodel was used to exemplify the proposed model through a simple case that expresses a hand-drawn table. It is worth mentioning that the following UML activity diagram (Figure 26) shows the data flow of models from the hand-drawn form into the final executable code.
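Assuming the partial metamodel contains Grid/Row/Cell concepts, as in the hand-drawn table example, it might be encoded as follows (an illustrative sketch; the actual metamodel in Figure 25 is not reproduced here):

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative encoding of the partial Intent metamodel (Grid/Row/Cell);
# names follow the hand-drawn table example, not the full DSL.

@dataclass
class Cell:
    no: int
    text: str
    underlined: bool = False  # e.g., could mark a notable header cell

@dataclass
class Row:
    no: int
    cells: List[Cell] = field(default_factory=list)

@dataclass
class Grid:
    no: int
    rows: List[Row] = field(default_factory=list)

# A tiny Intent model instance: one grid with a header row.
header = Row(1, [Cell(1, "rigNo", underlined=True), Cell(2, "name")])
intent = Grid(1, [header])
print(len(intent.rows[0].cells))  # 2
```

A metamodel of this shape constrains what the recognizers may emit, which is what gives the Intent model its precise, machine-processable structure.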


Intent Model
This model was constructed as a result of solving vision problems to extract important graphical features from the hand-drawn model. Unsupervised feature learning was performed using the CNN and RNN techniques to train the recognizer. This example-based approach can be used to train the system, as many model and code repositories are emerging on the Internet nowadays. The pre-processing of the data and the training approach are outside the scope of this work. The following snapshot (Listing 1) is a possible Extensible Markup Language (XML) representation of the Intent model.

As seen in Listing 1, the language of the Intent model consists of graphical terms, such as grid and cell. Similarly, the recognizer must recognize other shapes, such as arrows, lines, circles, and more.

Listing 1. A possible XML representation of the Intent model (fragment):

<Grid no = "1">
  <Row no = "1">
    <Cell no = "1" text = "rigNo" underlined = "true" />
    <Cell no = "2" text = "name" />
  </Row>
</Grid>

Figure 26. The process of model evolution to obtain the final executable code from hand-drawn models.
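Downstream components consume this representation; for instance, the cells and their underlined flag (plausibly used later to mark primary-key candidates) can be read back with a standard XML parser. The snippet below parses a completed copy of the Listing 1 fragment:

```python
import xml.etree.ElementTree as ET

# A completed copy of the Listing 1 fragment; attribute names follow
# the listing in the text.
xml = """
<Grid no="1">
  <Row no="1">
    <Cell no="1" text="rigNo" underlined="true"/>
    <Cell no="2" text="name"/>
  </Row>
</Grid>
"""

grid = ET.fromstring(xml)
cells = [(c.get("text"), c.get("underlined") == "true")
         for c in grid.iter("Cell")]
print(cells)  # [('rigNo', True), ('name', False)]
```

This is the kind of structured input the transformation engine would receive from the recognizers.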


Design Model
The model was constructed as the result of applying a set of model transformation rules by the transformation engine. It is considered a low-level model that is expressed using a language close to the implementation and code. The following snapshot (Listing 2) is a possible XML representation of the database design model. It is worth mentioning that the design model is platform-independent. From Listing 2, it can be seen that the language of the design model contains various terms, such as table, column, record, primary key (PK), and field, taken from database systems.
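A plausible shape for such a design-model document, using the terms mentioned above, can be sketched programmatically as follows (an illustrative reconstruction; the element and attribute names are assumptions, not the paper's actual Listing 2):

```python
import xml.etree.ElementTree as ET

# Illustrative platform-independent design model; element and attribute
# names are assumptions based on the terms table, field, PK, and record.
table = ET.Element("Table", name="Rig")
ET.SubElement(table, "Field", name="rigNo", pk="true")
ET.SubElement(table, "Field", name="name", pk="false")
record = ET.SubElement(table, "Record")
ET.SubElement(record, "Value", field="rigNo", value="1")

print(ET.tostring(table, encoding="unicode"))
```

Because the document only speaks of tables, fields, and records, it remains uncommitted to MySQL or any other target platform; the platform-specific generators bind these terms later.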

Transformations Engine
As shown in Figure 24, the role of the (model) transformation engine component is clear. It transforms elements expressed in the user intent model into low-level elements of the platform-independent design model, at the code synthesis phase. This can be achieved by applying a set of transformation rules that represent a complete model transformational system. As mentioned earlier, a case of generating MySQL schema was utilized to exemplify the possible rules of transformations that might be considered in the proposed approach.
By following the strategy of scattering the whole transformation into a collection of transformational agents (composed transformation) as an internal architecture, as suggested in [181], several suggested transformational rules/agents for generating MySQL schema from hand-drawn sketches (models) can be listed:

• Detecting Table: This transformation agent is responsible for transforming the captured hand-drawn grid from the Intent model into a platform-independent table structure. The mapping rule between a grid element and a table element can be expressed using the following logical formula as tr : Grid → Table
∀g : Grid → ∃!t : Table, n : DModel · ((t ∈ n) ∧ NameOf(g, t)) (4)
• Detecting Column: This transformation agent is responsible for transforming the captured hand-drawn cell of a grid from the Intent model into a platform-independent column structure. This transformation step includes deciding whether the captured column is a primary key or not. The mapping rule between a cell element and a column element can be expressed using the following logical formula as tr : Cell → Field
∀c : Cell, g : Grid · (c ∈ g) → ∃!t : Table, f : Field · ((f ∈ t) ∧ NameOf(c, f)) (5)
• Detecting Record Instance: This transformation agent is responsible for transforming the captured hand-drawn rows of a grid from the Intent model into a platform-independent record instance. The mapping rule between a row element and a record element can be expressed using the following FOPL logical formula as tr : Row → Record
∀r : Row, g : Grid · (r ∈ g) → ∃!t : Table, rec : Record · ((rec ∈ t) ∧ NameOf(r, rec)) (6)
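Operationally, these mapping rules can be sketched as simple transformation agents (illustrative code; the data shapes, field names, and the underlined-cell-to-primary-key convention are assumptions, not the paper's implementation):

```python
# Illustrative transformation agents: Intent-model elements (grid/row/cell)
# to platform-independent design-model elements (table/field/record).

def detect_table(grid):
    """tr: Grid -> Table -- one table per captured grid."""
    return {"name": grid["name"], "fields": [], "records": []}

def detect_columns(grid, table):
    """tr: Cell -> Field -- header cells become columns; underlined => PK."""
    for cell in grid["rows"][0]["cells"]:
        table["fields"].append({"name": cell["text"],
                                "pk": cell.get("underlined", False)})

def detect_records(grid, table):
    """tr: Row -> Record -- non-header rows become record instances."""
    for row in grid["rows"][1:]:
        table["records"].append([c["text"] for c in row["cells"]])

# A captured grid: header row plus one data row.
grid = {"name": "Rig", "rows": [
    {"cells": [{"text": "rigNo", "underlined": True}, {"text": "name"}]},
    {"cells": [{"text": "1"}, {"text": "North-1"}]},
]}

table = detect_table(grid)
detect_columns(grid, table)
detect_records(grid, table)
print(table["fields"][0])   # {'name': 'rigNo', 'pk': True}
print(table["records"][0])  # ['1', 'North-1']
```

Each agent implements exactly one of the mapping rules above, which is what makes the composed-transformation architecture of [181] easy to extend with further agents.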

Code Generation Engine
The code generation engine is responsible for translating the platform-specific design model into executable code at the code generation phase. As demonstrated in Figure 24, the generation engine is a component of the proposed synthesis engine. Unlike the previously discussed components, the role of the code generation engine component is straightforward. By following the previously mentioned strategy of scattering the whole transformation into a collection of transformational agents structured as a hierarchical composition (Section 8.5), as suggested in [181], a collection of transformational agents (generators) was adopted in the design of the code generation engine. These platform-specific agents are responsible for generating executable code in a target environment.
As mentioned earlier, a case of generating MySQL schema was utilized to exemplify the possible rules of transformations that might be considered in the proposed approach. Figure 27 shows the internal design of a code generator agent to generate MySQL schema from the design model.
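The generator agent's job can be sketched as follows (illustrative; the design-model encoding and the emitted SQL template are assumptions, not the framework's actual generator):

```python
# Illustrative MySQL generator agent: design model -> CREATE TABLE statement.
# Field types default to VARCHAR here; a real generator would infer types.

def generate_mysql(table):
    cols = []
    pks = []
    for f in table["fields"]:
        cols.append(f"  {f['name']} VARCHAR(255)")
        if f.get("pk"):
            pks.append(f["name"])
    if pks:
        cols.append(f"  PRIMARY KEY ({', '.join(pks)})")
    return f"CREATE TABLE {table['name']} (\n" + ",\n".join(cols) + "\n);"

design = {"name": "Rig",
          "fields": [{"name": "rigNo", "pk": True}, {"name": "name"}]}
print(generate_mysql(design))
```

Swapping this agent for, say, a PostgreSQL or SQLite generator would leave the synthesis phase untouched, which is the point of keeping the design model platform-independent.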

Conclusions
Program synthesis is expanding rapidly. In this paper, a feature model for describing program synthesis approaches was introduced as a result of applying a systematic three-phase domain analysis to related publications on existing program synthesis paradigms, approaches, and applications. The 170 selected publications were reviewed in this three-phase systematic domain analysis. The RQs, IC, EC, and classification scheme were identified and applied to the related publications, which resulted in the classification of synthesis paradigms into inductive and deductive paradigms. Synthesis approaches were also classified based on three main features, namely, user intent, program space, and search technique. All results were documented in a number of feature diagrams using the feature-based modelling technique.
Although there have been various successful deductive-based program synthesis approaches since the 1980s, many new frameworks have been proposed over the last five years for various applications, which are based on inductive-based approaches, especially machine (deep) learning and NLP. This is because of the accelerated wave of deep learning and computer vision advancement in the past few years. This recent transition has led to program synthesis gaining many benefits from the massive number of code samples and program traces available on different online repositories and datasets for solving the synthesis problem. This, in other words, contributes to the cognification of the program synthesis task as a part of a complete code generation framework.
In this respect, the results of the conducted analysis motivate, highlight, and give further insight regarding the architectural design of the promising cognified code generation framework that is supplied by a learning-based code synthesis engine. This is a contribution associated with one application of program synthesis, namely code generation. A synthesis engine is recommended and discussed as a tier of the proposed architecture to enable developers to express their intents using hand-drawn models and handwritten code. The engine is able to capture critical features from both the textual and graphical sketches of code and evolve them into the final executable code via a series of transformation steps.