# From Grammar Inference to Semantic Inference—An Evolutionary Approach

## Abstract

## 1. Introduction

- Grammar Inference can infer only the syntactic structure, whereas many problems impose additional restrictions on the allowed structures [11,12] that cannot be described by Context-Free Grammars (CFGs). Hence, we also need to know the static semantics, or even the meaning, of the structure (e.g., in the area of programming languages a program might be syntactically correct but contain semantic errors such as undeclared identifiers). How can we extend Grammar Inference beyond discovering only the syntactic structure?
- The search space is enormous even for inferring regular and context-free languages [13], and it becomes substantially larger for context-sensitive languages (context-free languages with static semantics). How can we ensure sufficient exploration and exploitation of the search space [14] for Semantic Inference? Note that the search space is too large for an exhaustive (brute-force) approach.

## 2. Related Work on Grammar Inference and Semantic Inference

#### 2.1. Grammar Inference

2. print 23
3. print a + 23
4. print a + b + c
5. print a where a = 23
6. print 23 where b = 11
7. print 23 + c where c = 28
8. print 23 + 11 where c = 28
9. print a where a = 23; a = 28
10. print 28 where a = 23; b = 11
11. print 1 + 2 where b = 23; a = 5

`N2 → + N3 | ε`

`N3 → num N2 | id N2`

`N4 → ; id = num N4 | ε`

#### 2.2. Semantic Inference

## 3. Semantic Inference with LISA

`parentFitnessRatio = (parentFitness + 1) / maxFitness ∗ 100;`

`parentProbability = 50 - mutationProbability / 2;`

`parent1Probability = (parent1FitnessRatio ∗ (100 - mutationProbability)) / (parent1FitnessRatio + parent2FitnessRatio);`

`parent2Probability = (parent2FitnessRatio ∗ (100 - mutationProbability)) / (parent1FitnessRatio + parent2FitnessRatio);`
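The formulas above can be sketched in a few lines of Python. This is only an illustration, assuming all probabilities are percentages and that the fitness-weighted variant splits the share left over after mutation between the two parents. The function names `fitness_ratio` and `parent_probabilities` are hypothetical and are not part of LISA.

```python
def fitness_ratio(parent_fitness: float, max_fitness: float) -> float:
    """parentFitnessRatio = (parentFitness + 1) / maxFitness * 100."""
    return (parent_fitness + 1) / max_fitness * 100


def parent_probabilities(fitness1: float, fitness2: float,
                         max_fitness: float,
                         mutation_probability: float) -> tuple[float, float]:
    """Split the non-mutation share (100 - mutationProbability) between
    the two parents in proportion to their fitness ratios."""
    ratio1 = fitness_ratio(fitness1, max_fitness)
    ratio2 = fitness_ratio(fitness2, max_fitness)
    remaining = 100 - mutation_probability
    p1 = ratio1 * remaining / (ratio1 + ratio2)
    p2 = ratio2 * remaining / (ratio1 + ratio2)
    return p1, p2


# Illustrative values only: mutation probability 10%, maximum fitness 10.
p1, p2 = parent_probabilities(fitness1=7, fitness2=3,
                              max_fitness=10, mutation_probability=10)
print(p1, p2)
```

When both parents have the same fitness ratio, each weighted probability reduces to `50 - mutationProbability / 2`, which is the unweighted `parentProbability` formula. In every case the two parent probabilities and the mutation probability sum to 100.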

## 4. Experiments

#### 4.1. Example 1

`T = {S.ok, A[0].val, B[0].val, C[0].val, A[1].val, B[1].val, C[1].val, 1}`

`F = {+(int), ==(int), &&(int)}`

`Max tree depth: 2`

`Population size: 2500`

`Elitism: 20%`

`Selection pressure: 50%`

`Mutation probability: 10%`

`(abc, ok=true)`

`(aabbcc, ok=true)`

`(aaabbbccc, ok=true)`

`(aaaabbbbcccc, ok=true)`

`(abbcc, ok=false)`

`(aabcc, ok=false)`

`(aabbc, ok=false)`

`(aabbbccc, ok=false)`

`(aaabbccc, ok=false)`

`(aaabbbcc, ok=false)`

`(abbccc, ok=false)`

`(abc, ok=true, val=1)`

`(aabbcc, ok=true, val=2)`

`(aaabbbccc, ok=true, val=3)`

`(aaaabbbbcccc, ok=true, val=4)`

`(abbcc, ok=false, val=1)`

`(aabcc, ok=false, val=2)`

`(aabbc, ok=false, val=2)`

`(aabbbccc, ok=false, val=2)`

`(aaabbccc, ok=false, val=3)`

`(aaabbbcc, ok=false, val=3)`

`(abbccc, ok=false, val=1)`
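The samples above can be checked against one candidate set of semantic equations. The sketch below assumes the underlying grammar is `S → A B C`, `A → a A | a`, `B → b B | b`, `C → c C | c`, and uses the equations `X[0].val = X[1].val + 1`, `X.val = 1`, `S.ok = (A.val == B.val) && (B.val == C.val)` and `S.val = A.val`. These equations are an assumption chosen to be consistent with the listed samples; they are not necessarily the ones produced by the tool. The names `run_val` and `evaluate` are hypothetical.

```python
import re

# (sentence, ok, val) triples copied from the second sample set.
SAMPLES = [
    ("abc", True, 1), ("aabbcc", True, 2), ("aaabbbccc", True, 3),
    ("aaaabbbbcccc", True, 4), ("abbcc", False, 1), ("aabcc", False, 2),
    ("aabbc", False, 2), ("aabbbccc", False, 2), ("aaabbccc", False, 3),
    ("aaabbbcc", False, 3), ("abbccc", False, 1),
]


def run_val(block: str) -> int:
    """Synthesised attribute for X -> x X | x.

    X -> x   : X.val = 1
    X -> x X : X[0].val = X[1].val + 1
    """
    if len(block) == 1:
        return 1
    return run_val(block[1:]) + 1


def evaluate(sentence: str) -> tuple[bool, int]:
    """Return (S.ok, S.val) for a sentence of the form a+ b+ c+."""
    match = re.fullmatch(r"(a+)(b+)(c+)", sentence)
    if match is None:
        raise ValueError(f"not derivable from the assumed grammar: {sentence!r}")
    a_val, b_val, c_val = (run_val(group) for group in match.groups())
    ok = (a_val == b_val) and (b_val == c_val)
    return ok, a_val


for sentence, ok, val in SAMPLES:
    assert evaluate(sentence) == (ok, val)
```

Every operand and operator used here appears in the sets `T` and `F` given for this example, and no expression is deeper than the stated maximum tree depth of 2.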

#### 4.2. Example 2

`T = {E.val, T.val, EE[0].val, EE[0].inVal, EE[1].val, EE[1].inVal, #Int.value()}`

`F = {+(int), int Integer.valueOf(String).intValue()}`

`Max tree depth: 1`

`Population size: 1000`

`Elitism: 20%`

`Selection pressure: 50%`

`Mutation probability: 10%`

`(5, val=5)`

`(2+5, val=7)`

`(10+5+8, val=23)`
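The role of the inherited attribute `inVal` can be illustrated with a small recursive-descent evaluator. The sketch assumes the grammar `E → T EE`, `EE → + T EE | ε`, `T → #Int`, and the equations `EE.inVal = T.val`, `EE[1].inVal = EE[0].inVal + T.val`, `EE[0].val = EE[1].val`, `EE.val = EE.inVal` (for `ε`) and `E.val = EE.val`. Both the grammar and the equations are assumptions consistent with the operand set `T` and the three samples, not a statement of what the tool inferred. All function names are hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    """Split the input into integer literals and '+' signs."""
    tokens = re.findall(r"\d+|\+", text)
    if "".join(tokens) != text.replace(" ", ""):
        raise ValueError(f"unexpected characters in {text!r}")
    return tokens


def parse_t(tokens: list[str], pos: int) -> tuple[int, int]:
    """T -> #Int : T.val = Integer.valueOf(#Int.value()).intValue()"""
    if pos >= len(tokens) or not tokens[pos].isdigit():
        raise ValueError("integer expected")
    return int(tokens[pos]), pos + 1


def parse_ee(tokens: list[str], pos: int, in_val: int) -> tuple[int, int]:
    """EE -> + T EE | epsilon, with the running sum passed down as inVal."""
    if pos < len(tokens) and tokens[pos] == "+":
        t_val, pos = parse_t(tokens, pos + 1)
        # EE[1].inVal = EE[0].inVal + T.val ; EE[0].val = EE[1].val
        return parse_ee(tokens, pos, in_val + t_val)
    # EE -> epsilon : EE.val = EE.inVal
    return in_val, pos


def evaluate(text: str) -> int:
    """E -> T EE : EE.inVal = T.val ; E.val = EE.val"""
    tokens = tokenize(text)
    t_val, pos = parse_t(tokens, 0)
    val, pos = parse_ee(tokens, pos, t_val)
    if pos != len(tokens):
        raise ValueError("trailing input")
    return val


for text, expected in [("5", 5), ("2+5", 7), ("10+5+8", 23)]:
    assert evaluate(text) == expected
```

The value flows left to right through `in_val` and returns unchanged through the result, which is the pattern that makes this grammar L-attributed rather than S-attributed.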

#### 4.3. Example 3

`begin right up up end`, is shown in Figure 6. The current position of the robot is adjusted with four commands: `left` (decrease the x coordinate by 1), `right` (increase the x coordinate by 1), `down` (decrease the y coordinate by 1) and `up` (increase the y coordinate by 1). After executing the first command `right`, the robot moved to the position (1,0) (inx=0, iny=0, outx=1, outy=0 in Figure 6). Similarly, after executing the next command `up`, the robot moved to the position (1,1) (inx=1, iny=0, outx=1, outy=1 in Figure 6). Finally, the robot stopped at position (1,2) (see Figure 6).

`T = {START.outx, START.outy, COMMANDS.outx, COMMANDS.outy, COMMANDS.inx, COMMANDS.iny, COMMAND.outx, COMMAND.outy, COMMAND.inx, COMMAND.iny, COMMANDS[1].outx, COMMANDS[1].outy, COMMANDS[1].inx, COMMANDS[1].iny, 0, 1}`

`F = {+(int), -(int)}`

`Max tree depth: 1`

`Population size: 2000`

`Elitism: 20%`

`Selection pressure: 50%`

`Mutation probability: 10%`

`(begin end, outx=0, outy=0)`

`(begin down end, outx=0, outy=-1)`

`(begin up end, outx=0, outy=1)`

`(begin left end, outx=-1, outy=0)`

`(begin right end, outx=1, outy=0)`

`(begin left left left end, outx=-3, outy=0)`

`(begin up left up end, outx=-1, outy=2)`

`(begin left down up right up up end, outx=0, outy=2)`

`(begin up left up end, outx=-1, outy=2)`

`(begin right down right up down end, outx=2, outy=-1)`

`(begin right down down up end, outx=1, outy=-1)`

`(begin left down left up end, outx=-2, outy=0)`
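The robot samples above can be reproduced with a short evaluator that threads the position through the command list. The sketch assumes the grammar `START → begin COMMANDS end`, `COMMANDS → COMMAND COMMANDS | ε`, `COMMAND → left | right | up | down`, with `inx`/`iny` inherited and `outx`/`outy` synthesised. The grammar and the function names are assumptions made for illustration; only the attribute names and the command meanings come from the text.

```python
# Effect of each command on (x, y), as described in the text.
DELTAS = {
    "left": (-1, 0),
    "right": (1, 0),
    "down": (0, -1),
    "up": (0, 1),
}


def command(word: str, inx: int, iny: int) -> tuple[int, int]:
    """COMMAND: out position = in position adjusted by 0 or +/-1."""
    dx, dy = DELTAS[word]
    return inx + dx, iny + dy


def commands(words: list[str], inx: int, iny: int) -> tuple[int, int]:
    """COMMANDS -> COMMAND COMMANDS | epsilon."""
    if not words:
        # epsilon: COMMANDS.outx = COMMANDS.inx, COMMANDS.outy = COMMANDS.iny
        return inx, iny
    # COMMAND.in = COMMANDS[0].in ; COMMANDS[1].in = COMMAND.out
    outx, outy = command(words[0], inx, iny)
    # COMMANDS[0].out = COMMANDS[1].out
    return commands(words[1:], outx, outy)


def start(program: str) -> tuple[int, int]:
    """START -> begin COMMANDS end, with the robot starting at (0, 0)."""
    words = program.split()
    if len(words) < 2 or words[0] != "begin" or words[-1] != "end":
        raise ValueError("program must be wrapped in begin ... end")
    return commands(words[1:-1], 0, 0)


assert start("begin end") == (0, 0)
assert start("begin right up up end") == (1, 2)
assert start("begin left down up right up up end") == (0, 2)
assert start("begin right down right up down end") == (2, -1)
```

Each equation in this sketch uses only `+`, `-`, the constants `0` and `1`, and a single attribute reference, so it fits within the stated maximum tree depth of 1.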

## 5. Conclusions

- The few previous approaches were able to learn only Attribute Grammars with synthesised attributes. This limitation has been overcome in this paper, and we were able to learn Attribute Grammars with both synthesised and inherited attributes. Consequently, whereas the few previous approaches inferred only S-attributed Attribute Grammars, our approach also infers L-attributed Attribute Grammars.
- The search space of all possible semantic equations is enormous; it is quantified in Section 3.
- We have shown that Genetic Programming can be used effectively to explore and exploit the search space and to solve the problem of Semantic Inference successfully.

Attribute | Operands | Operators | P(0) | P(1) | P(2) | SumP |
---|---|---|---|---|---|---|
P1:S.ok | 4 | 3 | 4 | 48 | 8064 | 8116 |
P2:A.val | 2 | 1 | 2 | 4 | 32 | 38 |
P3:A.val | 1 | 1 | 1 | 1 | 3 | 5 |
P4:B.val | 2 | 1 | 2 | 4 | 32 | 38 |
P5:B.val | 1 | 1 | 1 | 1 | 3 | 5 |
P6:C.val | 2 | 1 | 2 | 4 | 32 | 38 |
P7:C.val | 1 | 1 | 1 | 1 | 3 | 5 |
SearchSpace | 55,667,644,000 | | | | | |
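The figures in this table can be reproduced with a short calculation. The recurrence used below is reconstructed from the tabulated values, not quoted from the paper: it assumes every operator is binary, that P(d) counts expression trees of depth exactly d, and that the total search space is the product of the per-equation sums. The function name `expression_counts` is hypothetical.

```python
from math import prod

# (attribute, number of operands, number of operators), copied from the table.
ROWS = [
    ("P1:S.ok", 4, 3),
    ("P2:A.val", 2, 1),
    ("P3:A.val", 1, 1),
    ("P4:B.val", 2, 1),
    ("P5:B.val", 1, 1),
    ("P6:C.val", 2, 1),
    ("P7:C.val", 1, 1),
]


def expression_counts(operands: int, operators: int, max_depth: int) -> list[int]:
    """Return [P(0), ..., P(max_depth)] for binary expression trees.

    A tree of depth exactly d has an operator at the root and two subtrees
    of depth <= d-1, at least one of which has depth exactly d-1.
    """
    per_depth = [operands]  # P(0): a single operand
    below = 0               # number of trees with depth <= d-2
    upto = operands         # number of trees with depth <= d-1
    for _ in range(max_depth):
        exact = operators * (upto ** 2 - below ** 2)
        per_depth.append(exact)
        below, upto = upto, upto + exact
    return per_depth


sums = [sum(expression_counts(n_operands, n_operators, 2))
        for _, n_operands, n_operators in ROWS]
search_space = prod(sums)

assert expression_counts(4, 3, 2) == [4, 48, 8064]
assert sums == [8116, 38, 5, 38, 5, 38, 5]
assert search_space == 55_667_644_000
```

Under these assumptions the calculation matches every row of the table, including the final search-space size.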

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

