Mining Road Traffic Rules with Signal Temporal Logic and Grammar-Based Genetic Programming

Abstract: Traffic systems, where human and autonomous drivers interact, are a very relevant instance of complex systems and produce behaviors that can be regarded as trajectories over time. Their monitoring can be achieved by means of carefully stated properties describing the expected behavior. Such properties can be expressed using Signal Temporal Logic (STL), a specification language for expressing temporal properties in a formal and human-readable way. However, manually authoring these properties is a hard task, since it requires mastering the language and knowing the system to be monitored. Moreover, in practical cases, the expected behavior is not known, but it has instead to be inferred from a set of trajectories obtained by observing the system. Often, those trajectories come devoid of human-assigned labels that can be used as an indication of compliance with expected behavior. As an alternative to manual authoring, automatic mining of STL specifications from unlabeled trajectories would enable the monitoring of autonomous agents without sacrificing human-readability. In this work, we propose a grammar-based evolutionary computation approach for mining the structure and the parameters of an STL specification from a set of unlabeled trajectories. We experimentally assess our approach on a real-world road traffic dataset consisting of thousands of vehicle trajectories. We show that our approach is effective at mining STL specifications that model the system at hand and are interpretable for humans. To the best of our knowledge, this is the first such study on a set of unlabeled real-world road traffic data. Being able to mine interpretable specifications from this kind of data may improve traffic safety, because mined specifications may be helpful for monitoring traffic and planning safety promotion strategies.


Introduction
Autonomous cars are a well-known example of a safety-critical Cyber-Physical System (CPS), meaning that a failure could result in loss of life or in catastrophic consequences for the environment. One of the main challenges for autonomous driving in a real environment consists in ensuring such safety [1]. When many autonomous and human drivers interact in the same environment, achieving safety is even more challenging, due to several factors such as the unpredictability of agents' behavior and scalability.
The explosion of Artificial Intelligence (AI) and Machine Learning permitted the design of very powerful and efficient CPSs. However, it also increased the complexity and opacity of such systems. In real-world systems, where safety plays a crucial role, such opacity and the consequent loss of explainability can be serious issues. Indeed, explainability in AI is itself a prominent goal [2].
In recent years, the formal methods community helped to tackle this problem [3]. Formal methods allow us to describe safety requirements in a formal and rigorous way and to check automatically when such requirements are satisfied. When dealing with data describing how values vary over time, a possibility is to use temporal logic languages. Temporal logic is a logic that comes with specific temporal operators to describe the temporal evolution of a system. In particular, we consider Signal Temporal Logic (STL), a logic well suited to specifying properties over real-world trajectories. For example, given two trajectories that describe the velocity of a car and its distance to the closest car over time, with STL, we can formally specify simple properties such as "travel no faster than 30 km h⁻¹", as well as more complex properties such as "do not exceed 30 km h⁻¹ if, in the last minute, the closest other car has been closer than 30 m". Provided that an STL specification is available, the monitoring of a CPS can be done efficiently using suitable algorithms that can check when the trajectory satisfies the property [4], possibly exploiting distributed processing paradigms [5].
In principle, STL specifications can be designed by the human operator, who should both know the domain, i.e., the involved properties and the expected behavior, and master STL syntax and semantics. In practice, however, designing the right specification that fits the observable system is very hard, because many attributes (and hence many trajectories) should be considered at the same time. A possible solution is to learn the formal specification in an automatic way: given a sufficiently representative set of system trajectories describing the system behavior, some inference technique can be used to synthesize the specification that fits the data. While some approaches for specification inference from data exist (see Section 2 for an overview), the vast majority of them require the user to either (1) specify the structure of the STL specification, while actually inferring only its numerical parameters, or (2) provide both positive and negative example trajectories that are representative of expected and unexpected behaviors. In practical settings, these requirements often become limitations and hamper the applicability of inference techniques.
In this paper, we present a methodology to learn STL specifications from real-world datasets of unlabeled trajectories, i.e., from trajectories for which no human-assigned labels are required that tell if the observations are related to an expected or undesired behavior of the system. Our methodology is based on grammar-based evolutionary optimization, namely Context-Free Grammar Genetic Programming (CFGGP) [6]. We chose this kind of optimization because it is well suited for searching in the space of strings of a language specified by means of a grammar: in fact, the STL language can be specified with a grammar (see Section 3). While designing our methodology, we were driven by two goals: generating STL specifications that (1) fit the data and (2) are human-readable. For achieving the first goal, we designed a fitness function, i.e., a measure of the quality of the candidate specification, that measures the degree of (dis)satisfaction of the trajectories with respect to the specification, to be minimized: intuitively, all the trajectories (which, we remark, are unlabeled) should barely satisfy or dissatisfy the specification. For achieving the second goal, we structurally limited the complexity of the learned specification, by imposing special constraints within the grammar.
We experimentally assess our methodology on a real-world traffic dataset of 5678 unique vehicles tracked over 19,679 frames with information about velocity and position. After extracting the informative attributes from the dataset, we show how the method can learn STL specifications in an automatic and efficient way. The learned specifications provide relevant information concerning the dataset and are indeed human-interpretable.
The rest of this paper is structured as follows. In the next section, we survey the most significant previous research that is relevant to our study. In Section 3, we briefly introduce the background about STL. In Section 4, we formally define the problem that we aim to solve. In Section 5, we describe our proposed solution, which we experimentally evaluate in Section 6. Finally, we draw the conclusion and we sketch possible future lines of research in Section 7.

Related Work
There exist several approaches devoted to learning temporal logic specifications, particularly for STL. We partition them here into two categories: template-based and template-free. While those in the former category rely on a user-provided template formula and focus on estimating parameters for it, the latter also try to learn the formula structure. Our work falls in the template-free category.
Template-based methods cast the STL specification mining problem as an optimization problem in terms of the satisfaction degree ρ of an STL formula (see Section 3). Succinctly, we define ρ as a (real-valued) degree of satisfaction of an STL formula with respect to a trajectory. Bartocci et al. [7] adopted an active learning approach, dependent on a probability distribution over ρ, to query the next point in the parameters space to be evaluated. Bortolussi and Silvetti [8] extended the work cited previously with a statistical approach that emulates the expected value of this probability distribution using Gaussian Process Regression [9]; optimization of the emulation was then performed via the GP-UCB algorithm [10]. In general, these approaches are labeled as Parametric Signal Temporal Logic (PSTL) [11,12].
Although interesting on its own, the applicability of PSTL is sometimes limited, since specifying templates can be hard to start with. As such, template-free methods also attempt to build an optimal structure for the formula. The vast majority of the works in this regard start from a dataset of labeled trajectories, partitioned into positive and negative examples, and try to learn an STL classifier for the data. As an example, Nenzi et al. [13] proposed ROGE (RObustness GEnetic algorithm), a bi-level optimization procedure, which optimized the structure by a genetic algorithm and the parameters using Bayesian Optimization. To the best of our knowledge, it is the only attempt at using an evolutionary algorithm for solving the template-free problem. Others mined STL structure by exploring a directed acyclic graph [14], using a decision-tree oriented approach [15], or employing enumerative solvers [16].
The achievements of the aforementioned approaches have been remarkable, but none of them addressed the problem of mining STL specifications from unlabeled data. In real-world scenarios, it is often the case that labeled data are not available, because labeling is costly. Still, those data do bring some information that could be, in principle, condensed in the form of an STL specification. Some special cases of the unlabeled data case have been considered: Vazquez-Chanlatte et al. [17] and Mohammadinejad et al. [16] addressed unsupervised clustering of time-series data using PSTL, i.e., they learned just the parameters of template formulas.
To the best of our knowledge, the only work learning both structure and parameters of a formula from unlabeled data is [18], where the authors proposed a heuristic for sequentially building more complex formulas. This seems a very interesting approach, but its applicability to large datasets is unclear: the experimental evaluation of the cited paper is based on a single trajectory and the results concerning the learning of the structure of the formula are not clear. Unfortunately, the code is not available and this hampers reproducibility and comparisons.
A different approach for designing automatic rules for road traffic has been proposed by Medvet et al. [19]. Similar to this study, the authors relied on grammar-based genetic programming and did not use labeled data. Differently from here, they did not use STL as the syntax for the rules and they optimized rules with the goal of maximizing the efficiency and safety of (simulated) road traffic rather than for describing real data of road traffic.

Background: Signal Temporal Logic
Signal Temporal Logic (STL) is a formal language to specify behaviors of dynamical systems through logic formulas. Let $X$ be a set of trajectories $x : T \to \mathbb{R}^p$ for every $x \in X$, with $T \subseteq \mathbb{R}_{\geq 0}$ a time domain. We say that each trajectory is a $p$-dimensional signal of real-valued variables, and we denote by $x_i(t)$ the projection on the $i$-th coordinate of $x = (x_1, \ldots, x_p)$ at time $t \in T$. We now introduce the syntax of STL, the set of rules used for constructing specifications in this language.
Definition 1 (STL syntax). We define the syntax of an STL formula $\varphi$ using the following grammar:
$$\varphi := \top \mid \mu \mid \neg\varphi \mid \varphi_1 \wedge \varphi_2 \mid \varphi_1 \, S_{[t_1,t_2]} \, \varphi_2 \mid O_{[t_1,t_2]} \varphi \mid H_{[t_1,t_2]} \varphi$$
where $\top$ is the Boolean constant true; $[t_1, t_2]$, with $t_1, t_2 \in T$, is a time interval such that $t_1 < t_2$; $\mu$ is an atomic proposition of the form $y(t) \sim c$, with $y : \mathbb{R}^p \to \mathbb{R}$ projecting the $p$-dimensional signal onto a single variable, $\sim \in \{<, >\}$, and $c \in \mathbb{R}^+$ being a threshold (in practice, an inequality over one variable of the signal); $\neg$ and $\wedge$ are the usual Boolean connectives; $S_{[t_1,t_2]}$ is the Since temporal modality; $O_{[t_1,t_2]}$ is the Once temporal modality; and $H_{[t_1,t_2]}$ is the Historically temporal modality.
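The semantics of an STL formula $\varphi$ allows us to tell if and to which degree a trajectory $x$ satisfies the formula $\varphi$ at time $t$. We now define two kinds of semantics: with the Boolean semantics, the satisfaction assumes Boolean values (satisfies, does not satisfy); with the quantitative semantics, the satisfaction assumes real values [20,21].
Definition 2 (STL Boolean Semantics). For the Boolean semantics, we write $(x, t) \models \varphi$ if $\varphi$ holds for trajectory $x$ at time $t$. If $\varphi$ is an atomic proposition $\mu$, then $(x, t) \models \varphi$ if and only if $\mu$ is true. The semantics of $\neg\varphi$ and $\varphi_1 \wedge \varphi_2$ is trivial and the semantics of Since is defined as follows:
$$(x, t) \models \varphi_1 \, S_{[t_1,t_2]} \, \varphi_2 \iff \exists t' \in [t - t_2, t - t_1] \text{ such that } (x, t') \models \varphi_2 \text{ and } \forall t'' \in (t', t],\ (x, t'') \models \varphi_1.$$
In other words, we say that $\varphi_1 \, S_{[t_1,t_2]} \, \varphi_2$ is satisfied at time $t$ if $\varphi_2$ occurs at some point in $[t_1, t_2]$ and $\varphi_1$ holds continuously since then. The other temporal operators are defined based on Since: Once as $O_{[t_1,t_2]} \varphi := \top \, S_{[t_1,t_2]} \, \varphi$ and Historically as $H_{[t_1,t_2]} \varphi := \neg O_{[t_1,t_2]} \neg\varphi$.
Definition 3 (STL Quantitative Semantics). The quantitative satisfaction function $\rho$ returns a value $\rho(\varphi, x, t) \in \mathbb{R} \cup \{-\infty, +\infty\}$ quantifying the robustness degree of the satisfaction of the formula $\varphi$ by the trajectory $x$ at time $t$. It is defined recursively as follows:
$$
\begin{aligned}
\rho(\top, x, t) &= +\infty \\
\rho(\mu, x, t) &= \begin{cases} y(t) - c & \text{if } \sim \text{ is } > \\ c - y(t) & \text{if } \sim \text{ is } < \end{cases} \\
\rho(\neg\varphi, x, t) &= -\rho(\varphi, x, t) \\
\rho(\varphi_1 \wedge \varphi_2, x, t) &= \min\big(\rho(\varphi_1, x, t), \rho(\varphi_2, x, t)\big) \\
\rho(\varphi_1 \, S_{[t_1,t_2]} \, \varphi_2, x, t) &= \max_{t' \in [t - t_2, t - t_1]} \min\Big(\rho(\varphi_2, x, t'), \min_{t'' \in (t', t]} \rho(\varphi_1, x, t'')\Big)
\end{aligned}
$$
Similar to the Boolean semantics, the Once and Historically temporal operators are defined based on Since.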
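The sign of $\rho(\varphi, x, t)$ provides the link with the standard Boolean semantics. It holds that $\rho(\varphi, x, t) > 0$ implies $(x, t) \models \varphi$ and $\rho(\varphi, x, t) < 0$ implies $(x, t) \not\models \varphi$, whereas $\rho(\varphi, x, t) = 0$ is a borderline case, and the truth of $\varphi$ cannot be assessed from the robustness degree alone.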
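To make the quantitative semantics concrete, the following Python sketch evaluates the robustness of past-time formulas on a uniformly sampled, one-dimensional signal. It is only an illustration under stated assumptions: time is discrete, interval bounds are expressed in samples, and windows reaching before the first sample are truncated. The function names (`rho_atom`, `rho_once`, `rho_historically`, `rho_since`) are hypothetical and are not the API of the MoonLight monitoring tool.

```python
from typing import Callable, List

Signal = List[float]
Rho = Callable[[Signal, int], float]  # robustness of a formula at sample t

INF = float("inf")


def rho_atom(op: str, c: float) -> Rho:
    """Atomic proposition y(t) ~ c, with ~ in {'<', '>'}."""
    if op not in ("<", ">"):
        raise ValueError("op must be '<' or '>'")
    return lambda x, t: (x[t] - c) if op == ">" else (c - x[t])


def rho_not(phi: Rho) -> Rho:
    return lambda x, t: -phi(x, t)


def rho_and(phi1: Rho, phi2: Rho) -> Rho:
    return lambda x, t: min(phi1(x, t), phi2(x, t))


def _window(t: int, t1: int, t2: int) -> range:
    # Samples u with t - t2 <= u <= t - t1, truncated at the trajectory start
    # (the truncation is an assumption of this sketch).
    return range(max(0, t - t2), t - t1 + 1)


def rho_since(phi1: Rho, phi2: Rho, t1: int, t2: int) -> Rho:
    """phi1 S_[t1,t2] phi2: phi2 held at some past u, phi1 held ever since."""
    def rho(x: Signal, t: int) -> float:
        best = -INF
        for u in _window(t, t1, t2):
            held = min((phi1(x, v) for v in range(u + 1, t + 1)), default=INF)
            best = max(best, min(phi2(x, u), held))
        return best
    return rho


def rho_once(phi: Rho, t1: int, t2: int) -> Rho:
    return lambda x, t: max((phi(x, u) for u in _window(t, t1, t2)), default=-INF)


def rho_historically(phi: Rho, t1: int, t2: int) -> Rho:
    return lambda x, t: min((phi(x, u) for u in _window(t, t1, t2)), default=INF)


velocity = [0.2, 0.4, 0.6, 0.5]            # toy normalized velocity samples
slow = rho_atom("<", 0.55)                 # "vel < 0.55"
always_slow = rho_historically(slow, 0, 3)
print(always_slow(velocity, 3))            # negative: violated at the third sample
```

In this toy run the Historically formula has negative robustness because one past sample exceeds the threshold, while `rho_once(slow, 0, 3)` would be positive on the same data.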
In practical settings, systems are monitored for a given, finite amount of time, and, as a result, trajectories have a limited time span. We define as |x| the length of a trajectory, i.e., its number of samples. For the sake of this study, monitoring an STL formula ϕ over a trajectory x is constrained to [0, |x| − 1] ⊂ T, a subset of the time domain.
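When talking about temporal formulas, the concept of necessary length is of importance [20]. The necessary length of a formula $\varphi$ (let it be $\|\varphi\|$) is defined recursively as:
$$
\begin{aligned}
\|\top\| &= \|\mu\| = 0 \\
\|\neg\varphi\| &= \|\varphi\| \\
\|\varphi_1 \wedge \varphi_2\| &= \max(\|\varphi_1\|, \|\varphi_2\|) \\
\|\varphi_1 \, S_{[t_1,t_2]} \, \varphi_2\| &= \max(\|\varphi_1\|, \|\varphi_2\|) + t_2
\end{aligned}
$$
Intuitively, the necessary length is the shortest trajectory length such that $(x, t) \models \varphi$ is well-defined. For example, the formula $\varphi_1 \, S_{[0,10]} \, \varphi_2$ cannot be evaluated on trajectories shorter than 10 (assuming that $\|\varphi_1\| > 10$ and $\|\varphi_2\| > 10$), since this would imply looking at time instants that are not part of the trajectory.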
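As a small worked example, the necessary length can be computed by a recursive visit of the formula. The sketch below assumes the usual recursive rule (atoms contribute 0, Boolean connectives take the maximum of their operands, and each temporal operator adds the upper bound of its interval) and a hypothetical encoding of formulas as nested tuples; neither the encoding nor the function name comes from the original implementation.

```python
def necessary_length(phi):
    """Necessary length of a formula encoded as nested tuples (hypothetical encoding).

    ("true",), ("atom", attr, op, c), ("not", f), ("and", f, g),
    ("once", (t1, t2), f), ("historically", (t1, t2), f), ("since", (t1, t2), f, g)
    """
    kind = phi[0]
    if kind in ("true", "atom"):
        return 0
    if kind == "not":
        return necessary_length(phi[1])
    if kind == "and":
        return max(necessary_length(phi[1]), necessary_length(phi[2]))
    if kind in ("once", "historically"):
        _, (_t1, t2), sub = phi
        return necessary_length(sub) + t2
    if kind == "since":
        _, (_t1, t2), left, right = phi
        return max(necessary_length(left), necessary_length(right)) + t2
    raise ValueError(f"unknown operator: {kind}")


# Two nested temporal operators, both over the interval [0, 99].
nested = ("once", (0, 99), ("historically", (0, 99), ("atom", "vel", "<", 0.3)))
print(necessary_length(nested))  # 198
```

Under this rule, two nested operators with interval [0, 99] need 99 + 99 = 198 samples.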

Problem Statement
We consider systems described by real-valued attributes and a set of trajectories that describe the way these attributes vary over time. We aim to mine specifications that (1) describe such trajectories (2) in a way that the specifications are readable and interpretable for a human.
Formally, let $X = \{x_1, \ldots, x_n\}$ be a set of trajectories gathered from a system described by attributes $A = \{a_1, a_2, \ldots, a_{|A|}\}$. Let $\Phi$ be the space of all possible STL formulas defined over $A$. We aim at finding a $\varphi \in \Phi$ that is both human-readable and describes $X$. More specifically, $\varphi$ should be tight with respect to $X$, i.e., the robustness value $|\rho(\varphi, x_i, t)|$ should be as close as possible to zero for all $x_i \in X$ and $t$. From another point of view, small perturbations on tight formulas $\varphi$ should result in an overall increase in robustness for some trajectories of $X$; i.e., some trajectories could be described by a tighter formula $\varphi' \neq \varphi$.
Tightness is of fundamental importance when no labels are provided; in fact, the satisfaction of a robustness metric can be trivially maximized by "pushing" the parameters towards the boundaries of the parameter space, resulting in rules that are of no relevance. For example, if a < 0 is satisfied by all trajectories, then a < 1000 will be satisfied as well and have a much greater degree of robustness.
Concerning human-interpretability, a formal, widely accepted definition for STL formulas does not exist. Indeed, the lack of such a definition holds for many other kinds of Machine Learning modules. Moreover, it is acknowledged that human-interpretability is not, itself, universal: different subjects might perceive the same model as differently interpretable [22]. In this work, we circumvent the problem of defining interpretability with a practical approach: we force our method to search in a subset of Φ containing structurally simple formulas, where both the overall size of the formula and the maximum degree of nesting of temporal operators are limited. The rationale of the latter constraint is in the fact that the interpretability of symbolic models is affected differently by different kinds of components [23].
We remark that our problem is more complex than the mere specification mining of a template formula where the structure of the formula is already fixed (see Section 2), which searches only in the parameter space of the formula. In the easier case, the user has to provide a template formula φ, and the problem is a numerical optimization problem; i.e., the solution has to be found in $\mathbb{R}^m$, where $m$ is the number of numerical parameters in φ. In our case, no burden is on the user, but the system is required to search in $\Phi$ rather than $\mathbb{R}^m$. In Section 5, we detail the methodology we employed to address this problem statement.

Methodology
Since the STL syntax can be defined by means of a context-free grammar (CFG), we rely on CFGGP [6], a grammar-based version of GP [24]. In CFGGP, candidate solutions are represented as derivation trees of a grammar $G = (N, T, s_0, R)$, where $N$ is the set of non-terminal symbols, $T$ is the set of terminal symbols (with $T \cap N = \emptyset$), $s_0 \in N$ is the starting symbol, and $R$ is the set of derivation rules. Each derivation rule describes how a non-terminal symbol may be replaced by a sequence of symbols, either terminal or non-terminal. A derivation tree is a tree where nodes are symbols of the grammar: leaf nodes are terminal symbols, and non-leaf nodes are non-terminal symbols. The children of each node match one of the derivation rules for the corresponding non-terminal symbol. The string of the language defined by the grammar corresponding to a given derivation tree is the sequence of the leaves of the tree. Note that a derivation tree is not an abstract syntax tree: the former is in general deeper than the latter, for the same formula. Figure 1 shows an example of a derivation tree for the CFG of Figure 2.
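The relation between a derivation tree and the string it encodes can be sketched in a few lines of Python. The `Node` class, the helper names, and the toy tree below are hypothetical: the tree is built from a simplified grammar fragment used only for illustration, not from the grammar of Figure 2.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    symbol: str                      # terminal or non-terminal symbol
    children: List["Node"] = field(default_factory=list)


def leaves(node: Node) -> List[str]:
    """Left-to-right sequence of terminal symbols, i.e., the encoded string."""
    if not node.children:
        return [node.symbol]
    return [s for child in node.children for s in leaves(child)]


def depth(node: Node) -> int:
    """Number of edges on the longest path from this node to a leaf."""
    if not node.children:
        return 0
    return 1 + max(depth(child) for child in node.children)


# Toy derivation tree for the string "vel < 3 0" (illustrative grammar fragment).
tree = Node("<formula>", [
    Node("<atom>", [
        Node("<attr>", [Node("vel")]),
        Node("<comp>", [Node("<")]),
        Node("<num>", [
            Node("<digit>", [Node("3")]),
            Node("<digit>", [Node("0")]),
        ]),
    ]),
])

print(" ".join(leaves(tree)))  # vel < 3 0
print(depth(tree))             # 4
```

The example also shows why a derivation tree is deeper than the corresponding abstract syntax tree: the single atom "vel < 0.30" already requires four levels of derivation.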
We use an improved version of CFGGP that promotes diversity in the population. The lack of diversity in the population may result in premature convergence towards a local optimum [25], in particular when the search space is discrete, as in CFGGP [26]. In this work, we promote diversity by simply enforcing the re-application of the genetic operator whenever a generated individual is already part of the population.

Evolutionary Algorithm
Given a CFG G and a fitness function f : L(G) → R, CFGGP works as shown in Algorithm 1. After the initialization of the population P, CFGGP repeats n gen times the following three steps.

1. It builds the offspring population $P'$, with $|P'| = n_{\text{pop}}$, by iteratively selecting one (mutation, with probability $1 - p_{\text{xover}}$) or two (crossover, with probability $p_{\text{xover}}$) parents chosen with tournament selection of size $n_{\text{tour}}$ and then applying the genetic operator. If the resulting solution $\varphi_c$ is already part of the offspring population $P'$ or of the parent population $P$, a new solution is generated, and the process is repeated for a maximum number of $n_{\text{atts}}$ attempts; otherwise, $\varphi_c$ is added to $P'$ and its fitness $f(\varphi_c)$ is computed.
2. It merges the parent and offspring populations $P$ and $P'$.
3. It shrinks the resulting new population, until its size is $n_{\text{pop}}$, by iteratively removing the worst solution.
The initial population is built with the ramped half-and-half method [27]. Let a range $\{d_{\min}, \ldots, d_{\max}\}$ for the depth of the derivation trees be given and let $n_{\text{pop}}$ be the number of trees to be generated. For each $d$ in the range, we build $k$ random approximately full derivation trees (i.e., where each leaf node is at depth $d$) and $k$ random trees with the deepest leaf at depth $d$, with $k = \frac{n_{\text{pop}}}{2(d_{\max} - d_{\min} + 1)}$. We write "approximately" because it is not possible, in general, to build a derivation tree of a grammar $G$ where each leaf is exactly at depth $d$. This procedure ensures that the size, and hence the complexity, of the generated formulas is evenly distributed in a predefined range.
The genetic operators are defined over the space of derivation trees of the grammar G. We used the standard CFGGP mutation and crossover. The former "replaces" a random subtree of the derivation tree with a randomly generated subtree that is appropriate according to G. The crossover "replaces" a random subtree of one parent with an appropriate random subtree of the other parent. In both cases, it is ensured that the resulting derivation tree is at most d max deep.

Algorithm 1: The EA for the optimization, function evolve().
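The three steps described above can be sketched in Python as follows. This is a minimal illustration, not the authors' Java implementation: individuals are assumed to be hashable, the operators `init`, `mutate`, and `crossover` are supplied by the caller, and we assume that a duplicate child is accepted once the $n_{\text{atts}}$ attempts are exhausted (the text does not state what happens in that case).

```python
import random


def tournament(population, fitness, n_tour, rng):
    """Return the fittest (lowest f) among n_tour randomly drawn individuals."""
    return min(rng.sample(population, n_tour), key=lambda ind: fitness[ind])


def evolve(init, mutate, crossover, f, n_pop, n_gen, n_tour, p_xover, n_atts, rng):
    population = init(n_pop, rng)
    fitness = {ind: f(ind) for ind in population}
    for _ in range(n_gen):
        # Step 1: build the offspring, retrying when a child is a duplicate.
        offspring = []
        while len(offspring) < n_pop:
            for _ in range(n_atts):
                if rng.random() < p_xover:
                    child = crossover(
                        tournament(population, fitness, n_tour, rng),
                        tournament(population, fitness, n_tour, rng),
                        rng,
                    )
                else:
                    child = mutate(tournament(population, fitness, n_tour, rng), rng)
                if child not in fitness:  # not in parents nor in offspring so far
                    break
            # Assumption: after n_atts failed attempts the duplicate is kept.
            if child not in fitness:
                fitness[child] = f(child)
            offspring.append(child)
        # Step 2: merge parents and offspring.
        merged = sorted(population + offspring, key=lambda ind: fitness[ind])
        # Step 3: shrink back to n_pop by dropping the worst solutions.
        population = merged[:n_pop]
        fitness = {ind: fitness[ind] for ind in population}
    return population[0]
```

Sorting the merged population and truncating it is equivalent to iteratively removing the worst solution; because parents compete with their offspring, the best fitness never worsens from one generation to the next.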

Fitness Function
To achieve the goals of Section 4, we use as fitness function:
$$f(\varphi) = \frac{1}{|X|} \sum_{x \in X} |\rho(\varphi, x, t_f)|$$
which computes the average absolute quantitative robustness of an individual $\varphi$ over the dataset $X$, with the robustness evaluated at the final time $t_f$ of each trajectory. Minimizing this quantity is consistent with the notion of achieving a tight evaluation with respect to the trajectories in $X$. Tightness is of importance when no labels are provided; in fact, the satisfaction of a robustness metric can be trivially maximized by "pushing" the parameters towards the boundaries of the parameter space, resulting in rules that are of no relevance. For example, if a < 0 is satisfied by all trajectories, then a < 1000 will be satisfied as well and have a much greater degree of robustness. By minimizing the sum of the absolute values, $f$ achieves a tight evaluation as it rewards individuals $\varphi$ having robustness values as close as possible to zero. Finally, we divide by the total number of trajectories so that, for normalized data, $f \in [0, 1]$, with 0 corresponding to a formula that perfectly fits all of the trajectories, and 1 corresponding to a formula that is satisfied or violated by all of the trajectories with the largest possible margin.
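The effect of averaging absolute robustness values can be illustrated with a short Python sketch. It assumes that each trajectory is a list of normalized samples of a single attribute and that robustness is evaluated at the last sample; the helper names `fitness` and `less_than` are hypothetical.

```python
def fitness(rho, trajectories):
    """Average absolute robustness of a formula over a set of trajectories."""
    if not trajectories:
        raise ValueError("at least one trajectory is required")
    return sum(abs(rho(x)) for x in trajectories) / len(trajectories)


def less_than(c):
    """Robustness of the atom `a < c`, evaluated at the last sample (assumption)."""
    return lambda x: c - x[-1]


# Three toy trajectories of a normalized attribute a.
X = [[0.30, 0.10], [0.25, 0.20], [0.40, 0.15]]

tight = fitness(less_than(0.25), X)  # threshold close to the observed values
loose = fitness(less_than(1.00), X)  # threshold pushed to the boundary

print(tight < loose)  # True
```

Both atoms are satisfied by every trajectory, but the loose one has a much larger average margin and is therefore penalized by the fitness.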

Grammar for STL Formula Structures
We need to define a grammar G for the language of STL formulas. G must be customizable for the considered problem, i.e., for its attributes A, and must allow generating formulas along with appropriate values as numerical parameters.
In order to favor the building of human-readable formulas, we build the grammar to explicitly limit the depth of nesting of temporal operators. We remark that the overall size of STL formulas is limited by the value of $d_{\max}$ used by CFGGP (in the operators and in the initialization of the population). However, we believe that posing a further limit on the composition of the temporal operators may make the STL formulas more readable, and not just smaller. Our belief is corroborated by the findings of [23] for mathematical expressions: some operators, such as log and sin, make the expressions less interpretable than others, e.g., + and ÷. Figure 2 shows the grammar for STL formula structures with limited nesting of the temporal operators. The figure adopts the common Backus-Naur form: the non-terminal symbols are enclosed in angle brackets, whereas the terminal symbols are shown as literals (¬, ∧, . . . , r, $a_1$, $a_2$, . . . , <, >, 0, . . . , 9); for each non-terminal, derivation rules are separated by |; the starting symbol is the topmost non-terminal, i.e., $s_0 = \langle \mathrm{formula} \rangle$. The terminals $a_1, a_2, \ldots$, derived from the non-terminal $\langle \mathrm{attr} \rangle$, represent the attributes of the problem at hand: in this way, the grammar is tailored to a specific problem. For brevity, we express some of the non-terminals using a parameter $i$ that represents the nesting level. The only derivation rule that increases $i$ is the one for $\langle \mathrm{temp}_i \rangle$, which represents (partial) formulas with temporal operators. The limit to nesting is enforced by the parametric definition of the derivation rule of $\langle \mathrm{formula}_i \rangle$, which does not expand to $\langle \mathrm{temp}_i \rangle$ if $i \geq i_{\max}$. In this study, we set the maximum nesting to $i_{\max} = 3$. This means that CFGGP operates on a grammar $G$ that is the realization of the grammar of Figure 2 with $i_{\max} = 3$ and a given set of attributes $A$.
$\langle \mathrm{formula} \rangle ::= \langle \mathrm{formula}_1 \rangle$
Figure 2. The CFG for describing STL formula structures. Non-terminal symbols are enclosed in angle brackets: the topmost non-terminal symbol, $\langle \mathrm{formula} \rangle$, is the starting symbol $s_0$ of the grammar. The derivation rules for the symbols $\langle \mathrm{formula}_i \rangle$, $\langle \mathrm{logic}_i \rangle$, $\langle \mathrm{temp}_i \rangle$ are parametric on $i$, which represents the nesting level. The derivation rule for $\langle \mathrm{attr} \rangle$ is the one that makes the grammar tailored to a given system with attributes $A = \{a_1, a_2, \ldots, a_{|A|}\}$.
When mapping a derivation tree into the corresponding STL formula, we apply the following adjustments concerning numerical parameters. To map a non-terminal $\langle \mathrm{interval} \rangle$ into the corresponding time interval, we map the first $\langle \mathrm{num} \rangle$ into the corresponding integer in {0, . . . , 99} and set it to be the start of the interval. We then map the second $\langle \mathrm{num} \rangle$ into the corresponding integer in {0, . . . , 99} and add it to the start to obtain the end of the interval. As such, we avoid any issue arising from intervals having a start greater than or equal to the end. Moreover, we remark that, since there can be at most two nested temporal operators, the maximum necessary length of the formula will be 198, corresponding to the necessary length of a formula with two nested temporal operators with intervals [0, 99]. When mapping a non-terminal $\langle \mathrm{atom} \rangle$, we divide the integer obtained from its $\langle \mathrm{num} \rangle$ (the numeric constant in the atomic proposition) by 100. As a result, numeric constants lie in [0, 1], and, for normalized data, we can express all possible thresholds.
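The mapping of numerical parameters can be sketched as follows. The sketch assumes that each ⟨num⟩ expands to exactly two decimal digits (which would yield integers in {0, ..., 99}); this detail, like the function names, is an assumption made for illustration.

```python
def digits_to_int(tens: int, units: int) -> int:
    """Integer in {0, ..., 99} encoded by two decimal digits (assumed encoding)."""
    if not (0 <= tens <= 9 and 0 <= units <= 9):
        raise ValueError("digits must be in 0..9")
    return 10 * tens + units


def map_interval(first_num, second_num):
    """First <num> is the start; second <num> is added to it to get the end."""
    start = digits_to_int(*first_num)
    end = start + digits_to_int(*second_num)
    return start, end


def map_threshold(num) -> float:
    """Numeric constant of an atomic proposition, scaled for normalized data."""
    return digits_to_int(*num) / 100


print(map_interval((1, 2), (3, 4)))  # (12, 46)
print(map_threshold((3, 0)))         # 0.3
```

Because the second number is added to the first, the end of the interval can never precede its start, whatever digits the evolutionary search produces.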

Experimental Evaluation
Considering the goals described in Section 4, we aim at answering the following research questions:

RQ1 Can we mine specifications that describe the input unlabeled trajectories?
RQ2 Are the mined specifications readable and interpretable for a human?
We consider a formula to describe the dataset well if it tightly fits the pool of trajectories. To this end, we verify whether the fitness f of the learned formula is as close as possible to 0.0. We say that a formula is readable and interpretable for a human if it is parsimonious; to this end, we verify whether the size of a formula (number of nodes of the derivation tree), |ϕ|, is reasonable. Moreover, we also verify whether a formula is easily understandable by a human by manually inspecting and reporting it.
To answer the research questions according to these definitions, we ran an experimental campaign on real-world data of road traffic. We performed 10 evolutionary runs with different random seeds. We used the same parameter values for all the runs and set n pop = 500, n gen = 50, n tour = 5, p xover = 0.8, n atts = 100, d min = 1, and d max = 12.
We executed the experiments on an HPC cluster with nodes equipped with 2 × 18 cores based on 2.30 GHz Intel Xeon E5-2697 v4 (Broadwell) and with 128 GB RAM. Fitness evaluations were parallelized across cores; evolutionary runs were parallelized across nodes. We implemented the software for the experiments in the Java programming language and made it publicly available at https://github.com/pigozzif/STLRulesEvolutionaryInferenceNoClass (accessed on 1 November 2021). The project employs the monitoring tool MoonLight (https://github.com/MoonLightSuite/MoonLight, accessed on 1 November 2021) [28] and the evolutionary framework JGEA (https://github.com/ericmedvet/jgea, accessed on 1 November 2021).

Data
The dataset used in this study [29] contains a total of 5678 unique vehicles tracked over 19,679 frames. For each vehicle and each frame, it contains the position of the vehicle (lateral and longitudinal offsets with respect to a reference position), its velocity, its size (width and length), and a lane identifier. All these attributes have been extracted by the creators of the dataset by means of image processing and computer vision techniques: we refer the reader to [29] for more details. Figure 3 provides a graphical representation of the information contained in a frame of the dataset.

Data Processing
The aim of the data processing step is to extract useful attributes. Chiefly, we want attributes that (1) are meaningful for road settings other than the one considered in this study and (2) effectively describe the phenomenon at hand, i.e., capture measurements that are relevant for monitoring road traffic. The first point discards attributes such as the lane identifier and the position with respect to the reference point, since they would be of no interest for roads with a different topology and number of lanes. At the same time, we want attributes that are relative to the vehicle and not to the setting. For example, positions and coordinates should be relative, not absolute. As a result, formulas are more readable as they employ attributes that are immediately comprehensible. In the following, we detail how we extracted additional attributes from the ones reported in the original dataset.
A very relevant set of attributes is the set of distances from the nearest neighbors of each vehicle. Intuitively, such attributes are relevant for drivers and allow formulas such as "keep a safety distance of at least 10 m from the closest front vehicle" to be expressed clearly and concisely. To formalize this, we partition the space surrounding each vehicle as shown in Figure 4, and we find the closest vehicle in each of the eight regions. We thus consider eight new attributes, namely E, SE, S, SW, W, NW, N, and NE. To efficiently compute them, we used a kd-tree [30], a spatial data structure. Given a query point and a set of k-dimensional points, it allows for logarithmic-time (in the number of points, on average) look-up of nearest neighbors, whereas a brute-force search takes linear time per query and, hence, quadratic time when every point is queried. For us, points are cars and are bi-dimensional (lateral and longitudinal offsets with respect to the reference position). Given that construction and update both take linear time (in the number of points) for a kd-tree, we built a separate tree for every frame instead of updating all cars' positions.
Finally, after having found the nearest neighbors for every car and for every frame, we trimmed neighbors that might not be sensed in a real-world setting. In particular, we set to −∞ the distances above 485 ft. Moreover, we limited the neighbor search to the current and adjacent lanes of a vehicle, as we assume that drivers do not care about vehicles in far lanes.
To summarize, we built a dataset with velocity (vel) and eight distances (E, SE, S, SW, W, NW, N, NE) as attributes, which together form the set of attributes A mentioned in Section 4. On the other hand, width and length were discarded.
We do not consider trajectories longer than $t_f = 20$ s, since that could result in specifications requiring agents (both human and automated) to consider an impractically long behavior history. To accomplish this, we partitioned the trajectories as follows. Let $|x|$ be the number of samples for trajectory $x$. For each trajectory $x \in X$, we split it into a new set of trajectories by sliding a window of size $n$ over it, with an overlap of $\frac{n}{2}$ samples, obtaining approximately $\frac{2|x|}{n}$ new trajectories. The resulting pool of trajectories shares the same length $n$, which must be chosen to reflect a sensible interval of monitoring for an autonomous agent. Thus, we set $n = 200$, corresponding to 20 s for this dataset. We remark that, by fixing $n = 200$, all the new trajectories will have a size (number of samples) of 200; given that, with our grammar, the maximum necessary length of the formula is 198 (see Section 5.3), no issue regarding the necessary length of formulas arises.
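The windowing step can be sketched in Python as below. The sketch assumes that a trajectory is a list of samples and that any trailing samples that do not fill a complete window are discarded; the latter is an assumption, since the text does not say how incomplete windows are handled.

```python
def split_trajectory(x, n=200):
    """Slide a window of n samples over x with an overlap of n // 2 samples."""
    step = n // 2
    # Only complete windows are kept (assumption of this sketch).
    return [x[start:start + n] for start in range(0, len(x) - n + 1, step)]


trajectory = list(range(1000))   # toy trajectory with |x| = 1000 samples
windows = split_trajectory(trajectory, n=200)

print(len(windows))              # 9, close to 2|x|/n = 10
print(windows[1][0])             # 100: each window starts n/2 samples after the previous one
```

With |x| = 1000 and n = 200, the exact count is 2|x|/n − 1 = 9, which matches the "approximately 2|x|/n" estimate.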
Finally, as a consequence of our grammar shown in Figure 2, and to avoid favoring attributes with shorter ranges, we normalize attributes to [0, 1] by subtracting the minimum and dividing by the range.
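As a minimal sketch of this min-max normalization and its inverse, the helpers below are hypothetical names, not taken from the released software. They assume finite attribute values, so infinite distances would have to be handled separately.

```python
def normalize(values):
    """Map values to [0, 1] by subtracting the minimum and dividing by the range.

    Returns the normalized values together with the minimum and the range,
    which are needed to map constants back to the original units.
    Assumption of this sketch: all values are finite.
    """
    lo = min(values)
    span = max(values) - lo
    if span == 0:
        # A constant attribute carries no information; map it to 0.0.
        return [0.0 for _ in values], lo, span
    return [(v - lo) / span for v in values], lo, span


def denormalize(u, lo, span):
    """Inverse map: multiply by the range and add the minimum."""
    return u * span + lo


# Usage with made-up velocities.
scaled, lo, span = normalize([10.0, 20.0, 30.0])
threshold = denormalize(0.5, lo, span)
```

Keeping the minimum and range alongside the scaled values makes it straightforward to report thresholds of a mined formula in physical units.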

RQ1: Solutions that are Effective
We ran an experimental campaign with the aforementioned parameters. To evaluate whether our approach is effective, we (1) verify whether it evolved tight formulas, and (2) compare the fitness f of the best individuals with that of randomly initialized formulas. For the former, Figure 5 plots the histogram of the distribution of ρ(ϕ, x, t) for the best individuals ϕ found in each run (i.e., the individual having the lowest fitness f at the last generation) and for all the car trajectories x. For the latter, Figure 6 reports the median ± standard deviation of the fitness f over the course of evolution for the best individuals found in each run. Table 1 summarizes (with median ± standard deviation) the results in terms of fitness f and |ϕ| for the best individual found in each run, as well as evolution time (in seconds).
Recall from Section 4 that our goal is to mine formulas that are tight, since tightness is of fundamental importance when no labels are provided. Formulas are tight when robustness values are as close to zero as possible, in particular |ρ(ϕ, x_i, t_f)| < ε, ∀i, with ε a small quantity that depends on the system at hand. In other words, formulas must fit the pool of trajectories at hand in such a way that small perturbations make the robustness value greater than ε on some trajectory. By looking at Figure 5, we remark that the mined formulas are indeed tight. Let a value ε be fixed. The best individuals produce robustness values that fit into a segment centered in 0.0 and of length 2ε; that is, they lie in [−ε, +ε]. An adequate value for ε can then be chosen, e.g., ε = 0.25, and checked against Figure 5.
As a further confirmation that our methodology actually learns tight formulas, we also show that learned formulas are significantly better than random formulas. As can be seen from Figure 6, fitness f improves over the course of evolution and settles into a (local) optimum. Table 1 corroborates these observations.
Considering that the initial population is composed entirely of randomly initialized formulas (see Section 5.1), this observation points to the fact that, in terms of fitness f (and, thus, of tightness), the best individuals are clearly better than random formulas.
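The tightness criterion discussed in this subsection can be sketched as a simple check on precomputed robustness values. The function names and the sample values are hypothetical, and computing the robustness ρ itself is outside the scope of this sketch.

```python
def is_tight(robustness, eps):
    """True if every robustness value lies strictly within (-eps, +eps)."""
    return all(abs(r) < eps for r in robustness)


def fraction_within(robustness, eps):
    """Share of trajectories whose robustness lies strictly within (-eps, +eps)."""
    return sum(abs(r) < eps for r in robustness) / len(robustness)


# Usage with made-up robustness values, one per trajectory, and eps = 0.25.
values = [0.10, -0.20, 0.05, -0.30]
tight = is_tight(values, eps=0.25)
share = fraction_within(values, eps=0.25)
```

Reporting the share alongside the strict check is a design choice of this sketch: on real data a few outlying trajectories may violate the bound, and the share shows how close a formula comes to being tight.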
In summary, our methodology succeeds in mining specifications that are effective and tight with respect to the pool of trajectories.

RQ2: Specifications that are Readable and Interpretable for a Human
As far as readability is concerned, we found the mined specifications to be readable and interpretable by a human. To provide deeper insights, Figure 7 plots the histogram of the frequency distribution of the operators (see Section 5.3) and the attributes (see Section 6.2) in our grammar.
By manually inspecting the best individuals and visualizing the aggregate frequencies in Figure 7, we found that all the attributes are present, with a slight overabundance of E and W, which are, respectively, the distance from the front neighbor and the distance from the rear neighbor. Moreover, evolution prefers the ∧ and ¬ operators over the temporal operators (of which there is approximately one per formula). This finding is in line with our expectations since temporal operators are likely to be the least interpretable for a human (as confirmed for mathematical expressions in [23]). In the following, we transcribe some instances of best individuals. For ease of understanding, the numeric constants that appear in the formulas below have been denormalized by multiplying by the attribute range and adding the attribute minimum value. For example, (O [15,78] compactly dictates to stay at a distance that is neither too far away nor too close to the northwest, southeast, and southwest neighbors.

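Figure 7. Histogram of the frequency distribution of the operators (O, S, H, ∧, ¬) and the attributes (E, SE, S, SW, W, NW, N, NE, vel) in the best individuals.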
From the examples above, we draw one very important conclusion. Recalling Section 6.1 and Figure 3, the dataset used here was collected on a heavily trafficked highway. The mined specifications follow a common pattern: (i) to balance the distances from the neighbors, and (ii) to drive neither too fast nor too slow.
The former point is crucial, as keeping too far from one neighbor implies coming very close to the neighbors on the opposite side.
To summarize, the mined specifications closely mimic what a car stuck in dense traffic would do and point to the effectiveness of our approach. We believe the reason for such readability to be the parsimony of the mined STL formulas. Intuitively, parsimony is directly linked to interpretability. In fact, as reported in Table 1, the median solution size |ϕ| is 52.5, which is a reasonable size for an STL formula.

Conclusions and Future Work
We have considered the case of monitoring and describing the behavior of traffic systems by means of Signal Temporal Logic (STL) formulas. Authoring these formulas is a hard task due to the necessity of knowing the system at hand and mastering language syntax. Automatically learning STL formulas would allow for the real-time monitoring of traffic systems with the result of improving safety and providing an explanation for the behaviors of autonomous agents. We endeavor to do so with the goal of learning formulas that describe the system at hand and are interpretable for a human.
We proposed a methodology to learn STL formulas for real-world traffic trajectories; the trajectories are unlabeled, in the sense that there are no human-assigned labels discriminating between positive and negative behaviors. Since the STL language can be specified by means of a grammar, we use a grammar-based evolutionary optimization algorithm to evolve STL formulas. We evaluate formulas against a fitness function that rewards those that tightly fit the pool of trajectories at hand.
With an experimental campaign, we showed that our approach learns formulas that (1) tightly fit the pool of trajectories and (2) appear interpretable to a human due to their parsimony. We believe that, by applying our approach for inferring formulas describing road traffic in different conditions (e.g., different countries), one could systematically compare alternatives using formulas instead of raw data.
In the future, we will extend our approach to supervised binary classification scenarios, i.e., scenarios in which trajectories come accompanied by human-assigned labels discriminating between positive and negative behaviors. We will also consider anomaly detection, in which only a subset of the positive trajectories is labeled and we want an STL classifier to correctly detect negative trajectories.

Data Availability Statement:
The software for the experiments reported in this paper is publicly available at https://github.com/pigozzif/STLRulesEvolutionaryInferenceNoClass (accessed on 1 November 2021). The data is publicly available and provided by [29].

Conflicts of Interest:
The authors declare no conflict of interest.