Coevolution of the Features of the Dynamics of the Accelerator Pedal and Hyperparameters of the Classifier for Emergency Braking Detection

Abstract: We investigate the feasibility of inferring the intention of the human driver of a road motor vehicle to apply emergency braking solely by analyzing the dynamics of lifting the accelerator pedal. Focusing on building a system that reliably classifies emergency braking situations, we employed evolutionary algorithms (EA) to coevolve both (i) the set of features that optimally characterize the movement of the accelerator pedal and (ii) the values of the hyperparameters of the classifier. The experimental results demonstrate the superiority of the coevolutionary approach over analogous approaches that rely on an a priori defined set of features and values of hyperparameters. By evolving features and hyperparameters simultaneously, the learned classifier inferred emergency braking situations in previously unseen dynamics of the accelerator pedal with an accuracy of about 95%. We consider the obtained results a step towards the development of a brake-assisting system that would perceive the dynamics of the accelerator pedal in real time and, in case of a foreseen emergency braking situation, would apply the brakes automatically well before the human driver would have been able to apply them.


Introduction
In recent years, the technological growth of intelligent driving aids has been hard to overlook [1]. Interest in this field is driven by the need to increase the safety of road traffic: nearly 1.3 million people die in road crashes every year [2]. Part of these tragedies is caused by braking too late, which could be prevented by embedding automated brake assistance into cars. Such assistance could be integrated either as a fully automated braking system [3,4] or as an assistant to the human driver [5,6]. Automated braking systems are considered beneficial in many traffic situations. However, this type of aid suffers from serious engineering and psychological problems. One of these problems, according to risk homeostasis theory [7], is overconfidence in fully automated driving aids, which, in turn, could dull the attentiveness of the human driver and lead to dangerous situations on the road. The engineering problems include the reliability and validity of sensors. Intending to create a safe system devoid of these drawbacks, we decided to leave fully automated braking aids out of the scope of our research and to focus on brake assistance instead. To implement brake assistance, the emergency braking classification (EBC) problem [8] must be solved.
The EBC problem can be addressed by different techniques, such as analyzing the motion pattern of the pressed brake pedal [9] or measuring brain and myoelectric activity, as proposed by Haufe et al. [10]. Another possible solution of the EBC problem could be based on the analysis of the time series of the position of the accelerator pedal [11].
As shown in Figure 1, a specific motion of the accelerator pedal corresponds to a specific driving situation, such as accelerating, cruising, slowing down, normal braking, or emergency braking. The depicted time series of the positions of both the accelerator and brake pedals were obtained from the simulated parametric data recorder of the full-scale Forum-8 drive simulator [12], shown in Figure 2.
Actuators 2018, 7, x FOR PEER REVIEW
In one of our previous studies [8] on the EBC problem, besides showing that normal driving and emergency braking situations can be distinguished solely by analyzing the motion of the accelerator pedal, we also demonstrated that straightforward approaches such as threshold classifiers are not feasible for the EBC problem, because the data set cannot be accurately separated into the two categories of normal driving and emergency braking. In the same work, we found that during emergency braking, the average time lag of a human driver lies within the interval (200 ms, 400 ms). Consequently, at a speed of 72 km/h (20 m/s), an eventual emergency braking support system (EBS) would reduce the overall braking distance by 4 to 8 m.
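The quoted reduction of 4 to 8 m can be checked with a one-line calculation. The symbols Δs, v, and Δt are introduced here only for illustration, and the calculation assumes (as a simplification) that the vehicle travels at constant speed during the time lag and that the assistance eliminates the lag entirely:

```latex
\Delta s = v \,\Delta t, \qquad
\Delta s_{\min} = 20~\mathrm{m/s} \times 0.2~\mathrm{s} = 4~\mathrm{m}, \qquad
\Delta s_{\max} = 20~\mathrm{m/s} \times 0.4~\mathrm{s} = 8~\mathrm{m}.
```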
In our work [13], we proposed a method of applying genetic algorithms (GA) for (i) tuning the hyperparameters and (ii) selecting the best combinations of manually extracted features of the time series of the accelerator pedal in order to increase the quality of classifiers. The classifier with the best-evolved values of hyperparameters could distinguish normal driving from emergency braking situations with an accuracy of about 93%. Despite these results, we gave priority to further improving the classification quality and investigated the feasibility of a more versatile and flexible approach, cooperative coevolution, for the simultaneous optimization of both (i) the features of the time series of the accelerator pedal and (ii) the values of the hyperparameters of the classifier.
Evolutionary computation as a method for optimizing classifiers has been explored in many studies. Among the evolutionary approaches, researchers most frequently resort to genetic programming (GP) and GA. For example, in one study [14], the authors use GP to extract features for the classification of epileptic electroencephalography (EEG) signals; they successfully reduced the dimensionality of the feature space and improved the performance of the classifier. In another study [15], the authors employ GP to search for a set of mappings with optimal dimensionality that projects the input space into a decision space with maximized class separability. Feature extraction by GP has also been successfully applied to vision systems [16]. The GA, on the other hand, has been widely used to search for optimal values of hyperparameters as an alternative to exhaustive ("brute-force") search. For example, Francescomarino et al. [17] employ the GA for hyperparameter optimization in predictive business process monitoring. Another example of a successful application of the GA is the research of Ahmadi et al. [18], where the authors apply the GA to search for the optimal hyperparameters of a least squares support vector machine.
Inspired by the success of the aforementioned research, we decided to employ cooperative coevolution for solving the EBC problem by coevolving both (i) the values of the hyperparameters of the classifier and (ii) the features pertinent to the dynamics of the accelerator pedal. The primary objective of our research is to verify the superiority of coevolution over the other techniques considered in previous studies of EBC. We also examine the generality of the proposed solution on unseen data of human drivers who did not participate in obtaining the training set. The key motivation of our research is to explore the opportunity to significantly improve the quality of the classifier for EBC.
Such an improvement would facilitate the reduction of the time lag between the following two instants: the instant the driver moves their foot away from the accelerator pedal (a in Figure 3) and the instant they press the brake pedal to its maximum position (c in Figure 3). In properly classified cases, the proposed brake-assisting system would be able to activate the brakes automatically well before the human driver would have been able to apply them and, as a consequence, to reduce the braking distance caused by the above-mentioned time lag.
The remainder of this article is organized as follows. Section 2 describes the methods that we used in our research. Section 3 outlines the proposed approach. In Section 4 we explain the research methodology. Section 5 presents the experimental results. Section 6 discusses the uncertainty of the evolved classifier. Finally, Section 7 draws conclusions.

Methods
This section describes the tools used in our research.

Cooperative Coevolution
Cooperative coevolution (CC) [19,20] is a general framework of evolutionary computation that divides a large problem into subcomponents and solves them largely independently in order to solve the overall problem. Like other evolutionary approaches, CC is characterized by the iterative improvement of a set of candidate solutions. Improvement implies an increase of the fitness value through the selection, reproduction, and mutation of solutions. In CC, the candidate solutions are decomposed into smaller species that are evolved mostly separately, with the only cooperation happening during fitness evaluation. In our work, we focus on the application of CC for both (i) the automatic extraction of features pertinent to the time series of the position of the accelerator pedal and (ii) tuning the values of the hyperparameters of the XGBoost classifier.
In our study, we represent the solutions (individuals) as a combination of several parse trees (a forest) and one vector (a one-dimensional chromosome). The forest stands for the whole set of extracted features, while each parse tree of the forest represents an arithmetical expression of a (compound) feature involving a priori defined primitive (atomic) features. The chromosome, in turn, is a vector $(x_0, x_1, \ldots, x_8)$, where each allele $x_i$ represents a unique hyperparameter. The range of each $x_i$ is detailed in the following sections.
In coevolution, the search for the best solution, i.e., the solution that yields an optimal value of the fitness, consists of the following main steps:
Step 0: Creating the initial population of randomly generated individuals;
Step 1: Evaluating the fitness of the individuals in the population;
Step 2: Checking the termination criteria: a good enough fitness value (of the best individual in the population), too long a runtime, or too many generations. The evolutionary algorithm (EA) terminates if one of the criteria is satisfied;
Step 3: Selection: selecting the mating pool of individuals. The size of the mating pool is a fraction (e.g., 10%) of the overall size of the population, and the selection of the individuals in the mating pool is fitness-based (roulette-wheel, tournament, elitism, etc.);
Step 4: Reproduction: implementing crossover by swapping random nodes (genes) of trees (chromosomes) of randomly selected pairs (parents) of individuals from the mating pool. Crossover produces pairs of offspring forests (chromosomes) that are inserted into the newly growing population;
Step 5: Mutation: random node(s) (gene(s)) of the newly generated individuals (offspring) are randomly modified with a given probability.
The details of the implemented coevolution are explained in the following sections.
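The generic loop of Steps 0 to 5 can be sketched as follows. This is a minimal illustration rather than the implementation used in the study: the individuals are plain bit lists instead of a forest plus a chromosome, the mating pool is formed by simple truncation of the ranked population, the runtime criterion is omitted, and all names (`evolve`, `init_bits`, `one_point_crossover`, `flip_mutation`) are hypothetical.

```python
import random

def evolve(init, fitness, crossover, mutate, pop_size=50, pool_frac=0.1,
           max_gens=100, target=None, seed=0):
    """Generic evolutionary loop; returns (best_score, best_individual)."""
    rng = random.Random(seed)
    # Step 0: initial population of randomly generated individuals
    population = [init(rng) for _ in range(pop_size)]
    best_score, best_ind = float("-inf"), None
    for _ in range(max_gens):  # Step 2 (generation limit)
        # Step 1: evaluate and rank the individuals by fitness
        scored = sorted(((fitness(ind), ind) for ind in population),
                        key=lambda pair: pair[0], reverse=True)
        if scored[0][0] > best_score:
            best_score, best_ind = scored[0]
        # Step 2 (good enough fitness)
        if target is not None and best_score >= target:
            break
        # Step 3: mating pool = best fraction of the population (assumed scheme)
        pool_size = max(2, int(pop_size * pool_frac))
        pool = [ind for _, ind in scored[:pool_size]]
        # Steps 4 and 5: crossover of random parent pairs, then mutation
        offspring = [best_ind]  # elitism: keep the best individual so far
        while len(offspring) < pop_size:
            mother, father = rng.sample(pool, 2)
            for child in crossover(mother, father, rng):
                offspring.append(mutate(child, rng))
        population = offspring[:pop_size]
    return best_score, best_ind

# Toy problem for illustration only: maximize the number of ones in a bit list.
N_BITS = 16

def init_bits(rng):
    return [rng.randint(0, 1) for _ in range(N_BITS)]

def one_point_crossover(a, b, rng):
    cut = rng.randrange(1, N_BITS)
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def flip_mutation(bits, rng, rate=0.05):
    return [1 - bit if rng.random() < rate else bit for bit in bits]

best_score, best_ind = evolve(init_bits, sum, one_point_crossover,
                              flip_mutation, target=N_BITS)
```

In the coevolutionary setting, `fitness` would train and cross-validate the classifier for each individual, which is why the evaluation step dominates the running time.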

Cross Validation
Cross validation (CV) [21] is a model evaluation technique, which we used in our research for fitness evaluation. As mentioned in Section 1, the proposed emergency braking support system would be based on a classifier of emergency braking situations. To calculate the CV score (fitness value), the available dataset is divided into k subsets, or folds. The classifier is trained on k−1 folds and its performance is validated on the remaining fold. The process is repeated k times, so that every fold is used once for validation. The CV score of the classifier is calculated by averaging the validation quality over all iterations. Despite the computational overhead of calculating the CV score, it is rather parsimonious in terms of the required amount of data. It has also been shown to help prevent overfitting of the obtained solutions [21]. The k-fold CV is illustrated in Figure 4.
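The procedure described above can be sketched in a few lines. This is a minimal, library-free illustration; the names `k_fold_indices`, `cross_val_score`, and the callback `train_and_score` are hypothetical, and practical refinements such as shuffling or stratifying the folds are deliberately left out.

```python
def k_fold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k contiguous folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_val_score(train_and_score, n_samples, k=5):
    """Average the validation score over k train/validate iterations.

    train_and_score(train_idx, val_idx) is assumed to train a classifier on
    the samples in train_idx and return its quality on the samples in val_idx.
    """
    folds = k_fold_indices(n_samples, k)
    scores = []
    for i, val_idx in enumerate(folds):
        # Train on the k-1 folds that are not used for validation
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(train_and_score(train_idx, val_idx))
    return sum(scores) / k
```

Each sample is used for validation exactly once and for training k−1 times, which is what makes the technique economical with data.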
As a score of CV, different quality metrics could be used, as elaborated below.

Quality Metrics
For binary classification problems, the following metrics are usually considered [22]: accuracy, precision, recall, f-score, and area under the receiver operating characteristic curve (ROC AUC). These metrics reflect different aspects of the classifier. However, for the comparability of our experimental results with the results of previous studies, we leave ROC AUC out of consideration. Accuracy reflects the percentage of correct answers. For the determination of the other metrics, we consider the confusion matrix shown in Table 1. Hereafter, we refer to the cases of emergency braking as class "1" and to the cases of normal driving as class "0". TP is the number of time series samples of class "1" (human-annotated) that are correctly classified. FP is the number of samples for which the correct, human-annotated class is "0" but the classifier returns "1". Conversely, if the classifier returns "0" but the actual class is "1", the sample is counted as FN. Finally, TN is the number of samples of class "0" that are correctly classified. Hence, precision and recall are defined as follows:
$$\mathrm{precision} = \frac{TP}{TP + FP} \tag{1}$$
$$\mathrm{recall} = \frac{TP}{TP + FN} \tag{2}$$
From Equations (1) and (2) we can express the f-score, which is the harmonic mean of precision and recall:
$$f\text{-score} = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}} \tag{3}$$
As can be seen from Equation (3), the f-score reaches its maximum value of 1.0 only when both precision and recall equal 1.0 (their maximum value).
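Equations (1) to (3) translate directly into code. The sketch below is illustrative; the function name `precision_recall_f` is hypothetical, and returning 0.0 when a denominator is zero is a convention assumed here, not one stated in the text.

```python
def precision_recall_f(tp, fp, fn):
    """Compute precision, recall, and f-score from confusion-matrix counts."""
    # Equation (1); 0.0 if the classifier never predicts class "1" (assumed convention)
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Equation (2); 0.0 if there are no samples of class "1" (assumed convention)
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Equation (3): harmonic mean of precision and recall
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score
```

Because the harmonic mean is dominated by the smaller of its two arguments, a classifier cannot obtain a high f-score by maximizing only precision or only recall.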

Extreme Gradient Boosting
Extreme gradient boosting, or XGBoost [23], is a scalable machine learning algorithm that is widely used in many supervised machine learning challenges [24,25]. XGBoost produces the classification or regression model as an ensemble of K classification and regression trees (CART) [26]. The output of the model can be expressed as follows:
$$\hat{y}_i = \sum_{k=1}^{K} f_k(X_i), \quad f_k \in F \tag{4}$$
where $f_k$ corresponds to an independent tree structure and its leaf weights, while $X_i$ is a test sample that can be expressed as a vector of features $(x_{i0}, x_{i1}, \ldots, x_{in})$. $F$ represents the space of all CART. For training the set of functions used in the model, the following regularized objective must be minimized:
$$L = \sum_{i} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k) \tag{5}$$
The first term $l$ of Equation (5) is a differentiable convex loss function that measures the difference between the target $y_i$ and the prediction $\hat{y}_i$. The second term $\Omega$ is a regularization term that prevents over-fitting by penalizing the complexity of the model.
The XGBoost classifier is also known for its superior training speed.The latter is crucial for the running time of the coevolution, since the training of XGBoost is executed during each evaluation of the fitness.
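To make the additive model and the regularized objective concrete, the following toy sketch sums the outputs of hand-written depth-one trees and evaluates the objective. It does not use the XGBoost library; the helper names are hypothetical, the squared error is chosen here merely as one example of a differentiable convex loss, and the penalty Ω(f) = γT + ½λ‖w‖² (T leaves with weights w) is the form given in the XGBoost paper [23].

```python
def stump(feature, threshold, left_weight, right_weight):
    """A depth-one regression tree with two leaves (toy stand-in for a CART)."""
    def f(x):
        return left_weight if x[feature] < threshold else right_weight
    f.leaf_weights = (left_weight, right_weight)
    return f

def predict(trees, x):
    """Model output: the sum of the outputs of all K trees."""
    return sum(f(x) for f in trees)

def omega(f, gamma, lam):
    """Complexity penalty: gamma * (number of leaves) + 0.5 * lam * sum(w^2)."""
    w = f.leaf_weights
    return gamma * len(w) + 0.5 * lam * sum(wi * wi for wi in w)

def objective(trees, X, y, gamma=1.0, lam=1.0):
    """Regularized objective: total loss plus the penalty of every tree."""
    loss = sum((yi - predict(trees, xi)) ** 2 for xi, yi in zip(X, y))
    return loss + sum(omega(f, gamma, lam) for f in trees)

trees = [stump(0, 5, 0.0, 1.0), stump(1, 2, -0.5, 0.5)]
y_hat = predict(trees, (7, 1))          # 1.0 + (-0.5)
total = objective(trees, [(7, 1)], [1]) # 0.25 loss + 2.5 + 2.25 penalties
```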

Wilcoxon Signed Rank Test
In our work, we examined whether the differences between the results obtained with CC and GA [13] are significant. To this end, we employed a non-parametric statistical hypothesis test, namely the Wilcoxon signed-rank test [27], which ranks the differences in the performance of two algorithms. The test ranks the differences by their absolute values, ignoring the signs, and then compares the ranks of the positive and the negative differences.
Let $d_i$ be the difference between the performance scores of the two algorithms on the i-th run out of n. The differences are ranked according to their absolute values; average ranks are assigned in the case of ties. Let $R^+$ be the sum of ranks for the runs where the second algorithm performed better than the first one, and $R^-$ the sum of ranks for the opposite case. The ranks of $d_i = 0$ are split evenly between the two sums; if there is an odd number of them, one is ignored. The critical values of $T = \min(R^+, R^-)$ for a number of runs up to 25 can be found in books on general statistics [28]. For a larger number of runs, the z-score must be calculated.
For the significance level α = 0.05 the null-hypothesis (both models perform equally) can be rejected if the z-score is smaller than −1.96.
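The ranking procedure can be sketched as follows. The function name `wilcoxon_signed_rank` is hypothetical, and the z-score uses the usual normal approximation, z = (T − n(n+1)/4) / sqrt(n(n+1)(2n+1)/24), which is taken here as an assumption because the formula is not reproduced in the text above.

```python
import math

def wilcoxon_signed_rank(first, second):
    """Return (R_plus, R_minus, T, z) for paired scores of two algorithms."""
    d = [b - a for a, b in zip(first, second)]
    # If the number of zero differences is odd, one of them is ignored
    if sum(1 for di in d if di == 0) % 2 == 1:
        d.remove(0)
    n = len(d)
    if n == 0:
        raise ValueError("no differences left to rank")
    # Rank by absolute value, assigning average ranks to ties
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    # Ranks of zero differences are split evenly between both sums
    zero_share = 0.5 * sum(r for r, di in zip(ranks, d) if di == 0)
    r_plus = sum(r for r, di in zip(ranks, d) if di > 0) + zero_share
    r_minus = sum(r for r, di in zip(ranks, d) if di < 0) + zero_share
    t = min(r_plus, r_minus)
    z = (t - n * (n + 1) / 4) / math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return r_plus, r_minus, t, z
```

For small n the normal approximation is unreliable, so the tabulated critical values of T should be preferred in that case.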

Proposed Approach
In order to achieve our objective of discovering the optimal combination of features and values of hyperparameters that would potentially improve the performance of the classifier for the EBC problem, we organized our research into the following four consecutive stages: (i) obtaining and analyzing the time series of the movement of the accelerator and brake pedals of a car simulated in a full-scale drive simulator; (ii) adopting CC for evolving features from the acquired time series and the values of the hyperparameters of XGBoost; (iii) evaluating the performance of XGBoost with the evolved solution on unseen data; and, finally, (iv) comparing the final solution of the EBC problem with the results of previously published research. We elaborate on these stages in Sections 4 and 5.

Data Collection and Analysis
To obtain the time series that represent the motion of the pedals, we asked 16 human drivers (men and women, aged 20-52) to drive a simulated car featuring an automatic transmission [12] on various tracks. The details of the tracks are shown in Table 2. All the testers were required to hold a driver's license and to practice driving the car on the simulator. After practicing, all testers participated in two separate experiments, which allowed us to split the data carefully into two classes: normal driving and emergency braking. The details of the experiments are given in Table 3. In the experiments with normal driving, the drivers had to drive the car without applying emergency braking. In contrast, during the experiments with emergency braking, they were asked to apply the emergency brake at the moment they heard a special audible signal. This way of signalling imitates a sudden danger on the road. We assumed that there is no difference between the responses of drivers to a real danger and to the audible signal. When the drivers heard the signal, they were requested to release the accelerator and press the brake pedal to its maximum position as soon as possible.
Although the drivers were asked not to use normal braking during the emergency braking experiments and not to use emergency braking during the normal driving experiments, due to various human factors we could not be completely sure that this was the case. For instance, during the normal driving experiments, a driver could still brake abruptly if they considered the actual traffic situation to be dangerous. Therefore, it was necessary for us to annotate the obtained time series data of the accelerator pedal carefully. However, because a human annotator cannot catch all such "mistakes", the final dataset could still contain some mislabeled samples.
The simulated parametric data recorder captured all of the relevant time series of the position of both pedals (within the range 0% to 100%), sampled at a frequency of 25 Hz. While processing these time series, we extracted 2373 events in total, including both normal driving (class "0") and emergency braking (class "1") events. Sample fragments of the obtained time series of the position of the accelerator pedal are shown in Table 4. As Table 4 illustrates, each fragment consists of a series of changing positions of the accelerator pedal, where the first member of the series corresponds to the moment when the accelerator was pressed and the last one to the moment when the accelerator pedal returned to its initial position. This way of collecting the data is a fundamental difference between our previous and current research. In our previous studies, we considered only the decreasing sequence of the position of the accelerator pedal, leaving the remaining part of the series out of scope. As will be seen in the next paragraphs, the new approach offers much more flexibility for the analysis of drivers' behaviour. As the series are defined by the dynamics of their values rather than by a fixed number of samples, the first in, first out (FIFO) buffer that should eventually store these fragments should be of variable size too. This might add to the complexity of the considered classification problem, as the input of a classifier should, in general, be of a fixed size.
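The extraction of variable-length fragments can be illustrated with the sketch below. It is an assumption-laden reading of the description above, not the recorder's actual processing: the function name `split_into_fragments` is hypothetical, the rest position of the pedal is assumed to be 0%, and each fragment is assumed to include the rest-position sample immediately before the press and the one at which the pedal returns.

```python
def split_into_fragments(positions, rest=0):
    """Cut a pedal-position series into press-to-release fragments."""
    fragments, current = [], None
    previous = rest
    for p in positions:
        if current is None:
            if p > rest:
                # Pedal pressed: start a fragment with the preceding sample
                current = [previous, p]
        else:
            current.append(p)
            if p <= rest:
                # Pedal back at its initial position: close the fragment
                fragments.append(current)
                current = None
        previous = p
    return fragments  # a fragment still open at the end of the series is dropped

series = [0, 0, 25, 50, 2, 0, 0, 10, 0]
fragments = split_into_fragments(series)
# At 25 Hz, a fragment of m samples spans (m - 1) / 25 seconds
durations = [(len(f) - 1) / 25 for f in fragments]
```

The fragments differ in length, which is why length-invariant features are needed before they can be passed to a classifier with a fixed-size input.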

Implementation of the Cooperative Coevolution
As briefly mentioned in Section 2, we represent the individuals as a combination of a forest and a chromosome, as depicted in Figure 5.

Set of Terminals in the Trees of the Forest
First, it is necessary to determine the set of terminal symbols of the trees that represent the arithmetic expressions of features pertinent to the dynamics of the accelerator pedal. Because the task deals with the classification of samples of variable size, we propose as terminals a set of primitive (atomic) functions that are invariant to the length of the time series of the positions of the accelerator pedal. Unlike the a priori defined permutations of the extracted features in [13], the proposed functions take into account a longer history of the dynamics of the accelerator, namely the movements of the pedal preceding the final deceleration. The atomic functions that comprise the terminal set of the trees in the evolved forest are shown in Table 5.
Thus, the domain of the defined atomic functions is the sequence of positions of the accelerator pedal $(p_n, p_{n-1}, \ldots, p_0)$, where $n \in \mathbb{N}$. However, for the functions max(buf, i) and mdelta(buf, i), exceptions may arise when i > n. Therefore, we programmed these functions to be tolerant of such exceptions: if i > n, the function is called recursively until i ≤ n. For instance, max({0, 25, 50, 2, 0}, 3) returns 0, as does max({0, 25, 50, 2, 0}, 2). Besides the defined functions, we added a random constant $c \in \mathbb{N}$ to our terminal set TS, which thus became TS = {mean(buf), std(buf), mdelta(buf, i), max(buf, i), c}.

Set of Functions in the Trees of the Forest
The set of functions in GP included four binary (i.e., requiring two arguments) arithmetical functions {+, −, ×, /}-addition, subtraction, multiplication, and (protected) division.

Evolving the Features
To evolve the features, we use a forest of several parse trees to represent the evolved individual. The forest can contain up to eight trees. The number of trees in the forest is not fixed and may vary as a result of the crossover and mutation operations in the course of the evolution. Each tree in the forest encodes an evolved (compound) feature and contains functional and terminal nodes from the above-mentioned sets of functions and terminals, respectively. An example of an individual, represented as a forest of two trees (features), is shown in Figure 6. After parsing this example forest in a depth-first, left-to-right manner, we obtain the following set of two features: $\left\{ \frac{\max(buf, 0)}{\mathrm{mean}(buf)},\ \mathrm{std}(buf) \right\}$.
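The way a forest is turned into a feature vector can be sketched as follows. The sketch makes several assumptions that are labeled in the comments: trees are encoded as nested tuples, protected division returns 1.0 for a zero denominator (one common convention in GP), std is taken as the population standard deviation, and the terminals max(buf, i) and mdelta(buf, i) are omitted because their definitions are given only in Table 5. All names are hypothetical.

```python
import statistics

def protected_div(a, b):
    """Division that never raises: returns 1.0 if the denominator is zero (assumed)."""
    return a / b if b != 0 else 1.0

FUNCTIONS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": protected_div,
}

# Length-invariant atomic terminals; pstdev is an assumed choice for std(buf)
TERMINALS = {
    "mean": statistics.fmean,
    "std": statistics.pstdev,
}

def evaluate(node, buf):
    """Evaluate a parse tree depth-first, left to right, on a pedal buffer."""
    if isinstance(node, (int, float)):   # random constant c
        return float(node)
    if isinstance(node, str):            # atomic function of the buffer
        return float(TERMINALS[node](buf))
    op, left, right = node               # binary arithmetic function
    return FUNCTIONS[op](evaluate(left, buf), evaluate(right, buf))

def extract_features(forest, buf):
    """One compound feature per tree; the result has a fixed length."""
    return [evaluate(tree, buf) for tree in forest]

forest = [("/", "std", "mean"), ("+", "mean", 10)]
features = extract_features(forest, [10, 20, 30])
```

Whatever the length of `buf`, the output has as many entries as the forest has trees, which gives the classifier the fixed-size input it requires.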

Evolving the Hyperparameters
As mentioned above, to evolve the hyperparameters we used chromosomes that contain the discretized values of the XGBoost hyperparameters. The ranges and discretized values of the hyperparameters of XGBoost are shown in Table 6.
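
One way to picture such a chromosome is as a list of seven integer genes, each indexing into the discretized grid of one hyperparameter. The sketch below is illustrative only: the hyperparameter names are real XGBoost parameters, but which seven the paper evolves and the grid values shown here are assumptions (the actual ranges are those of Table 6).

```python
import random

# Hypothetical discretized grids; the actual ranges are listed in Table 6.
GRID = {
    "n_estimators": [50, 100, 200, 400],
    "max_depth": [2, 4, 6, 8],
    "learning_rate": [0.01, 0.05, 0.1, 0.3],
    "subsample": [0.6, 0.8, 1.0],
    "colsample_bytree": [0.6, 0.8, 1.0],
    "gamma": [0.0, 0.5, 1.0],
    "min_child_weight": [1, 3, 5],
}
GENE_ORDER = list(GRID)  # fixed gene positions, seven genes in total


def random_chromosome(rng: random.Random) -> list:
    """Draw one gene (a grid index) per hyperparameter."""
    return [rng.randrange(len(GRID[name])) for name in GENE_ORDER]


def decode(chromosome: list) -> dict:
    """Map grid indices back to concrete hyperparameter values."""
    return {name: GRID[name][gene] for name, gene in zip(GENE_ORDER, chromosome)}


rng = random.Random(42)
params = decode(random_chromosome(rng))  # keyword arguments for a classifier
```

Encoding indices rather than raw values keeps every chromosome valid after crossover and mutation, since any gene value inside the grid length decodes to a legal setting.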

Fitness Function
The objective of applying GP is to search for an optimal set of features that results in the best value of the fitness function (i.e., the CV score). In order to determine the metric to be used for calculating the CV score, it is important to analyze the class distribution of the available data.
The training set that was used during the cross-validation included 536 samples of class "1" and 1069 samples of class "0". Using accuracy as a fitness function could lead to an incorrect quality assessment: if the learned XGBoost detected all normal driving samples and missed more than half of the emergency braking samples, the cross-validation score could still reach the value of 0.83; however, our classifier would not be considered reliable. Therefore, instead of using the raw CV accuracy as a fitness function, we used a more informative metric, the f-score, which is the harmonic mean of precision and recall. There are two major reasons why we were reluctant to use precision or recall as a fitness function:
1. These metrics describe different aspects of the classifier. It is not evident which metric is preferable for building the emergency braking system, and
2. The consideration of only precision or only recall as a fitness function can lead to the phenomenon known as "evolutionary cheating". For instance, evolution will choose the features that decrease the number of FP without any regard to FN. Consequently, the precision might even become 1.0, while the recall will be significantly lower.
Conversely, the f-score is a good trade-off between the values of precision and recall. To avoid meaningless features in our final solution, we added a penalty that subtracts 1% of the fitness value if the resulting features are nothing but constants.
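
The gap between accuracy and f-score on this class distribution can be checked with a few lines of arithmetic. The sketch below assumes a classifier that detects all 1069 normal samples and exactly half of the 536 emergency samples; the helper names and the multiplicative reading of the 1% penalty are assumptions made for illustration.

```python
def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    return (tp + tn) / (tp + fp + tn + fn)


def f_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall (0.0 if undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def penalized_fitness(cv_f_score: float, only_constants: bool, penalty: float = 0.01) -> float:
    """Subtract 1% of the fitness when all features are constants (assumed form)."""
    return cv_f_score * (1 - penalty) if only_constants else cv_f_score


POS, NEG = 536, 1069            # class "1" and class "0" training samples
tp = POS // 2                   # half of the emergency samples detected
fn = POS - tp                   # the other half missed
tn, fp = NEG, 0                 # every normal sample classified correctly

acc = accuracy(tp, fp, tn, fn)  # about 0.833
f1 = f_score(tp, fp, fn)        # about 0.667
```

Precision is 1.0 and recall is 0.5 in this scenario, so the accuracy of roughly 0.83 hides a failure that the f-score of roughly 0.67 exposes.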

Main Parameters of Coevolution
The size of the evolved population of individuals (each comprising a forest and a chromosome) is 200, as indicated by the parameter Population size in Table 7. The mating pool of each following generation consists of 24 individuals: 20 individuals (10% of the population, parameter Selection ratio) are selected via binary tournament selection, and the remaining four are the best (elite) individuals (parameter Elite in Table 7). The number of elite individuals was selected empirically to provide the best trade-off between the convergence of evolution (while preventing premature convergence to suboptimal solutions) and the diversity of the population. The remaining 176 (of 200) individuals are produced by single-point crossover operations (parameter Crossover in Table 7) on pairs of forests randomly selected from the individuals in the mating pool. The newly produced trees are mutated via single-point mutation (parameter Mutation in Table 7) with a probability of 5% (parameter Mutation ratio in Table 7). The details of the implementation of the binary tournament selection, single-point crossover, and mutation operations are provided in [29]. The mechanism of coevolving the features pertinent to the dynamics of the accelerator pedal and the values of the hyperparameters of the classifier is illustrated in Figure 7. The mechanism consists of the following eight main steps:
1. Initialization: Generating the random initial population of 200 individuals. Each individual is a combination of a forest and a chromosome. The forest can contain up to eight trees (features), while the number of genes (hyperparameters) is fixed and equal to 7.
2. Fitness evaluation, phase A: Obtaining the sets of features and hyperparameters for each individual by parsing the trees in the forest and the linear chromosome.
3. Fitness evaluation, phase B: Extracting the values of each of the features from the raw time series of the accelerator pedal.
4. Fitness evaluation, phase C: Calculating the fitness value of each individual (set of features) as the CV f-score of the XGBoost classifier that uses the obtained hyperparameters and calculated features.
5. Checking the termination criteria. Terminating if any of them has been satisfied.
6. Selection: Based on the obtained fitness values of all individuals, selecting the mating pool of 24 individuals via binary tournament and elitism.
7. Reproduction: Reproducing the population via crossover of individuals in the mating pool and mutation of offspring.
8. Continuing with step 2.
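
Steps 6 and 7 can be sketched as a single generation update. This is a simplified illustration, not the implementation of [29]: individuals are treated as opaque objects, the `crossover` and `mutate` callables are supplied by the caller, and the reading that the 24 members of the mating pool are carried over alongside the 176 offspring is an assumption based on the counts given above.

```python
import random

POP_SIZE, N_TOURNAMENT, N_ELITE, MUTATION_RATE = 200, 20, 4, 0.05


def binary_tournament(population, fitness, rng):
    """Pick two distinct individuals at random and return the fitter one."""
    a, b = rng.sample(range(len(population)), 2)
    return population[a] if fitness[a] >= fitness[b] else population[b]


def next_generation(population, fitness, crossover, mutate, rng):
    """Build the next population: 4 elite + 20 tournament winners + 176 offspring."""
    ranked = sorted(range(len(population)), key=lambda i: fitness[i], reverse=True)
    elite = [population[i] for i in ranked[:N_ELITE]]
    pool = elite + [binary_tournament(population, fitness, rng)
                    for _ in range(N_TOURNAMENT)]
    offspring = []
    while len(pool) + len(offspring) < POP_SIZE:
        parent1, parent2 = rng.sample(pool, 2)
        child = crossover(parent1, parent2, rng)
        if rng.random() < MUTATION_RATE:
            child = mutate(child, rng)
        offspring.append(child)
    return pool + offspring


# Toy stand-ins for the forest/chromosome operators (hypothetical).
def toy_crossover(p1, p2, rng):
    cut = rng.randrange(1, len(p1))          # single crossover point
    return p1[:cut] + p2[cut:]


def toy_mutate(individual, rng):
    mutated = list(individual)
    mutated[rng.randrange(len(mutated))] = rng.random()
    return mutated


rng = random.Random(0)
population = [[rng.random() for _ in range(7)] for _ in range(POP_SIZE)]
fitness = [sum(ind) for ind in population]   # placeholder for the CV f-score
new_population = next_generation(population, fitness, toy_crossover, toy_mutate, rng)
```

In the actual system the placeholder fitness would be replaced by phases A to C of the fitness evaluation, and the toy operators by the forest and chromosome operators.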

Experimental Results
The implementation of the XGBoost classifier is based on the open-source XGBoost package [30], which supports multiple programming languages, including C++, Python, R, Java, Scala, and Julia.
To comparatively evaluate the quality of EBCs based on features and hyperparameters evolved via CC, we used as a benchmark XGBoost classifiers whose hyperparameter values were evolved via GA [13]. For each evolutionary algorithm we performed 50 independent runs and consequently obtained 100 classifiers (50 within GA, 50 within CC). For GA, we preserved the parameters that were used in the corresponding studies, as shown in Table 8.
The convergence of the fitness value (CV f-score) during 50 independent runs of CC and GA for XGBoost classifiers is shown in Figure 8.
Table 8. Main parameters of the GA for the evolution of values of hyperparameters of the classifier.

Genotype: Set of parameters shown in Table 6 and a fixed combination of features (pertinent to the last decreasing subsequence of the buffer). Among them are the highest position of the accelerator pedal before starting the deceleration (mP), and the maximum and average rates of lifting the accelerator (mR and aR, respectively).

Statistical Significance Test
In order to show the statistical significance of the results, each classifier was tested on previously unseen data (the test set) with respect to the f-score. The full comparative table of the 100 classifiers can be accessed online from the web site of the first author [30]. As it turned out, 31 EBCs that were obtained within CC performed better than the classifiers obtained within GA, and 19 showed slightly worse results. Applying the Wilcoxon signed-rank test with significance level α = 0.05, we obtained a z-score of −2.12854794772, which suggests that the resulting difference was significant.

Best Solutions Comparison
In this subsection, we compare the best solutions (those with the highest fitness value) obtained by coevolution (xgb_cc) and by the genetic algorithm (xgb_ga). The performance of the classifiers on both the training and test sets is shown in Table 9. As can be seen in Table 9, xgb_cc outperformed xgb_ga with respect to each considered metric on both sets. A more detailed account of the performance on the test set is provided in Table 10. The evolved features and their relative importance are shown in Table 11. The importance measures are based on the number of times a feature is selected for splitting, weighted by the squared improvement to the model as a result of each split, and averaged over all trees.
It is important to note again that the features obtained within GA took into account only the last decreasing subsequence of the accelerator pedal positions, which limited the scope of evolutionary creativity and flexibility. In contrast, features #3, #4, and #7 obtained within CC leveraged the whole buffer. Moreover, feature #7 had the highest importance value among the features used by xgb_cc.
The evolution of features was closely linked to the evolution of hyperparameters. The values of hyperparameters that yielded the best EBCs are shown in Table 12. We noticed that, similar to the results obtained via other nature-inspired optimization approaches (e.g., memetic algorithms, genetic programming, neural networks, etc.), the solutions evolved via CC and GA were hard to interpret due to (i) the complexity of the computational structures that the hyperparameters defined, (ii) the complexity of the mathematical representations of several features, and (iii) the lack of any human-understandable logic, similar to that of the "canonical" top-down problem-solving approaches, expressed in these features.

Discussion
The coevolution of the features and the classifier implies that the overall computational overhead of a real-world implementation of the brake-assisting system would consist of two components, corresponding to the following two classification stages: (i) calculating the values of the features in accordance with the algebraic expressions evolved via GP, and (ii) classifying the braking situations from the calculated features by the XGBoost classifier tuned by GA. These two classification stages are activated in real time as soon as the position of the accelerator pedal becomes 0. We measured the computational overhead of these two classification stages on a PC with an x64-based Intel Core i7-6700 3.4 GHz CPU and 16 GB of RAM. The mean and maximum values of the overhead are shown in Table 13. We estimated the maximum overall overhead under the assumption of a hypothetical worst-case scenario in which the computational overhead of both stages is at its maximum. As shown in Table 13, on average, the computational overhead is about 0.6 ms, which is less than 0.3% of the average time lag between the instant when the drivers lift the accelerator and the instant when they press the brake pedal (about 200 ms). At worst, the brake-assisting system would be activated with a delay of about 2.5 ms, which is still negligible compared to the delays of the driver's reaction.
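
A measurement of this kind can be sketched with a simple timing loop. The sketch below is an assumption about how such an overhead could be measured, not the authors' procedure: `measure_overhead_ms` times any callable stage over repeated calls, and `share_of_reaction_lag` relates an overhead to the 200 ms lag quoted above. Both names are hypothetical.

```python
import time


def measure_overhead_ms(stage, sample, repeats=1000):
    """Return the (mean, max) wall-clock time of `stage(sample)` in milliseconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        stage(sample)
        durations.append((time.perf_counter() - start) * 1e3)
    return sum(durations) / len(durations), max(durations)


def share_of_reaction_lag(overhead_ms, lag_ms=200.0):
    """Fraction of the accelerator-to-brake time lag consumed by the overhead."""
    return overhead_ms / lag_ms


# Placeholder stage standing in for feature calculation (hypothetical).
mean_ms, max_ms = measure_overhead_ms(lambda buf: sum(buf) / len(buf), [0, 25, 50, 2, 0])

average_share = share_of_reaction_lag(0.6)   # 0.003, i.e. about 0.3%
worst_share = share_of_reaction_lag(2.5)     # 0.0125, i.e. about 1.25%
```

With the rounded figures from the text, the average overhead amounts to about 0.3% of the 200 ms lag and the worst case to about 1.25%.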

False Positives
In real driving situations, the FP errors committed by the classifier can cause an incorrect application of emergency braking. As the research of Podusenko et al. [13] suggests, for the emergency braking systems under consideration, false positives will not necessarily lead to an accident, as the emergency braking could be applied for only a very brief period of time, equivalent to the average delay of the drivers' response.
Nevertheless, both false positives and false negatives are undesirable in EBS. The errors that the considered classifier made on the test set could be attributed to the following factors:
1. Mislabelled (by a human expert) samples in the whole dataset,
2. Lack of a sufficient number of samples featuring a similar trend,
3. Personal traits of the drivers that result in contradictory data for the classifier, and
4. Inability of XGBoost to capture the underlying trends of the data.
Although CC addresses the problem of conflicting duplicates, the presence of contradicting samples is not fully resolved. Through the analysis of erroneously classified samples, we noticed that there are several similar pairs obtained from different drivers yet annotated differently. These samples remain extremely difficult for each evolved classifier. Therefore, training the classifier on the driving style of a particular driver, rather than attempting to classify the actually non-existent "average" driver, might further increase the reliability of classification. Hence, in our future research, we are planning to categorize the drivers according to the way they operate the pedals.

Figure 3. Typical dynamics of the accelerator and brake pedals during emergency braking.

Figure 5. The genetic representation of the evolved individual comprising both the arithmetic expressions of features pertinent to the dynamics of the accelerator pedal (the forest) and the values of the hyperparameters of the classifier (the linear chromosome of seven genes).

Figure 6. Example of the individual.

Figure 8. Fitness convergence characteristics obtained from 50 independent runs of CC (left) and GA (right).

Table 2. Details of experimental tracks.

Table 3. Details of the two series of experiments.

Table 4. Sample fragments of the time series of the position of the accelerator pedal (in percent).

Table 5. Atomic functions that are invariant to the length of the time series.

Table 6. Content of the chromosome for optimizing the values of hyperparameters of XGBoost.

Table 7. Main parameters of the coevolution.

Table 9. Performance of best classifiers on training and test set.

Table 10. Performance of best classifiers on training and test set.

Table 13. Computational overhead of the brake-assisting system.

Table 14. Example of samples' classification.