Empowering Advanced Driver-Assistance Systems from Topological Data Analysis

We are interested in evaluating the state of drivers, to determine whether they are attentive to the road or not, by using motion sensor data collected from car driving experiments. That is, our goal is to design a predictive model that can estimate the state of drivers given the data collected from motion sensors. For that purpose, we leverage recent developments in topological data analysis (TDA) to analyze and transform the data coming from sensor time series, and we build a machine learning model based on the topological features extracted with TDA. We provide experiments showing that our model accurately identifies the state of the user, predicting whether they are relaxed or tense.


Introduction
While there have recently been considerable advances in self-driving car technology, driving still relies mainly on human factors. Even in self-driving mode, human drivers must often make decisions in a fraction of a second to avoid accidents. Therefore, it is still of utmost importance to develop systems capable of discerning whether the human driver is attentive to the road conditions. In general, advanced driver-assistance systems (ADAS) [1,2] are systems that improve the driver's performance; adaptive speed limiters, pedestrian detectors [3], and cruise controllers are among the most popular of them. Fatigue alerting systems are among the most useful ADAS, and the aim of this work is to contribute to the development of such a system based on a systematic analysis of drivers in actual driving conditions. The estimation of the driver's condition (degree of attention to the road, fatigue, etc.) is a very important factor in ensuring driving safety [4,5]; a recent review on the topic can be found in [6]. The goal of this work is to extract behavior patterns from car user data in order to accurately estimate their state. We used data obtained by the laboratory of Prof. Hyung Yun Choi at Hongik University in Seoul, whose experiment involved the application of mechanical stimulation to people seated in an automobile.
More precisely, by extracting these patterns of behavior from the experimental data, we aim to learn the most relevant factors affecting the driver's attention to the road.
In the present work, we combine tools from Morse theory [7] and topological data analysis (TDA) with the associated concepts and methods (e.g., Betti numbers, persistent homology, barcodes, persistence images) [8], most of which are introduced below and employed to analyze and classify the experimental data. This allows us to introduce concepts such as barcodes, that is, persistence and lifetime diagrams, in a way similar to how they are used in persistent homology. Specifically, we predict car user behavior following a supervised approach [9]. Instead of considering the original sensor signal as the quantity of interest, we focus on its topological features. In this sense, the proposed framework aims to reveal the intrinsic dimensionality of the data or, in other words, the number of factors that actually affect the driver's performance. Since we model each sensor signal as a dynamical system, our approach appears better suited than other approaches to describing its properties, or rather its variations, such as extrema, patterns, and self-similarity. We note that our approach is, in some respects, similar to that followed by Milnor and Thurston [10] in their study of the combinatorial properties of dynamical systems using tools from automata theory.
The structure of the paper is as follows: In Section 2, we describe the material and methods employed in this work. Particular attention is paid to the process of data acquisition and the description of time series and data curation. In Section 3, we present the main results of this work, and we discuss the main consequences in Section 4. As a complement, in Appendix A, we thoroughly illustrate the process of computing persistence images for the data of interest.

Material and Methods
In this section, we describe the collection and preprocessing of the experimental data. In Section 2.1, we describe the data acquisition, and in Section 2.2, we provide a description of the time series. Section 2.3 is devoted to data preprocessing. The mathematical tools used to describe the time series at a topological level are explained in Section 2.4. Finally, the image classification methodology is given in Section 2.5.

Data Acquisition
Our proposed predictor directly uses the data collected from the experiments. The data acquisition process involves measuring the human response when an excitation is applied to the seat. Figure 1 shows the location of the sensors in the experiments. The excitation signal is an angular acceleration imposed on the seat of the user: an oscillating chirp with a frequency range of 1 to 7.5 Hz, applied as a rotation about the X axis. The linear acceleration $a = (a_x, a_y, a_z)$ and the angular velocity $\omega = (\omega_x, \omega_y, \omega_z)$ were measured at both the head and the seat by two Shimmer inertial measurement unit (IMU) sensors sampling at 256 Hz. By observing the floor excitation signals, we noted that the excitation is purely rotational around the X axis (see Figure 2). Experiments were conducted with nine people, taking into account a set of six fixed states: driver, passenger, tense person, relaxed person, rigid seat, and SAV (sport activity vehicle) seat. In particular, for each individual, eight experiments were performed, one for each combination of these states (tense or relaxed, driver or passenger, rigid or SAV seat). As a consequence, we worked with a sample of 72 experiments, each of them encoded in a time series (as we explain later). Our goal is to classify the behavior of a generic driver, assigning one of the two states (tense or relaxed) by using the sensor data.
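The following sketch illustrates the excitation described above. It generates a chirp sampled at 256 Hz over the 34 s of one experiment. The text only gives the 1 to 7.5 Hz range. The linear sweep, the unit amplitude and the zero initial phase are assumptions, and `linear_chirp` is a hypothetical helper, not part of the original pipeline.

```python
import numpy as np

FS = 256            # sampling frequency (Hz), as stated in the text
DURATION = 34.0     # duration of one experiment (s)
F0, F1 = 1.0, 7.5   # frequency range of the chirp (Hz)


def linear_chirp(fs=FS, duration=DURATION, f0=F0, f1=F1, amplitude=1.0):
    """Hypothetical seat excitation: a chirp whose frequency rises linearly.

    Assumptions (not given in the source): linear sweep, constant amplitude,
    zero initial phase. Returns the time grid, the signal, and its
    instantaneous frequency.
    """
    n = int(round(fs * duration))          # 256 * 34 = 8704 samples
    t = np.arange(n) / fs                  # time stamps in seconds
    k = (f1 - f0) / duration               # sweep rate in Hz/s
    phase = 2.0 * np.pi * (f0 * t + 0.5 * k * t**2)
    inst_freq = f0 + k * t                 # (1 / 2pi) * d(phase)/dt
    return t, amplitude * np.sin(phase), inst_freq


t, excitation, f_inst = linear_chirp()
print(len(t), f_inst[0], f_inst[-1])       # sample count, start and end frequency
```

A logarithmic sweep would only change the phase law. SciPy's `scipy.signal.chirp` offers both the linear and the logarithmic variants.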

Time Series Description
The data acquired from the sensors (see Figures 3 and 4) were stored as six-dimensional time series, comprising the three components of the linear acceleration and the three components of the angular velocity of the head movement. The sampling frequency of the data was 256 Hz, and the duration of the experiment was 34 s; hence, each time series has length 256 × 34 = 8704. For each time series $(x_t)_{1 \le t \le 8704}$, we constructed three new time series, called sliding windows, each of length 5800. The first one is given by the time values from t = 1 to t = 5800, the second by the time values from t = 1450 to t = 7250, and the third by the time values from t = 2904 to t = 8704. Each element of the sample (1 ≤ i ≤ 72) was thus encoded by means of three six-dimensional time series, one per sliding window, each of which we represent as a matrix of size 6 × 5800. This allows us to represent the information by using a third-order tensor $Z \in \mathbb{R}^{216 \times 6 \times 5800}$, obtained by stacking the 72 × 3 = 216 window matrices.
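The windowing step can be sketched with NumPy slicing. With 0-based, end-exclusive slices, `x[0:5800]`, `x[1450:7250]` and `x[2904:8704]` each contain exactly 5800 samples. For this reason, the sketch assumes that the quoted start times are 0-based offsets. The helper names are hypothetical, and random data stand in for the recordings.

```python
import numpy as np

N_SAMPLES = 8704             # 256 Hz * 34 s
WINDOW = 5800                # length of each sliding window
STARTS = (0, 1450, 2904)     # assumed 0-based start offsets of the three windows


def sliding_windows(recording):
    """Split one (6, 8704) recording into three (6, 5800) windows."""
    recording = np.asarray(recording)
    if recording.shape != (6, N_SAMPLES):
        raise ValueError(f"expected shape (6, {N_SAMPLES}), got {recording.shape}")
    return np.stack([recording[:, s:s + WINDOW] for s in STARTS])   # (3, 6, 5800)


def build_tensor(recordings):
    """Stack the windows of all recordings into Z of shape (3 * n, 6, 5800)."""
    return np.concatenate([sliding_windows(r) for r in recordings], axis=0)


# Synthetic stand-in for the 72 recordings (the real data are not reproduced here).
rng = np.random.default_rng(0)
recordings = rng.standard_normal((72, 6, N_SAMPLES), dtype=np.float32)
Z = build_tensor(recordings)
print(Z.shape)   # (216, 6, 5800)
```

Consecutive windows overlap by 4350 samples. The three windows of a recording are therefore strongly correlated.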

Data Preprocessing
In order to obtain a single series for each observation, we concatenated the six time series (linear accelerations and angular velocities) of each observation horizontally and then created a data frame by stacking the 216 sample observations. The concatenation collapses the last two dimensions of the tensor Z into one-dimensional arrays of length 5800 × 6 = 34,800, and the result is a two-dimensional table of concatenated time series. We chose not to filter the signals because the topological sub-level set method is expected to filter out high-frequency features naturally. We also chose to keep working with the acceleration signals rather than integrating them twice in time to obtain positions, since the double integration makes the signals drift (the sensors do not always keep a zero mean height). Thus, the approach is completely (topologically) data-based.
The six time series of each observation $Z_i$ were collapsed into a single concatenated time series of length 34,800 (see Figure 5). The concatenated time series of the 216 observations were then stacked to create the dataset D of size 216 × 34,800. We also attached binary labels to the concatenated time series $Z_i$ according to the two target classes of interest. In particular, we write $Z_i^{(\alpha)}$, where α = 0 for a relaxed driver and α = 1 for a tense one.
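The concatenation amounts to a row-major reshape of the tensor Z. The order of the six channels within each row follows their order in Z, which the text does not specify. `concatenate_channels` is a hypothetical helper, and random data stand in for Z.

```python
import numpy as np


def concatenate_channels(Z):
    """Collapse a (n_obs, 6, 5800) tensor into the (n_obs, 34800) table D.

    With NumPy's default row-major (C) order, each row of the result holds
    channel 0 followed by channel 1, ..., channel 5: the six series are
    concatenated horizontally, as described in the text.
    """
    n_obs, n_channels, length = Z.shape
    return np.reshape(Z, (n_obs, n_channels * length))


# Synthetic stand-in for the tensor Z of the 216 windowed observations.
rng = np.random.default_rng(1)
Z = rng.standard_normal((216, 6, 5800), dtype=np.float32)
D = concatenate_channels(Z)
print(D.shape)   # (216, 34800)
```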

Extracting Topological Features from a Time Series
The idea behind extracting topological information from the time series is to consider each sample observation as a piecewise-linear continuous map from a closed interval to the real line. We therefore use a construction closely related to the Reeb graph [11] of Morse theory to describe the time series at the topological level.
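To this end, we consider the time series $x_t$, $0 \le t \le N - 1$ ($N \ge 3$), given by a vector

$$X = (x_0, x_1, \ldots, x_{N-1}) \in \mathbb{R}^N.$$

We can view X as a function, also denoted by $X : \{0, 1, \ldots, N - 1\} \to \mathbb{R}$, $X(t) = x_t$. Here, to study the topological features of X, we use the sub-level sets of a piecewise-linear function $f_X : \mathbb{R} \to \mathbb{R}$. To construct this function, we consider the basis $\{\phi_0, \ldots, \phi_{N-1}\}$ of continuous functions $\phi_i : \mathbb{R} \to \mathbb{R}$ defined by

$$\phi_i(t) = \max\{0,\ 1 - |t - i|\}, \qquad 0 \le i \le N - 1.$$

This allows us to construct a piecewise-linear continuous map $f_X : \mathbb{R} \to \mathbb{R}$ by

$$f_X(t) = \sum_{i=0}^{N-1} x_i\,\phi_i(t),$$

and also to endow $\mathbb{R}^N$ with a norm given by

$$\|X\| = \|f_X\|_{L^2(\mathbb{R})} = \left(\int_{\mathbb{R}} f_X(t)^2\,dt\right)^{1/2}.$$

In particular, we prove the following result, which allows us to identify the time series given by the vector $X \in \mathbb{R}^N$ with the function $f_X \in L^2(\mathbb{R})$.

Proposition 1. The linear map $\Phi : (\mathbb{R}^N, \|\cdot\|) \to (L^2(\mathbb{R}), \|\cdot\|_{L^2(\mathbb{R})})$ given by $\Phi(X) = f_X$ is an injective isometry between Hilbert spaces. Furthermore, $\Phi(\mathbb{R}^N)$ is a closed subspace of $L^2(\mathbb{R})$.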
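Proof. The map is isometric by the definition of the norm on $\mathbb{R}^N$, and it is injective because $\{\phi_0, \ldots, \phi_{N-1}\}$ is a set of linearly independent functions in $L^2(\mathbb{R})$. Finally, $\Phi(\mathbb{R}^N)$ is finite-dimensional and hence closed in $L^2(\mathbb{R})$.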
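The isometry can be checked numerically. The sketch assumes that the basis functions are the tent (hat) functions $\phi_i(t) = \max\{0, 1 - |t - i|\}$. Under that assumption, f_X interpolates the samples linearly, and $\|f_X\|_{L^2(\mathbb{R})}^2 = X^\top G X$, where G is the tridiagonal Gram matrix with 2/3 on the diagonal and 1/6 next to it. Since G is positive definite, this gives another view of injectivity. All function names are hypothetical.

```python
import numpy as np


def hat(i, t):
    """Assumed basis function phi_i(t) = max(0, 1 - |t - i|) (a tent at i)."""
    return np.maximum(0.0, 1.0 - np.abs(t - i))


def f_X(X, t):
    """Piecewise-linear map f_X(t) = sum_i x_i * phi_i(t)."""
    return sum(x * hat(i, t) for i, x in enumerate(np.asarray(X, dtype=float)))


def gram_matrix(N):
    """Exact inner products <phi_i, phi_j> in L2(R) of the tent functions."""
    off = np.full(N - 1, 1.0 / 6.0)
    return np.diag(np.full(N, 2.0 / 3.0)) + np.diag(off, 1) + np.diag(off, -1)


def norm_RN(X):
    """Norm on R^N that makes Phi an isometry: ||X|| = ||f_X||_{L2(R)}."""
    X = np.asarray(X, dtype=float)
    return float(np.sqrt(X @ gram_matrix(len(X)) @ X))


X = [11, 14, 9, 7, 9, 7, 8, 10, 9]            # time series of Example 1
t = np.linspace(-1.0, len(X), 200_001)        # covers the support [-1, N]
y2 = f_X(X, t) ** 2
quad = float(np.sum(0.5 * (y2[1:] + y2[:-1]) * np.diff(t)))   # trapezoidal rule
print(norm_RN(X) ** 2, quad)                  # the two values agree closely
```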
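Here, we describe the maps $f_X \in \Phi(\mathbb{R}^N)$ at the combinatorial level using the connected components (intervals) associated with their λ sub-level sets. For each $\lambda_{\min} \le \lambda \le \lambda_{\max}$, where $\lambda_{\min}$ and $\lambda_{\max}$ are the minimum and maximum values of X, we introduce a symbolic λ sub-level set $LS_\lambda(f_X)$ for the map $f_X$, which records the λ sub-level set of $f_X$ as a family of intervals. Our next goal is to quantify the evolution of these symbolic λ sub-level sets as λ varies. To this end, we introduce the notion of a feature associated with the λ sub-level set $LS_\lambda(f_X)$.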
We define the set $F(\Phi(\mathbb{R}^N))$ of features for functions in $\Phi(\mathbb{R}^N)$ as a family of closed intervals, and we note that $LS_\lambda(f_X) \subset F(\Phi(\mathbb{R}^N))$. The next definition introduces the features of a symbolic λ sub-level set as the intervals of $F(\Phi(\mathbb{R}^N))$ constructed as maximal unions of faces of $LS_\lambda(f_X)$.
Definition 1. We say that $I \in F(\Phi(\mathbb{R}^N))$ is a feature of the symbolic λ sub-level set $LS_\lambda(f_X)$ if there exist $I_1, \ldots, I_k \in LS_\lambda(f_X)$ such that $I = \bigcup_{j=1}^{k} I_j$ and, for every $J \in LS_\lambda(f_X)$ with $J \ne I_i$ for all $1 \le i \le k$, it holds that $I \cap J = \emptyset$. We denote by $F(LS_\lambda(f_X))$ the set of features of the λ sub-level set $LS_\lambda(f_X)$.
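Definition 1 amounts to taking the connected components of the union of the intervals in $LS_\lambda(f_X)$. The hypothetical helper `features` below sketches this for any finite family of closed intervals, given as (a, b) pairs. It does not depend on how $LS_\lambda(f_X)$ itself is built.

```python
def features(intervals):
    """Features of a symbolic sub-level set in the sense of Definition 1.

    Given a finite family LS of closed intervals (a, b), return the maximal
    unions of intersecting intervals, i.e. the connected components of their
    union, sorted from left to right.
    """
    merged = []
    for a, b in sorted(intervals):
        if merged and a <= merged[-1][1]:     # closed intervals that touch or overlap
            merged[-1][1] = max(merged[-1][1], b)
        else:
            merged.append([a, b])
    return [tuple(m) for m in merged]


print(features([(4, 5), (3, 4), (7, 8)]))    # [(3, 5), (7, 8)]
```

Sorting by left endpoint means each new interval can only intersect the last merged component. A single pass therefore finds all maximal unions.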
A feature of a λ sub-level set $LS_\lambda(f_X)$ is thus a maximal interval of $F(\Phi(\mathbb{R}^N))$ that can be constructed as a union of intervals in $LS_\lambda(f_X)$. To illustrate this definition, we give the following example.
Example 1. Let us consider the time series X = (11, 14, 9, 7, 9, 7, 8, 10, 9).
This allows us to construct the map $f_X$ shown in Figure 6. Then, $\lambda_{\min} = 7$ and $\lambda_{\max} = 14$, and we have the following symbolic λ sub-level sets. [3,4], [4,5]. This allows us to compute the available features for each λ-value. Let $F(f_X)$ be the whole set of features for $f_X$, that is,

$$F(f_X) = \bigcup_{\lambda_{\min} \le \lambda \le \lambda_{\max}} F(LS_\lambda(f_X)).$$

Figure 6. The map $f_X$ for X = (11, 14, 9, 7, 9, 7, 8, 10, 9).
Let $I \in F(f_X)$. In order to quantify the persistence of this particular feature for the map $f_X$, we track how the features $F(LS_\lambda(f_X)) \subset F(f_X)$ evolve as λ ranges over $[\lambda_{\min}, \lambda_{\max}]$. To this end, we introduce the following definition: the birth point of the feature I is defined by $a(I) = \inf\{\lambda : I \in F(LS_\lambda(f_X))\}$, and the corresponding death point by $b(I) = \sup\{\lambda : I \in F(LS_\lambda(f_X))\}$.
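The symbolic sub-level sets of Example 1 are only partially reproduced here, so the sketch below makes an explicit assumption. A vertex i belongs to the λ sub-level set when $x_i \le \lambda$, and an edge [i, i+1] belongs to it when both endpoints do (the usual lower-star filtration). Under that assumption, the features at level λ are the maximal runs of consecutive indices with $x_t \le \lambda$. These runs only change at the values taken by X, so a(I) and b(I) follow from sweeping those values in increasing order. `features_at_level` and `birth_death` are hypothetical names.

```python
def features_at_level(X, lam):
    """Maximal runs [i, j] of consecutive indices with x_t <= lam.

    Assumed (lower-star) reading of the symbolic sub-level set: vertex i is
    present when x_i <= lam, edge [i, i+1] when both endpoints are present.
    """
    feats, start = [], None
    for t, x in enumerate(X):
        if x <= lam:
            if start is None:
                start = t                      # a run of sub-level points begins
        elif start is not None:
            feats.append((start, t - 1))       # the current run ends at t - 1
            start = None
    if start is not None:
        feats.append((start, len(X) - 1))
    return feats


def birth_death(X):
    """Map each feature I to (a(I), b(I)) for lambda in [min X, max X].

    Features only change at the values taken by X, so it is enough to sweep
    them in increasing order. b(I) is the supremum of the levels at which I
    is a feature: the next critical value, or max X if I survives to the end.
    """
    levels = sorted(set(X))
    bd = {}
    for k, lam in enumerate(levels):
        next_level = levels[k + 1] if k + 1 < len(levels) else lam
        for I in features_at_level(X, lam):
            birth = bd[I][0] if I in bd else lam
            bd[I] = (birth, next_level)
    return bd


X = (11, 14, 9, 7, 9, 7, 8, 10, 9)                        # Example 1
bd = birth_death(X)
LT = sorted(((a, b - a) for a, b in bd.values()), key=lambda p: p[1])
print(bd[(3, 3)])    # (7, 9): born at lambda = 7, absorbed into [2, 6] at lambda = 9
print(LT)            # lifetime diagram points (a, b - a), by increasing lifetime
```

In this reading, a feature "dies" as soon as it is enlarged or merged, because Definition 1 identifies a feature with one specific interval.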
In order to determine the degree of similarity between the barcodes of two different time series, we need a similarity metric. To this end, we construct the persistence image of $f_X$ as follows. We observe that the lifetime diagram $LT(f_X)$ is a finite set of points, namely,

$$LT(f_X) = \{(a_1, b_1 - a_1), \ldots, (a_k, b_k - a_k)\}$$

for some natural number $k \ge 1$, ordered such that $b_1 - a_1 \le b_2 - a_2 \le \cdots \le b_k - a_k$. Then, we consider a non-negative weighting function $w : LT(f_X) \to [0, 1]$. Finally, we fix a natural number M and take a bivariate normal distribution $g_u(x, y)$ centered at each point $u \in LT(f_X)$, with covariance matrix

$$\sigma\,\mathrm{id} = \frac{1}{M} \max_{1 \le i \le k} (b_i - a_i)\,\mathrm{id},$$

where id is the 2 × 2 identity matrix. A persistence kernel is then defined by means of a function $\rho_X : \mathbb{R}^2 \to \mathbb{R}$ that combines the Gaussians $g_u$, $u \in LT(f_X)$, weighted by w. We associate with each $X \in \mathbb{R}^N$ a matrix in $\mathbb{R}^{M \times M}$ as follows. Let ε > 0 be sufficiently small, and consider a region $\Omega_{X,\varepsilon} = [\alpha, \beta] \times [\alpha^*, \beta^*] \subset \mathbb{R}^2$ covering the support of $\rho_X(x, y)$ up to a certain precision, that is, such that

$$\iint_{\Omega_{X,\varepsilon}} \rho_X(x, y)\,dx\,dy \ge 1 - \varepsilon.$$

Next, we consider two equispaced partitions of the intervals $[\alpha, \beta]$ and $[\alpha^*, \beta^*]$ into M subintervals each, which define M × M rectangles $P_{i,j}$. The persistence image of X associated with the partition $P = \{P_{i,j}\}$ is then described by the matrix

$$\mathrm{PI}(X)_{i,j} = \iint_{P_{i,j}} \rho_X(x, y)\,dx\,dy, \qquad 1 \le i, j \le M.$$
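The persistence image construction can be sketched numerically. The isotropic Gaussian factorizes, so the integral of each $g_u$ over a rectangle $P_{i,j}$ is a product of two one-dimensional normal probabilities, computed here with the error function. The covariance σ·id with σ = (max lifetime) / M follows the text. The following are our own hypothetical choices:

* the weights w(u) = (lifetime of u) / (sum of lifetimes), which make $\rho_X$ integrate to 1;
* the padded bounding box used as $\Omega_{X,\varepsilon}$;
* the input diagram.

```python
import numpy as np
from math import erf, sqrt

_erf = np.vectorize(erf)   # elementwise error function without SciPy


def gaussian_cell_mass(edges, mu, var):
    """Probability mass of N(mu, var) on each interval [edges[k], edges[k+1]]."""
    cdf = 0.5 * (1.0 + _erf((edges - mu) / sqrt(2.0 * var)))
    return np.diff(cdf)


def persistence_image(lt_points, M=20, n_std=4.0):
    """M x M persistence image of lifetime-diagram points (birth, lifetime).

    Follows the text for the Gaussians: covariance sigma * id with
    sigma = max lifetime / M. Hypothetical choices (not from the source):
      * weights w(u) = lifetime(u) / sum of lifetimes, so rho_X integrates to 1;
      * Omega = bounding box of the points padded by n_std standard deviations.
    Entry (i, j) is the integral of rho_X over the cell P_ij.
    """
    pts = np.asarray(lt_points, dtype=float)
    life = pts[:, 1]
    if life.max() <= 0.0:
        raise ValueError("at least one point needs a positive lifetime")
    w = life / life.sum()
    sigma = life.max() / M                  # variance of each Gaussian g_u
    pad = n_std * sqrt(sigma)               # n_std standard deviations
    x_edges = np.linspace(pts[:, 0].min() - pad, pts[:, 0].max() + pad, M + 1)
    y_edges = np.linspace(life.min() - pad, life.max() + pad, M + 1)
    image = np.zeros((M, M))
    for (birth, lifetime), w_u in zip(pts, w):
        if w_u > 0.0:                       # zero-weight points contribute nothing
            # the isotropic Gaussian factorizes, so each cell mass is a product
            image += w_u * np.outer(gaussian_cell_mass(x_edges, birth, sigma),
                                    gaussian_cell_mass(y_edges, lifetime, sigma))
    return image


lt = [(7, 2), (7, 1), (8, 1), (9, 1), (9, 1), (10, 4), (11, 3), (14, 0)]  # hypothetical input
PI = persistence_image(lt, M=20)
print(PI.shape, PI.sum())   # (20, 20) and a total mass close to 1
```

With n_std = 4, each Gaussian loses at most about 10⁻⁴ of its mass outside Ω. The condition $\iint_\Omega \rho_X \ge 1 - \varepsilon$ therefore holds with ε of that order. The flattened M × M matrix is what a classifier would receive as input.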

Classification
Image classification is a procedure used to automatically categorize images by assigning to each image a label representative of its class. A supervised classification algorithm requires a training sample for each class, that is, a collection of data points whose class of interest is known; these training samples are representative of the classes of interest to the analyst. The class of a new observation is then inferred from how it relates to the training samples; many classifiers rely on how close the new point is to each training sample, the Euclidean distance being the most common metric for low-dimensional datasets. To classify the persistence images, any state-of-the-art technique can be used. In our case, we used a random forest classifier.
Recall that the data come from nine individuals, with 24 samples associated with each of them, corresponding to 3 samples (one per sliding window) for each of the eight experimental conditions: relaxed rigid driver, relaxed rigid passenger, relaxed SAV driver, relaxed SAV passenger, tense rigid driver, tense rigid passenger, tense SAV driver, and tense SAV passenger. Their respective labels are {0, 0, 0, 0, 1, 1, 1, 1}. We therefore designed the following training and validation process. The model is trained on 144 samples and evaluated on the remaining 72 unseen samples (a two-to-one training-to-testing ratio). The split between training and testing is obtained by random shuffling with stratification, to ensure balance between the classes. To further assess the generalizability of the model, we also performed a cross-validation procedure following a leave-one-out strategy, which consists of iteratively training on the full dataset except for one sample, which is left out and used to test and score the model. We used the accuracy metric to evaluate the classification model. The performance of the model can be represented by the so-called confusion matrix: a two-dimensional table whose entries count the number of samples in each category, with the first axis representing the true labels and the second axis the predicted labels. We also computed further classification metrics to obtain a more detailed report of the model performance.
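The evaluation protocol can be sketched without the real data. The hypothetical helpers below do three things:

* rebuild the label vector from the eight conditions in the order listed above (9 individuals × 8 conditions × 3 windows = 216 samples);
* draw a stratified two-to-one split;
* enumerate the leave-one-out folds.

The assumption that samples are ordered individual by individual, then condition by condition, then window by window is ours.

```python
import itertools
import numpy as np

# Eight conditions per individual, in the order listed in the text; label 1 = tense.
CONDITIONS = list(itertools.product(("relaxed", "tense"), ("rigid", "SAV"),
                                    ("driver", "passenger")))
N_SUBJECTS, N_WINDOWS = 9, 3


def build_labels():
    """Labels for 9 individuals x 8 conditions x 3 windows = 216 samples.

    Assumed ordering: individual, then condition, then sliding window.
    """
    per_subject = [int(state == "tense")
                   for state, _seat, _role in CONDITIONS
                   for _ in range(N_WINDOWS)]
    return np.array(per_subject * N_SUBJECTS)


def stratified_split(y, test_fraction=1 / 3, seed=0):
    """Shuffle each class separately so both splits keep the class balance."""
    rng = np.random.default_rng(seed)
    train, test = [], []
    for cls in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == cls))
        n_test = int(round(test_fraction * len(idx)))
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return np.array(sorted(train)), np.array(sorted(test))


def leave_one_out(n):
    """Yield (train_idx, test_idx) pairs that each leave one sample out."""
    all_idx = np.arange(n)
    for i in range(n):
        yield np.delete(all_idx, i), np.array([i])


y = build_labels()
train_idx, test_idx = stratified_split(y)
folds = list(leave_one_out(len(y)))
print(len(y), len(train_idx), len(test_idx), len(folds))   # 216 144 72 216
```

In practice, scikit-learn's `train_test_split` (with `stratify=y`) and `LeaveOneOut` produce equivalent splits. A `RandomForestClassifier` can then be fitted on the flattened persistence images of the training indices.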

Results
The random forest classifier trained on the persistence images achieves a high accuracy score on the training set (144 samples) and a good accuracy on the testing set (72 samples): 93% and 83%, respectively. This suggests a strong differentiation of the images with respect to their generating signals (see Figure 10). The leave-one-out cross-validation achieved a score of 81%, suggesting a good bias-variance trade-off and a good generalization potential of the model.

Discussion
The combination of Morse theory and topological data analysis allows us to extract information from real data without the need for smoothness or regularity assumptions on the time series. In our case, the input data for each experiment were reduced from six sensor time series to one single image containing the persistent pattern for attention to the road. Using the obtained persistence images as the new inputs, supervised learning proved successful in predicting the attention state of the driver or passenger.
The procedure used and described in this paper does not involve any additional pre-processing of the sensor data; is robust to noise and degraded signals; and supports large quantities of data, which makes it efficient and scalable.
It is important to highlight that, while the proposed methodology based on TDA (successfully applied to large datasets [9]) seems general and powerful and was able to extract the main data features, the validity of the driver behaviors observed in the analyzed dataset should be carefully checked, because the dataset employed is very small (limited to nine individuals) and does not allow for a full validation of the robustness of the predictions.

Institutional Review Board Statement: Ethical review and approval were waived for this study because the research presents no risk of harm to subjects and only collects non-personalized, anonymized data.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.
Data Availability Statement: Data are available on request.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Appendix B
To better evaluate a classification model, we are interested in quantities that express how often a sample is correctly or wrongly labelled into a particular class, over all the samples and all the classes, namely the numbers of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). We can then examine the performance of the classification model in more detail using the following metrics:
• The precision P is the number of correct positive results divided by the number of all positive results, $P = \mathrm{TP}/(\mathrm{TP} + \mathrm{FP})$.
• The recall R is the number of correct positive results divided by the number of all relevant samples, $R = \mathrm{TP}/(\mathrm{TP} + \mathrm{FN})$.
• The F-1 score is the harmonic mean of precision and recall, $F_1 = 2PR/(P + R)$.
• The accuracy A is the number of correct predictions divided by the number of all samples, $A = (\mathrm{TP} + \mathrm{TN})/(\mathrm{TP} + \mathrm{TN} + \mathrm{FP} + \mathrm{FN})$.
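The four metrics can be computed directly from the 2 × 2 confusion matrix (rows: true labels, columns: predicted labels), taking the tense class (label 1) as the positive class. This is a plain-Python sketch with hypothetical helper names. A per-class report repeats the computation with each class taken in turn as positive. `sklearn.metrics` provides the equivalent `confusion_matrix` and `classification_report`.

```python
def confusion_matrix(y_true, y_pred):
    """2 x 2 counts [[TN, FP], [FN, TP]]: rows = true labels, columns = predicted."""
    cm = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def classification_metrics(y_true, y_pred):
    """Precision, recall, F1 (positive class = 1, tense) and accuracy."""
    (tn, fp), (fn, tp) = confusion_matrix(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Toy labels (hypothetical, not the paper's predictions)
print(classification_metrics([0, 0, 1, 1, 1, 0], [0, 1, 1, 1, 0, 0]))
```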
We summarize these metrics for our model in the two reports shown in Figure A9.
Figure A9. Classification report: (a) training set; (b) testing set.