Hybrid Representation of Sensor Data for the Classiﬁcation of Driving Behaviour

Monitoring driving behaviour is important in controlling driving risk, fuel consumption, and CO2 emissions. Recent advances in machine learning, which include several variants of convolutional neural networks (CNNs) and recurrent neural networks (RNNs), such as long short-term memory (LSTM) and gated recurrent unit (GRU) networks, could be valuable for the development of objective and efficient computational tools in this direction. The main idea in this work is to complement data-driven classification of driving behaviour with rules derived from domain knowledge. In this light, we present a hybrid representation approach, which employs NN-based time-series encoding and rule-guided event detection. Histograms derived from the output of these two components are concatenated, normalized, and used to train a standard support vector machine (SVM). For the NN-based component, CNN-based, LSTM-based, and GRU-based variants are investigated. The CNN-based variant uses image-like representations of sensor measurements, whereas the RNN-based variants (LSTM and GRU) directly process sensor measurements in the form of time-series. Experimental evaluation on three datasets leads to the conclusion that the proposed approach outperforms a state-of-the-art camera-based approach in distinguishing between normal and aggressive driving behaviour, without using data derived from a camera. Moreover, it is demonstrated that both NN-guided time-series encoding and rule-guided event detection contribute to overall classification accuracy.


Introduction
The development of computational tools using sensor measurements for the analysis of driving behaviour raises several challenges from a machine learning and signal processing perspective. These challenges include the formulation of classification schemes capable of handling time-series of sensor measurements and the creation of sufficiently diverse datasets, encompassing data of various driving styles.
The first approaches for driving behaviour analysis employed rules, which were usually based on empirically defined thresholds. Bergasa et al. [1] defined such thresholds for acceleration, braking, and turning. They developed a tool using data obtained from mobile phone sensors and cameras, in order to provide feedback in the form of scores and alerts. Joubert et al. [2] used data obtained from telemetry devices and discretized speed and acceleration measurements into a finite risk space to provide personalized driving risk assessment. These rule-based approaches performed well in some setups, assuming a

Background
This section briefly describes CNNs and RNNs. Both types of NNs are investigated in the context of the proposed approach.

CNNs
Convolutional neural networks (CNNs) are widely established NN architectures, capable of generalizing with a relatively low number of free parameters, when compared to traditional NN architectures. CNN-based approaches have led to state-of-the-art results in various areas, most prominently in computer vision [20] and speech processing [21]. The training of a CNN is very similar to the training of a standard NN. It consists of forward data propagation, followed by gradient-based backward error propagation for tuning NN weights. CNNs employ a non-linear activation function, such as the rectified linear unit (ReLU):

f(x) = max(0, x), (1)

where f is the ReLU, which introduces nonlinearities in the decision function, as well as in the entire network. Other functions, such as the hyperbolic tangent and the sigmoid, are also used to increase non-linearity. ReLU is more often used, since it reduces training time without a considerable cost in terms of generalization. CNNs comprise convolutional, pooling, and fully-connected layers:

Convolutional layers are responsible for feature extraction. Each convolutional layer consists of a group of neurons, forming a rectangular grid. This grid is convolved with a given part of the input and the result is passed to the next layer.
Pooling layers are usually placed between single or multiple convolutional layers and progressively reduce the representation size. Each output block of the convolutional layer is subsampled, reducing the complexity of the network and controlling overfitting.
Fully-connected layers are the top-level layers of every CNN-based architecture, performing the high-level reasoning of the entire network, i.e., combining features detected from input parts (such as image regions). The output of the pooling layer is transformed into an N-dimensional vector, where N denotes the number of classes considered. It could be argued that the fully-connected layers are the actual network, whereas the previous layers are feature extractors. This last layer is tied to a loss function, used to provide an estimation of the classification error, which affects the weight tuning performed by means of backward error propagation.
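The convolution, pooling, and ReLU operations described above can be sketched in NumPy. This is a minimal illustration of the three layer types, not the network used in this work; the toy image and kernel values are chosen only for demonstration:

```python
import numpy as np

def relu(x):
    """Rectified linear unit: f(x) = max(0, x)."""
    return np.maximum(0.0, x)

def conv2d_valid(image, kernel):
    """Naive 'valid' 2-D convolution (no padding, stride 1)."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling, progressively reducing representation size."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size
    fm = feature_map[:h, :w]
    return fm.reshape(h // size, size, w // size, size).max(axis=(1, 3))

image = np.arange(16, dtype=float).reshape(4, 4)   # toy 4x4 'image'
kernel = np.array([[0.0, -1.0], [1.0, 0.0]])       # toy 2x2 kernel
features = relu(conv2d_valid(image, kernel))       # 3x3 feature map
pooled = max_pool(features, size=2)                # pooled representation
```

A fully-connected layer would then flatten `pooled` into a vector and apply an affine transformation followed by a nonlinearity.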

RNNs
Recurrent neural networks (RNNs) were proposed to handle time-series data. An RNN is formulated by means of state and hidden state variables, which depend on the previous states and hidden states. Let x = (x_1, x_2, . . . , x_T) be a sequence. Each hidden state h_t is recurrently updated by:

h_t = ϕ(h_{t−1}, x_t), (2)

where ϕ is a nonlinearity, such as the composition of a logistic sigmoid with an affine transformation. Optionally, the RNN may have an output y = (y_1, y_2, . . . , y_T). The update of the recurrent hidden state in Equation (2) is often implemented as:

h_t = g(W x_t + U h_{t−1}), (3)

where g is a smooth, bounded function such as the hyperbolic tangent or the logistic sigmoid. RNNs often fail to capture long-term dependencies due to the vanishing gradient effect. Two main directions were followed to cope with this. The first direction consists of alternatives to stochastic gradient descent [15][16][17]. The second direction involves the design of a sophisticated activation function, which consists of an affine transformation and a simple element-wise nonlinearity obtained by gating units. The earliest attempt in this direction resulted in a recurrent unit called long short-term memory (LSTM) [18]. Later, another type of recurrent unit, the gated recurrent unit (GRU), was proposed [13,19]. RNNs employing either of these recurrent units have been shown to perform well in tasks that require capturing long-term dependencies, such as speech recognition [22,23] and natural language processing [24,25]. Starting from time-series of sensor measurements obtained under various driving conditions, we investigate the application of both LSTMs and GRUs in the context of driving behaviour analysis.
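The recurrent update of Equation (3) can be sketched in NumPy. The dimensions and random weights below are illustrative only, with the input dimension loosely motivated by the two acceleration axes used later in this work:

```python
import numpy as np

def rnn_step(x_t, h_prev, W, U, g=np.tanh):
    """One recurrent update: h_t = g(W x_t + U h_{t-1})."""
    return g(W @ x_t + U @ h_prev)

rng = np.random.default_rng(0)
input_dim, hidden_dim = 2, 4   # e.g., two acceleration axes
W = rng.standard_normal((hidden_dim, input_dim)) * 0.1
U = rng.standard_normal((hidden_dim, hidden_dim)) * 0.1

h = np.zeros(hidden_dim)
sequence = rng.standard_normal((50, input_dim))  # a time slice of measurements
for x_t in sequence:
    h = rnn_step(x_t, h, W, U)   # h summarizes the sequence seen so far
```

Because the hidden state is overwritten at every step through a bounded nonlinearity, gradients flowing back through many such steps tend to vanish, which motivates the gated units described next.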

LSTMs
LSTMs were originally proposed by Hochreiter and Schmidhuber [14], whereas several variants followed, including the variant of Graves [26], which is described here. Unlike the recurrent unit, which simply computes a weighted sum of the input signal and applies a nonlinearity, each j-th LSTM unit maintains a memory c_t^j at time t. The activation h_t^j of the LSTM unit is:

h_t^j = o_t^j tanh(c_t^j), (4)

where o_t^j is an output gate that modulates the amount of memory content exposure. The output gate is computed by:

o_t^j = σ(W_o x_t + U_o h_{t−1} + V_o c_t)^j, (5)

where σ is a logistic sigmoid function and V_o is a diagonal matrix. The memory cell c_t^j is updated by partially 'forgetting' the existing memory and adding a new memory content c̃_t^j:

c_t^j = f_t^j c_{t−1}^j + i_t^j c̃_t^j, (6)

where the new memory content is:

c̃_t^j = tanh(W_c x_t + U_c h_{t−1})^j. (7)

The extent to which the existing memory is 'forgotten' is modulated by a forget gate f_t^j, and the degree to which the new memory content is added to the memory cell is modulated by an input gate i_t^j. The gates are computed by:

f_t^j = σ(W_f x_t + U_f h_{t−1} + V_f c_{t−1})^j, (8)
i_t^j = σ(W_i x_t + U_i h_{t−1} + V_i c_{t−1})^j. (9)

Note that V_f and V_i are diagonal matrices. Unlike the traditional recurrent unit, which overwrites its content at each time-step (Equation (3)), an LSTM unit is able to decide whether to keep the existing memory via the introduced gates. Intuitively, if the LSTM unit detects an important feature from an input sequence at an early stage, it easily carries this information much later, capturing potential long-term dependencies.
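The LSTM gating scheme described above can be sketched step-by-step in NumPy. This is an illustrative re-implementation, not the Keras layer used in the experiments; weight shapes and values are arbitrary, and the diagonal matrices V_f, V_i, V_o are represented as element-wise weight vectors:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM update following the Graves-style gating equations.
    V_f, V_i, V_o are diagonal, so they act element-wise on the memory."""
    f = sigmoid(p["Wf"] @ x_t + p["Uf"] @ h_prev + p["Vf"] * c_prev)  # forget gate
    i = sigmoid(p["Wi"] @ x_t + p["Ui"] @ h_prev + p["Vi"] * c_prev)  # input gate
    c_tilde = np.tanh(p["Wc"] @ x_t + p["Uc"] @ h_prev)               # new content
    c = f * c_prev + i * c_tilde                                      # memory update
    o = sigmoid(p["Wo"] @ x_t + p["Uo"] @ h_prev + p["Vo"] * c)       # output gate
    h = o * np.tanh(c)                                                # activation
    return h, c

rng = np.random.default_rng(1)
n_in, n_h = 2, 8
p = {k: rng.standard_normal((n_h, n_in)) * 0.1 for k in ("Wf", "Wi", "Wc", "Wo")}
p.update({k: rng.standard_normal((n_h, n_h)) * 0.1 for k in ("Uf", "Ui", "Uc", "Uo")})
p.update({k: rng.standard_normal(n_h) * 0.1 for k in ("Vf", "Vi", "Vo")})

h, c = np.zeros(n_h), np.zeros(n_h)
for x_t in rng.standard_normal((50, n_in)):
    h, c = lstm_step(x_t, h, c, p)
```

Note how the memory `c` is only partially overwritten at each step: with `f` close to 1 and `i` close to 0, early content can survive many time-steps.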

GRUs
GRUs are more efficient than LSTMs, since they have fewer parameters and no output gate. Each recurrent unit adaptively captures dependencies at different time scales. As is the case with LSTMs, GRUs have gating units that modulate the flow of information inside the unit. The activation h_t^j of the j-th GRU at time t is a linear interpolation between the previous activation h_{t−1}^j and the candidate activation h̃_t^j:

h_t^j = (1 − z_t^j) h_{t−1}^j + z_t^j h̃_t^j, (10)

where an update gate z_t^j determines the extent of activation or content update. The update gate is computed by:

z_t^j = σ(W_z x_t + U_z h_{t−1})^j. (11)

This process of deriving a linear sum between the existing state and the new state is similar to the process of LSTMs. However, GRUs do not have any control mechanism determining the extent of state exposure and expose the entire state each time.
The candidate activation h̃_t^j is computed similarly to that of the traditional recurrent unit (Equation (3)) [23]:

h̃_t^j = tanh(W x_t + U(r_t ⊙ h_{t−1}))^j, (12)

where r_t is a set of reset gates and ⊙ denotes element-wise multiplication. When off (r_t^j close to 0), the reset gate effectively makes the unit act as if it is reading the first symbol of an input sequence, allowing it to forget the previously computed state. The reset gate r_t^j is computed similarly to the update gate:

r_t^j = σ(W_r x_t + U_r h_{t−1})^j. (13)
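The GRU update can likewise be sketched in NumPy. As with the LSTM sketch, this is an illustration of the gating mechanism with arbitrary weights, not the Keras layer used in the experiments:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, p):
    """One GRU update: interpolation between previous and candidate state."""
    z = sigmoid(p["Wz"] @ x_t + p["Uz"] @ h_prev)            # update gate
    r = sigmoid(p["Wr"] @ x_t + p["Ur"] @ h_prev)            # reset gate
    h_tilde = np.tanh(p["W"] @ x_t + p["U"] @ (r * h_prev))  # candidate activation
    return (1.0 - z) * h_prev + z * h_tilde                  # linear interpolation

rng = np.random.default_rng(2)
n_in, n_h = 2, 8
p = {"Wz": rng.standard_normal((n_h, n_in)) * 0.1,
     "Wr": rng.standard_normal((n_h, n_in)) * 0.1,
     "W":  rng.standard_normal((n_h, n_in)) * 0.1,
     "Uz": rng.standard_normal((n_h, n_h)) * 0.1,
     "Ur": rng.standard_normal((n_h, n_h)) * 0.1,
     "U":  rng.standard_normal((n_h, n_h)) * 0.1}

h = np.zeros(n_h)
for x_t in rng.standard_normal((50, n_in)):
    h = gru_step(x_t, h, p)
```

Compared with the LSTM sketch, there is no separate memory cell and no output gate: the whole state `h` is exposed at every step, and the update gate alone decides how much of it is replaced.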

Hybrid Representation Approach
The proposed hybrid representation approach employs NN-based (NN could be either CNN or RNN) time slice encoding, and rule-guided event detection. Both components have: (1) input formed by sensor measurements; (2) output encoded in the form of histograms representing frequency of occurrence. The two resulting histograms are merged, divided by route duration, and normalized to form feature vectors aimed at reflecting overall driving behaviour. Figure 1 summarizes the main stages of the proposed approach, whereas each component is described in the following subsections. It can be noted that rather than providing an alternative to CNN, LSTM, and GRU, each variant of the proposed approach incorporates one of these NN types as its data-driven component. Accordingly, CNN/LSTM/GRU models are just 'part of the story'.

CNN-Based Image-like Representation of Sensor Measurements
The main idea in this representation variant is to treat sensor measurements as an image, which is subsequently analyzed by a CNN. Figure 2 illustrates examples of such 'images', derived from sensor measurement samples of the datasets used in this work (see Section 4.1) and associated with different driving behaviours. It could be observed that for each type of driving behaviour there is a distinctive visual element. Taking into account that each time slice label is derived from the label of the containing route (e.g., normal, semi-aggressive or aggressive), the absolute classification accuracy is not a primary concern at this stage. The intuition is to generate a time-series encoding which represents frequency of occurrence of driving patterns and is distinctive of each driving style.
The CNN architecture used in this work consists of three convolutional layers filtering their input with 32, 64, and 64 kernels of size 1 × 1, 1 × 1, and 3 × 3, respectively. A flatten layer is used to transform the output of the last convolutional layer to a vector. This vector is the input to a dense layer with 100 units. Finally, a fourth dense layer classifies each time slice. The end product of this component in the context of the proposed approach is the histogram encoding the frequency of occurrence of each type of driving behaviour at the time slice level (Figure 1). Convolutional and dense layers employ the ReLU activation function, except for the last dense layer, which employs the sigmoid activation function. Sparse categorical cross-entropy is employed as a loss function and stochastic gradient descent is employed for optimization. This CNN-based approach has been presented in detail in our preliminary work [10].
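The histogram encoding that this component produces can be illustrated as follows. The per-time-slice predictions below are hypothetical values, not actual network output:

```python
import numpy as np

# Hypothetical per-time-slice predictions from the NN (0 = normal, 1 = aggressive)
slice_predictions = np.array([0, 0, 1, 0, 0, 1, 1, 0, 0, 0])

# Histogram of predicted classes: frequency of occurrence of each driving pattern
n_classes = 2
histogram = np.bincount(slice_predictions, minlength=n_classes)
# histogram -> [7, 3]: 7 'normal' slices, 3 'aggressive' slices in this route
```

It is this frequency-of-occurrence vector, rather than the correctness of any individual time slice prediction, that feeds the route level classifier.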

RNN-Based Time-Series Representation
As an alternative to CNNs, RNNs, such as LSTMs or GRUs, can also be employed for the classification of time slices of raw acceleration data. Taking into account that RNNs have been formulated to address time-series, they constitute a natural choice for time slice classification. As was the case with CNNs (see Section 3.1), the absolute classification accuracy is not a primary concern at this stage and the resulting time-series encoding represents the frequency of occurrence of driving patterns. Both LSTMs and GRUs are configured with one layer of 128 neurons, followed by a dense layer of 2 neurons with a softmax activation function. This shallow network architecture was selected in order to avoid overfitting. Sparse categorical cross-entropy is employed as a loss function and Adam is employed for optimization. Preliminary versions of these components have been presented in [13,17].
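Before either NN type can classify time slices, each route's measurement series must be segmented into fixed-length slices (sample length 50 is used in the experimental setup, Section 4.2). The sketch below shows one plausible way to do this; the slicing strategy and the handling of the incomplete tail are assumptions for illustration, not the exact preprocessing of this work:

```python
import numpy as np

def make_time_slices(series, slice_len=50):
    """Split a route's measurement series into fixed-length time slices.
    Each slice inherits the label of the containing route."""
    n = (len(series) // slice_len) * slice_len   # drop the incomplete tail
    return series[:n].reshape(-1, slice_len, series.shape[1])

# A hypothetical route: 523 samples of 2 acceleration axes (lateral, longitudinal)
route = np.random.default_rng(3).standard_normal((523, 2))
slices = make_time_slices(route)   # shape (10, 50, 2): 10 slices of length 50
```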

Rule-Guided Event Detection
Rules can be formulated to express domain knowledge in the area of driving behaviour. In this light, it has been suggested that simple thresholds can be used to reliably detect basic events, such as acceleration, braking, and turning [1,2]. Table 1 presents the thresholds proposed by Bergasa et al. [1] for each type of event and three levels of intensity: low, medium, and high. The rule-guided event detection component uses these thresholds to calculate the frequency of occurrence for each pair of event type and intensity. The derived histogram reflects long-term driving behaviour. Intuitively, both normal and aggressive drivers are expected to use brakes at some point, yet aggressive drivers use brakes much more frequently.
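The threshold-based event counting can be sketched as follows for one event type. The threshold values below are hypothetical placeholders; the actual values are those of Bergasa et al. [1], given in Table 1:

```python
import numpy as np

# Hypothetical thresholds (in g) for one event type and three intensity levels;
# the real values are the validated ones of Bergasa et al. [1] (Table 1).
BRAKING = {"low": -0.1, "medium": -0.3, "high": -0.5}

def count_braking_events(a_z, thresholds):
    """Count braking events per intensity level from longitudinal acceleration a_z."""
    low, med, high = thresholds["low"], thresholds["medium"], thresholds["high"]
    counts = {
        "high":   int(np.sum(a_z <= high)),                 # hardest braking
        "medium": int(np.sum((a_z <= med) & (a_z > high))),
        "low":    int(np.sum((a_z <= low) & (a_z > med))),
    }
    return counts

a_z = np.array([0.0, -0.15, -0.35, -0.6, -0.05, -0.4])   # toy measurements
events = count_braking_events(a_z, BRAKING)
# events -> {'high': 1, 'medium': 2, 'low': 1}
```

Repeating this for every (event type, intensity) pair yields the event histogram that reflects long-term driving behaviour.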


Route Level Classification
This stage unifies the previously described components in order to obtain route level classification, as illustrated in Figure 1. The histogram generated from the CNN-based component (Section 3.1) or RNN-based component (Section 3.2) is concatenated with the one generated from the rule-guided component (Section 3.3), forming a feature vector aimed at reflecting overall driving behaviour. This feature vector is divided by overall route duration and normalized. Labelled samples of normalized feature vectors are used to train a standard SVM classifier, in order to assess driving behaviour at the route level. A preliminary version of the hybrid approach has been presented in [17]. This approach can be considered as a framework. Accordingly, the NN-based component, the rule-guided component or the SVM classifier could be replaced by alternatives.
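The construction of a route level feature vector from the two histograms can be sketched as follows. The histogram values and dimensions are hypothetical, and L2 normalization is an assumption for illustration (the text does not specify the norm used):

```python
import numpy as np

# Hypothetical histograms for one route: NN-based time-slice class counts and
# rule-based event counts (real dimensions depend on the components used).
nn_histogram = np.array([40.0, 10.0])        # e.g., normal / aggressive slices
event_histogram = np.array([5.0, 3.0, 1.0])  # e.g., low / medium / high events
route_duration_min = 15.0

feature = np.concatenate([nn_histogram, event_histogram])
feature = feature / route_duration_min       # convert counts to rates per minute
feature = feature / np.linalg.norm(feature)  # normalization (L2 assumed here)

# 'feature' is one labelled training sample for the route level SVM classifier
```

Dividing by route duration makes routes of different lengths comparable, so the SVM sees frequencies of behaviour rather than raw counts.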

Experimental Evaluation
In this section, we describe the datasets used to experimentally evaluate the classification schemes investigated, provide details on the experimental setup, and present the results obtained.

Datasets
The classification schemes investigated are evaluated on the publicly available UAH dataset [12], which has been acquired by means of mobile phone sensors, as well as on two datasets, which have been acquired by means of telematics sensors and have been created in the context of this work.

UAH
The UAH dataset has been introduced by Romera et al. [12] to facilitate benchmarking of computational approaches for driving behaviour analysis. It comprises data acquired by means of mobile phone sensors and cameras. UAH route samples represent three types of driving behaviour (normal, aggressive, and drowsy) under various conditions (motorway or secondary road), with six drivers of different genders and ages (Table 2). As demonstrated by Romera et al., drowsy driving behaviour is manifested through slow lane changes and can be effectively detected by means of a camera. Since, in this work, we limit our study to sensor data, we focus on distinguishing between normal and aggressive behaviour, using only the respectively labelled routes from UAH. Overall, in our experiments we use 23 UAH route samples encompassing acceleration measurements in the lateral and longitudinal axes of the vehicle, acquired at 10 Hz by means of the inertial sensor of an iPhone [12].

MOTIF Datasets
We created two datasets, namely MOTIF 1 and MOTIF 2 (MOTIF is the title of the project which funded this work). Both datasets were acquired by means of the FMS-500Light+ telematics device, manufactured by Xirgo Technologies (formerly BCE) [27]. The device is equipped with accelerometers, a GPS receiver of 1 m resolution, and a GSM modem supporting cellular protocols up to 4G. The raw accelerometer data were merged with the corresponding GPS coordinates to create vehicle routes. Three types of driving behaviour have been considered: normal, semi-aggressive, and aggressive. The MOTIF 1 dataset comprises 11 route samples, whereas MOTIF 2 comprises 12 route samples. Six drivers have been involved (Table 3) over a period of six months, driving the same vehicles and following the same route, but in different time periods. Each route sample has a duration of approximately 15 min, whereas the measurement vectors were acquired at 0.1 Hz and comprise 27 features: maximum positive acceleration, maximum negative acceleration, maximum transverse acceleration, a 21-bin histogram of acceleration values ranging from −0.5 g to 0.5 g, latitude, longitude, and speed.

Experimental Setup
For the CNN-based variant of the proposed approach, dropout was set to 0.4, whereas batch size was set to 32. In the case of the RNN-based variants, dropout, recurrent dropout [28], and sample length were set to 0.2, 0.2, and 50, respectively, whereas the batch size for RNNs was set to 1914. The SVM used for route level classification employs an RBF kernel with C = 1000. When considering 20% ranges centered on these parameter settings, the classification accuracy variance did not exceed 5%, for all parameters. In the case of the event-based detection component, the thresholds used are the ones validated by Bergasa et al. [1] for identifying acceleration, braking, and turning, with three different levels of intensity: low, medium, and high. These threshold values are provided in Table 1, where a_y and a_z are the accelerations in the lateral and longitudinal directions, respectively. Sixty percent of the samples have been used for training, 30% for validation, and 10% for testing. For the training stage, all time slices of a route inherit its label (i.e., all time slices of a normal or aggressive route are labelled as normal or aggressive, respectively). Accordingly, this component is trained using a broad labelling at the route level and provides classification at the time slice level.
The experiments were performed on a workstation with AMD Ryzen 5 1400 quad core processor (8 CPUs) on 3.4 GHz and 8 GB RAM, using NVIDIA GeForce GTX 1060 GPU with 6 GB and Microsoft Windows 10 Pro (64 bit). All pipelines were implemented in Python, using Keras [29] with the Tensorflow [30] backend.

Results
The experiments were performed at two levels: the first level is the classification of time slices by means of the NN-based representations presented in Sections 3.1 and 3.2, whereas the second level is the classification of routes, which is performed by means of the hybrid approach described in Section 3.4. In both levels, we perform comparisons with state-of-the-art approaches.

Time Slice Classification
In this section, we evaluate the classification performance of the three variants of the proposed approach (CNN, LSTM, and GRU) at the time slice level and perform quantitative and qualitative comparisons with the state-of-the-art. Figures 3-5 illustrate the training and validation loss of the proposed approach at the time slice level for the CNN-, LSTM-, and GRU-based variants, respectively. The oscillations of the validation loss can be attributed to Adam optimization [31]. Besides this, there are no increasing parts in the validation loss, indicating that overfitting has been prevented. Table 4 presents the confusion matrices for the classification of time slices, as performed by the three variants of the proposed approach (CNN, LSTM, and GRU) on each one of the three datasets described in Section 4.1 (UAH, MOTIF 1, and MOTIF 2). The overall classification results at the time slice level are summarized in Table 5. The most accurate classification is performed by GRU (accuracy 0.91), followed closely by LSTM (accuracy 0.89). Also, CNN tends to misclassify most aggressive time slices as normal. This latter behaviour also emerged in the preliminary work [17], and agrees with the intuition that both normal and aggressive drivers often seem to drive 'normally'. Still, the results of both the LSTM-based and GRU-based components indicate that there are differences between normal and aggressive drivers in most time slices, which can be distinguished by certain NN-based encodings. It should be noted that classification at the time slice level cannot be regarded as a goal in itself in the context of this work. Rather than resulting in 'correctly' classified time slices, the intuition is to result in a time-series encoding which represents the frequency of occurrence of driving patterns and is distinctive for each driving style.
This is all the more relevant taking into account that time slice labels are derived from the label of the containing route, which essentially prohibits a 'normal' time slice in an 'aggressive' route and vice versa. In this sense, each NN-based component performing time slice classification is essentially evaluated by considering the labels at the route level. For example, when comparing two NN-based variants, variant A with lower classification accuracy at the time slice level, followed by higher classification accuracy at the route level, and variant B with higher classification accuracy at the time slice level, followed by lower classification accuracy at the route level, the encoding obtained by variant A generalizes better than the one obtained by variant B. Other works, sharing similar elements with the proposed approach, have been applied on the UAH dataset focusing on time slice classification. Saleh et al. [11] report an F1 measure of 91%, which is equal to the one reported here for the GRU-based variant (Table 5). Saleh et al. additionally identify drowsy driving behaviour; however, this is achieved by employing the mobile phone camera in order to identify lane drifting, which has been acknowledged [12] as a strong indicator for this type of driving behaviour. In a similar setting, Khodairi and Abosamra [18] report an F1 measure exceeding 99%. This is the highest score reported in the literature for time slice classification; however, it is also obtained by using the mobile phone camera. The CNN-based approach of Xie et al. [7] is also evaluated on the UAH dataset; however, it is not quantitatively comparable, since it addresses a different problem: maneuver classification. Overall, the proposed hybrid approach can be viewed as a framework to combine data-driven classification with domain knowledge. As such, it could potentially encompass other classification approaches introduced in the literature, such as the approach of Khodairi and Abosamra, in order to address time slice classification, as well as overall route classification (see Section 4.3.2).

Route Level Classification
In this section, we evaluate the classification performance of the three variants of the proposed approach (CNN, LSTM, and GRU) at the route level. To provide further insights, we also investigate the performance of the standalone NN-based and rule-based components. Finally, we perform quantitative and qualitative comparisons with the state-of-the-art. Table 6 presents the confusion matrices for the classification at the route level, as performed by the three variants of the proposed approach (CNN, LSTM, and GRU) on each one of the three datasets described in Section 4.1 (UAH, MOTIF 1, and MOTIF 2). Hybrid variants, which employ rule-based event detection, are presented without parentheses, whereas their corresponding NN-only variants are presented in parentheses. It could be observed that: (1) In almost all cases, each hybrid variant obtains equal or higher classification accuracy, when compared to its NN-only counterpart (one exception arises in CNN-based classification of normal samples in the MOTIF 1 dataset). This indicates that the rule-based component contributes to overall classification performance and, in several cases, 'corrects' the result obtained by the NN-based component. (2) When comparing the three NN architectures investigated, the RNN-based ones (LSTM and GRU) obtain a higher classification performance than the CNN-based architecture, more so in the UAH and MOTIF 2 datasets. This could be attributed to the fact that RNNs have been formulated to capture patterns in time-series, such as these sensor measurements. (3) Between LSTM and GRU, the latter achieves slightly more accurate classification. Table 7 provides a more detailed view of the classification results in the UAH dataset, including the results obtained by the approach of Romera et al. [12]. Each classification result is marked as 'T' (True) or 'F' (False).
We included the results obtained by the standalone rule-based component ('Events (only)') in order to highlight its contribution to the overall classification accuracy obtained by the hybrid variants. It could be observed that: (1) The GRU-based hybrid variant ('Hybrid (GRU)') has 1/23 misclassifications, whereas the CNN-based and LSTM-based hybrid variants ('Hybrid (CNN)' and 'Hybrid (LSTM)') have 6/23 and 2/23 misclassifications, respectively. The approach of Romera et al. [12] has 3/23 misclassifications. (2) All 'NN-only' variants lead to a considerable number of misclassifications. Still, when 'NN-only' variants are combined with 'Events-only', the overall classification accuracy is increased, as evident in the results obtained by the 'Hybrid' counterparts. Also, there is a case in which 'NN-only' variants 'correct' 'Events-only' ('D4-Aggressive-Secondary'). These observations demonstrate that each component contributes complementary information, increasing overall classification accuracy.
It should be noted that the results of the hybrid classification variants, as well as of the 'NN-only' and 'Events-only' variants, which have been introduced in the context of this work, were obtained using acceleration measurements as input. On the other hand, the method of Romera et al. [12] uses all smartphone sensors (inertial sensors, camera, GPS, and internet access), in order to log and recognize driving maneuvers and infer behaviour. In addition, Romera et al. identify drowsy driving behaviour but, as is the case with Saleh et al. [11], this is achieved by using the mobile phone camera in order to identify lane drifting, which has been acknowledged by the authors as a strong indicator for this type of driving behaviour. In the work of Romera et al., there are no misclassifications of either normal or aggressive routes as drowsy that could affect the results presented in Table 7.

Conclusions
This work introduces a hybrid representation approach for driving behaviour classification. The main idea is to combine data-driven classification methods with domain knowledge in the form of rules. The proposed approach combines NN-based encoding and rule-based event detection. Histograms derived from the output of these two components are concatenated and normalized to train a standard SVM, which is used to assess overall driving behavior. CNN, LSTM, and GRU architectures are employed in the context of different variants. The proposed approach is evaluated on the publicly available UAH dataset [12], as well as on two datasets (MOTIF 1 and 2) created in the context of this work.
Such a hybrid approach has not previously appeared in the literature on computational tools for driving behaviour analysis. Other novel elements of the proposed approach include the use of image-like representations of sensor measurements in the case of the CNN-based variant, as well as the first application of GRUs in driving behaviour analysis.
The main conclusions derived from our experiments can be summarized as follows: (1) Both NN-guided time-series encoding and rule-guided event detection contribute to the accuracy obtained by the proposed hybrid classification method. (2) The RNN-based variants (LSTM and GRU) obtain higher classification performance than the CNN-based variant, more so in the UAH and MOTIF 2 datasets. (3) Between LSTM and GRU, the latter achieves slightly more accurate classification. (4) The GRU-based variant achieves time slice classification accuracy exceeding 90%, without using data derived from a camera, as is the case with other state-of-the-art approaches [11,18]. (5) In terms of overall route classification, the proposed approach outperforms the approach of Romera et al. [12] in distinguishing between normal and aggressive driving behaviour, resulting in fewer misclassifications in the UAH dataset. This result is obtained without using data derived from a camera, as is the case with the method of Romera et al.
Future perspectives of this work include experimentation with other data-driven components, as well as the use of fuzzy rules in the context of the rule-guided event detection component. Silva et al. [32] underlined the scientific opportunity created by the abundance of live data retrieved from sensing systems, pervasive devices, or systems with context recognition and communication. In the same direction, Semansjki et al. [33] investigated how data generated through mobile devices and social media activities can be integrated into Smart City sustainable mobility planning. Having been successfully applied in various domains [34,35], machine learning techniques could eventually lead to "intelligent" mobility criteria, contributing to the decision making, planning, and overall sustainable mobility policy of modern Smart Cities.

Funding: This research has been co-financed by the European Regional Development Fund of the European Union and Greek national funds through the Operational Program Competitiveness, Entrepreneurship and Innovation, under the call RESEARCH-CREATE-INNOVATE (project code: T1EDK-03459).

Institutional Review Board Statement:
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. For this type of study formal consent is not required.