Focusing on Driving Modes Rather Than Drivers: Toward More Precise and Efficient Car-Following Behavior Modeling

Abstract: Car-following (CF) behavior is one of the most important driving behaviors. Accurately understanding and modeling CF behavior is essential for traffic flow simulation and user-acceptable advanced driving assistance systems (ADASs). In previous decades, CF models were calibrated based on drivers or trajectories, with short-term changes ignored. Recent studies have indicated that these changes could be caused by occasional irritations or regular switches of driving modes, but there is still a lack of specific understanding of driving modes and of how these modes affect simulation accuracy in the reproduction of CF behavior. This paper explores the existence of driving modes and quantifies their influence on modeling accuracy. Specifically, we first extracted 4000 high-resolution CF events of 40 drivers from large-scale naturalistic driving data for the discovery of underlying driving modes. Then, we introduced a novel multivariate time series method, Toeplitz Inverse Covariance-based Clustering (TICC), to segment the CF events and classify the segments into different driving modes. Finally, using the CF dataset for calibration, the proper number of driving-mode clusters was determined, and a comparison of driving-mode-based modeling (DMBM) and driver-based modeling (DBM) was conducted. The results showed that the driving process could be viewed as five core driving modes, and that the DMBM has the potential to bring upwards of a 13% accuracy improvement with fewer parameters.


Introduction
The car-following (CF) model was established to describe how a driver reacts to the leading vehicle in the longitudinal direction, which is the most basic driving behavior [1]. The precision with which CF models can replicate this behavior is of paramount importance for traffic operation and safety. For example, precise CF modeling can benefit our understanding of the mechanisms of traffic congestion evolution and the study of negative externalities in traffic flow such as oscillations [2]. It can also provide reliable behavioral support for adaptive cruise control strategies in advanced driving assistance systems (ADASs) [3]. Thus, in past decades, many efforts have been made to narrow the gap between CF model outputs and actual human driving behavior through model comparisons or improvements in structure. Essentially, most of these models are formulated so that the simulated driver follows a long-term intrinsic law defined by initial unified parameters, which is referred to as driver-based modeling (DBM) in this study.
However, affected by the dynamic human-vehicle-road environment system, drivers change their behavior during the CF process [4,5]. This intra-driver variance takes two forms: occasional short-time variance and the common behavior variance across different driving modes [6]. The former is an occasional behavior in the CF process; the location of these events in the record is relatively clear, and they are usually caused by obvious abnormal changes in the external environment. In contrast, the driving mode, defined as a certain behavior consistency or repetition in the CF interaction within a time segment, can be viewed as the basic component of CF behavior. Both objective environmental changes and subjective adjustments by drivers in carrying out the driving task can lead to behavior variance across driving modes, which exists in almost all CF events. However, the core pathway of the current DBM is to apply a complex mapping function that pools what are essentially different states. As a result, a CF model based on the DBM is likely to describe each individual mode inaccurately and to achieve only a kind of average that lowers the total error level. Therefore, the concept of driving-mode-based modeling (DMBM) is proposed; that is, to convert the parameters of a CF model from a mapping of different drivers to a mapping of different driving modes. However, our understanding of driving modes remains limited. Specifically, there is a dearth of knowledge regarding the fundamental modes that constitute the car-following process and how the behavior in these different driving modes influences the accuracy of existing car-following models.
Recognizing several basic CF behavior modes from massive empirical driving data is a process of segmenting the data and then categorizing the segments according to their latent data characteristics. A long multivariate CF time series is segmented into short sequences that contain repeatedly occurring behavior "patterns" or "modes", such as certain slight fluctuations in the velocity dimension. For consistency, we use "mode" to refer to them. Short sequences that share common repeated features are labeled with the same mode number. Conventionally, rule-based division is applied. As a common example, the classical Wiedemann model consists of different threshold-determined state-action regimes representing drivers' differentiated behavior modes, such as unconscious reaction, reaction, deceleration, etc. Nevertheless, this mode segmentation is subjective and provides little insight into structural similarities in the data, and therefore the Wiedemann model has not shown advantages in modeling accuracy compared with traditional nonlinear models based on the DBM [1]. In contrast, an unsupervised data-driven method is more promising and adaptable for this question. Recently, Higgs et al. [7] applied a distance-based time series clustering method to CF records and found up to 30 driving modes for car drivers and another 30 modes for truck drivers. This quantity is outside the scope of the basic driving modes we aim to investigate, and it would lead to an unacceptable number of parameters if applied to the DMBM. Furthermore, conventional clustering methods relying on distance-based metrics may emphasize matching values rather than searching for nuanced structural similarities in the CF process. In summary, how to precisely segment and determine CF behavior modes from empirical data records remains an open question.
Furthermore, both the volume and accuracy of data are critical for accurately identifying potential subtle data changes and dependencies in different driving modes through data-driven approaches. Despite the increasing use of real-world trajectories in CF model calibration and comparison [8,9], made possible by the development of data collection and storage, the accuracy of some commonly used data cannot be well guaranteed. For example, the Next Generation Simulation (NGSIM) data [10], a valuable dataset that has been widely adopted in traffic flow and travel behavior modeling, have also been criticized for accuracy issues [11]. On the other hand, for comparison with the DBM, sufficient records for each driver are necessary to obtain accurate fitting parameters for drivers. Some recent datasets such as highD [12] have indeed been improved with new recording approaches, but their observations of each driver are extremely limited. Higgs et al. introduced naturalistic driving data with long records for each driver, but the fact that there are only ten drivers for each type of vehicle limits the credibility of the results.
Therefore, to overcome the challenges mentioned above, we first introduce a novel data-driven approach, Toeplitz Inverse Covariance-based Clustering (TICC), to mine driving modes. Then, data from a large-scale naturalistic driving study (NDS) are adopted to validate and compare the DMBM and the DBM. The main contributions of this study are:

• With up to 40 representative drivers from the large-scale NDS data, a total of 4000 high-resolution CF events were extracted for CF behavior mode mining. The duration of each event is longer than 30 s to guarantee that more continuous data dependencies in the CF behavior can be found;

• An accurate subsequence clustering tool for multivariate time series, TICC, is introduced for driving mode recognition. Rather than relying on conventional distance-based clustering, it uses the graphical dependency structure of each subsequence to recognize the potentially recurring CF behavior patterns more precisely;

• The number of CF behavior modes is determined by careful consideration of the clustering results and the CF behavior modeling results. Based on this, a comparison of the DBM and the DMBM is conducted to demonstrate the potential of the DMBM to improve the accuracy of CF modeling.
The rest of this paper is organized as follows. Section 2 reviews the related works regarding CF behavior and mode discovery techniques; Section 3 describes the data information and extraction of CF events; Section 4 details the methodology of mode recognition and CF model calibration; Section 5 determines the number of driving modes and shows the modeling performance; Section 6 provides discussions; and Section 7 concludes the study and discusses future work.

Related Works
In the past several decades, a multitude of car-following models have been developed and refined to more accurately replicate human car-following behavior. These models are optimized using specific sets of parameters that best fit the driving characteristics of individual drivers. Conventionally, most CF models are mapping functions of different drivers' full trajectories, but some researchers have noticed that there are different driving behavior modes or patterns in the CF process, which can be viewed as correlations of behavioral characteristics. In this section, we summarize related works from the following three perspectives: (1) CF behavior modeling, (2) driving modes in CF, and (3) mode discovery of time series.

CF Behavior Modeling
Over the past seventy years, CF behavior, one of the most fundamental driving behaviors, has been modeled and its models continuously improved. Various mapping functions, ranging from simple linear equations to complex sets of nonlinear equations, have been proposed to better understand interactions between following vehicles and leading vehicles in non-free traffic flow. Four fundamental modeling logics have been proposed, well developed, and demonstrated to be reliable in simulation practices: stimulus-response logic, safety distance logic, desired measures logic, and perceptual-thresholds logic. While attempts based on machine learning techniques have been made, they are beyond the scope of this paper because their limited ability to explain human driving mechanisms hinders further applications.
First, the stimulus-response CF model is commonly recognized as the foremost classical CF interaction logic. This kind of model takes the interaction characteristics between the CF pair as a "stimulus", based on which the following vehicle makes "responses" accordingly. Commonly adopted factors that are considered as stimuli include the velocity of the leading vehicle, the spacing between the CF pair, and the velocity difference [13]. Numerous upgraded and advanced forms of the original version proposed by Chandler et al. [14] have been developed, such as models with memory [15] or with acceleration and deceleration asymmetry [16]. Second, Kometani [17] initially defined the safety distance logic. The main hypothesis of this logic is that the driver's response is linked more to the gap between the CF pair than to the relative velocity. Nonlinear constructs were introduced to fulfil this logic by Newell [18]. Gipps [19] also proposed a model that increased adaptability by considering several human behavioral parameters, such as reaction time and desired velocity, and it has been the most popular in its category. Third, models based on the concept of desired measures were established by Helly [20]. Differing from the logics above, drivers are supposed to pursue expected values of CF characteristics such as time headway. Another famous and well-adopted model of this kind is the intelligent driver model (IDM). The IDM has been shown to be the most accurate and adaptable in reproducing the CF behaviors of drivers, and it remains accurate when applied to different driving styles or different traffic flow facilities [1].
Finally, to avoid neglecting the impact of human psychological reactions, a new concept based on the "perceptual threshold" was proposed by Wiedemann. It supposes that not all stimuli from CF pair interactions lead to subjective operations by the following vehicle. The threshold of perception for a driver is defined as the minimum value of a stimulus that can be detected and elicit a response. Therefore, a dynamic threshold is used to distinguish between free-flow mode and CF mode, which is one of the innovations of this model [13]. In essence, it is one of the few models that can reproduce following behavior across different driving phases or modes and divide and conquer for each mode. Later, researchers attempted to improve the accuracy of the Wiedemann model by replacing the sub-model with the GHR model [21]. However, the division of driving modes in these models is relatively subjective and lacks empirical verification. The thresholds that divide different CF modes, such as SDV, BX, SDX, etc. [22], were subjectively proposed based on theoretical hypotheses, and it is difficult to find empirical evidence that these modes are consistent with realistic driving modes in human CF behavior. Additionally, despite having many more behavioral parameters, the Wiedemann model performs worse than the IDM in comparisons of CF models [1], even though the latter only contains a free-flow mode and a CF mode.
Except for the perceptual-thresholds logic, the aforementioned models are based on the assumption of a single mapping function for each driver. Consequently, regardless of the complexity of the mapping function, these existing methods may still produce errors because they pool essentially different modes. Moreover, the understanding of driving modes in the car-following process is still insufficient and fragmented.

Driving Modes in the CF Process
The formulation of human behavior is a highly complex process influenced by a multitude of factors, including individual physiological and psychological characteristics, interpersonal interactions, and external environmental conditions [1]. To address the oversimplifications inherent in modeling such a complex process, numerous studies have endeavored to elucidate the variability in CF behavior. Most of them focused on occasional unusual behavior, such as distraction or short-time variance caused by cut-ins, but less attention was paid to the basic modes of the CF process.
A simple attempt is to divide the CF process according to the acceleration. Drivers adopt different driving strategies when accelerating and decelerating [23], which is called asymmetric driving behavior and needs to be modeled separately [24]. However, this offers little insight into the division of driving modes, since only two modes can be provided and investigated. Meanwhile, Higgs et al. [7] viewed the driving process as a mapping function between a driver's reaction and the driving state variables, in which driving modes are the different divided parts of the state space. They adopted a two-step algorithm and clustered up to 30 driving modes from 10 car drivers, but this quantity is outside the scope of fundamental driving modes; therefore, it is almost impossible to apply in simulation practices due to the considerable number of parameters. Some researchers, such as Lin et al. [25], attempted to discover the modes of sub-state transition in CF to divide driving modes, but, essentially, different behavior relevance still exists within each mode.
As reviewed above, although attempts have been made to decompose CF behavior and find some basic modes, the results are not satisfactory and still cannot answer which repetitive correlations of behavioral characteristics make up CF behavior.

Mode Discovery of Time Series
Multivariate time series are exceptionally common in areas such as engineering, science, and finance, and they often contain repeatedly occurring modes. The differences between those modes can be very small. Discovering those modes helps to reveal the underlying mechanisms; in this study, we attempt to apply a proper tool to find the behavioral modes that exist in the CF process.
A lot of work has been done to cluster similar modes from time series, but few methods have been introduced to discover modes in CF or even in transportation engineering [7]. In terms of data dimensionality, the research can be broadly categorized into two primary groups: univariate and multivariate clustering. In this study, we mainly focus on the latter. In multivariate clustering, many techniques have been introduced, such as dynamic time warping [26], piecewise approximation [27], and symbolic representations [28]. Most of these methods cluster time series from different perspectives based on similarity measures, and one widely used measure in shape-based time series clustering is the Euclidean distance. Some efforts have also been made toward simultaneous clustering and segmentation, known as time-point clustering [29,30]. However, these shape- or distance-based methods tend to match raw values rather than pay attention to the inherent structural correlations between the different dimensions in each subsequence. Additionally, some have claimed that unreliable, even random, results could be obtained [31]. Therefore, another kind of clustering algorithm, called model-based clustering, may be better suited to our core need in this study. In this category, methods based on Gaussian mixture models, hidden Markov models, and ARMA are all commonly used in time series clustering. Recently, a novel model-based method called TICC was proposed, which applies Markov random fields (MRFs) to describe the dependency structures of short-time subsequences. Its performance has been verified to be better than that of the abovementioned model-based methods in terms of accuracy and preventing over-fitting.

Data Source
To investigate the adaptability of models to car-following (CF) behavior in various situations, it is essential to use driving data that provide full spatial-temporal coverage.
In the present study, we employed large-scale data obtained from the Natural Driving Research Project, a collaborative endeavor involving GM, Huawei, SAIC, and Tencent, which collected data from 60 drivers for up to 200,000 km.
The quality and resolution of the NDS data are well guaranteed. The data were collected in a naturalistic driving context to minimize the impact of acquisition devices on driving behavior. Figure 1 displays the data acquisition system. It consists of a Doppler radar, a triaxial accelerometer, a GPS, and four synchronized cameras. These components work together to capture location data and kinematic information for both the subject vehicle (including speed, longitudinal acceleration, and lateral acceleration) and surrounding vehicles. The data record driving information in time series with up to 86 dimensions. The recording frequency for most data is 10 Hz, while for a few dimensions it is 1 Hz. The radar detects road participants around the vehicle and records at a frequency of 10 Hz. It can record the relative position and speed of up to eight objects at the same time. We determine the vehicle being followed by the subject vehicle based on the relative position relationship. Additionally, the system records any manual operations performed by the driver within the vehicle.
To understand the impact of the driving mode on the modeling accuracy, an extensive comparison of the DMBM and the DBM is required, which in turn requires enough drivers with enough driving records to be included in the study. In the NDS, natural driving behavior was recorded for a total of 40 Chinese drivers, primarily from the Shanghai metropolitan area, who were selected at random. The demographic information (gender, driving experience, age) of the 40 drivers is similar to that of the population of registered drivers in China [32,33]. The difference between the distribution of the NDS drivers on each dimension and the overall distribution is within 5%. In contrast to the trajectory datasets discussed in Section 2 and used in CF modeling and driving mode studies, the NDS offers extensive multi-dimensional data with full spatial-temporal coverage. The NDS not only includes a larger number of participating drivers, but also provides greater traffic flow facility coverage for each driver than other datasets containing long-term observations of drivers.

CF Events Extraction
The dataset was deemed satisfactory for analysis due to minimal noise and missing data. To further enhance the quality of the data, Kalman filtering and cubic spline interpolation techniques were employed for noise reduction and inpainting. An automatic search filter was then utilized to extract CF events based on specific thresholds such as time headway and lateral distance. The extraction criteria were informed by previous research [34,35].
Following data extraction, a random sampling methodology was implemented to ensure that the sample accurately reflected general driving behavior. A total of 100 CF events were randomly selected for each driver from a range of traffic flow facilities and conditions. This resulted in a total of 4000 samples collected from the 40 drivers for subsequent analysis, with a total recording time exceeding 2500 min.
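As an illustration of the automatic search filter, the following minimal sketch marks samples that satisfy headway and lateral-distance thresholds and keeps only contiguous runs that are long enough. The function name `extract_cf_events` and the threshold values `max_headway_s` and `max_lateral_m` are hypothetical placeholders; the actual criteria follow [34,35] and are not reproduced here. Only the 10 Hz sampling rate and the 30 s minimum duration come from the text.

```python
import numpy as np

def extract_cf_events(time_headway, lateral_offset, hz=10,
                      max_headway_s=3.0, max_lateral_m=1.5,
                      min_duration_s=30.0):
    """Return (start, end) sample index pairs of candidate CF events.

    `end` is exclusive. Threshold defaults are illustrative assumptions,
    not the criteria used in the study.
    """
    time_headway = np.asarray(time_headway, dtype=float)
    lateral_offset = np.asarray(lateral_offset, dtype=float)

    # A sample qualifies when a leader is close enough in time headway
    # and roughly in the same lane (small lateral offset).
    ok = ((time_headway > 0.0)
          & (time_headway <= max_headway_s)
          & (np.abs(lateral_offset) <= max_lateral_m))

    # Pad with False so every run of True has a detectable start and end.
    padded = np.concatenate(([False], ok, [False])).astype(int)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)

    # Keep only runs that last at least the minimum duration.
    min_len = int(round(min_duration_s * hz))
    return [(int(s), int(e)) for s, e in zip(starts, ends) if e - s >= min_len]
```

A per-driver random sample of the resulting events (100 per driver in this study) could then be drawn with any standard sampler.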

Methodology
The framework of this study is illustrated in Figure 2. To solve the problem of CF data segmentation and clustering according to data structure dependencies, a promising approach, TICC, is applied to the CF events extracted from the NDS. It applies MRFs to encode and match the graphical dependency structure of each subsequence. Then, classic CF models (the IDM and the hybrid IDM-Wiedemann model) are calibrated and validated to compare the performances of the DBM and the DMBM.
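For orientation, the IDM in its commonly published form can be sketched as below. This is the standard textbook formulation and not necessarily the exact variant calibrated in this study; the parameter names and default values are illustrative assumptions, and `dv` is assumed to be the follower speed minus the leader speed (positive when closing in).

```python
import math

def idm_acceleration(v, gap, dv, v0=33.3, headway_t=1.5, s0=2.0,
                     a_max=1.0, b=1.5, delta=4.0):
    """Standard IDM acceleration (m/s^2) of the following vehicle.

    v:   follower speed (m/s)
    gap: bumper-to-bumper gap to the leader (m), must be > 0
    dv:  approach rate v - v_lead (m/s), assumed sign convention
    All default parameter values are illustrative, not calibrated.
    """
    # Desired dynamic gap: standstill gap + headway term + braking term.
    s_star = s0 + max(0.0, v * headway_t + v * dv / (2.0 * math.sqrt(a_max * b)))
    # Free-road term (speed relative to desired speed) minus interaction term.
    return a_max * (1.0 - (v / v0) ** delta - (s_star / gap) ** 2)
```

Under the DBM, one such parameter set would be fitted per driver; under the DMBM, one set would be fitted per driving mode and selected according to the mode label of the current segment.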

Problem Description
As mentioned above, drivers adopt different driving modes in the CF process, but accurate definitions and classification standards for these modes cannot be found in past studies on driving behavior. Driving modes can be viewed as different recurring patterns in the time series of the driving behavior of a certain driver, while the time series of CF is usually recorded as multivariate time series data.
We set $x_{cf}$ as the CF trajectory of $T$ sequential $n$-dimensional observations, which is expressed as $x_{cf} = [x_1, x_2, \ldots, x_T]$, where $x_i \in \mathbb{R}^n$ is the $i$-th multivariate observation of the trajectory, and $n$ equals 4, which is the number of key variables in the CF process (acceleration, velocity, gap, and velocity difference). Discovering driving modes in CF requires a segmentation and clustering process for multivariate time series, which is quite a challenging mathematical problem. Differing from standard time series segmentation, multiple segments can belong to the same cluster, which is interpreted as a driving mode in the CF process. On the other hand, driving behavior can be viewed as a continuous relationship in time; thus, the problem is also harder than simple subsequence clustering, since each data point cannot be clustered individually (neighboring points are encouraged to belong to the same cluster) [36]. Additionally, each of these clusters is required to represent a certain basic CF mode, that is, to be in line with people's common understanding of basic driving behavior.

Toeplitz Inverse Covariance-Based Clustering (TICC) Method
Therefore, in this research, we adopted a novel unsupervised learning method for the subsequence clustering of multivariate time series, called TICC, to obtain proper driving modes in the CF process for the subsequent calibration. The discovery of driving modes involves the simultaneous segmentation and clustering of multivariate time series data, and TICC is an innovative method designed to address this problem. This issue is more challenging than conventional time series segmentation because several segments may belong to the same cluster [37,38]. It is also more complex than the problem of subsequence clustering because individual data points cannot be clustered separately (since neighboring points are encouraged to be in the same cluster) [39,40]. Distance-based methods, such as dynamic time warping, are often used to solve these types of problems [41]. However, they focus more on matching raw values than on exploring the overall structural consistency of time subsequences by examining the correlations between multidimensional data. TICC was introduced in 2017 and has been widely used in engineering [42], finance [43], and medicine [44] due to its effectiveness. MRFs are used to describe the time-invariant correlation structure within a window in order to identify and cluster different subsequences, and they can represent stronger or more complex relationships than simple correlations. TICC alternates between assigning points to clusters, which it accomplishes through dynamic programming, and updating the cluster MRFs, which it does via the alternating direction method of multipliers (ADMM). In comparison with several advanced time series clustering methods, TICC was reported to achieve at least a 41% higher clustering accuracy [36].
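The assignment half of this alternation can be illustrated with a Viterbi-style dynamic program. The sketch below assumes that a cost matrix `nll` is already available, where `nll[t, k]` is the negative log likelihood of subsequence $t$ under cluster $k$, and that a constant penalty `beta` is paid whenever two consecutive subsequences receive different labels. The function name and inputs are hypothetical; this is not the reference TICC implementation, and the ADMM update of the cluster MRFs is omitted.

```python
import numpy as np

def assign_clusters(nll, beta):
    """Minimum-cost label path over time with a switching penalty.

    nll:  array of shape (T, K), per-subsequence cost of each cluster
    beta: penalty added each time the label changes between t-1 and t
    Returns an integer array of length T with the chosen cluster labels.
    """
    nll = np.asarray(nll, dtype=float)
    T, K = nll.shape
    cost = np.empty((T, K))             # best cost of a path ending in k at t
    back = np.zeros((T, K), dtype=int)  # best predecessor label
    cost[0] = nll[0]
    switch = beta * (1.0 - np.eye(K))   # 0 on the diagonal, beta elsewhere

    for t in range(1, T):
        # trans[j, k]: cost of being in j at t-1 and moving to k at t.
        trans = cost[t - 1][:, None] + switch
        back[t] = np.argmin(trans, axis=0)
        cost[t] = nll[t] + trans[back[t], np.arange(K)]

    # Backtrack from the cheapest final label.
    path = np.empty(T, dtype=int)
    path[-1] = int(np.argmin(cost[-1]))
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path
```

With `beta = 0` every subsequence simply takes its cheapest cluster; larger values of `beta` suppress brief label changes and yield longer, temporally consistent segments.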
The TICC method avoids clustering each observation individually, since in the CF process an isolated observation only describes an instantaneous state of the CF pair. Instead, TICC clusters short subsequences of size $w$ ($w < T$): the observations $x_{t-w+1}, \ldots, x_t$ are concatenated into an $nw$-dimensional subsequence called $X_t$. A new sequence $X$, from $X_1$ to $X_T$, is then formed, and the clustering process is applied to it. An $nw$-dimensional subsequence $X_t$ allows more time-varying information in the driving process to be analyzed and clustered. Meanwhile, adjacent subsequences are encouraged to be assigned to the same cluster, which is called temporal consistency [36].
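The construction of the stacked subsequences can be sketched as follows. The helper name `stack_windows` is hypothetical, and the sketch returns only the $T - w + 1$ complete windows; how the first $w - 1$ incomplete windows are handled is left open here and is an assumption of this example.

```python
import numpy as np

def stack_windows(x, w):
    """Stack w consecutive n-dimensional observations into nw-dim rows.

    x: array of shape (T, n), e.g. n = 4 for acceleration, velocity,
       gap, and velocity difference
    w: window size, 1 <= w <= T
    Returns an array of shape (T - w + 1, n * w); row r concatenates
    x[r], x[r + 1], ..., x[r + w - 1].
    """
    x = np.asarray(x, dtype=float)
    T, n = x.shape
    if not 1 <= w <= T:
        raise ValueError("window size w must satisfy 1 <= w <= T")
    # Each shifted slice contributes one n-wide column block.
    return np.concatenate([x[i:T - w + 1 + i] for i in range(w)], axis=1)
```

For a 30 s CF event sampled at 10 Hz with four variables, `x` has shape `(300, 4)`, and a window of `w = 5` would give stacked subsequences of dimension 20.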
Rather than relying on simple correlation-based models, the segmentation and clustering of the multivariate time series are based on Markov random fields (MRFs), multilayer correlation networks that encode the structural relationships between data of different dimensions, as shown in Figure 3. An MRF considers not only the interdependencies among all dimensions of $x_t$ but also the dependencies between neighboring points. The block Toeplitz matrix $\Theta_i \in \mathbb{R}^{nw \times nw}$ defines the MRF edge structure of cluster $i$ using a Gaussian inverse covariance rather than a covariance, because the inverse covariance tends to be sparse, which brings computational advantages and helps prevent overfitting.
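To make the block Toeplitz structure concrete, the sketch below assembles an $nw \times nw$ matrix from $w$ blocks of size $n \times n$, following the layout used in the original TICC formulation: block $k$ couples observations that are $k$ time steps apart, regardless of where in the window they occur. The function name `block_toeplitz` and the argument `blocks` are hypothetical, and the example values in practice would come from the fitted model rather than being chosen by hand.

```python
import numpy as np

def block_toeplitz(blocks):
    """Assemble a symmetric block Toeplitz matrix from n x n blocks.

    blocks[k] links observations k steps apart; blocks[0] (the
    intra-time block) is assumed to be symmetric. Sub-block (i, j)
    equals blocks[i - j] below the diagonal and its transpose above,
    so the coupling depends only on the time lag (time invariance).
    """
    blocks = [np.asarray(b, dtype=float) for b in blocks]
    w = len(blocks)
    n = blocks[0].shape[0]
    theta = np.zeros((n * w, n * w))
    for i in range(w):
        for j in range(w):
            lag = abs(i - j)
            blk = blocks[lag] if i >= j else blocks[lag].T
            theta[i * n:(i + 1) * n, j * n:(j + 1) * n] = blk
    return theta
```

Because only $w$ blocks are free, the number of parameters per cluster grows with $w n^2$ rather than $(nw)^2$, which is one reason the constraint helps against overfitting.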
One challenge associated with the TICC approach is the assignment problem of allocating data points to one of the $K$ clusters, that is, determining the assignment sets $P = \{P_1, \ldots, P_K\}$ with $P_i \subset \{1, 2, \ldots, T\}$. Additionally, the algorithm must update the cluster parameters $\Theta = \{\Theta_1, \ldots, \Theta_K\}$ based on the previously calculated assignments. This optimization problem can be expressed as follows:

$$\underset{\Theta \in \mathcal{T},\, P}{\operatorname{argmin}} \; \sum_{i=1}^{K} \left[ \lVert \lambda \circ \Theta_i \rVert_1 + \sum_{X_t \in P_i} \left( -\ell\ell(X_t, \Theta_i) + \beta \, \mathbb{1}\{X_{t-1} \notin P_i\} \right) \right]$$

where $\mathcal{T}$ represents the set of symmetric block Toeplitz $nw \times nw$ matrices and serves as an additional constraint for constructing MRFs to ensure the time-invariant property of each cluster. The first term, $\lVert \lambda \circ \Theta_i \rVert_1$, is an additional sparsity constraint based on the Hadamard product of the inverse covariance matrix with the regularization parameter $\lambda \in \mathbb{R}^{nw \times nw}$. The second term, $-\ell\ell(X_t, \Theta_i)$, is the core of the optimization problem: fitting the cluster parameters given the assignment set $P_i$, where $\ell\ell$ denotes the log likelihood. The indicator function $\mathbb{1}\{X_{t-1} \notin P_i\}$ checks temporal consistency, that is, whether adjacent data points belong to the same cluster. If two observations are consecutive but assigned to different clusters ($X_{t-1} \notin P_i$), a penalty $\beta$ is incurred.
There are two key input parameters for the TICC algorithm: the window size $w$ and the number of clusters $K$. TICC does not cluster each data point $x_t$ individually; instead, it clusters short subsequences from $t-w+1$ to $t$. The window size determines the size of the $nw \times nw$ Toeplitz matrices: a window that is too short cannot contain the complete MRF structure of each cluster, whereas a window that is too long makes it hard to choose a segment boundary and may violate time invariance. To use this algorithm effectively, we follow Hallac, Vare, Boyd, and Leskovec [36] and set the window size to 10, since they suggested that the clustering result on car data is reliable and robust when the window size is between 4 and 15. The other key parameter is $K$. Since there is no proper precedent for CF mode extraction, we test different values of $K$ according to the behavior reproduction performance as well as the Bayesian information criterion (BIC).
The TICC problem is a combinatorial optimization problem with two coupled non-convex problems: searching for the optimal cluster parameters $\Theta = \{\Theta_1, \ldots, \Theta_K\}$ and for the cluster assignments $P = \{P_1, \ldots, P_K\}$. Since the global optimum is difficult to find, an expectation-maximization (EM)-like algorithm is applied, which alternates between assigning $P$ with $\Theta$ fixed and updating $\Theta$ with $P$ fixed. For the solution process of the TICC, please refer to Appendix A. A brief outline of TICC clustering is shown in Algorithm 1.
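The assignment half of this alternation can be sketched as a small dynamic program. This is an illustrative sketch only: it assumes the per-cluster costs $-\ell\ell(X_t, \Theta_i)$ are already available as a matrix `neg_ll` (a hypothetical input), it charges the penalty $\beta$ whenever consecutive subsequences change cluster, and it does not cover the ADMM update of $\Theta$.

```python
import numpy as np

def assign_clusters(neg_ll: np.ndarray, beta: float) -> np.ndarray:
    """Minimise sum_t neg_ll[t, a_t] + beta * (number of cluster switches)."""
    T, K = neg_ll.shape
    cost = np.empty((T, K))
    back = np.zeros((T, K), dtype=int)
    cost[0] = neg_ll[0]
    for t in range(1, T):
        # switch[i, j]: best cost so far when moving from cluster i (t-1) to j (t).
        switch = cost[t - 1][:, None] + beta * (1.0 - np.eye(K))
        back[t] = np.argmin(switch, axis=0)
        cost[t] = neg_ll[t] + np.min(switch, axis=0)
    # Trace the cheapest path backwards from the last time step.
    path = np.empty(T, dtype=int)
    path[-1] = int(np.argmin(cost[-1]))
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path

# Toy costs (hypothetical): the third point weakly prefers cluster 1.
neg_ll = np.array([[0, 1], [0, 1], [1, 0.6], [0, 1], [1, 0], [1, 0], [1, 0]], dtype=float)
raw = assign_clusters(neg_ll, beta=0.0)     # no temporal penalty
smooth = assign_clusters(neg_ll, beta=1.0)  # switching costs beta
print(raw.tolist(), smooth.tolist())
```

With `beta=0.0` every point simply takes its cheapest cluster, while `beta=1.0` absorbs the isolated third point into the surrounding segment, which is the fragmentation-avoiding behavior that temporal consistency is meant to provide.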

CF Model Calibration and Validation Based on Driving Modes
In conventional modeling, the potential intra-driver variance within a CF event is effectively "averaged" out. Given the driving modes recognized in CF events, the CF model can be calibrated based on driving modes rather than drivers, which makes it possible to quantify the impact of this neglect. This subsection describes the calibration and validation of three typical CF models using the extracted empirical CF events. The selected CF models are introduced first, and a standardized calibration and validation process is then explained.

Investigated CF Models

1. Intelligent Driver Model (IDM)
In our previous study [1], we thoroughly investigated and compared the performance of CF models; the IDM was validated as the best of several commonly used models in reproducing CF behavior and was therefore chosen as the main CF model in this study. The IDM is calibrated from both the DBM and DMBM perspectives.
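The IDM is a widely recognized CF model built on the concept of desired measures. Unlike the Gipps model, the IDM can intelligently select CF modes, which enables a smooth transition between free-flow and CF conditions. Additionally, each parameter in the IDM represents a distinct aspect of driving behavior, which makes the model parsimonious and easy to calibrate. The IDM can be defined as:

$$a_n(t) = a_{\max}^{(n)} \left[ 1 - \left( \frac{V_n(t)}{\tilde{V}_n} \right)^{\delta} - \left( \frac{\tilde{S}_n(t)}{S_n(t)} \right)^{2} \right]$$

$$\tilde{S}_n(t) = s_0 + V_n(t)\, T_n + \frac{V_n(t)\, \Delta V_n(t)}{2 \sqrt{a_{\max}^{(n)}\, b_{\mathrm{comf}}}}$$

where $a_{\max}^{(n)}$ is the maximum acceleration, $b_{\mathrm{comf}}$ is the comfort deceleration, $\tilde{V}_n$ is the desired velocity, $S_n(t)$ is the relative distance, $\tilde{S}_n(t)$ is the desired spacing, and $s_0$ is the minimum distance when at a standstill. In this standard form of the IDM, $V_n(t)$ is the speed of the following vehicle, $\Delta V_n(t)$ is its speed difference relative to the leading vehicle, $T_n$ is the desired time headway, and $\delta$ is the acceleration exponent.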
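As a worked illustration of how the IDM turns a traffic state into an acceleration, the following sketch implements the standard IDM formulation. The function name, argument names, the exponent default of 4, and the numeric inputs are assumptions chosen for illustration; they are not calibrated values from this study.

```python
import math

def idm_acceleration(v, gap, dv, a_max, b_comf, v_des, s0, t_hw, delta=4.0):
    """Standard IDM acceleration.

    v: follower speed (m/s); gap: spacing to the leader (m);
    dv: follower speed minus leader speed (m/s), positive when closing in.
    """
    # Desired spacing: standstill distance + headway term + braking term.
    s_star = s0 + v * t_hw + v * dv / (2.0 * math.sqrt(a_max * b_comf))
    # Free-flow term minus interaction term.
    return a_max * (1.0 - (v / v_des) ** delta - (s_star / gap) ** 2)

# Hypothetical state: 20 m/s, 40 m gap, same speed as the leader.
steady = idm_acceleration(20.0, 40.0, 0.0, a_max=1.0, b_comf=2.0,
                          v_des=30.0, s0=2.0, t_hw=1.5)
closing = idm_acceleration(20.0, 40.0, 3.0, a_max=1.0, b_comf=2.0,
                           v_des=30.0, s0=2.0, t_hw=1.5)
print(round(steady, 4), closing < steady)
```

In this toy state the desired spacing is 32 m against an actual gap of 40 m, so the follower accelerates gently; approaching the leader at 3 m/s enlarges the desired spacing and lowers the commanded acceleration.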

2. Wiedemann model
The idea that the CF interaction can be divided by perceptual thresholds was proposed by Wiedemann of Karlsruhe University. The Wiedemann model is recognized as a quintessential psycho-physical CF model that incorporates human factors to replicate realistic driving behavior more accurately; its CF logic is shown in Figure 4. The CF process is divided by thresholds of a certain shape, and the subject vehicle applies a different acceleration maneuver in each regime. In this study, we used the Wiedemann 99 model, an updated version of the original Wiedemann model that accounts for both driver characteristics and driving modes. This model has recently gained popularity in both academic research and practical applications for its ability to accurately simulate car-following behavior. The model was calibrated individually for each driver in our study. For a detailed formulation, please refer to the literature [45,46].

3. Hybrid Wiedemann-IDM model
Some researchers have argued that it is the sub-models in the Wiedemann model, rather than its driving-mode division, that negatively affect performance, and have thus proposed a hybrid Wiedemann-IDM model [21]. That is, the original Wiedemann model is altered by replacing the acceleration equations of its CF modes (free-flow mode, closing-in mode, following mode, and emergency braking mode, as shown in Figure 4) with the IDM, the most adaptable CF model so far. The IDM in each CF regime of the Wiedemann model is calibrated separately, so that each mode of the hybrid model receives its own set of parameters. We introduce this hybrid model to determine (1) whether it is the acceleration equations or the driving modes that weaken the performance of the Wiedemann model, and (2) whether the objective rule-based division of driving modes in the Wiedemann model can reasonably identify the different modes in the human CF process.

CF Calibration
Calibration searches the parameter space for the values that minimize the discrepancy between simulated data and actual observations, and evaluating the calibration results is crucial for determining how accurately a CF model can replicate real-world driving behavior. The validation process then assesses the effectiveness of the calibrated parameters and how well they generalize, for example, by testing whether a set of parameters also fits the other CF events of the same driver.
K-fold cross-validation is a commonly used model evaluation method. It divides the dataset into K parts and, in each round, uses K − 1 of them as training data and the remaining one as test data. We adopted five-fold cross-validation in this study, which allows for five rounds of training and testing, with the average value taken as the performance evaluation of the model, as shown in Figure 5. This method can effectively avoid overfitting and better evaluate the model's performance on unseen data. For each driver involved in the calibration, 80 of the 100 events were used for calibration and the other 20 events for validation, iterating in turn. The average error on the validation data was used to compare the performance of the investigated models.
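The 80/20 rotation over one driver's 100 events can be sketched as follows. The function name, the integer event identifiers, and the fixed shuffle seed are illustrative assumptions, not details taken from the study.

```python
import random

def five_fold_splits(event_ids, k=5, seed=0):
    """Yield (calibration, validation) lists so each event is validated exactly once."""
    ids = list(event_ids)
    random.Random(seed).shuffle(ids)
    folds = [ids[i::k] for i in range(k)]  # k disjoint folds of (near) equal size
    for i in range(k):
        validation = folds[i]
        calibration = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield calibration, validation

# One driver with 100 CF events (hypothetical ids 0..99).
splits = list(five_fold_splits(range(100)))
for calibration, validation in splits:
    print(len(calibration), len(validation))  # 80 20 in every round
```

Averaging the validation error over the five rounds then gives one performance figure per driver and model.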

To ensure the accuracy of the comparison results in the calibration process, we followed a specific protocol. First, our objective function was defined on the deviation between simulated and observed spacing [47,48], which has been demonstrated to be a reliable measure in previous studies [49,50]. Following the in-depth analysis of objective functions presented in [51], we employed the root mean square normalized error (RMSNE) as a measure of the relative error, which has been shown to be effective in previous studies [48,52]. It is defined as:

$$\mathrm{RMSNE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( \frac{S_i^{sim} - S_i^{obs}}{S_i^{obs}} \right)^{2}}$$

where $S_i^{obs}$ and $S_i^{sim}$ are the $i$th observed spacing and simulated spacing, respectively, with $i$ ranging from 1 to $N$.
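A minimal sketch of the RMSNE on spacing is given below. The function name and the two-point example are illustrative assumptions.

```python
import numpy as np

def rmsne(s_obs, s_sim) -> float:
    """Root mean square normalized error between observed and simulated spacing."""
    s_obs = np.asarray(s_obs, dtype=float)
    s_sim = np.asarray(s_sim, dtype=float)
    # Each deviation is normalised by the observed spacing before squaring.
    return float(np.sqrt(np.mean(((s_sim - s_obs) / s_obs) ** 2)))

# Hypothetical spacings in metres: both points deviate by 10% of the observation.
error = rmsne([10.0, 20.0], [11.0, 18.0])
print(error)
```

Because each deviation is divided by the observed spacing, a 1 m error at a 10 m gap weighs the same as a 2 m error at a 20 m gap, which keeps short-gap situations from being dominated by long-gap ones.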
To address the constrained nonlinear optimization problem in the calibration process, a genetic algorithm (GA) was employed in our empirical experiments. The effectiveness of the GA was previously validated in the literature [1,9]. To ensure fast convergence and realistic kinematics, we set the bounds and initial search values of the CF model parameters based on the values reported in previous studies [21,24,49]. Table A1 summarizes these values. During the calibration of the DBM, each driver is assigned the set of parameters that achieves the global optimum of the error in reproducing that driver's set of CF events. Similarly, during the calibration of the DMBM, each mode is assigned a set of parameters.
Obtaining the DMBM results involves a two-step calibration process. First, TICC was used to obtain the driving-mode label of each time step (labels were generated for every candidate value of the as-yet-undetermined K). Then, in the calibration process, the objective function (which simulates CF trajectories and calculates calibration errors) dynamically chooses among the sets of parameters based on the driving-mode labels, as shown in Figure 6. After calibration, the optimal parameters are obtained.
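The idea of an objective function that switches parameter sets by mode label can be sketched as follows. This is a simplified illustration under stated assumptions: the IDM with a fixed exponent of 4, an explicit Euler update with a 0.1 s step, made-up parameter values, and synthetic "observations" generated by the same simulator. The helper names are hypothetical, and the GA that would minimise this objective is not shown.

```python
import math

def idm_acc(v, gap, dv, params):
    """IDM acceleration for one parameter set (a_max, b_comf, v_des, s0, t_hw)."""
    a_max, b_comf, v_des, s0, t_hw = params
    s_star = s0 + v * t_hw + v * dv / (2.0 * math.sqrt(a_max * b_comf))
    return a_max * (1.0 - (v / v_des) ** 4 - (s_star / gap) ** 2)

def simulate_gaps(params_by_mode, labels, lead_speed, gap0, v0, dt=0.1):
    """Simulate follower spacing; the parameter set follows the mode label."""
    v, gap, gaps = v0, gap0, [gap0]
    for t in range(1, len(labels)):
        params = params_by_mode[labels[t - 1]]    # mode-dependent parameters
        acc = idm_acc(v, gap, v - lead_speed[t - 1], params)
        gap += (lead_speed[t - 1] - v) * dt       # explicit Euler spacing update
        v = max(0.0, v + acc * dt)                # speed cannot become negative
        gaps.append(gap)
    return gaps

def dmbm_rmsne(params_by_mode, labels, lead_speed, obs_gap, v0, dt=0.1):
    """RMSNE on spacing for a mode-switching parameter table."""
    sim = simulate_gaps(params_by_mode, labels, lead_speed, obs_gap[0], v0, dt)
    squared = [((s - o) / o) ** 2 for s, o in zip(sim, obs_gap)]
    return math.sqrt(sum(squared) / len(squared))

# Synthetic check: two modes, leader at constant 15 m/s (all values hypothetical).
labels = [0] * 25 + [1] * 25
lead_speed = [15.0] * 50
true_params = {0: (1.0, 2.0, 30.0, 2.0, 1.5), 1: (1.5, 2.5, 25.0, 3.0, 1.0)}
obs_gap = simulate_gaps(true_params, labels, lead_speed, gap0=30.0, v0=15.0)
swapped = {0: true_params[1], 1: true_params[0]}
print(dmbm_rmsne(true_params, labels, lead_speed, obs_gap, 15.0),
      dmbm_rmsne(swapped, labels, lead_speed, obs_gap, 15.0))
```

An optimiser would treat the flattened parameter table (five values per mode) as the decision vector and `dmbm_rmsne` as the fitness to minimise; the driver-based case corresponds to using a single label throughout.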
In an effort to attain the global optimum, we conducted the optimization procedure ten times for each car-following event and recorded the set of parameters that yielded the lowest RMSNE. Because the optimization requires a large population size in the GA, and the parameter search space of the hybrid model or the DMBM is much larger than that of the IDM, we used a high-performance server to run the calibration in parallel. Its CPU is an AMD EPYC 7H12 with 64 cores and 128 threads.
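The "repeat and keep the best" pattern can be sketched independently of the optimiser. In the sketch below, a plain random search stands in for the GA purely for illustration (it is not a genetic algorithm), and the function names, sample counts, and toy objective are assumptions.

```python
import random

def random_search(objective, bounds, n_samples, rng):
    """Stand-in optimiser (NOT a genetic algorithm): uniform sampling within bounds."""
    best_x, best_f = None, float("inf")
    for _ in range(n_samples):
        x = [rng.uniform(lo, hi) for lo, hi in bounds]
        f = objective(x)
        if f < best_f:
            best_x, best_f = x, f
    return best_x, best_f

def best_of_restarts(objective, bounds, n_restarts=10, n_samples=200, seed=0):
    """Run the optimiser n_restarts times with different seeds and keep the lowest error."""
    runs = [random_search(objective, bounds, n_samples, random.Random(seed + r))
            for r in range(n_restarts)]
    return min(runs, key=lambda run: run[1])

# Toy objective (hypothetical) with its minimum at (1, 1).
def toy_error(x):
    return sum((xi - 1.0) ** 2 for xi in x)

toy_bounds = [(0.0, 2.0), (0.0, 2.0)]
best_x, best_f = best_of_restarts(toy_error, toy_bounds)
print(best_x, best_f)
```

Because the restarts are independent, they can be distributed across CPU cores, which is the kind of parallelism a many-core server makes practical.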

Results
In this section, the results of the two CF behavior modeling pathways (driving-mode-based and driver-based) are presented. First, we compare the CF model performance under different settings of the cluster number K, determine the most appropriate K value, and provide a qualitative interpretation of the physical meaning of each mode. Second, we compare the CF model performance under the different CF modeling perspectives.

Segmentation and Clustering of CF Behavior
So far, there is no ground truth for segmenting driving behavior in order to optimize CF modeling. The TICC algorithm is unsupervised, and the number of clusters K for the subsequent CF modeling is undetermined. Therefore, the Bayesian information criterion (BIC), a standard criterion for model selection among a finite set of models, was used to determine the appropriate value of K. Generally, models with a lower BIC score are considered more favorable. Meanwhile, because the IDM has five parameters, a CF model segmented into K clusters has 5K parameters. Therefore, for efficient CF modeling, K should be as small as possible while still achieving the desired model accuracy. In this study, we sought a balance between the BIC of the segmentation, the number of clusters, and the CF model's performance. We tested the IDM's performance and calculated the BIC for K values from 2 to 10; the results are presented in Figure 7.
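For reference, the generic BIC trades goodness of fit against the number of free parameters. The sketch below uses the textbook definition; how the log-likelihood and the parameter count are obtained for a fitted TICC model is not specified here, so the inputs are hypothetical placeholders.

```python
import math

def bic(log_likelihood: float, n_params: int, n_samples: int) -> float:
    """Bayesian information criterion: n_params * ln(n_samples) - 2 * log-likelihood."""
    return n_params * math.log(n_samples) - 2.0 * log_likelihood

# Hypothetical values: a better fit lowers the BIC, extra parameters raise it.
base = bic(log_likelihood=-100.0, n_params=5, n_samples=1000)
better_fit = bic(log_likelihood=-90.0, n_params=5, n_samples=1000)
more_params = bic(log_likelihood=-100.0, n_params=10, n_samples=1000)
print(base, better_fit < base, more_params > base)
```

The penalty term grows with the number of parameters, which is why a larger K is only justified when it improves the fit enough to offset it.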
As shown in Figure 7, the BIC increases with the value of K and stabilizes after seven, while the CF modeling error decreases sharply up to K = 4 and then declines slowly with fluctuations. Therefore, K = 4 and K = 5 are potential candidates, since their RMSNEs have reached an ideal range (under 0.22, which is clearly smaller than the errors of driver-based modeling). Meanwhile, both the BIC and the RMSNE at K = 5 are lower than those at K = 4, with only five parameters added; thus, we chose five as the proper K value in this study. The calibrated IDM parameters for each mode are listed in Table A2.
Then, based on the chosen K value, the segmentations of CF samples are presented in Figure 8. Thanks to the time-window-based clustering and its emphasis on temporal consistency through the function $\mathbb{1}\{X_{t-1} \notin P_i\}$, the different phases in a CF event are largely free of fragmentation. Furthermore, to understand the relationship between clusters and CF behaviors, that is, to determine whether different clusters represent different CF modes, we manually checked more than 100 segmented events and attempted to qualitatively distinguish and interpret the physical meaning of each cluster; the summary results are shown in Table 1. This shows that the data-driven TICC can indeed divide different CF modes well. Please note that this interpretation does not mean that these CF modes can simply be divided based on the variable features in the table or on additional thresholds, as the variable relationships inherent in the TICC are complex and difficult to describe quantitatively.

Comparative Results
Before comparing the calibration errors, it should be clear how many parameters each CF modeling approach requires. For the DBM, the calibration process must be conducted once for every driver so that the CF model can replicate the heterogeneous behavior of each driver. Thus, the number of parameters for the DBM is N times the number of model parameters, where N represents the number of drivers. For example, 200 parameters need to be calibrated for 40 drivers using the IDM, while 440 parameters are needed when using the Wiedemann model. In contrast, the parameter count of the DMBM is not related to the number of drivers because it is determined by the number of segmented driving modes: it is K times the number of model parameters (five for the IDM), where K represents the number of driving modes. In this study, this means that 25 parameters are needed for the IDM and 55 are needed for the Wiedemann model.
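The parameter bookkeeping can be checked with a few lines of arithmetic. The function names are illustrative, and the count of 11 parameters for the Wiedemann model is inferred from the stated totals (440 / 40 and 55 / 5).

```python
def dbm_parameter_count(n_drivers: int, n_model_params: int) -> int:
    """Driver-based modeling: one parameter set per driver."""
    return n_drivers * n_model_params

def dmbm_parameter_count(n_modes: int, n_model_params: int) -> int:
    """Driving-mode-based modeling: one parameter set per driving mode."""
    return n_modes * n_model_params

IDM_PARAMS, WIEDEMANN_PARAMS = 5, 11  # 11 is inferred from the totals in the text

print(dbm_parameter_count(40, IDM_PARAMS), dbm_parameter_count(40, WIEDEMANN_PARAMS))
print(dmbm_parameter_count(5, IDM_PARAMS), dmbm_parameter_count(5, WIEDEMANN_PARAMS))
```

The DBM count grows linearly with the number of drivers, whereas the DMBM count stays fixed once K is chosen.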
First, the calibration errors of the three DBM models are compared in Figure 9a. The IDM shows a clear advantage over the Wiedemann model, reaching 15%, but the error of the IDM is higher than that of the hybrid model. Considering that the hybrid model has four times more parameters than the IDM, this reduction in error was expected. Meanwhile, the calibration error of the DBM is significantly higher than that of the DMBM: the latter reduces the error by 13.12%. This indicates that the DMBM can reproduce CF behavior better than the DBM with far fewer parameters.

Examining the results more closely, we found that in the five-fold cross-validation process, the mean validation error of the DBM is clearly higher than its calibration error (by over 13%), whereas no such gap appears for the DMBM (only 4.53% error growth), as shown in Figure 9b. This may indicate another advantage of the DMBM, namely better portability or practicality. The parameters from the DBM may specialize in specific CF events, but the behavioral variance between different events, even from the same driver, makes these parameters invalid to some extent, which can be viewed as a kind of "overfitting".

Discussion
This study introduces and investigates a new perspective in CF modeling, that is, modeling CF driving modes rather than an individual human driver or an entire CF event (or a CF period in some studies). Although it has been known and verified many times that human CF behavior is not static during driving, this study is the first in-depth study to discover the driving modes of the CF process and the improvements that can be achieved by modeling them, with the help of a novel approach to clustering multivariate time series data. This study also benefits from large-scale naturalistic driving data, which provide sufficient and diverse samples of drivers' CF behavior.
In past research, a "consensus" has gradually formed that well-constructed driver-based models can reproduce CF trajectories better than driving-status-based models such as the Wiedemann model. To some extent, such findings may lead future CF modeling research to emphasize inter-driver or inter-CF-period differences rather than intra-driver or intra-period variances. Higgs et al. proposed that the imprecise sub-models in the Wiedemann model lead to unsatisfactory results and pointed out a pathway of combining the state division of the Wiedemann model with other CF models. That is why we also introduced a Wiedemann-IDM model for comparison. The results show that this combination can be better than the original Wiedemann model, and even the IDM, but its improvement over the IDM is relatively limited. This led us to consider another aspect: how much potential improvement there could be if driving modes were well partitioned and modeled.
Therefore, we introduced a novel and recent unsupervised multivariate time series clustering approach. The large accuracy increase of the DMBM demonstrates the effectiveness of the adopted clustering method. By using the graphical dependency structure of each subsequence, the algorithm can uncover the underlying multidimensional time-invariant relationships that cannot be expressed and distinguished using a simple threshold division. Meanwhile, once a TICC model has been trained, it can be applied to a new multivariate CF time series to predict clusters, so the lack of an explicit threshold division does not reduce its applicability.
The results confirm our initial conjecture that modeling accuracy can be improved by modeling several shared underlying driving modes in the CF process. This is encouraging since it has been verified with only five driving mode classifications. As mentioned above, some researchers [7] have attempted to preliminarily verify a similar conclusion, but on a scale of up to 30 classifications and with only 20 drivers, so that the number of parameters used for the DMBM far exceeded that of the DBM (120 versus 80, based on the 4-parameter GHR model). That means that the CF event becomes too fragmented, and it is hard to determine whether the accuracy improvement is due to reasonable behavior segmentation or to the use of more parameters to describe shorter segments. Meanwhile, based on our experience in CF research, an enormous number of parameters brings about an extreme increase in the calibration time, which is very detrimental to the practice of CF modeling. In contrast, we showed that only 25 parameters can describe the CF behavior of 40 drivers well, rather than the 200 parameters required by the DBM. In short, the error caused by neglecting the behavioral differences between driving modes has been underestimated, and it is both promising and necessary to involve targeted driving mode modeling in future CF modeling practices.
We expect our model to be mainly applied in two aspects, as mentioned in the introduction: accurate traffic flow simulation and ADASs on intelligent vehicles.Both require a better understanding of human driver behavior.In traffic flow simulation, traditional approaches assign uniform behavioral parameters to all vehicles.Transitioning from individual vehicles to driving-mode-based modeling can increase heterogeneity while avoiding difficulties in parameter generation [53].In intelligent vehicle applications, one potential application is in driver identification.Based on our proposed methods, the driving mode distribution or transition characteristics of each driver can be quantitatively calculated to achieve driver identification [54].This is very useful for vehicles with intelligent driving assistance that value in-car privacy protection.Another application is to help drivers better drive the vehicle during CF in ACC, with more human-like intra-driver differences in the driving process.
Despite the novelty and strengths of this proposal, there are a few limitations. First, the combination of driving modes and individual differences was ignored because calibrating 25 parameters is a big computational challenge. A future study will enable each driver to have their own parameters for different driving modes, which can help advanced driver assistance systems select more personalized parameters for drivers. Second, this study did not explore the quantitative characteristics or the identification of each driving mode. Since each driving mode is identified by the complex TICC method, which involves the spatiotemporal relationships between multiple variables, it may be unrealistic to determine the mathematical expression of each mode in this study. In the future, deep learning models should be used to consider the long- and short-term spatiotemporal correlations contained in each driving mode.

Conclusions
Due to the importance of CF behavior in both macro traffic flow management and micro safety functions in ADASs, CF behavior has been extensively investigated to reproduce more realistic human driving trajectories. However, deep knowledge of the driving modes in CF and of the behavioral differences between modes is lacking, which leaves room for further improvement in CF modeling accuracy. This study adopted a novel MRF-based time series clustering method to segment and cluster driving modes from massive multivariate naturalistic driving data. A comprehensive comparison of different CF modeling perspectives (DBM and DMBM) was conducted, benefiting from the long-term behavior records of more than 40 drivers.
Using the TICC method, five types of CF driving modes were discovered, which can significantly increase the accuracy in reproducing CF behavior at a relatively low clustering BIC. These five data-driven classifications are understandable and basically correspond to the different steps in the human CF process, namely, continuous distance, continuous approach, high-speed stable follow-up, low-speed close-range stable follow-up, and vulgar long-distance follow-up. These five modes describe structural repetitions in human operational behavior.
Precise CF modeling: Based on the determined CF modes, the driving-mode-based IDM performed better than the driver-based IDM and the hybrid Wiedemann-IDM model, outperforming them by 17.49% and 13.59%, respectively. The lower error compared to the hybrid model further affirms that the TICC algorithm recognizes driving modes more effectively than the mode-division part of the Wiedemann model.
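To make the idea of a driving-mode-based IDM concrete, the sketch below switches between per-mode parameter sets while simulating a follower. It uses the standard IDM acceleration equation with the exponent fixed at 4 and assumes five calibrated parameters per mode (desired speed, time headway, minimum gap, maximum acceleration, comfortable deceleration). The names `IDMParams`, `idm_accel`, and `simulate_dmbm`, the Euler update, and all parameter values are illustrative assumptions, not the calibration or simulation procedure of this study; the mode labels are taken as given.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class IDMParams:
    """One IDM parameter set (assumed: one such set per driving mode)."""
    v0: float     # desired speed (m/s)
    T: float      # desired time headway (s)
    s0: float     # minimum gap (m)
    a_max: float  # maximum acceleration (m/s^2)
    b: float      # comfortable deceleration (m/s^2)


def idm_accel(v, gap, dv, p, delta=4.0):
    """Standard IDM acceleration; dv = follower speed minus leader speed."""
    s_star = p.s0 + max(
        0.0, v * p.T + v * dv / (2.0 * math.sqrt(p.a_max * p.b))
    )
    return p.a_max * (1.0 - (v / p.v0) ** delta - (s_star / gap) ** 2)


def simulate_dmbm(lead_speeds, mode_labels, params_by_mode,
                  v_init, gap_init, dt=0.1):
    """Simulate follower speeds, picking the IDM parameters of the
    current mode label at every time step (simple Euler integration)."""
    v, gap = v_init, gap_init
    speeds = []
    for v_lead, mode in zip(lead_speeds, mode_labels):
        a = idm_accel(v, gap, v - v_lead, params_by_mode[mode])
        v_new = max(0.0, v + a * dt)            # no reversing
        gap += (v_lead - 0.5 * (v + v_new)) * dt
        gap = max(gap, 0.1)                     # avoid division by zero
        v = v_new
        speeds.append(v)
    return speeds


# Hypothetical parameter sets for two modes; values are illustrative only.
params_by_mode = {
    0: IDMParams(v0=30.0, T=1.5, s0=2.0, a_max=1.0, b=2.0),
    1: IDMParams(v0=30.0, T=0.9, s0=2.0, a_max=1.5, b=2.5),
}
speeds = simulate_dmbm([10.0] * 50, [0] * 25 + [1] * 25,
                       params_by_mode, v_init=8.0, gap_init=20.0)
```

In a driver-based setup the dictionary would instead be keyed by driver, so the number of parameter sets would grow with the number of drivers rather than with the number of modes.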
Efficient CF modeling: The DMBM achieves this improvement with far fewer parameters, demonstrating how much potential it has if the driving modes in the CF process can be well recognized. It does not require specific modeling for each driver, whose parameter set is likely not transferable to the CF behavior of other drivers. Meanwhile, the error of the DMBM during validation does not increase much compared to that during calibration, indicating good transferability for future applications.
This study answered an essential question of how the different modes in CF affect CF modeling accuracy. By doing so, it expanded the understanding of human CF behavior. On the one hand, it offered an objective perspective on the basic modes involved in the human CF process. On the other hand, it explained the importance of integrating intra-driver variance, especially the behavioral differences between CF modes, into the development of CF modeling and personalized advanced driving assistance systems (ADASs). Furthermore, the clustering results of this study may provide support for potential driver recognition, since the CF behavior of each driver can be viewed as a driver-specific combination of modes.
However, we should note that this is a preliminary heuristic study on the DMBM. In this study, we focused on discovering the existence of CF modes and their impact on behavior modeling but did not deeply explore the spatiotemporal connection or mathematical quantitative expressions between different modes, which is the direction of our next work.

Figure 1. Information contained in NDS data.

Figure 2. Framework of the driving-mode-based modeling.

Figure 3. Exemplary MRF structures for two distinct clusters.

Figure 4. The CF logic of the Wiedemann model.

Figure 5. Illustration of the five-fold cross-validation.

Figure 6. Driving mode labels and trajectory generation of DMBM.

Figure 7. Illustration of the five-fold cross-validation.

Figure 9. Comparative results of the models from two CF modeling perspectives. (a) Modeling errors of the investigated models; (b) calibration and validation errors of the hybrid model and the DMBM model.

Table 1. Qualitative interpretation of clustered driving modes.