3.3.1. First Stage: Driving Time Characteristics and User Driving Demands
- 1. Driving Time Characteristics: Time-Series Clustering
Different users use a car for different purposes in a given period, so their DVMT usually differs. Figure 4 presents, as an example, the DVMT over one week of four users drawn at random from our study data. The DVMT values of Users 1 and 3 are much larger on workdays than on weekends, which is consistent with taxi drivers or long-trip travelers. User 2 shows regular driving behavior with similar DVMT from day to day, as is typical of office workers. User 4 represents users who drive mostly on weekends, such as parents taking children on weekend outings. Driving behavior thus differs widely between users. Therefore, to better explain driving behavior, the DVMT of each user is an important part of the experiment.
In addition, Figure 5 shows that a single user can vary considerably in vehicle use, with travel patterns changing from day to day. The figure shows the time–mileage series of User 1 on five weekdays.
Market segmentation and the satisfaction of personalized demands are facilitated by considering the mean and standard deviation of the DVMT, as they reflect the overall trends in each user’s demand. The mean DVMT is the user’s average daily driving mileage and reflects the central tendency of that user’s daily driving distance. The standard deviation of the DVMT reflects the dispersion, or day-to-day variation, in each user’s driving behavior.
Therefore, this study applies a two-stage hybrid method to classify users. The first step identifies the typical characteristics of the time-series changes in mileage, that is, the typical daily time–mileage travel patterns (TMTP). This allows the typical vehicle usage of all users to be extracted. Then, the mean and standard deviation of the DVMT of each user are calculated to obtain user driving demands. Finally, based on the distribution of the typical daily TMTP of each user and the mean and standard deviation of the DVMT, users with similar TMTP distributions and similar travel demands are clustered into the same category.
The user driving time characteristics are extracted in the first stage. This involves identifying the typical daily time–mileage travel patterns (TMTP), which reflect user driving time characteristics. A time–mileage series is expressed by the change of mileage with time, $S = (s_1, s_2, \dots, s_T)$, where $S$ denotes the time–mileage series and $s_t$ represents the mileage traveled at time $t$. These series are the input of the first-stage clustering, which yields the TMTP centroids.
The K-means algorithm is widely used for clustering because it is simple in principle, fast to converge, and easy to interpret [35]. As such, this study applies the K-means algorithm to cluster the time–mileage series. When using K-means for time-series clustering, it is critical to choose how the similarity between time series, here the time–mileage series, is measured.
The Euclidean distance and Dynamic Time Warping (DTW) are widely used to measure the similarity of time series in time-series clustering [36]. Many of the time series are similar in shape but out of phase, that is, shifted or stretched relative to one another along the time axis. For example, Figure 4 shows similar patterns in the time-series changes of User 1 on Tuesday and Wednesday. The mismatch between them may be related to the time at which the user begins traveling. However, the Euclidean distance measures similarity through a one-to-one alignment of points at the same time index. As such, it is sensitive to noise and has difficulty capturing dynamic changes such as translations of a series in time, which reduces its accuracy.
In contrast, the DTW distance allows for elastic movement along the time axis and can therefore measure shape-based similarity between time series [37]. Its alignment is more flexible, and it has been widely used in place of the traditional Euclidean distance to measure the similarity between time series [36]. Therefore, we use the DTW algorithm to measure the similarity between time–mileage series.
The process of using the DTW algorithm to measure the similarity between two series is as follows.

Set the series $X = (x_1, x_2, \dots, x_n)$ and $Y = (y_1, y_2, \dots, y_m)$, and construct an $n \times m$ matrix $D$. Each element $d(i, j)$ in the matrix $D$ represents the distance between point $x_i$ of the series $X$ and point $y_j$ of the series $Y$. The smaller the distance, the higher the similarity. The variable $d(i, j)$ is expressed as:

$$d(i, j) = (x_i - y_j)^2$$

The warping path $W$ between the series $X$ and the series $Y$ is expressed as:

$$W = (w_1, w_2, \dots, w_K)$$

The $k$-th element of $W$ is $w_k = (i, j)_k$. This path must meet the following conditions:
- (1) $w_1 = (1, 1)$ and $w_K = (n, m)$.
- (2) For $w_k = (i, j)$ and $w_{k-1} = (i', j')$, it must meet $0 \le i - i' \le 1$ and $0 \le j - j' \le 1$.

To find the warping path between the series $X$ and the series $Y$, we introduced a time window into the similarity measure. However, if the time window is set too large, travel patterns from very different periods of the day can be matched with each other, resulting in inaccurate results. Therefore, we matched each point of $X$ only with points of $Y$ within the previous hour and the next hour, as the travel pattern of one user within a single hour is very likely to change little between days.

The cumulative distance $\gamma(i, j)$ is:

$$\gamma(i, j) = d(i, j) + \min\{\gamma(i-1, j-1),\ \gamma(i-1, j),\ \gamma(i, j-1)\}$$

The optimal path is the path that minimizes the accumulated distance along the path.

Then, the similarity between the series $X$ and the series $Y$ is the smallest warping cost between $X$ and $Y$, expressed as $DTW(X, Y)$:

$$DTW(X, Y) = \min_{W} \sum_{k=1}^{K} d(w_k) = \gamma(n, m)$$
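As an illustration of the procedure above, the following minimal Python sketch computes the cumulative distance under a band constraint. It is not the implementation used in this study: the function names, the squared-difference local cost, and the expression of the time window as a number of samples (`window`) are assumptions made for the example.

```python
import math

def dtw_distance(x, y, window=None):
    """Accumulated DTW cost between sequences x and y.

    window: largest allowed |i - j| between matched indices
            (None means no constraint). Hypothetical parameter name.
    """
    n, m = len(x), len(y)
    if window is None:
        window = max(n, m)
    # The band must be at least |n - m| wide so that (n, m) stays reachable.
    window = max(window, abs(n - m))
    # gamma[i][j]: minimal accumulated cost of aligning x[:i] with y[:j].
    gamma = [[math.inf] * (m + 1) for _ in range(n + 1)]
    gamma[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - window), min(m, i + window) + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2  # assumed local distance
            gamma[i][j] = cost + min(gamma[i - 1][j - 1],
                                     gamma[i - 1][j],
                                     gamma[i][j - 1])
    return gamma[n][m]

def squared_euclidean(x, y):
    """One-to-one alignment of equal-length sequences, for comparison."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

# Two made-up series with the same shape, shifted by one time step.
a = [0, 0, 1, 3, 1, 0, 0, 0]
b = [0, 0, 0, 1, 3, 1, 0, 0]

shifted_dtw = dtw_distance(a, b, window=1)
diagonal_only = dtw_distance(a, b, window=0)
pointwise = squared_euclidean(a, b)
```

With `window=0` the alignment is restricted to the diagonal and equals the squared Euclidean distance (10 for these series), whereas a window of one sample absorbs the one-step shift and gives a cost of 0.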
Clustering the time–mileage series using the K-means algorithm with the DTW distance generates $K$ classes of TMTP centroids:

$$C = \{c_1, c_2, \dots, c_K\}$$
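A simplified sketch of this first-stage clustering is given below. The text does not specify how centroids are updated under DTW, so the sketch assumes equal-length daily series, assigns each series to the nearest centroid by DTW, and updates centroids with a pointwise arithmetic mean. The update rule, the initial centroids, and all names are assumptions for illustration only.

```python
import math

def dtw(x, y, window):
    """Band-constrained DTW cost with a squared-difference local cost."""
    n, m = len(x), len(y)
    window = max(window, abs(n - m))
    gamma = [[math.inf] * (m + 1) for _ in range(n + 1)]
    gamma[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - window), min(m, i + window) + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            gamma[i][j] = cost + min(gamma[i - 1][j - 1],
                                     gamma[i - 1][j],
                                     gamma[i][j - 1])
    return gamma[n][m]

def kmeans_dtw(series, init_centroids, window=1, max_iter=20):
    """K-means-style loop: DTW assignment, pointwise-mean centroid update."""
    centroids = [list(c) for c in init_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid under the DTW distance.
        new_labels = [
            min(range(len(centroids)),
                key=lambda k: dtw(s, centroids[k], window))
            for s in series
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # Update step (assumed): pointwise mean of the member series.
        for k in range(len(centroids)):
            members = [s for s, lab in zip(series, labels) if lab == k]
            if members:
                centroids[k] = [sum(v) / len(members) for v in zip(*members)]
    return labels, centroids

# Made-up daily series: two morning-heavy days and two evening-heavy days.
series = [
    [0, 2, 4, 2, 0, 0, 0, 0],
    [0, 0, 2, 4, 2, 0, 0, 0],
    [0, 0, 0, 0, 0, 2, 4, 2],
    [0, 0, 0, 0, 2, 4, 2, 0],
]
labels, centroids = kmeans_dtw(series, init_centroids=[series[0], series[2]])
```

In this toy input the two morning-heavy series fall into one class and the two evening-heavy series into the other, even though the series within each pair are shifted by one step.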
- 2. User Driving Demands: Daily Vehicle Mileage Traveled (DVMT)
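The mean and standard deviation of the DVMT of each user reflect the user’s driving demands, expressed as:

$$\mu_i = \frac{M_i}{D_i}, \qquad \sigma_i = \sqrt{\frac{1}{D_i} \sum_{d=1}^{D_i} \left(m_{i,d} - \mu_i\right)^2}$$

where $\mu_i$ and $\sigma_i$ refer to the mean and standard deviation of the DVMT of the $i$-th user, respectively. The variables $M_i$ and $D_i$ refer to the total mileage and the total number of travel days of user $i$ during the observation period, and $m_{i,d}$ is the DVMT of user $i$ on day $d$.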
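The two demand statistics can be sketched in a few lines of Python. The function name and the sample values are hypothetical, and the sketch assumes the population form of the standard deviation (division by the number of travel days), which the text does not state explicitly.

```python
import math

def dvmt_stats(daily_mileage):
    """Return (mean, standard deviation) of one user's DVMT values.

    daily_mileage: one mileage value per travel day in the observation period.
    """
    travel_days = len(daily_mileage)
    total_mileage = sum(daily_mileage)
    mean = total_mileage / travel_days
    # Population variance: an assumption, the source does not say 1/D or 1/(D-1).
    variance = sum((m - mean) ** 2 for m in daily_mileage) / travel_days
    return mean, math.sqrt(variance)

# Made-up users: one with constant daily mileage, one that alternates.
steady_mean, steady_std = dvmt_stats([30, 30, 30, 30, 30])
varied_mean, varied_std = dvmt_stats([20, 40])
```

Both example users have the same mean DVMT of 30, but the standard deviation separates the steady user (0) from the varied one (10), which is why both statistics are kept as clustering factors.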
3.3.2. Second Stage: User Clustering
Users with similar TMTP distributions have similar vehicle usage characteristics; as such, they are assumed to have the same travel patterns. Therefore, we introduce the proportion of each vehicle’s time–mileage series falling in each TMTP class as a factor for user clustering. The first stage of time–mileage-series clustering generates the typical TMTP and the TMTP class to which each series of each vehicle belongs.

The variable $N$ denotes the total number of time–mileage series of all vehicles; $N_i$ denotes the total number of series of the $i$-th vehicle; and $N_{i,k}$ denotes the number of the $i$-th vehicle’s series belonging to the $k$-th TMTP class. Then, the proportion of the series of the $i$-th vehicle belonging to the $k$-th class is:

$$p_{i,k} = \frac{N_{i,k}}{N_i}$$

Then, the proportions of the series of user $i$ over the $K$ classes are:

$$P_i = (p_{i,1}, p_{i,2}, \dots, p_{i,K})$$
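As a small illustration, the class proportions of one vehicle can be computed from the class labels assigned to its daily series in the first stage. The function name, the zero-based class indices, and the sample labels are assumptions for the example.

```python
from collections import Counter

def tmtp_proportions(day_labels, n_classes):
    """Share of one vehicle's daily series that falls in each TMTP class.

    day_labels: class index (0 .. n_classes - 1) of each daily series.
    """
    counts = Counter(day_labels)
    total = len(day_labels)
    return [counts.get(k, 0) / total for k in range(n_classes)]

# Made-up vehicle with five observed days and three TMTP classes.
proportions = tmtp_proportions([0, 0, 1, 2, 0], n_classes=3)
```

Here three of the five days fall in class 0, so the vector is (0.6, 0.2, 0.2); the entries always sum to one, and a class that never occurs for a vehicle contributes a zero.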
However, the typical daily TMTP mainly reflects the distribution of driving time characteristics. As described in Section 3.3.1, user clustering also needs to consider the mean and standard deviation of the DVMT, which are thus introduced as two further factors to cluster users.
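For user $i$, with TMTP class proportions $p_{i,1}, \dots, p_{i,K}$ and DVMT mean $\mu_i$ and standard deviation $\sigma_i$, the driving time characteristics and driving demands can be expressed as:

$$F_i = (p_{i,1}, p_{i,2}, \dots, p_{i,K}, \mu_i, \sigma_i)$$

Then, users with similar characteristics are clustered into one class, based on their driving time characteristics and driving demands (TCDD). The similarity is calculated using the Euclidean distance. This clustering yields $G$ user types:

$$U = \{U_1, U_2, \dots, U_G\}$$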
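The second stage can be sketched as follows: each user is represented by the class proportions followed by the DVMT mean and standard deviation, and the resulting vectors are grouped with a plain K-means loop under the Euclidean distance. All names, the sample values, and the initial centroids are hypothetical. The sketch applies no feature scaling because the text does not mention any, which means the mileage features dominate the distance in this toy example.

```python
import math

def tcdd_vector(proportions, mean_dvmt, std_dvmt):
    """Concatenate TMTP proportions with the DVMT mean and standard deviation."""
    return list(proportions) + [mean_dvmt, std_dvmt]

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def kmeans(points, init_centroids, max_iter=50):
    """Plain Lloyd iteration with Euclidean distance."""
    centroids = [list(c) for c in init_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every user vector.
        new_labels = [
            min(range(len(centroids)),
                key=lambda g: euclidean(p, centroids[g]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # Update step: centroid is the mean of its member vectors.
        for g in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == g]
            if members:
                centroids[g] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids

# Made-up users: two low-mileage regular drivers, two high-mileage variable ones.
users = [
    tcdd_vector([0.8, 0.2], 30.0, 2.0),
    tcdd_vector([0.7, 0.3], 32.0, 3.0),
    tcdd_vector([0.1, 0.9], 120.0, 40.0),
    tcdd_vector([0.2, 0.8], 110.0, 35.0),
]
labels, centroids = kmeans(users, init_centroids=[users[0], users[2]])
```

In this toy input the first two users are grouped together and the last two form the second type.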