Commuting Pattern Recognition Using a Systematic Cluster Framework

: Identifying commuting patterns for an urban network is important for various tra ﬃ c applications (e.g., tra ﬃ c demand management). Some studies, such as the gravity models, urban-system-model, K-means clustering, have provided insights into the investigation of commuting pattern recognition. However, commuters’ route feature is not fully considered or not accurately characterized. In this study, a systematic framework considering the route feature for commuting pattern recognition was developed for urban road networks. Three modules are included in the proposed framework. These modules were proposed based on automatic license plate recognition (ALPR) data. First, the temporal and spatial features of individual vehicles were extracted based on the trips detected by ALPR sensors, then a hierarchical clustering technique was applied to classify the detected vehicles and the ratio of commuting trips was derived. Based on the ratio of commuting trips, the temporal and spatial commuting patterns were investigated, respectively. The proposed method was ﬁnally implemented in a ring expressway of Kunshan, China. The results showed that the method can accurately extract the commuting patterns. Further investigations revealed the dynamic temporal-spatial features of commuting patterns. The ﬁndings of this study demonstrate the e ﬀ ectiveness of the proposed method in mining commuting patterns at urban tra ﬃ c networks. in the PM was larger than 0.4. The opposite changing trend illustrates a correspondence relation between the origins and destinations for commuters. The consistent commuting behaviors illustrates the reasonability of the proposed method.


Introduction
The commuting traffic contributes a lot to traffic congestion, air pollution and greenhouse gas emissions [1]. With cities expanding, the average commute time is increasing and road networks have become more congested [2]. To mitigate traffic congestions and achieve the full potential of intelligent transportation systems, the investigation of commuting patterns for passenger cars is of great importance. Specifically, commuting pattern recognition is a key step for some applications such as congestion pricing and active traffic management. For improving commuting efficiency, many studies have been conducted to model the commuting flows and investigate the characteristics of commuting patterns.
Early studies try to model the commuting flows at zonal or regional levels using analogies between commuting flows and some physics phenomena, or socioeconomic and probabilistic arguments [3]. The commonly used methods are the gravity models and the radiation models. The gravity model [4] illustrates the macroscopic relationships between places (such as homes and workplaces). The interaction between two locations is assumed to be declined with the increasing distance (or time, cost). Similarly, the radiation model [5] relies on the involved regions' populations and the distances from each other. Furthermore, Stefanouli and Polyzos [6] compared the gravity model with the radiation model and found that distance is an important factor that affects the commuting behavior. Masucci [7] tested the radiation model and the gravity model for identifying commuting patterns; the thermodynamic limit assumption [5] for the original radiation model significantly underestimated the commuting flows in large cities. Subsequently, Varga [3] proposed a further generalization for the original radiation model-flow and jump model (FJM); test results showed that the FJM can offer an improved description for commuting data. Despite some limitations of the previous models [4][5][6][7][8][9], these models are widely used due to the advantages in approaching the mobility laws [6]. Some studies try to investigate the commuting pattern considering the stop-making behavior or land use properties. For example, Bhat [10] used a discrete-continuous econometric system to model the activity and travel pattern of workers during evening commute. Wan et al. [11] utilized an urban-system-model approach to model the commuting patterns in Beijing, and concluded that the model prediction fit better with the actual value [12]. Limited by traffic data availability in large-scale road networks, these studies estimated the commuting flows at zonal or regional levels based on demographic information, distances between traffic zones, land use, and so on. These models are all traffic planning-oriented and thus cannot be used to derive commuting patterns for the purpose of traffic management.
Recently, with the emerging big data technologies [13], the commuting pattern at an individual level can be efficiently derived using advanced data-driven methods (e.g., machine learning) [14][15][16][17][18][19][20][21][22][23][24][25][26][27]. Various kinds of data were utilized in these data-driven based methods, including Global Positioning System (GPS) data, mobile phone call detail records (CDRs), smart card data and remote sensing imagery [14][15][16], which provide new sights for traffic control-oriented applications. Zhou et al. [17] proposed a density-based clustering method to quantify the spatial distribution of O-D (Origin-Destination) demands of urban traffic using GPS data. Kung et al. [18] compared different commuting patterns using mobile phone data and concluded that home-work time distributions and average values within a single region are largely independent of commuting distance or country. Altintasi et al. [19] investigated the arrival and departure of car-based commuting behavior on campus using Radio Frequency Identification technology (RFID). Ma et al. [20] developed a series of data mining methods to identify the spatiotemporal commuting patterns of Beijing public transit riders using smart card data. McNeill et al. [21] explored the extent to which local commuting patterns can be estimated from data drawn from Twitter and concluded that the Twitter data offer a good proxy for local commuting patterns, etc. [22]. Generally, the commuting flow can be effectively derived with various sources of data, and the commuting pattern was revealed in a much finer spatial and temporal scale. Recently, with the widely use of automatic license plate recognition (ALPR) data, many studies investigated the commuting pattern based on license plate matching. Chang and Yang [23] analyzed the temporal and spatial distributions of the commuting vehicles based on the ALPR data. Chen et al. [24] used a K-means clustering algorithm to extract the commuting travel vehicles based on ALPR data. Our previous study [25] estimated the O-D patterns using vehicle trajectory data collected by ALPR devices and investigated the temporal-spatial distribution patterns of trip generation and attraction, etc. [26]. With detailed information, large sample size, and real-time data availability of ALPR data [28], these studies highlighted their potentials in individual level traffic pattern recognition. Basically, these data-driven studies added empirical evidence to commuting pattern recognition from different aspects with a series of data mining methods. However, commuters' route features are not considered or not accurately characterized among these studies. As commuters tend to select several familiar routes for work/home, the route feature plays an important role in identifying commuters. It is of great importance to accurately characterize the route features for commuters' pattern recognition. Besides, few studies focused on an implementation framework for commuting pattern recognition which is important for engineering applications. Specifically, with the increasing of commuting demands, commuting pattern recognition becomes more important for making suitable traffic control measures [2]. Considering the implementation of commuting pattern recognition, a systematic framework which can be easily utilized in different engineering applications is needed. Considering commuters' route features were not fully considered or not accurately characterized and few studies focused on an implementation framework for commuting pattern recognition, this study developed a systematic framework to identify commuting pattern considering the route feature of travelers. In particular, the contributions of this paper are in two aspects: (1) a systematic framework for commuting pattern recognition which can be implemented in engineering applications is proposed; (2) a commuting pattern recognition method was proposed based on ward's hierarchical clustering. To improve the accuracy of commuters' recognition, a Longest Common Subsequence-based method is used to quantify the route feature of travellers. The rest of this paper is organized as follows: Section 1 introduces the related studies on commuting pattern recognition; Section 2 proposes the methodology for commuting pattern recognition; Section 3 is the implementation of the proposed method, and the conclusions are in Section 4.

Methodology
To investigate the commuting pattern of urban traffic, a systematic framework for commuting pattern recognition was developed and shown in Figure 1. The framework consists of three modules: data preprocessing, feature extraction and selection, and commuting pattern recognition. recognition, this study developed a systematic framework to identify commuting pattern considering the route feature of travelers. In particular, the contributions of this paper are in two aspects: (1) a systematic framework for commuting pattern recognition which can be implemented in engineering applications is proposed; (2) a commuting pattern recognition method was proposed based on ward's hierarchical clustering. To improve the accuracy of commuters' recognition, a Longest Common Subsequence-based method is used to quantify the route feature of travellers. The rest of this paper is organized as follows: Section 1 introduces the related studies on commuting pattern recognition; Section 2 proposes the methodology for commuting pattern recognition; Section 3 is the implementation of the proposed method, and the conclusions are in Section 4.

Methodology
To investigate the commuting pattern of urban traffic, a systematic framework for commuting pattern recognition was developed and shown in Figure 1. The framework consists of three modules： data preprocessing, feature extraction and selection, and commuting pattern recognition.  Figure 1. The systematic framework for commuting pattern recognition. "ALRP" means Automatic License Plate Recognition technology, "Veh" is a Vehicle, "LCS" means the Longest Common Subsequence method, "DBSCAN" is a clustering method named "Density-Based Spatial Clustering of Applications with Noise". As shown in Figure 1, firstly, to analyze the travel behavior of a single vehicle the trips of each vehicle were extracted from ALPR data in a module using a trajectory reconstruction method [25]. Secondly, considering the repeatability of commuting vehicles, the temporal and spatial features were both extracted and selected through commuting activities analysis, specifically, the route Figure 1. The systematic framework for commuting pattern recognition. "ALRP" means Automatic License Plate Recognition technology, "Veh" is a Vehicle, "LCS" means the Longest Common Subsequence method, "DBSCAN" is a clustering method named "Density-Based Spatial Clustering of Applications with Noise". As shown in Figure 1, firstly, to analyze the travel behavior of a single vehicle the trips of each vehicle were extracted from ALPR data in a module using a trajectory reconstruction method [25]. Secondly, considering the repeatability of commuting vehicles, the temporal and spatial features were both extracted and selected through commuting activities analysis, specifically, the route feature was derived using the LCS (Longest Common Subsequence) method. The LCS aims to find the longest subsequence common to all sequences in a set of sequences, it can be used to classify vehicle trajectories. Thirdly, the selected features of vehicles were utilized by a hierarchical clustering method to identify the commuting vehicles.

Trip Generation from ALPR Data
Trip generation is an important basis for commuting pattern recognition using ALPR data. A trip is composed of a sequence of activities for a particular purpose [29]. To generate trips with ALPR data the license matching method is utilized. Briefly, a vehicle's trajectory is formed by a group of time-ordered ALPR devices that captured the vehicles. The trips are generated by analyzing the activities of the traveler.
Generally, the travel time of a vehicle between two intersections must be in a certain range. If the vehicle's travel time between two adjacent intersections is far beyond the range, the vehicle's trajectory can be referred to as two different trips. Based on this, a time threshold [20] is utilized to segregate the trajectory into multiple trips. With generated trips, the origins and destinations can be easily derived. It is worth mentioning that, as the ALPR devices are installed at intersections, the actual origin and destination of trips may not be captured. Therefore, the first and the last records of each trip are approximately regarded as the trip's origin and destination, respectively [20]. The recorded time of the first and last record of each trip can be viewed as the trip's departure time and arrival time.

Feature Selection and Extraction
To identify the commuting pattern of an urban road network, the commuters and non-commuters are distinguished with different travel behaviors using ALPR data. Generally, the commuting travelers are likely to go off for work/home regularly with relatively fixed locations at similar times. Specifically, commuters' behavior has two features: (1) High repeatability during the same duration (e.g., morning peak) on weekdays; (2) high repeatability of origin and destination (e.g., family, workplace) on weekdays. Therefore, the features of a commuting vehicle can be temporally and spatially measured.
The temporal features of a traveler could be departure time, arrival time, travel time and the number of travelling weekdays. Affected by travel distance and traffic conditions, the arrival time and travel time have large uncertainty and thus cannot be used for exploring the temporal pattern. As commuters tend to travel at morning and evening peaks on weekdays, the departure time and the number of travelling weekdays are used to describe the temporal features of a traveler. It is obvious that the higher occurrence at evening/morning peaks is more likely to signal that the traveler is a commuter.
To quantify the temporal feature of a traveler, the vehicles that appear both at morning and evening peaks of weekdays are extracted. Then, the number of days these vehicles appear is calculated as Equation (1).
where N i d is the number of days that a vehicle appears at weekday peaks, D is the total number of studied days. n i k = 1 means that vehicle i appears both at the morning peak and the evening peak on the k-th weekday.
The spatial features of a traveler can be expressed as the repeatability of origins and destinations as well as the selected routes. To improve the reliability of commuting trip recognition, the three features were all utilized. Generally, the origin and destination of a commuter are relatively stable on weekdays. Therefore, the higher repeatability of origins and destinations, the more likely the traveler is a commuter. Specifically, the first trip and the last trip on a weekday were considered as a Sustainability 2020, 12, 1764 5 of 20 home-to-work trip and a work-to-home trip, respectively. The origin of the first trip of a vehicle for a weekday was defined as a traveler's origin (TrO), and the origin of the last trip for a weekday was defined as a traveler's destination (TrD). The repeatability of origins and destinations of a traveler were quantified by the number of unique TrOs and TrDs on the studied weekdays which are shown as Equation (2).
where o i k and d i k are the TrO and TrD of vehicle i on the kth weekday. A i and B i are noted the collections of TrO and TrD for multiple weekdays, individually. O i and D i are the collections of different origins and destinations in A i and B i , respectively. N i s and N i e are the numbers of different origins and destinations in O i and D i , individually. "Unique" is a function for enumerating the different entities in an array, "length" is an function for getting the number of entities in an array.
The third spatial feature of a traveler is the repeatability of routes N i r . Commuters are inclined to choose several familiar routes to ensure the reliability of travelling; these routes are called frequently used routes (FURs). It was assumed that the FURs of a commuter were the ones with high repeatability in multiple weekdays, and any route can be classified as one of the FURs. The FURs may vary with different considerations, for example, a commuter may select a route with relative short distance or travel time. However, the number of FURs should be fixed to a small value. The repeatability of the selected routes of a traveler was derived with following steps. First, the similarity between different routes on multiple weekdays was measured with the Longest Common Subsequence (LCS). Second, the selected routes for multiple days were classified into several groups using the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) method, which is a density-based non-parametric clustering algorithm. Then the FURs were included in one or more groups. Last, for the groups including FURs, the smaller averaged within-cluster distance, the more likely the traveler was a commuter. For the groups without FURs, the fewer number of routes, the more likely the traveler was a commuter.
To quantify the similarity between different selected routes, the Longest Common Subsequence (LCS) was utilized [30]. The LCS aims to find the longest common subsequence of two sequences. To derive the LCS for two routes, first, the similarity measurement method for two points of two different routes was defined. Second, the optimal alignment was determined through establishing the correspondence of the two routes' points. Finally, the ratio of the actual travel distance of the LCS was calculated. Specifically, a route was formed by a series of connected intersections, and the intersections were abstracted as points, which is described as Equation (3).
where r Ik A , r Ik B are route A and route B of vehicle I on the k th weekday with size M and N, s i , s j are the i th and j th point of r Ik A and r Ik B . To measure the similarity of two points in two different routes, the approximation degree of spatial location of the two points (simP(s i , s j )) was utilized. The specific formulas are expressed in Equation (4).
where dis(s i , s j ) is the Euclidean distance derived from the geographical coordinates of the two points. γ is the allowable maximum distance. The two points are matched when simP(s i , s j ) > 0.
To find the optional alignment between two routes, a dynamic programing algorithm was utilized [30]. The optional alignment was derived through maximizing the accumulated similarity of the matched points. The cumulative similarity can be calculated as Equation (5).
where r Ik A(m) , r Ik B(n) are the first m and n points of route r Ik A and r Ik B , respectively. β(r Ik A(m) , r Ik B(n) ) is the cumulative similarity of the matched points under the optimal alignment between r Ik A(m) and ). The correspondence between points of the two routes for optional alignment was established through keeping track of the matched points which contributed to the final cumulative similarity β(r Ik A(M) , r Ik B(N) ). Based on this, the LCS was determined by the consecutive points with the largest number. As the accumulated similarity between each pair of points of the two routes was calculated only once, the algorithm's time complexity was O(M × N), where M and N are the number of points of the two routes, respectively.
With the LCS identified, the similarity between the two routes was measured with the ratio of the actual travel distance of the LCS. Specifically, the actual travel distance of LCS was derived by summing all the distance derived from the geographical coordinates of the consecutive points in LCS, which is described as len(L (i,j) ) = X−1 x=1 dis(L x , L x+1 ), where X is the number of matches, L (i,j) = (L 1 , . . . , L X ) is the derived LCS for route r Ik A and r Ik B , L 1 , . . . , L X are the matched points. As the matched points are approximate, the LCS can be extracted either from route r Ik A or r Ik B . The LCS for the two routes are described as L (i,j) A and L (i,j) B , respectively. Based on this, the similarity of r Ik A and r Ik B was derived by calculating the ratio of len(L (i,j) ) in the route with shorter distance which is shown in Equation (6).
where δ is the allowable minimum length for a valid LCS between two routes; simR(r Ik A , r Ik B ) represents the similarity between route r Ik A and r Ik B . With the similarity value (simR) between different routes, a DBSCAN clustering method was then used to classify the routes. The details of the DBSCAN method can be find in our previous study [17]. The repeatability of selected routes of a traveler was quantified as N i r , and the calculation for N i r is shown in Equation (7).
where distC FUR represents the averaged within-cluster distance for clusters with FURs, N(C non-FUR ) is the number of routes within the clusters without FURs, k FUR represents the number of clusters with FURs, C j is the j-th cluster, c j is the centroid of cluster C j , simR(j) − c j is the L2 norm (Euclidean distance) between the two vectors, and simR(j) is a data point in C j . It is worth noting that to make the parameters comparable, the simR(j) and N(C non-FUR ) were scaled. Obviously, the higher the value of N i r , the more likely the traveler was a commuter.

Commuting Vehicles Identification Using Ward's Hierarchical Clustering
To identify the commuting vehicles, a clustering technique was utilized to analyze temporal-spatial features extracted from ALPR data. Many clustering algorithms and strategies, such as K-means [24,31], DBSCAN [32], GMM (Gaussian Mixture Model) [33], nested clustering [34], online agglomerative clustering [35], hierarchical clustering [36], and other algorithms [37,38] had been proposed in the past decades. Hierarchical clustering, as a typical unsupervised machine learning algorithm, has been applied to a wide spectrum of transportation researches. Since the Ward's hierarchical clustering (Hclust) [36] method does not need an initial number of clusters and initial composition rules and has the advantage of possessing easily computable extended dissimilarity indices [39], it was utilized to extract the commuting vehicles in our study. Due to the computational complexity of merging at each step for the large dataset, the Lance-Williams algorithm [36] was utilized to improve the computing efficiency. A suitable number of clusters were selected using the hierarchy (which is also known as "dendrogram") from successive merging. The details of the Hclust method are as follows: The features extracted from all vehicles are included in dataset C. C = P 1 , P 2 , . . . , P i , . . . , P j , . . . , P m . P i and P j are the i th and j th vehicle's feature in C, m is the total number of vehicles. Vehicle P i has four features P i = (N i d , N i s , N i e , N i r ), specifically, N i d is the number of days that a vehicle appears at weekday peaks, N i s and N i e are the numbers of different origins and destinations, N i r denotes the repeatability of selected routes of a traveler. It is noted that the variables have rescaled before calculating the distance. The Hclust algorithm consists of two parts: the objection function and dissimilarity measurement.

Objective Function
The objective of the Hclust algorithm is to minimize the increase of total within-cluster variance at each iteration until all clusters are merged into one. In other words, the objective of the Hclust algorithm is to find a pair of clusters with minimum distance among all pairwise distances at each step. Take the first iteration as an example to initialize the algorithm, set each object P i and P j as cluster C P i and C P j , individually. The dissimilarities between C P i and C P j is noted as d (P i )(P j ) . The total number of vehicles to be clustered is m, the objective function J for the first iteration is described in Equation (8):

Dissimilarity Measurement
With the agglomeration of vehicles' features, there are two kinds of data to be classified: a single feature and a combination of features. The combination of features may be formed by many clusters.
To determine the pair of clusters with minimum distance, the distance between feature to feature (Euclidean distance), feature to the combination of features (intra-cluster distance), and the combination of features to the combination of features (global cluster distance) should be calculated. In brief, two kinds of dissimilarity, the intra-cluster and global cluster dissimilarity, are measured between different clusters. Assume that C P i and C P j are included in clusters C i and C j , and C i and C j have the minimum distances during t-th iteration. Then C i and C j are merged as C i∪j in the (t + 1)-th iteration. After (t + 1) th iteration, assume that there is a cluster C k in C, if C i∪j and C k have the minimum distance, then C i∪j and C k can be merged in the next iteration. Given that the dissimilarities between C i and C j are noted as d ij , the intra-cluster dissimilarity between cluster C i∪j and C k can be expressed as Equations (9) and (10): α i = n i + n k n i + n j + n k , α j = n j + n k n i + n j + n k , β = −n k n i + n j + n k (10) where d ij , d ik and d jk are the pair-wise distances between clusters C i , C j and C k , respectively. d (ij)k is the distance between the clusters C i∪j and C k . n i , n j and n k represent the size of cluster C i , C j and C k , respectively. α i , α j and β in Equations (9) and (10) illustrate that the distance between C i∪j and C k is related to the size of each cluster. The global cluster dissimilarity can be described as Equation (11): where d AB is the weighted Euclidean distance between the centroids of cluster C A and C B , → c A and → c B are the averaged vector of cluster C A and C B , which represent the centroids of the points in cluster C A and C B , respectively. || || is the operation of calculating the Euclidean distance. When the cluster-pairs both are formed by a single object, the dissimilarity is equals to the Euclidean distance between them.

Clustering Procedure
Let C = C 1 , . . . C m be the sets of clusters. The procedure of Ward's hierarchical clustering method is shown in module 3 of Figure 1 and the main steps are summarized as follows: Step 1: Initialization. Set each object (P i ) as a cluster.
Step 3: Clusters merging. Find a cluster-pairs C i , C j with the minimum distances and merge them into a new cluster C i∪j . Then, add C i∪j to C, remove C i and C i from C.
Step 4: Check the condition whether iteration terminates. Check if the total number of clusters in C is equal to 1, if yes, then stop iteration; if not, go to Step 5.

Data Description and Feature Extraction
A ring expressway with 35 on-ramps and 24 off-ramps in Kunshan City, China was selected as the test site (as shown in Figure 2). As the selected network is a closed ring expressway network, the vehicle path can be determined by an origin and a destination. Therefore, only the origin and destination information were used to describe the spatial features of a traveler. The ALPR devices were located at on-ramps and off-ramps of the selected network. where d AB is the weighted Euclidean distance between the centroids of cluster C A and C B , c A ⃗ and c B ⃗ are the averaged vector of cluster C A and C B , which represent the centroids of the points in cluster C A and C B , respectively. ‖ ‖ is the operation of calculating the Euclidean distance. When the cluster-pairs both are formed by a single object, the dissimilarity is equals to the Euclidean distance between them.

Clustering Procedure
Let C=C 1 ,…C m be the sets of clusters. The procedure of Ward's hierarchical clustering method is shown in module 3 of Figure 1 and the main steps are summarized as follows: Step 1: Initialization. Set each object (P i ) as a cluster. C P i = P i , C P j = P j .
Step 3: Clusters merging. Find a cluster-pairs C i ,C j with the minimum distances and merge them into a new cluster C i∪j . Then, add C i∪j to C, remove C i and C i from C.
Step 4: Check the condition whether iteration terminates. Check if the total number of clusters in C is equal to 1, if yes, then stop iteration; if not, go to Step 5.

Data Description and Feature Extraction
A ring expressway with 35 on-ramps and 24 off-ramps in Kunshan City, China was selected as the test site (as shown in Figure 2). As the selected network is a closed ring expressway network, the vehicle path can be determined by an origin and a destination. Therefore, only the origin and destination information were used to describe the spatial features of a traveler. The ALPR devices were located at on-ramps and off-ramps of the selected network. The data was collected by 59 ALPR devices from 1-31 May 2017. A sample of raw data is shown in Table 1. The "TIME_KEY" consists of hour, minute, second and millisecond. For example,  The data was collected by 59 ALPR devices from 1-31 May 2017. A sample of raw data is shown in Table 1. The "TIME_KEY" consists of hour, minute, second and millisecond. For example, "92449840" means 09:24:49:840. "INSTALL_TYPE" represents the location of ALPR devices, "1" and "0" represent the devices installed at on-ramp and off-ramp, respectively. "LP_CAMERA_ID" is the ID of the ALPR devices. To protect travelers' privacy, only the last four digits of the license number were displayed in the table. Note: "LP_CAMERA_ID" is the ID of the ALPR devices, "Mon" means Monday, "WB" means westbound.
The data was pre-processed as follows: (1) To generate a vehicle trajectory, the "Lp_CAMERA_ID" were arranged together in time order; (2) a single trip was presented when the ordered neighboring records had different "INSTALL_TYPE" and the travel time was less than a threshold. In this study, the maximum travel time between two ALPR devices in test site is about 20 min, therefore, 20 min was set to be the time threshold; (3) the records with unidentified license plates were excluded; (4) given that the road network was totally-enclosed, any vehicle had to enter the network through the on-ramp and leave it through the off-ramp. The number of vehicles identified at on-ramps was viewed as the traffic demand ratio. With processed ALPR data, the extracted features were derived and a partial of which were shown in Table 2. Considering the higher temporal and spatial repeatability of a commuter, it was inferred that as the ID in Table 2 increases, the possibility of being a commuter decreased. To quantify the possibility and identify commuters, a clustering method was utilized in the following section.

Performance Evaluation
With extracted temporal and spatial features of commuting vehicles, the identification of commuting vehicles was implemented using the Hclust clustering method. To evaluate its performance, a comparison analysis was conducted between the Hclust method and the traditional clustering method, K-means. Before clustering, considering that the clustering results largely depends on the selection of the number of clusters, the dendrogram [40] (graphical structure) which revealed how the data objects merge into a single cluster and the Calinski-Harabasz Index (CHI) [41] were utilized to determine the optimal number of clusters for the Hclust and K-means methods, respectively. Specifically, CHI evaluates the cluster validity based on the average between and within cluster sum of squares which are shown as Equations (12)- (14).
where SS w is the sum of squares within the clusters while SS B is the sum of squares among the clusters, m is the total number of data objects in the dataset, c is the dataset's centroid, m i is the number of objects in C i , K denotes the number of clusters, P i is the data point, C i is the i th cluster, c i is the centroid of cluster C i , and ||P i − c i || is the L2 norm between the two vectors. The higher the value of CHI, the more reasonable a cluster number is. Figure 3a is the dendogram of extracted features (N d , N s , N e ) based on the Hclust and it depicts the hierarchical relationship of 494,528 vehicle features. The dendrogram graphically displays the merging process and the intermediate clusters, and the graphical structure shows how points can be merged into a single cluster. The height of the dendogram implies the distance between the clusters. Based on the graphical interpretation of the dendrogram in Figure 3a, the optimal number of clusters could be two or four. The data structure with four clusters can be found by the horizontal line (red dotted line). Figure 3b is the CHI for K-means. It is obvious that the highest CHI (optimal) occurs when the number of clusters is four. Therefore, a four-cluster model was used in this study. revealed how the data objects merge into a single cluster and the Calinski-Harabasz Index (CHI) [41] were utilized to determine the optimal number of clusters for the Hclust and K-means methods, respectively. Specifically, CHI evaluates the cluster validity based on the average between and within cluster sum of squares which are shown as Equations (12)- (14).
where SS w is the sum of squares within the clusters while SS B is the sum of squares among the clusters, m is the total number of data objects in the dataset, c is the dataset's centroid, m i is the number of objects in C i , K denotes the number of clusters, P i is the data point, C i is the i th cluster, c i is the centroid of cluster C i , and ‖P i -c i ‖ is the L2 norm between the two vectors. The higher the value of CHI , the more reasonable a cluster number is. Figure 3a is the dendogram of extracted features (N d ,N s ,N e ) based on the Hclust and it depicts the hierarchical relationship of 494,528 vehicle features. The dendrogram graphically displays the merging process and the intermediate clusters, and the graphical structure shows how points can be merged into a single cluster. The height of the dendogram implies the distance between the clusters. Based on the graphical interpretation of the dendrogram in Figure 3a, the optimal number of clusters could be two or four. The data structure with four clusters can be found by the horizontal line (red dotted line). Figure 3b is the CHI for K-means. It is obvious that the highest CHI (optimal) occurs when the number of clusters is four. Therefore, a four-cluster model was used in this study. As shown in Figure 4, the commuters' features were clustered into four clusters using Hclust and K-means, respectively. As introduced in Section 2.2, commuters tend to recur repeatedly (large N ) during the fixed duration, a fixed origin (smaller N s ) and a fixed destination (smaller N e ) for multiple weekdays. Based on this, the Cluster 4 in Figure 4a and the Cluster 2 in Figure 4b were considered as the identified commuting vehicles and the other Clusters were labelled as noncommuting vehicles. As shown in Figure 4, the commuters' features were clustered into four clusters using Hclust and K-means, respectively. As introduced in Section 2.2, commuters tend to recur repeatedly (large N d ) during the fixed duration, a fixed origin (smaller N s ) and a fixed destination (smaller N e ) for multiple weekdays. Based on this, the Cluster 4 in Figure 4a and the Cluster 2 in Figure 4b were considered as the identified commuting vehicles and the other Clusters were labelled as non-commuting vehicles. Specifically, it is believed that Cluster 1 and Cluster 2 in Figure 4a represented the travelers occasionally traveling through the test site at peak hours, and Cluster 3 in Figure 4a were the noncommuting travelers due to their flexible origins and destinations. Similar results can also be found in Figure 4b. A detailed description is shown in Table 3 for further analyzing the clustering results. Specifically, it is believed that Cluster 1 and Cluster 2 in Figure 4a represented the travelers occasionally traveling through the test site at peak hours, and Cluster 3 in Figure 4a were the non-commuting travelers due to their flexible origins and destinations. Similar results can also be found in Figure 4b. A detailed description is shown in Table 3 for further analyzing the clustering results. With labelled vehicles of commuters and non-commuters, an approach for performance evaluation was proposed between the proposed method and the traditional method. It was assumed that the better clustering method can identify more commuters with reasonable features (larger N d and N r , smaller N s and N e ) and lower variance. To evaluate the methods, a performance evaluation indicator (PF) was developed. The proposed evaluation metric PF reflects the reasonability of identified commuters' features. A higher PF value indicates a more reasonable result of identified commuters. The proposed evaluation metric can improve the efficiency of determining the suitable method and increase the accuracy for commuters' recognition. It is worth noting that as there are large differences in magnitude among the three features, N i d , N i r , N i s and N i e were rescaled as N i d , N i r , N i s and N i e to ensure the comparability of the variables. Take N i d as example, the detailed rescaling methods are expressed as Equation (15). The other formulas for performance evaluation are shown in Equations (16)- (18).
where pf i is the performance indicator for identified commuting vehicles, m and l are the total number of all vehicles and identified commuting vehicles, respectively, and V is the variance of features for identified commuting vehicles. The performance evaluation results are shown in Table 3. It was found that the identified commuters (displayed in gray background) had similar characters: larger N d , smaller N s and smaller N e , which indicates that both methods successfully identified the commuters. The importance of features was evaluated based on random forests [42]. Specifically, the importance (IMP) of a feature was quantified by the impact of the feature's permutation on the accuracy of the clustering results. For example, the importance of feature N d was derived by calculating the mean decrease accuracy of the clustering results through the model's predicting after rearranging the order of values of N d . The higher value of IMP, the more significant it was to the clustering result. As expressed in Table 3, the values of IMP were similar, illustrating that the three features are of equal importance for the two methods. For comparison, the performance evaluation indicators were derived using Equations (15)- (18). The higher PF value indicated that the method was more accurate in identifying commuters. As shown in Table 3, it was found that the Hclust method had a higher PF value. Specifically, the higher l/m value revealed that, compared with the Hclust method, more commuting vehicles were identified using the K-means method; lower PF illustrates a lower repeatability of temporal and spatial features using the K-means method; higher V means higher variance of commuting vehicles using the K-means method. The test site consisted of four lines: East line, West line, South line and North line. Given that commuters tend to have stable origins (on-ramps) or destinations (off-ramps), they were supposed to travel through the road network through two lines for work/home. Synthetically considering Figures 2 and 4b, it was found that there were up to six destinations (off-ramps) for one line, however, the number of destinations for identified commuting vehicles varied from one to nine using the K-means method. That is to say, the identified commuters using the K-means method traveled through the road network through more than two lines. This is inconsistent with actual commuting behaviors. While for the Hclust method, all the three features of commuting vehicles were in accordance with that of the real world. Therefore, the Hclust method was utilized for commuting vehicles' identification.

Commuting Pattern Analysis
To reveal the dynamic temporal-spatial features of commuting travels, the temporal and spatial commuting patterns were investigated based on the clustering results.

Temporal Commuting Pattern
In terms of the temporal pattern, the time-of-day, day-of-week and day-to-day ratio of commuting trips of the test site was analyzed, respectively. The temporal commuting pattern was expressed as the average ratio of commuting trips at a specific duration. The ratio of commuting trips was considered as the proportion of commuting trips in the total number of trips at a period of time. For example, the time-of-day commuting pattern at different days of the week was derived as the average ratio of commuting trips at different days of the week. Figure 5a describes the average ratio of commuting trips every 5 min at different days of the week. It was shown that the overall ratio of commuting trips at morning peaks was larger and more concentrated compared to those at evening peaks. The reason is that the departure time at evening peaks is more flexible than that at morning peaks. Part of the commuters may go shopping or have dinner outside after work. Furthermore, the temporal commuting pattern was similar from Monday to Saturday, and the ratio of commuting trips at some peak hours of the weekdays reached 0.4. Besides, the daily traffic demands and their structure are shown in Figure 5b. The red and green rectangles describe the number of commuting and non-commuting trips, respectively. The labels are the ratios of commuting trips in a whole day. It shows that the ratio of commuting trips is relatively stable around 0.16 per day and the traffic demand is smaller on weekends (marked with black rectangles). l/m value revealed that, compared with the Hclust method, more commuting vehicles were identified using the K-means method; lower PF illustrates a lower repeatability of temporal and spatial features using the K-means method; higher V means higher variance of commuting vehicles using the K-means method. The test site consisted of four lines: East line, West line, South line and North line. Given that commuters tend to have stable origins (on-ramps) or destinations (off-ramps), they were supposed to travel through the road network through two lines for work/home. Synthetically considering Figures 2 and 4b, it was found that there were up to six destinations (offramps) for one line, however, the number of destinations for identified commuting vehicles varied from one to nine using the K-means method. That is to say, the identified commuters using the Kmeans method traveled through the road network through more than two lines. This is inconsistent with actual commuting behaviors. While for the Hclust method, all the three features of commuting vehicles were in accordance with that of the real world. Therefore, the Hclust method was utilized for commuting vehicles' identification.

Commuting Pattern Analysis
To reveal the dynamic temporal-spatial features of commuting travels, the temporal and spatial commuting patterns were investigated based on the clustering results.

Temporal Commuting Pattern
In terms of the temporal pattern, the time-of-day, day-of-week and day-to-day ratio of commuting trips of the test site was analyzed, respectively. The temporal commuting pattern was expressed as the average ratio of commuting trips at a specific duration. The ratio of commuting trips was considered as the proportion of commuting trips in the total number of trips at a period of time. For example, the time-of-day commuting pattern at different days of the week was derived as the average ratio of commuting trips at different days of the week. Figure 5a describes the average ratio of commuting trips every 5 min at different days of the week. It was shown that the overall ratio of commuting trips at morning peaks was larger and more concentrated compared to those at evening peaks. The reason is that the departure time at evening peaks is more flexible than that at morning peaks. Part of the commuters may go shopping or have dinner outside after work. Furthermore, the temporal commuting pattern was similar from Monday to Saturday, and the ratio of commuting trips at some peak hours of the weekdays reached 0.4. Besides, the daily traffic demands and their structure are shown in Figure 5b. The red and green rectangles describe the number of commuting and non-commuting trips, respectively. The labels are the ratios of commuting trips in a whole day. It shows that the ratio of commuting trips is relatively stable around 0.16 per day and the traffic demand is smaller on weekends (marked with black rectangles). To characterize a more detailed temporal pattern, the ratio of commuting trips of partial frequently used ramps was analyzed. Figure 6 describes the ratios of commuting trips for device 1000037, 1000038, 1000069 and 1000079 on Tuesday, 16 May 2017. It shows that, compared with the average ratio of all ramps, these frequently used ones tend to have larger ratios of commuting trips, and the maximum ratio at some periods for these ramps is close to 0.5. These facilities are installed at ramps with higher ratios of commuting trips which have a greater impact on the selected network. It is necessary to meter or operate theses frequently used ramps to improve the efficiency of the network traffic.

Spatial Commuting Pattern
To investigate the spatial commuting pattern at the test site, the location-varying average ratio of commuting trips at AM/PM peak in different weekdays was analyzed. The AM and PM peak (7:00-9:00 and 17:00-19:00) was derived from the results in the previous section. Specifically, the spatial To characterize a more detailed temporal pattern, the ratio of commuting trips of partial frequently used ramps was analyzed. Figure 6 describes the ratios of commuting trips for device 1000037, 1000038, 1000069 and 1000079 on Tuesday, 16 May 2017. It shows that, compared with the average ratio of all ramps, these frequently used ones tend to have larger ratios of commuting trips, and the maximum ratio at some periods for these ramps is close to 0.5. These facilities are installed at ramps with higher ratios of commuting trips which have a greater impact on the selected network. It is necessary to meter or operate theses frequently used ramps to improve the efficiency of the network traffic. To characterize a more detailed temporal pattern, the ratio of commuting trips of partial frequently used ramps was analyzed. Figure 6 describes the ratios of commuting trips for device 1000037, 1000038, 1000069 and 1000079 on Tuesday, 16 May 2017. It shows that, compared with the average ratio of all ramps, these frequently used ones tend to have larger ratios of commuting trips, and the maximum ratio at some periods for these ramps is close to 0.5. These facilities are installed at ramps with higher ratios of commuting trips which have a greater impact on the selected network. It is necessary to meter or operate theses frequently used ramps to improve the efficiency of the network traffic.

Spatial Commuting Pattern
To investigate the spatial commuting pattern at the test site, the location-varying average ratio of commuting trips at AM/PM peak in different weekdays was analyzed. The AM and PM peak (7:00-9:00 and 17:00-19:00) was derived from the results in the previous section. Specifically, the spatial

Spatial Commuting Pattern
To investigate the spatial commuting pattern at the test site, the location-varying average ratio of commuting trips at AM/PM peak in different weekdays was analyzed. The AM and PM peak (7:00-9:00 and 17:00-19:00) was derived from the results in the previous section. Specifically, the spatial Sustainability 2020, 12, 1764 15 of 20 commuting pattern at AM (PM) peak was derived by averaging the ratios of commuting trips during 7:00-9:00 (17:00-19:00) at all the weekdays for each device. Figure 7a-d illustrates the spatial distribution of location-varying commuting pattern of ALPR devices at the AM and PM peak, respectively. It was observed that the spatial distributions of the ratio of commuting trips at the morning peak and evening peak have significant differences. Briefly, the test site was divided into four parts: East line, West line, South line, North line. For the AM peak, the origins (on-ramps) with a higher ratio of commuting trips were widely located in all these four lines while the destinations (off-ramps) with a higher ratio of commuting trips were distributed around the area where the East line and the South line meets together. Given the fact that Kunshan is a satellite city of the Shanghai metropolitan and this area is close to a widely used freeway to Shanghai, it implies that this area is frequently used by commuters who live in Kunshan and work in Shanghai. As for the PM peak, most of the origins with a higher ratio of commuting trips was also located in the East line and the South line. The consistency of the origins and destinations with a higher ratio of commuting trips for the PM and AM peaks implies the rationality of the commuters' identification. Besides, it was observed that most of the destinations with a higher ratio of commuting trips during the evening peak were in the West line. Considering that the west of the test site in Kunshan city is a residential area, the result is in accordance with the traffic demands in the real world. commuting pattern at AM (PM) peak was derived by averaging the ratios of commuting trips during 7:00-9:00 (17:00-19:00) at all the weekdays for each device. Figure 7a-d illustrates the spatial distribution of location-varying commuting pattern of ALPR devices at the AM and PM peak, respectively. It was observed that the spatial distributions of the ratio of commuting trips at the morning peak and evening peak have significant differences. Briefly, the test site was divided into four parts: East line, West line, South line, North line. For the AM peak, the origins (on-ramps) with a higher ratio of commuting trips were widely located in all these four lines while the destinations (off-ramps) with a higher ratio of commuting trips were distributed around the area where the East line and the South line meets together. Given the fact that Kunshan is a satellite city of the Shanghai metropolitan and this area is close to a widely used freeway to Shanghai, it implies that this area is frequently used by commuters who live in Kunshan and work in Shanghai. As for the PM peak, most of the origins with a higher ratio of commuting trips was also located in the East line and the South line. The consistency of the origins and destinations with a higher ratio of commuting trips for the PM and AM peaks implies the rationality of the commuters' identification. Besides, it was observed that most of the destinations with a higher ratio of commuting trips during the evening peak were in the West line. Considering that the west of the test site in Kunshan city is a residential area, the result is in accordance with the traffic demands in the real world. To further analyze the spatial commuting pattern, the location-varying ratio of commuting trips at days of the week were investigated in the morning and evening, respectively. The spatial commuting pattern in the morning and evening were derived by averaging the ratio of commuting To further analyze the spatial commuting pattern, the location-varying ratio of commuting trips at days of the week were investigated in the morning and evening, respectively. The spatial commuting pattern in the morning and evening were derived by averaging the ratio of commuting trips before and after 12:00 on different days of the week for each device. Figure 8a,b depict the ratio of commuting trips of each device in the morning for on-ramps and in the evening for off-ramps. Similarly, Figure 8c,d depict the ratio of commuting trips for each device in the evening for the on-ramp and in the morning for the off-ramp, respectively. The ALPR devices installed at on-ramps were labelled from "1000022" to "1000056," and the rest were installed at off-ramps. The large points with labelled numbers were the devices with the top three highest ratios of commuting trips at corresponding mornings/evenings on different days of week, which were considered to be frequently used ramps. trips before and after 12:00 on different days of the week for each device. Figure 8a,b depict the ratio of commuting trips of each device in the morning for on-ramps and in the evening for off-ramps. Similarly, Figure 8c,d depict the ratio of commuting trips for each device in the evening for the onramp and in the morning for the off-ramp, respectively. The ALPR devices installed at on-ramps were labelled from "1000022" to "1000056," and the rest were installed at off-ramps. The large points with labelled numbers were the devices with the top three highest ratios of commuting trips at corresponding mornings/evenings on different days of week, which were considered to be frequently used ramps. The comparison between frequently used ramps in Figure 8a,b and Figure 8c,d reveal that there were distinct correspondence relations between the frequently used on-ramps at AM and the frequently used off-ramps at PM, and the details of the correspondence relation are described as follows: The comparison between frequently used ramps in Figure 8a,b and Figure 8c,d reveal that there were distinct correspondence relations between the frequently used on-ramps at AM and the frequently used off-ramps at PM, and the details of the correspondence relation are described as follows: Figure 9a displays the locations of frequently used ramps at AM and PM in Figure 8. The on-ramps and off-ramps are distinguished with blue and red circles, respectively. The frequently used on-ramps and off-ramps at AM and PM are categorized by connected dotted lines. It was found that there were distinct correspondence relations between the frequently used on-ramps at AM and the frequently used off-ramps at PM. Take part of the frequently used ramps as an example, "1000023" was next to "1000061" in the North line, "1000049" was adjacent to "1000069" in the East line, "1000037," "1000038" were next to "1000071," "1000072" in the West line. Similar findings can also be found in Figure 8c,d, such as "1000049," "1000051," "1000055," and "1000069" were close together in the East line, "1000040" was adjacent to "1000076," "1000079" in the South line, etc. The results validated the consistency of commuting behaviors; it also illustrates that the proposed method is accurate. ramps and off-ramps are distinguished with blue and red circles, respectively. The frequently used on-ramps and off-ramps at AM and PM are categorized by connected dotted lines. It was found that there were distinct correspondence relations between the frequently used on-ramps at AM and the frequently used off-ramps at PM. Take part of the frequently used ramps as an example, "1000023" was next to "1000061" in the North line, "1000049" was adjacent to "1000069" in the East line, "1000037," "1000038" were next to "1000071," "1000072" in the West line. Similar findings can also be found in Figure 8c,d, such as "1000049," "1000051," "1000055," and "1000069" were close together in the East line, "1000040" was adjacent to "1000076," "1000079" in the South line, etc. The results validated the consistency of commuting behaviors; it also illustrates that the proposed method is accurate.
Finally, Figure 8d illustrates that the ratio of commuting trips of devices "1000076" and "1000079" were significantly higher in the morning than other devices. As shown in Figure 9a, the device "1000040" which was close to devices "1000076" and "1000079" also had a significantly higher ratio of commuting trips in the evening than other devices. Considering that all three devices were close to a widely used freeway, it implies that many people who live in Kunshan have jobs in Shanghai. Specifically, many home-to-work commuting trips from Kunshan to Shanghai were detected by the devices "1000076" and "1000079" on weekday mornings. Many work-to-home commuting trips from Shanghai to Kunshan were detected by the device "1000040"on weekday evenings.   Figure 9a). Figure 9b indicates that the ratio of commuting trips of on-ramps in the AM was larger than 0.4 and the ratio of commuting trips of off-ramps in the PM was larger than 0.4. The opposite changing trend illustrates a correspondence relation between the origins and destinations for commuters. The consistent commuting behaviors illustrates the reasonability of the proposed method. Finally, Figure 8d illustrates that the ratio of commuting trips of devices "1000076" and "1000079" were significantly higher in the morning than other devices. As shown in Figure 9a, the device "1000040" which was close to devices "1000076" and "1000079" also had a significantly higher ratio of commuting trips in the evening than other devices. Considering that all three devices were close to a widely used freeway, it implies that many people who live in Kunshan have jobs in Shanghai. Specifically, many home-to-work commuting trips from Kunshan to Shanghai were detected by the devices "1000076" and "1000079" on weekday mornings. Many work-to-home commuting trips from Shanghai to Kunshan were detected by the device "1000040"on weekday evenings. Figure 9b describes the time of day ratio of commuting trips for device 1000037, 1000038, 1000071 and 1000072 on Thursday, 18 May 2017. Devices 1000037 and 1000038 were installed at on-ramps, and devices 1000071 and 1000072 were installed at off-ramps. The four frequently used ramps were close to each other (as shown in Figure 9a). Figure 9b indicates that the ratio of commuting trips of on-ramps in the AM was larger than 0.4 and the ratio of commuting trips of off-ramps in the PM was larger than 0.4. The opposite changing trend illustrates a correspondence relation between the origins and destinations for commuters. The consistent commuting behaviors illustrates the reasonability of the proposed method.

Conclusions
This paper developed a method which can accurately identify the commuters and proposed a systematic framework for commuting pattern recognition to support decision-making in practical applications. The characteristics of a commuting pattern for urban road networks were explored using ALPR data. The temporal and spatial features of individual vehicles were extracted and classified using Ward's hierarchical clustering (Hclust) in order to identify the commuting vehicles. Comparison between the proposed method and traditional method (Hclust and K-means) was performed to evaluate the reasonability of the proposed clustering method and the indicator PF that characterized the commuters' features was developed. The dendrogram and the CHI were utilized to determine the optimal number of clusters for the Hclust and K-means method. Based on this, a case study was conducted in Kunshan city, China. The comparison results demonstrated that the Hclust method performed better.
The proposed method investigated the temporal and spatial pattern of commuters. The findings in our paper can be summarized as follows: (1) the exploration of the temporal commuting pattern showed that the ratio of commuting trips for a whole day was relatively stable around 0.16 at weekdays, and the ratio of commuting trips at partial peaks of weekdays reached 0.4; (2) the exploration of the spatial commuting pattern revealed distinct correspondence relations between the frequently used on-ramps and off-ramps. The results validated the consistency of commuters' behaviors, which can be a proof of the reasonability of the proposed method; (3) a pair of off-ramps in the morning and an on-ramps in the evening which were next to each other had significantly higher ratios of commuting trips. The reasons were analyzed with the properties of land use and demographic information of the test site. In the future, the impact of commuting patterns on the distribution of network traffic congestion will be investigated. More field validation works are needed to further interpret the commuting pattern.