Identify Optimal Traffic Condition and Speed Limit for Hard Shoulder Running Strategy

Abstract: Highway systems are experiencing increasing traffic congestion as the number of vehicles in metropolitan areas grows rapidly. Traffic management strategies such as using the hard shoulder as an extra lane can increase highway capacity without additional construction work. This paper presents a method for determining the optimal traffic condition and speed limit for opening the hard shoulder. First, the traffic states are clustered using the K-means, mean shift, agglomerative and spectral clustering methods, and the optimal clustering algorithm is selected using the silhouette score, the Davies-Bouldin Index and the Calinski-Harabasz Score. The results suggest that K-means with three categories gives the best clustering. Then, a cellular automata model is used to simulate traffic conditions before and after the hard shoulder running strategy is applied. The parameters of the model, including the probabilities of random deceleration, slow start and lane change, are calibrated using real traffic data. Four indicators obtained from the cellular automata model, namely the traffic volume, the average speed, the variance of speed, and the travel time of emergency rescue vehicles during a traffic accident, are used to evaluate various hard shoulder running strategies. Using factor analysis and the TOPSIS (Technique for Order Preference by Similarity to an Ideal Solution) method, the optimal traffic condition and speed limit for opening the hard shoulder can be determined. This method can be applied to highway segments with various numbers of lanes and different speed limits to optimize the hard shoulder running strategy for highway management.


Introduction
With economic development, highway systems have become increasingly overloaded with traffic. The excessive traffic load has increased the frequency of congestion and incidents, especially during holidays. Traffic congestion affects people's quality of life and causes economic losses. It has been reported that the economic loss from travel time delay and fuel consumption was $121 billion in 2011 worldwide [1,2], and the annual loss was estimated to reach $199 billion in 2020 [3]. Constructing new highways requires large budgets and long construction periods. In addition, highway mileage grows far more slowly than traffic volume. Therefore, compared with building more highway mileage, a more effective way to relieve highway congestion is to use appropriate traffic management techniques to maximize the operating efficiency of existing transportation facilities.
Recently, Active Traffic Management Systems (ATMS) have attracted wide attention from researchers and engineers. Effective ATMS approaches include Advisory Variable Speed Limits (AVSLs), Lane-Use Control Signals (LUCS) and Dynamic Hard Shoulder Running. One study examined the effects of the three approaches and found that most of the improvements occurred in the hard shoulder running section [4]. A typical hard shoulder is more than three meters wide and is usually placed on the right side of the road, which is sufficient for parking and driving. In most countries the hard shoulder serves as an emergency lane that allows drivers to stop and handle emergencies, and driving on it is generally prohibited for safety reasons. Some countries have started to use the hard shoulder as an extra running lane to alleviate highway congestion [5][6][7]. Germany has used the hard shoulder to ease traffic flow since 1996 [8,9]. In the Netherlands, buses were allowed to operate on the hard shoulder, and hard shoulder running has been implemented on around 1000 km of highway segments since 2003 [8]. Using the hard shoulder as an additional travel lane was found to save 50,000 Euros per day in Germany, and a 4% reduction in overall fuel consumption was measured in England [10]. One study estimated that when the hard shoulder was opened, congestion frequency fell by between 68% and 82%, and average speed (at the same traffic volume) rose by 9% [11]. Another study suggested that using the hard shoulder as an additional travel lane would not induce adverse changes, and some countries even found decreases in accidents associated with opening the hard shoulder [10]. Previous experience therefore indicates that, if the safety factors are properly managed, opening the hard shoulder has great potential to increase highway capacity.
Various studies have investigated proper methods of using the hard shoulder for normal vehicle operation [12]. Farrag et al. [13] evaluated the impacts of the hard shoulder running strategy and the ramp closure strategy using a simulation-based method with the Vissim software. Berger and Maurer [14] conducted a cost-benefit analysis to identify the traffic volume range in which permitting hard shoulder running is economically advisable. The costs include the investment cost, maintenance cost, user's cost (travel time and operating cost), environmental damage cost and accident cost. Bergmeister et al. [15] used the speed of the traffic flow to determine when to open and close the hard shoulder. Zhou et al. [16] used the Q-learning method to coordinate the hard shoulder running control strategy and optimize the travel time. Karan [17] proposed a dynamic control strategy for opening the hard shoulder using variable speed limits. In summary, previous research could be improved in three aspects:
(1) Previous research mainly relied on a single traffic parameter, such as the traffic volume or the speed, to determine the proper condition for permitting hard shoulder running. However, a single traffic parameter cannot represent the traffic condition accurately. A comprehensive standard should be established to evaluate various traffic conditions.
(2) Much of the previous research focused on improving traffic efficiency. However, as indicated by Kellerman [11], when the hard shoulder lane is opened, the likelihood of lane changing increases, which may result in rear-end collisions. Therefore, when optimizing the hard shoulder running strategy, both efficiency factors and safety factors should be considered.
(3) Few studies have addressed how to set the optimal speed limit of the hard shoulder lane. When applying the hard shoulder running strategy, the speed limit should be set properly to optimize both traffic efficiency and travel safety.
This paper aims to identify the proper traffic condition for opening the hard shoulder as an extra lane and to optimize the speed limit of the hard shoulder lane. The rest of the paper is organized as follows: Section 2 introduces the data used in this research; Section 3 introduces the methodology, which includes three parts: methods of clustering traffic states, the traffic simulation method using the cellular automata model, and the optimization method for determining the optimal traffic condition and speed limit. Section 4 presents the results, and the conclusions and discussion are given in the final section.

Research Data
The traffic data used in this research were collected on the Shanghai-Nanjing expressway, a tier 1 expressway that connects two major metropolitan areas of China. The expressway was built in 1996 and has four lanes in each direction, each 3.5 m wide. A hard barrier separates the opposite driving directions in this highway segment, and the central separation belt is 4.5 m wide. In each direction there is a hard shoulder 3.25 m wide. The current speed limits on the four lanes are 120 km/h, 120 km/h, 100 km/h and 100 km/h, respectively. The cross-section design and the speed limit configuration follow the regulations of the Ministry of Transport of China. In 2015, the regulation was updated to require that new expressways have a lane width of 3.75 m and a hard shoulder width of 3.5 m. Therefore, for both existing and new expressways, the difference in width between the normal driving lanes and the hard shoulder is not significant, and the hard shoulder is wide enough for traffic operation.
Millimeter-wave radar sensors are installed along both directions of the highway and collect traffic counts and vehicle speeds simultaneously from the four lanes every 30 s. These data are aggregated into 5-minute intervals to facilitate further analysis. The traffic data used in this research were collected from May 1 to May 4, 2019, during a national holiday period. During this four-day holiday, tolls are waived for all types of vehicles. Compared with normal days, the traffic volume and the likelihood of traffic incidents are higher during toll-free holidays. We selected the traffic data from this period in order to simulate more complex traffic conditions. The data are used to calibrate the parameters of the cellular automata model. Figure 1 presents the variation of traffic volume and Figure 2 presents the average speed of the road segment during the research period. On May 1, from 8:30:00 to 17:30:00, the average speed was clearly lower and the traffic volume was higher. On May 2, from 9:00:00 to 14:30:00, the traffic volume and average speed fluctuated considerably, and from 14:30:00 to 20:00:00 the traffic volume was higher and the average speed was lower. On May 3 and May 4, the traffic volume was higher and the average speed was lower from 12:00:00 to 23:55:00. However

Methods
The methodology is divided into three parts: traffic state classification using clustering methods, traffic simulation using the cellular automata model, and decision-optimization methods for identifying the optimal traffic state and speed limit. Figure 5 presents the flow chart of the methodology.
(a) First, we used several clustering methods to group similar traffic states. The traffic state varies with time of day and day of week, and different traffic management strategies can be applied to different traffic states in order to achieve higher traffic efficiency. However, there is no uniform indicator to quantify the traffic state of a highway. To simplify the problem, we first used clustering methods to group similar traffic patterns together based on the traffic volume and the average speed on each lane.
(b) Second, the cellular automata model is used to simulate various traffic scenarios, including the traffic condition after the hard shoulder is used as an extra lane under various speed limits and the traffic condition when a traffic accident happens. The parameters of the cellular automata model, including the probabilities of lane change, slow start and random deceleration, are calibrated using the real-world data introduced in the previous section. From the cellular automata simulation we obtain four traffic parameters: the traffic volume, the average speed, the variance of the speed and the travel time of emergency rescue vehicles.
(c) Finally, in the decision-optimization part, we used factor analysis to determine the weights of the four traffic parameters obtained from the cellular automata model, and then used the Technique for Order Preference by Similarity to an Ideal Solution (TOPSIS) to identify the optimal traffic state and speed limit for opening the hard shoulder.

Clustering Traffic States
With the wide deployment of electronic sensors on highways, such as loop coils, video detectors and ultrasonic detectors, it is easy to obtain traffic parameters such as speed, time occupancy, traffic volume and time headway, all of which reflect the traffic state. However, there is no uniform indicator to quantify the level of highway congestion [18,19]. Low traffic volume can indicate two contradictory conditions: when there are few vehicles on the road, drivers can drive at up to the speed limit; when there are very many vehicles, conflicts between vehicles and the frequency of stopping and starting increase, which results in both low average speed and low traffic volume. Therefore, classifying traffic states should consider more than one traffic flow parameter. This paper uses the average speed and the traffic volume of each lane to cluster the traffic states.
There are various clustering models, such as K-means, mean shift, agglomerative clustering and spectral clustering. The K-means method is one of the most popular and simplest clustering methods [20]; the mean shift method is a centroid-based algorithm that aims to discover blobs in a smooth density of samples [21]; the spectral clustering method is based on graph theory [22]; the agglomerative clustering method is a hierarchical method in which each observation starts in its own cluster and clusters are successively merged [23]. Because these algorithms rest on different principles, it is difficult to choose in advance the best one for clustering traffic states.
The purpose of clustering is to ensure that the differences within the same traffic state cluster are small and the differences between traffic state clusters are large. The silhouette score, the Davies-Bouldin Index and the Calinski-Harabasz Score are chosen to evaluate the effect of the clustering methods.
The silhouette score can be calculated by the following equation:
$$ s = \frac{b - a}{\max(a, b)} $$
where a is the average distance between the sample and all other points in the same cluster, and b is the average distance between the sample and all points in the closest other cluster. The silhouette score ranges from −1 to 1. A silhouette score close to 1 means that the sample is similar to the points in its own cluster and dissimilar to the points in other clusters. A higher silhouette score indicates a better clustering effect.
The Davies-Bouldin Index can be calculated by the following equation:
$$ DB = \frac{1}{N} \sum_{i=1}^{N} \max_{j \neq i} \frac{a_i + a_j}{d(c_i, c_j)} $$
where $d(c_i, c_j)$ is the distance between the centers of clusters i and j, $a_i$ is the average distance between the center of cluster i and all points in that cluster, and N is the number of clusters. A small Davies-Bouldin Index indicates a good clustering effect.
The Calinski-Harabasz Score can be calculated by the following equation:
$$ s(K) = \frac{Tr(B_k)}{Tr(W_k)} \times \frac{M - K}{K - 1} $$
where M is the number of samples, K is the number of clusters, $Tr(B_k)$ is the trace of the between-cluster covariance matrix, and $Tr(W_k)$ is the trace of the within-cluster covariance matrix. A large Calinski-Harabasz Score indicates a good clustering effect.
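The three indexes can be illustrated with a minimal pure-Python sketch. It assumes Euclidean distance and cluster centers taken as arithmetic means; the function names and the four toy points are hypothetical and are not the data or code used in this research.

```python
import math
from collections import defaultdict


def group(points, labels):
    """Collect the points of each cluster label."""
    clusters = defaultdict(list)
    for p, lab in zip(points, labels):
        clusters[lab].append(p)
    return clusters


def centroid(pts):
    """Arithmetic mean of a list of equal-length tuples."""
    return tuple(sum(c) / len(pts) for c in zip(*pts))


def silhouette_score(points, labels):
    """Mean of (b - a) / max(a, b) over all samples."""
    clusters = group(points, labels)
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # common convention: a singleton contributes 0
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the closest other cluster
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for k, other in clusters.items() if k != lab)
        total += (b - a) / max(a, b)
    return total / len(points)


def davies_bouldin_index(points, labels):
    """Average over clusters of the worst (a_i + a_j) / d(c_i, c_j)."""
    clusters = group(points, labels)
    cents = {k: centroid(v) for k, v in clusters.items()}
    scatter = {k: sum(math.dist(p, cents[k]) for p in v) / len(v)
               for k, v in clusters.items()}
    total = 0.0
    for i in clusters:
        total += max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                     for j in clusters if j != i)
    return total / len(clusters)


def calinski_harabasz_score(points, labels):
    """Between/within dispersion ratio scaled by (M - K) / (K - 1)."""
    clusters = group(points, labels)
    m, k = len(points), len(clusters)
    overall = centroid(points)
    between = sum(len(v) * math.dist(centroid(v), overall) ** 2
                  for v in clusters.values())
    within = sum(math.dist(p, centroid(v)) ** 2
                 for v in clusters.values() for p in v)
    return (between / within) * (m - k) / (k - 1)


# Toy example: two well-separated pairs of points.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
toy_labels = [0, 0, 1, 1]
print(silhouette_score(toy_points, toy_labels))
print(davies_bouldin_index(toy_points, toy_labels))
print(calinski_harabasz_score(toy_points, toy_labels))
```

In practice the features (average speed and traffic volume of each lane) would likely need to be scaled before distances are computed, since they have different units; that step is omitted here.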

Cellular Automata Model
The cellular automata model is a simulation model based on discrete space, time and states, and it is suitable for fast computation [24]. A cellular automaton comprises a regular grid of cells; each cell can take only a finite number of states, and the state-updating rules are local and synchronous [25]. Every cell follows the same updating rule, which depends only on its own state and the states of the neighboring cells [26]. This process is similar to how actual vehicles move, and therefore the cellular automata model is frequently used in traffic simulation. In addition, the mechanism of the cellular automata model ensures that errors do not accumulate during simulation, so the accuracy is sufficient for long simulations.
This research used the cellular automata model to obtain the traffic volume, average speed, time-headway and travel time of emergency rescue vehicles during a traffic accident. As the research highway segment has four running lanes and one emergency lane (hard shoulder), five rows of cells are built with 1000 cells in each row. The length of each cell is set as 50/9 m. Two different types of cell spacing are used: the ordinary lane cell spacing and the hard shoulder cell spacing. The cellular automata model adopts a periodic boundary in simulation. Each cell has one of three states: 0 means the cell is not occupied, 1 means the cell is occupied, and −1 means the cell is impassable. In reality, the speed limits of the lanes are 120 km/h, 120 km/h, 100 km/h and 100 km/h from the first lane to the fourth lane, respectively. Therefore, in the cellular automata model, the speed limits of the lanes are set as 6 cells/s, 6 cells/s, 5 cells/s and 5 cells/s, respectively. The hard shoulder does not have a speed limit. However, from the perspective of safety, when the hard shoulder is opened, all vehicles on the hard shoulder should obey a speed limit. Therefore, it is meaningful to study both the optimal traffic state for opening the hard shoulder and the speed limit for the hard shoulder. The original function of the hard shoulder is to serve as an emergency lane that lets emergency rescue vehicles bypass traffic jams to reach the accident location, or as a temporary safe parking place for malfunctioning vehicles. It is therefore necessary to compare the traffic condition under the influence of traffic accidents before and after the hard shoulder is opened. The length of the accident area is set as 30 cells, which equals 500/3 m.
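The correspondence between real speeds and cell speeds can be checked with a short sketch. It assumes that one simulation step lasts 1 s, which is implied by the cells/s unit; the function and constant names are hypothetical.

```python
from fractions import Fraction

CELL_LENGTH_M = Fraction(50, 9)  # cell length given in the text
TIME_STEP_S = 1                  # assumption: one update step lasts 1 s


def kmh_to_cells_per_step(v_kmh):
    """Convert a speed in km/h to cells per simulation step."""
    metres_per_second = Fraction(v_kmh) * 1000 / 3600
    return metres_per_second * TIME_STEP_S / CELL_LENGTH_M


def cells_per_step_to_kmh(v_cells):
    """Convert a speed in cells per step back to km/h."""
    return Fraction(v_cells) * CELL_LENGTH_M / TIME_STEP_S * 3600 / 1000


def cells_to_metres(n_cells):
    """Length in metres of a run of n_cells cells."""
    return n_cells * CELL_LENGTH_M


for limit in (120, 100):
    print(limit, "km/h =", kmh_to_cells_per_step(limit), "cells/s")
print("accident area:", cells_to_metres(30), "m")
print("row length:", float(cells_to_metres(1000)), "m")
```

With this cell length, one cell per second corresponds to 20 km/h, so any integer cell speed considered for the hard shoulder maps to a multiple of 20 km/h.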
In this paper, we assume five rules for the cellular automata model: the acceleration, random deceleration, lane change, slow start and deceleration rules.

1. Acceleration rule: $V_n \to \min(V_n + 1, V_{\max})$. This rule describes the drivers' tendency to drive at the maximum speed.
2. Random deceleration rule: with probability $P_1$, $V_n \to \max(V_n - 1, 0)$. The vehicle slows down for various uncertain reasons.
3. Lane change rule: with probability $P_2$, if $d_n < \min(V_n + 1, V_{\max})$, $d_{n,other} > d_n$ and $d_{n,back} > V_{\max}$, the vehicle changes lane.
4. Slow start rule: with probability $P_3$, if $V_n = 0$, the vehicle keeps its original position instead of accelerating.
5. Deceleration rule: $V_n \to \min(V_n, d_n)$. This rule describes the measures taken by the driver to avoid a collision with other cars.
Here, $d_n$ is the number of cells between the vehicle and the vehicle in front on the same lane; $d_{n,other}$ is the number of cells between the vehicle and the vehicle in front on the target lane; $d_{n,back}$ is the number of cells between the vehicle and the vehicle behind on the target lane; $V_{\max}$ is the speed limit of the lane; $V_n$ is the speed of the vehicle.
The cellular automata model is calibrated using the research data, including the traffic volume and the average speed aggregated into 5 min intervals. For each of the three traffic conditions, the corresponding set of model parameters $P_1$, $P_2$, $P_3$ is adjusted to minimize the RMSE (Root Mean Square Error) between the field data and the simulated data. The RMSE can be calculated as follows:
$$ RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2} $$
where $\hat{y}_i$ is the value obtained by simulation, $y_i$ is the real value, and n is the number of values. By comparing the RMSE, better values of $P_1$, $P_2$, $P_3$ can be selected.

Decision-Optimization Methods
From the cellular automata simulation we obtain the traffic volume, average speed, variance of speed and travel time of emergency rescue vehicles during a traffic accident. For better transportation service, we wish to maximize the traffic volume and the average speed to improve traffic efficiency. For safety, however, we need to minimize the variance of the speed, since a large speed variance may lead to safety issues. We are also concerned that, if an accident does happen, the hard shoulder is needed as the emergency lane, so the travel time of the emergency rescue vehicle to the accident location should be minimized. To balance these conflicting objectives, we use factor analysis in combination with the TOPSIS method as the decision-optimization method to determine the optimal traffic state and speed limit for opening the hard shoulder.
Factor analysis is a multivariate statistical method that studies the internal dependencies among variables and reduces variables with complex relationships to a few comprehensive factors [27,28]. The original variables are expressed and processed through linear combinations [29]. The specific steps of factor analysis are: calculating the correlation coefficient matrix and selecting the main factors based on the eigenvalues, calculating the rotated matrix by the maximum variance method, and obtaining the weights of the indicators from the contribution rates. The contribution rate and the comprehensive score can be calculated as follows:
$$ b_j = \frac{\lambda_j}{\sum_{k} \lambda_k}, \qquad Z = \sum_{j} b_j y_j $$
where $\lambda_j$ is the j-th eigenvalue, $b_j$ is its contribution rate, $y_j$ is the value of the j-th new comprehensive index, and Z is the comprehensive score. A higher comprehensive score indicates a better scheme.
TOPSIS (Technique for Order Preference by Similarity to an Ideal Solution) is a useful technique for multi-attribute or multi-criteria decision-making problems [30]. It helps decision makers analyze, compare and rank the alternatives [31]. The TOPSIS method first constructs the positive ideal solution and the negative ideal solution of the decision-optimization problem, and then calculates the distance between each scheme and both ideal solutions for comparison [32]. The computation process of the TOPSIS method is straightforward and easy to implement [33]. In addition, the decision process permits the pursuit of the best alternative for each criterion, expressed in a simple mathematical form [33]. Both the positive and the negative ideal solutions are used because, when only the positive ideal solution is used, two schemes may have the same distance to it. To distinguish between two such schemes, it is necessary to use both the positive and the negative ideal solutions.
The distance between each scheme and the positive ideal solution is calculated as follows:
$$ S_i^* = \sqrt{\sum_{j} (C_{ij} - C_j^*)^2} $$
The distance between each scheme and the negative ideal solution is calculated as follows:
$$ S_i^0 = \sqrt{\sum_{j} (C_{ij} - C_j^0)^2} $$
Finally, the optimal scheme is selected by the following equation:
$$ f_i^* = \frac{S_i^0}{S_i^0 + S_i^*} $$
where $C_{ij}$ is the value of scheme i on indicator j, $C_j^*$ is the positive ideal solution and $C_j^0$ is the negative ideal solution. A higher $f_i^*$ means that the scheme is better.
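The TOPSIS ranking can be sketched in a few lines of Python. The sketch assumes vector normalization of each indicator column before weighting, which is a common choice but is not stated in the text; the function name, the equal weights and the three example schemes are hypothetical placeholders, not results of this research.

```python
import math


def topsis(matrix, weights, benefit):
    """Return the closeness f*_i of each scheme (row) to the ideal solution.

    matrix  : rows are schemes, columns are indicators
    weights : one weight per indicator (e.g. from factor analysis)
    benefit : True if larger is better for that indicator, else False
    Assumes no column is all zeros and the schemes are not all identical.
    """
    n_crit = len(weights)
    # Vector-normalise each column, then apply the indicator weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix))
             for j in range(n_crit)]
    c = [[weights[j] * row[j] / norms[j] for j in range(n_crit)]
         for row in matrix]
    cols = list(zip(*c))
    # Positive ideal: best value per indicator; negative ideal: worst value.
    ideal = [max(col) if benefit[j] else min(col)
             for j, col in enumerate(cols)]
    anti = [min(col) if benefit[j] else max(col)
            for j, col in enumerate(cols)]
    scores = []
    for row in c:
        s_pos = math.dist(row, ideal)  # distance to the positive ideal
        s_neg = math.dist(row, anti)   # distance to the negative ideal
        scores.append(s_neg / (s_pos + s_neg))
    return scores


# Hypothetical schemes: volume, average speed (larger is better),
# speed variance, rescue travel time (smaller is better).
schemes = [
    [500, 90, 20, 100],
    [400, 70, 40, 150],
    [450, 80, 30, 120],
]
scores = topsis(schemes, [0.25, 0.25, 0.25, 0.25],
                [True, True, False, False])
best = max(range(len(schemes)), key=lambda i: scores[i])
print(scores, "best scheme index:", best)
```

Marking the speed variance and the rescue travel time as cost indicators is what lets one ranking balance the efficiency and safety objectives.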

Results
This section describes the results of applying our methodology to identify the optimal traffic condition and speed limit for the hard shoulder running strategy on the research highway segment. Section 4.1 compares the effects of several traffic state clustering methods. Section 4.2 presents the results of traffic simulation using the cellular automata model and validates the model by comparing the simulated data with the field data. In Section 4.3, the cellular automata model is applied to obtain the traffic volume, average speed, variance of speed and travel time of emergency rescue vehicles during traffic accidents, before and after the opening of the hard shoulder. Section 4.4 compares various hard shoulder running scenarios under different traffic conditions and speed limits, and presents the result of the decision-optimization method.

Results of Clustering Traffic States
In this paper, four clustering methods, namely K-means, mean shift, agglomerative clustering and spectral clustering, are used to group similar traffic states. The silhouette score, Calinski-Harabasz Score and Davies-Bouldin Index are used to evaluate and select the optimal clustering algorithm. Higher values of the silhouette score and Calinski-Harabasz Score and lower values of the Davies-Bouldin Index represent a better clustering effect. Tables 1-4 record the silhouette score, Calinski-Harabasz Score and Davies-Bouldin Index of each clustering method. Comparing the data in Tables 1-4 shows that the clustering effect is better under five conditions: the K-means, spectral clustering and agglomerative algorithms with three categories, and mean shift with a bandwidth of 1.7 and 2 respectively. To choose among these candidates, we also counted the samples whose silhouette score is less than or equal to 0, those whose score is greater than 0 and less than or equal to the average silhouette score, and those whose score is greater than the average silhouette score; the counts are shown in Figure 6. As shown in Figure 6, when the K-means algorithm is adopted and the cluster number is 3, the number of samples with a silhouette score less than 0 is the minimum and the number of samples in the other categories is the maximum.
A negative silhouette score indicates that a sample is, on average, closer to the samples of a neighbouring cluster than to those of its own cluster, so the clustering of that sample is poor. The result indicates that the K-means algorithm performs better than the agglomerative, spectral clustering and mean shift methods. Therefore, the K-means algorithm is adopted to cluster traffic states, and the traffic states are grouped into three categories.
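The per-sample counting behind Figure 6 can be illustrated with a minimal sketch. It assumes Euclidean distance and at least two clusters; the function names `silhouette_samples` and `silhouette_bins` and the toy data are hypothetical and are not taken from the study. In practice, a library routine such as scikit-learn's `silhouette_samples` could replace the hand-written version.

```python
from math import dist
from statistics import mean


def silhouette_samples(points, labels):
    """Silhouette coefficient of every sample (Euclidean distance).

    Assumes at least two distinct cluster labels are present.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:  # singleton cluster: the score is defined as 0
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster
        a = mean(dist(points[i], points[j]) for j in own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            mean(dist(points[i], points[j]) for j in members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


def silhouette_bins(scores):
    """Count samples in the three categories compared in the text."""
    avg = mean(scores)
    return {
        "non_positive": sum(s <= 0 for s in scores),
        "up_to_average": sum(0 < s <= avg for s in scores),
        "above_average": sum(s > avg for s in scores),
    }


if __name__ == "__main__":
    # Toy (density, speed)-like points; illustrative values only.
    toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    toy_labels = [0, 0, 1, 1]
    print(silhouette_bins(silhouette_samples(toy_points, toy_labels)))
```

Under this reading, the preferred algorithm is the one with the fewest samples in the `non_positive` bin and the most samples in the other two bins.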
To evaluate the clustering effect of each category when the K-means algorithm is adopted with three clusters, the silhouette score of each sample is plotted in Figure 7. In Figure 7, the red vertical line represents the average silhouette score of all samples. As Figure 7 shows, when the cluster label is 0 or 1, the number of samples with an above-average silhouette score is larger, and no sample has a silhouette score less than 0. Therefore, the clustering effect is better for samples labeled 0 or 1.
Figure 8 illustrates the traffic states of each group. Cluster label 0 represents the high density group: the density ranges from 140 pcu/km/four-lanes to 700 pcu/km/four-lanes, the 5 min traffic volume ranges from 270 pcu to 630 pcu, and the average speed is between 15 km/h and 55 km/h. Cluster label 1 represents the low density group: the density ranges from 0 pcu/km/four-lanes to 50 pcu/km/four-lanes, the 5 min traffic volume ranges from 65 pcu to 270 pcu, and the average speed is between 75 km/h and 95 km/h. Cluster label 2 represents the moderate density group: the density ranges from 50 pcu/km/four-lanes to 140 pcu/km/four-lanes, the 5 min traffic volume ranges from 270 pcu to 560 pcu, and the average speed is between 55 km/h and 95 km/h.
Figure 9 displays the speed-volume relationship of the field data and the simulated data generated by the cellular automata model. The blue dots represent the field data and the red dots represent the simulated data. Generally, the simulated data match the field data well. The Root Mean Squared Error (RMSE) between the field data and the simulated data is 6.9124 for the traffic volume and 9.0095 for the average speed. The R² values of the traffic volume simulation and the average speed simulation are 0.9645 and 0.8944 respectively, which indicates that the cellular automata model can simulate traffic conditions on the research highway effectively.
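The two goodness-of-fit measures reported above can be sketched as follows. The paper does not state which definition of R² was used; this sketch assumes the coefficient of determination, 1 - SS_res/SS_tot, computed against the field data. The function names are hypothetical.

```python
from math import sqrt


def rmse(observed, simulated):
    """Root mean squared error between paired observations."""
    n = len(observed)
    return sqrt(sum((o - s) ** 2 for o, s in zip(observed, simulated)) / n)


def r_squared(observed, simulated):
    """Coefficient of determination (assumed definition: 1 - SS_res / SS_tot).

    Requires the observed values not to be all identical.
    """
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot
```

Both functions expect the field and simulated series to be aligned sample by sample, for example one value per 5 min interval.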

Application of the Cellular Automata Model
The cellular automata model is used to simulate the traffic states before and after the hard shoulder is used as an extra lane. We collected four variables: the traffic volume, average speed, variance of speed and travel time of emergency rescue vehicles during traffic accidents. We ran the simulation model 10 times to reduce the random error. The traffic accident location is randomly selected in each run. Table 5 describes the change rate of the variables before and after the hard shoulder is opened, computed as

$$g = \frac{y - y'}{y'} \times 100 \tag{9}$$

where $y'$ represents the variable obtained when the hard shoulder is closed, $y$ refers to the variable obtained after the hard shoulder is opened during the simulation, and $g$ is the change rate in percentage. As indicated in Table 5, when the density is low or moderate, the variation of the traffic states before and after the hard shoulder is used as a running lane is not significant. When the density is high, opening the hard shoulder to running vehicles could improve the traffic volume by more than 20% under the various speed limits, and the average speed could also be improved. However, opening the hard shoulder in the high-density condition also has the side effect of increasing the speed variance and the travel time of emergency rescue vehicles.
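As a small worked illustration of the change rate, the sketch below averages a variable over repeated runs and then computes the percentage change. Averaging the runs before taking the ratio is an assumption, since the paper does not specify the aggregation, and the numbers are invented for illustration only.

```python
from statistics import mean


def change_rate(closed, opened):
    """Percentage change of a variable after the hard shoulder is opened."""
    return (opened - closed) / closed * 100


if __name__ == "__main__":
    # Hypothetical 5 min traffic volumes (pcu) from repeated simulation runs.
    closed_runs = [410.0, 398.0, 405.0]
    opened_runs = [495.0, 502.0, 489.0]
    g = change_rate(mean(closed_runs), mean(opened_runs))
    print(f"change rate: {g:.1f}%")
```

A positive value is an improvement for the traffic volume and average speed, whereas for the speed variance and the rescue travel time a positive value is a deterioration.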

Results of Decision Optimization Method
One key issue in this study is to determine the optimal traffic state and speed limit for opening the hard shoulder. We use the cellular automata model to simulate traffic states when the hard shoulder is opened for normal vehicle operation. Four variables, namely the traffic volume, the average speed, the variance of speed and the travel time of emergency rescue vehicles to the accident location, are collected to evaluate traffic conditions under different speed limits of the hard shoulder. We use the factor analysis method in combination with the TOPSIS method to establish the evaluation standard. The factor analysis method is used to identify appropriate weights for the four variables, and the TOPSIS method is used to rank the different hard shoulder operation strategies. The hard shoulder has two states: opened and closed. When the state of the hard shoulder changes, the changes in the traffic volume and the average speed are used as benefit attributes, and the changes in the variance of speed and the travel time of emergency rescue vehicles during traffic accidents are used as cost attributes.
The Kaiser-Meyer-Olkin (KMO) test and Bartlett's test of sphericity are used to check whether the variables are sufficiently correlated to meet the requirements of factor analysis [34]. The KMO value lies between 0 and 1, and a higher value indicates that the data are better suited to factor analysis. Generally, a KMO value greater than 0.5 indicates that factor analysis is suitable.
In this research, the KMO value is calculated as 0.78, which supports the use of factor analysis. The weights of the traffic volume, average speed, variance of speed and travel time of emergency rescue vehicles during traffic accidents are computed as 0.26, 0.23, 0.25 and 0.26 respectively. Four speed limits (60 km/h, 80 km/h, 100 km/h and 120 km/h) are considered under three categories of traffic states (high, moderate and low density). The TOPSIS method is used to rank the resulting 12 alternatives. Table 6 displays the results of the calculation.
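The ranking step can be sketched with a textbook TOPSIS procedure. The weights and the benefit/cost split follow the text above, but the vector normalization, the function name `topsis` and the decision matrix are assumptions made for illustration; the values in the matrix are not the results in Table 6.

```python
from math import sqrt


def topsis(matrix, weights, is_benefit):
    """Closeness of each alternative to the ideal solution (higher is better).

    Assumes no all-zero criterion column and at least two distinct alternatives.
    """
    n_crit = len(weights)
    # Vector normalization of each criterion column, then weighting.
    norms = [sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    weighted = [
        [weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix
    ]
    columns = list(zip(*weighted))
    # Ideal solution: best value per criterion; anti-ideal: worst value.
    ideal = [max(c) if b else min(c) for c, b in zip(columns, is_benefit)]
    anti = [min(c) if b else max(c) for c, b in zip(columns, is_benefit)]

    scores = []
    for row in weighted:
        d_pos = sqrt(sum((v - i) ** 2 for v, i in zip(row, ideal)))
        d_neg = sqrt(sum((v - a) ** 2 for v, a in zip(row, anti)))
        scores.append(d_neg / (d_pos + d_neg))
    return scores


if __name__ == "__main__":
    # Criteria order: volume, average speed, speed variance, rescue travel time.
    weights = [0.26, 0.23, 0.25, 0.26]
    is_benefit = [True, True, False, False]
    # Hypothetical change rates (%) for three alternatives.
    alternatives = [
        [22.0, 8.0, 12.0, 6.0],
        [25.0, 10.0, 9.0, 5.0],
        [1.0, 0.5, 1.0, 0.5],
    ]
    scores = topsis(alternatives, weights, is_benefit)
    ranking = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    print(ranking, [round(s, 3) for s in scores])
```

Alternatives are ranked by descending closeness, so the strategy with the highest score is the preferred combination of traffic state and speed limit.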
As Table 6 shows, the rank is higher when the cluster label is 0, which indicates that opening the hard shoulder as an extra lane achieves a better effect when the traffic density is high. When the traffic is congested, opening the hard shoulder provides an extra lane for the excess traffic volume. However, when the traffic is not congested, an extra lane would not bring a significant improvement in traffic conditions. In addition, Table 6 indicates that the speed limit for the hard shoulder lane could be set to 100 km/h to achieve a better effect. If the speed limit is too high, the speed variance would increase, which may lead to traffic incidents. If the speed limit on the hard shoulder lane is too low, drivers may be discouraged from using the extra lane. In our case, 100 km/h is found to be optimal, which may be attributed to the fact that the speed limit of the adjacent lane is also 100 km/h. For other highways, if the speed limit of the adjacent lanes or the number of lanes is different, the optimal speed limit for the hard shoulder lane may be different. However, a similar methodology could be applied to identify the optimal speed limit and the optimal traffic condition for other highways.

Discussion and Conclusions
This paper proposed an optimization method for identifying the proper traffic condition and speed limit for the hard shoulder running strategy. Our work contributes to traffic management strategies in the following aspects: (1) For different traffic conditions, different traffic management strategies could be implemented to maximize traffic control efficiency. We used clustering to group similar traffic states and tested four clustering methods: K-means, agglomerative, mean shift and spectral clustering. The silhouette score, Davies-Bouldin Index and Calinski-Harabasz Score were used to compare the clustering effect, and the results showed that the K-means method with three traffic state categories is best. (2) We used the cellular automata model to examine the traffic condition when the hard shoulder is opened. The parameters of the cellular automata model, including the probabilities of random deceleration, slow start and lane change, were calibrated using real-world traffic data. In addition, we simulated traffic accident conditions to evaluate the influence of the hard shoulder running strategy on safety. (3) We proposed that four variables, namely the traffic volume, average speed, variance of speed and travel time of emergency rescue vehicles during traffic accidents, should be used as indicators to compare various hard shoulder running strategies. The traffic volume and the average speed reflect traffic efficiency, while the variance of speed and the travel time of emergency rescue vehicles during traffic accidents indicate the level of safety. We used the factor analysis method to determine the weights of the four variables, and TOPSIS as the decision-optimization method to identify the optimal traffic condition and speed limit for opening the hard shoulder as an extra driving lane.
The methodological framework proposed in this paper for calculating the optimal traffic condition and speed limit could be applied to other highways where the width of the hard shoulder satisfies the traffic operation requirement (>3.25 m). In our research, the result suggested that the hard shoulder could be opened when the traffic density is high and that the speed limit should be set to 100 km/h to achieve the best performance. However, when the method is applied to other highway segments with different geometric designs and different speed limits, the result may be different since the traffic conditions may be different. To ensure traffic safety when applying the hard shoulder running strategy, dynamic message signs and traffic surveillance devices such as loop detectors, video cameras and radar detectors should be installed along the roadway. The dynamic message signs are prerequisites since drivers need to be informed whether the hard shoulder is opened or closed in a certain segment. The traffic surveillance devices are needed to collect traffic state information. In practice, traffic engineers could predict traffic states based on historical traffic statistics and decide in advance when to open the hard shoulder. The traffic surveillance devices are also needed to detect accidents, in which case the hard shoulder should be closed immediately to help reduce the travel time of emergency rescue vehicles.
For future research, we could extend our current work in the following directions: (1) We could further evaluate the safety issues that may arise from applying the hard shoulder running strategy. More simulations should be conducted to study traffic conditions under accidents of various severity levels, which may block more than one lane and a larger road surface. (2) We could consider dividing the highway into different segments by dynamic message signs and applying more flexible control strategies. The spacing of the dynamic message signs could be optimized, and the speed limits for each segment under various traffic conditions could be controlled using dynamic optimization methods.

Data Availability Statement:
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.

Conflicts of Interest:
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.