Applying Machine Learning to Develop Lane Control Principles for Mixed Traffic

Abstract: Mixed traffic environments often have high accident rates. Therefore, many motorcycle-related traffic improvements and control methods are employed in countries with mixed traffic, including slow-traffic lanes, motorcycle two-stage left-turn areas, and motorcycle waiting zones. In Taiwan, motorcycles may ride in only the two outermost lanes: the curb lane and a mixed traffic lane. This study evaluated a new motorcycle-riding space control policy on 27 major arterial roads comprising 248 road segments in Taipei by analyzing before-and-after accident data from 2012 to 2018. The equivalent-property-damage-only (EPDO) method was used to evaluate the severity of crashes before and after the cancelation of the third lane prohibition of motorcycles (TLPM) policy. After the EPDO analysis, random forest analysis was used to screen the crucial accident factors for specific road segments. Finally, a classification and regression tree (CART) was created to predict the accident improvement effects of discontinuing TLPM on road segments in different situations. Furthermore, to support practical application, this study integrated the CART results with the needs of traffic authorities to determine four rules for canceling TLPM. In the future, for an accident-prone road segment with TLPM, checking the four rules can help the authority decide whether canceling TLPM would reduce accidents.


Introduction
In countries where mixed traffic is predominant, such as South Asian countries, the problem of motorcycle crashes is often severe [1]. Many studies have investigated the behavioral differences between motorcycles and cars in mixed traffic, including arrival patterns, headway characteristics, safe distances, acceleration, speed distributions, speed differences between passing and passed vehicles, lateral movement, longitudinal movement, and side-by-side riding maneuvers [2–5]. Because of the differences between motorcycles and cars, road traffic safety in mixed traffic is seriously affected [6]. To reduce the number of motorcycle accidents, some countries use motorcycle waiting zones, two-stage left-turn waiting zones, slow traffic lanes, and exclusive motorcycle lanes to create space divergence and decrease weaving among motorcycles and cars [7–10]. In other countries with low-density motorcycle traffic, roundabouts are often used in the transport network to calm traffic [11,12]. They have been shown to improve the safety of various road users [13,14].
In Taiwan, divergent measures for mixed traffic were first implemented in 1984. The traffic authority first planned exclusive motorcycle lanes, and in 1985 it began to promote motorcycle two-stage left-turn waiting zones at intersections to reduce the conflicts between left-turning motorcycles and the opposing traffic flow. In 1998, the traffic authority began to set up motorcycle waiting zones at intersections so that motorcycles, which have a negative starting delay, could cross the intersection first when the green phase begins [15].
Because of the design of motorcycle-related facilities and the motorcycle's traffic characteristics, motorcycles tend to travel on the right side of the cross-section, whereas cars tend to travel in the middle or on the left side of the cross-section [16]. Therefore, in 2001, the traffic authority restricted motorcycles to the outermost two lanes of roads with more than two lanes to provide an exclusive space for cars. By that time, Taiwan had completed the main design of divergent traffic measures, such as two-stage left-turn motorcycle waiting zones and motorcycle waiting zones at intersections so that left-turning motorcycles and straight motorcycles could have exclusive spaces. In addition, the curb lane (also known as the first lane or slow traffic lane), the mixed traffic lane (also known as the second lane), and the motorcycle-prohibited lane (also known as the third lane or express lane) provided different spaces for different vehicle types and different speed limits.
That design of motorcycle-related facilities in Taiwan was maintained until 2009, when the numbers of motorcycles and cars had increased to 14,604,330 and 5,559,247, respectively [17]. By 2017, Taiwan had 584 motorcycles and 287 cars per 1000 people. The average annual mileage of motorcycles in Taiwan was 4855 km [18], whereas that of cars was 21,593 km [19]. With this increase in the number of vehicles on the road, traffic conflicts were becoming increasingly severe [20]. National statistics revealed that between 2013 and 2017, the annual averages of fatal and injury accidents in Taiwan were 1713 and 398,995, respectively [21]. According to the accident database, about 40% of the motorcycle accidents occurred on road segments. Because motorcycles can travel only in the outer two lanes, when the motorcycle traffic volume increases significantly, motorcycles easily come into conflict with temporarily stopped cars, buses, and taxis, which affects the safety of the road segment. Therefore, the traffic authority started to review the necessity of third lane prohibition of motorcycles (TLPM) on some road segments and planned to cancel TLPM on different road segments year by year.
In recent years, the Taipei City traffic authority has tried to reduce the number of motorcycle accidents. From 2013 to 2017, the authority canceled TLPM on many road segments in Taipei. On some road segments where TLPM was canceled, accident rates fell, but on others, they did not. Therefore, it is difficult for the traffic authority to evaluate which road segments are suitable for the cancellation of the TLPM policy. Notably, the decision process that the traffic authority follows in canceling TLPM consists mainly of expert discussion, and there are no clear or consistent principles for deciding whether or not to cancel TLPM. Therefore, this study aimed to apply machine learning to explore the critical risk factors of roadway crashes by examining the roadway traffic characteristics. Furthermore, by examining the before-and-after crash data of road segments where TLPM was canceled, we attempted to develop evaluation principles for TLPM cancellation for application by the traffic authority.

Data
In this study, the Taipei City Traffic Engineering Office provided data on a total of 27 major roadways on which TLPM was canceled between 2013 and 2017, and the signalized intersections were used as cut-off points between road segments. In the sample of road segments, we further distinguished the two driving directions. Thus, a total of 248 road segments were sampled, as shown in Table 1. The crash data (including fatal accidents, injury accidents, and property-damage-only accidents) were collected from 2012 to 2018. The data of roadway crashes were collected from the stop line (the A-A section) to the start of the lane (the B-B section) of the roadway (Figure 1). The annual average equivalent property damage only (EPDO) per km was calculated from the before-and-after crash data as the crash risk in each road segment (see Equation (1)) [22], and the differences in the before-and-after annual average EPDO values per km were used as the basis for evaluating the effectiveness of road safety improvement measures after the elimination of TLPM.
where $N_f$ is the number of fatal accidents, $N_i$ is the number of injury accidents, and $N_p$ is the number of property-damage-only accidents.
For the road segment accident factors, the geometric characteristics of the road, on-street parking control, traffic volume, roadside disturbance, and land-use-related factors were considered in this study. The dependent variables and independent variables of this study are listed in Table 2.
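The crash-risk measure described above can be sketched in a few lines of Python. This is only an illustration: the severity weights below are hypothetical placeholders, because the values used in Equation (1) are not reproduced in this text, and the `CrashCounts` structure and function names are likewise assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical severity weights (NOT taken from the study): replace them
# with the weights actually used in Equation (1).
W_FATAL = 10.0
W_INJURY = 5.0
W_PDO = 1.0


@dataclass
class CrashCounts:
    fatal: int        # N_f, number of fatal accidents
    injury: int       # N_i, number of injury accidents
    pdo: int          # N_p, number of property-damage-only accidents
    years: float      # length of the observation period in years
    length_km: float  # road segment length in km


def annual_epdo_per_km(c: CrashCounts) -> float:
    """Annual average EPDO per km for one observation period."""
    epdo = W_FATAL * c.fatal + W_INJURY * c.injury + W_PDO * c.pdo
    return epdo / (c.years * c.length_km)


def epdo_change(before: CrashCounts, after: CrashCounts) -> float:
    """After-minus-before difference; a negative value means improvement."""
    return annual_epdo_per_km(after) - annual_epdo_per_km(before)
```

Normalizing by both the period length and the segment length is what allows segments with different lengths and different before-and-after periods to be compared on one scale.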

Methods and Data Processing
The random forest is a bagging method: each decision tree is trained on a bootstrap sample, that is, a random sample drawn from the training data with replacement. A random forest consists of multiple decision trees, and the final prediction output is the average result of those trees. Each decision tree is trained to find the splitting variable and split value that minimize impurity (see Equation (2)).
$$G(x_i, v_{ij}) = \frac{n_{left}}{N_s} H(X_{left}) + \frac{n_{right}}{N_s} H(X_{right}) \qquad (2)$$
where $G(x_i, v_{ij})$ is the impurity of the node; $x_i$ is the splitting variable; $v_{ij}$ is the split value for the splitting variable; $N_s$ is the number of training samples before splitting; $n_{left}$ and $n_{right}$ are the numbers of training samples in the left and right child nodes after splitting; $X_{left}$ and $X_{right}$ are the training sample sets of the left and right child nodes; and $H(X)$ is the function for measuring the impurity of a node.
If a feature's importance is high in a random forest, the feature has more influence on the prediction result. For a decision tree, the importance of a node $k$ can be calculated as $n_k$ (see Equation (3)).
$$n_k = w_k G_k - w_{left} G_{left} - w_{right} G_{right} \qquad (3)$$
where $w_k$, $w_{left}$, and $w_{right}$ are the ratios of the number of training samples in node $k$ and in its left and right child nodes, respectively, to the total number of training samples; and $G_k$, $G_{left}$, and $G_{right}$ are the impurities of node $k$ and its left and right child nodes.
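The quantities defined above can be illustrated with a short Python sketch. It assumes Gini impurity for $H(X)$, which the text does not specify, and it assumes the usual sample-weighted forms suggested by the symbol definitions; the function names are chosen for this example only.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (assumed choice of H)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_impurity(left, right):
    """Sample-weighted impurity of the two child nodes of a split."""
    n_s = len(left) + len(right)
    return (len(left) / n_s) * gini(left) + (len(right) / n_s) * gini(right)


def node_importance(parent, left, right, n_total):
    """Weighted impurity decrease contributed by splitting one node.

    n_total is the size of the whole training set, so the weights w are
    the shares of all training samples that reach each node.
    """
    w_k = len(parent) / n_total
    w_left = len(left) / n_total
    w_right = len(right) / n_total
    return w_k * gini(parent) - w_left * gini(left) - w_right * gini(right)
```

A split that separates the classes perfectly drives the child impurities to zero, so the node's importance equals its own weighted impurity.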
After obtaining the importance of each node, the feature importance $f_i$ is calculated (see Equation (4)).
A classification and regression tree (CART) is applied to solve classification problems. The concept is to build a branching tree from the root node in an iterative manner using recursive binary splitting. When the homogeneity of the tree nodes reaches a certain standard, the existing training samples are divided into several classes. The decision tree is first built with the original data as the root, and the algorithm selects the variable and threshold that best separate the target classes. The split is chosen to lower the heterogeneity, so the child nodes it generates are purer than the parent node. After that, the decision tree repeatedly performs threshold screening and downward splitting until the stopping condition is satisfied. Finally, when all nodes in the lowermost level are leaf nodes, the decision tree is complete.
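A single step of the recursive binary splitting described above can be sketched as a threshold search on one variable. The sketch assumes Gini impurity and candidate thresholds at midpoints between consecutive distinct values; both are common conventions rather than details stated in the text, and the sample values in the usage note are invented.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (assumed impurity measure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, impurity) of the best binary split on one variable.

    Returns (None, impurity of the unsplit node) if no split improves on it.
    """
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    n = len(pairs)
    best_threshold, best_score = None, gini(labels)
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate identical values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
        if score < best_score:
            best_threshold, best_score = (lo + hi) / 2, score
    return best_threshold, best_score
```

For example, `best_split([100, 120, 180, 200], ["worse", "worse", "better", "better"])` returns `(150.0, 0.0)`. A full tree repeats this search over every variable at every node until a stopping condition is met.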
In this study, 80% of the 248 road segment samples were used as training samples to establish the CART, and the remaining 20% were used as test samples. Stratified sampling was adopted. This study performed random forest analysis with 15 independent variables. Random forest ranks the importance of variables, and we used the crucial factors to build the CART model. The height of the CART was set to 5 to avoid overly complicated judgment logic and thereby meet the requirements of traffic authority applications. Furthermore, the minimum number of samples in leaf nodes was limited to more than 5% of the training samples.
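The sampling setup above can be sketched with the standard library alone. The class balance used here (100 improved versus 148 deteriorated segments) is a made-up placeholder, since the text does not report it, and the helper name `stratified_split` is an assumption for this example.

```python
import math
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split sample indices so each class keeps roughly the same share."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    train, test = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return sorted(train), sorted(test)


# Hypothetical labels for the 248 road segments (placeholder balance).
labels = ["improved"] * 100 + ["deteriorated"] * 148
train_idx, test_idx = stratified_split(labels)

# Smallest integer leaf size that is strictly more than 5% of the training set.
min_leaf = math.floor(0.05 * len(train_idx)) + 1
```

With these placeholder labels the split yields 198 training and 50 test segments, so each leaf must hold at least 10 training samples. Together with the depth limit of 5, this keeps the tree small enough to read as a handful of decision rules.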

Random Forest
In the random forest, 1000 decision trees were created, and the out-of-bag (OOB) error and test accuracy were 0.44 and 0.56, respectively, as shown in Figure 2. The importance of each factor is shown in Figure 3. To construct the CART, this study selected important factors, including the number of bus departures at peak hours, the red curb ratio, the motorcycle traffic volume per lane, the car traffic volume per lane, the pavement width, and the outer lane width.

CART
In this study, a CART was created according to important accident factors, as shown in Figure 4. The corresponding confusion matrix is shown in Table 3. The accuracy of the CART was 0.76.

Discussion
As shown in Figure 3, the random forest selected important factors: the number of bus departures at peak hours, the red curb ratio, the motorcycle traffic volume per lane, the car traffic volume per lane, the pavement width, and the outer lane width. When the traffic volume is greater, the traffic conflict becomes more serious [23]. Therefore, the number of bus departures at peak hours, the car traffic volume per lane, and the motorcycle traffic volume per lane will have an impact on accident incidence [24,25], and these factors will also affect the speed and driving space of the motorcycles at the same time [26,27]. The combination ratio of different traffic volumes will also change the behavior of vehicles in overtaking and lateral displacement, which will change the degree of traffic conflict. In addition, the red curb ratio affects the range of the space where motorcycles can ride in the outer lane without interference. When the red curb ratio is low, the probability of motorcycle-involved door crashes with temporarily stopped vehicles on the street may also increase [28].
In Taiwan, the curb lane is often designed as a slow traffic lane [29]. For motorcycles, a proper curb lane width can provide more suitable overtaking and driving spaces [30]. Moreover, the curb lane width also affects motorcycle overtaking speeds and lateral positions. Therefore, it also affects road safety. In addition, when the pavement width is different, the number of lanes or the curb lane width might also be different. These differences may further affect the cross-section distribution of vehicles and in turn affect the number of accidents on the road segment [31].
In the random forest analysis, the test accuracy was only 0.56. However, the selected important factors are plausibly related to road segment accidents. Therefore, this study built the CART on the important crash factors selected by the random forest analysis. CART analysis offers advantages in both theory and application. Compared with regression models, a CART does not need to consider the collinearity problem between accident factors. A CART is structured as a sequence of "if-then" questions about traffic characteristics, so the results of canceling TLPM can be presented for different situations, as shown in Figure 4. In addition, a CART automatically searches for the best split point, eliminating the need for experimentation or for thresholds based on other research results [32]. Thus, it makes accident analysis considerably more convenient.
Figure 4 shows that under four scenarios, canceling TLPM can improve road safety. According to Terminal Node 10, when bus departures at peak hours exceed 170 vehicles, canceling TLPM can reduce the number of accidents. Since the curb lane is mainly used by motorcycles, a temporarily stopped bus may block the traffic flow and cause a rear-end collision [33]. The temporary stopping behavior of the bus affects the cars and motorcycles in different lanes. When a bus pauses at a bus stop, cars in the curb and mixed traffic lanes switch to the inner lane (i.e., the third lane). A vehicle that is forced to change lanes first decelerates and then switches lanes. Therefore, a stopping bus affects the traffic flow and speed. Although vehicles driving in the third lane are not directly affected by the bus, they may still be indirectly influenced by cars and motorcycles that change lanes, resulting in decreased average speed. Therefore, when the number of bus departures is high, canceling TLPM allows motorcycles to travel in the third lane with relatively stable traffic flow, which helps to improve motorcycle safety.
Terminal Node 4 in Figure 4 covers road segments where the number of bus departures at peak hours is less than 117, the motorcycle traffic volume per lane is not low (between 400 and 1280), the outer lane width is limited (less than 4.3 m), and more on-street parking is available (the red curb ratio is less than 0.8). Under these conditions, canceling TLPM can improve traffic safety. The space limitation caused by roadside parking interference may not leave sufficient space for motorcycles to ride safely in the outer lane. Without the TLPM policy, some motorcycles can ride in the third lane. Because of the reduced proportion of motorcycles in the outer lane, conflicts between motorcycles and interfering vehicles at the roadside also decrease.
However, the CART cannot provide confidence intervals for accident factors, and it has difficulty handling the interactions between accident factors. For example, the scenarios corresponding to Terminal Node 2 and Terminal Node 7 in Figure 4 involve the motorcycle traffic volume per lane, the car traffic volume per lane, the outer lane width, the pavement width, and the red curb ratio, but it is difficult to explain the relationship between the safety improvement and these accident factors. In addition, CART does not easily allow elasticity analysis or sensitivity analysis [34]. When the traffic authority needs to evaluate the measures that accompany canceling TLPM, such as widening or narrowing the outer lanes, red curb ratio adjustments, and other traffic engineering measures, it is not easy to compare the impacts of different measures on the accident rate [35].
To formulate the lane control principles, this study considered the characteristics of Taiwan's traffic, the principles of the traffic authority's traffic survey, and the road cross-section design to determine the lane control rules. For the car traffic volume per lane and the number of bus departures, the thresholds use 10 vehicles/h as the minimum unit. For the motorcycle traffic volume per lane, the minimum unit is 100 vehicles/h, and for the outer lane width and the pavement width, the minimum unit is 1 m. When the current conditions of the road segment meet one of the rules, TLPM should be canceled to improve traffic safety. The four rules are as follows:
Rule 1. The number of bus departures at peak hours is greater than 170 buses/h.
Rule 2. The number of bus departures at peak hours is less than or equal to 120 buses/h, the motorcycle traffic volume per lane is less than or equal to 1300 vehicles/h, the outer lane width is greater than 5 m, the pavement width is greater than 12 m, and the car traffic volume per lane is less than or equal to 270 vehicles/h.
Rule 3. The number of bus departures at peak hours is less than or equal to 120 buses/h, the motorcycle traffic volume per lane is less than 1300 vehicles/h, and the outer lane width is between 4 m and 5 m.
Rule 4. The number of bus departures at peak hours is less than or equal to 120 buses/h, the motorcycle traffic volume per lane is between 400 and 1300 vehicles/h, the outer lane width is less than or equal to 4 m, and the red curb ratio is less than or equal to 0.8.
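The four rules can be written as a small screening function. This is a sketch, not the authority's procedure: the `Segment` field names are invented for the example, and the handling of the "between" bounds (outer lane width above 4 m and up to 5 m for Rule 3, motorcycle volume from 400 to 1300 inclusive for Rule 4) is an assumption where the wording leaves the boundary open.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    bus_departures: float     # peak-hour bus departures (buses/h)
    motorcycle_volume: float  # motorcycle traffic volume per lane (vehicles/h)
    car_volume: float         # car traffic volume per lane (vehicles/h)
    outer_lane_width: float   # m
    pavement_width: float     # m
    red_curb_ratio: float     # share of curb marked red, 0 to 1


def matching_rule(s: Segment) -> Optional[int]:
    """Return the number of the first rule the segment meets, else None."""
    if s.bus_departures > 170:
        return 1
    if s.bus_departures <= 120:
        if (s.motorcycle_volume <= 1300 and s.outer_lane_width > 5
                and s.pavement_width > 12 and s.car_volume <= 270):
            return 2
        # Assumed boundary handling: 4 m < width <= 5 m.
        if s.motorcycle_volume < 1300 and 4 < s.outer_lane_width <= 5:
            return 3
        # Assumed boundary handling: 400 <= volume <= 1300.
        if (400 <= s.motorcycle_volume <= 1300 and s.outer_lane_width <= 4
                and s.red_curb_ratio <= 0.8):
            return 4
    return None


def should_cancel_tlpm(s: Segment) -> bool:
    """A segment qualifies for cancellation when any rule is met."""
    return matching_rule(s) is not None
```

A segment with 121 to 170 bus departures per hour meets none of the rules, so the function returns `None` and TLPM is retained.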
The rules for canceling TLPM stated above are only for evaluating a single road segment. For multiple road segments of the same arterial road, evaluation with these rules may indicate that TLPM should be canceled on some road segments but not on others. To maintain the lane consistency of an arterial road, this study recommends that the results for the majority of the road segments evaluated with these rules be used as the lane control principle for continuous road segments.
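The majority principle for an arterial road can be expressed as a simple vote over the per-segment results. In this sketch the function name is hypothetical, and treating a tie as "keep TLPM" is an assumption, since the text does not say how ties are resolved.

```python
def arterial_recommendation(segment_decisions):
    """Return True if TLPM should be canceled along the whole arterial.

    segment_decisions holds one boolean per road segment: True when that
    segment meets at least one of the four cancellation rules.
    Assumption: a tie (or an empty list) keeps TLPM in place.
    """
    cancel_votes = sum(1 for d in segment_decisions if d)
    return cancel_votes > len(segment_decisions) / 2
```

For example, `arterial_recommendation([True, True, False])` returns `True`, while `arterial_recommendation([True, False])` returns `False`.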
This paper is subject to limitations. The accident data were obtained from crash scene diagrams, which the police department keeps for only five years before destroying them. Therefore, for some road segments, the lengths of the before and after data periods were not consistent. Although this study used the annual average EPDO for analysis, some errors in the analysis may have occurred.

Conclusions
In Taiwan, traffic control methods such as slow traffic lanes, motorcycle waiting areas, two-stage left-turn waiting areas, and the TLPM policy have reduced the incidence of road accidents. However, Taiwan's traffic characteristics have changed over time. As a result, the TLPM policy has caused many traffic conflicts among vehicles on some roads. Although the traffic authority intends to address the traffic safety problems caused by TLPM, it is limited by the geometric characteristics, traffic characteristics, traffic control, and land use. In addition, there has been a lack of clear evaluation criteria as a reference for lane control decision-making.
This study used the road segments on which TLPM was canceled in the years 2013-2017 to analyze the changes in the annual average EPDO of the road segments, and it also assessed whether canceling TLPM yielded benefits. In addition, this study also analyzed the factors related to road accidents by random forest analysis. We found that the important road segment accident factors are the number of bus departures at peak hours, the red curb ratio, the motorcycle traffic volume per lane, the car traffic volume per lane, the pavement width, and the outer lane width. Although the OOB error of the random forest was 0.44, most of the selected factors could be inferred to be related to road accidents. Therefore, this research further used these crucial factors to build a CART to formulate the evaluation criteria for lane control.
The accuracy of the CART built in this study was 0.76, and canceling the TLPM policy produced four safety improvement scenarios and six safety deterioration scenarios. The CART results showed that when the number of bus departures at peak hours is very high, it is more suitable to cancel TLPM. However, if the number of bus departures at peak hours is not sufficiently high, we still need to consider the motorcycle traffic volume per lane, the car traffic volume per lane, the outer lane width, the pavement width, and the red curb ratio in evaluating whether to cancel TLPM. In addition, this research based the formulation of four rules for cancellation of TLPM on four improvement scenarios, the practical investigations of the traffic authority, and the current road design principles.
In addition, an investigation of current conditions on the road segments where the cancellation of TLPM led to improvements found that the curb lanes of these road segments are often occupied by buses stopped at bus stops or temporarily parked cars, and it is difficult for motorcycles to drive in the curb lane without encountering interference. Although it was originally planned that motorcycles could travel in the outer two lanes, this interference reduced the actual lane space in which the motorcycles could travel to only the second lane. After the cancellation of TLPM, the motorcycles could use the second and third lanes as the main driving space, significantly reducing lane changes and weaving. Therefore, road segments with high bus departure rates are suitable for cancellation of TLPM.
In multiple road segments of the same arterial road, the control suggestions for the continuous road segments may vary. To maintain the consistency of the lane control, we recommend that the results for the majority of road segments evaluated with the rules be used as the lane control method for continuous road segments.
The before-and-after incident data showed that the overall trend of traffic safety significantly deteriorated after the cancellation of TLPM. A possible reason is that motorcycles tend to weave from the outer lane to the inner lane (i.e., the third lane) because of the cancellation of TLPM. This study suggests that if the traffic authority needs to reduce the accident rate or improve the motorcycle traffic environment in the future, the authority should still implement TLPM and reduce the motorcycle accident rate by other means. In this way, the authority can prevent increases in the accident rate due to the cancellation of TLPM.
In this study, we only applied one-time validation for analysis. In future studies with more samples for analysis, k-fold cross-validation could be utilized to evaluate the statistical performance of the proposed method. Moreover, it will be possible to explore further whether TLPM should be restored on the road segments where the annual average EPDO deteriorated and whether it is possible to reduce the incidence of collisions through enforcement targeting illegal on-street parking. In addition, for major roads on which TLPM was not canceled that do not meet the proposed rules for canceling TLPM, the feasibility of other crash reduction measures should be further explored.