Preceding Vehicle Detection Using Faster R-CNN Based on Speed Classification Random Anchor and Q-Square Penalty Coefficient

At present, preceding vehicle detection remains a challenging problem for autonomous vehicle technologies. In recent years, deep learning methods such as the faster region-based convolutional neural network (Faster R-CNN) have been shown to be successful for vehicle detection. However, when the host vehicle speed increases or there is an occlusion in front, the performance of the Faster R-CNN algorithm usually degrades. To obtain better performance on preceding vehicle detection when the speed of the host vehicle changes, a speed classification random anchor (SCRA) method is proposed. The reasons for degraded detection accuracy when the host vehicle speed increases are analyzed, and the factor of vehicle speed is introduced to redesign the anchors. The redesigned anchors can adapt to the way the rule of preceding vehicle sizes changes as the host vehicle speed increases. Furthermore, to achieve better performance on occluded vehicles, a Q-square penalty coefficient (Q-SPC) method is proposed to optimize the Faster R-CNN algorithm. The experimental validation results show that, compared with the Faster R-CNN algorithm, the SCRA and Q-SPC methods contribute to improving preceding vehicle detection accuracy.


Introduction
The development of autonomous vehicle systems that enhance safety and traffic efficiency is ongoing; such a system requires a road environment detection facility in order to control the host vehicle [1,2]. The detection of preceding vehicles plays a decisive role in realizing rational planning of the driving path, maintaining the correct distance, and ensuring driving safety for autonomous vehicles [3,4]. Recently, preceding vehicle detection has become a research hotspot due to its necessity for autonomous vehicles, and many detection algorithms have been proposed [5][6][7][8].
At present, methods of preceding vehicle detection are mainly divided into traditional machine learning methods and deep learning methods. Traditional machine learning methods extract vehicle features with feature extraction operators such as the histogram of oriented gradient (HOG) [9] and Haar-like features [10], and input the features into a classifier such as a support vector machine (SVM) [11] or AdaBoost [12] [13][14][15][16]. However, these methods rely on manually designed features: the design process is subjective, lacks theoretical guidance, and generalizes poorly [17,18]. With the development of deep learning, convolutional neural networks (CNNs) are widely used.
1. The factor of speed is introduced: Through a real vehicle experiment, the vehicle speed is divided into three stages: 0-20, 20-60, and 60-120 km/h. The sizes of preceding vehicles at different speed stages are collected, and the k-means clustering algorithm [34] is used to analyze the rule of preceding vehicle sizes at different speed stages.

Structure of Faster R-CNN
The dominant paradigm in modern object detection is a two-stage approach: the first stage generates a sparse set of candidate proposals that should contain all objects while filtering out the majority of negative locations, and the second stage classifies the proposals into foreground/background classes and predicts proposal locations [35]. Faster R-CNN is a classical two-stage algorithm, owing to its efficient proposal extraction design. As shown in Figure 1, the overall structure of the Faster R-CNN algorithm is mainly divided into four parts:
(1) Image feature extraction: using the VGG16 network model [36], a stack of convolutional (conv; 13 layers), rectified linear unit (ReLU) activation (13 layers), and pooling (4 layers) layers [37] extracts the input image features, and the resulting feature maps are used in the subsequent region proposal network (RPN) and region of interest (RoI) pooling;
(2) RPN: from the feature maps passed in by the previous layer, the RPN determines the object candidate proposals, which give the rough position of the object, including a rectangular box for each proposal from the bounding box regression layer (reg layer) and the probability that an object exists from the Softmax function layer (cls layer) [38];
(3) RoI pooling: the proposal information and feature maps are collected and combined to extract the proposal features;
(4) Classification and bounding box regression: from the proposal features, the reg layer and cls layer determine the object location and category.

Working Principle of RPN
RPN is a fully convolutional neural network. The RPN workflow in the test phase is shown in Figure 2. The input of this network is the feature maps from the previous layer, and the output is rectangular proposals. First, an n × n sliding window (in this paper, n = 3) is applied to the feature map of the shared convolution network, new 512-dimensional feature maps are generated, and k regions are predicted on the original input image at the same time; these regions are anchors. Then the 512-dimensional feature maps are mapped to a low-dimensional vector by a 1 × 1 convolution operation. This vector is fed into two layers, the reg layer and the cls layer.
For the k anchors, the anchor point is located at the centroid of the sliding window. Anchors are bounding boxes that have three different sizes (128 × 128, 256 × 256, and 512 × 512) and three different aspect ratios (0.5, 1, and 2); thus k = 9. The width and height of anchors are determined by Equation (1):

$$W_{i,j} = 2^{s_j} \times \mathrm{round}\left(\sqrt{(x \times y)/r_i}\right) \qquad (1)$$

where W, H represent the width and height of an anchor; $r_i = (r_1, r_2, r_3) = (0.5, 1, 2)$, and $r_i$ indicates the aspect ratio; $s_j = (s_1, s_2, s_3) = (3, 4, 5)$, and $2^{s_j}$ indicates the expansion factor of size; round stands for rounding; and x, y indicate the width and height in pixels of a feature point mapped back to the original input image, with x = y = 16.
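The anchor sizes described above can be checked with a short sketch. It assumes a square root inside the rounding of the width term, which is what reproduces the stated 128 × 128 size at aspect ratio 1, and it assumes the height is the rounded width multiplied by the aspect ratio, as in commonly used Faster R-CNN reference code; the function name `generate_anchors` is illustrative and not taken from the paper.

```python
import math


def generate_anchors(base=16, ratios=(0.5, 1, 2), exponents=(3, 4, 5)):
    """Return (width, height) pairs for every ratio/scale combination.

    base      -- x = y = 16, the pixel stride of one feature point
    ratios    -- the aspect ratios r_i
    exponents -- the s_j values; 2**s_j is the expansion factor of size
    """
    anchors = []
    area = base * base  # x * y
    for r in ratios:
        w0 = round(math.sqrt(area / r))  # width term before scaling
        h0 = round(w0 * r)               # ASSUMPTION: height = width * ratio
        for s in exponents:
            anchors.append((2 ** s * w0, 2 ** s * h0))
    return anchors


if __name__ == "__main__":
    for w, h in generate_anchors():
        print(w, h)
```

With the default arguments this yields k = 9 anchors, and the three anchors at aspect ratio 1 are 128 × 128, 256 × 256, and 512 × 512.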

In the Faster R-CNN algorithm, any picture of P × Q pixels is resized to 800 × 600 before being input to the CNN. Through the VGG16 model, a feature map of 50 × 38 is obtained; each feature point corresponds to k anchors, and thus there is a total of 50 × 38 × k anchors on the input image. From the cls layer and reg layer, each anchor obtains two scores on whether it contains an object (positive and negative) and four coordinates (horizontal and vertical coordinates of the centroid, width, and height). According to these parameters, all anchors are postprocessed, and finally about 300 proposals are obtained. The postprocessing steps are as follows:
1. According to the coordinates, the position of each anchor is adjusted, the top 6000 position-corrected positive anchors are extracted based on confidence scores, the positive anchors that extend beyond the image are removed, and smaller anchors (width or height less than the threshold) are excluded.
2. Non-maximum suppression (NMS) is applied to the selected anchors, and based on the confidence scores, the top 300 anchors are selected as the proposals.
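The first postprocessing step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the tuple layout `(score, cx, cy, w, h)`, the function name `select_candidates`, and the `min_size` value are assumptions, since the text does not state the size threshold. The second step (NMS followed by keeping the top 300) is left out here.

```python
def select_candidates(anchors, img_w=800, img_h=600, min_size=16,
                      pre_nms_top_n=6000):
    """Filter position-corrected anchors before NMS.

    anchors -- list of (score, cx, cy, w, h) tuples, already adjusted
               by the regression output
    min_size -- hypothetical width/height threshold (not given in the text)
    """
    # Keep the highest-scoring anchors first (top 6000 in the text).
    ranked = sorted(anchors, key=lambda a: a[0], reverse=True)[:pre_nms_top_n]
    kept = []
    for score, cx, cy, w, h in ranked:
        x1, y1 = cx - w / 2, cy - h / 2
        x2, y2 = cx + w / 2, cy + h / 2
        inside = x1 >= 0 and y1 >= 0 and x2 <= img_w and y2 <= img_h
        if inside and w >= min_size and h >= min_size:
            kept.append((score, cx, cy, w, h))
    return kept


# Total number of anchors on an 800 x 600 input with k = 9.
total_anchors = 50 * 38 * 9  # 17100
```

Anchors that cross the image boundary or fall below the size threshold are dropped, so the list handed to NMS is already much shorter than the 17,100 anchors laid over the image.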

NMS and Soft-NMS
Faster R-CNN generates detection bounding boxes and scores for specific categories of objects. However, adjacent windows tend to have correlated scores, which increases the false positives in the detection results. To avoid this problem, the NMS algorithm is usually used to postprocess the detection bounding boxes.
The working process of NMS is as follows: First, a series of boxes $B_i$ and corresponding confidence scores $S_i$ ($i = 1, 2, \ldots, j, \ldots$) are generated. The box $B_j$ with the highest confidence score $S_j$ is selected, and the intersection over union (IoU) between each other box $B_i$ ($i \neq j$) and $B_j$ is determined; IoU is obtained by Equation (2). Then, the confidence score of $B_i$ ($i \neq j$) is updated in accordance with Equation (3). If the confidence score of $B_i$ becomes 0, the box is removed; the highest-scoring box among the remaining ones is then selected, and the above operation is repeated until all boxes have been processed.
Although NMS effectively reduces the false positives in the detection results, the method is too crude: the detection bounding boxes adjacent to the highest-scoring box are forcibly cleared. If a real object lies in the overlapping area and has a large overlap with the selected box, it goes undetected, which reduces the mean average precision (mAP) of the algorithm. To solve this problem, Bodla et al. [32] proposed the Soft-NMS algorithm, which smooths Equation (3) by introducing penalty coefficients λ with linear and Gaussian weighting, shown in Equations (4) and (5), respectively; the confidence score formula is shown in Equation (6). By lowering the confidence score instead of directly deleting the box, and removing only the boxes whose confidence score falls below a set threshold, this method effectively improves the detection ability of Faster R-CNN for occluded objects. Here, Threshold = 0.3 and δ = 0.3.
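The difference between NMS and Soft-NMS can be made concrete with a small sketch. It uses the commonly published Soft-NMS forms, which are assumed here to correspond to Equations (3)-(6): the hard rule sets the penalty to 0 when IoU reaches the threshold, the linear rule uses 1 − IoU above the threshold, and the Gaussian rule uses exp(−IoU²/δ). The box layout `(x1, y1, x2, y2)`, the function names, and the `min_score` cut-off are illustrative assumptions.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def penalty(overlap, method, threshold=0.3, delta=0.3):
    """Penalty coefficient applied to a neighbouring box's score."""
    if method == "hard":       # classical NMS
        return 0.0 if overlap >= threshold else 1.0
    if method == "linear":     # linear weighting
        return 1.0 - overlap if overlap >= threshold else 1.0
    if method == "gaussian":   # Gaussian weighting
        return math.exp(-(overlap ** 2) / delta)
    raise ValueError("unknown method: " + method)


def soft_nms(boxes, scores, method="gaussian", threshold=0.3, delta=0.3,
             min_score=0.001):
    """Return (box, score) pairs in the order they are selected.

    min_score is a hypothetical removal threshold for decayed scores.
    """
    remaining = list(zip(boxes, scores))
    kept = []
    while remaining:
        best = max(range(len(remaining)), key=lambda i: remaining[i][1])
        best_box, best_score = remaining.pop(best)
        kept.append((best_box, best_score))
        rescored = []
        for box, score in remaining:
            score *= penalty(iou(box, best_box), method, threshold, delta)
            if score > min_score:
                rescored.append((box, score))
        remaining = rescored
    return kept
```

For two heavily overlapping boxes, `method="hard"` discards the lower-scoring one outright, while the linear and Gaussian variants keep it with a reduced score, which is the behaviour that helps with occluded vehicles.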

Problem of Preceding Vehicle Size and Host Vehicle Speed
In the course of vehicle running, there are generally multiple scales of preceding vehicles. To guarantee safe driving, the host vehicle should keep a safe distance from the front vehicle. Taking the widely used Mazda safe distance model [39] as an example, if the speed of the host vehicle increases, the safety distance should also increase. According to the working principle of the camera, vehicle size is negatively correlated with distance; vehicle size is small when distance is large [40]. This also means that with the increasing speed, the occurrence frequency of small vehicles also increases. To verify this point, a real vehicle experiment was carried out. We divided vehicle speed into three stages and defined three vehicle sizes, then we collected the occurrence frequency of vehicles of different sizes. The details of real vehicle experiments are shown in the next section. In this section, we just show the frequency ratio chart to verify the above viewpoint.
As shown in Figure 3, at different speed stages, the occurrence frequency ratio of vehicles of different sizes is different; thus, the Faster R-CNN algorithm should take speed into account when detecting preceding vehicles.
Moreover, Faster R-CNN has two stages: In the first stage, RPN generates anchors with three sizes (128 × 128, 256 × 256, and 512 × 512) and three aspect ratios (1:1, 1:2, and 2:1), which creates a total of k = 9 anchors at each pixel of the feature map. The shapes of the anchors are invariant, and the anchor sizes do not match small vehicle sizes; they are too large to fit the small vehicles. Useful clues may be drowned in too many unnecessary clues; thus, the sizes of anchors should be adjusted to fit vehicle sizes.
The SCRA method is proposed to solve these problems.
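The inverse relation between distance and apparent vehicle size mentioned above follows from the pinhole camera model, in which the projected width is the focal length times the real width divided by the distance. The sketch below only illustrates that relation; the focal length of 700 pixels and the vehicle width of 1.8 m are hypothetical values, not parameters of the experiment.

```python
def apparent_width_px(real_width_m, distance_m, focal_px):
    """Projected width in pixels under an ideal pinhole camera model."""
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    return focal_px * real_width_m / distance_m


if __name__ == "__main__":
    # Hypothetical camera and vehicle: 700 px focal length, 1.8 m wide car.
    for distance in (10, 25, 50, 100):
        print(distance, apparent_width_px(1.8, distance, 700))
```

Doubling the following distance halves the projected width, so longer safe distances at higher speeds shift the size distribution toward small vehicles.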

Problem of Vehicle Occlusion
Although Soft-NMS improves the detection ability of the Faster R-CNN algorithm for occluded vehicles and has a good effect on vehicle detection, the optimization of Soft-NMS only introduces the penalty coefficient of linear and Gaussian weighting, without considering the impact of continuing to optimize the penalty coefficients. In this paper, the results of occluded vehicle detection were obtained, the method of Q-times multiplication of penalty coefficients was adopted to optimize the penalty coefficients, and the Q-SPC method is proposed to optimize the Soft-NMS.

Design of Optimized Faster R-CNN Algorithm
The working principles of the SCRA and Q-SPC methods are shown in Figure 4. The SCRA method was used to optimize anchors, and the Q-SPC method was used to optimize Soft-NMS.
As shown in Figure 4a, taking speed into account, we proposed the SCRA method to make anchor sizes fit vehicle sizes. The design steps of the SCRA method are as follows:
Step 1. The speed of the host vehicle is introduced, and according to the experiment, speed is divided into three stages: 0-20, 20-60, and 60-120 km/h. To make anchor sizes fit vehicle sizes, the rule of vehicle sizes should be analyzed at each speed stage. Thus, by collecting images and labels, the width (W) and height (H) of each vehicle are acquired.
Step 2. To analyze the rule of vehicle sizes at different speed stages, we applied the k-means clustering algorithm to the widths and heights.
Step 3. Anchors are redesigned based on the clustering results and the postprocessed data: cluster centroids, maximum and minimum vehicle sizes, weighted mean values of cluster centroids, and extracted vehicle aspect ratios.
Step 4. Combining data such as cluster centroids and aspect ratios generates too many anchors, so anchors are selected randomly to keep the quantity reasonable.
Step 5. After finishing the above steps, the original anchors are replaced with the redesigned anchors.
For our Q-SPC method, the design steps are as follows:
Step 1. The original Faster R-CNN algorithm uses NMS to handle bounding boxes; because we want to optimize Soft-NMS, NMS is first replaced by Soft-NMS. Faster R-CNN is then run on the vehicle occlusion dataset, and the detection results are generated.
Step 2. By analyzing these results, some requirements are proposed. To satisfy these requirements, the Q-times multiplication of penalty coefficients is adopted.
Step 3. The optimized Soft-NMS replaces the original Soft-NMS.
In the following sections, we introduce our SCRA and Q-SPC methods in detail.
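Step 1 of the SCRA method amounts to mapping the host vehicle speed to one of three stages, each of which later receives its own anchor set. A minimal sketch of that mapping is given below; the function name `speed_stage` is illustrative, and the handling of the boundary speeds (20 and 60 km/h assigned to the higher stage) is an assumption, since the text lists the stages as 0-20, 20-60, and 60-120 km/h without stating which stage owns the boundaries.

```python
def speed_stage(speed_kmh):
    """Map the host vehicle speed to one of the three speed stages.

    ASSUMPTION: boundary values belong to the higher stage.
    """
    if not 0 <= speed_kmh <= 120:
        raise ValueError("speed outside the 0-120 km/h range studied")
    if speed_kmh < 20:
        return "0-20"
    if speed_kmh < 60:
        return "20-60"
    return "60-120"


if __name__ == "__main__":
    for v in (0, 15, 20, 45, 60, 110):
        print(v, speed_stage(v))
```

The returned label could then be used as a key into a per-stage table of redesigned anchors.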

The Rule of Preceding Vehicle Size at Different Speed Stages
In this section, in order to fit anchor sizes to vehicle sizes at different speed stages, we needed to divide the speed into stages and to explore the rule of vehicle sizes at each stage. Thus, a real vehicle experiment was carried out to find reasonable speed stages and collect the images. The parameters of the real vehicle experiment are shown in Table 1. Here, the frame rate of the camera is 104 fps, the resolution is 640 × 480, and the camera is mounted behind the front windshield.
In our experiment, based on the status of the experiment vehicle and the speed limits on urban roads and motorways, we initially chose four speed stages: 0-20, 20-60, 60-90, and 90-120 km/h (Table 2). When the vehicle exceeded 60 km/h, due to the high speed and large distance, the preceding vehicle size was small. Moreover, when the vehicle exceeded 90 km/h, fewer vehicles were in front of the host vehicle and they appeared very small; during labeling, the vehicles had similar sizes, and the width and height of each vehicle were almost the same (as shown in Figure 5). A size rule derived from such data may be unconvincing; thus, 90-120 km/h is not a suitable stage on its own for analyzing the rule of vehicle size. We therefore grouped 60-90 and 90-120 km/h together and divided the speed into 0-20, 20-60, and 60-120 km/h. Here, More, Fewer, and Much Fewer are defined by the mean number of preceding vehicles per image; in this paper, we used 2.5, 1.5, and 0.5 as the approximate average values corresponding to More, Fewer, and Much Fewer, respectively.
In our experiment, we collected and screened images with a resolution of 640 × 480 at different speed stages; each image was then labeled manually to obtain the width and height of the preceding vehicles in the image. The selected vehicle types are car and SUV. The parameters of the screened images are shown in Table 3. In this paper, we classify the preceding vehicle sizes as small, medium, large, and very large. Small, medium, and large are the preceding vehicle sizes observed while the experimental vehicle is moving, and we added very large as a new size to fit some vehicles when the speed of the experimental vehicle is 0.
Because k-means is a classical clustering algorithm that allows the number of clusters to be set, and in order to correspond to the four vehicle sizes, the k-means algorithm was applied to the width and height of preceding vehicles (K = 4). The clustering results at different speed stages are shown in Figure 6. However, it can be seen from the clustering results that there are too few vehicle sizes in cluster 4. Clusters 3 and 4 are therefore recombined, and their centroids are recombined by weighting. According to Equation (7), the centroid of the new cluster A, $(x_{K=A}, y_{K=A})$, is obtained, where x, y represent the horizontal and vertical coordinates of a cluster centroid, and N represents the number of vehicle sizes in each cluster. The cluster centroids and number of vehicle sizes per cluster at each speed stage are shown in Table 4.
Here, the horizontal and vertical coordinates of cluster centroids represent the width and height of preceding vehicles, respectively. After recombination, the vehicle sizes are redefined as small, medium, and large, corresponding to clusters 1, 2, and A, respectively. Each cluster is described by the mean of all its values (the cluster centroid); thus, the cluster centroid is selected to describe the vehicle sizes that belong to the cluster. To fit anchor sizes to vehicle sizes, as shown in Equation (1), vehicle size should include not only width and height but also the aspect ratio (width/height); thus, the proportion of vehicle aspect ratios in clusters 1, 2, and A at different speed stages is collected to provide the basis for the design of anchors, and the results are shown in Figure 7.
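The clustering and recombination described above can be sketched with a plain k-means loop and a count-weighted merge of two centroids. This is an illustration under assumptions: the merge is taken to be a mean weighted by the number of vehicle sizes N in each cluster, which matches the description of Equation (7) but is not copied from it; the deterministic initialisation and the names `kmeans` and `merge_centroids` are choices made for this sketch.

```python
def kmeans(points, k, iters=50):
    """Plain Lloyd's algorithm on (width, height) pairs.

    ASSUMPTION: initial centroids are spread evenly over the sorted
    sizes so that the result is reproducible.
    Returns (centroids, cluster_sizes).
    """
    pts = sorted(points)
    centroids = [pts[(2 * i + 1) * len(pts) // (2 * k)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in pts:
            nearest = min(
                range(k),
                key=lambda c: (p[0] - centroids[c][0]) ** 2
                + (p[1] - centroids[c][1]) ** 2,
            )
            clusters[nearest].append(p)
        updated = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
            if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, [len(c) for c in clusters]


def merge_centroids(c1, n1, c2, n2):
    """Count-weighted mean of two centroids; returns (centroid, count)."""
    total = n1 + n2
    merged = ((n1 * c1[0] + n2 * c2[0]) / total,
              (n1 * c1[1] + n2 * c2[1]) / total)
    return merged, total
```

In this reading, a sparsely populated cluster pulls the merged centroid only slightly, so the new cluster A stays close to the centroid of the better-populated cluster.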

Anchor Design Based on Vehicle Pixel Dimensions
In this section, anchors defined without considering the aspect ratios are called initial anchors, and after the initial anchors are processed by aspect ratio, they are called final anchors. At each speed stage, the occurrence frequency ratio of different vehicle sizes determines the number of initial anchors (NIA) belonging to each cluster, and the proportion of vehicle aspect ratios in each cluster determines the number of aspect ratios (NAR) belonging to the cluster. The number of final anchors corresponding to a cluster is NIA × NAR, and the number of final anchors corresponding to a vehicle speed stage is $\sum_{K=1}^{A} (NIA_K \times NAR_K)$, K = 1, 2, A.
At each speed stage, the design principles of the NIA of each cluster are as follows:
(1) The NIA should meet the occurrence frequency ratio of vehicle sizes shown in Table 5 as much as possible.
(2) The NIAs of all clusters should sum to 9 to meet the design of the original Faster R-CNN: k = 9.
(3) The NIA should be larger than 1, so that the size of the anchor is variable to cover more vehicle sizes.
The design principles of the NAR of each cluster are as follows:
(1) Compared with NIA, the ratio of NIA × NAR should be closer to the occurrence frequency ratios of vehicle sizes shown in Table 5.
(2) Corresponding to the aspect ratios shown in Figure 7, the NAR should be as large as possible.
The values of NIA and NAR are shown in Table 6. Here, ratios are obtained through the result of clustering shown in Table 4; at each speed stage, the ratio of each type (corresponding to cluster K) is $N_K / \sum_{K=1}^{A} N_K$, K = 1, 2, A.
Table 6. Values of number of initial anchors (NIA) and number of aspect ratios (NAR), listed per cluster (K) for the speed stages 0-20, 20-60, and 60-120 km/h.
After obtaining the NIA and NAR, the vehicle sizes and aspect ratios should be selected. According to the acquired data at different speed stages, we established a method to select vehicle sizes. The method of selection and the data chosen at different speed stages are shown in Figure 8. The selected vehicle sizes should satisfy the NIA condition and span a range so that the anchor sizes are more reasonable.
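The anchor count per speed stage, the sum of NIA × NAR over clusters 1, 2, and A, together with the NIA design principles (sum to 9, each larger than 1), can be sketched as below. The NIA and NAR values in the example are hypothetical placeholders chosen only to satisfy those principles; they are not the values of Table 6.

```python
def check_nia(nia):
    """Check the NIA design principles: sum to 9 and each larger than 1."""
    return sum(nia.values()) == 9 and all(n > 1 for n in nia.values())


def final_anchor_count(nia, nar):
    """Number of final anchors at one speed stage: sum of NIA_K * NAR_K."""
    return sum(nia[k] * nar[k] for k in nia)


# Hypothetical values for one speed stage (NOT taken from Table 6).
example_nia = {"1": 4, "2": 3, "A": 2}
example_nar = {"1": 2, "2": 3, "A": 3}

if __name__ == "__main__":
    print(check_nia(example_nia))
    print(final_anchor_count(example_nia, example_nar))
```

Because every initial anchor is expanded by several aspect ratios, the count of final anchors is larger than the 9 initial anchors.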

0-20 km/h
When the vehicle speed is 0-20 km/h, the cluster centroids of clusters 1, 2, and A are (38, 28), (75, 50), and (153, 96), respectively; the minimum vehicle size of cluster 1 is (10, 9), and the maximum vehicle size of cluster A is (300, 200). Following Figure 8, the vehicle sizes corresponding to each cluster are selected. The selected aspect ratios should satisfy the NAR condition and cover the aspect ratios shown in Figure 7 as much as possible. According to the proportion of different vehicle aspect ratios and the NAR in each cluster, the aspect ratios R corresponding to each cluster are selected.

20-60 km/h
When the vehicle speed is 20-60 km/h, the cluster centroids of clusters 1, 2, and A are (27, 25), (60, 45), and (130, 85), respectively; the weighted mean value of the centroids of clusters 1 and 2 is (38, 32), the minimum vehicle size of cluster 1 is (10, 7), and the maximum vehicle size of cluster A is (300, 175). Following Figure 8, the vehicle sizes and aspect ratios R corresponding to each cluster are selected.

60-120 km/h
When the speed exceeds 60 km/h, the cluster centroids of clusters 1, 2, and A are (16, 13), (40, 29), and (104, 99), respectively; the minimum vehicle size of cluster 1 is (5, 6), the maximum vehicle size of cluster A is (202, 123), the weighted mean value of the centroids of clusters 1 and 2 is (21, 16), and the average of the cluster 1 centroid and the minimum vehicle size is (11, 10). Corresponding to Figure 8, the vehicle sizes and aspect ratios R corresponding to each cluster are selected:
Because different clusters correspond to different vehicle sizes and the anchor sizes are fitted to vehicle sizes, the initial anchors correspond to different clusters as well. After selecting the width and height of vehicle sizes, the width (IAw) and height (IAh) of initial anchors are obtained by Equation (8):
After obtaining the width and height of initial anchors, the aspect ratio R should be considered to acquire the corresponding final anchor sizes; thus, Equation (1) is optimized as Equation (9), and the final anchor sizes at different speed stages are obtained by Equation (9):
where S represents the speed stage; K represents the Kth cluster at S, with K = 1, 2, A; i and j represent the ith selected initial anchor size and the jth selected aspect ratio R, with i = 1_{S,K}, 2_{S,K}, ..., NIA_{S,K} and j = 1_{S,K}, 2_{S,K}, ..., NAR_{S,K}; and FAw and FAh represent the width and height of the final anchor. In Faster R-CNN, the size of images is 800 × 600; in our experiment, the size of collected images is 640 × 480, so we multiplied by the expansion factor of 1.25.
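The scaling step can be illustrated with a minimal sketch. It assumes that Equation (8) simply multiplies each selected vehicle width and height by the expansion factor of 1.25 (the ratio between the 800 × 600 Faster R-CNN input and the 640 × 480 collected images); the function name `initial_anchor` is hypothetical, and the 0-20 km/h cluster centroids quoted above are used only as example inputs, not as the full set of selected vehicle sizes.

```python
# Assumption: Equation (8) scales a selected vehicle size by the expansion factor.
# 800 / 640 and 600 / 480 both equal 1.25, so one factor serves width and height.
EXPANSION_FACTOR = 800 / 640


def initial_anchor(width, height, factor=EXPANSION_FACTOR):
    """Scale a vehicle size measured in a 640x480 image to an initial anchor size."""
    return width * factor, height * factor


# Example inputs: cluster centroids at 0-20 km/h quoted in the text.
centroids_0_20 = [(38, 28), (75, 50), (153, 96)]
initial_anchors = [initial_anchor(w, h) for w, h in centroids_0_20]
print(initial_anchors)
```

The aspect-ratio step of Equation (9) is not reproduced here because its exact form is not available in this section.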

Random Selection of Anchors
In order to adjust the number of anchors and improve the generalization ability of the training model, a random anchor structure is proposed in this paper. The purposes are as follows:
1. There are many repetitions in the final anchor sizes, so random selection of anchor sizes is adopted to reduce repetition.
2. Anchors change constantly, which enhances the generalization ability of the training model.
3. Compared with k = 9, increasing the number of anchors increases the number of proposals, covers more possible object areas, and improves the generalization ability of the training model.
Compared with the Faster R-CNN algorithm, we designed 27, 31, and 37 final anchors at the three speed stages, and many of them are repeated. Therefore, in order to adjust the number of final anchor sizes and improve the generalization ability of the training model, a random selection model is proposed by Equation (10):
where random_{j,m}() represents randomly extracting m items from j items. In this paper, m = 2, so at each speed stage, the number of final anchor sizes is 18, and the anchor sizes in the Faster R-CNN algorithm [41] are replaced by the final anchor sizes.
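As a rough illustration of the random selection, the sketch below assumes that each initial anchor has several final-anchor candidates (one per aspect ratio) and that m of them are drawn without replacement for every initial anchor. The function name `select_final_anchors`, the assumption of nine initial anchors per speed stage, and the placeholder candidate identifiers are all hypothetical; the actual anchor sizes come from Equation (9).

```python
import random


def select_final_anchors(candidates, m=2, rng=None):
    """Randomly keep m final-anchor candidates for each initial anchor.

    candidates: list with one entry per initial anchor; each entry is the list
    of final-anchor candidates obtained from the different aspect ratios.
    """
    rng = rng or random.Random()
    selected = []
    for finals in candidates:
        # Draw without replacement so that one anchor is not picked twice.
        selected.extend(rng.sample(finals, min(m, len(finals))))
    return selected


# Placeholder candidates: 9 initial anchors x 3 aspect ratios = 27 candidates.
# Each candidate is an (initial anchor index, aspect ratio index) pair, not a real size.
candidates = [[(i, j) for j in range(3)] for i in range(9)]
final_anchors = select_final_anchors(candidates, m=2, rng=random.Random(0))
print(len(final_anchors))
```

Under these assumptions, 27 candidates are reduced to 18 final anchors, and a new draw at each use makes the anchor set change over time.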

Q-square Penalty Coefficient
Compared with NMS, the Soft-NMS algorithm establishes a smooth relationship between IoU and the confidence score S_i and proposes penalty coefficients λ with linear and Gaussian weighting; thus, the detection accuracy of occluded objects is improved by reducing the confidence score instead of strictly deleting the candidate boxes. The Soft-NMS algorithm was applied to Faster R-CNN to replace NMS for detection of occluded preceding vehicles. From the detection results, there are several situations (as shown in Figure 9), and we optimized Soft-NMS from three aspects:
1. When vehicle occlusion is slight, occluded vehicles can be detected using the Soft-NMS algorithm; thus, we expect the penalty intensity of our optimized Soft-NMS algorithm to be as consistent as possible with that of the original Soft-NMS algorithm, so that the detection effect remains as constant as possible.
2. When vehicle occlusion is serious, occluded vehicles cannot be detected using the Soft-NMS algorithm; thus, we expect the penalty intensity of our optimized Soft-NMS algorithm to be stronger, so that the confidence score drops more sharply and the score ranking is lower, reducing the influence of heavily occluded detection boxes.
3. When vehicle occlusion is between the above two conditions, some occluded vehicles can be detected using Soft-NMS, while some cannot; thus, we explored the influence of adjusting the penalty intensity on detection of occluded preceding vehicles.
Therefore, to meet the above requirements, the Q-SPC method is proposed, and the λ^Q of linear and Gaussian weighting are obtained by Equations (11) and (12), respectively:
$$\lambda^{Q} = \begin{cases} 1, & \mathrm{IoU}(M, b_i) < \mathrm{Threshold} \\ \left(1 - \mathrm{IoU}(M, b_i)\right)^{Q}, & \mathrm{IoU}(M, b_i) \geq \mathrm{Threshold} \end{cases} \qquad (11)$$
$$\lambda^{Q} = \left( e^{-\mathrm{IoU}(M, b_i)^{2} / \delta} \right)^{Q} \qquad (12)$$
where M is the detection box with the highest confidence score, b_i is the ith candidate box, and Q is the number of times the penalty coefficient is multiplied by itself (its power); in this paper, Q = 1, 2, ..., 7, Threshold = 0.3, and δ = 0.3. Figure 10 shows the relationships between the penalty coefficients of linear and Gaussian weighting and the value of IoU, as well as the relationship between the penalty coefficients and the power Q.
As shown in Figure 10a, when IoU < Threshold, λ^Q = 1 keeps the penalty intensity the same as in Soft-NMS when occlusion is slight; with increased IoU, the penalty intensity is stronger and the confidence score drops more sharply, and the penalty intensity also changes with increased Q. As shown in Figure 10b, with Gaussian weighting the penalty is maintained as far as possible when occlusion is slight, but it is preserved less well than with linear weighting; the other effects are basically the same as for linear weighting. Therefore, raising the penalty coefficient to the power Q can satisfy the requirements of the optimized Soft-NMS. The optimized confidence score S_i is obtained by Equation (13):
According to the confidence score, the code of Soft-NMS [42] is modified, and the possible object detection boxes are rescreened to improve the occluded vehicle detection accuracy of Faster R-CNN.
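The rescoring described above can be sketched as follows. This is a minimal illustration rather than the authors' modified Soft-NMS code: it assumes that the linear coefficient is 1 below the threshold and (1 − IoU)^Q otherwise, that the Gaussian coefficient is exp(−IoU²/δ) raised to the power Q, and that Equation (13) multiplies the current confidence score by λ^Q. The function names, the (x1, y1, x2, y2) box format, and the `score_min` cutoff are assumptions made for the example.

```python
import math


def q_spc_linear(iou, q, threshold=0.3):
    """Assumed linear Q-SPC: no penalty below the threshold, (1 - IoU)^Q above it."""
    return 1.0 if iou < threshold else (1.0 - iou) ** q


def q_spc_gaussian(iou, q, delta=0.3):
    """Assumed Gaussian Q-SPC: the Soft-NMS Gaussian penalty raised to the power Q."""
    return math.exp(-(iou ** 2) / delta) ** q


def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def soft_nms_q(boxes, scores, q=1, weighting="linear", score_min=0.001):
    """Soft-NMS with the penalty raised to the power Q.

    Returns (box index, final score) pairs in the order the boxes were selected.
    """
    scores = list(scores)  # work on a copy
    remaining = list(range(len(boxes)))
    kept = []
    while remaining:
        best = max(remaining, key=lambda i: scores[i])
        remaining.remove(best)
        kept.append((best, scores[best]))
        survivors = []
        for i in remaining:
            overlap = box_iou(boxes[best], boxes[i])
            if weighting == "linear":
                lam = q_spc_linear(overlap, q)
            else:
                lam = q_spc_gaussian(overlap, q)
            scores[i] *= lam  # assumed form of Equation (13)
            if scores[i] >= score_min:
                survivors.append(i)
        remaining = survivors
    return kept


# Hypothetical boxes: the second overlaps the first with IoU = 0.5.
boxes = [(0, 0, 10, 10), (0, 0, 10, 5), (20, 20, 30, 30)]
print(soft_nms_q(boxes, [0.9, 0.8, 0.7], q=2))
```

With Q = 1 both functions reduce to the usual Soft-NMS penalties, so larger Q only strengthens the penalty for boxes whose coefficient is already below 1.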

Experiments
In this section, we describe the experimental verification and comparison tests that were carried out to verify the effectiveness of the improved algorithm. Firstly, the evaluation indicators and training environment are introduced, then the datasets of the SCRA method at different speed stages and the Q-SPC are introduced. Finally, the different data training modes of the SCRA method are presented.

Performance Evaluation Indicators
The most common model evaluation indicators are precision (P), recall (R), average precision (AP), and mean average precision (mAP). Precision represents the ability of the model to identify only relevant objects; it is the percentage of predictions that are correct. Recall refers to the ability of the model to find all relevant objects; it is the percentage of ground truths that are detected as true positives. For binary classification problems, AP measures the performance of the classifier as the area under the P-R curve; AP is introduced to reflect the balance between precision and recall [43,44]. Precision and recall are determined by Equations (14) and (15), respectively:
$$P = \frac{TP}{TP + FP} \qquad (14)$$
$$R = \frac{TP}{TP + FN} \qquad (15)$$
where the meanings of TP, FP, and FN are shown in Table 7.
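As a small worked example of Equations (14) and (15), the sketch below computes precision and recall from detection counts. The counts are hypothetical and chosen only for illustration; they are not results from this paper.

```python
def precision(tp, fp):
    """Fraction of predicted detections that are correct: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0


def recall(tp, fn):
    """Fraction of ground-truth objects that are detected: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0


# Hypothetical counts: 80 correct detections, 20 false detections, 10 missed vehicles.
tp, fp, fn = 80, 20, 10
print(precision(tp, fp), recall(tp, fn))
```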

Training Environment and Datasets
The training parameters of our experiment are shown in Table 8. To verify the effect of our design, training datasets were established by collecting images in a real vehicle experiment and labeling them manually. The parameters of our experimental datasets are shown in Tables 9 and 10. The datasets in Table 9 were used for the SCRA method; different datasets were collected at different speed stages. Training sets were used to train the algorithm, and test sets were used to test the trained model. Training and test sets account for 70% and 30% of the images, respectively. The datasets in Table 10 were used for the Q-SPC method; because vehicle occlusion usually occurs at low speed, the vehicle occlusion images were collected at 0-20 km/h in the real vehicle experiment.

Data Training Mode and Training
For the SCRA method, to verify the possible impact of the image data, separate and overall training modes were adopted, as shown in Figure 11. For each training mode, redesigned anchors belonging to different speed stages were used to optimize Faster R-CNN; thus, there are three optimized Faster R-CNN algorithms corresponding to the three speed stages: Faster R-CNN with SCRA 1, Faster R-CNN with SCRA 2, and Faster R-CNN with SCRA 3, based on the data in Table 9. These algorithms were used for training, and each optimized algorithm generated a detection model (Models 1, 2, and 3). Each model was tested on the different test sets to verify its effect.
For the Q-SPC method, because Soft-NMS optimization does not require retraining the model, the Faster R-CNN algorithm was trained on the data in Table 10 to generate a single detection model. The Q-SPC method was then verified by modifying Soft-NMS and running multiple tests with this model.

Results and Discussion
In Table 11, for the test sets belonging to different speed stages (TestData1, TestData2, and TestData3), the models with the highest AP values are Models 1, 2, and 3, respectively. With increased speed, the AP of both the optimized Faster R-CNN and the Faster R-CNN algorithms declines, but compared to the Faster R-CNN algorithm, the SCRA method improves detection accuracy by 7.65%, 9.27%, and 15.14%, respectively. Here, in order to make a comparison with our proposed algorithm, we selected the best detection result of nonoptimized Faster R-CNN for each test set, and we defined the acquired detection model as Model 0. Under separate training modes, we used Faster R-CNN to process the training and test sets belonging to the same speed stage; under the overall training mode, Faster R-CNN was trained on all data, and the corresponding Model 0 was used to test TestData all, TestData1, TestData2, and TestData3.
The test results under overall training mode are shown in Table 12. For TestData1, TestData2, and TestData3, the models with the highest test results are Models 1, 2, and 3, respectively, and all AP values for overall training mode are higher than those for separate training modes. This means that the increased number of training samples has a positive impact on detection accuracy, and compared to the Faster R-CNN algorithm, the SCRA method improves detection accuracy by 8.22%, 10.66%, and 16.11%, respectively.

The experimental results show the following:
1. For the test sets collected at different speed stages, no matter which data training mode we chose, the highest test results of TestData1, TestData2, and TestData3 are with Models 1, 2, and 3, respectively. This shows that using the vehicle speed to adjust the anchor size adaptively achieves the best detection effect.
2. By using the overall training mode, all the experimental results of Models 1, 2, and 3 were higher than with the Faster R-CNN algorithm. This shows that the rule of preceding vehicle size is effective and reasonable.
3. With increased vehicle speed, the occurrence frequency of small vehicles increased and the accuracy of Faster R-CNN was gradually reduced, but the improvement in detection accuracy of the optimized algorithm became more pronounced. This shows that the optimized anchors can match vehicle sizes, especially for small vehicles.
4. When the preceding vehicle was small and its color was close to the background, false positives sometimes occurred in preceding vehicle detection.
5. Comparing separate and overall training, separate training modes had a poor detection effect on the test sets that do not belong to the corresponding speed stage, whereas the overall training mode gave a better detection effect and better generalization ability.
For occlusion of preceding vehicles, the Faster R-CNN algorithm was trained, and NMS, Soft-NMS based on the linear weighting penalty coefficient, Soft-NMS based on the Gaussian weighting penalty coefficient (taking the influence of parameter δ into account), and optimized Soft-NMS based on the Q-SPC algorithm were each applied in testing. The detection effect of occluded preceding vehicles is shown in Figure 12. The following can be seen from the experimental results:

1. When Q = 1, compared with NMS, applying nonoptimized Soft-NMS (linear weighting, Gaussian weighting with δ = 0.3, Gaussian weighting with δ = 0.4) for occluded preceding vehicles improved detection accuracy by nearly 2%, and the effect of Soft-NMS based on the linear weighting penalty coefficient was slightly better than that of the Gaussian weighting penalty coefficient.
2. With increased Q, detection accuracy increased continuously. For the optimized Soft-NMS with linear weighting, AP reached the maximum when Q = 4; for the optimized Soft-NMS with Gaussian weighting, AP reached the maximum when Q = 6, and δ had little effect on the detection accuracy. On the whole, the effect of the linear weighting penalty coefficient was better than that of the Gaussian weighting penalty coefficient.
3. With the introduction of the Q-SPC method, for occluded vehicles, compared with the detection result when Q = 1, detection accuracy by Faster R-CNN improved by 1%-2%, and the best Q values of linear and Gaussian weighting were 4 and 6, respectively, so this method has a certain effect.
4. According to our requirements for optimizing Soft-NMS, the ability of Gaussian weighting to maintain the penalty intensity was not enough, and this is one reason why the effect of Gaussian weighting was worse compared with linear weighting.
Figure 13 shows the test results based on the optimized Faster R-CNN algorithm using the SCRA method and the nonoptimized Faster R-CNN algorithm. Figure 13d shows the test results of occluded preceding vehicles based on the optimized Soft-NMS using the Q-SPC method and nonoptimized Soft-NMS.

Conclusions
In this paper, to improve the detection accuracy of preceding vehicles, an optimized Faster R-CNN algorithm based on the SCRA and Q-SPC methods was proposed. Firstly, the reasons for degraded detection accuracy when the host vehicle speed increases were analyzed, and the factor of vehicle speed was introduced to redesign the anchors. The redesigned anchors can adapt to changes of preceding vehicle size when the host vehicle speed increases, and on this basis the SCRA method was proposed. Secondly, to achieve better performance on occluded vehicles, the Q-SPC method was proposed to optimize the Faster R-CNN algorithm. Finally, the experimental results showed that introducing the factor of host vehicle speed to make anchors adapt to vehicle size can bring a 7%-17% accuracy improvement, and raising the penalty coefficients to the power Q can bring a 1%-2% accuracy improvement for occluded vehicles. These results indicate that the SCRA and Q-SPC methods improve the accuracy of preceding vehicle detection.
In this paper, we improved the detection accuracy of preceding vehicles, and our method had no influence on detection speed; however, Faster R-CNN is a two-stage algorithm, which prevents it from working in real time. In the next step, we will try to optimize the structure of Faster R-CNN to improve the detection speed and try to extend our design to one-stage detection algorithms such as You Only Look Once (YOLO) [45] or the single shot multibox detector (SSD) [46]. Moreover, other types of vehicles such as buses and trucks and the influence of moving targets such as pedestrians and bikes were not considered. In the future, we will focus on diverse vehicle types and the impact of other road factors.