Robust Vehicle Speed Measurement Based on Feature Information Fusion for Vehicle Multi-Characteristic Detection

A robust vehicle speed measurement system based on feature information fusion for vehicle multi-characteristic detection is proposed in this paper. A vehicle multi-characteristic dataset is constructed, with which seven modern CNN-based object detection algorithms are trained for vehicle multi-characteristic detection. The FPN-based YOLOv4 is selected as the best vehicle multi-characteristic detection algorithm, as it fuses feature information of different scales carrying both rich high-level semantic information and detailed low-level location information. YOLOv4 is then improved by combining it with the attention mechanism: the residual module in YOLOv4 is replaced by the ECA channel attention module with cross-channel interaction. The resulting ECA-YOLOv4 object detection algorithm, based on both feature information fusion and cross-channel interaction, improves the performance of YOLOv4 for vehicle multi-characteristic detection while reducing the model parameter size and FLOPs. A multi-characteristic fused speed measurement system based on license plate, logo, and light is designed accordingly, and its performance is verified by experiments. The experimental results show that the speed measurement error rate of the proposed system meets the requirement of the China national standard GB/T 21255-2007, which stipulates that the speed measurement error rate should be less than 6%. The proposed system efficiently enhances vehicle speed measurement accuracy and effectively improves vehicle speed measurement robustness.


Introduction
Vehicle speed measurement is one of the most important tasks of an intelligent traffic monitoring system [1][2][3][4]. It helps to monitor speeding behavior and improve road safety. Common vehicle speed measurement methods can be divided into intrusive and nonintrusive ones. The intrusive method uses inductive loop detectors (ILDs) embedded in the road to measure the average vehicle speed, which is difficult to install and maintain due to the damage to the road [5]. The nonintrusive speed measurement methods include Radar [6], Lidar [7], and video-based speed measurement systems [8]. The Radar method uses the Doppler effect produced by the relative motion between the vehicle and the radar equipment (fixed or mobile) to measure the vehicle speed [6]. The Lidar method uses a laser to measure the distance from a fixed lidar device to the vehicle twice within a specific time interval, and calculates the average vehicle speed [7]. Both methods transmit signals, which are easily detected, making covert measurement difficult. Video-based speed measurement methods can be further divided into two categories: monocular 2D video-based methods and binocular 3D video-based methods [9]. The monocular 2D video-based methods [8,10,11] use the perspective projection relationship of 2D imaging to estimate the distance traveled by the vehicle within a fixed frame interval, thereby calculating the vehicle speed. Due to the perspective projection relationship, they can only accurately measure the speed of a vehicle in straight-line motion, not in curved motion. The binocular 3D video-based methods [12][13][14] use the stereo imaging principle of the binocular camera, which can directly calculate the distance traveled by the vehicle within a fixed frame interval, thereby calculating the vehicle speed. However, the existing methods still have disadvantages such as low object detection efficiency and lack of intelligence, which need to be further improved.
To overcome the shortcomings of the existing methods, a binocular stereovision-based vehicle speed measurement system is proposed in [14]. The system consists of three parts: vehicle characteristic detection, vehicle tracking and stereo matching, and vehicle speed and trajectory measurement. The system uses an SSD (Single-Shot Multibox Detector) network [15] optimized for license plate detection to efficiently detect the license plate in binocular stereo video, performs fast stereo matching [16,17] in the detected left-view and right-view license plate areas, calculates the 3D coordinates of the matching point pairs, eliminates the abnormal points and selects the point closest to the license plate center as the speed measurement point of the current frame, and calculates the vehicle speed by dividing the distance between the speed measurement points in consecutive frames by the frame interval. This system is of low cost and high accuracy. It can realize nonintrusive and covert vehicle speed measurement, and can simultaneously measure the speeds of multiple vehicles in different motions and on multiple lanes. The system is based on vehicle license plate detection, and the detection network is optimized for license plate detection. Moreover, the system is designed on the premise that the license plate characteristic of the vehicle can be accurately detected. However, vehicle license plate violations often occur, including no license plate, deliberately shielded or blurred license plates, fake license plates, etc. [18]. According to traffic statistical reports [19][20][21][22], license plate violations account for a relatively high proportion of traffic violations, which poses a challenge to the license-plate-based binocular stereovision vehicle speed measurement system in [14]. Once a license plate violation occurs, the system becomes invalid.
A robust binocular stereovision-based vehicle speed measurement system with vehicle multi-characteristic detection is proposed in this paper. The system model is shown in Figure 1, wherein A, B, C, and D respectively represent the multiple characteristics of the vehicle, i.e., mirror, light, logo, and license plate, and ∆S1, ∆S2, ∆S3, and ∆S4 respectively represent the displacements of the multiple characteristics within the speed measurement frame interval ∆t. As the multiple characteristics of the vehicle appear at multiple scales and multiple distances, the You Only Look Once (YOLO) v4 network is selected, and a YOLOv4-based vehicle multi-characteristic detection network is proposed, which can efficiently detect the multiple characteristics of the vehicle. Combined with the attention mechanism, the Efficient Channel Attention (ECA) module is chosen, and an ECA-YOLOv4-based vehicle multi-characteristic detection network is proposed, which further improves the detection efficiency and accuracy of the network for the multiple characteristics of the vehicle. Accordingly, a multi-characteristic combined speed measurement system based on binocular stereovision with license plate, logo, and light detection is proposed. In the normal case, this system can efficiently enhance the speed measurement accuracy. In the case of license plate violation, this system can effectively improve the speed measurement robustness, solving the problem that the system in [14] cannot measure the vehicle speed under license plate violation.
The rest of the paper is organized as follows. In Section 2, related work is discussed. In Section 3, a robust binocular stereovision-based vehicle speed measurement system with vehicle multi-characteristic detection is proposed. The main work focuses on the optimization of object detection algorithm for multiple characteristics of vehicle and the design of the multi-characteristic combined vehicle speed measurement system. In Section 4, the experimental setup and results are reported. In Section 5, the conclusion is drawn.

Object Detection Algorithm
Accurate vehicle characteristic detection, namely accurate object detection, is an important premise of the vehicle speed measurement system proposed in this paper, and is also the main research content of this paper. Object detection includes two broad categories: traditional object detection and deep learning-based object detection [23]. Traditional object detection extracts informative feature sets with handcrafted features, while deep learning-based object detection extracts informative feature sets with various end-to-end deep learning networks [24]. In recent years, thanks to the rapid development of artificial intelligence, object detection based on deep learning has become the mainstream, with better adaptability and more intelligence. Deep learning networks can be divided into Convolutional Neural Networks (CNNs) for Euclidean data and the currently state-of-the-art Graph Neural Networks (GNNs) for non-Euclidean data [25]. CNNs extract multi-scale local spatial information and fuse it to construct feature representations [26]. GNNs capture the dependency in a graph by information transfer between graph nodes, among which Graph Convolutional Networks (GCNs) have become a hot topic. An improved version of GCNs, known as miniGCNs, has been proposed, which allows training in a minibatch fashion and is capable of inferring out-of-sample data without retraining [27]. However, as the object detection in this paper is performed on captured stereo images, which are Euclidean data, CNN-based object detection is sufficient for this application scenario.
CNN-based object detection algorithms aim to detect the objects of interest in an image, determine the category of each object, and locate the bounding box of each object by self-learning the high-level features of the images. Feature extraction backbone networks include AlexNet [28], VGG [29], Inception [30], ResNet [31], DenseNet [32], DarkNet [33], CSPDarkNet [34], and so on. According to the different ways of using the extracted feature maps, object detection algorithms can be divided into three categories: object detection based on a single feature map, object detection based on the pyramid feature hierarchy, and object detection based on the Feature Pyramid Network (FPN) [35]. R-CNN [36], Fast R-CNN [37], Faster R-CNN [38], and YOLOv1 [33] are all based on a single feature map. The advantage is the fast detection speed and low memory requirement. The disadvantage is that only the last, high-level feature map is used; its resolution is low, which is not conducive to small object detection. SSD [15], RFB [39], and YOLOv2 [40] are all based on the pyramid feature hierarchy. The advantage is the simultaneous use of multi-layer feature maps. The disadvantage is that the computed low-level high-resolution feature maps are not reused, the spatial information in the low-level feature maps is not fully utilized, and the small object detection task is still not well handled.
The FPN-based object detection algorithm takes the feature pyramid hierarchy as the foundation structure and makes joint predictions from the fused feature map of each layer. Commonly used feature fusion schemes include additive fusion, elementwise multiplicative fusion, and concatenation fusion [27]. Several state-of-the-art fusion strategies include concatenation-based fusion (early fusion, middle fusion, and late fusion) and compactness-based fusion (encoder-decoder fusion and cross-fusion) [41]. The purpose of feature fusion is to combine the resulting features using different fusion strategies before the final classification. By feature fusion from the top layer to the bottom layer, the fused feature map contains both rich high-level semantic information and detailed low-level location information [42,43]. Then, the predicted joint feature pyramid hierarchy is used for object detection. To distinguish multiple objects of multiple scales in the image, multiple feature maps in the predicted joint feature pyramid hierarchy can be used. YOLOv3 [44], M2Det [45], RetinaNet [46], YOLOv4 [34], and EfficientDet [47] are all based on FPN. The advantage is that the semantic information is transferred from the high level to the low level by information fusion, effectively supplementing the low level with semantic information; features with high resolution and rich semantic information are thereby obtained, which can successfully complete the small object detection task.
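The three common fusion schemes named above can be sketched in a few lines of plain Python. This is a toy illustration on 1D feature vectors (the feature values are invented for the example); in a real FPN the operations act on multi-channel 2D feature maps:

```python
def add_fusion(a, b):
    """Elementwise additive fusion of two equally sized feature vectors."""
    return [x + y for x, y in zip(a, b)]

def mul_fusion(a, b):
    """Elementwise multiplicative fusion."""
    return [x * y for x, y in zip(a, b)]

def concat_fusion(a, b):
    """Concatenation fusion: channels are stacked, so the channel count grows."""
    return a + b

f_high = [0.2, 0.8]   # toy high-level (semantic) features
f_low = [0.5, 0.1]    # toy low-level (location) features
fused = concat_fusion(add_fusion(f_high, f_low), mul_fusion(f_high, f_low))
```

Note the design trade-off: additive and multiplicative fusion preserve the channel count, while concatenation doubles it and defers the mixing to the next convolution.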

Attention Mechanism
The essence of the attention mechanism is to locate the information of interest and suppress the useless information. In object detection network, the convolution operation module in the feature extraction backbone network can be replaced by the attention module to assign more weight to the part containing the information of interest and reduce the weight of the part containing the interference information. Thereby, the object detection accuracy can be improved.
The attention module can be divided into the spatial attention module, the channel attention module, and the mixed attention module according to the attention principle [48][49][50]. The task of the spatial attention module is to find the 2D spatial positions containing the information of interest in a single feature map. It performs average pooling and maximum pooling on each single feature map of the input features, concatenates the average pooling and maximum pooling results, applies a convolution, and generates a corresponding 2D spatial attention feature map.
The task of the channel attention module is to find the channel positions containing the information of interest among different channels. Squeeze-and-Excitation (SE) [51] and ECA [52] are typical representatives of the channel attention module. SE performs global average pooling on the feature map of each channel, captures channel-wise correlation through a shared multi-layer perceptron (MLP) with two fully connected layers, and generates a corresponding channel attention map. The two convolutions between the two fully connected layers in SE first reduce and then increase the channel dimension; ECA replaces them with a one-dimensional convolution that utilizes local cross-channel interaction information, fusing the information of each channel and its k neighbors. Dimension reduction is thus avoided by cross-channel interaction, which is itself a kind of information fusion. ECA can significantly reduce the model complexity while improving the network performance. It is an extremely lightweight channel attention module.
The task of the mixed attention module is to simultaneously find the channel positions containing the information of interest among different channels and the 2D spatial positions containing the information of interest in each channel. The Convolutional Block Attention Module (CBAM) [53] is a typical representative of the mixed attention module. It divides the attention process into two independent cascaded modules: a channel attention module and a spatial attention module. First, a channel attention map is generated by the channel attention module for the input features, and the refined features of channel attention optimization are obtained by multiplying the input features by the generated channel attention map. Then, a spatial attention map is generated by the spatial attention module for the refined features, and the features jointly optimized by channel and spatial attention are obtained by multiplying the refined features by the generated spatial attention map. The mixed attention module can refine features from both the spatial and channel dimensions at the same time, which better captures the feature information of the region of interest and provides more effective information for prediction, thus improving the performance of the object detection algorithm.

Cross-Entropy Loss
Cross-entropy is an important concept in Shannon's information theory. Cross-entropy loss is an important index for measuring the performance of an object detection classification model in deep learning training. It measures the similarity between the prediction and the actual target, and effectively avoids the problem of learning rate decline in gradient descent. Cross-entropy loss increases as the predicted probability diverges from the actual label. For multi-class classification, the cross-entropy loss function is as shown in Equation (1) [54]:

Loss = −(1/N) ∑_{i=1}^{N} ∑_{j=1}^{C} y_ij log(p_ij),   (1)

wherein C is the number of classes; N is the number of samples; y_ij is the indicator variable, which is 1 if the predicted class of sample i is the same as the actual class j, and 0 otherwise; p_ij is the predicted probability that sample i belongs to class j. The smaller the cross-entropy loss is, the better the model prediction will be.
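As a concrete illustration, the multi-class cross-entropy loss of Equation (1) can be computed in plain Python. This is a minimal sketch, not the training code of the proposed system; the sample probabilities are invented for the example:

```python
import math

def cross_entropy_loss(y_true, y_pred, eps=1e-12):
    """Mean multi-class cross-entropy: -(1/N) * sum_i sum_j y_ij * log(p_ij)."""
    total = 0.0
    for one_hot, probs in zip(y_true, y_pred):
        for y_ij, p_ij in zip(one_hot, probs):
            # eps guards against log(0) for numerically zero probabilities
            total -= y_ij * math.log(max(p_ij, eps))
    return total / len(y_true)

# Two samples, three classes: one confident and one less confident correct prediction.
y_true = [[1, 0, 0], [0, 1, 0]]
y_pred = [[0.9, 0.05, 0.05], [0.3, 0.6, 0.1]]
loss = cross_entropy_loss(y_true, y_pred)   # -(ln 0.9 + ln 0.6) / 2
```

The less confident prediction (0.6 vs. 0.9) contributes most of the loss, matching the statement that the loss increases as the predicted probability diverges from the actual label.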

The Proposed Method
The proposed system is built from two industrial cameras and a laptop. The Hikvision MV-CA050-11UC industrial camera has a resolution of 2448 × 2048, with a Wallis WL1608-5MP fixed-focus lens of 8 mm. The laptop is equipped with an Intel Core i7-10750H CPU, 16 GB RAM, and an Nvidia RTX 2060 6 GB graphics card. The stereo camera is calibrated by Zhengyou Zhang's calibration method, and the cell size of the calibration board is 30 mm × 30 mm. The system configuration is shown in Figure 2.

The whole procedure of the proposed system is shown in Figure 3, which includes three parts: vehicle multi-characteristic detection and decision, stereo matching, and speed measurement. In the vehicle multi-characteristic detection and decision part, the proposed ECA-YOLOv4 object detection network is trained on the vehicle multi-characteristic dataset to obtain a detection model. The model is used to detect the left-view and right-view images separately to obtain the multiple characteristics of the vehicle, i.e., vehicle, license plate, logo, and light. The bounding box of the detected vehicle is used to constrain the detected characteristics to the same vehicle area. Assuming the light characteristic always exists and can be detected, the speed measurement scheme is decided according to whether the license plate and the logo appear in the detection results, which can be divided into four cases: (1) three vehicle characteristics detected, i.e., license plate, logo, and light; (2) two vehicle characteristics detected, i.e., license plate and light; (3) two vehicle characteristics detected, i.e., logo and light; (4) one vehicle characteristic detected, i.e., only light. In the stereo matching part, the stereo matching algorithm in [14] is reused, while the speed measurement point selection is slightly adjusted for the multiple characteristics with irregular shapes.
In the speed measurement part, the binocular stereovision calibration algorithm in [14] is reused to perform speed measurement for each single characteristic, and the vehicle speed is then calculated according to the speed measurement scheme decided in the detection and decision part.

Vehicle Multi-Characteristic Detection Based on YOLOv4
First, the vehicle multi-characteristic dataset is constructed. We randomly select 6103 images with a resolution of 1600 × 1200 from the Open ITS dataset [55], 1921 images with a resolution of 1920 × 1080 from the BIT Vehicle dataset [56], and 280 images with a resolution of 720 × 1160 from the CCPD dataset [57]. In addition, we capture 3351 images with a resolution of 6000 × 4000 by Nikon d3200 SLR camera and 480 images with a resolution of 2448 × 2048 by the Hikvision MV-CA050-11UC camera. The dataset has a total of 12,135 vehicle images with multiple characteristics. Figure 4 shows some image examples of our vehicle multi-characteristic dataset.
According to the SPIE regulation on the relative size of small objects, an object with a pixel ratio of less than 0.12% can be regarded as a small object; otherwise, it is regarded as a regular object [58]. Statistical analysis is performed on the pixel ratios of four common vehicle characteristics (license plate, logo, light, and mirror) in images captured at different distances within the speed measurement range. As shown in Figure 5, the small-object distance threshold is about 9 m for the license plate, about 4 m for the logo, about 9 m for the light, and about 7 m for the mirror. Therefore, the detection of the multiple characteristics of the vehicle is a multi-scale object detection problem.
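The SPIE-style small-object rule described above amounts to a simple pixel-ratio check. The 0.12% threshold and the 2448 × 2048 frame size are taken from the text; the bounding-box sizes below are illustrative:

```python
def is_small_object(box_w, box_h, img_w, img_h, ratio_threshold=0.0012):
    """SPIE-style rule: a pixel ratio below 0.12% of the image marks a small object."""
    pixel_ratio = (box_w * box_h) / (img_w * img_h)
    return pixel_ratio < ratio_threshold

# A distant 44x30 px license plate in a 2448x2048 frame (~0.026%) is a small object;
# a close 400x120 px plate (~0.96%) is a regular object.
plate_far = is_small_object(44, 30, 2448, 2048)
plate_near = is_small_object(400, 120, 2448, 2048)
```

The same check, applied to detections collected at known distances, yields the per-characteristic distance thresholds reported in Figure 5.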
As mentioned above, object detection algorithms based on a single feature map or on the pyramid feature hierarchy are not suitable for this multi-scale object detection problem, while the FPN-based object detection algorithm is. Therefore, an FPN-based object detection algorithm is selected to detect the multiple characteristics of the vehicle. Among these, the FPN-based YOLOv4 is chosen due to its faster speed and higher accuracy [34]. Figure 6 is the schematic block diagram of the YOLOv4-based vehicle multi-characteristic object detection algorithm. Features are extracted from the input vehicle images by the feature extraction backbone network CSPDarknet53. The extracted features of different scales are fused by SPP and PANet. Finally, prediction is performed on the three feature maps of different scales output by PANet, so as to obtain the bounding box, category, and confidence of the multiple characteristics of the vehicle.
There are 23 Cross-Stage Partial (CSP) modules in CSPDarknet53. The CSP module enhances the learning ability of the CNN and reduces memory usage; its structure is shown in Figure 7. The CBM module is composed of convolution, batch normalization, and the Mish activation function. The structure of the residual unit is shown in Figure 8: the original input feature map is added to the feature map obtained after two convolution operations.
After three convolution operations on the last feature map of CSPDarknet53, the Spatial Pyramid Pooling (SPP) module executes maximum pooling operations at four different scales, i.e., 1 × 1, 5 × 5, 9 × 9, and 13 × 13, and the maximum pooling results of the multiple scales are then concatenated. After that, the feature maps are fused from the bottom to the top by PANet, and the fused feature maps are then fused again from the top to the bottom. Finally, the bounding box, category, and confidence of the object are predicted from the fused feature maps of three different scales.

The performance of the YOLOv4-based vehicle multi-characteristic detection algorithm is experimentally verified. Six representative object detection algorithms, i.e., Faster R-CNN, SSD, RFB, RetinaNet, M2Det, and YOLOv3, are selected for performance comparison. During the experiment, the model training parameters are the same, and the same evaluation indexes, AP and mAP, are used. The experimental results are shown in Table 1. YOLOv4 has an AP of 96.47% for car, 92.13% for license plate, 87.72% for logo, 94.17% for light, and 91.2% for mirror; its detection accuracy for each single vehicle characteristic is better than that of the other six algorithms. The mean average precision mAP of YOLOv4 is 92.34%, which is also better than that of the other six algorithms. The experimental results show that the proposed YOLOv4-based algorithm is suitable for vehicle multi-characteristic detection and performs well.
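The SPP step can be illustrated with a toy single-channel sketch. Stride-1 max pooling with "same" padding keeps every pooled map at the input size, which is what allows the four results to be concatenated along the channel axis; the real module operates per channel on deep feature maps:

```python
def max_pool_same(fmap, k):
    """Stride-1 max pooling with 'same' padding over a 2D list-of-lists map."""
    h, w = len(fmap), len(fmap[0])
    pad = k // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            vals = []
            for di in range(-pad, pad + 1):
                for dj in range(-pad, pad + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:   # skip padded positions
                        vals.append(fmap[ii][jj])
            row.append(max(vals))
        out.append(row)
    return out

feature_map = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]   # toy 3x3 single-channel map
# SPP: pool at each scale, then concatenate the results along the channel axis.
spp_out = [max_pool_same(feature_map, k) for k in (1, 5, 9, 13)]
```

On this toy map the 1 × 1 pooling is the identity and the larger kernels saturate to the global maximum; on realistic feature maps each scale captures context of a different receptive-field size.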

Vehicle Multi-Characteristic Detection Based on ECA-YOLOv4
The vehicle characteristics to be detected are local information, and the attention mechanism, which conforms to the human perception mechanism, helps to focus on local information. As mentioned above, the multiple vehicle characteristics to be detected vary over multiple scales, and a large proportion of them are small objects; the channel attention mechanism is suited to such a problem. However, the position of the vehicle characteristics in the imaging plane constantly changes as the vehicle moves, so the spatial attention mechanism is not applicable. Therefore, the channel attention module is selected to optimize the YOLOv4-based vehicle multi-characteristic detection algorithm. Of the two commonly used channel attention modules, the ECA module is an upgraded version of the SE module. Therefore, the ECA module is chosen to optimize the network structure of YOLOv4 and improve the detection performance.
In the ECA module [52], cross-channel interaction and channel weight sharing are adopted. A weight matrix W_k is defined for the cross-channel interaction, as shown in Equation (2):

W_k = ⎡ w^{1,1} ⋯ w^{1,k}      0          0     ⋯    0     ⎤
      ⎢    0    w^{2,2}  ⋯  w^{2,k+1}     0     ⋯    0     ⎥
      ⎢    ⋮       ⋮     ⋱      ⋮         ⋱     ⋱    ⋮     ⎥
      ⎣    0       ⋯     0      0    w^{C,C−k+1} ⋯ w^{C,C} ⎦   (2)
wherein W_k involves k × C parameters. For the ith channel y_i, as shown in Equation (3), the interaction of its k adjacent channels y_i^j (including y_i itself), j = 1, …, k, is considered to calculate its weight ω_i:

ω_i = σ( ∑_{j=1}^{k} w_i^j y_i^j ),   y_i^j ∈ Ω_i^k   (3)
wherein Ω_i^k represents the set composed of the k adjacent channels of y_i, and σ is the sigmoid function. After cross-channel interaction, channel weight sharing is carried out, so that all channels share the same k parameters, as shown in Equation (4):

ω_i = σ( ∑_{j=1}^{k} w^j y_i^j ),   y_i^j ∈ Ω_i^k   (4)
In this way, the number of parameters in the weight matrix W_k is reduced from k × C to k. Weight parameter learning can be achieved by a fast 1D convolution with a kernel size of k, as shown in Equation (5):

ω = σ( C1D_k(y) )   (5)

wherein C1D represents the 1D convolution operation.

As shown in Figure 9, the two CBM modules in each residual unit in each CSP module of YOLOv4 are replaced by one ECA module, improving YOLOv4 to ECA-YOLOv4. Moreover, the SE channel attention module and the CBAM mixed attention module are also used in the same way to obtain SE-YOLOv4 and CBAM-YOLOv4, respectively, for performance comparison. The model parameter sizes and floating point operations (FLOPs) of the object detection networks with different attention modules are shown in Table 2, and the detection results are shown in Table 3. From the AP of vehicle, license plate, logo, and mirror, the detection performances of CBAM-YOLOv4, SE-YOLOv4, and ECA-YOLOv4 are all superior to that of YOLOv4. From the AP of light, the detection performances of SE-YOLOv4 and ECA-YOLOv4 are superior to that of YOLOv4. From the mAP, the detection performances of CBAM-YOLOv4, SE-YOLOv4, and ECA-YOLOv4 are all superior to that of YOLOv4. From the FPS, only the detection speed of ECA-YOLOv4 is faster than that of YOLOv4. Note that for all four networks, the AP of logo is slightly lower than that of the other vehicle characteristics. This is because the shapes of logos are more diverse than those of license plates, lights, and mirrors, and a few special logos may not be accurately detected, which reduces the precision at some recall levels and thus the AP of logo. Considering the four indexes of model parameter size, AP, mAP, and FPS, ECA-YOLOv4 has the smallest model parameter size and the fastest FPS, with AP for vehicle single-characteristic detection and mAP for vehicle multi-characteristic detection superior to those of YOLOv4 and equivalent to those of CBAM-YOLOv4 and SE-YOLOv4. In summary, the performance of ECA-YOLOv4 is better than that of the other three algorithms.
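The ECA weighting — a shared k-tap 1D convolution across the global-average-pooled channel descriptors followed by a sigmoid — can be sketched in plain Python. The convolution weights here are fixed for illustration (in ECA they are learned), the channel descriptors are invented, and `eca_kernel_size` reproduces the adaptive kernel-size rule of the ECA paper with its default γ = 2, b = 1:

```python
import math

def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive odd kernel size k = |log2(C)/gamma + b/gamma|_odd (ECA defaults)."""
    t = int(abs(math.log2(channels) / gamma + b / gamma))
    return t if t % 2 == 1 else t + 1

def eca_channel_weights(y, k=3):
    """Shared k-tap 1D convolution across channel descriptors y, then a sigmoid.
    The k weights are fixed here for illustration; in ECA they are learned."""
    w = [1.0 / k] * k
    pad = k // 2
    c = len(y)
    weights = []
    for i in range(c):
        s = 0.0
        for j in range(k):
            idx = i - pad + j
            if 0 <= idx < c:              # zero padding at the channel borders
                s += w[j] * y[idx]
        weights.append(1.0 / (1.0 + math.exp(-s)))   # sigmoid
    return weights

pooled = [0.2, 1.5, -0.3, 0.8]   # toy global-average-pooled channel descriptors
attn = eca_channel_weights(pooled, k=3)
k256 = eca_kernel_size(256)      # -> 5
```

Because the k weights are shared by all channels, the parameter count is k regardless of the channel count C, which is why ECA stays so lightweight compared with the SE bottleneck.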

Optimal Design of Binocular Stereovision-Based Vehicle Speed Measurement System with Vehicle Multi-Characteristic Detection
After the multiple characteristics of the vehicle are detected, as shown in Figure 10, stereo matching is performed on each detected vehicle characteristic by the stereo matching algorithm in [14]. Thus, the stereo matching point pairs in the left-view and right-view images of the vehicle characteristic are obtained. Then, the matching point pair with the smallest sum of squared Euclidean distances to the bounding box center is selected as the speed measurement point. Zhengyou Zhang's calibration method is used to calculate the 3D coordinates of the selected speed measurement point, the displacements of adjacent speed measurement points are calculated, and the vehicle speed is obtained by dividing the displacement by the time interval. The video capture frame rate is 30 FPS, the vehicle speed is set to 43 km/h, the vehicle drives toward the camera in a straight line at a constant speed, and the speed measurement is performed ten times per second. The speed measurement data from a professional satellite speed meter, the P-Gear P-510, are taken as the ground truth for comparison. The speed measurement results of the single characteristics are shown in Table 4, and the error rate curve is shown in Figure 11. It can be seen from Table 4 that the speeds measured by the license plate, the logo, and the light are close to the satellite ground truth, meeting the 6% error rate limit specified by the China national standard GB/T 21255-2007 [59]. However, the speed measured by the mirror differs greatly from the satellite ground truth, with an error rate range of [−82.27%, 184.69%], far beyond the 6% error rate limit.
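The final speed computation described above (3D displacement of the speed measurement point divided by the elapsed time) can be sketched as follows. The point coordinates are illustrative, not measured data:

```python
import math

def vehicle_speed_kmh(p1, p2, frame_interval, fps=30):
    """Vehicle speed from two 3D speed-measurement points (in metres)
    reconstructed frame_interval frames apart at the given capture rate."""
    dx, dy, dz = (b - a for a, b in zip(p1, p2))
    displacement_m = math.sqrt(dx * dx + dy * dy + dz * dz)
    dt_s = frame_interval / fps
    return displacement_m / dt_s * 3.6   # m/s -> km/h

# Two illustrative points 3 frames (0.1 s) apart, ~1.19 m travelled -> ~43 km/h.
speed = vehicle_speed_kmh((0.0, 0.0, 20.0), (0.05, 0.0, 18.807), 3)
```

At 30 FPS, measuring "ten times per second" corresponds to a frame interval of 3, as used in the example.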
By analyzing the speed measurement procedure of the mirror, the large error is found to be mainly due to its smooth surface, which lacks detailed information such as texture. Large errors therefore occur in stereo matching, leading to incorrect speed measurements. Consequently, in the following design of the multi-characteristic speed measurement system, only three vehicle characteristics (license plate, logo, and light) are selected, and the mirror characteristic is abandoned. Assuming that the vehicle light characteristic always exists, the vehicle speed measurement algorithm of the proposed binocular stereovision vehicle speed measurement system with vehicle multi-characteristic detection is designed as Algorithm 1.
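The body of Algorithm 1 is not reproduced in this copy, so the following is only a plausible sketch of the fusion logic it describes: the per-characteristic speeds that are available (the light is assumed always detected) are combined, here by simple averaging, which is an assumption rather than the paper's exact rule:

```python
def fused_speed(v_plate=None, v_logo=None, v_light=None):
    """Combine per-characteristic speeds; only detected characteristics contribute.
    Simple averaging is an assumption, not necessarily the paper's exact rule."""
    available = [v for v in (v_plate, v_logo, v_light) if v is not None]
    if not available:
        raise ValueError("at least the light characteristic must be detected")
    return sum(available) / len(available)

# Case (1): all three characteristics detected; case (4): only the light.
v_all = fused_speed(v_plate=45.2, v_logo=44.8, v_light=45.6)
v_light_only = fused_speed(v_light=45.6)
```

The four detection cases of the decision part map directly onto which keyword arguments are supplied, which is what makes the system degrade gracefully under license plate violation.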

Experiments
In the actual speed measurement test, the video capture frame rate is 30 FPS, the vehicle drives toward the camera in a straight line at a constant speed, and the speed measurement is performed ten times per second. The speed measurement data from the professional satellite speed meter P-Gear P-510 are taken as the ground truth for comparison. To verify the algorithm, four speed measurement scenarios are set up, as shown in Figure 12.

Vehicle Speed Measurement Experiments with License Plate Detected
First, experiments are carried out in the speed measurement scenario where license plate, logo, and light all exist, as shown in Figure 12a. Table 5 shows the speed measurement results of a single experiment in this scenario, in which the vehicle drives in a straight line at a constant speed of 46 km/h.

Next, as shown in Figure 12b, the logo is artificially covered to simulate the speed measurement scenario where license plate and light exist but the logo does not, and the experiments are carried out therein. Table 7 shows the speed measurement results of a single experiment in this scenario, in which the vehicle drives in a straight line at a constant speed of 30 km/h. As shown in Table 7, the speed measurement error rates remain small, with a maximum absolute error rate of 4.58%, which also meets the 6% error rate limit specified by the China national standard GB/T 21255-2007 [59].

Vehicle Speed Measurement Experiments with License Plate Undetected
Then, as shown in Figure 12c, the license plate is artificially covered to simulate the speed measurement scenario where logo and light exist but the license plate does not, and the experiments are carried out therein. Table 9 shows the speed measurement results of a single experiment in this scenario, in which the vehicle drives in a straight line at a constant speed of 45 km/h. The maximum absolute error rate is 5.61%, which also meets the 6% error rate limit specified by the China national standard GB/T 21255-2007 [59].

Finally, as shown in Figure 12d, the license plate and the logo are artificially covered to simulate the speed measurement scenario where only the light exists, and the experiments are carried out therein. Table 11 shows the speed measurement results of a single experiment in this scenario, in which the vehicle drives in a straight line at a constant speed of 36 km/h.

Contrast Experiments
The speed measurement results of the proposed system are compared with those of the original system in [14] in the same speed measurement scenarios, including the scenario with the license plate not covered and the scenario with the license plate covered. The former verifies the speed measurement accuracy of the proposed system, while the latter verifies its robustness. Table 13 shows the speed measurement results of two contrast experiments in the scenario with the license plate not covered, in which the vehicle drives in a straight line at a constant speed of 32 km/h. Figure 13 is the corresponding speed measurement curve. As shown in Table 13, in the scenario with the license plate not covered, the speed measurement error rate range of the proposed system is [−2.23%, 3.61%] with a maximum absolute error rate of 3.61%, while the speed measurement error rate range of the system in [14] is [−3.91%, 4.53%] with a maximum absolute error rate of 4.53%. Both meet the 6% error rate limit specified by the China national standard GB/T 21255-2007 [59]. However, both the error rate range and the maximum absolute error rate of the proposed system are smaller than those of the system in [14]. As can be seen from the speed measurement curve in Figure 13, the proposed system has a smaller error fluctuation range, is closer to the satellite ground truth, and has higher speed measurement accuracy.
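The error metrics used throughout these experiments (per-sample error rate, the 6% limit check of GB/T 21255-2007, and RMSE) can be sketched as follows. The readings below are illustrative, not the paper's data:

```python
import math

def error_rates(measured, ground_truth):
    """Per-sample percentage error rates against the satellite ground truth."""
    return [(m - g) / g * 100.0 for m, g in zip(measured, ground_truth)]

def rmse(measured, ground_truth):
    """Root mean square error of the measured speeds."""
    n = len(measured)
    return math.sqrt(sum((m - g) ** 2 for m, g in zip(measured, ground_truth)) / n)

gt = [32.0, 32.0, 32.0]            # constant ground-truth speed
ms = [31.4, 32.9, 32.3]            # illustrative system readings
rates = error_rates(ms, gt)
meets_gb = max(abs(r) for r in rates) < 6.0   # GB/T 21255-2007 limit
```

Reporting both the error rate range and the RMSE, as the tables do, separates worst-case compliance from average accuracy.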
Meanwhile, the speed measurement performance is compared between the proposed system and various existing speed measurement methods. Table 14 shows a comparison of the speed measurement error between the proposed system and the other four methods in the scenario with the license plate not covered. It can be seen that both the root mean square error (RMSE) and the maximum error of the proposed system are smaller than those of the other four methods. Therefore, in the scenario with the license plate not covered, the speed measurement accuracy of the proposed method is superior to that of the other four methods; that is, the speed measurement accuracy of the system is improved.

Table 15 shows the speed measurement results of two contrast experiments in the scenario with the license plate covered, in which the vehicle drives in a straight line at a constant speed of 33 km/h. As shown in Table 15, in this scenario, the speed measurement error rate range of the proposed system is [−2.50%, 4.99%] and the maximum absolute error rate is 4.99%, which meets the 6% error rate limit specified by the China national standard GB/T 21255-2007 [59], while the system in [14] fails in the speed measurement. The proposed system can still measure the vehicle speed accurately in the scenario with the license plate covered. Table 16 shows a comparison of the speed measurement error between the proposed system and the other four methods in this scenario. It can be seen that the methods in [8,14] fail, and the RMSE and maximum error of the proposed system are smaller than those of the other two methods.
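The two comparison metrics reported in Tables 14 and 16 can be computed as below. This is a generic sketch of RMSE and maximum absolute error over paired speed samples; the example values are hypothetical and do not reproduce the paper's tables.

```python
import math

def rmse(measured, truth):
    """Root mean square error between measured speeds and ground truth (km/h)."""
    return math.sqrt(sum((m - t) ** 2 for m, t in zip(measured, truth)) / len(measured))

def max_abs_error(measured, truth):
    """Largest absolute deviation from the ground truth (km/h)."""
    return max(abs(m - t) for m, t in zip(measured, truth))

# Hypothetical samples against a 32 km/h constant-speed ground truth.
measured = [33.0, 31.0, 32.5, 31.5]
truth = [32.0] * 4
print(rmse(measured, truth), max_abs_error(measured, truth))
```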
Therefore, in the scenario with the license plate covered, the proposed method remains valid where the two failed methods do not, and its speed measurement accuracy is superior to that of the two remaining valid methods; that is, the speed measurement robustness of the system is improved.
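The robustness gain comes from having more than one trackable characteristic: when the license plate is covered, the system can still lock onto the logo or light. A rough illustration of such fallback selection is given below; the priority order and the detection-result format are our own assumptions for illustration, not the paper's exact fusion rule.

```python
# Assumed preference order: plate first, then logo, then light, so that
# covering one characteristic does not invalidate the measurement.
PRIORITY = ["license_plate", "logo", "light"]

def pick_tracking_feature(detections):
    """detections: dict mapping characteristic name -> detected box or None.
    Returns the highest-priority characteristic detected in this frame."""
    for name in PRIORITY:
        if detections.get(name) is not None:
            return name
    return None  # no usable characteristic in this frame

# Plate covered: the system falls back to the logo detection.
print(pick_tracking_feature({"license_plate": None, "logo": (10, 20, 40, 15)}))  # → logo
```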

Conclusions
In this study, we solved the problem of effectively measuring vehicle speed with a binocular stereovision system in the case of license plate violation. We proposed a robust binocular stereovision-based vehicle speed measurement system with vehicle multi-characteristic detection. We optimized the object detection algorithm for vehicle multi-characteristic detection, and thus optimized the whole vehicle speed measurement system. The FPN-based YOLOv4 object detection algorithm was selected for vehicle multi-characteristic detection, and the ECA channel attention mechanism was combined with it to improve YOLOv4. An improved ECA-YOLOv4 object detection algorithm was proposed for vehicle multi-characteristic detection, which was trained and verified on our constructed vehicle multi-characteristic dataset. The experimental results showed that the proposed ECA-YOLOv4 object detection algorithm can efficiently improve the detection accuracy of multiple vehicle characteristics while reducing the network model parameter size. Three vehicle characteristics, namely license plate, logo, and light, were chosen to design the corresponding speed measurement system based on vehicle multi-characteristic detection. Extensive experiments were carried out in different speed measurement scenarios. The experimental results show that the proposed speed measurement system efficiently improves the speed measurement accuracy and effectively solves the robustness problems of the system in [14].