# Optimized YOLOv3 Algorithm and Its Application in Traffic Flow Detections


## Abstract


## Featured Application

**This article applies a one-stage object-detection neural network to detect and count urban traffic flows in different scenarios and weather conditions. The research can provide information-based support for developing and optimizing the transportation systems of a modern smart city. When the detection results and statistical analyses of traffic flows are further applied, traffic management departments can make better decisions on road infrastructure optimization or traffic restrictions, reducing traffic congestion and accidents and improving the quality and convenience of urban life.**


## 1. Introduction

## 2. The Composition and Principle of Traffic Flow Detection System

1. First, images of size 416 × 416 are input into the Darknet-53 network. After many convolutions, a feature map of size 13 × 13 is obtained; it is then processed 7 times by 1 × 1 and 3 × 3 convolution kernels to produce the first class and bounding-box regression prediction.
2. The 13 × 13 feature map is processed 5 times by 1 × 1 and 3 × 3 convolution kernels, a 1 × 1 convolution is applied, and the result is upsampled by a factor of 2 and concatenated with the 26 × 26 feature map. The new 26 × 26 feature map is then processed 7 times by 1 × 1 and 3 × 3 convolution kernels to produce the second class and bounding-box regression prediction.
3. The new 26 × 26 feature map is first processed 5 times by 1 × 1 and 3 × 3 convolution kernels, upsampled by a factor of 2, and concatenated with the 52 × 52 feature map. The result is then processed 7 times by 1 × 1 and 3 × 3 convolution kernels to produce the third class and bounding-box regression prediction.
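The three prediction scales above follow directly from the 416 × 416 input and the network strides of 32, 16, and 8. A minimal sketch of the resulting output shapes (the function name is illustrative, and the 20-class setting matches the VOC-style data used later; YOLOv3's head emits 3 anchors per cell, each with 4 box coordinates, 1 objectness score, and C class scores):

```python
# Sketch of YOLOv3's three detection scales for a 416x416 input.
# Grid sizes follow from the network strides of 32, 16, and 8; the
# per-cell channel count of 3 anchors x (4 box coords + 1 objectness
# + C classes) is the standard YOLOv3 head layout.

def yolo_v3_output_shapes(input_size=416, num_classes=20, anchors_per_scale=3):
    """Return (grid, grid, channels) for each of the three prediction scales."""
    shapes = []
    for stride in (32, 16, 8):          # coarse -> fine: 13x13, 26x26, 52x52
        grid = input_size // stride
        channels = anchors_per_scale * (4 + 1 + num_classes)
        shapes.append((grid, grid, channels))
    return shapes

print(yolo_v3_output_shapes())  # [(13, 13, 75), (26, 26, 75), (52, 52, 75)]
```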

## 3. YOLO v3 Algorithm Optimization

#### 3.1. Network Structure Optimization

1. The input size can be ignored and a fixed-length output can be generated, which solves the problem of inconsistent input image sizes.
2. When the multi-level spatial pyramid pooling operation is used, pooling windows of several sizes are applied instead of a single sliding window, so the speed of computing the features of the entire network can be improved.
3. The spatial pyramid module divides the feature map into bins at different levels, calculates the features of each level, and finally fuses the features of all levels together. This is beneficial when the target sizes in the images to be detected differ greatly; in particular, the complex multi-target detection of YOLOv3 is improved, so the detection accuracy is greatly increased.
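The fixed-length property in point 1 can be illustrated with a small sketch. This is not the paper's implementation; it is a plain-Python, single-channel version of SPP-style pooling (levels 1 × 1, 2 × 2, 4 × 4 are assumptions) showing that the output length does not depend on the input size:

```python
# Minimal sketch of spatial-pyramid-style pooling: a feature map of any
# spatial size is max-pooled into fixed grids (here 1x1, 2x2, 4x4) and the
# results are concatenated, so the output length is fixed regardless of
# input size. Bin edges use floor/ceil as in SPP-net.

import math

def spp_pool(feature_map, levels=(1, 2, 4)):
    """feature_map: 2D list (H x W) for one channel; returns a fixed-length list."""
    h, w = len(feature_map), len(feature_map[0])
    out = []
    for n in levels:                      # n x n pooling grid per level
        for i in range(n):
            for j in range(n):
                r0, r1 = (i * h) // n, math.ceil((i + 1) * h / n)
                c0, c1 = (j * w) // n, math.ceil((j + 1) * w / n)
                out.append(max(feature_map[r][c]
                               for r in range(r0, r1)
                               for c in range(c0, c1)))
    return out  # length 1 + 4 + 16 = 21, independent of (h, w)
```

Feeding a 2 × 2 map or a 6 × 9 map both yield a 21-element vector, which is what lets the network accept inconsistent input sizes.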

#### 3.2. Loss Function Optimization

Suppose that B^gt = (x^gt, y^gt, w^gt, h^gt) is the real (ground-truth) box and B = (x, y, w, h) is the prediction box. Usually, the distance between the bounding boxes is measured from the coordinates of B and B^gt using the l_n-norm (n = 1 or 2) loss function. In recent years, the IOU loss has usually been used to improve the IOU index directly:

$$L_{IOU} = 1 - \frac{\left| B \cap B^{gt} \right|}{\left| B \cup B^{gt} \right|}$$

The DIoU loss adds a penalty on the distance between the box centers:

$$L_{DIoU} = 1 - IOU + \frac{\rho^{2}\left( b, b^{gt} \right)}{c^{2}}$$

where b and b^gt represent the center points of the prediction box and the real box, respectively, and ρ denotes the Euclidean distance between the two center points; the calculated distance is the value of d in Figure 5. c represents the diagonal length of the smallest closed area that can contain both the prediction box and the real box.
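The quantities described above can be sketched numerically. This is a hedged illustration, not the paper's code: boxes are assumed to be in center form (x, y, w, h), matching the definitions of B and B^gt:

```python
# Illustrative IOU and DIoU computations for boxes in center form
# (x, y, w, h). DIoU adds the normalized squared center distance
# rho^2(b, b_gt) / c^2, where c is the diagonal of the smallest box
# enclosing both (the d and c of Figure 5).

def iou(box_a, box_b):
    ax0, ay0 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax1, ay1 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx0, by0 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx1, by1 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2
    iw = max(0.0, min(ax1, bx1) - max(ax0, bx0))   # intersection width
    ih = max(0.0, min(ay1, by1) - max(ay0, by0))   # intersection height
    inter = iw * ih
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union, (ax0, ay0, ax1, ay1), (bx0, by0, bx1, by1)

def diou_loss(pred, gt):
    v, a, b = iou(pred, gt)
    # squared Euclidean distance between the two centers (d in Figure 5)
    rho2 = (pred[0] - gt[0]) ** 2 + (pred[1] - gt[1]) ** 2
    # diagonal of the smallest enclosing box (c in Figure 5)
    cx = max(a[2], b[2]) - min(a[0], b[0])
    cy = max(a[3], b[3]) - min(a[1], b[1])
    c2 = cx ** 2 + cy ** 2
    return 1.0 - v + rho2 / c2
```

For identical boxes the loss is 0; for non-overlapping boxes it exceeds 1, so the gradient still pulls the prediction toward the target, which is the practical advantage over the plain IOU loss.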

## 4. Making the Data Set

1. Collecting daytime, dusk, evening, and rainy pictures from the DETRAC data set, for a total of 6203 pictures;
2. Combining the 6203 pictures with the VOC_2007 data set to make a DL_CAR data set containing 26,820 pictures;
3. Randomly extracting 80% of the DL_CAR data set to make a training-verification set;
4. Randomly extracting 80% of the training-verification set to make the training set;
5. Splitting the remaining 20% of the DL_CAR data set into the verification set and the test set in a 1:1 ratio;
6. Organizing the data set according to the structure of the VOC data set; the folder structure of the VOC data set is shown in Figure 6;
7. Using OpenCV to read all the images in the folder, naming them in the order of reading, and unifying the format to facilitate later statistics.
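The 80%/80%/1:1 split described above can be sketched as follows. The function name is illustrative, the counts assume the 26,820-image DL_CAR set, and the fractional 80% cut of the training-verification set is truncated to an integer (an assumption; the paper does not state the rounding rule):

```python
# Sketch of the DL_CAR split: 80% of the data set -> training-verification
# set, 80% of that -> training set; the remaining 20% of the data set is
# split 1:1 into verification and test sets. Indices stand in for images.

import random

def split_dataset(n_images, seed=0):
    idx = list(range(n_images))
    random.Random(seed).shuffle(idx)            # random extraction
    cut = int(n_images * 0.8)
    trainval, rest = idx[:cut], idx[cut:]       # 80% / 20%
    train = trainval[:int(len(trainval) * 0.8)] # 80% of trainval
    val, test = rest[:len(rest) // 2], rest[len(rest) // 2:]  # 1:1
    return train, trainval, val, test

train, trainval, val, test = split_dataset(26820)
print(len(trainval), len(train), len(val), len(test))  # 21456 17164 2682 2682
```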

1. Using the mouse to select and frame the target vehicle area;
2. Double-clicking to mark the corresponding target category;
3. Clicking "Save" after marking.

In each image label, the first four columns store the target box coordinates (x_c, y_c, w, h), and columns 5–24 represent the object class sequence number. Next, we parse the XML file and take out all target categories in the file together with their coordinate values (x_min, y_min, x_max, y_max) for the upper-left and lower-right corners. These values are then multiplied by ratio values according to the 448 × 448 image scaling factor to obtain (x1_min, y1_min, x2_max, y2_max). Subsequently, Equation (6) is used to convert the coordinates into the form of center-point coordinates, and Equation (7) is used to calculate which grid cell the target center falls into. In the image label, the grid confidence is set to 1, the center-point coordinates are set to the calculation results of Equations (6) and (7), and the corresponding target-category index is set to 1.
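A hedged sketch of this label-building step follows. The exact forms of Equations (6) and (7) are not reproduced in the extracted text, so the conversion below uses the standard corner-to-center form and a floor-division grid assignment; the function name and the 7 × 7 grid are assumptions for illustration:

```python
# Illustrative label conversion: corner coordinates from the XML annotation
# are scaled to the 448x448 network input, converted to center form
# (x_c, y_c, w, h), and the target center is assigned to a grid cell.

def make_label(x_min, y_min, x_max, y_max, img_w, img_h, input_size=448, grid=7):
    sx, sy = input_size / img_w, input_size / img_h          # scaling ratios
    x1, y1, x2, y2 = x_min * sx, y_min * sy, x_max * sx, y_max * sy
    x_c, y_c = (x1 + x2) / 2, (y1 + y2) / 2                  # center form
    w, h = x2 - x1, y2 - y1
    cell = input_size / grid
    col, row = int(x_c // cell), int(y_c // cell)            # grid the center falls into
    return (x_c, y_c, w, h), (row, col)
```

For example, a box covering the upper-left quadrant of a 448 × 448 image gets center (112, 112), which falls into grid cell (1, 1) of a 7 × 7 grid.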

## 5. Experiment and Analysis of Results

#### 5.1. Experimental Platform

#### 5.2. Network Training

#### 5.3. Analysis of Experimental Data

#### 5.3.1. Experimental Evaluation Parameters

In the 11-point calculation, P_r is taken as the maximum precision over the segment of recalls greater than or equal to each recall point, and the average of these 11 maximum values is then calculated. In practice, we do not directly calculate the PR curve but smooth it: for each point on the PR curve, the value of Precision is taken as the largest Precision to the right of that point.
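The smoothing and 11-point averaging described above can be sketched directly (the function name is illustrative, and the input is assumed to be a PR curve sampled at increasing recall values):

```python
# Sketch of the 11-point interpolated AP: the PR curve is smoothed so each
# precision becomes the maximum precision at any recall to its right, then
# precision is sampled at recalls 0.0, 0.1, ..., 1.0 and averaged.

def ap_11_point(recalls, precisions):
    """recalls: increasing list; precisions: matching list of the PR curve."""
    total = 0.0
    for t in [i / 10 for i in range(11)]:
        # max precision over all points with recall >= t (0 if none exists)
        candidates = [p for r, p in zip(recalls, precisions) if r >= t]
        total += max(candidates) if candidates else 0.0
    return total / 11

# A detector that is perfect up to recall 0.5 and never reaches higher recall:
print(round(ap_11_point([0.1, 0.3, 0.5], [1.0, 1.0, 1.0]), 3))  # 0.545
```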

#### 5.3.2. Comparative Analysis of Different Algorithm Experiments

#### 5.3.3. Comparative Analysis of Experiments in Different Scenarios

#### 5.3.4. Video Stream Experimental Data Analysis

## 6. Conclusions

## Author Contributions

## Acknowledgments

## Conflicts of Interest

## References

- Liu, Y. Big Data Technology and Its Analysis of Application in Urban Intelligent Transportation System. In Proceedings of the International Conference on Intelligent Transportation, Big Data & Smart City (ICITBS), Xiamen, China, 25–26 January 2018; pp. 17–19.
- Xu, Y.Z.; Yu, G.Z.; Wang, Y.P.; Wu, X.K.; Ma, Y.L. A Hybrid Vehicle Detection Method Based on Viola-Jones and HOG + SVM from UAV Images. Sensors **2016**, 16, 1325–1348.
- Qiu, Q.J.; Yong, L.; Cai, D.W. Vehicle detection based on LBP features of the Haar-like Characteristics. In Proceedings of the 11th World Congress on Intelligent Control and Automation, Shenyang, China, 29 June–4 July 2014; pp. 1050–1055.
- Felzenszwalb, P.F.; Girshick, R.B.; Mcallester, D.; Ramanan, D. Object Detection with Discriminatively Trained Part-Based Models. IEEE Trans. Pattern Anal. Mach. Intell. **2010**, 32, 1627–1645.
- Prakash, J.S.; Vignesh, K.A.; Ashok, C.; Adithyan, R. Multi class Support Vector Machines classifier for machine vision application. In Proceedings of the International Conference on Machine Vision and Image Processing (MVIP), Taipei, Taiwan, 14–15 December 2012; pp. 197–199.
- Kenan, M.U.; Hui, F.; Zhao, X.; Prehofer, C. Multiscale edge fusion for vehicle detection based on difference of Gaussian. Optik **2016**, 127, 4794–4798.
- Lecun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE **1998**, 86, 2278–2324.
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 24–27 June 2014; pp. 580–587.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. IEEE Trans. Pattern Anal. Mach. Intell. **2015**, 37, 1904–1916.
- Thakar, V.; Saini, H.; Ahmed, W.; Soltani, M.M.; Aly, A.; Yu, J.Y. Efficient Single-Shot Multibox Detector for Construction Site Monitoring. In Proceedings of the 4th IEEE International Smart Cities Conference (ISC2), Kansas City, MO, USA, 16–19 September 2018; pp. 1–6.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
- Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 6517–6525.
- Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 1854–1862.
- Liu, J.; Huang, Y.; Peng, J.; Yao, J.; Wang, L. Fast Object Detection at Constrained Energy. IEEE Trans. Emerg. Top. Comput. **2018**, 6, 409–416.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. **2017**, 39, 1137–1149.
- Tesema, F.B.; Lin, J.; Ou, J.; Wu, H.; Zhu, W. Feature Fusing of Feature Pyramid Network for Multi-Scale Pedestrian Detection. In Proceedings of the 15th International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP), Chengdu, China, 14–16 December 2018; pp. 10–13.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Cao, C.Y.; Zheng, J.C.; Huang, Y.Q.; Liu, J.; Yang, C.F. Investigation of a Promoted You Only Look Once Algorithm and Its Application in Traffic Flow Monitoring. Appl. Sci. **2019**, 9, 3619.
- Huang, G.; Liu, Z.; Maaten, L.V.D.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 234–258.
- Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression. 2019. Available online: https://arxiv.org/abs/1911.08287 (accessed on 9 March 2020).
- Rezatofighi, H.; Tsoi, N.; Gwak, J.; Sadeghian, A.; Reid, I.; Savarese, S. Generalized Intersection Over Union: A Metric and a Loss for Bounding Box Regression. In Proceedings of the 32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–21 June 2019; pp. 658–666.

**Figure 7.** Comparison of vehicle detection based on optimized models: (**a**) YOLOv3 algorithm detection results and (**b**) YOLOv3-DL algorithm detection results.

Data Sets | Actual Number | Counted Number | Precision (%) | Recall (%) | Accuracy Rate (%) |
---|---|---|---|---|---|
Training set | 111,351 | 112,422 | 94.23 | 95.14 | 95.14 |
Training verification set | 139,098 | 140,427 | 94.24 | 95.14 | 95.14 |
Verification set | 27,747 | 27,982 | 94.36 | 95.16 | 95.16 |
Test set | 34,760 | 35,416 | 93.22 | 94.98 | 95.01 |

Data Sets | Actual Number | Counted Number | Precision (%) | Recall (%) | Accuracy Rate (%) |
---|---|---|---|---|---|
Training set | 111,351 | 113,929 | 96.73 | 98.97 | 98.97 |
Training verification set | 139,098 | 140,937 | 96.77 | 98.05 | 98.97 |
Verification set | 27,747 | 28,374 | 96.92 | 99.11 | 99.11 |
Test set | 34,760 | 35,870 | 95.82 | 98.88 | 98.83 |

Scenario | Actual Number | Counted Number | Precision (%) | Recall (%) | Accuracy Rate (%) |
---|---|---|---|---|---|
Sunny | 6823 | 6837 | 94.85 | 95.05 | 95.05 |
Cloudy | 5922 | 5968 | 94.52 | 95.26 | 95.25 |
Rainy | 8517 | 8629 | 93.49 | 94.73 | 94.69 |
Night | 5731 | 5825 | 93.74 | 95.27 | 95.26 |

Scenario | Actual Number | Counted Number | Precision (%) | Recall (%) | Accuracy Rate (%) |
---|---|---|---|---|---|
Sunny | 6823 | 6861 | 98.33 | 98.87 | 98.91 |
Cloudy | 5922 | 6031 | 98.17 | 99.99 | 99.97 |
Rainy | 8517 | 8694 | 97.59 | 99.61 | 99.56 |
Night | 5731 | 5852 | 97.88 | 99.95 | 99.94 |

Video | Actual Number | Counted Number | Accuracy (%) |
---|---|---|---|
Test-1.mp4 | 50 | 44 | 88 |
Test-2.mp4 | 67 | 62 | 92.5 |
Test-3.mp4 | 25 | 23 | 92 |

Video | Actual Number | Counted Number | Accuracy (%) |
---|---|---|---|
Test-1.mp4 | 50 | 49 | 98 |
Test-2.mp4 | 67 | 66 | 98.5 |
Test-3.mp4 | 25 | 25 | 100 |

Algorithm | Accuracy (%) | Time/ms |
---|---|---|
ViBe | 96.2 | 158 |
Faster R-CNN | 83.5 | 85 |
SSD | 85.8 | 54 |
YOLOv3 | 90.8 | 32 |
YOLOv3-DL | 98.8 | 25 |

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Huang, Y.-Q.; Zheng, J.-C.; Sun, S.-D.; Yang, C.-F.; Liu, J.
Optimized YOLOv3 Algorithm and Its Application in Traffic Flow Detections. *Appl. Sci.* **2020**, *10*, 3079.
https://doi.org/10.3390/app10093079
