Sensors · Article · Open Access
30 December 2022
Robust Wheel Detection for Vehicle Re-Identification

Oak Ridge National Laboratory, Oak Ridge, TN 37830, USA
* Author to whom correspondence should be addressed.
This article belongs to the Special Issue Monitoring System for Aircraft, Vehicle and Transport Systems

Abstract

Vehicle re-identification is a demanding and challenging task in automated surveillance systems. The goal of vehicle re-identification is to associate images of the same vehicle in order to identify its re-occurrences. Robust re-identification of individual vehicles requires reliable and discriminative features extracted from specific parts of the vehicle. In this work, we construct an efficient and robust wheel detector that precisely locates and selects vehicular wheels from vehicle images. The associated hubcap geometry can hence be utilized to extract fundamental signatures from vehicle images and exploit them for vehicle re-identification. Wheel pattern information can also yield additional identifying information about the vehicles in question. To that end, we utilized a vehicle imagery dataset containing thousands of side-view vehicle images collected under different illumination conditions and elevation angles. The collected dataset was used for training and testing the wheel detector. Experiments show that our approach detects vehicular wheels accurately for 99.41% of the vehicles in the dataset.

1. Introduction

Extracting discriminating features is an essential part of target classification and re-identification (re-ID). The extracted features can be utilized to re-identify the target accurately and precisely. Accurate and efficient target representation is crucial for real-time decision making. Recent developments in deep learning have opened many possibilities for building efficient target re-ID models. Computer vision has seen a wealth of research activity in object detection, classification, and re-ID, each approach with its own strengths and limitations.
Vehicle re-ID has become a challenging problem in computer vision and intelligent transportation systems. The objective of vehicle re-ID is to match a target vehicle across different captured images. Challenges facing vehicle re-ID include the lack of sufficient or realistic surveillance data, changes in viewpoint, varying illumination conditions, and occlusions. Due to ongoing advancements in computing, neural network architectures, and dataset collection, real-time vehicle re-ID is now a realizable goal. Vehicle re-ID is a vital task in intelligent transportation systems due to its broad applicability. Vehicle re-ID methods can be utilized in many significant real-world applications, including but not limited to suspicious vehicle search, road access restriction management, cross-camera vehicle tracking, traffic time estimation, toll collection, traffic behavior analysis, vehicle counting, access control, and border control.
Conventional vehicle re-identification methods such as license plate recognition and radio-frequency identification (RFID) tags [1,2] have long been used for this purpose. Unfortunately, license-plate-based methods can often be unreliable because of camera imperfections, occluded characters, and plates that are not visible due to pose variations.
In this paper, we propose to develop a foundational capability that helps precisely detect and extract wheels from vehicle images. Our approach will aid vehicle re-ID and tracking not only by enabling comparison of the wheels and their associated geometry, but also by enabling precise alignment of the overall vehicle image by its wheel coordinates for subsequent comparison. To the best of our knowledge, this is the first work that addresses vehicular wheel detection and selection using a post-processing approach. Developing a precise wheel detector thus provides supplementary information and helps to discriminate between very similar-looking vehicles. As a result, similar-looking vehicles can be distinguished if their hubcaps are distinct, as shown in Figure 1. For that purpose, we utilized a large vehicle imagery dataset representing diverse illumination conditions, backgrounds, and elevation angles [3,4].
Figure 1. Similar looking vehicles with different wheels.
The rest of the paper is organized as follows. In Section 2, we describe related work. In Section 3, the dataset structure is described. In Section 4, we define our approach and present our experimental results, while Section 5 provides concluding remarks and summarizes future work.

3. Dataset Description

To validate the proposed wheel detector model, we utilized a large dataset of vehicular imagery consisting of thousands of side-view images. The collected dataset was published in [3]. A roadside sensor system was employed to collect vehicle imagery using various cameras and a radar unit. Side-view vehicle images were taken from distances ranging between 1 and 20 m using 1.8-mm to 6-mm lenses. Vehicle images were collected during both day and night over the course of several years. The roadside cameras were placed both at ground level and at elevated positions, providing a clear profile view of passing vehicles. The collections were done in speed zones ranging from 25 to 45 mph and near intersections where vehicles may have slowed or stopped. Moreover, license plate readers were employed to create a ground-truth label for each vehicle. Actual license plate numbers were replaced with anonymized identifiers, which were used as vehicle labels. In Figure 2, we show some sample images from the PRIMAVERA dataset.
Figure 2. Sample images from the PRIMAVERA dataset.
The PRIMAVERA dataset has 636,246 images picturing 13,963 vehicles. Each vehicle has a different number of images depending on the number of times it passed by the cameras.

4. Experiments and Results

In this section, we elaborate on the structure of our wheel detector and evaluate its performance on the dataset.

4.1. Vehicle Detection

A vehicle detector was trained to detect vehicles in video frames using a part of the dataset. The training data for the vehicle detector consisted of 543,926 images representing 11,918 vehicles. To that end, we retrained an SSD MobileNet V2 network [27] to detect and locate vehicles using the training set. SSD is a single-shot object detection model implemented using the Caffe framework, and the pre-trained SSD model had been trained on the Common Objects in Context (COCO) image dataset [28]. The pre-trained SSD provides good performance for locating the vehicle bounding box in each frame. The output of the vehicle detection network is a bounding box that provides the coordinates of the vehicle location in each image. We crop the vehicle image and resize it to 300 × 300 while preserving the aspect ratio. Specifically, we re-scale the bounding box coordinates so that they correspond to a square around the vehicle, crop the vehicle using the x-axis and y-axis coordinates of that square bounding box, and then scale the cropped image while preserving the original aspect ratio; the remainder of the cropped image is padded with zeros. All images in the dataset are flipped such that the vehicle faces to the right, with the direction of travel determined by a tracking algorithm applied to the original image sequence. The goal of this step is to remove any pose variability present in the data. In Figure 3, we show an example of a vehicle image before and after cropping; a sketch of the cropping procedure follows the figure.
Figure 3. Before and after vehicle detection.
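As a concrete illustration of the cropping and normalization described above, the following is a minimal Python sketch. It assumes an OpenCV-style image array, a normalized (x_min, y_min, x_max, y_max) box from the vehicle detector, and a facing_left flag supplied by the tracking algorithm; the helper name and rounding details are illustrative, not our released implementation.

```python
import numpy as np
import cv2  # assumed available; any image-resize routine would work


def crop_and_normalize(frame, box, facing_left=False, out_size=300):
    """Crop a detected vehicle to a square box, pad with zeros, resize to
    out_size x out_size, and flip so the vehicle faces right.

    `box` holds normalized (x_min, y_min, x_max, y_max) coordinates from the
    vehicle detector; `facing_left` comes from the tracking algorithm.
    """
    h, w = frame.shape[:2]
    x0, y0, x1, y1 = box[0] * w, box[1] * h, box[2] * w, box[3] * h

    # Expand the shorter side so the box becomes a square around the vehicle.
    side = int(round(max(x1 - x0, y1 - y0)))
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    x0, y0 = int(round(cx)) - side // 2, int(round(cy)) - side // 2
    x1, y1 = x0 + side, y0 + side

    # Crop (clamped to the frame) and paste into a zero-padded square canvas,
    # so the vehicle's aspect ratio is preserved.
    crop = frame[max(y0, 0):min(y1, h), max(x0, 0):min(x1, w)]
    canvas = np.zeros((side, side, 3), dtype=frame.dtype)
    canvas[:crop.shape[0], :crop.shape[1]] = crop

    # The canvas is already square, so a plain resize keeps the aspect ratio.
    out = cv2.resize(canvas, (out_size, out_size))

    # Flip horizontally so every vehicle faces right, removing pose variability.
    return np.fliplr(out) if facing_left else out
```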

4.2. Wheel Detection

Similar to the vehicle detector, we trained a wheel detector to detect and locate vehicle wheels in video frames. We introduced an older version of the wheel detector in a previous paper [4]. The wheels were manually labeled for 4077 images using LabelImg [29]. The training set was constructed to represent diverse types of vehicles such as sedans, SUVs, trucks, vans, and big rigs, and we also included images taken under different lighting conditions. We subsequently retrained the SSD MobileNet V2 network [27] to provide bounding box information for the vehicles' wheels. Each detection is accompanied by a confidence score. In Figure 4, we show an example of detected wheels; a sketch of this detection and confidence-filtering step follows the figure. From the figure, it can be seen that some of the detections may not be accurate. In Section 4.3, we elaborate on how we select the best candidate pair of wheels from the wheel detector output.
Figure 4. A Vehicle image and its detected wheels before wheel selection.
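For illustration, here is a minimal sketch of how detections from the retrained SSD might be gathered and pre-filtered by confidence. The detector callable, its return format, and the 0.5 cut-off are assumptions made for this sketch, not the exact interface of our code.

```python
from typing import Callable, List, Sequence, Tuple

# Normalized box convention used throughout: (x_left, y_bottom, x_right, y_top).
Box = Tuple[float, float, float, float]


def detect_wheels(vehicle_img,
                  detector: Callable[..., Sequence[Tuple[Box, float]]],
                  min_score: float = 0.5) -> List[Tuple[Box, float]]:
    """Run the wheel detector on a cropped 300 x 300 vehicle image and keep
    only detections whose confidence score exceeds `min_score`.

    `detector` is assumed to wrap the retrained SSD MobileNet V2 model and to
    return (bounding_box, confidence) pairs; the threshold is illustrative.
    """
    detections = detector(vehicle_img)
    return [(box, score) for box, score in detections if score >= min_score]
```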

4.3. Wheel Selection

In this section, we elaborate on how we select the best candidate pair of wheels from the wheel detector output. Each wheel detection is represented by a bounding box. Each bounding box has four coordinates $[x_{left}, y_{bottom}, x_{right}, y_{top}]$, where each coordinate is normalized to the range $[0, 1]$. We investigated the wheel location statistics using a portion of the dataset. We manually verified and labeled the wheel locations for 80,562 images representing 2128 different vehicles. In Figure 5, we show the four measures that are utilized in our approach to select the best pair of wheel candidates for each vehicle image.
Figure 5. The measures utilized in our approach; y-axis coordinate, back wheel location, front wheel location, and the distance between wheel centers.
The first measure we considered is the vertical location of the wheels with respect to the vehicle bounding box. Figure 6 shows the histogram of the $y_{top}$ coordinate. From the histogram, it can be inferred that the wheels lie in the lower half of the box.
Figure 6. Wheels y-max co-ordinates histogram.
Next, we show the histograms of the wheel locations along the x-axis. Figure 7 shows the histogram of the back wheel's $x_{right}$ coordinate, while Figure 8 shows the histogram of the front wheel's $x_{left}$ coordinate; a sketch of how these statistics are gathered follows Figure 8. From the plots, it can be concluded that the back wheel typically lies in the rear third of the image while the front wheel is located in the front third of the box, assuming the vehicle is moving from left to right.
Figure 7. Back-wheel x-max co-ordinate histogram.
Figure 8. Front-wheel x-min co-ordinate histogram.
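The coordinate statistics behind Figures 6–8 can be gathered with a short routine like the sketch below. It assumes the manually verified wheel boxes are available per image in the normalized [x_left, y_bottom, x_right, y_top] format, with vehicles already flipped to face right; the function name and data layout are illustrative.

```python
import numpy as np


def wheel_location_stats(per_image_boxes):
    """Collect the coordinate statistics plotted in Figures 6-8.

    `per_image_boxes` is assumed to be a list in which each entry holds the
    two labeled wheel boxes of one image as normalized
    [x_left, y_bottom, x_right, y_top] sequences.
    """
    y_top, back_x_right, front_x_left = [], [], []
    for boxes in per_image_boxes:
        # With every vehicle flipped to face right, the back wheel is the
        # left-most box in the image and the front wheel the right-most one.
        back, front = sorted(boxes, key=lambda b: b[0])
        y_top.extend([back[3], front[3]])   # Figure 6
        back_x_right.append(back[2])        # Figure 7
        front_x_left.append(front[0])       # Figure 8
    return np.asarray(y_top), np.asarray(back_x_right), np.asarray(front_x_left)
```

Histograms of the returned arrays (e.g., with numpy.histogram) reproduce the distributions discussed above.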
In addition to the wheel locations in the video frames, we also investigated the distance between the wheel centers, i.e., the wheelbase. The wheelbase is the horizontal distance between the centers of the front and rear wheels. The wheelbase is crucial for vehicle stability on the road and impacts the turning circle of a vehicle: vehicles with a longer wheelbase are more stable at highway speeds but are harder to maneuver in tight turns. In Figure 9, the wheelbase histogram is shown. From the plot, it is clear that the wheelbase usually lies between 50% and 70% of the bounding box dimension. In our experiments, the bounding box coordinates are thresholded to filter out incorrect wheel detections and retain the wheels that are physically plausible. In particular, the thresholds applied to the $y_{top}$ coordinate and the wheelbase are optimized to achieve the highest wheel-detection accuracy; to that end, we utilized parameter optimization loops to iterate over different thresholds within reasonable ranges. This pre-processing step is referred to here as “Wheel Selection”, and a sketch of it follows Figure 9.
Figure 9. The distance between wheel-centers (wheelbase) histogram.
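The sketch below shows one way the “Wheel Selection” thresholds could be applied. The numeric thresholds are illustrative stand-ins (in our experiments they are optimized by the parameter sweep described above), and the coordinate convention assumes y grows downward so that values above 0.5 lie in the lower half of the box.

```python
# Illustrative thresholds; the paper tunes these with a parameter sweep.
Y_TOP_MIN = 0.5               # wheels lie in the lower half of the box (Figure 6)
WHEELBASE_RANGE = (0.5, 0.7)  # wheelbase as a fraction of the box width (Figure 9)


def select_wheel_pair(detections):
    """Pick a physically plausible (back, front) wheel pair from a list of
    (box, score) detections, where boxes are normalized
    [x_left, y_bottom, x_right, y_top]. Returns None if no pair qualifies."""
    # Discard boxes whose vertical position is implausible for a wheel.
    plausible = [(b, s) for b, s in detections if b[3] >= Y_TOP_MIN]

    best_pair, best_conf = None, -1.0
    for i in range(len(plausible)):
        for j in range(i + 1, len(plausible)):
            (b1, s1), (b2, s2) = plausible[i], plausible[j]
            back, front = sorted((b1, b2), key=lambda b: b[0])
            # Wheelbase: horizontal distance between the two box centers.
            wheelbase = (front[0] + front[2]) / 2 - (back[0] + back[2]) / 2
            in_range = WHEELBASE_RANGE[0] <= wheelbase <= WHEELBASE_RANGE[1]
            # Among the pairs that pass both tests, keep the most confident one.
            if in_range and s1 + s2 > best_conf:
                best_pair, best_conf = (back, front), s1 + s2
    return best_pair
```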
In addition to thresholding the wheel coordinates, we also aligned the wheel detections. To that end, we used the random sample consensus (RANSAC) method to fit a line to the wheel centers having high confidence scores. RANSAC is an iterative approach for estimating a model from data that contain outliers: it identifies the outliers and fits the model to the remaining inlier data. We subsequently filter out detections whose centers lie too far away from that line. This filtering step is referred to here as “Wheel Alignment”; a sketch of it is shown below.
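The following is a minimal sketch of the wheel alignment step, using scikit-learn's RANSACRegressor as one possible off-the-shelf RANSAC line fitter; the confidence and distance thresholds here are illustrative, not the values used in our experiments.

```python
import numpy as np
from sklearn.linear_model import RANSACRegressor


def align_wheels(centers, scores, score_min=0.8, max_offset=0.05):
    """Fit a line through the centers of high-confidence wheel detections with
    RANSAC and keep only detections whose center lies close to that line.

    `centers` is an (N, 2) array of normalized (x, y) wheel-box centers and
    `scores` holds the corresponding detector confidences. Returns a boolean
    mask over all N detections.
    """
    centers = np.asarray(centers, dtype=float)
    scores = np.asarray(scores, dtype=float)
    confident = scores >= score_min

    # Robustly fit y = m*x + b to the high-confidence centers; RANSAC ignores
    # outlying centers when estimating the line.
    ransac = RANSACRegressor()
    ransac.fit(centers[confident][:, :1], centers[confident][:, 1])

    # Drop detections whose vertical distance to the fitted line is too large.
    residuals = np.abs(ransac.predict(centers[:, :1]) - centers[:, 1])
    return residuals <= max_offset
```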

4.4. Experimental Results

In the following, the results of our wheel detection approach are shown. After cropping and resizing the vehicle images, we applied our wheel detector to obtain bounding box information for the wheels. We then filtered out incorrect wheel detections using the wheel selection and wheel alignment processes explained in the previous section. In Figure 10, we show some examples of detected wheels after applying wheel selection and wheel alignment.
Figure 10. Vehicle images and their detected wheels before and after wheel selection.
The validation set has 80,562 images representing 2128 vehicles. Of the 80,562 images, 76,547 have two or more wheel detections. Each vehicle can have a different number of frames. In Table 1, we compare the detection performance of the wheel selection + wheel alignment approach versus using wheel alignment only.
Table 1. Wheel Detection Approach Performance. (Wheel Selection + Wheel Alignment vs. Wheel Alignment).
To better assess the potential contribution of our proposed approach, we compare it with a baseline method. The baseline approach selects the two wheels with the highest confidence in each image. Figure 4 shows an example of the three wheel detections with the highest confidence scores; the baseline method simply picks the two highest-confidence detections as the predicted wheel pair. Accuracy here is defined as the percentage of images for which the Euclidean distance between the center of a detected wheel bounding box $D_P$ and the center of the corresponding ground-truth box $D_{GT}$ is less than 3% of the bounding box dimension, i.e., $\sqrt{(D_{GT,1} - D_{P,1})^2 + (D_{GT,2} - D_{P,2})^2} < 3\%$. The detected wheel bounding boxes might be slightly shifted to the left or to the right compared to the ground-truth boxes, and they can also be bigger or smaller than the ground-truth boxes. As a result, we had to allow a tolerance for such variations when measuring detection accuracy, hence the 3% threshold on the distance between the detected and ground-truth box centers; a sketch of this accuracy check is shown below. Besides the number of images, we also report the number of vehicles having one or more frames with correct wheel detections. From the results, it can be concluded that our approach precisely detects the wheel bounding boxes and retains a higher number of vehicles.
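The accuracy check referenced above can be written compactly as follows. This is a sketch using the normalized box convention, so center distances are already expressed as fractions of the bounding-box dimension.

```python
import numpy as np


def wheel_detection_correct(pred_box, gt_box, tol=0.03):
    """Return True when the Euclidean distance between the center of a
    predicted wheel box and the center of its ground-truth box is below
    `tol` (3%) of the bounding-box dimension. Boxes are normalized
    [x_left, y_bottom, x_right, y_top], so the distance is already a
    fraction of the box dimension.
    """
    def center(box):
        return np.array([(box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0])

    return float(np.linalg.norm(center(pred_box) - center(gt_box))) < tol
```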

4.5. Vehicle Re-Identification

In this section, we investigate the efficacy of the proposed wheel detector. The finer details from the wheels are utilized to match a pair of vehicles and produce a matching score. Based on the matching score, a decision is made as to whether it is the same vehicle (true match) or a different one (false match). The aim is to show that detected wheel patterns can be utilized to differentiate between vehicles. To that end, we trained a Siamese network [30] to match the front and rear wheels for each pair of vehicles. The wheel-matching Siamese network, which we introduced in [4], consists of two identical branches. The two branches share the same parameters, and the network weights are updated identically during training. The network computes the similarity between the input wheel images by matching their signatures. Each branch consists of five convolutional layers, and the input to each branch is a single wheel image resized to 100 × 100 × 3. Our proposed wheel detector estimates the bounding boxes for the front and rear wheels using the mechanisms described in Section 4.2 and Section 4.3. For validation, we compare the results of vehicle re-ID using both the wheels detected by the baseline approach and those from our proposed wheel detector. Specifically, the detected wheels corresponding to each pair of vehicles in the validation set are cropped and matched using the Siamese model. The output feature vectors of the two branches are then fed into the last layer of the network, which produces the similarity score, a measure between 0 and 1. For each pair of vehicles, the front wheel of the first vehicle is compared to the front wheel of the other vehicle, and the rear wheels are compared similarly. A threshold is then applied to the average of the two wheel-matching scores: if the average is more than 0.5, the two vehicles are declared to be the same; otherwise, they are declared different (a sketch of this decision rule is shown after Table 2). We compared the performance of vehicle re-ID using the baseline detector versus our proposed wheel detector, and the results are shown in Table 2. From the results, it can be inferred that the vehicle matching network that uses the wheels from our wheel detector is more reliable and accurate than the re-ID network that utilizes the wheels detected with the baseline method.
Table 2. Vehicle re-ID performance using detected wheels.
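Below is a minimal sketch of the matching decision described above. The `siamese` callable, the dictionary keys, and the way wheel crops are passed in are assumptions made for illustration; the averaging of the two scores and the 0.5 threshold follow the text.

```python
def same_vehicle(siamese, vehicle_a, vehicle_b, threshold=0.5):
    """Decide whether two vehicles match based on their wheel similarity.

    `siamese(img_a, img_b)` is assumed to return a similarity score in [0, 1]
    for a pair of 100 x 100 x 3 wheel crops; `vehicle_a` and `vehicle_b` are
    assumed to be dicts holding the cropped "front_wheel" and "rear_wheel"
    images produced by the wheel detector.
    """
    front_score = siamese(vehicle_a["front_wheel"], vehicle_b["front_wheel"])
    rear_score = siamese(vehicle_a["rear_wheel"], vehicle_b["rear_wheel"])
    # Average the two wheel scores and threshold: above 0.5 -> same vehicle.
    return (front_score + rear_score) / 2.0 > threshold
```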

5. Conclusions and Future Work

In this paper, a wheel detection approach is proposed for reliably detecting wheel bounding boxes using deep neural networks and a vehicle-specific post-processing algorithm that eliminates false detections. This approach enables accurate re-centering of the vehicle image based on the wheel coordinates for improved re-identification, and the wheel geometry can subsequently be leveraged to provide additional identifying information about the vehicle. Software for our wheel detection and post-processing approach is provided. We compared the performance of our framework with a conventional wheel detector that selects the pair of wheels with the highest confidence. Experimental results demonstrate the efficacy of our proposed wheel detector under different illumination conditions and elevation angles. A limitation of this work is that it is only applicable when the wheels are visible in the vehicle image; when they are, the approach works regardless of the shooting angle or illumination conditions. Proposed future work includes investigating the use of multi-view vehicular imagery for vehicle re-identification.

Author Contributions

Conceptualization, S.G. and R.A.K.; methodology, S.G. and R.A.K.; software, S.G. and R.A.K.; validation, S.G. and R.A.K.; formal analysis, S.G. and R.A.K.; investigation, S.G. and R.A.K.; resources, R.A.K.; data curation, S.G. and R.A.K.; writing—original draft preparation, S.G.; writing—review and editing, R.A.K.; visualization, S.G.; supervision, R.A.K.; project administration, R.A.K.; funding acquisition, R.A.K. All authors have read and agreed to the published version of the manuscript.

Funding

This manuscript has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doepublic-access-plan, accessed on 1 June 2021).

Institutional Review Board Statement

Ethical review and approval were waived for this study after the Oak Ridge Site-wide Institutional Review Board (ORSIRB) reviewed the study and determined that the proposed work is not human subjects research.

Data Availability Statement

Profile Images and Annotations for Vehicle Reidentification Algorithms (PRIMAVERA). Available online: http://doi.ccs.ornl.gov/ui/doi/367 (accessed on 1 January 2022), doi:10.13139/ORNLNCCS/1841347.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Anagnostopoulos, C.; Alexandropoulos, T.; Loumos, V.; Kayafas, E. Intelligent traffic management through MPEG-7 vehicle flow surveillance. In Proceedings of the IEEE John Vincent Atanasoff 2006 International Symposium on Modern Computing (JVA’06), Sofia, Bulgaria, 3–6 October 2006; pp. 202–207. [Google Scholar]
  2. Kathawala, Y.A.; Tueck, B. The use of RFID for traffic management. Int. J. Technol. Policy Manag. 2008, 8, 111–125. [Google Scholar] [CrossRef]
  3. Kerekes, R. Profile Images and Annotations for Vehicle Reidentification Algorithms (PRIMAVERA); Oak Ridge National Laboratory (ORNL): Oak Ridge, TN, USA, 2022. [Google Scholar]
  4. Ghanem, S.; Kerekes, R.A.; Tokola, R. Decision-Based Fusion for Vehicle Matching. Sensors 2022, 22, 2803. [Google Scholar] [CrossRef] [PubMed]
  5. Liu, X.; Liu, W.; Ma, H.; Fu, H. Large-scale vehicle re-identification in urban surveillance videos. In Proceedings of the 2016 IEEE International Conference on Multimedia and Expo (ICME), Seattle, WA, USA, 11–15 July 2016; pp. 1–6. [Google Scholar]
  6. Sochor, J.; Špaňhel, J.; Herout, A. Boxcars: Improving fine-grained recognition of vehicles using 3-d bounding boxes in traffic surveillance. IEEE Trans. Intell. Transp. Syst. 2018, 20, 97–108. [Google Scholar] [CrossRef]
  7. Kanacı, A.; Zhu, X.; Gong, S. Vehicle re-identification in context. In German Conference on Pattern Recognition; Springer: Cham, Switzerland, 2018; pp. 377–390. [Google Scholar]
  8. Lou, Y.; Bai, Y.; Liu, J.; Wang, S.; Duan, L. Veri-wild: A large dataset and a new method for vehicle re-identification in the wild. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 3235–3243. [Google Scholar]
  9. Liu, X.; Liu, W.; Mei, T.; Ma, H. Provid: Progressive and multimodal vehicle reidentification for large-scale urban surveillance. IEEE Trans. Multimed. 2017, 20, 645–658. [Google Scholar] [CrossRef]
  10. Liu, X.; Zhang, S.; Huang, Q.; Gao, W. Ram: A region-aware deep model for vehicle re-identification. In Proceedings of the 2018 IEEE International Conference on Multimedia and Expo (ICME), San Diego, CA, USA, 23–27 July 2018; pp. 1–6. [Google Scholar]
  11. Shen, Y.; Xiao, T.; Li, H.; Yi, S.; Wang, X. Learning deep neural networks for vehicle re-id with visual-spatio-temporal path proposals. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 1900–1909. [Google Scholar]
  12. Teng, S.; Liu, X.; Zhang, S.; Huang, Q. Scan: Spatial and channel attention network for vehicle re-identification. In Pacific Rim Conference on Multimedia; Springer: Cham, Switzerland, 2018; pp. 350–361. [Google Scholar]
  13. He, B.; Li, J.; Zhao, Y.; Tian, Y. Part-regularized near-duplicate vehicle re-identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 3997–4005. [Google Scholar]
  14. Wei, X.-S.; Zhang, C.-L.; Liu, L.; Shen, C.; Wu, J. Coarse-to-fine: A RNN-based hierarchical attention model for vehicle re-identification. In Asian Conference on Computer Vision; Springer: Cham, Switzerland, 2018; pp. 575–591. [Google Scholar]
  15. Zhao, Z.-Q.; Zheng, P.; Xu, S.-T.; Wu, X. Object detection with deep learning: A review. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 3212–3232. [Google Scholar] [CrossRef] [PubMed]
  16. Cao, Z.; Simon, T.; Wei, S.-E.; Sheikh, Y. Realtime multi-person 2d pose estimation using part affinity fields. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7291–7299. [Google Scholar]
  17. Jia, Y.; Shelhamer, E.; Donahue, J.; Karayev, S.; Long, J.; Girshick, R.; Guadarrama, S.; Darrell, T. Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the 22nd ACM international Conference on Multimedia, Orlando, FL, USA, 7 November 2014; pp. 675–678. [Google Scholar]
  18. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef]
  19. Chen, C.; Seff, A.; Kornhauser, A.; Xiao, J. Deepdriving: Learning affordance for direct perception in autonomous driving. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 2722–2730. [Google Scholar]
  20. Chen, X.; Ma, H.; Wan, J.; Li, B.; Xia, T. Multi-view 3d object detection network for autonomous driving. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1907–1915. [Google Scholar]
  21. Yang, Z.; Nevatia, R. A multi-scale cascade fully convolutional network face detector. In Proceedings of the 2016 23rd International Conference on Pattern Recognition (ICPR), Cancun, Mexico, 4–8 December 2016; pp. 633–638. [Google Scholar]
  22. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
  23. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single shot multibox detector. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2016; pp. 21–37. [Google Scholar]
  24. Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271. [Google Scholar]
  25. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems 28 (NIPS 2015), Montreal, QC, Canada, 7–12 December 2015. [Google Scholar]
  26. Bell, S.; Zitnick, C.L.; Bala, K.; Girshick, R. Inside-outside net: Detecting objects in context with skip pooling and recurrent neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2874–2883. [Google Scholar]
  27. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.-C. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520. [Google Scholar]
  28. Lin, T.-Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common objects in context. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2014; pp. 740–755. [Google Scholar]
  29. Tzutalin. LabelImg. Free Software: MIT License, 2015. Available online: http://github.com/tzutalin/labelImg (accessed on 1 July 2021).
  30. Chicco, D. Siamese neural networks: An overview. Artif. Neural Netw. 2021, 2190, 73–94. [Google Scholar]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
