An All-in-One Vehicle Type and License Plate Recognition System Using YOLOv4

In smart surveillance and urban mobility applications, camera-equipped embedded platforms with deep learning technology have demonstrated applicability and effectiveness in identifying various targets. These use cases arise in a variety of contexts and locations, so it is critical to collect relevant data from the location where the application will be deployed. In this paper, we propose an integrated vehicle type and license plate recognition system using YOLOv4, which consists of vehicle type detection, license plate detection, and license plate character detection, to better support the context of Korean vehicles in multilane highway and urban environments. On our dataset of one-lane to four-lane images, our system detected six vehicle classes and license plates with mAPs of 98.0%, 94.0%, 97.1%, and 84.6%, respectively. For character recognition on the detected license plates, our system demonstrated mAPs of 99.3% and 99.4% on our dataset and on a publicly available open dataset, respectively. From 4K high-resolution images, our system was able to detect minuscule license plates as small as 100 pixels wide. We believe that our system can be used in densely populated regions to address the high demand for enhanced visual sensitivity in smart cities and Internet-of-Things applications.


Introduction
Computer vision applications automate repetitive tasks that require continuous human attention to monitor situations and make timely decisions. A profusion of such applications has been developed to detect, identify, and track various objects of interest. Recent advancements in smart city technologies [1] have enabled a plethora of visual sensors to be installed in intelligent environments and smart infrastructure, such as closed-circuit television (CCTV), visual sensor networks [2], smart surveillance [3], intelligent traffic systems [1,4], security cameras, and black boxes in vehicles. A series of state-of-the-art deep learning techniques for challenging computer vision problems [5] can detect and identify a vast number of diverse objects across categories on a grand scale. Individuals and their vehicles, which smart cameras try to recognize, are significant subjects of interest in large cities and metropolitan regions. A large number of license plate recognition (LPR) [6][7][8] and make and model recognition (MMR) [9][10][11] systems have been developed to relieve human operators of the tedious task of explicitly detecting, identifying, and recognizing a wide range of cars, as illustrated in Figure 1.
In this regard, we are particularly motivated to recognize modern Korean vehicle types (VT) and Korean license plates (LP) in areas with high vehicle density in South Korea. The number of cars registered in South Korea exceeded 24 million in 2020, according to the Korean Statistical Information Service, which is roughly equivalent to one car per 2.19 people or 456.6 cars per 1000 people. Furthermore, Seoul (i.e., the capital and largest metropolis of South Korea) is one of the most surveilled cities in the world, boasting 77,564 cameras for 234 square miles or 331.94 cameras per square mile (source: https://www.comparitech.com/vpn-privacy/the-worlds-most-surveilled-cities/, accessed on 7 December 2021). For this ever-increasingly complex urban environment, we propose an all-in-one system named KVT-LPR, which stands for Korean vehicle type and license plate recognition system, capable of identifying both VTs and LPs in the same processing pipeline. Our contributions in this paper are as follows.
• We propose a two-phase architecture based on YOLOv4 [12] for detecting vehicle types and recognizing Korean LPs in one pipeline.
• We collect and build a custom dataset of various Korean vehicle types and LPs captured across multiple lanes to train and validate the two custom detectors in the KVT-LPR.
• We show that the KVT-LPR effectively detects small license plates in 4K high-resolution input images, which is an enhancement over previous detectors.
• We demonstrate the feasibility and applicability of the KVT-LPR's detection performance in practical deployments across two datasets (i.e., a custom dataset and a publicly open dataset) and two target platforms (i.e., from a high-end to an embedded solution).

Related Work
There have been a series of attempts to build faster and more accurate LPR systems. In recent years, deep learning-based approaches, such as the single shot detector (SSD) [13] and You Only Look Once (YOLO)-based models [12,[14][15][16], have been used to detect and recognize LPs. YOLO was first designed to provide fast detection speed, but it had low accuracy [14]. Although YOLOv2 enhanced the speed and accuracy of object identification over its predecessor [15], the SSD still outperformed it on smaller objects. YOLOv3 improved accuracy further, but its detection speed slowed down [16]. YOLOv4 improved both speed and accuracy compared to YOLOv3 [12]. Hendry and Chen tweaked the original YOLO to create an automatic license plate recognition (ALPR) system with a detection accuracy of 98.22% and a recognition accuracy of 78.22% [17]. Laroca et al. developed a YOLO-based ALPR system that outperformed previous systems with a recognition rate of 96.9% when tested on public datasets [18]. Castro-Zunti et al. presented an SSD-based LPR system that accurately recognized 96.23% of the Caltech Cars dataset and 99.79% of the UCSD-Stills dataset [19].
There are several related LPR systems targeting Korean LPs that share similar approaches. Han et al. used a cascade structure with AdaBoost learning to offer a real-time LPR method for high-resolution videos [20]. Park et al. developed a multinational LPR system that recognizes multiple Korean LP styles (i.e., single-line, double-line, and various layout formats) using the K-nearest neighbors method [21]. By adding spatial pyramid pooling to YOLOv3, Kim et al. developed a multiscale vehicle detector that outperformed other detectors [22]. For recognizing multinational LPs, including Korean LPs, Henry et al. presented an ALPR system based on YOLOv3 [23]. LP detection, unified character recognition, and multinational LP layout detection were all included in their system's architecture. They initially collected and publicly released their own Korean automobile plate dataset, known as KarPlate; however, due to legal issues, the dataset is no longer available. Sung et al. showed Korean LP identification performance on the NVIDIA Jetson TX2 board using YOLOv3, YOLOv4, and SSD with their custom KETI-ALPR dataset, which is not open to the public [24]. To recognize Korean car types, Kim et al. evaluated the Faster R-CNN, YOLOv4, and SSD object identification approaches [25]. Their findings revealed that YOLOv4 outperformed SSD and Faster R-CNN in terms of F1 score, precision, recall, and mAP. To deal with the problem of data sparsity in the training stage, Han et al. synthesized LPs using an ensemble of generative adversarial networks (GANs) [26]. Wang et al. developed a Korean LPR approach using deep learning and the KarPlate dataset (when the dataset was still available) to recognize LPs under various conditions (i.e., fog and haze) [27]. Lim and Park proposed an AI machine learning system that can use CCTV images to check illegally parked cars with an LPR function [28].
In contrast to prior research, this study investigates the application of YOLOv4 for LPR and vehicle type recognition in the Korean environment with multilane roads and high-resolution cameras. Table 1 compares previous studies in terms of their approaches, datasets, and system support features. Our system aims to better support the Korean context by using multilane images collected from high-resolution cameras, in which LPs occupy only a small region. We employ YOLOv4 to recognize small LPs and vehicle types and show that its performance is embedded-platform-ready.

Proposed Methodology
The goal of a typical LPR system is to output numbers and characters on LPs as text. Similarly, a typical MMR system identifies the vehicle's make and model from several candidates. Our goal was to create an LPR system that could identify Korean LPs and recognize a variety of Korean vehicle types as defined by the Korean vehicle classification criteria. We present an all-in-one Korean vehicle type and LP recognition system, named KVT-LPR, that employs YOLOv4 as the underlying object detector model. Figure 2 shows the overview of our KVT-LPR using YOLOv4. The KVT-LPR aims to identify vehicle types and recognize license plates from high-resolution (i.e., 4K resolution) and multilane images (i.e., one to four lanes). The details of the KVT-LPR system, including YOLOv4-based object detector and data collection processes, are elaborated in the following subsections. Moreover, detailed procedures of the two custom detectors (VT_LP detector and LPC detector) are visually illustrated in Section 4.


YOLOv4-Based Vehicle Type and License Plate Recognition
The KVT-LPR system processes a high-resolution input image (i.e., 3840 × 2160) decoded from a high-resolution video. We collected real Korean vehicle types and LPs to build our custom dataset, which we then used to train two custom YOLOv4 detectors. The first is the VT_LP detector, which detects seven classes (i.e., six different Korean vehicle types and LPs) in the input image. The second is the LPC detector, which detects 68 different numbers and characters on Korean LPs. In a high-resolution image, LP characters are small relative to the entire image, making character identification more challenging. To overcome this problem, we added an LP cropping procedure to the KVT-LPR, which supplies the LPC detector with segmented LP regions. In phase 1, vehicle types and occurrences of LPs are detected by the VT_LP detector. If LPs are found, the cropped image of each LP is passed to the LPC detector in phase 2. To summarize, the VT_LP detector is called first to detect vehicle types and LPs, followed by the LPC detector for each LP found. If the input image contains a large number of LPs, the KVT-LPR's overall turnaround time increases proportionally.
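The two-phase flow described above can be sketched in Python. This is a minimal illustration, not the actual darknet integration: the detector callables, the "license_plate" class name, and the detection layout (class, confidence, (x, y, w, h)) are assumed interfaces.

```python
def crop_plate(image, box):
    """Cut an LP region (x, y, w, h) out of a row-major image (list of pixel rows)."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]

def run_pipeline(image, vt_lp_detector, lpc_detector):
    """Phase 1 finds vehicle types and LPs; phase 2 reads each cropped LP.

    Each detector returns (class_name, confidence, (x, y, w, h)) tuples.
    """
    results = []
    for cls, conf, box in vt_lp_detector(image):
        if cls != "license_plate":
            results.append({"type": cls, "plate": None})
            continue
        # Phase 2 runs once per detected plate, so turnaround grows with the LP count.
        chars = lpc_detector(crop_plate(image, box))
        results.append({"type": cls, "plate": "".join(c for c, _, _ in chars)})
    return results
```

Because phase 2 is invoked once per detected plate, an image with many vehicles multiplies the total turnaround time, which matches the behavior reported in the experiments.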

Vehicle Types and LPs
We installed a camera on a highway overpass to manually record real traffic videos in order to collect various vehicle types and LP images that represent the context and environment of South Korea, as shown in Figure 3. The camera overlooking the highway (i.e., two-lane, three-lane, and four-lane) captured traffic videos at 3840 × 2160. We also recorded videos with a smartphone camera at 3840 × 2160. Images including one or more vehicles were extracted from the recorded videos and used as training data for the custom detectors. Figure 4 shows examples of captured raw images that qualify for training use. To label different vehicle types, we referenced the classification by vehicle size and passenger capacity used by the Korea Expressway Corporation (https://www.ex.co.kr/portal/usefee/selectUseFeeNList.do, accessed on 7 December 2021). We classified vehicles into six categories based on vehicle size and passenger capacity. The smallest vehicles or compact cars were labeled as 'compact'. Vehicles capable of holding nine or fewer passengers were labeled as 'car'. Vehicles with a capacity of 10 to 25 passengers were labeled as 'mini van'. Big vans with more than 25 passengers were labeled as 'bus (big van)'. Smaller two-axle freight vehicles were labeled as 'mini truck', and three-or-more-axle freight vehicles were labeled as 'truck'. The six vehicle types we labeled in our dataset are shown in Figure 6. Table 2 shows the collected dataset of six vehicle types and LPs.
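The labeling rules above can be expressed as a small decision function. This is a sketch of the stated criteria; the freight/axles arguments and the exact handling of the 25-passenger boundary are illustrative assumptions, not part of the Korea Expressway Corporation criteria text.

```python
def vehicle_label(size, passengers, axles=2, freight=False):
    """Map a vehicle to one of the six dataset labels.

    Sketch of the stated criteria; the `freight`/`axles` arguments and the
    25-passenger boundary handling are assumptions for illustration.
    """
    if freight:
        # Freight vehicles split on axle count: two axles -> mini truck.
        return "truck" if axles >= 3 else "mini truck"
    if size == "compact":
        return "compact"
    if passengers <= 9:
        return "car"
    if passengers <= 25:
        return "mini van"
    return "bus (big van)"
```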

LP Numbers and Characters
The recorded videos were also used to manually label Korean LP numbers and characters. We also took additional pictures of LPs with a smartphone camera. LP areas from these sources were segmented and used as training data. In the case of LPs, a bounding box was drawn over the four vertices of an LP. Furthermore, bounding boxes were annotated on each number or character on LPs, as shown in Figure 7. Over 60,000 occurrences of Korean LP numbers and characters were collected and grouped into 68 classes (i.e., numbers 0 to 9: classes 0 to 9; 41 Korean characters: classes 10 to 50; and 17 local area prefixes: classes 51 to 67). Figure 8 shows different Korean LP styles, including single-line and double-line LPs. Area prefixes and predesignated Korean characters can be found on older LPs and special-purpose vehicles. Tables 3 and 4 show the collected dataset for Korean LP numbers and characters. Note that we were not able to collect all LP characters, and numerous local area prefixes were left out (highlighted in gray in Table 4).
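The 68-class layout can be captured in a lookup table, as sketched below. The 41 Korean characters and 17 area prefixes are placeholders here (only their counts and class ranges come from the paper), and the left-to-right decoder assumes a single-line plate; a double-line plate would first need its detections grouped by row.

```python
DIGITS = [str(d) for d in range(10)]                    # classes 0-9
KOREAN_CHARS = [f"char_{i}" for i in range(41)]         # classes 10-50 (placeholders)
AREA_PREFIXES = [f"prefix_{i}" for i in range(17)]      # classes 51-67 (placeholders)
CLASS_TO_TOKEN = DIGITS + KOREAN_CHARS + AREA_PREFIXES  # 68 classes in total

def decode_plate(detections):
    """Sort per-character detections left to right and map class ids to tokens.

    Each detection is assumed to be {"class_id": int, "x": left_edge_pixels}.
    """
    ordered = sorted(detections, key=lambda d: d["x"])
    return "".join(CLASS_TO_TOKEN[d["class_id"]] for d in ordered)
```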

Experiments
To evaluate the feasibility and effectiveness of the KVT-LPR system, we evaluated the KVT-LPR system's capability of detecting small LPs, detection speed, the performance of vehicle type detection, and the performance of LPR.

Implementation
To implement our proposed KVT-LPR system, we used YOLOv4 [12] as the underlying object detector. We used the open-source darknet framework to train YOLOv4 to detect our custom set of classes (i.e., vehicle types, LPs, and LP characters). In prior work [24], we had experimented with several image input sizes before settling on a 256 × 256 input size for YOLOv4: although a bigger input size, such as 608 × 608, increased accuracy, it caused a considerable performance decrease on the lower-end embedded platform. Figures 9 and 10 show the training loss and the mean average precision at a 50% intersection-over-union threshold (mAP @ 0.5). For the VT_LP detector, the collected dataset was split into 70% train, 17.5% validation, and 12.5% test sets for each class. For the LPC detector, we used the collected dataset as 80% train and 20% test sets for all classes.
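The per-class split used for the VT_LP detector can be sketched as follows. Expressing the ratios as per-mille integers keeps the arithmetic exact; sending leftover samples to the test set is our assumption, not a detail stated in the paper.

```python
import random

def split_per_class(samples_by_class, ratios=(700, 175, 125), seed=0):
    """Shuffle and split each class independently into train/val/test sets.

    Ratios are per-mille (700/175/125 = 70%/17.5%/12.5%) so the counts are
    computed with exact integer math; leftovers go to the test set.
    """
    rng = random.Random(seed)
    train, val, test = [], [], []
    for _cls, samples in samples_by_class.items():
        pool = list(samples)
        rng.shuffle(pool)
        n = len(pool)
        n_train = n * ratios[0] // 1000
        n_val = n * ratios[1] // 1000
        train += pool[:n_train]
        val += pool[n_train:n_train + n_val]
        test += pool[n_train + n_val:]
    return train, val, test
```

Splitting per class rather than over the pooled dataset keeps rare classes represented in every subset.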

Detection Speed
To measure the detection speed of the KVT-LPR, we used images that contain one car and one LP per lane. This means that one-lane test images (33 images) contained one car and one LP, and two-lane test images (20 images) contained two cars and two LPs. Likewise, three-lane test images (22 images) contained three cars and three LPs, and four-lane test images (19 images) contained four cars and four LPs. Figure 12 shows the examples of test images.
The detection speed is defined as the time it takes to detect vehicle types and LPs (phase 1, VT_LP detector) plus the time it takes to recognize LP characters from a cropped LP image (phase 2, LPC detector). Two platforms running Ubuntu 18.04 were evaluated: a PC with an RTX3090 graphics card (representing a high-end specification; GeForce RTX3090 with 10,496 NVIDIA CUDA cores and 24 GB memory, AMD Ryzen 7 3700X 8-core processor, 16 GB main memory) and a Jetson AGX Xavier (representing a low-end or embedded specification; 512-core NVIDIA Volta GPU with 64 tensor cores, 8-core ARM v8.2 64-bit CPU with 8 MB L2 + 4 MB L3 cache, 32 GB 256-bit LPDDR4x at 137 GB/s, 32 GB eMMC 5.1). Tables 6 and 7 show the measured detection speed on the two platforms. The detection speed of the VT_LP detector (phase 1) is comparable across different numbers of lanes. However, the LPC detector (phase 2) is significantly slower. This can be explained by the fact that the VT_LP detector detects only seven classes, whereas the LPC detector detects an order of magnitude more classes.
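The two timing components can be measured with a harness along the following lines; the detector interfaces and the "license_plate" class name are assumptions for illustration, and `time.perf_counter` is used because it is a monotonic high-resolution clock.

```python
import time

def measure_turnaround(image, vt_lp_detector, lpc_detector):
    """Time phase 1 once and phase 2 once per detected plate.

    Detections are assumed to be (class_name, confidence, box) tuples.
    """
    t0 = time.perf_counter()
    detections = vt_lp_detector(image)
    phase1 = time.perf_counter() - t0

    plates = [box for cls, _conf, box in detections if cls == "license_plate"]
    t1 = time.perf_counter()
    for box in plates:
        # The real system crops the LP region first; here the box is passed through.
        lpc_detector(image, box)
    phase2 = time.perf_counter() - t1
    return phase1, phase2, len(plates)
```

Since phase 2 runs once per plate, its share of the turnaround grows with the number of lanes, which is consistent with the trend in Tables 6 and 7.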
Figure 13 shows examples of successfully detected vehicle types (car, mini van, mini truck, truck, bus) with the VT_LP detector in phase 1.

License Plate Recognition Performance
Phase 2 of the proposed KVT-LPR system was evaluated according to the same metrics. The LPC detector detects 68 classes (i.e., numbers 0 to 9, 17 local area prefixes, and 41 Korean characters). First, we used our custom dataset to evaluate the performance of the LPC detector. As mentioned earlier, our dataset does not include several local area prefixes (i.e., 광주 (Gwangju), 대전 (Daejeon), 세종 (Sejong), 울산 (Ulsan), 전남 (Jeonnam), 전북 (Jeonbuk), and 제주 (Jeju)). Additionally, we used a publicly available LP dataset from AI-Hub (https://aihub.or.kr/aidata/27727, accessed on 7 December 2021). This open dataset includes 100,000 cropped car number plates in JPG format. We excluded local area prefixes not collected in our dataset. We tried to gather another open dataset, such as the KarPlate dataset [23], but it was no longer available due to legal issues. There are other approaches, such as synthetically generating LPs [26] and a synthetic LP dataset (https://www.idai.or.kr/user/data_market/detail.do?id=63af9c70-ce79-11eb-ba8d-eb1fdd80455f, accessed on 7 December 2021), but we evaluated our detector only on real data. Figure 14 shows LPR results on our custom dataset, and Figure 15 shows LPR results on the AI-Hub dataset. Table 10 shows the performance of the LPC detector according to the evaluation metrics, and Table 11 shows the detailed per-class results. With relatively few false positives and false negatives, the LPC detector had an adjusted mAP (i.e., eliminating classes with no or sparse data) of 99.30% on our custom dataset and 99.41% on the publicly open AI-Hub dataset.
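The adjusted mAP idea, averaging AP only over classes that actually have test data, can be sketched as below; the `min_samples` threshold is our assumption, since the paper states only that classes with no or sparse data were eliminated.

```python
def adjusted_map(ap_per_class, counts_per_class, min_samples=1):
    """Mean AP over classes that meet a minimum test-sample count.

    Mirrors the 'adjusted mAP' of dropping classes with no or sparse data;
    the exact `min_samples` threshold is an assumption for illustration.
    """
    kept = [ap for cls, ap in ap_per_class.items()
            if counts_per_class.get(cls, 0) >= min_samples]
    if not kept:
        raise ValueError("no class meets the sample threshold")
    return sum(kept) / len(kept)
```

Without this adjustment, the missing area-prefix classes would contribute an AP of zero and unfairly depress the reported mean.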

Discussion
Typical LPR systems use the camera view to monitor and check the LP of a single vehicle. The throughput (i.e., the number of LPs detected) of an LPR system can be enhanced and the deployment cost can be decreased if it can check multiple vehicles in several lanes. The proposed KVT-LPR showed that using multilane high-resolution images for LPR and vehicle type detection is possible. Table 5 shows how our system successfully detected small LP sizes of about 100 pixels. The KVT-LPR can be deployed on an embedded platform such as Jetson AGX. Table 7 shows that a standalone KVT-LPR configuration is feasible, but a networked-system (i.e., sending images to servers for recognition) approach can compensate for its shortcomings.
Our approach has some limitations. First, not all possible Korean LP styles and characters were collected in our dataset. Due to the geographical distance between other regions (i.e., cities and provinces) and our data collection location, several LPs with local area prefixes were left out. More data from those missing regions can be collected to improve our dataset. Second, vehicle type detection can be improved by disregarding partially visible vehicles in images; in three-lane and four-lane images, such partially visible vehicles often resulted in failure cases. Third, our method assumes that the front view of the vehicle is captured. When the vehicle's rear view is used for recognition, vehicle type detection with the VT_LP detector is not possible due to this constraint. Regardless, LPR via the LPC detector works for both frontal and rear views. Lastly, as with many previous LPR studies, our dataset is not disclosed for legal reasons (i.e., obtaining the vehicle owners' consent for distribution and reuse).

Conclusions
This paper proposed KVT-LPR, a two-phase LPR system based on YOLOv4 for Korean vehicles and LPs. Using 4K high-resolution input images, six vehicle types and LPs are detected by the VT_LP detector, followed by the LPC detector for LPR. The KVT-LPR is applicable to settings (i.e., highly populated and multilane highways in Korea) where the size of LPs is small. Across two datasets (our custom dataset and an open public dataset) and two target systems (RTX3090 and Jetson AGX), two custom detectors in the KVT-LPR demonstrated LPR performance suitable for both high-end and embedded platforms.
Our approach has limitations and drawbacks discussed in previous sections that deserve further research. For example, our dataset can be extended to include national coverage and special purpose vehicles. Moreover, to optimize LPR performance in designated settings (i.e., standalone, over-the-network, on edge devices), various network parameters, including image input size for YOLOv4 or other object detectors, can be compared, and trade-offs can be analyzed. Nonetheless, we have demonstrated the merits of our proposed KVT-LPR to effectively address Korean LPR with vehicle type detection that can be used in various complex smart city applications.

Conflicts of Interest:
The authors declare no conflict of interest.