Two-Step Algorithm for License Plate Identification Using Deep Neural Networks

Abstract: License plate identification remains a crucial problem in computer vision, particularly in complex environments where license plates may be confused with road signs, billboards, and other objects. This paper proposes a solution that replaces the standard car, license plate, letter detection sequence with a four-stage sequence: preliminary license plate detection, precise detection of the four corners of the area where the numbers are located, license plate correction, and letter identification. The first algorithm identifies all potential license plates and passes them as input to the next algorithm for more precise detection. The main difference between this approach and other algorithms is that it operates on an image that is small relative to the whole vehicle. Thus, a small but robust network can be used to find the four corners and perform a perspective transformation. This simplifies the letter recognition task for the next algorithm, as no additional transformations are required, and allows another compact but robust neural network to be applied, increasing the overall speed of the system. This solution could be useful for research focusing on this specific task. Publicly available datasets were used for training and validation. The CenterNet object detection algorithm was used as a basis with a modified Hourglass-type network. The size of the network was decreased by 40% and the average accuracy was 96.19%. Speed significantly increased, reaching 2.71 ms and 405 FPS on average.


Introduction
License plate (LP) identification is a critical process in many areas due to the significant number of vehicles caused by urbanization and related issues such as toll collection, vehicle verification, and speeding detection. Collected data can help identify mobility patterns, travel demand, and congestion patterns in cities, support traffic management [1], and even estimate emissions [1,2]. Many LP recognition and identification algorithms have been proposed, and they can be divided into two main categories: those using deep learning and traditional algorithms. Images are now collected not only by speed recorders from passing vehicles or by people but also by drones; they are often taken at oblique angles and can be blurry, distorted, etc. Therefore, it is important to improve the quality of the images. One solution based on the phase congruency model (PCM) uses a combination of DCT-PCM (discrete cosine transform PCM) for plate text detection. The method presents a new clustering approach for eight neighboring pixels that aims to avoid background pixels being classified as candidate pixels [3]. This approach helps to extract the edges of the text, and false positives are further eliminated by applying a fully connected neural network. The method achieves competitive results on the Total-Text dataset when compared to other methods. However, it should be noted that the method was specifically designed to work with both drone images and ordinary images taken in an orthogonal direction. Other methods may perform better when dealing with specific types of datasets. Another solution used two generative adversarial networks (GANs) to remove noise in the image and increase image resolution [4]. Although the computational time is larger compared to similar methods, the final image quality is increased significantly.
Appl. Sci. 2023, 13, 4902
Deep convolutional neural networks (CNNs) are commonly utilized for various image recognition tasks due to their ability to automatically extract features, making them a powerful tool in medicine, human pose estimation, and face recognition [5][6][7][8]. License plate matching is typically achieved through license plate character registration (LPCR) or license plate feature (LPF) extraction. A novel LPF-based LP recognition method was developed using a deep CNN. This method employs a multitask learning approach that first recognizes parallel letters and then classifies images [9]. The proposed CNN model was designed using a modified VGG (visual geometry group) architecture, which yielded better results compared to existing state-of-the-art methods. However, the VGG model is a large network and may take more time to train, which could be a drawback in situations where real-time license plate identification is required. Another approach uses a deep neural network called the capsule network (CN) and improves the processing time by integrating the segmentation process and extracting features from the segmented data using the CN framework [10]. Capsule networks are advantageous in preserving detailed information throughout the network. However, routing methods such as EM-routing or routing by agreement have a negative effect on the capsule network, preventing it from distinguishing inputs and their negative counterparts.
A two-stage DNN is used to detect license plates on the unedited raw image using YOLOv2 (You Only Look Once), and then cropped images are used to recognize license plates. While it has demonstrated excellent recognition results, the computational time could be improved [11]. A common problem that occurs during the recognition process is the loss of data as it is passed from one layer to another. To address this issue, an adversarial network is used to restore lost input data and pass it to the following network (YOLOv3), which is responsible for finding symbols and can do so at three different feature map sizes: 31, 16, and 8 [12,13]. YOLOv3 employs scaled anchor boxes, while YOLOv2 uses boxes of the same size. Despite yielding better results, YOLOv3 still faces challenges in detecting small objects. A slower but more efficient three-step method uses a modified Mask-RCNN network without a segmentation step [14], as well as another Mask-RCNN network for LP number recognition. Every symbol is found and identified using additional small symbol filtering and clustering based on position. This method has demonstrated promising results [15], with higher precision compared to most other methods. License plates that failed to be detected typically had either very low or very high brightness levels.
Existing automatic license plate recognition (ALPR) systems often require significant computing resources, and achieving a balance between high accuracy and speed remains an active topic. With the emergence of the IoT, various solutions have been proposed that rely on smaller systems requiring less computational power. Using the YOLOv5 light version with IoT devices or YOLOv5 for license plate detection results in faster computations compared to earlier versions of YOLO [16,17]. These solutions still need improvement, as license plates with poor illumination, captured during nighttime or under adverse weather conditions, are not accurately detected and identified. Convolutional neural network architecture is a crucial factor in achieving speed and quality, and various architecture solutions have been proposed for specific problems. The hourglass-type network is commonly used for human pose estimation, saliency, facial landmark detection, and more [5][6][7]. However, hourglass networks are quite large, which results in reduced network speed.
Selmi et al. presented a system for license plate detection and recognition divided into three stages: LP detection, character segmentation, and character detection. License plate detection is accomplished in the following steps: the RGB image is converted to an HSV image, small elements are extracted using contrast maximization, a Gaussian blur filter is used to remove details and noise from the image, adaptive thresholding is applied to eliminate insignificant regions, contours are found, geometric filtering is applied, and finally, a CNN is used to detect license plates. The CNN is composed of four layers, with two convolutional layers for feature extraction and two fully connected layers [18]. Another license plate detection method for complex backgrounds detects the vehicle region using Fast R-CNN and generates candidate regions. Then, license plates are classified and all non-license plates are removed [19]. Both solutions demonstrated high precision and recall. However, the first step involves identifying the vehicle itself, which requires more computational time and may be too slow for real-time license plate detection.
Zhang et al. proposed the V-LPDR framework to address the license plate detection and character recognition problem in unconstrained scenarios. The framework uses a novel flow-guided spatiotemporal attention network, which is divided into three modules: detection backbone, flow-guided feature warping, and spatiotemporal attention block. Results show very high accuracy for the proposed solution [20]. The primary remaining drawbacks were the overall processing time per frame, which is unsuitable for some real-time applications, and reduced performance on distorted images. Tung et al. proposed a license plate detection method using RetinaFace and MobileNet to predict the license plate from the original image and determine the coordinates of the four key points on the license plate. The output of the detection module is then used as input for the character recognition module [21]. Improvements can still be made to the dataset, as a novel dataset was used for training and it lacked detailed parameters compared to established datasets. This resulted in subpar performance; however, updating the dataset would likely lead to improvement. Additionally, license plates in complex environments were not detected as effectively. For object detection and recognition, a YOLO-based network was employed. However, the approach was conventional, involving detection of the car followed by detection of the license plate, which led to increased computational time.
To address the problem of small-object detection, annotations are automatically generated for larger areas [22].
Another modern approach for license plate detection involves the use of a kernel density function. Pre-processing is carried out using a binary technique [23]. The process involves downsampling the image and then converting it to a grayscale image. Candidate images are then extracted using a kernel density estimator that complies with the binary conversion technique. Nonetheless, the proposed solution requires improvement for handling vehicles in motion or images containing multiple vehicles. In [24], a real-time detector called STELLA was used for license plate detection. The detection network is implemented on RetinaNet and utilizes the feature pyramid network (FPN), while recognition is carried out using CRNN. Yu et al. used the CenterNet detection approach and a reduced architecture of Hourglass-104 as the backbone for CenterNet [25]. The novelty of their approach was that instead of using bounding boxes, they used bounding ellipses. Similarly, the approach proposed in this paper also uses the CenterNet detection method as a basis and an hourglass network as a backbone. However, instead of the standard car-license plate-letter detection approach, a preliminary license plate detection method is employed, followed by precise detection of the four corners where the numbers are located, license plate correction, and finally, letter identification. The method was tested for vehicle detection at a single site, and data labeling is expected to be challenging.

Materials and Methods
License plates are easily confused with surrounding objects such as road signs, logos, advertisements, road-marking symbols, etc. Therefore, it is essential that the data provided to license plate detection algorithms is as clear as possible, with minimal incorrect and potentially confusing information. Vehicle detection algorithms can be used as a solution, with the potential to provide better overall accuracy of the system. However, they may show poor results in situations where the license plate is very close to the camera, and such systems can be rather slow [26].
The solution proposed in this paper changes the identification process from the standard car-number-letter recognition sequence to one that starts from the predicted license plate. It detects plate corners, corrects the plate position, and then performs letter recognition. This approach differs from other algorithms in that the resulting image size is relatively small compared to the vehicle, enabling the use of a compact and fast network for plate corner detection. The corner coordinates are then used to perform a perspective transformation that corrects any distortion or rotation and returns the license plate to its original position, making precise license plate recognition easier for the second algorithm. The proposed solution is illustrated in Figures 1 and 2.
All experiments were conducted using publicly available datasets: AOLP [27], Caltech Cars [28], EnglishLP [29], OpenALPR [30], UFPR-ALPR [31], and Platesmania data published in [32]. These datasets differ from one another, and their summary is presented in Table 1.

Training and Validation
Training data came from the Platesmania, UFPR-ALPR, and AOLP datasets. Validation/testing data came from the AOLP, UFPR-ALPR, Caltech Cars, OpenALPR, and EnglishLP datasets. The split between training and validation data followed the distribution given in [5]. The data used for training and testing had marked license plates, and all plates had their four corners marked in order to test rotation and the second part of detection. All data were stored in the TFRecord format, with each image stored as integers in the range [0, 255] using compression that did not affect image quality. Additionally, a 32 × 32 × 5 matrix of ground truth data, generated from the marked plates, was stored. For the precise license plate detection step, images were stored in the same format, except that they were already cropped to contain only candidate license plates and were accompanied by a vector of nine elements: the X and Y coordinates of the four license plate corners (4 × 2 values) plus one confidence score indicating whether the crop is a license plate. Rotation, distortion, magnification, reduction, center modification, merging, and color processing augmentations were performed on all the training data. To simulate smaller license plates in traffic scenarios, multiple-image combination augmentation was applied. Validation data were left unmodified.
"Area" resizing interpolation was used for both training and validation data. To maintain the same conditions for all experiments, the same training strategy was applied: the first 50 epochs use the "Adam" optimizer with learning rate $1 \times 10^{-4}$, $\beta_1 = 0.9$, $\beta_2 = 0.999$, $\varepsilon = 1 \times 10^{-5}$, and amsgrad = true. Then, the SGD optimizer with a cyclic learning rate and base = $1 \times 10^{-3}$ is used. Training stops when the test loss does not improve for 10 consecutive epochs. After that, the learning rate is reduced by a factor of ten and the training process is repeated twice. The accuracy of the license plate number detection algorithm is evaluated using two metrics:

• Object detection check. IoU (Intersection over Union) is used with a threshold of 0.4. It compares the predicted object coordinates with the ground-truth coordinates.

• Object coordinates error check. The Euclidean distance was calculated between every original point and the corresponding predicted point (1).
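The two metrics above can be illustrated with a minimal sketch. It assumes boxes are given as (x_min, y_min, x_max, y_max) tuples and corners as (x, y) pairs; the function names, the use of "greater than or equal" for the 0.4 threshold, and the averaging of the per-corner distances are illustrative assumptions, not details taken from the paper.

```python
import math


def iou(box_a, box_b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Width and height of the overlap are clamped at zero for disjoint boxes.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_detected(pred_box, true_box, threshold=0.4):
    """Object detection check: a prediction counts if IoU reaches the threshold."""
    return iou(pred_box, true_box) >= threshold


def mean_corner_error(pred_corners, true_corners):
    """Object coordinates error check: Euclidean distance per corner, averaged.

    Averaging over the corners is an assumption made for this sketch.
    """
    dists = [
        math.hypot(p[0] - t[0], p[1] - t[1])
        for p, t in zip(pred_corners, true_corners)
    ]
    return sum(dists) / len(dists)
```

For example, `is_detected((0, 0, 10, 10), (0, 0, 10, 5))` accepts the prediction because the IoU is 0.5.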
The system's speed was measured using TensorFlow 1.15 on an NVIDIA GTX 1080 Ti GPU. To simulate real-life scenarios, the neural network's speed was tested with a batch size of 16, allowing for the simultaneous processing of 16 images. This process was repeated 100 times and the average processing time per batch was calculated. Finally, the single-image processing time was computed from the time for all 16 images. The speed results are presented in frames per second or milliseconds.
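The timing procedure can be sketched as follows. This is only an illustration under stated assumptions: `run_batch` is a hypothetical callable standing in for one forward pass of the network, and the warm-up iterations are an addition of this sketch rather than something described in the paper.

```python
import time


def benchmark(run_batch, batch, repeats=100, warmup=5):
    """Return (milliseconds per image, frames per second) for a batch callable.

    run_batch is a hypothetical stand-in for the network's forward pass.
    """
    # Warm-up runs (an assumption of this sketch) exclude one-off setup costs.
    for _ in range(warmup):
        run_batch(batch)

    start = time.perf_counter()
    for _ in range(repeats):
        run_batch(batch)
    elapsed = time.perf_counter() - start

    per_batch_s = elapsed / repeats
    # Dividing the batch time by the batch size gives the per-image time.
    per_image_ms = per_batch_s / len(batch) * 1000.0
    fps = 1000.0 / per_image_ms if per_image_ms > 0 else float("inf")
    return per_image_ms, fps
```

With a batch of 16 images and `repeats=100`, this mirrors the measurement setup described above; on a GPU, the callable would also need to wait for the device to finish before returning.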

Model
The proposed two-step algorithm uses an Hourglass-type network structure in both steps. The standard structure was modified to increase calculation speed by replacing two standard Hourglass blocks [5] with a single block that has more convolutional layers and filters. This resulted in a 40% reduction in the network size and a doubling of the speed. The general network structure is presented in Figure 3.
For images smaller than 1000 × 1000, a universal size of 256 × 256 was used to balance model speed and precision. Additionally, the inner block structure of the Hourglass network was modified to increase speed by using ResNet blocks, as shown in Figure 4a. The network structure for precise license plate identification is the same, but the input data is 128 × 128 × 3 and the output data is 32 × 32 × 9 (Figure 4b).
This paper is based on the CenterNet object detection method [33], in which each type of object is considered as a separate class N, and each class has 5 elements: the center point, the x-axis difference coordinate, the y-axis difference coordinate, the object height, and the object width. In this approach, a Gaussian distribution that depends on the object's size is applied to the feature map. The main formulas are given in Equations (2)-(5), where $c_x$ and $c_y$ are the center of each object, $\sigma$ is the standard deviation, and $Y_{xy}$ is a Gaussian distribution. The coordinates of the upper left point $(x_{min}, y_{min})$ and lower right point $(x_{max}, y_{max})$ are used to calculate the center coordinates of each object $(c_x, c_y)$, as given in (2), (3). $\mathit{Off}^{x}_{xy}$ is the distance from the current pixel center point to the object's upper left x point; $\mathit{Off}^{y}_{xy}$ is the distance from the object's current pixel point to the object's upper left y point; $W_{xy}$ and $H_{xy}$ are width and height, respectively, as given in (5).
The data is decoded, and the predicted height and width are then used to obtain the upper left and lower right points of the license plate. The loop iterates through each element of the matrix of center points returned by the network. If the value of that element is greater than the specified limit, the other values derived by the network are considered: predictionSizeW, predictionSizeH, predictionOffx, and predictionOffy. Since the X and Y values of predictionOff represent the distance and direction from each pixel to the original center of the object, adding these distances to the pixel's X and Y coordinates yields the center point. Subtracting and adding half the estimated width and height then produces two points: $(x_{min}, y_{min})$, the upper left point, and $(x_{max}, y_{max})$, the lower right point. The decoding algorithm is shown in (6):
    For yaxis in range(0, mapSizeY):
        For xaxis in range(0, mapSizeX):
            if predictionPoint[yaxis, xaxis] > threshold:
To detect small objects more easily and minimize the number of false negatives, the input size for the CenterNet method would need to be increased. However, the goal was to achieve accurate results without increasing the amount of input data. CenterNet can efficiently determine the center point of an object, together with the X and Y distance outputs to the original center point, even when the object is poorly visible, but its predictions for the height and width of poorly visible objects are not precise. To prevent the network from learning distances, a modification to the data encoding was implemented as given in (7). The parameters $\mathit{Offx}$, $\mathit{Offy}$, $H$, and $W$ were changed to $\mathit{offx}_{min}$, $\mathit{offy}_{min}$, $\mathit{offx}_{max}$, and $\mathit{offy}_{max}$, while leaving the center points of the object unchanged. Figure 6 illustrates the encoded license plate coordinates.
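The standard CenterNet-style encoding and the decoding loop of (6) can be sketched in Python as follows. This is a hedged illustration, not the authors' implementation: the Gaussian is written in the usual form exp(-((x - c_x)^2 + (y - c_y)^2) / (2σ^2)), all quantities are assumed to be in feature-map pixels, and the function names `gaussian_heatmap` and `decode_boxes` and the default threshold are hypothetical.

```python
import numpy as np


def gaussian_heatmap(height, width, cx, cy, sigma):
    """Center-point map: a 2D Gaussian that peaks at (cx, cy)."""
    ys, xs = np.mgrid[0:height, 0:width]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))


def decode_boxes(center, off_x, off_y, size_w, size_h, threshold=0.5):
    """Turn the five output maps into (x_min, y_min, x_max, y_max, score) boxes.

    center, off_x, off_y, size_w and size_h are 2D arrays of equal shape,
    playing the roles of predictionPoint, predictionOffx, predictionOffy,
    predictionSizeW and predictionSizeH in the text.
    """
    boxes = []
    map_h, map_w = center.shape
    for y in range(map_h):
        for x in range(map_w):
            score = float(center[y, x])
            if score > threshold:
                # Pixel position plus predicted offset gives the object center.
                cx = x + float(off_x[y, x])
                cy = y + float(off_y[y, x])
                half_w = float(size_w[y, x]) / 2.0
                half_h = float(size_h[y, x]) / 2.0
                boxes.append(
                    (cx - half_w, cy - half_h, cx + half_w, cy + half_h, score)
                )
    return boxes
```

Note that this sketch keeps every pixel above the threshold; a practical decoder would usually retain only local maxima of the center map so that one object yields one box.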

    For xmin, ymin, xmax, ymax in Boxes:
        For yaxis in range(min, max):
            For xaxis in range(min, max):
Rotation, distortion, enlargement, reduction, centering, merging, and color processing augmentation were used for the training data, and the testing data were kept original.
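One way to read the modified encoding in (7) is that every pixel inside a box stores its signed distances to the box's upper left and lower right corners. The sketch below illustrates that reading only; the channel order, the rounding of the box limits, and the names `encode_corner_offsets` and `decode_corner_offsets` are assumptions, and the center-point channel is omitted because the text states it is left unchanged.

```python
import numpy as np


def encode_corner_offsets(boxes, map_h, map_w):
    """Build a (map_h, map_w, 4) target of per-pixel corner offsets.

    Channels (assumed order): offx_min, offy_min, offx_max, offy_max.
    """
    target = np.zeros((map_h, map_w, 4), dtype=np.float32)
    for x_min, y_min, x_max, y_max in boxes:
        # Pixel range covered by the box, clipped to the feature map.
        y0, y1 = max(0, int(np.floor(y_min))), min(map_h, int(np.ceil(y_max)))
        x0, x1 = max(0, int(np.floor(x_min))), min(map_w, int(np.ceil(x_max)))
        for y in range(y0, y1):
            for x in range(x0, x1):
                target[y, x] = (x_min - x, y_min - y, x_max - x, y_max - y)
    return target


def decode_corner_offsets(target, x, y):
    """Recover (x_min, y_min, x_max, y_max) from the offsets stored at pixel (x, y)."""
    ox_min, oy_min, ox_max, oy_max = (float(v) for v in target[y, x])
    return (x + ox_min, y + oy_min, x + ox_max, y + oy_max)
```

Under this reading, any pixel inside the object can reconstruct both corners directly, without a separate height and width prediction.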

Comparing to CenterNet
In this section, the results of the two-step license plate identification algorithm are presented. In the first step, we performed CenterNet test training with a small amount of data to check whether all the elements related to training were correct. To this end, 500 training images and 500 testing images in TFRecord form were generated without any augmentation (see Figure 7, labeled "No_Augment1"). During training, the "Adam" optimizer with standard parameters was used. The original model converged, although the results were not satisfactory. As a second step, training and validation data were generated from all training and validation data to determine the starting point. The trained network demonstrated better results (see Figure 7, labeled "No_Augment_2").
Next, data was generated by following the augmentation and training protocols described in the previous section. The results are labeled "Augment_1" in Figure 7, and they show an improvement in the model's performance. "Augment_2" shows even better results with a larger amount of data provided. The model was now capable of detecting both very small and very large objects, and the number of true positive objects had increased. However, the precision of the model still needed improvement, as it was detecting various numbers that are not necessarily related to license plates. Furthermore, the low recall indicates that the model was still unable to find many small objects, which may be due to issues with the model's structure, input size, or coding principles. To achieve high accuracy with this specific model, modifications to the coding and learning methods were necessary.
The proposed algorithm read and processed all generated data in TFRecord. The last four output matrices were modified to a new format. Then, the proposed network weights were reinitialized, and the network was retrained based on the CenterNet training protocol. The error function for calculating corner distances was changed from "L2" to "Wing Loss". Training the network with all databases resulted in better performance compared to using the CenterNet method, as shown in Figure 7, labeled "2Point_wing". The proposed model improved both the detection of smaller objects and the error. Therefore, the Hourglass-type network with modified inner blocks, as shown previously, proved to be superior to the CenterNet method.
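For reference, the Wing loss mentioned above is commonly defined piecewise: w·ln(1 + |x|/ε) for |x| < w, and |x| − C otherwise, with C = w − w·ln(1 + w/ε) chosen so the two pieces join continuously. The sketch below implements that standard definition; the default values w = 10 and ε = 2 and the mean reduction are assumptions, since the paper does not state the parameters it used.

```python
import numpy as np


def wing_loss(pred, target, w=10.0, eps=2.0):
    """Mean Wing loss between predicted and ground-truth coordinates.

    w and eps are illustrative defaults, not values taken from the paper.
    """
    diff = np.abs(np.asarray(pred, dtype=np.float64) - np.asarray(target, dtype=np.float64))
    # Constant that makes the logarithmic and linear pieces meet at |x| = w.
    c = w - w * np.log(1.0 + w / eps)
    loss = np.where(diff < w, w * np.log(1.0 + diff / eps), diff - c)
    return float(loss.mean())
```

Compared with L2, the logarithmic part gives small and medium coordinate errors a relatively larger gradient, while the linear part limits the influence of large outliers.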

Detecting Small Objects
Although more small objects were detected, the results were still not satisfactory and the issue persisted. Therefore, additional modifications were made to the training process:

• Weights were added to the error function. This way, the error for all negative pixels decreases and the error for all positive pixels increases, resulting in an increased number of false positives, but also more true positives.

• The number of training data items in TFRecord was increased to 50, which helps ensure that the model does not overfit.

• Cropping augmentation was modified so that images would be normally distributed.
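The effect of the first modification, weighting the error function, can be illustrated with a small sketch. The squared error used here is only a stand-in for whatever per-pixel error the authors used, and the weights `pos_weight` and `neg_weight` are hypothetical values chosen for illustration.

```python
import numpy as np


def weighted_pixel_loss(pred, target, pos_weight=2.0, neg_weight=0.5):
    """Per-pixel squared error with separate weights for positive and negative pixels.

    Positive pixels are those where the ground-truth map is above zero.
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    # Errors on positive pixels are amplified, errors on background are damped.
    weights = np.where(target > 0, pos_weight, neg_weight)
    return float(np.mean(weights * (pred - target) ** 2))
```

With such weights, missing a true plate pixel costs more than firing on background, which is consistent with the reported rise in both true and false positives.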
Before the changes, the most suitable option was "2Point_wing," and after the changes, it became "2Point_wing_2." As shown in Figure 7, the modifications increased both false positives and true positives. Additionally, the number of correctly identified smaller objects also increased. Since the aim was to improve the accuracy of the proposed system without changing the model structure, it was decided to add an additional detection step using the same structure as described previously:

1. The coordinates obtained from a predicted object are used to crop a potential license plate. Since there may be cases where the preliminary detection is biased, the coordinates are scaled by a factor of two before cropping.

2. Cropped images are processed by the network so that it returns nine values for each image: the coordinates of the four corners (x, y) and whether a license plate is present.
The cropped images are relatively small, so it was possible to further minimize the structure of the network by reducing the input to 128 × 128 × 3 and removing the parts of the network that use the "Up-sampling2D" layer, as shown in Figure 8.
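The cropping in step 1 can be sketched as below. The sketch assumes that "scaling" means enlarging the predicted box around its center and that the result is clamped to the image bounds; both points, and the function name `enlarged_crop_box`, are assumptions rather than details given in the paper.

```python
def enlarged_crop_box(box, image_w, image_h, scale=2.0):
    """Enlarge an (x_min, y_min, x_max, y_max) box around its center and clamp it.

    Returns integer (left, top, right, bottom) pixel limits for cropping.
    """
    x_min, y_min, x_max, y_max = box
    cx = (x_min + x_max) / 2.0
    cy = (y_min + y_max) / 2.0
    half_w = (x_max - x_min) * scale / 2.0
    half_h = (y_max - y_min) * scale / 2.0
    # Clamping keeps the crop inside the image when the plate is near a border.
    left = max(0, int(round(cx - half_w)))
    top = max(0, int(round(cy - half_h)))
    right = min(image_w, int(round(cx + half_w)))
    bottom = min(image_h, int(round(cy + half_h)))
    return left, top, right, bottom
```

The resulting crop would then be resized to the 128 × 128 × 3 input of the second network.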


Cropping augmentation was modified so that images would be normally Before the changes, the most suitable option was "2Point_wing," an changes, it became "2Point_wing_2."As shown in Figure 7, the modification both false positives and true positives.Additionally, the number of correctly smaller objects also increased.Since the aim was to improve the accuracy of th system without changing the model structure, it was decided to add an addit tion step using the same structure as described previously: 1.The coordinates obtained from a predicted object are used to crop a poten plate.Since there may be cases where the preliminary detection is biased, nates are scaled twice before cropping.2. Cropped images are processed by the network so that it returns nine valu image: coordinates of four corners (x, y) and whether a license plate is pr The cropped images are relatively small, so it was possible to further m structure of the network by reducing the input to 128 × 128 × 3 and removing network that use the "Up-sampling2D" layer, as shown in Figure 8. Once the model was trained, it was combined with the predicted numbe algorithm.The results are shown in Figure 9 for the AOLP database and Once the model was trained, it was combined with the predicted number detection algorithm.The results are shown in Figure 9 for the AOLP database and labeled as "combined".The biggest improvement is observed with the UFPR_ALPR database, which contains many small objects.The precision of object point detection was doubled.
"combined".The biggest improvement is observed with the UFPR_ALPR database, which contains many small objects.The precision of object point detection was doubled.There was also a significant decrease in the number of incorrectly detected false positive objects at the lower limit.However, there was also a decrease in the number of correctly detected objects, which occurred because the verification model returned a sufficiently low result for positive objects.Therefore, the number of positively detected objects decreased significantly at the higher limit.To solve this problem, it was decided to combine the results of the preliminary detection and verification models using Formula (8): The proposed modification suggests that using a threshold of 0.5 would achieve the highest possible result for these combinations.The results of this modification are shown in Figure 9, labeled as "combined_mergedScores".Moreover, an additional advantage of the proposed verification model is that it provides the object coordinates of all corners, which can be utilized to perform perspective transformations, enabling the restoration of an image to its normal position, irrespective of whether it was rotated or bent.There was also a significant decrease in the number of incorrectly detected false positive objects at the lower limit.However, there was also a decrease in the number of correctly detected objects, which occurred because the verification model returned a sufficiently low result for positive objects.Therefore, the number of positively detected objects decreased significantly at the higher limit.To solve this problem, it was decided to combine the results of the preliminary detection and verification models using Formula (8): The proposed modification suggests that using a threshold of 0.5 would achieve the highest possible result for these combinations.The results of this modification are shown in Figure 9, labeled as "combined_mergedScores".Moreover, an additional advantage of 
the proposed verification model is that it provides the object coordinates of all corners, which can be utilized to perform perspective transformations, enabling the restoration of an image to its normal position, irrespective of whether it was rotated or bent.
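To illustrate the perspective transformation mentioned above, the following minimal sketch computes the 3 × 3 perspective matrix that maps four detected corners onto an upright rectangle. It is an illustration under stated assumptions rather than the implementation used in this work: the corner coordinates, the 128 × 32 output size, and the function names `solve_linear`, `homography`, and `warp_point` are all hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corner configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography(src, dst):
    """Return the 3x3 matrix mapping four src points onto four dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0*x + h1*y + h2) / (h6*x + h7*y + 1), and likewise for v
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def warp_point(h, x, y):
    """Apply the perspective matrix h to a single point."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


# Hypothetical corners of a tilted plate, in the order
# upper left, upper right, lower right, lower left.
corners = [(30, 40), (130, 55), (125, 95), (25, 78)]
# Hypothetical upright target rectangle of 128 x 32 pixels.
target = [(0, 0), (128, 0), (128, 32), (0, 32)]

h = homography(corners, target)
for corner in corners:
    print(corner, "->", warp_point(h, *corner))
```

In practice, a library routine such as OpenCV's `cv2.getPerspectiveTransform` followed by `cv2.warpPerspective` would typically be used to compute the matrix and resample the whole image.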

License Plate Detection Comparison with Other Methods
In this section, the proposed method is compared with existing license plate detection algorithms. The efficiency of the two-step license plate identification algorithm is presented and analyzed on different datasets. The processing speed for a single image and the speed of the neural network were calculated and compared to other methods. The license plate detection methods in [31,32] require a vehicle detection step, which makes the overall license plate detection process slower. For these methods, Table 2 shows the speed as the sum of the time required to detect the vehicle (YOLOv2) and the time required to detect the license plate (Fast-YOLOv2). Experiments were also performed with YOLOv3 and Fast-YOLOv3, but their performance is better on small objects. Zhang et al. propose a flow-guided spatiotemporal attention detection network to detect license plates in complex situations and track them [20]. The UFPR-ALPR dataset, consisting of 4500 images from 150 vehicles in real-world environments, was used during the experiments. Although precision and recall were better in all the discussed papers, the methods in those papers were significantly slower than the proposed license plate detection algorithm.
Results on the AOLP dataset showed that the proposed license plate detection algorithm achieves better precision and recall than [21] or [18], but worse than [32]. The AOLP dataset can be further divided into three subsets (AC, LE, and RP), and results vary for each, so average precision is taken into consideration. In [21], results show a precision of 97.70% with a recognition speed of more than 21 FPS. In [18], the average precision was 95.50% (AC: 92.6%, LE: 93.5%, RP: 92.9%). It is worth noting that the network structures used can affect precision. Neither [18] nor [21] is among the most novel solutions, which results in worse performance.
Results on the CaltechCars dataset showed that the proposed algorithm has better recall (99.19%) than [18] and [19], which reported recall and precision of 91.3% and 93.8%, and 96.83% and 98.39%, respectively. However, recall and precision were worse than in [32], although the proposed algorithm was significantly faster.
The two-step license plate detection method presented in [23] takes a similar approach, in which a candidate license plate is extracted from the original image and then verified. Its results show a high accuracy of 98.1% at a processing time of 452 ms. Study [24] used a different approach and also achieved high precision. However, the proposed detection algorithm is faster, although such comparisons should be performed on the same datasets.
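Because some of the compared methods report a per-image time in milliseconds and others a frame rate, it can help to convert between the two before comparing. The sketch below shows this conversion and the summation of stage times for a multi-stage pipeline. The function names and the stage times in the last line are hypothetical; only the 452 ms and 21 FPS figures come from the text above.

```python
def fps_from_latency_ms(latency_ms):
    """Frames per second when one image takes latency_ms milliseconds."""
    return 1000.0 / latency_ms


def latency_ms_from_fps(fps):
    """Per-image time in milliseconds for a given frame rate."""
    return 1000.0 / fps


def pipeline_latency_ms(*stage_latencies_ms):
    """Total per-image time of sequential stages, e.g. vehicle then plate detection."""
    return sum(stage_latencies_ms)


print(round(fps_from_latency_ms(452), 2))   # method in [23]: 2.21 FPS
print(round(latency_ms_from_fps(21), 1))    # method in [21]: 47.6 ms per image
print(pipeline_latency_ms(10.0, 5.0))       # hypothetical stage times: 15.0 ms
```

Note that a frame rate obtained by averaging per-image FPS values generally differs from the reciprocal of the average per-image time, so the two should not be mixed when comparing methods.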
The proposed neural network performs poorly on very small objects. When the input is reduced to 256 × 256, these objects are not visible to the human eye, which is likely why the model has difficulty detecting them. Larger images, such as 2500 × 2500 × 3, are downscaled to fit the input size; therefore, any objects approximately 50 pixels in diameter effectively disappear and appear only as "clouds". Increasing the input size to 512 × 512 would improve the detection of these small objects at the cost of lower overall processing speed. Even so, the resulting system would likely be faster and more precise than those analyzed in existing papers.
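The effect of downscaling on small objects can be checked with simple arithmetic. The sketch below assumes a square image that is resized uniformly to the network input size; the function name `downscaled_size` is hypothetical.

```python
def downscaled_size(object_px, image_side_px, input_side_px):
    """Size of an object, in pixels, after uniformly resizing a square image."""
    return object_px * input_side_px / image_side_px


for input_side in (256, 512):
    size = downscaled_size(50, 2500, input_side)
    print(f"{input_side} x {input_side} input: 50 px object -> {size:.2f} px")
```

Under these assumptions, a 50-pixel object in a 2500 × 2500 image occupies roughly 5 pixels at a 256 × 256 input and roughly 10 pixels at a 512 × 512 input.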

Conclusions
This study aimed to propose a solution for efficient license plate detection and recognition. A combination of two neural networks was developed that achieves an average accuracy of 96.19% at a speed of 405 frames per second. When artificially augmented data were used to train any modification of the neural network, the results were 1.5-2 times better. Using two Hourglass-type networks with a modified inner block structure allowed faster calculations. The speed comes from preliminary license plate number detection, the output of which is passed to a second network for precise number identification. It is likely that an even wider variety of unique data would further improve the accuracy of the neural network. The next step would be to add another lightweight network for character recognition, the input of which would be the license plates detected by the proposed two-step license plate identification algorithm. Because LPR or ALPR systems must work on a CPU-integrated camera or an embedded device with low power consumption (5-10 W), the proposed algorithm could be applied, since it does not require high computational resources and produces satisfactory results.

Figure 1. Proposed two-step license plate recognition algorithm: the first network filters out candidate license plates, while the second network detects precise license plates. The final output is a clear image of only the license plate, which can then be used for the next step: character recognition. The goal of this work is to improve the license plate detection algorithm.

Figure 2. Potential number detection and corner identification: (a) original picture with a car; (b) the original coordinates are increased and the number is cut out; (c) the four corners of the number are found.

Figure 3. Hourglass structure used in candidate license plate detection and precise license plate detection.

Figure 4. CNN structure for license plate number detection. The structure is identical, but the input and output differ: (a) structure of the first network for preliminary number detection; (b) structure of the second network for precise number detection. The structures of the merge and residual blocks are given in Figures 5a and 5b, respectively. The steps for the merge block are as follows: 8 × 8 × 16 input -> Bilinear

Figure 6. License plate encoding: (a) original image is converted to 256 × 256 image; (b) object center point is encoded using Gaussian distribution; (c) distances from the current pixel center point to the object's upper left x point; (d) distances matrix from the object's current pixel point to the object's upper left y point; (e) distances from the current point of the center point to the lower left x point of the object; (f) distances matrix from the current pixel center point to the lower left y point of the object.
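The encoding described in this caption can be sketched as follows. This is an illustrative reading of the caption, not the exact encoding used in this work: the Gaussian width `sigma`, the use of signed per-pixel distances, and the function names `gaussian_heatmap` and `corner_offset_maps` are assumptions, as are the coordinates in the usage lines.

```python
import math


def gaussian_heatmap(size, cx, cy, sigma):
    """size x size map that peaks at 1.0 on the object center (cx, cy)."""
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
             for x in range(size)]
            for y in range(size)]


def corner_offset_maps(size, corner_x, corner_y):
    """Per-pixel signed distances to one corner, as separate x and y maps."""
    dx = [[corner_x - x for x in range(size)] for y in range(size)]
    dy = [[corner_y - y for x in range(size)] for y in range(size)]
    return dx, dy


# Hypothetical plate in a 256 x 256 input: center (120, 90), upper left corner (100, 80).
heat = gaussian_heatmap(256, 120, 90, sigma=4.0)
dx, dy = corner_offset_maps(256, 100, 80)
print(heat[90][120])                 # value at the center point
print(dx[90][120], dy[90][120])      # offsets from the center to the upper left corner
```

Maps for the remaining corners would be built in the same way, one pair of x and y maps per corner.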

Figure 8. Improved CNN network structure for precise license plate identification.

Table 2. Detection performance comparisons for various datasets. Precision for all datasets is average precision, recall is precise for each dataset, and speed is average for all datasets.