Corner Enhancement Module Based on Deformable Convolutional Networks and Parallel Ensemble Processing Methods for Distorted License Plate Recognition in Real Environments

Kim, Sehun; Cho, Seongsoo; Kim, Jangyeop; Son, Kwangchul

doi:10.3390/app15126550

¹ Defense Acquisition Program, Kwangwoon University, 20, Gwangun-ro, Nowon-gu, Seoul 01897, Republic of Korea

² Department of Convergence Science, Kongju National University, Gongju 32588, Republic of Korea

³ Department of Smart Electrical and Electronic Engineering, Kwangwoon University, 20, Gwangun-ro, Nowon-gu, Seoul 01897, Republic of Korea

* Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(12), 6550; https://doi.org/10.3390/app15126550

Submission received: 10 May 2025 / Revised: 4 June 2025 / Accepted: 7 June 2025 / Published: 10 June 2025

(This article belongs to the Special Issue Exploring AI: Methods and Applications for Data Mining)

Abstract

License plate recognition is a computer vision technology that plays a crucial role in intelligent transportation systems and vehicle management. However, in real-world road environments, recognition accuracy significantly decreases due to distortions caused by various viewing angles. In particular, existing systems exhibit severe performance degradation when processing license plate images captured at steep angles. This paper proposes a new approach to solve the license plate recognition problem in such unconstrained environments. To accurately recognize text on distorted license plates, it is crucial to precisely locate the four corners of the plate and correct the distortion. For this purpose, the proposed system incorporates vehicle and license plate detection based on YOLOv8 and integrates a Corner Enhancement Module (CEM) utilizing a Deformable Convolutional Network (DCN) into the model’s neck to ensure robust feature extraction against geometric transformations. Additionally, the system significantly improves corner detection accuracy through parallel ensemble processing of three license plate images: the original and two aspect ratio-adjusted versions (2:1 and 1.5:1). Furthermore, we verified the system’s versatility in real road environments by implementing a real-time license plate recognition system using Raspberry Pi 4 and a camera module.

Keywords: license plate recognition; YOLOv8; parallel ensemble processing; deformable convolutional network; corner enhancement module; homography; computer vision

1. Introduction

Automatic License Plate Recognition (ALPR) is a computer vision technology that plays a crucial role in intelligent transportation systems and vehicle management systems [1,2]. With recent technological advancements, ALPR systems have improved significantly in performance and are now used in various fields, including parking management, road surveillance, traffic law enforcement, and automatic toll collection [3,4,5]. In real-world environments such as parking lots and roads, however, license plate images undergo severe geometric distortion depending on vehicle parking angles and camera installation positions, and this deformation lowers text recognition accuracy [6,7,8]. Accurate correction of distorted license plates is therefore essential for improving recognition accuracy in ALPR systems, and various approaches are being actively developed [9,10,11,12]. Currently proposed distortion correction techniques process a single input image at a time, and this single-model approach has limitations in handling the various distortion patterns of license plates [13,14,15]. In real environments in particular, license plates appear with different distortions depending on angle and vehicle position, which standard convolutional networks with fixed receptive fields struggle to process effectively [16,17].

To address these issues, this paper proposes an ensemble technique based on a Deformable Convolutional Network (DCN) [18] that incorporates a Corner Enhancement Module (CEM) and processes detected license plate images in parallel using versions adjusted to different aspect ratios (2:1, 1.5:1) along with the original. Experimental results showed that this method demonstrates superior performance compared to existing commercial ALPR systems, even for license plates with severe perspective distortion, and maintains stable performance under real environmental conditions.

2. Related Work

2.1. Deep Learning Based Real-Time License Plate Detection

Recent advancements in deep learning have led to active research on deep learning-based license plate detection. A lightweight deep learning network that combines CNN and self-attention mechanisms for license plate recognition in low-quality images has been proposed [19]. By integrating an RNN encoder and a Transformer decoder, this approach learned the relationships between characters effectively even in low-quality environments, achieving a 15% improvement in recognition accuracy compared to existing CNN-based methods.

A vehicle detection algorithm based on an improved RT-DETR was developed in a previous study [20,21]. This approach integrated CNN into a transformer to enhance local detection capabilities and designed a structure that effectively captured global dependencies through the self-attention mechanism. The proposed method demonstrated a 12% improvement in detection performance compared to existing YOLO series models, achieving particularly outstanding results in detecting small-sized license plates. Additionally, through DETR’s end-to-end learning approach, accurate object location prediction without post-processing was enabled.

Another study designed an efficient ANPR system for non-stop toll collection using YOLOv8 [22]. Specifically, YOLOv8’s architecture was optimized for real-time processing, achieving a processing speed of 45 frames per second while maintaining a high detection accuracy of 96.8%. By introducing multi-scale feature fusion techniques, stable detection for license plates captured at various distances was enabled, demonstrating robust performance particularly against fast vehicle movements in highway environments.

2.2. License Plate Recognition in Real Road Environments

In real road environments, perspective effects and geometric distortions caused by the relative position and angle between the camera and the vehicle pose major challenges for license plate recognition. A YOLOv8-based vehicle tracking system was developed in one study to address this problem [23]. To handle geometric transformations of license plates captured from various viewpoints, this system used 3D geometric modeling to estimate the intrinsic and extrinsic parameters of the camera and developed an algorithm that transforms distorted license plate images to a frontal viewpoint based on these parameters. It also restored partially occluded license plates to their complete form by fusing multi-view information, achieving an 18% improvement in detection performance compared to existing methods.

A license plate identification system robust to geometric distortions through a customized YOLO architecture based on YOLOv8 was built in another study [24]. An affine transformation-based normalization module was designed to effectively handle trapezoidal deformations of rectangular license plates caused by perspective effects. The proposed method achieved a high recognition accuracy of 92.3%, even in situations where the camera angle was tilted up to 45 degrees, demonstrating stable performance particularly against rapid viewpoint changes occurring on highways or at intersections.

A preprocessing module that utilizes homography transformation to convert license plates captured from various camera viewpoints to a standard viewpoint was developed in a study [25], addressing geometric distortion issues in an integrated system for real-time vehicle detection and license plate recognition. A dynamic geometric correction algorithm was introduced to respond to complex vehicle trajectories and various entry angles occurring in intersection environments. The proposed system was designed to track vehicle movements in real-time while simultaneously predicting and correcting geometric deformations of license plates, achieving a high license plate recognition success rate of 97.1%.

2.3. Test-Time Augmentation

Test-time Augmentation (TTA) is gaining attention as an effective technique for improving model prediction performance. A selective test-time augmentation technique was proposed in one study that significantly improved computational efficiency by selectively augmenting samples with high uncertainty [26]. This approach estimated prediction uncertainty based on Bayesian inference and introduced an adaptive strategy that applies various augmentations only to samples with high uncertainty. This resulted in a 30% improvement in processing speed compared to existing methods, demonstrating its effectiveness even in resource-constrained environments.

An optimized TTA method using Bayesian model averaging was proposed in a study [27]. By combining ensemble methods and uncertainty estimation, this approach achieved more reliable prediction results, showing an average performance improvement of 5.2% across various classification tasks. Notably, more reliable ensemble results than simple averaging were obtained by integrating prediction results for each augmented sample within a Bayesian framework. Additionally, a meta-learning algorithm that dynamically adjusts augmentation policies was introduced to improve adaptability during test time.

A new TTA method utilizing generative models instead of manually designed transformations was proposed in a prior study [28]. This approach improved segmentation performance by automatically generating various transformations of input images, achieving a 12% accuracy improvement compared to existing manual design approaches. The proposed method used conditional generative models to create diverse transformations while preserving the semantic characteristics of input images. Notably, it ensured that augmented samples generated through adversarial learning reflected the actual data distribution well and introduced regularization terms to maintain consistency in the feature space.

3. Proposed Method

3.1. Corner Enhancement Module

To improve the accuracy of corner detection in license plates, this study proposes a CEM based on DCNs. Unlike standard convolution, DCNs perform deformable convolution operations that can adaptively adjust sampling positions based on input features. This is particularly useful for effectively capturing the features of objects that undergo various geometric transformations, such as license plates. In this study, we integrated a DCN module based on DCNv2 into the CEM. DCNv2 is a model that adds a modulation mechanism to DCNv1, allowing for the adaptive adjustment of not only sampling positions but also weights at each position.

As illustrated in Figure 1, the operating principle of the DCN module is as follows: offsets and modulation masks are simultaneously generated from the input feature map through offset mask convolution. For the input feature map $F \in \mathbb{R}^{H \times W \times C}$, the offset and mask calculations are performed as follows. First, offset and mask information are simultaneously generated through a single convolution layer, as calculated in Equation (1).

$$O = \mathrm{Conv}_{\mathrm{offset\_mask}}(F) \in \mathbb{R}^{H \times W \times (3 \times G \times K^{2})} \tag{1}$$

where $G$ is the number of deform groups and $K$ is the kernel size. The output channels are set to $3 \times G \times K^{2}$, consisting of x-direction offset, y-direction offset, and modulation mask. The generated output is split into three equal parts, as performed in Equation (2).

$$o_{1}, o_{2}, \mathrm{mask}_{\mathrm{raw}} = \mathrm{chunk}(O, 3, \mathrm{dim} = 1) \tag{2}$$

where each has a size of $\mathbb{R}^{H \times W \times (G \times K^{2})}$. The x-direction and y-direction offsets are concatenated to generate the final offset, as calculated in Equation (3).

$$\mathrm{offset} = \mathrm{concat}(o_{1}, o_{2}) \in \mathbb{R}^{H \times W \times (2 \times G \times K^{2})} \tag{3}$$

The modulation mask is normalized to the [0, 1] range through the sigmoid function, as calculated in Equation (4).

$$\mathrm{mask} = \sigma(\mathrm{mask}_{\mathrm{raw}}) \in \mathbb{R}^{H \times W \times (G \times K^{2})} \tag{4}$$
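As a minimal sketch of Equations (2)–(4), the split can be written for a single spatial location, assuming the output of the offset mask convolution at that location is available as a flat list of $3 \times G \times K^{2}$ channel values. The function name `split_offset_mask` and the toy input values are illustrative assumptions and do not come from the paper.

```python
import math

def split_offset_mask(channels, groups, kernel_size):
    """Split 3*G*K^2 channel values at one pixel into offsets and a mask.

    Mirrors Equations (2)-(4): chunk into three equal parts, concatenate the
    first two as the offset, and squash the third with a sigmoid.
    (Hypothetical helper for illustration only.)
    """
    n = groups * kernel_size * kernel_size
    if len(channels) != 3 * n:
        raise ValueError("expected 3 * G * K^2 channel values")
    o1 = channels[:n]
    o2 = channels[n:2 * n]
    mask_raw = channels[2 * n:]
    offset = o1 + o2  # list concatenation: 2 * G * K^2 values
    mask = [1.0 / (1.0 + math.exp(-m)) for m in mask_raw]  # sigmoid
    return offset, mask

# Toy example: G = 1, K = 3, so 27 raw channel values at this pixel.
raw = [0.1 * i for i in range(18)] + [0.0] * 9
offset, mask = split_offset_mask(raw, groups=1, kernel_size=3)
```

With a raw mask value of zero the sigmoid yields 0.5, so every sampling point in this toy case would contribute at half strength.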

The final sampling position for the $k$-th point of deform group $g$ at each sampling location $(i, j)$ is calculated as shown in Equation (5).

$$p_{g, k}^{(i, j)} = p_{0}^{(i, j)} + p_{k} + \Delta p_{g, k}^{(i, j)} \tag{5}$$

where $p_{0}^{(i, j)} = (i, j)$ is the center position, $p_{k}$ is the relative position of the $k$-th point within the kernel, and $\Delta p_{g, k}^{(i, j)}$ is the learned offset. Finally, deformable convolution operations are performed as shown in Equation (6).

$$y^{(i, j)} = \sum_{g = 1}^{G} \sum_{k = 1}^{K^{2}} w_{g, k} \cdot F(p_{g, k}^{(i, j)}) \cdot \mathrm{mask}_{g, k}^{(i, j)} \tag{6}$$

where $w_{g, k}$ is the $k$-th kernel weight of deform group $g$, and $F(p_{g, k}^{(i, j)})$ is the bilinearly interpolated feature value at $p_{g, k}^{(i, j)}$. The final output goes through batch normalization and ReLU activation.
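To make Equations (5) and (6) concrete, the following sketch evaluates a modulated deformable convolution at one output location. It assumes a single channel, a single deform group, and a 3 × 3 kernel, stores offsets as `(dy, dx)` pairs, and treats samples outside the map as zero; these simplifications and the helper names `bilinear` and `deform_conv_at` are assumptions made for illustration, not the authors' implementation.

```python
import math

def bilinear(feat, y, x):
    """Bilinearly interpolate a 2D feature map (list of rows) at (y, x).

    Positions outside the map contribute zero (assumed padding convention).
    """
    h, w = len(feat), len(feat[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                weight = (1 - abs(y - yy)) * (1 - abs(x - xx))
                value += weight * feat[yy][xx]
    return value

def deform_conv_at(feat, i, j, weights, offsets, mask):
    """Equation (6) at (i, j) for one channel, one deform group, K = 3."""
    grid = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]  # p_k
    out = 0.0
    for k, (dy, dx) in enumerate(grid):
        off_y, off_x = offsets[k]                  # learned offset
        py, px = i + dy + off_y, j + dx + off_x    # Equation (5)
        out += weights[k] * bilinear(feat, py, px) * mask[k]
    return out

# Toy feature map whose value grows linearly with position.
feat = [[float(4 * r + c) for c in range(4)] for r in range(4)]
uniform = [1.0 / 9.0] * 9
no_shift = deform_conv_at(feat, 1, 1, uniform, [(0.0, 0.0)] * 9, [1.0] * 9)
shifted = deform_conv_at(feat, 1, 1, uniform, [(0.5, 0.5)] * 9, [1.0] * 9)
```

With zero offsets the result reduces to an ordinary 3 × 3 average, while a uniform half-pixel offset moves the whole sampling grid, which is the behavior that lets the kernel follow a skewed plate outline.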

The DCN module is positioned between channel attention and spatial attention within the CEM, adaptively responding to geometric transformations for features emphasized through channel attention. This is particularly effective in accurately detecting corners of license plates with large angular distortions. In the CEM, channel attention and spatial attention mechanisms are combined to effectively emphasize the corner features of license plates. These two attention mechanisms serve to emphasize features in different dimensions.

The channel attention module uses both average pooling and max pooling to learn channel-wise importance from the input feature map. These two pooling results pass through a shared Multi-layer Perceptron (MLP) and are combined to be converted into channel weights between 0 and 1 using a sigmoid function. These weights are multiplied channel-wise with the original input feature map to emphasize important channels.
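The channel attention step can be sketched as follows. This is a simplified illustration that assumes the two pooled descriptors are combined by summation after the shared MLP, that the MLP has one ReLU hidden layer without biases, and that features are plain nested lists; none of these details are stated in the text, and `channel_attention` is a hypothetical name.

```python
import math

def channel_attention(feat, w1, w2):
    """Scale each channel of feat (C x H x W nested lists) by a weight in (0, 1).

    w1 (hidden x C) and w2 (C x hidden) are the shared MLP weights.
    Summing the two MLP outputs is an assumed way of combining them.
    """
    def mlp(vec):
        hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in w1]
        return [sum(w * h for w, h in zip(row, hidden)) for row in w2]

    flat = [[v for row in ch for v in row] for ch in feat]
    avg_desc = [sum(ch) / len(ch) for ch in flat]   # global average pooling
    max_desc = [max(ch) for ch in flat]             # global max pooling
    logits = [a + m for a, m in zip(mlp(avg_desc), mlp(max_desc))]
    scale = [1.0 / (1.0 + math.exp(-z)) for z in logits]  # sigmoid
    scaled = [[[s * v for v in row] for row in ch] for s, ch in zip(scale, feat)]
    return scaled, scale

# Two channels of size 2 x 2; the second channel carries no signal.
feat = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]]]
w1 = [[1.0, 1.0]]        # one hidden unit
w2 = [[1.0], [-1.0]]     # favors channel 0, suppresses channel 1
scaled, scale = channel_attention(feat, w1, w2)
```

In this toy case the informative channel receives a weight close to 1 and the empty channel a weight close to 0.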

Unlike regular convolution, DCN learns offsets that can adjust sampling positions according to input features. Through this mechanism, DCN adaptively responds to various geometric transformations of license plates.

The spatial attention module learns the importance based on spatial positions in the feature map. To do this, it performs average pooling and max pooling along the channel dimension to generate two 2D feature maps. These two feature maps are concatenated and passed through a convolution layer to generate a spatial attention map. This attention map is converted to values between 0 and 1 using a sigmoid function and then spatially multiplied with the original input feature map to emphasize important regions.
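A compact sketch of the spatial attention step is given below. To keep it short, the convolution over the two pooled maps is reduced to a 1 × 1 kernel with two weights and a bias, which is a simplifying assumption; the function name `spatial_attention` is likewise hypothetical.

```python
import math

def spatial_attention(feat, w_avg, w_max, bias=0.0):
    """Weight each spatial position of feat (C x H x W nested lists).

    The convolution over the two pooled maps is simplified to a 1x1 kernel
    (w_avg, w_max, bias); a larger kernel would also mix neighboring pixels.
    """
    c, h, w = len(feat), len(feat[0]), len(feat[0][0])
    attn = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            column = [feat[ch][y][x] for ch in range(c)]
            avg_val = sum(column) / c     # average pooling along channels
            max_val = max(column)         # max pooling along channels
            z = w_avg * avg_val + w_max * max_val + bias
            attn[y][x] = 1.0 / (1.0 + math.exp(-z))  # sigmoid
    out = [[[feat[ch][y][x] * attn[y][x] for x in range(w)]
            for y in range(h)] for ch in range(c)]
    return out, attn

# Only position (0, 1) is active, imitating a strong corner response.
feat = [[[0.0, 2.0], [0.0, 0.0]], [[0.0, 4.0], [0.0, 0.0]]]
out, attn = spatial_attention(feat, w_avg=1.0, w_max=1.0)
```

The active position receives an attention value near 1, while inactive positions stay at 0.5 because their pooled inputs are zero.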

As shown in Figure 2, the CEM, which combines these three components, emphasizes important feature channels through channel attention, adaptively responds to geometric transformations through DCN, and focuses on important areas such as corners through spatial attention. Additionally, it mitigates the gradient vanishing problem and improves learning stability through residual connections.

As shown in Figure 3, the CEM is applied to the neck [29] part of the corner detector model to perform feature enhancement at each scale level. This improves corner detection accuracy for license plates of various sizes captured in real environments.

3.2. Test-Time Augmentation-Based Parallel Ensemble Processing Technique

In this study, we propose a test-time augmentation-based aspect ratio adjustment parallel ensemble processing technique to accurately detect corner information from severely distorted license plates. While license plates typically have a rectangular shape, they become distorted in various ways depending on the camera angle. In particular, the larger the angle between the camera imaging plane and the license plate plane, the more significantly the geometric shape and aspect ratio of the license plate become distorted, which directly affects the accuracy of corner detection.

Since detecting accurate corners in severely distorted license plate images is very challenging, this study proposes a method that performs corner detection after adjusting the aspect ratio. This allows corner detection algorithms to work more effectively by correcting the geometric characteristics of license plates that have been transformed due to perspective distortion.

The proposed technique achieves robustness against distortions at various angles by performing parallel corner detection on license plate images transformed into different aspect ratios. The theoretical background of this aspect ratio transformation is based on the perspective projection model, utilizing the principle of reversing and correcting the distortion that occurs when a planar object in 3D space is projected onto a 2D image. The details of the parallel ensemble processing algorithm can be found in Figure 4.

The parallel ensemble processing algorithm applies three aspect ratio settings. The original ratio uses the license plate image as-is and suits cases with minimal distortion. The 2:1 ratio transformation is effective for moderate angle distortions (approximately 30–60 degrees): when a license plate is photographed from the side, its horizontal length is relatively reduced due to perspective effects, and this transformation partially corrects that reduction. The 1.5:1 ratio transformation is effective for severe angle distortions (approximately 60–80 degrees): when a license plate is photographed from an extreme side angle, the perspective effect becomes more severe and the horizontal length is significantly reduced. Because stretching such an image to a 2:1 ratio would cause extreme pixel degradation, the milder 1.5:1 ratio is used to correct the extreme perspective effect.

Each transformation is designed to modify only the aspect ratio while preserving the image content, enabling effective corner detection of license plates captured from various angles. This parallel ensemble processing enables stable corner detection under a wider range of distortion conditions compared to existing methods that use only a single aspect ratio transformation.
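The three input variants can be sketched as target sizes for a resize operation. The sketch assumes that the crop height is kept and only the width is changed to reach the requested ratio, which the text does not specify; `aspect_ratio_variants` and `to_original_coords` are hypothetical helper names.

```python
def aspect_ratio_variants(width, height, ratios=(2.0, 1.5)):
    """Return target (width, height) sizes for the original and adjusted crops.

    Assumption: height is preserved and width is set to ratio * height.
    """
    sizes = {"original": (width, height)}
    for r in ratios:
        sizes[f"{r:g}:1"] = (int(round(r * height)), height)
    return sizes

def to_original_coords(point, resized_size, original_size):
    """Map a point predicted on a resized crop back to the original crop."""
    x, y = point
    rw, rh = resized_size
    ow, oh = original_size
    return (x * ow / rw, y * oh / rh)

# A plate crop narrowed by a side view: 50 px wide, 40 px high.
variants = aspect_ratio_variants(50, 40)
# Bottom-right corner found on the 2:1 variant, mapped back to the crop.
back = to_original_coords((80, 40), variants["2:1"], variants["original"])
```

Because only the sampling grid changes, a corner predicted on any variant maps back to the original crop through a simple per-axis rescaling.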

Next, the same corner detection network is applied in parallel to the three images (original, 2:1, and 1.5:1). The ResNet 50-based corner detection network includes a DCN-based CEM and predicts the four corner coordinates of the license plate in each image. The three sets of predictions are then combined into the final output by averaging the corner coordinates, with the confidence score of each prediction used as its weight.

The method for calculating the final corner positions using the confidence provided by each model as weights is expressed in Equation (7).

$$C_{\mathrm{final}} = \frac{\sum_{i = 1}^{n} w_{i} \cdot C_{i}}{\sum_{i = 1}^{n} w_{i}} \tag{7}$$

Here, $C_{\mathrm{final}}$ is the final corner position, $C_{i}$ is the corner position predicted by the $i$-th model, $w_{i}$ is the confidence score (weight) of the corresponding model, and $n$ is the number of models. Subsequently, the model’s confidence is applied to each of the four corners (top-left, top-right, bottom-left, bottom-right) of the license plate. For each corner $j$, when the predicted position of model $i$ is $C_{i, j} = (x_{i, j}, y_{i, j})$ and the confidence is $w_{i, j}$, the final x-coordinate is expressed by Equation (8), and the y-coordinate by Equation (9).

$$x_{\mathrm{final}, j} = \frac{\sum_{i = 1}^{3} w_{i, j} \cdot x_{i, j}}{\sum_{i = 1}^{3} w_{i, j}} = \frac{w_{1, j} \cdot x_{1, j} + w_{2, j} \cdot x_{2, j} + w_{3, j} \cdot x_{3, j}}{w_{1, j} + w_{2, j} + w_{3, j}} \tag{8}$$

$$y_{\mathrm{final}, j} = \frac{\sum_{i = 1}^{3} w_{i, j} \cdot y_{i, j}}{\sum_{i = 1}^{3} w_{i, j}} = \frac{w_{1, j} \cdot y_{1, j} + w_{2, j} \cdot y_{2, j} + w_{3, j} \cdot y_{3, j}}{w_{1, j} + w_{2, j} + w_{3, j}} \tag{9}$$

The final corner coordinates $C_{\mathrm{final}, j} = (x_{\mathrm{final}, j}, y_{\mathrm{final}, j})$ are transformed into the original image coordinate system.
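The confidence-weighted averaging of Equations (8) and (9) can be sketched as below. The sketch assumes that the predictions of all three branches have already been expressed in a shared coordinate frame (for example normalized or original-crop coordinates), since averaging across differently resized images requires one; `fuse_corners` and the sample values are illustrative.

```python
def fuse_corners(predictions):
    """Confidence-weighted average of corner predictions (Equations (8), (9)).

    predictions: one list per branch, each holding four (x, y, confidence)
    tuples in a shared coordinate frame (an assumption of this sketch).
    """
    fused = []
    for j in range(4):
        total_w = sum(branch[j][2] for branch in predictions)
        if total_w == 0:
            raise ValueError(f"all confidences are zero for corner {j}")
        x = sum(branch[j][2] * branch[j][0] for branch in predictions) / total_w
        y = sum(branch[j][2] * branch[j][1] for branch in predictions) / total_w
        fused.append((x, y))
    return fused

# Three branches (original, 2:1, 1.5:1); only the top-left corner differs.
rest = [(100.0, 10.0, 1.0), (0.0, 50.0, 1.0), (100.0, 50.0, 1.0)]
branches = [
    [(10.0, 10.0, 0.50)] + rest,
    [(12.0, 10.0, 0.25)] + rest,
    [(14.0, 14.0, 0.25)] + rest,
]
final_corners = fuse_corners(branches)
```

The fused top-left corner lies closest to the prediction of the most confident branch, and corners on which all branches agree are left unchanged.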

This parallel ensemble processing algorithm can effectively respond to distortions at various angles without additional model training. Since each transformed image is optimized for distortions at different angles, stable corner detection is possible under various environmental conditions.

3.3. Homography and Image Correction

Based on the corner detection results, this study applies homography and image correction techniques to convert distorted license plate images into a normalized form. This process plays a crucial role in improving the accuracy of license plate character recognition. Homography is the process of converting a distorted license plate image into a frontal view using the four corner coordinates of the detected license plate. This is performed by calculating a homography matrix and using it to transform the image. The homography matrix is a 3 × 3 matrix that defines the mapping between the four corner coordinates of the original image and the target image. Through this homography and image correction process, license plate images captured at various angles and lighting conditions are converted into a standardized form. This contributes significantly to improving the accuracy of the subsequent character recognition step.
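As an illustration of how the 3 × 3 homography matrix follows from four corner correspondences, the sketch below solves the standard eight-equation linear system with the bottom-right matrix entry fixed to 1. The corner coordinates and the 200 × 100 target size are made-up example values, and the helper names are hypothetical; in practice, OpenCV functions such as `cv2.getPerspectiveTransform` and `cv2.warpPerspective` are commonly used for this step.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corner configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def homography_from_corners(src, dst):
    """3x3 matrix H (H[2][2] = 1) mapping four src points to four dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]

def apply_homography(hm, point):
    """Project a point through H using homogeneous coordinates."""
    x, y = point
    w = hm[2][0] * x + hm[2][1] * y + hm[2][2]
    return ((hm[0][0] * x + hm[0][1] * y + hm[0][2]) / w,
            (hm[1][0] * x + hm[1][1] * y + hm[1][2]) / w)

# Detected corners (TL, TR, BR, BL) of a skewed plate and a frontal target.
src = [(12, 8), (95, 20), (98, 62), (10, 40)]
dst = [(0, 0), (200, 0), (200, 100), (0, 100)]
H = homography_from_corners(src, dst)
mapped = [apply_homography(H, p) for p in src]
```

Each detected corner is mapped onto the matching corner of the target rectangle, so corner localization errors translate directly into residual distortion of the rectified plate.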

3.4. Overall Integrated System

Vehicle and license plate detection, parallel ensemble processing, corner detection, homography, and character recognition are integrated into a sequential pipeline structure. The detailed structure of the license plate recognition system can be found in Figure 5. Since the license plate is located within the vehicle area, a YOLOv8-based vehicle detector is used to detect large objects corresponding to vehicles in the input image. By first detecting vehicles as large objects in complex backgrounds, the search area of the license plate detector is restricted, enabling efficient object detection.

Subsequently, a YOLOv8-based license plate detector is used to detect license plates within the detected vehicle area. In this process, detection is performed in a space limited to the vehicle area, reducing false detections due to background regions and improving accuracy.
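One way to read the two-stage detection is that the plate detector runs on a crop of each detected vehicle and its output is translated back to image coordinates. The sketch below shows only that coordinate bookkeeping, under the assumption that boxes are `(x1, y1, x2, y2)` pixel tuples; `plate_box_in_image` is a hypothetical helper, not part of YOLOv8.

```python
def plate_box_in_image(vehicle_box, plate_box_in_crop):
    """Translate a plate box found inside a vehicle crop to image coordinates.

    Boxes are (x1, y1, x2, y2) in pixels; the plate box is relative to the
    top-left corner of the vehicle crop (assumed convention).
    """
    vx1, vy1, vx2, vy2 = vehicle_box
    px1, py1, px2, py2 = plate_box_in_crop
    crop_w, crop_h = vx2 - vx1, vy2 - vy1
    # Clamp to the crop so the plate box never leaves the vehicle region.
    px1, px2 = max(0, px1), min(crop_w, px2)
    py1, py2 = max(0, py1), min(crop_h, py2)
    return (vx1 + px1, vy1 + py1, vx1 + px2, vy1 + py2)

vehicle = (100, 50, 500, 350)
plate_local = (150, 220, 260, 260)
plate_global = plate_box_in_image(vehicle, plate_local)
```

Restricting the search to the vehicle crop means any plate box is guaranteed to lie inside a detected vehicle, which is the source of the reduced background false detections described above.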

4. Results

4.1. Experimental Environment and Dataset

The experiments in this study were conducted using a hardware environment consisting of an NVIDIA RTX 3090 GPU (24 GB), an Intel Core i9-10900K CPU, and 64 GB of RAM. The software environment included Ubuntu 20.04 LTS, Python 3.8, PyTorch 1.10.0, CUDA 11.3, and OpenCV 4.5.4, with Docker used as the development environment. All models were implemented based on the PyTorch framework, and the OpenCV library was used for data preprocessing and post-processing. For real-time processing performance evaluation, the models were optimized using NVIDIA TensorRT.

The dataset used in this study consisted of self-built datasets and public datasets. For corner detection, the self-constructed dataset was labeled for corners of license plate images collected from Roboflow and AIhub. It consisted of a total of 3619 images, split into 3319 for training and 301 for validation. The Roboflow dataset for license plate detection contained a total of 16,188 images, split into 16,060 for training and 128 for validation. The Roboflow dataset for vehicle detection contained a total of 24,914 images, split into 23,575 for training and 1339 for validation.

The statistical characteristics of the dataset are shown in Table 1. In particular, the image distribution according to angle distortion was maintained evenly to ensure fair performance evaluation under various distortion conditions. The dataset labeling includes information such as vehicle bounding box coordinates, license plate bounding box coordinates, four corner coordinates of license plates, license plate text, and shooting conditions (angle, lighting, weather, etc.). Through this detailed labeling, performance could be analyzed in detail under various conditions.

4.2. Evaluation Metrics and Methodology

To evaluate vehicle and license plate detection performance, COCO evaluation metrics such as Average Precision (AP), AP.5, AP.75, and Average Recall (AR) were used. AP is the area under the precision-recall curve, averaged over Intersection over Union (IoU) thresholds from 0.5 to 0.95 in steps of 0.05. AP.5 and AP.75 represent the AP at IoU thresholds of 0.5 and 0.75, respectively. In the COCO protocol, AR is the maximum recall given a fixed number of detections per image, averaged over the same IoU thresholds.
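For reference, IoU between two axis-aligned boxes and the list of COCO-style thresholds (0.5:0.05:0.95) can be sketched as follows. The box format `(x1, y1, x2, y2)` and the example boxes are assumptions for illustration.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

# Ten thresholds: 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = [round(0.5 + 0.05 * t, 2) for t in range(10)]

overlap = iou((0, 0, 10, 10), (2, 0, 12, 10))
# Number of thresholds at which this detection still counts as a match.
passed = sum(overlap >= t for t in IOU_THRESHOLDS)
```

A detection with an IoU of about 0.67 counts as a match at the four loosest thresholds only, which is why AP averaged over all thresholds rewards tight localization.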

These evaluation metrics are calculated based on precision and recall. Precision is the ratio of actual objects among detected objects, as shown in Equation (10). Recall is the ratio of detected objects among actual objects, as shown in Equation (11). Here, True Positive (TP) refers to correctly detected objects, False Positive (FP) refers to incorrectly detected objects, and False Negative (FN) refers to undetected objects.

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{10}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{11}$$
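A small worked example of Equations (10) and (11), using made-up counts rather than results from the paper:

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from detection counts (Equations (10), (11))."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical counts: 90 correct detections, 10 false alarms, 30 misses.
precision, recall = precision_recall(tp=90, fp=10, fn=30)
```

Here 90 of 100 detections are correct (precision 0.9), while 90 of 120 actual objects are found (recall 0.75).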

The performance of license plate corner detection was also evaluated using standard keypoint detection metrics: AP, AP.5, AP.75, and AR. In this case, Object Keypoint Similarity (OKS) is used instead of IoU to measure similarity based on the distance between predicted corners and actual corners. The Euclidean distance between predicted corners and actual corners is calculated as shown in Equation (12). Here, $(x_{ij}^{\mathrm{pred}}, y_{ij}^{\mathrm{pred}})$ represents the coordinates of the predicted corner and $(x_{ij}^{\mathrm{gt}}, y_{ij}^{\mathrm{gt}})$ represents the coordinates of the actual corner, while $i$ indicates the image index and $j$ indicates the corner index. Based on this Euclidean distance, OKS is expressed as shown in Equation (13).

$$d_{ij} = \sqrt{(x_{ij}^{\mathrm{pred}} - x_{ij}^{\mathrm{gt}})^{2} + (y_{ij}^{\mathrm{pred}} - y_{ij}^{\mathrm{gt}})^{2}} \tag{12}$$

$$\mathrm{OKS} = \exp\left(-\frac{d_{ij}^{2}}{2 s^{2} k^{2}}\right) \tag{13}$$

Here, $d_{ij}$ is the Euclidean distance calculated in Equation (12), $s$ is the scale of the object (license plate) size (typically the square root of the object area), and $k$ is a constant depending on the corner type. AP.5 is the AP calculated by considering only cases where the OKS value is 0.5 or higher as True Positive, and AP.75 is the AP calculated by considering only cases where the OKS value is 0.75 or higher as True Positive.
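Equations (12) and (13) can be illustrated with the sketch below, which evaluates the similarity per corner and averages over the four corners, following the usual COCO convention. The averaging step, the value `k = 0.05`, and the sample coordinates are assumptions for illustration and are not taken from the paper.

```python
import math

def corner_oks(pred, gt, area, k=0.05):
    """Mean per-corner similarity following Equations (12) and (13).

    pred, gt: four (x, y) corners; area: plate area in pixels, so s^2 = area;
    k: per-corner constant (0.05 is an illustrative value only).
    """
    sims = []
    for (px, py), (gx, gy) in zip(pred, gt):
        d2 = (px - gx) ** 2 + (py - gy) ** 2              # Equation (12), squared
        sims.append(math.exp(-d2 / (2 * area * k ** 2)))  # Equation (13)
    return sum(sims) / len(sims)

gt = [(0, 0), (100, 0), (100, 50), (0, 50)]      # 100 x 50 plate, area 5000
shifted = [(x + 3, y + 4) for x, y in gt]        # every corner off by 5 px
perfect = corner_oks(gt, gt, area=5000)
off_by_5 = corner_oks(shifted, gt, area=5000)
```

With these values a uniform 5-pixel error gives a similarity of about 0.37, so the prediction would not count as a True Positive at either the 0.5 or the 0.75 threshold.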

To objectively evaluate the performance of the proposed method, it was compared with various existing methods. For corner detection, CSPNext [30], RTMPose [31], and Uniformer [32] were selected as comparison models, and for vehicle and license plate detection, YOLOv7 [33] and YOLOv8 were selected. These comparisons were used to assess the proposed method from multiple angles.

4.3. Experimental Results and Analysis

4.3.1. Vehicle and License Plate Detection Performance

The results of the vehicle and license plate detection performance evaluation are presented in Table 2 and Table 3, respectively. The training process of YOLOv7 and YOLOv8 is shown in Figure 6. For vehicle detection, YOLOv8 demonstrated overall superior performance compared to YOLOv7. YOLOv8 showed approximately a 4.2% improvement in performance with an mAP of 0.689 compared to YOLOv7’s 0.661, and particularly showed about 31.3% improved performance in mAP_s for small objects with 0.474 compared to YOLOv7’s 0.361. Additionally, in terms of processing speed, YOLOv8 showed approximately 13.1% faster latency at 14.6 ms compared to YOLOv7’s 16.8 ms.

In license plate detection, YOLOv8 also showed superior performance compared to YOLOv7. YOLOv8 showed approximately 1.7% improvement in performance with an mAP of 0.704 compared to YOLOv7’s 0.692 and particularly showed about 131.7% improved performance in mAP_s for small objects with 0.373 compared to YOLOv7’s 0.161. In terms of processing speed, YOLOv8 also showed approximately 2.6% faster latency at 15.2 ms compared to YOLOv7’s 15.6 ms.

4.3.2. License Plate Corner Detection Performance

Table 4 presents the ablation study results to demonstrate the effectiveness of the parallel ensemble processing approach through multi-aspect ratio transformations and the CEM module proposed in this paper for accurate corner detection in license plate images with severe distortion.

The effectiveness of the parallel ensemble processing technique can be quantitatively confirmed through ablation studies. The process of each ablation study is shown in Figure 7 and Figure 8. Comparing the baseline CSPNext model (AP: 0.608) with the CSPNext + Ensemble model (AP: 0.692) shows that ensemble processing alone achieved approximately a 13.8% performance improvement. This demonstrates that parallel processing through various aspect ratio transformations significantly enhances robustness against geometric deformations of license plates. The ablation results also confirm the importance of the DCN. The CEM with a DCN (CSPNext + CEM) achieved an AP of 0.685, while the CEM without a DCN (CSPNext + CEM(w/o DCN)) showed an AP of 0.652, approximately 4.8% lower. This indicates that DCNs can effectively handle geometric deformations and distortions in license plates. In terms of processing time, the final proposed model, CSPNext + CEM + Ensemble, exhibits a latency of 32.6 ms, approximately a 91.8% increase over the 17.0 ms of the baseline CSPNext model. This corresponds to a processing speed of approximately 30.7 FPS, which remains suitable for real-time processing while achieving a significant accuracy improvement (approximately 18.6%, from AP 0.608 to 0.721).
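The relative figures quoted above follow directly from the reported AP and latency values; the short calculation below reproduces them. The variable names are illustrative.

```python
def relative_change(baseline, value):
    """Relative change of value with respect to baseline, in percent."""
    return (value - baseline) / baseline * 100.0

ensemble_gain = relative_change(0.608, 0.692)    # CSPNext -> + Ensemble
full_gain = relative_change(0.608, 0.721)        # CSPNext -> + CEM + Ensemble
dcn_drop = relative_change(0.685, 0.652)         # CEM with DCN -> without DCN
latency_increase = relative_change(17.0, 32.6)   # latency in ms
fps = 1000.0 / 32.6                              # frames per second at 32.6 ms

summary = {
    "ensemble_gain": round(ensemble_gain, 1),
    "full_gain": round(full_gain, 1),
    "dcn_drop": round(dcn_drop, 1),
    "latency_increase": round(latency_increase, 1),
    "fps": round(fps, 1),
}
```

Note that all improvements are relative to the baseline value rather than absolute differences in AP; for example, the ensemble raises AP by 0.084 in absolute terms.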

4.3.3. Validation in Real Road Environments

To verify the applicability of the proposed system in real road environments, a real-time license plate recognition system was built using Raspberry Pi 4, Raspberry Pi Camera Module v2, and a 10,000 mAh battery pack. The experimental procedure is shown in Figure 9. This system transmits images captured by the camera to a cloud server, where they undergo license plate detection, corner detection, license plate correction, and character recognition processes. The models used in each system include a YOLOv8-based vehicle and license plate detection model and a CSPNext + CEM + Ensemble-based Corner Detection model.

These real-time systems were installed in various environments such as parking lots and roads to evaluate their performance under actual conditions. License plate recognition performance was verified under various lighting conditions, weather conditions, and shooting angles, and it was confirmed that the proposed parallel ensemble processing technique operates effectively in real environments. The operation process of the system can be found in Figure 10 and Figure 11.

Homography verification results confirmed that the proposed CEM and parallel ensemble processing technique effectively detect the corners of the variously distorted license plates that occur in real environments such as roads and parking lots.

5. Conclusions

In this paper, we developed a license plate recognition system for unconstrained environments and proposed a novel method that can effectively recognize license plates with large angular distortions. The developed license plate recognition system comprises vehicle and license plate detection, license plate corner detection, license plate correction, and text recognition stages. In the proposed system, we significantly improved the accuracy of license plate corner detection by integrating YOLOv8-based vehicle and license plate detection models with a CEM based on DCNs and parallel ensemble processing techniques utilizing various aspect ratios (original, 2:1, 1.5:1). Through ablation studies, we quantitatively confirmed the effectiveness of the parallel ensemble processing technique. Comparing the baseline CSPNext model (AP 0.608) with the CSPNext + Ensemble model (AP 0.692), the ensemble processing alone achieved approximately a 13.8% performance improvement. The ablation study results also confirmed the critical importance of DCN, where the CEM module with DCN (CSPNext + CEM) achieved an AP of 0.685, while the module without DCN showed an AP of 0.652, representing approximately a 4.8% performance degradation. The final proposed model of CSPNext + CEM + Ensemble achieved significant accuracy improvement from AP 0.608 to 0.721 (approximately 18.6% improvement), while maintaining a real-time processing capability with 32.6 ms latency (approximately 30.7 FPS). This indicates that the corner detection algorithm operated more accurately by effectively correcting the geometric characteristics of distorted license plates through the DCN’s adaptive receptive field and various aspect ratio transformations. We verified its applicability in real road environments by building a real-time license plate recognition system using Raspberry Pi 4 and a camera module and establishing an efficient real-time recognition system through a processing structure utilizing cloud servers. 
However, the current system is somewhat slower because parallel ensemble processing increases computational complexity (32.6 ms with the ensemble versus 21.3 ms for CSPNext + CEM alone; Table 4). Future research will focus on further optimizing the model structure, improving performance under more diverse environmental conditions such as snow, rain, and fog, and enhancing adaptability to license plates from different countries. In conclusion, the license plate recognition system proposed in this study, which integrates a DCN-based CEM with parallel ensemble processing, achieved high recognition accuracy even in unconstrained environments, particularly under conditions with large angular distortions. It is expected to be useful in applications such as intelligent transportation systems, parking management, and road surveillance.
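The correction stage mentioned above is homography-based (see Figure 10c). As a minimal sketch of that idea, and not the authors' implementation, the following code estimates the homography that maps four detected plate corners onto an upright rectangle. The function names, the corner coordinates, and the 200 × 100 output size are illustrative assumptions that do not come from the article.

```python
import numpy as np

def homography_from_corners(src, dst):
    """Solve for the 3x3 homography H (with H[2, 2] fixed to 1) that maps
    each of the four src points onto the matching dst point."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h11*x + h12*y + h13) / (h31*x + h32*y + 1), rearranged to be linear in h
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rhs.append(u)
        # v = (h21*x + h22*y + h23) / (h31*x + h32*y + 1)
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.append(v)
    h = np.linalg.solve(np.array(rows, dtype=float), np.array(rhs, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)

def warp_points(H, pts):
    """Apply H to an (N, 2) array of points using homogeneous coordinates."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

# Hypothetical corner detections (top-left, top-right, bottom-right, bottom-left)
corners = [(120, 80), (310, 60), (330, 150), (110, 190)]
# Hypothetical upright target rectangle, 200 x 100 pixels
target = [(0, 0), (199, 0), (199, 99), (0, 99)]

H = homography_from_corners(corners, target)
print(np.round(warp_points(H, corners), 3))  # corners land on the target rectangle
```

In a full pipeline the same matrix would be used to resample the image itself, for example with a library routine such as OpenCV's `warpPerspective`, before the rectified plate is passed to text recognition.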

Author Contributions

Conceptualization, S.K.; methodology, K.S.; software, S.K.; validation, S.K. and K.S.; formal analysis, S.C.; investigation, J.K.; resources, S.K.; data curation, K.S.; writing—original draft preparation, S.K.; writing—review and editing, S.C., J.K. and K.S.; visualization, S.C.; supervision, K.S.; project administration, K.S.; funding acquisition, Kwangwoon University. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Kwangwoon University.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are publicly available and can be accessed through the Roboflow plaka_tanima Object Detection dataset. Further details regarding the dataset are provided in Section 4.1 of the article.

Acknowledgments

The present research was supported by the Research Grant of Kwangwoon University in 2025.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. Structure of the DCN Module.

Figure 2. Structure of CEM.

Figure 3. Neck structure of the corner detection model with the CEM applied.

Figure 4. Aspect ratio adjustment parallel ensemble processing pipeline.

Figure 5. License plate recognition system pipeline.

Figure 6. The training loss curves: (a) YOLOv7-based Vehicle Detector; (b) YOLOv8-based Vehicle Detector; (c) YOLOv7-based License Plate Detector; (d) YOLOv8-based License Plate Detector.

Figure 7. Corner detection model training loss curve.

Figure 8. Corner detection model COCO/AP validation curve.

Figure 9. Validation in actual road and parking lot environments.

Figure 10. Validation results in real road environments: (a) Vehicle license plate detection in real road environments; (b) Corner detection; (c) Homography-based distortion correction; (d) License plate recognition based on OCR.

Figure 11. Validation results in parking lot environments: (a) Vehicle license plate input image in parking environment; (b) Corner detection; (c) Homography-based distortion correction; (d) License plate recognition based on OCR.

Table 1. Dataset overview.

	Corner Detection Dataset	License Plate Detection Dataset	Vehicle Detection Dataset
Number of Images	3619	16,188	24,914
Training Set	3319	16,060	23,575
Validation Set	301	128	1339
Resolution Range	302 × 192~600 × 450	640 × 640	640 × 640

Table 2. Vehicle detection performance comparison.

Model	Object	mAP	mAP_50	mAP_75	mAP_s	mAP_m	mAP_l	Latency (ms)
YOLOv7	Vehicle	0.661	0.871	0.769	0.361	0.716	0.749	0.0168
YOLOv8	Vehicle	0.689	0.890	0.807	0.474	0.704	0.725	0.0146

Table 3. License plate detection performance comparison.

Model	Object	mAP	mAP_50	mAP_75	mAP_s	mAP_m	mAP_l	Latency (ms)
YOLOv7	License Plate	0.692	0.874	0.790	0.161	0.609	0.807	0.0156
YOLOv8	License Plate	0.704	0.886	0.791	0.373	0.631	0.797	0.0152

Table 4. License plate corner detection performance comparison.

Model	AP	AP.5	AP.75	AR	Latency (ms)
CSPNext + CEM + Ensemble (Proposed)	0.721	0.789	0.737	0.778	32.6
CSPNext + CEM (Proposed)	0.685	0.750	0.700	0.739	21.3
CSPNext + Ensemble	0.692	0.759	0.708	0.745	29.8
CSPNext + CEM (w/o DCN)	0.652	0.735	0.686	0.729	19.8
CSPNext	0.608	0.769	0.680	0.812	17.0
RTM Pose	0.615	0.731	0.636	0.729	21.0
Uniformer	0.608	0.713	0.710	0.658	18.8
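
The relative changes quoted in the Conclusions can be reproduced from the AP and latency columns of Table 4. The short check below is a sketch for the reader. The helper name `relative_change` is an assumption, and the percentages are taken relative to the reference value named in each comparison.

```python
# AP and latency (ms) values copied from Table 4.
table4 = {
    "CSPNext": (0.608, 17.0),
    "CSPNext + Ensemble": (0.692, 29.8),
    "CSPNext + CEM (w/o DCN)": (0.652, 19.8),
    "CSPNext + CEM": (0.685, 21.3),
    "CSPNext + CEM + Ensemble": (0.721, 32.6),
}

def relative_change(new, ref):
    """Percentage change of `new` relative to `ref`."""
    return 100.0 * (new - ref) / ref

base_ap = table4["CSPNext"][0]

# Gain from ensemble processing alone, relative to the baseline AP.
ensemble_gain = relative_change(table4["CSPNext + Ensemble"][0], base_ap)
# Gain of the full proposed model, relative to the baseline AP.
full_gain = relative_change(table4["CSPNext + CEM + Ensemble"][0], base_ap)
# Drop when the DCN is removed, relative to the CEM with DCN.
dcn_drop = -relative_change(table4["CSPNext + CEM (w/o DCN)"][0],
                            table4["CSPNext + CEM"][0])
# Frames per second implied by the latency of the full model.
fps = 1000.0 / table4["CSPNext + CEM + Ensemble"][1]

print(f"ensemble gain: {ensemble_gain:.1f}%")
print(f"full model gain: {full_gain:.1f}%")
print(f"drop without DCN: {dcn_drop:.1f}%")
print(f"throughput: {fps:.1f} FPS")
```

The printed values round to 13.8%, 18.6%, 4.8%, and 30.7 FPS, matching the figures given in the Conclusions.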

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
