Recognizing Road Surface Traffic Signs Based on Yolo Models Considering Image Flips

In recent years, machine learning and artificial intelligence have driven significant advances in deep learning and road marking recognition. Despite this progress, existing work often relies heavily on unrepresentative datasets and limited situations. Drivers and advanced driver assistance systems rely on road markings to help them better understand their environment on the street. Road markings, also known as pavement markings, are signs and texts painted on the road surface, including directional arrows, pedestrian crossings, speed limit signs, zebra crossings, and other equivalent signs and texts. Our experiments briefly discuss convolutional neural network (CNN)-based object detection algorithms, specifically Yolo V2, Yolo V3, Yolo V4, and Yolo V4-tiny. We built the Taiwan Road Marking Sign Dataset (TRMSD) and made it a public dataset so other researchers could use it. Further, we train the model to distinguish left and right objects as separate classes, and the Yolo V4 and Yolo V4-tiny results can benefit from the "No Flip" setting. The best model in the experiment is Yolo V4 (No Flip), with a test accuracy of 95.43% and an IoU of 66.12%. In this study, Yolo V4 (without flipping) outperforms state-of-the-art schemes, achieving 81.22% training accuracy and 95.34% testing accuracy on the TRMSD dataset.


Introduction
Technologies for recognizing traffic signs and road markings have recently risen to the top of the list of research priorities and are attracting a lot of attention. The topic has been widely studied in areas such as engineering, traffic safety, education, and human physical abilities [1,2]. Because empirical studies on the understanding of traffic signs and markings in Taiwan are lacking, such a study is needed. Road users must be able to identify, comprehend, and obey traffic signs and road markings to lower the number of accidents. To provide unambiguous information, traffic signs are designed using several fundamentally distinct design styles. In addition, backgrounds containing many different buildings and shop signs make it hard for a system to identify street signs automatically. Thus, it becomes difficult to locate road signs in many environments [3]. Moreover, earlier research has mostly concentrated on deciphering road signs and has given less consideration to drivers' familiarity with and adherence to traffic regulations as a separate topic in traffic safety research [4]. Accidents, such as those caused by excessive speed or inappropriate lane changes, sometimes occur when drivers ignore or fail to detect a sign ahead. In addition, there are currently only a handful of Taiwan-adapted road sign detection systems, and in many studies researchers collected traffic signs from locations around the world for analysis.
Machine learning-based approaches proposed by Poggenhans et al. [5] employ optical character recognition to detect road marking signs and artificial neural networks (ANNs) to categorize them. For feature extraction, they use a histogram of markings. Their method makes it possible to categorize text and ground markings, such as crosswalks, stop lines, arrow signs, and other types of surface markings [6][7][8].
Danescu and Nedevschi [9] constructed an autonomous road marking identification and tracking system that used a two-step segmentation technique for detecting and recognizing road markings. Depending on the scenario, the classification accuracy ranges from 80 to 95 percent. Qin and colleagues used a machine vision approach to investigate four different types of road markings. Images of the marking contours were generated at random using image processing techniques, and the features extracted from them were passed to the classification and detection modules. Methods such as You Only Look Once (Yolo) [10] and the Single Shot Detector (SSD) [11][12][13], as well as a region proposal-based method, are used to detect road markings on a map. The region-based strategy surpassed the sliding window search method in terms of the number of proposals generated and the processing time [14].
The Yolo V4 algorithm, which is based on the cross-stage partial network (CSPNet), has been presented for object detection. A network scaling approach is then used to adjust the depth, width, and resolution of the network in addition to its topology, which ultimately led to the development of the Scaled-Yolo V4 algorithm. Yolo V4 was developed specifically for real-time object detection on general graphics processing units (GPUs). To get the best trade-off between speed and accuracy, C.Y. Wang et al. [15] redesigned Yolo V4 into Yolo V4-CSP [16,17].
The most important contributions of this research are as follows: (1) We give a condensed explanation of CNN-based object recognition methods, with a special emphasis on the Yolo V2, Yolo V3, Yolo V4, and Yolo V4-tiny models. (2) Our experimental studies examine and evaluate several state-of-the-art object detectors, including those used to detect traffic signs. The performance metrics measure vital parameters such as the mean average precision (mAP), the intersection over union (IoU), and the number of BFLOPS. (3) Experimentally, we distinguish between left and right objects. Flip data augmentation can be disabled by setting flip = 0 in the configuration file for road signs. In this study, Yolo V4 (No Flip) outperformed state-of-the-art schemes, obtaining 81.22% accuracy in training and 95.34% accuracy in testing on the Taiwan Road Marking Sign Dataset (TRMSD). (4) We investigate the importance of the flip and no flip parameters in the Yolo configuration file and provide a full discussion of them.
The remainder of this paper is organized as follows. Section 2 discusses recently published related work. Section 3 describes the technique we propose. Section 4 describes the experiment and its results. Section 5 presents our conclusions and future work.

Road Marking Recognition
Many scientists and engineers have conducted substantial studies into the automated recognition of road markings [18]. Earlier research on identifying road markings and signage has used a number of image processing methodologies [19]. Two types of object detectors are commonly used: one-stage and two-stage object detectors. A one-stage detector obtains its output from a single convolutional neural network (CNN) operation. In two-stage detectors, the high-scoring region proposals from the first stage are typically fed into the second stage. Neural networks have effectively recognized road markings [20]. Kheyrollahi et al. suggested a solution based on inverse perspective mapping and multi-level binarization to extract robust road marking features [21].
Ding et al. [22] described a method for detecting and identifying road markings. To recognize five road marking signs, the researchers combined histogram of oriented gradients (HOG) features with a support vector machine (SVM). Alternatively, a neural network can be used to identify road markings, which significantly increases the precision of the system. Other researchers use hierarchical neural networks trained with backpropagation [12,23].
Road sign recognition based on the Yolo architecture has also attracted considerable attention, and several papers have discussed this topic. W. Yang et al. [24] tested Yolo V3 and Yolo V4 using the CSUST (Chinese Traffic Sign Detection Benchmark) dataset, which is divided into four categories: warning, speed limit, directional, and prohibitory signs. The experimental results showed that Yolo V4 outperformed Yolo V3 in target detection, with better performance in recognizing road signs and detecting small objects. D. Mijić et al. [25] proposed a solution for traffic sign detection using Yolo V3 and a custom image dataset. L. Gatelli et al. [26] proposed a vehicle classification method suitable for use in Brazil that helps management personnel address social needs related to traffic safety.
A further machine learning technique for detecting and classifying road markings is proposed in [27]. This approach used a binarized normed gradient network, a support vector machine (SVM) classifier, and principal component analysis to identify and classify various types of objects. Convolution, which lets image recognition networks act more like biological systems, can improve the accuracy of the results obtained [28]. A CNN has also been used to detect and categorize traffic signs [29,30]. The latest deep learning methods, such as CNNs, have effectively solved the object recognition problem. On the PASCAL VOC [31] dataset, detection frameworks such as Faster R-CNN [32] and Fast R-CNN demonstrate superior detection performance. Faster R-CNN abandons classical selective search in favor of region proposal networks (RPNs) to attain this performance.
Road marking identification requires real-time processing speed. Faster R-CNN can address the detection task, but it falls short in terms of inference speed. As an alternative, today's leading detection frameworks, such as SSD [12] and Yolo, can run inference in real time while remaining robust for applications such as road marking detection. Our research uses the Yolo V2, V3, V4, and V4-tiny methods to recognize road marking signs in Taiwan. We also add the flip method to our data preprocessing to improve the performance of our proposed method.

You Only Look Once (YOLO)
A single-shot detection architecture, on the other hand, uses a single CNN rather than several CNNs to predict numerous bounding boxes and their classes concurrently. Yolo V2 is a single-shot detection framework at the forefront of recent developments. The ability to make real-time inferences is the most significant aspect of this detector. The Yolo V2 detection framework surpasses the Faster R-CNN algorithm in terms of both mean average precision and inference speed. Yolo V3 inherits from and improves on Yolo V1 and Yolo V2 [33]. In the Yolo V1 technique, input images are resampled to a predetermined size and divided into an m × m grid. According to preliminary findings, the Darknet-53 network, which has a larger architecture than VGG-16, is better at capturing varied and intricate information about objects and, as a result, plays a vital role in enhancing the detection accuracy of Yolo V3. The Yolo V3 algorithm, proposed in 2018, is composed entirely of convolutional layers [34,35].
In addition, each image is divided into grid cells, and bounding boxes and probability distributions are calculated for each grid cell [36,37]. Yolo V3 is composed of 106 layers, and it generates predictions by combining data from a variety of scales. The network operates on images of 416 × 416 pixels and blends three different scales together [38]. Moreover, detection is carried out on three different layers simultaneously [39], with grid sizes of 13 × 13, 26 × 26, and 52 × 52 [40]. Yolo V4 [41], the most recent version in the Yolo series, was launched in April 2020 as a new edition of the Yolo network.
As a result, we tested Yolo V4 on the integrated dataset in our trials, as we believe that model enhancement can lead to breakthroughs in accuracy and efficiency. C.Y. Wang et al. [42,43] redesigned Yolo V4 into Yolo V4-tiny to get the best trade-off between speed and accuracy. The cross-stage partial network (CSPNet) design attributes the problem of heavy computation to duplicated gradient information within network optimization. By addressing it, the complexity of the network optimization can be significantly reduced while maintaining accuracy [44][45][46].

Compared Method
Our systems are depicted in Figure 1. This study employs several CNN-based object detection techniques, in particular Yolo V2, Yolo V3, Yolo V4, and Yolo V4-tiny. Furthermore, we compare data augmentation with and without flipping. Specifically, in this experiment, we train the model to distinguish left and right objects as independent classes [47][48][49]. Because the program labels each category separately, the BBox mark tool [34] was used to generate a bounding box for the entire sign image; an image may contain several markings. During the first stage of the experiment, each label was only compared to a single training model, and only one detector model was used for the detection process. Most annotation platforms support the Yolo labeling format, which produces a single annotation text file for each image. Each text file contains one bounding-box (BBox) annotation for each object in the image. The annotation values are normalized by the size of the image and range from 0 to 1. Each annotation is represented in the following manner: <object-class-ID> <X center> <Y center> <Box width> <Box height>. Equations (1)-(6) serve as the foundation for this adjustment procedure [50].
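The normalization into the Yolo label format can be illustrated with a minimal Python sketch. It assumes the box is given as pixel corner coordinates; the function name `to_yolo_label` and the parameter names are hypothetical and are not taken from the BBox mark tool or from Equations (1)-(6).

```python
def to_yolo_label(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space box to one normalized Yolo annotation line.

    Assumption: the box is given by its top-left (x_min, y_min) and
    bottom-right (x_max, y_max) corners in pixels.
    """
    x_center = (x_min + x_max) / 2.0 / img_w   # box center, fraction of width
    y_center = (y_min + y_max) / 2.0 / img_h   # box center, fraction of height
    box_w = (x_max - x_min) / img_w            # box width, fraction of width
    box_h = (y_max - y_min) / img_h            # box height, fraction of height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {box_w:.6f} {box_h:.6f}"


# Example on a 512 x 288 image (the image size used in TRMSD).
line = to_yolo_label(3, 128, 72, 384, 216, img_w=512, img_h=288)
print(line)  # 3 0.500000 0.500000 0.500000 0.500000
```

Every value after the class ID lies between 0 and 1, so the same annotation remains valid if the image is resized.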
H denotes the height of the image, dh the absolute height of the image, W the width of the image, and dw the absolute width of the image. Algorithm 1 describes the Yolo V4 road marking detection process.

1. Create grids of size (n × n) from the provided image data.
2. Generate a total of K bounding boxes and estimate the anchor boxes for each box.
3. Using the CNN, extract all object features from the image.
5. Select the best confidence $IoU_{pred}^{truth}$ among the K bounding boxes using the threshold $IoU_{thres}$.
6. If $IoU_{pred}^{truth}$ exceeds $IoU_{thres}$, the bounding box contains the object. Otherwise, the bounding box does not contain the object.
7. Use non-maximum suppression (NMS) to select the group with the highest predicted probability of being true.
8. Show the road marking detection results.
NMS proceeds as follows. First, sort the predictions by their confidence. Then, starting from the highest-ranked prediction, discard any prediction of the same class whose IoU with an already selected prediction is more than 0.5. The last stage of the process produces a classified image that bears a label indicating the class.
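The NMS procedure described above can be sketched in a few lines of Python. This is an illustrative greedy implementation under stated assumptions, not the code used inside Darknet: boxes are assumed to be (x_min, y_min, x_max, y_max) tuples, and the names `iou` and `nms` are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_thres=0.5):
    """Greedy NMS over a list of (box, confidence, class_id) detections."""
    kept = []
    # Visit predictions from the most to the least confident.
    for det in sorted(detections, key=lambda d: d[1], reverse=True):
        box, _, cls = det
        # Keep the box unless it overlaps a kept box of the same class too much.
        if all(k[2] != cls or iou(box, k[0]) <= iou_thres for k in kept):
            kept.append(det)
    return kept


detections = [
    ((10, 10, 110, 110), 0.9, 0),
    ((12, 12, 112, 112), 0.8, 0),    # same class, heavy overlap: suppressed
    ((12, 12, 112, 112), 0.7, 1),    # different class: kept
    ((200, 200, 260, 260), 0.6, 0),  # no overlap: kept
]
kept = nms(detections)
print([score for _, score, _ in kept])  # [0.9, 0.7, 0.6]
```

Suppression is applied per class here, so two overlapping markings of different classes both survive.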

Experiment Setting
Table 1 describes our experiment setting. We used the default settings of Yolo V2, Yolo V3, Yolo V4, and Yolo V4-tiny. By default, Yolo applies image flipping as data augmentation. For the No Flip variants of Yolo V4 and Yolo V4-tiny, we set flip = 0, which disables image flipping during processing.
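Why flipping matters for direction-dependent classes can be seen from a small Python sketch. It mirrors a Yolo annotation left to right in the way a generic horizontal-flip augmentation would; the function `hflip_label` and the class IDs are hypothetical and serve only as an illustration, not as a description of Darknet's internal code.

```python
def hflip_label(label):
    """Mirror one Yolo annotation (class_id, xc, yc, w, h) left to right.

    Only the normalized x center changes; the class ID is left untouched,
    which is how a generic flip augmentation treats labels.
    """
    class_id, xc, yc, w, h = label
    return (class_id, 1.0 - xc, yc, w, h)


# Hypothetical class IDs, for illustration only.
TURN_LEFT, TURN_RIGHT = 1, 2

original = (TURN_LEFT, 0.25, 0.60, 0.10, 0.20)
flipped = hflip_label(original)
print(flipped)
```

After the flip the arrow in the image points to the right, yet its annotation still carries the TURN_LEFT class. Training on such samples mixes the two classes, which is one plausible reason why disabling flipping helps when left and right markings must be told apart.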

Data Pre-Processing
An Nvidia RTX 3080 GPU accelerator and an Intel i7-11700 central processing unit (CPU) with eight cores were used as the training environment for detecting and recognizing road markings. The machine is equipped with 32 gigabytes of DDR4-3200 random-access memory (RAM).
Image flipping is illustrated in Figure 2. A flipped image, commonly described as a reversed image, is a still or moving picture formed by mirroring the original image across an axis: a horizontal flip reflects the image across the vertical axis, and a vertical flip reflects it across the horizontal axis [51,52]. Flipping is a common image editing feature that mirrors a picture horizontally or vertically. Furthermore, Figure 2a depicts the original image, Figure 2b depicts the flipped vertical image, Figure 2c depicts the flipped horizontal image, and Figure 2d

Dataset
The Taiwan road marking sign experiment was performed using pictures that we collected from video and image sources. 80% of the dataset is used for training, while 20% is used to test the results. The video was recorded by a dashboard camera during the daytime in Taichung, Taiwan. Table 2 displays the Taiwan Road Marking Sign Dataset (TRMSD). To avoid data imbalance, we use between 391 and 409 images for each class. The total number of images in our dataset is 6009, and their dimensions are 512 by 288. Our study included a total of 15 classes, numbered P1 through P15. These classes were as follows: "Go Straight", "Turn Left", "Turn Right", "Turn Right or Go Straight", "Turn Left or Go Straight", "Zebra Crossing", "Slow Sign", "Overtaking Prohibited", "Barrier Line", "Cross Hatch", and "Stop Line". Our study also included the following speed limits: "40", "50", "60", and "70". Figure 3a depicts the labels of the TRMSD dataset, which contains 15 classes. Class P7, Class P2, and Class P9 have the most instances, each with more than 400. All classes in this dataset have more than 300 instances, and Figure 3b illustrates the labels correlogram of the dataset.
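An 80/20 split of this kind can be sketched as follows. The text does not say how the split was drawn, so the random shuffle, the fixed seed, the rounding, and the file names below are all assumptions made for illustration.

```python
import random


def split_dataset(image_paths, train_fraction=0.8, seed=0):
    """Shuffle the image list reproducibly and cut it into train/test parts."""
    paths = sorted(image_paths)            # fixed starting order
    random.Random(seed).shuffle(paths)     # seeded shuffle, repeatable
    n_train = int(len(paths) * train_fraction)
    return paths[:n_train], paths[n_train:]


# Hypothetical file names standing in for the 6009 TRMSD images.
images = [f"img_{i:04d}.jpg" for i in range(6009)]
train, test = split_dataset(images)
print(len(train), len(test))
```

With this rounding, 6009 images give 4807 training and 1202 test images; the exact counts used in the experiment may differ.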

Yolo Training Result
Figure 4 illustrates the training results for each model in the experiment. According to our findings, a learning rate of 0.00261, a learning rate decay factor of 0.1 at each step, and a momentum of 0.949 help improve the Yolo model during training. To overcome the problem of overfitting, we incorporate cross-validation and early stopping into our experiment design. A common procedure is to perform 5-fold cross-validation to obtain an out-of-sample prediction error. Early stopping rules specify how many iterations a learner can run before it begins to overfit. This experiment uses max batches = 30,000 iterations with the step policy: steps = 24,000 and 27,000, scales = 0.1 and 0.1, momentum = 0.949, decay = 0.0005, saturation = 1.5, exposure = 1.5, and mosaic = 1. Generally, an object detector with m classes requires max batches of 2000 × m. The training process in the experiment is therefore terminated after 30,000 iterations (2000 × 15 classes). Other variables employed in the training process include the current iteration number and the scales (0.1, 0.1). The learning rate is adjusted at each step, and the current learning rate after both steps is: learning rate × scales [0] × scales [1] = 0.00261 × 0.1 × 0.1 = 0.0000261.
The average loss value of Yolo V2 during the training stage is 0.1162, and the training loss remains stable after 3000 epochs. Further, Yolo V2 exhibits a mAP of 76.75% and an IoU of 53.61%. Yolo V4 (No Flip) achieves the highest training mAP, 81.22%, with an IoU of 65.98% and a loss value of 0.429. It is followed by Yolo V4-tiny (No Flip) with a mAP of 80.47% and an IoU of 63.79%. Next, Yolo V3 obtained a mAP of 78.31% and an IoU of 58.62%, as shown in Table 3. In addition, Table 4 shows the training results for each class, P1 to P15. Classes P1, P6, P7, P8, P9, P11, and P15 achieve above 90% accuracy. The intersection over union (IoU) metric is used to evaluate object detectors, as indicated in Equation (7) [53,54].
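The step policy can be made concrete with a short Python sketch. It assumes that both scale factors are 0.1, that each factor takes effect once the iteration count reaches the corresponding step, and that steps sit at 80% and 90% of max batches, which matches the reported 24,000 and 27,000 of 30,000. Warm-up (burn-in) is ignored, and the function names are hypothetical.

```python
def training_schedule(num_classes):
    """Return (max_batches, steps) using the 2000-per-class rule of thumb."""
    max_batches = 2000 * num_classes
    # Assumed: steps at 80% and 90% of max_batches (integer arithmetic).
    steps = (max_batches * 8 // 10, max_batches * 9 // 10)
    return max_batches, steps


def step_learning_rate(iteration, base_lr=0.00261,
                       steps=(24000, 27000), scales=(0.1, 0.1)):
    """Learning rate under a step policy: multiply by a scale at each step."""
    lr = base_lr
    for step, scale in zip(steps, scales):
        if iteration >= step:
            lr *= scale
    return lr


max_batches, steps = training_schedule(num_classes=15)
print(max_batches, steps)  # 30000 (24000, 27000)
for it in (0, 25000, 30000):
    print(it, step_learning_rate(it, steps=steps))
```

Under these assumptions the rate stays at 0.00261 until iteration 24,000, drops by a factor of ten there, and drops by another factor of ten at iteration 27,000.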

$$\mathrm{IoU} = \frac{\mathrm{Area}_{pred} \cap \mathrm{Area}_{gt}}{\mathrm{Area}_{pred} \cup \mathrm{Area}_{gt}} \quad (7)$$

The numerator is the area of overlap between the predicted bounding box and the ground truth bounding box. The denominator is the area of union, defined as the area covered by either the predicted bounding box or the ground truth bounding box in the same coordinate system. Dividing the area of overlap by the area of union gives the IoU. The result examples can be placed into three categories: (1) True positive (TP): the model predicted a label and matched it correctly as per ground truth. (2) False positive (FP): the model predicted a label, but it is not part of the ground truth. (3) True negative (TN): the model does not predict the label and it is not part of the ground truth. Recall additionally uses false negatives (FN), which are ground truth labels that the model fails to predict. Equations (8) and (9) describe the precision and the recall [55,56].

$$\mathrm{Precision} = \frac{TP}{TP + FP} \quad (8)$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \quad (9)$$
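The IoU of Equation (7), together with the standard definitions precision = TP / (TP + FP) and recall = TP / (TP + FN), can be illustrated with a minimal Python sketch. Boxes are assumed to be (x_min, y_min, x_max, y_max) tuples, and the function names are hypothetical.

```python
def box_iou(pred, gt):
    """Area of overlap divided by area of union for two corner-format boxes."""
    inter_w = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    inter_h = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    overlap = inter_w * inter_h
    area_pred = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_gt = (gt[2] - gt[0]) * (gt[3] - gt[1])
    union = area_pred + area_gt - overlap
    return overlap / union if union > 0 else 0.0


def precision_recall(tp, fp, fn):
    """Precision and recall from counts; fn = ground truth objects missed."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Two 2 x 2 boxes sharing a 1 x 1 corner: overlap 1, union 7.
print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))
# 8 correct detections, 2 spurious detections, 2 missed objects.
print(precision_recall(tp=8, fp=2, fn=2))
```

A detection is usually counted as a true positive only when its IoU with a ground truth box of the same class exceeds a chosen threshold, so the two functions are typically used together.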
In the Yolo loss function, $\mathbb{1}_{i}^{obj}$ denotes whether the object appears in cell i, and $\mathbb{1}_{ij}^{obj}$ denotes that the j-th bounding box predictor in cell i is responsible for the prediction. The symbols $(\hat{x}, \hat{y}, \hat{w}, \hat{h}, \hat{c}, \hat{p})$ represent the center coordinates of the predicted bounding box, as well as its width, height, confidence, and category probability. The vast majority of boxes contain no object. This results in a problem known as class imbalance, in which we train the model to recognize background more frequently than we train it to recognize objects. As a solution, we reduce the weight of this loss term by a factor of $\lambda_{noobj} = 0.5$.
The test results show that the "No Flip" setting can increase the performance of Yolo V4 and Yolo V4-tiny. The Yolo V4 mAP increased by 1.88 percentage points, from 93.55% to 95.43%. Furthermore, the Yolo V4-tiny mAP rose by 6.89 percentage points, from 87.53% to 94.42%. In our experiment, we disabled flip data augmentation by setting flip = 0, because we want the model to classify Left and Right objects as distinct classes. The recognition result for road marking sign Class P11 is shown in Figure 5. All models can detect and recognize the markings well, with varying accuracy. Yolo V4 (No Flip) can recognize two signs of P11 with 99% and 95% accuracy, respectively.

Result Discussions
Figure 6 illustrates the recognition results for Class P8. Yolo V4 (No Flip) recognizes three signs with accuracies of 88%, 90%, and 65%, respectively, as shown in Figure 6d. In contrast, Yolo V4 and Yolo V4-tiny detect only two Class P8 signs in the same image, as shown in Figure 6c,f. Figure 7 depicts cases of incorrect recognition of the road markings. Some models do not recognize all the markings in the image and recognize only one sign, as shown in Figure 7a-f. Yolo V4-tiny also produced a double detection, as shown in Figure 7e. Furthermore, the results of this experiment may be applicable to other countries in Asia and to markings in other languages.

Conclusions
The purpose of this experiment is to provide a brief review of CNN-based object detection algorithms, with a special emphasis on Yolo V2, Yolo V3, Yolo V4, and Yolo V4-tiny. Our experimental studies examine and evaluate several state-of-the-art object detectors, including those used to detect traffic signs. The evaluation criteria measure important parameters such as the mean average precision (mAP), the intersection over union (IoU), and the number of BFLOPS.
Based on the results of our investigation, we draw the following conclusions: (1) Yolo V4 and Yolo V4-tiny results can take advantage of the "No Flip" setting. In our scenario, we want the model to discriminate between Left and Right objects as distinct classes. (2) The best model in the experiment is Yolo V4 (No Flip), with a testing accuracy of 95.43% and an IoU of 66.12%. (3) We built our Taiwan Road Marking Sign Dataset (TRMSD). In the future, we will combine road marking recognition with explainable artificial intelligence (XAI) and test with another dataset. Furthermore, we will upgrade our TRMSD dataset and focus on pothole sign recognition.

Figure 1. An overview of the system.

Figure 2a depicts the original image, Figure 2b depicts the vertically flipped image, Figure 2c depicts the horizontally flipped image, and Figure 2d depicts the image with both flips applied.

Table 3. Training performance results.

Table 4. Training performance results for each class.

Table 5 describes the test performance results of each algorithm. Yolo V4 (No Flip) showed the highest mAP in the experiment, 95.43%, with an IoU of 66.18%, precision of 83%, recall of 93%, and an F1-score of 91%. Next, Yolo V4 achieved a mAP of 93.55% with an IoU of 66.24%. Yolo V4-tiny (No Flip) showed 92.98% mAP and 64.7% IoU. Yolo V3 reached the lowest mAP, 89.97%, with an IoU of 61.98%.

Table 5. Testing performance results.

Table 6 describes the performance results of each class, P1 to P15. Class P6 obtained the highest average mAP of 100%, followed by Class P9 with 99.87% mAP and Class P11 with 99.68% mAP. In contrast, Class P5 obtained the lowest mAP, 76.79%. Class P5 is a Turn Left or Straight sign, which closely resembles Class P4 (Turn Right or Straight). Class P4 reaches 82.19% mAP, slightly higher than Class P5. Overall, the classes in the experiment achieved a high mAP, generally above 90%, with Classes P4 and P5 as the exceptions noted above. Most of the marks for Speed Limit 60, Speed Limit 50, Speed Limit 40, and Speed Limit 70 receive higher mAP than those for other classes. This is likely because the model recognizes numbers well and these classes differ markedly from the other classes.
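The relationship between per-class results and the overall mAP can be sketched in Python as follows. The sketch assumes that each detection has already been matched against the ground truth at some IoU threshold, and it uses all-point interpolation of the precision-recall curve. The exact AP variant used by the evaluation tool in the experiments is not stated in the source, so the function names and the interpolation choice are assumptions for illustration only.

```python
from typing import List, Tuple


def average_precision(detections: List[Tuple[float, bool]], num_gt: int) -> float:
    """All-point interpolated AP for one class.

    detections: (confidence, is_true_positive) pairs; matching to the
        ground truth at an IoU threshold is assumed to be done already.
    num_gt: number of ground truth objects of this class.
    """
    if num_gt == 0:
        return 0.0
    # Rank detections from most to least confident.
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    recalls: List[float] = []
    precisions: List[float] = []
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Replace each precision with the maximum precision at any higher
    # recall (the interpolation envelope).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum the area under the interpolated precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class_ap: List[float]) -> float:
    """mAP as the unweighted mean of per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap) if per_class_ap else 0.0
```

Because mAP is an unweighted mean over classes, a single difficult class lowers the overall score by its shortfall divided by the number of classes.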

Table 6. Testing performance results for each class.