1. Introduction
In recent decades, there has been steadily increasing concern about the quality of transport, including driving safety, vehicle pollution reduction, green logistics, and carbon reduction [1,2,3,4]. To construct an intelligent transportation framework in a smart city, transportation management has been extensively applied to large fleets using monitoring devices. Large vehicles (i.e., trucks and big buses) are used for long-distance travel, and the issue of forward collision has been discussed for many years; thus, vehicle detection and recognition technologies have been studied to assist driving safety. Broadly speaking, remote sensing refers to a method of measuring objects without contacting the target. A vehicle's front camera can be considered the most basic remote sensing instrument, and deep learning and machine learning are subsets of artificial intelligence (AI) that can greatly improve a system's ability to sense impending collisions. Since large vehicles have a longer braking response than ordinary cars, an effective detection method can help the drivers of large vehicles respond properly in real time and warn them immediately before a collision, reducing the probability of traffic accidents.
In previous studies, most research used a driving simulator [1,3,5,6]. This paper, in contrast, provides practical cases. The research question is how to resolve the misjudged detection of moving front vehicles at different times of day and on various road types when a large vehicle fleet has limited funds and often chooses cheap front cameras supplemented with a traditional detection approach (i.e., Haar features). The research motivation is therefore to enhance the software technique of front-vehicle detection without changing the front camera equipment. This study used front cameras as sensors that, combined with a deep learning approach, can help determine whether the driver is about to hit the vehicle ahead. The sensor installed on a large vehicle is an important tool for monitoring driving procedures. In most accident events, no attention was paid to the vehicle ahead. Unsafe driving has caused many traffic problems, resulting in severe accidents and injuries. Drivers of large commercial buses must pay careful attention to conveying passengers safely. Big bus fleets must reduce violations while passengers are on board, and drowsy driving can easily cause major traffic accidents, which in turn create serious traffic jams and waste considerable social resources on road clearing and environmental pollution [4,7]. Abnormal driving, rear-end crashes, and carbon emissions are related: when the vehicle in front of a large vehicle turns suddenly, brakes sharply, or decelerates, energy is consumed in an economically inefficient way [7,8]. Information for preventing improper driving beforehand is necessary, especially on accident-prone road segments. Lane changing occurs more often on normal roads than on highways. Since large vehicles have a longer braking response time than ordinary cars, a large vehicle needs to keep a greater distance when detecting approaching front vehicles; detecting the front vehicle with an image detection method can therefore notify the driver to respond more quickly. When the driver is exhausted during a long trip, a detection warning is necessary because fatigue reduces awareness and reflexes, and abnormal driving can be alerted in advance. Warning monitoring for truck fleet drivers reduces forward collisions and ensures better driving safety, so traffic accidents are reduced and social resources are not wasted [4]. Efficient forward collision avoidance can also conserve vehicle energy. Moreover, mobile devices can rapidly identify driving situations to enhance safe dispatching, and the image classification technique can be combined with hardware monitoring, with driving information uploaded to the cloud for management and analysis to assist system operation and maintenance.
Resource conservation aims to reduce vehicle carbon emissions [3,4]. Large vehicles are a primary source of resource depletion, and large and small vehicles have different carbon emissions. Taking a speed of 50 km/h on the highway as an example, the carbon emission of a large bus (except on Highway No. 5) is 371.0016 g/km, whereas that of a small passenger car is 171.5453 g/km [9]. The large vehicle thus has a greater negative impact on the environment. Generally, if a front vehicle is detected and the large vehicle needs to slow down, its repeated arbitrary decelerating and accelerating on the road, and the related carbon dioxide emissions, affect the environment. Moreover, monitoring uses sensors to control depletion. Considering the shortage of light sources at night, the camera in our management system can be combined with an infrared design that emits red light onto forward objects. By using various sensors to collect large-scale data, hardware equipment can enhance front-vehicle detection. However, changing hardware is expensive, and not all vehicle companies can or are willing to invest in hardware improvements [2]. Monitoring equipment also increases energy use. The objective is to reduce invisible energy loss and the waste of social resources and to achieve resource conservation. The camera should be selected according to the lighting spectrum; therefore, infrared (IR) spectra are typically used for night lighting.
There have been many studies exploring various types of deep learning methods for vehicle detection. Early warnings can help avoid vehicle accidents. Theoretically, the better method is to use high-resolution images, but cheaper digital video recorder (DVR) equipment often causes misjudgments due to various factors. Thus, this paper uses advanced algorithms to improve traditional vehicle detection. Moreover, large and ordinary vehicles have different detection views. On the device side, the traditional methods and image data are limited by the car kits, which cannot be adjusted in the short term, and many sensor devices on the market still misjudge front vehicles. When the original algorithm of the car kits cannot be replaced, the current solution is to perform secondary detection on the problematic images manually, but manual operation is cumbersome: when detection produces many misjudged images, processing takes a lot of time. Therefore, this research proposes a new method to process misjudged images and improve detection efficiency.
In addition, spatial information technology (SIT) can be used to develop large fleet management in smart transportation. This study also analyzed front-vehicle detection, which relies on remote sensing (RS) and SIT to achieve social resource protection. The monitoring driving system was used to protect freight in transport, and the vehicles were equipped with a global positioning system (GPS) and a front DVR camera. Thus, a case study of vehicle detection for a large vehicle fleet is the main research interest of this paper. The contribution of this study is to provide valuable assistance to large vehicle drivers.
This paper is organized as follows. Section 2 provides a literature review of remote sensing using DVR with near-infrared (NIR) lighting, region-based and regression-based algorithms, vehicle recognition, vehicle detection, machine learning, and deep learning. In Section 3, we develop a new approach for improving front-vehicle detection and compare it with other methods. Section 4 provides comparison results and analyses. Section 5 presents the discussion, and Section 6 provides the conclusions.
3. Research Methodology
Computer vision and image processing include many techniques, such as recognition and detection. This paper focuses only on front-vehicle detection and the concept of remote sensing for fleet management. To integrate remote sensing, machine learning, and deep learning for large vehicle fleet management, the application of image technology for vehicle detection can be enhanced by AI algorithms. Commercial vehicles have more duties; especially at different times of day and on varying road types, car accidents happen when drivers do not respond rapidly and/or maintain sufficient distance to the front vehicle. In practice, we initially had misjudged detections for large vehicles. Based on the review of several existing methods in [1,2,18], we extended the video and image detection idea and used a single-camera-based system, which is preferred for large vehicle fleet management. The front camera installed in the large vehicle used an NIR LED lighting system within a DVR image system. The detection technique can be affected by different road conditions during delivery. The abnormal driving events came from car kits on trucks, whereby the images can be sent to the main host computer and rechecked for false judgments instead of relying on manual judgment. The single-camera-based system works through the lens and an infrared light source. Large vehicle drivers often deliver freight at night; thus, an NIR camera is a better choice for detection analysis.
The angle and lights of a large vehicle can affect detection, and most companies cannot pay much for a front camera that provides better image quality. In the market, lower-tier DVRs are selected under limited budgets, and the installed lower-tier DVRs performed inaccurate detection since misjudgments arose from the car kits. We assumed the DVR camera angle and position were fixed. With the DVR fixed on the front of the vehicle, the collected large vehicle data can be considered usable for front-vehicle detection.
When the hardware equipment cannot be changed, the software algorithm can improve detection ability. The vehicle detection technique was based on traditional Haar features extended to vehicle detection. For fleet management, the existing Haar features were used in the car kits for small-scale detection to determine whether there are vehicles in the images and their approximate positions. A Haar classifier is trained from many images, both positive and negative; based on this training, it can be used to detect vehicles, and the trained classifiers are stored as large individual .xml files with many feature sets. Haar features use the concept of subtracting black and white regions: they are packed into feature templates of white and black rectangles, and the feature value of a template is the sum of the pixels in the white rectangles minus that of the black rectangles. Therefore, Haar features form a feature-based detection algorithm used to identify objects in an image or video. The traditional bounding box regression uses translation and scale zooming to approximate the ground truth. If the box size is too small or the bounding box coordinates are not at the right points, the features extracted in the subsequent feature extraction steps will be inaccurate and recognition accuracy will decline, resulting in false detections.
To quickly determine the front vehicles, the Haar detection classifier is used to detect the vehicle ahead [31]: the sum of pixels in each detection window is calculated, their difference is taken, and the difference is used as the feature for target classification. OpenCV is a popular library for image detection. This paper used Python programming and imported the OpenCV library, whose CascadeClassifier loads Cars.xml and uses detectMultiScale for front-vehicle detection [52,53]. The traditional Haar rectangle features, which belong to shallow learning, can be computed efficiently using the integral image.
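As a minimal sketch of this Haar-based step (the file paths follow the document layout in Figure 1, but the video name and the detectMultiScale parameters are illustrative assumptions, not the car kit's exact settings):

```python
import cv2

# Load the trained Haar cascade for cars (stored in the Xml document).
car_cascade = cv2.CascadeClassifier("Xml/Cars.xml")

cap = cv2.VideoCapture("Videos/front_test.mp4")  # hypothetical test video
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # detectMultiScale returns (x, y, w, h) boxes; scaleFactor and
    # minNeighbors here are illustrative values only.
    boxes = car_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=4)
    for (x, y, w, h) in boxes:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
cap.release()
```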
This paper adopted Darknet from YOLOv4 [54] using deep learning. YOLOv4 uses CSPDarknet53 with Mish activation as the backbone for feature extraction and PANet plus SPP as the neck, achieving, on a V100 GPU, 62 frames per second (FPS) at 608×608 and 83 FPS at 512×512, with 128.5 BFLOPs at 608×608. YOLOv4 is an efficient single-stage object detection algorithm that processes many pictures in one stage. This study used the collected large vehicle data to test front-vehicle detection. YOLOv4 consists of backbone, neck, and head networks. For the file framework, seven documents (folders) are used to handle front-vehicle detection.
Figure 1 shows the detailed document description for large vehicle fleet management. The Videos document stores videos for testing, the Positive and Negative documents store images, and the Xml document is used for running Haar features. Three other documents, Darknet, YOLOv4, and YOLOv4_largevehicle, are used to implement YOLOv4.
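While Darknet itself is usually run from its command-line binary, one hedged way to load the trained cfg/weights files from Python is OpenCV's DNN module; the weights file name and the thresholds below are assumptions, and the cfg name follows the collected data file mentioned later:

```python
import cv2

# Read the Darknet cfg/weights trained on the collected large vehicle data.
net = cv2.dnn.readNetFromDarknet(
    "YOLOv4_largevehicle/YOLOv4_largevehicle.cfg",
    "YOLOv4_largevehicle/YOLOv4_largevehicle.weights")
model = cv2.dnn_DetectionModel(net)
# YOLOv4 expects normalized RGB input; 608x608 matches the cfg resolution.
model.setInputParams(size=(608, 608), scale=1 / 255.0, swapRB=True)

frame = cv2.imread("Positive/sample_frame.jpg")  # hypothetical test image
class_ids, scores, boxes = model.detect(frame, confThreshold=0.5,
                                        nmsThreshold=0.4)
for (x, y, w, h), score in zip(boxes, scores):
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 0, 255), 2)
```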
Shen and Hsu proposed that eight points be used in the process of projection transformation [18]. Based on this idea, this paper added a fence method for determination after the YOLOv4 procedure. The fence is used to improve the original YOLOv4 method and enhance the accuracy of front-vehicle detection; therefore, this paper adopted the concept of a fence with YOLOv4, using the collected large vehicle data to enhance detection. YOLOv4 is a multi-level feature learning approach. This paper improves on Haar features and the original YOLOv4 approach, and proposes YOLOv4 with the fence method to improve detection accuracy. Based on Figure 1, this study used four methods for testing the detections. Figure 2 is the flowchart of the improvement process, and the four steps are described in detail as follows.
Step 1: There are three file combinations for the four procedures. The Haar features used the trained Cars.xml from the Xml document; YOLOv4 had trained cfg, data, and weights files in the YOLOv4 document; and the collected trained cfg, data, and weights files were placed in the YOLOv4_largevehicle document. Haar features used Cars.xml for front-vehicle detection. The trained original YOLOv4 weights used the MS COCO dataset [54]. The ranges of width and height were determined, and this study used the collected large vehicle image data for training based on the YOLOv4 method.
Step 2: There are three YOLOv4-based methods, named YOLOv4(I), YOLOv4(II), and YOLOv4(III). YOLOv4(I) is the original YOLOv4; YOLOv4(II) and YOLOv4(III) use the collected large vehicle image data for training. Based on the concept of [2], the fence method can be used to improve detection. Hence, the proposed YOLOv4(III) denotes YOLOv4(II) supplemented with the fence method described in Figure 3. The fence is trapezoidal. Since the original car kit equipment cannot be changed, the fence method can serve as a judgment to enhance the detection of whether there is a front vehicle and reduce misjudgments.
Step 3: This step uses testing data from the collected large vehicle videos to test front-vehicle detection. For each of the four methods, positive denotes that the method detected a front vehicle, and negative denotes that it did not. The positive and negative images are produced using the OpenCV package.
Step 4: A comparison of the four methods was made to provide guidance to help large vehicle drivers.
Based on the concept of spatial data structure and polygons [2], the fence method is based on the box size determined by the width w and height h. The x-coordinate is randomly sampled from [x1, x2] and the y-coordinate from [y1, y2], so that the point (x, y) of the vehicle falls within the fence. The rectangle area can be expressed as:

A = w × h = (x2 − x1) × (y2 − y1), (1)

and the corresponding trapezoid area as:

A = ((wt + wb) / 2) × h, (2)

where x1, x2, y1, and y2 denote the four coordinates, and wt and wb denote the top and bottom widths of the trapezoid. Through setting the values of w and h, the trapezoid area was determined for large vehicles. The large vehicles take long trips from daytime to nighttime. In the night environment, vehicle characteristics are not obvious due to insufficient light, and detection differs between highways and normal roads. The fence method, used as a monitor of the road environment, can improve detection in the above situations.
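A minimal sketch of the fence judgment itself, assuming a trapezoid with four illustrative corner coordinates for a 1280 × 960 frame (the paper's calibrated fence values are not restated here) and taking the bottom-center of each detected box as the reference point:

```python
import numpy as np
import cv2

# Hypothetical trapezoid fence corners (clockwise) for a 1280x960 frame.
fence = np.array([[480, 600], [800, 600], [1100, 960], [180, 960]],
                 dtype=np.float32).reshape(-1, 1, 2)

def in_fence(box):
    """Return True if the bottom-center of an (x, y, w, h) box is in the fence."""
    x, y, w, h = box
    point = (float(x + w / 2), float(y + h))  # bottom-center of the box
    # pointPolygonTest: > 0 inside, 0 on the edge, < 0 outside the polygon.
    return cv2.pointPolygonTest(fence, point, False) >= 0

# Keep only the detections that fall within the fence, i.e., the
# post-detection judgment step of YOLOv4(III).
detections = [(500, 550, 120, 90), (40, 100, 60, 50)]  # toy boxes
front_vehicles = [b for b in detections if in_fence(b)]
```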
Positive denotes that the large truck has a front vehicle, and negative denotes that it has none. A confusion matrix is made of predicted outcomes and actual conditions, including true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) [55]. TP denotes that a front vehicle is too close and is detected; TN means that there is no front vehicle and none is detected; FP indicates that there is no front vehicle but one is detected; FN denotes that a front vehicle is too close but is not detected.
Precision and recall each have positive and negative formulas, and the F1 score is computed from precision and recall. This study also compares detection time using the FPS formula [56] to evaluate detection speed. The equations used in this study are listed as follows:

Precision(positive) = TP / (TP + FP), Precision(negative) = TN / (TN + FN), (3)
Recall(positive) = TP / (TP + FN), Recall(negative) = TN / (TN + FP), (4)
F1 = 2 × Precision × Recall / (Precision + Recall), (5)
FPS = number of frames / elapsed time. (6)
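As a worked sketch of these formulas (the counts are toy values, not the study's results), guarding against the division-by-zero cases discussed later for videos with no front vehicle:

```python
def metrics(tp, tn, fp, fn):
    """Positive/negative precision and recall plus positive F1 from counts."""
    def safe(n, d):
        # Return 0.0 when the denominator is zero (e.g., highway videos
        # with no front vehicle, where TP + FN = 0).
        return n / d if d else 0.0
    prec_pos = safe(tp, tp + fp)
    rec_pos = safe(tp, tp + fn)
    prec_neg = safe(tn, tn + fn)
    rec_neg = safe(tn, tn + fp)
    f1_pos = safe(2 * prec_pos * rec_pos, prec_pos + rec_pos)
    return prec_pos, rec_pos, prec_neg, rec_neg, f1_pos

def fps(frame_count, elapsed_seconds):
    """Detection speed as frames processed per second."""
    return frame_count / elapsed_seconds

print(metrics(tp=42, tn=130, fp=6, fn=13))  # toy confusion-matrix counts
```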
In summary, driving on the roads for long periods can reduce driving attention such that drivers cannot carefully control the vehicle. The proposed front-vehicle detection method can reduce the need for sudden turning, sharp braking, and arbitrary accelerating/decelerating, thereby reducing carbon emissions. Machine learning and deep learning serve as knowledge-based intelligent systems. Moreover, there is a difference between indoor simulation and outdoor vehicle driving environments; this study used videos and images from actual outdoor driving environments, so the experiment is closer to reality. The large vehicles equipped with a front camera used feature extraction to distinguish vehicles from non-vehicles. Ultimately, we propose a new method to enhance detection, pursuing low-carbon fleet management and reducing the waste of social resources.
4. Results
In this section, we demonstrate four approaches for comparison: Haar features, YOLOv4(I), YOLOv4(II), and YOLOv4(III). This study used a front-camera device with a resolution of 1280(H) × 960(V) and NIR LED lighting, installed on the front of large vehicles, to collect front-vehicle videos from the car kits. The traditional Haar feature method used the trained Cars.xml, and the original YOLOv4 used the test-dev2017 dataset from the MS COCO website. In Figure 4, the collected large vehicle data include 1994 images, of which 86.26% were chosen for training and 13.74% for validation. The types of front images included highway truck, highway car, normal-road truck, normal-road car, and no-vehicle images, which were the available data we collected. Using Haar features, the scopes of x, y, w, and h were adjusted to avoid false detection, and labeling was performed for YOLOv4 before the training process; YOLOv4 has a useful tool, LabelImg [57], for manually annotating the images. The number of videos and images continually increases, but we generally did not redo the training model since training takes time. We propose that YOLOv4 with the fence method can be used to detect the front vehicle under limited videos or images.
The placement and angle of the front camera are the main factors affecting detection, since the view from a large vehicle differs from that of smaller cars, and it is difficult to detect vehicles under varying front-camera angles and light conditions. Thus, this paper added the fence method to improve detection. Figure 5 shows the fence method, which obtains the trapezoid area for detecting the front vehicle using Equations (1) and (2). In addition to its own bounding box, we added a trapezoid fence for enhancing detection in the case study of a large truck.
For the training process, we used a desktop PC running Microsoft Windows 10 with an Intel Core i7-11800H CPU at 2.30 GHz, 16 GB of RAM, and an NVIDIA GeForce RTX 3070 Laptop GPU for high-performance graphics processing. This study used the Python programming language for development. Object detection usually adopts mean average precision (mAP) to measure accuracy. The loss function in the YOLOv4 definition is the sum of the complete intersection over union (CIoU), confidence, and class losses [54].
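As a brief restatement of that decomposition (following the standard CIoU formulation of [54]; the paper's exact term weights are not restated here):

```latex
\mathcal{L} = \mathcal{L}_{\mathrm{CIoU}} + \mathcal{L}_{\mathrm{conf}} + \mathcal{L}_{\mathrm{cls}},
\qquad
\mathcal{L}_{\mathrm{CIoU}} = 1 - \mathrm{IoU} + \frac{\rho^{2}\left(b,\, b^{gt}\right)}{c^{2}} + \alpha v
```

where ρ is the Euclidean distance between the centers of the predicted box b and the ground-truth box b^gt, c is the diagonal length of the smallest box enclosing both, and αv penalizes aspect-ratio inconsistency.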
Figure 6 shows that the mAP is 87.7% in the loss chart for the collected data file (YOLOv4_largevehicle.cfg) and that the average loss is 0.1823 using a maximum of 2000 batches.
We sampled 10 large vehicles, yielding 191 videos from the fleet management system for sample testing, and these videos were used for front-vehicle detection. Table 1 shows the driving video data, which include four different time types on both highways and normal roads.
The collected 191 videos were from large trucks, as shown in Figure 7. The negative (no front vehicle, or NoVehicle) data outnumber the positive (front vehicle present, or HaveVehicle) data, especially on highways. Generally, there are many traffic lights on normal roads, so the large truck videos have many chances of having front vehicles. Since 119 of the highway samples had no front vehicle, their TPs were zeros. The working time of large trucks differs from that of ordinary vehicles, and most of the driving time is concentrated in the morning; the early morning had fewer front vehicles. Unlike with ordinary vehicles, being hit by a large vehicle results in a very serious accident, so drivers usually stay away from large vehicles; fewer vehicles drive in front of large vehicles, especially on highways, as drivers seem to keep a greater distance for safety. We observed detailed road and time factors for discussion using a classification method based on supervised learning. This study used a decision tree, a popular machine learning tool for classification, to obtain the relation of front vehicles to road type and time. This study used R programming [58] and employed Recursive Partitioning and Regression Trees (rpart) [59] from the Classification and Regression Trees (CART) algorithm, and the partykit toolkit [60] was used to display the CART tree visualization. For verification, Figure 8 shows the result from the 191 videos using the rpart and partykit packages, in which the highway and early morning on normal roads had high non-vehicle rates.
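The paper builds this tree with R's rpart and partykit; for readers following the Python code in this paper, an analogous sketch using scikit-learn's DecisionTreeClassifier (a different CART implementation, with an illustrative toy encoding rather than the study's data) is:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy encoding (an assumption): road type (0 = highway, 1 = normal) and
# time slot (0-3 for the four time types); label 1 = HaveVehicle,
# 0 = NoVehicle.
X = [[0, 0], [0, 1], [0, 2], [1, 0], [1, 2], [1, 3]]
y = [0, 0, 0, 0, 1, 1]

tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
print(export_text(tree, feature_names=["road_type", "time_slot"]))
```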
This paper also adopted random forests, a classifier that combines multiple decision trees to make predictions; the result was obtained using the randomForest package [61]. The out-of-bag (OOB) score is used to validate the model. In Figure 9, the OOB error rate is 0.3141. Based on Figure 7, HaveVehicle had higher errors since this study did not have much collected positive data; thus, random forests produced large errors under limited data. Since decision trees and random forests belong to shallow learning, this study used deep learning to handle the complex classification and detection problems.
This paper used three YOLOv4-based models, called YOLOv4(I), YOLOv4(II), and YOLOv4(III). Of the 191 videos in total, this study chose 8 videos whose accuracy rates were less than 70% under Haar features (Haar) for comparing the three YOLOv4 variants. For accuracy rate, Table 2 shows that YOLOv4 supplemented with the fence method (YOLOv4(III)) had the highest accuracy rates and performed better on average than YOLOv4(II). Table 2 also shows that YOLOv4(I) in Run 7 was not better than Haar, and YOLOv4(II) in Run 3 was not better than YOLOv4(I). Based on Table 2, this study also considered the false omission rate for comparison. Table 3 shows that the YOLOv4(II) and YOLOv4(III) methods had lower false omission rates than Haar and YOLOv4(I). Therefore, our proposed YOLOv4(III) had higher average accuracy rates and lower false omission rates. Based on the results in Table 2 and Table 3, this study focused on the proposed YOLOv4(III) for the discussion and compared it with Haar; thus, Table 4 summarizes the average accuracy of Haar and YOLOv4(III) over the 191 videos. YOLOv4(III) had higher average accuracy than Haar, and its standard deviation (Stdev) was slightly lower than that of Haar.
Based on the equations from [55], the performance in terms of precision, recall, and F1 is shown in Table 5 and Figure 9. Some TPs and FNs are zeros; thus, some results produced errors when a number was divided by zero. There are 119 videos whose TPs are zero, for which the positive precision and recall are zero. For the other 72 videos, we obtained the precision and recall values of Haar and YOLOv4(III), and we list 13 of them in terms of positive precision, recall, and F1 for comparison in Table 5. YOLOv4(III) had larger precision, recall, and F1. For the negative cases, the precision and F1 performance of YOLOv4(III) over the 191 videos was better, as shown in Figure 10.
For the output of the FPS equation [56], the elapsed time of YOLOv4 was the longest. The result shows one run of frame counts and elapsed times over the 191 videos, with the average values summarized in Table 6. The Haar technique's frame number was slightly greater than that of YOLOv4(III), and its elapsed time was shorter; observing the elapsed time in this experiment, Haar generally takes less time.
Comparing Haar with YOLOv4(III) on FPS in Table 7, we expanded the test runs by adding another five runs, with 191 videos in each run. For example, the average of Best FPS denotes the mean of the best FPS values over the 191 videos in a run. Table 7 shows that Haar had a higher FPS and a higher standard deviation (Stdev); the Stdev values of the two methods differed greatly, and most FPS values of YOLOv4(III) are less than those of Haar. Table 7 thus shows that the FPS results of Haar are unstable, whereas YOLOv4(III) has stable FPS performance.
According to the average cases of Haar and YOLOv4(III), this study applied the two-sample test for variances [62], as shown in Table 8. The null hypothesis was that the variances of the two populations are equal. Table 8 shows that the p-value of the two-sample variance test was 0.000, which is statistically significant at the p < 0.05 level; therefore, the variances were unequal.
Consequently, the p-value of the independent t test was 0.000, which is statistically significant at the p < 0.05 level, as shown in Table 9. Therefore, the proposed YOLOv4(III) has a small standard deviation of FPS and is more stable.
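A sketch of both tests with SciPy, using made-up FPS samples in place of the study's measurements (the two-sample F test for variances followed by Welch's t test, since the variances turned out unequal):

```python
import numpy as np
from scipy import stats

# Hypothetical FPS samples for Haar and YOLOv4(III); illustrative only.
haar = np.array([31.2, 28.5, 35.9, 30.1, 33.4, 29.8])
yolo = np.array([24.1, 24.3, 23.9, 24.2, 24.0, 24.4])

# Two-sample F test for equality of variances (two-sided p-value).
f_stat = haar.var(ddof=1) / yolo.var(ddof=1)
df1, df2 = len(haar) - 1, len(yolo) - 1
p_var = 2 * min(stats.f.sf(f_stat, df1, df2), stats.f.cdf(f_stat, df1, df2))

# Variances differ, so compare means with Welch's t test (equal_var=False).
t_stat, p_t = stats.ttest_ind(haar, yolo, equal_var=False)
print(p_var, p_t)
```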
Since Haar caused some errors, this study made additional improvements: we proposed a detection model based on YOLOv4 with the fence method, and the experimental results showed that our approach can significantly improve performance.
6. Conclusions
Large vehicle driving is a long-duration working task. In the analysis of the collected data, most front vehicles stayed away from the large vehicles, keeping a greater distance. Moreover, an improved front-vehicle detection approach can reduce the need for sudden turning and sharp braking, as well as collisions and the waste of social resources in large vehicle fleets. This paper improves front-vehicle detection performance using large vehicle cases. The proposed YOLOv4(III) uses the collected large vehicle data with a fence method based on the concept of spatial polygons, thereby enhancing front-vehicle detection and reducing incorrect detections for higher accuracy. On the technical side, this study effectively reduced misjudgments; on the application side, it can provide drivers with more reaction time. Although YOLOv4(III) took more time than traditional Haar features in terms of FPS, the resulting benefits were high. The proposed YOLOv4(III) had higher accuracy, precision, recall, and F1 score than the previous methods, and its performance was better than that of the three other methods for enhancing front-vehicle detection. In the statistical analysis, Haar features had the higher FPS, but YOLOv4(III) had stable FPS performance and competitive performance metrics overall. Finally, this study provides a usable solution for enhancing front-vehicle detection in practice and reducing large vehicle carbon emissions.
For future studies, traffic signs can affect driving safety; thus, sign detection can be considered using YOLO detection. Front-vehicle and sign detection can be combined in large vehicle applications to enhance the YOLOv4(III) approach for handling big video and image data in fleet management systems, giving alerts to drivers to maintain distance and appropriate speed and thereby reducing fuel consumption and invisible carbon emissions. Kasper-Eulaers et al. [63] applied YOLOv5 to handle static heavy goods vehicles; thus, future work will use YOLOv5 to handle more moving-vehicle detections. More mathematical methods will be used to evaluate and measure performance.