3.1. Evaluation Metrics
To evaluate the proposed method, three indicators were used: precision, recall, and mAP. In general, detection outcomes can be divided into four types according to the relationship between the predicted and true results, as shown in Table 3.
TP (true positives) and TN (true negatives) are cases where the predicted result matched the true result; FN (false negatives) and FP (false positives) are cases where it did not. For pavement cracks, TP and TN were the correctly detected cracks and non-cracks, respectively. FP were non-cracks wrongly detected as cracks, and FN were cracks wrongly detected as non-cracks. Based on Table 4, the three evaluation indicators can be defined.
As shown in Equation (4), precision is the percentage of true positives among all predicted positives, i.e., true positives plus false positives. As shown in Equation (5), recall is the percentage of true positives among all actual positives, i.e., true positives plus false negatives. Based on the precision and recall, the F1-score can be calculated as shown in Equation (6).
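Assuming the standard definitions in terms of TP, FP, and FN, Equations (4)-(6) can be written as follows (the equation numbers follow the text; the exact typesetting is an assumption):

```latex
\begin{align}
\mathrm{Precision} &= \frac{TP}{TP + FP} \tag{4} \\
\mathrm{Recall}    &= \frac{TP}{TP + FN} \tag{5} \\
F1 &= \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{6}
\end{align}
```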
Furthermore, the average precision (AP) was also employed to evaluate the developed method. AP is the area under the precision-recall curve, with recall on the x-axis and precision on the y-axis. The mean average precision (mAP) is the average of the AP over all categories.
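The indicators described above can be illustrated with a minimal Python sketch. It is not the evaluation code used in this work. The function names are hypothetical, and the all-point interpolation used for the area under the precision-recall curve is an assumption, since the interpolation scheme is not specified here.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1-score from detection counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def average_precision(recalls, precisions):
    """Area under the precision-recall curve for one category.

    `recalls` must be sorted in ascending order and paired with
    `precisions`. All-point interpolation is assumed: precision is
    first replaced by its running maximum taken from the right.
    """
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    # Build the monotonically non-increasing precision envelope.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum the rectangles between consecutive recall values.
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))


def mean_average_precision(ap_per_category):
    """mAP: the mean of the AP values over all categories."""
    values = list(ap_per_category.values())
    return sum(values) / len(values)
```

For example, a category whose curve passes through (recall, precision) = (0.5, 1.0) and (1.0, 0.5) covers an area of 0.5 × 1.0 + 0.5 × 0.5 = 0.75 under this scheme.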
3.2. Pavement Crack Detection Results
Generally, a network performs better with more training epochs, but prolonged training can also lead to overfitting. To avoid this issue, the weights were chosen as follows: rather than taking the weights from the last epoch, the weights from the epoch at which the model performed best on the validation set were selected. These weights were then used to detect pavement cracks in the test set, and several indicators were applied to compare the different pavement crack detection models.
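The weight-selection principle can be sketched as below. This is only an illustration under stated assumptions: the record type, the function name, the file paths, and the scores are hypothetical, and the validation indicator used for the ranking (for instance mAP) is not specified in the text.

```python
from dataclasses import dataclass


@dataclass
class EpochRecord:
    epoch: int          # training epoch index
    val_score: float    # validation indicator (assumed: higher is better)
    weights_path: str   # hypothetical path to the weights saved at this epoch


def select_best_weights(history):
    """Return the record with the best validation score, not the last epoch."""
    if not history:
        raise ValueError("history is empty")
    # max() keeps the earliest epoch when several epochs tie.
    return max(history, key=lambda rec: rec.val_score)


# Hypothetical training history: validation performance peaks at epoch 2
# and then degrades, as would be expected once overfitting sets in.
history = [
    EpochRecord(1, 0.61, "weights/epoch1.pt"),
    EpochRecord(2, 0.70, "weights/epoch2.pt"),
    EpochRecord(3, 0.66, "weights/epoch3.pt"),
]
best = select_best_weights(history)
print(best.epoch, best.weights_path)
```

In this sketch the weights from epoch 2 would be used on the test set, even though training continued to epoch 3.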
The performance evaluation results are shown in Table 5. In terms of precision, adding attention modules effectively improved pavement crack detection. Specifically, YOLOV5s-CoordAtt achieved the best performance among the five models, with a precision 4.36% higher than that of the original model. In addition, YOLOV5s-CoordAtt also outperformed YOLOV3 [27].
As shown in Table 5, the attention mechanisms affected pavement crack detection performance to different degrees, although all of them enhanced it. In terms of precision, YOLOV5s-ECA performed better than YOLOV5s-SE. The reason is that ECANet replaces the fully connected layer with a convolution layer, which reduces the computational load and improves performance. YOLOV5s-SE also performed worse than YOLOV5s-CBAM, because CBAM attends to channel and spatial information at the same time. Pavement cracks are objects with irregular shapes, so attention to spatial information further helps to extract crack features. As for YOLOV5s-CoordAtt, its attention module embeds location information into the channel attention, which effectively improved pavement crack detection performance.
The test results are shown in Figure 7. The proposed method can accurately classify and detect the four types of pavement cracks in the images. In addition, Figure 7d shows that the proposed model can also recognize pavement cracks under poor lighting conditions, such as on cloudy days.
Although Figure 7 shows that the proposed method can detect the four types of pavement cracks accurately, in some cases it confused the crack categories. As shown in Figure 8a, longitudinal cracks were mistakenly recognized as potholes. Longitudinal cracks were also recognized as transverse cracks because of shadows, as shown in Figure 8b. These results show that the proposed method confused longitudinal cracks with other kinds of cracks in some circumstances, owing to the similarity between clustered longitudinal cracks and potholes or transverse cracks. Additionally, other noise in the images, including lighting conditions, shadows, and pavement markings, makes it more difficult for the proposed approach to recognize the types of pavement cracks.
Furthermore, to test the proposed method more thoroughly, the public dataset [24] was also used to compare it with other existing approaches [36,37,38]. The corresponding results are shown in Table 6.
As shown in Table 6, YOLOV5s fused with the Coordinate Attention module achieved the highest performance among the compared methods on the public dataset. The precision of Faster R-CNN, YOLOV4, YOLOV3, YOLOV5, and YOLOV5s fused with CoordAtt was 74.2%, 71.5%, 76.5%, 74.9%, and 77.8%, respectively, and the mAP of these five models was 59.2%, 52.7%, 70.3%, 67.1%, and 71.4%, respectively. Likewise, Table 7 shows that the proposed crack detection method (YOLOV5s fused with CoordAtt) performed better than the existing models (the original YOLOV5s, YOLOV3, YOLOV4, and Faster R-CNN). We also compared the proposed method with existing approaches on the self-built dataset, as shown in Table 8. The proposed method again achieved the highest performance among the recent crack detection approaches considered.