Comparative Study on Rail Damage Recognition Methods Based on Machine Vision

Gao, Wanlin; Geng, Riqin; Wu, Hao

doi:10.3390/infrastructures10070171


by Wanlin Gao ¹,*, Riqin Geng ² and Hao Wu ³

¹ School of Information Science and Engineering, Chongqing Jiaotong University, Chongqing 400074, China
² School of Mechatronics and Vehicle Engineering, Chongqing Jiaotong University, Chongqing 400074, China
³ School of Civil Engineering, Chongqing Jiaotong University, Chongqing 400074, China
* Author to whom correspondence should be addressed.

Infrastructures 2025, 10(7), 171; https://doi.org/10.3390/infrastructures10070171

Submission received: 5 June 2025 / Revised: 27 June 2025 / Accepted: 2 July 2025 / Published: 4 July 2025


Abstract

With the rapid expansion of railway networks and increasing operational complexity, intelligent rail damage detection has become crucial for ensuring safety and improving maintenance efficiency. Traditional physical inspection methods (e.g., ultrasonic testing, magnetic flux leakage) are limited in terms of efficiency and environmental adaptability. This study proposes a machine vision-based approach leveraging deep learning to identify four primary types of rail damage: corrugations, spalls, cracks, and scratches. A self-developed acquisition device collected 298 field images from the Chongqing Metro system, which were expanded into 1556 samples through data augmentation techniques (including rotation, translation, shearing, and mirroring). Three object detection models (YOLOv8, SSD, and Faster R-CNN) were systematically evaluated in terms of detection accuracy (mean average precision, mAP), missed detections (mean average recall, mAR), and training efficiency. The results indicate that YOLOv8 outperformed the other models, achieving an mAP of 0.79, an mAR of 0.69, and the shortest training time of 0.28 h. To further enhance performance, the Multi-Head Self-Attention (MHSA) module was integrated into YOLOv8, creating MHSA-YOLOv8. The optimized model improved mAP by 10% (to 0.89), increased mAR by 20%, and reduced training time by 50% (to 0.14 h). These findings demonstrate the effectiveness of MHSA-YOLOv8 for accurate and efficient rail damage detection in complex environments, offering a robust solution for intelligent railway maintenance.

Keywords: rail damage; recognition method; deep learning; YOLO

1. Introduction

With the rapid growth in railway operating mileage and the increasing complexity of the operational environment, the demand for the safe operation and intelligent maintenance of railway lines has become increasingly prominent [1]. By the end of 2024, the operating mileage of urban rail transit in China is expected to reach 10,945.6 km. Among these systems, the rail serves as the core load-bearing component of the railway system, and its condition directly impacts the safety of train operations [2,3]. During actual operation, rails are prone to various types of damage, such as corrugations, scratches, and spalls, primarily caused by the cyclic effects of long-term vehicle dynamic loads [4,5,6,7]. As line mileage increases and operational environments grow more complex, the frequency of rail damage rises, and the manifestations of such damage become more diverse. These damages can reduce the structural stability of rails, impair the smoothness of vehicle operations, and significantly increase the costs associated with rail transit inspection and maintenance. Therefore, detecting rail damage enables effective evaluation of rail operational status, reduces operational safety risks, and decreases the maintenance costs of rail lines. Currently, although some railways employ rail detection vehicles for inspecting rail lines, these vehicles are primarily utilized for detecting rail alignment, obstacles, and irregularities. The majority of rail damage detection and identification are still performed manually. However, as the operating mileage of railway lines continues to increase, manual inspection methods are gradually being replaced by intelligent detection techniques due to their high consumption of human and material resources, as well as the inherent inaccuracy of empirical judgment. 
With the continuous advancement of non-destructive testing (NDT) technology, physical testing methods have been widely adopted for rail damage detection, including ultrasonic testing, magnetic flux leakage testing, and eddy current testing. In the field of ultrasonic testing, Wu et al. [8] proposed a parameter learning method based on multi-angle ultrasonic probe flaw detection images to identify deep rail damage. Campos-Castellanos et al. [9] conducted research on long-distance rail ultrasonic testing, achieving the detection of transverse damage at a depth of 2 mm in rails. Rizzo et al. [10] developed a rail detection model based on ultrasonic guided waves and non-contact probes, enabling automatic detection and classification of surface and internal cracks in rail heads. Liu et al. [11] proposed a CNN-based damage detection system for switch rails, which achieves over 91% testing accuracy and demonstrates strong generalization capability. In the area of magnetic flux leakage testing, Dong et al. [12] designed a detection system that integrates the Barkhausen effect, significantly enhancing the accuracy of electromagnetic NDT. Jia et al. [13] introduced a novel magnetic flux leakage detection method utilizing magnetization enhancement materials, which amplifies the detection signal of rail damage. Lonkwic et al. [14] developed a novel magnetic flux leakage (MFL) method to detect micrometer-level wear on lift guide rails, demonstrating its potential for quantitative damage assessment and sustainable maintenance practices. Gong et al. [15] introduced a magnetic field concentration method for detecting surface cracks, which significantly enhanced the magnetic field concentration effect in a uniform magnetic field environment. In the field of eddy current testing, Nafiah et al. [16] utilized feature extraction of pulsed eddy current signals to identify crack depth and orientation. Heckel et al. [17] developed an integrated ultrasonic and eddy current testing system for SPZ1 trains on German railways, achieving precise detection of rail damage at speeds of up to 80 km/h. Alvarenga et al. [18] developed an eddy current-based online rail defect detection system using convolutional neural networks, achieving 98% classification accuracy and significantly improving railway maintenance efficiency. Song et al. [19] established a correlation between the amplitude-roughness and phase-crack characteristics of detection coils, enabling the identification of hidden damage in rail heads. Despite these advancements, physical detection methods still face challenges such as low efficiency and limited environmental adaptability, which can significantly impact railway operational efficiency.

With the rapid advancement of machine vision inspection, deep learning-based object detection methods exhibit advantages such as fast detection speed, high recognition accuracy, and strong anti-interference capabilities, making them particularly suitable for rail damage detection in railway lines under complex environments. Wu et al. [20] employed a local contrast measurement method and an improved maximum entropy method to enhance the contrast of rail damage images. Li et al. [21] introduced a rail damage recognition approach based on local normalization, which significantly improved the recognition of images with subtle damage features. Hsieh et al. [22] integrated GoPro cameras with inertial navigation systems to achieve efficient detection, enabling real-time rail damage detection at speeds of up to 30 km/h. Feng et al. [23] developed two rail defect detection models with varying depths based on MobileNetV2 and MobileNetV3, effectively enhancing both the speed and performance of rail damage detection. Guan et al. [24] introduced a lightweight YOLO framework tailored to rail damage detection, taking into account both training and application requirements, thereby enhancing the accuracy of rail damage detection. Kaewunruen et al. [25] proposed a railway sleeper crack detection method by leveraging YOLOv5OBB, which attained a mean average precision (mAP) of 0.72 for crack detection and an accuracy of 92% for angle detection. Foret et al. [26] integrated damage geometric features, generality analysis, and edge sharpness detection, which substantially boosted the generalization capability of the model. Ding et al. [27] proposed GC-YOLO, an insulator damage detection method based on YOLOv5s, achieving more precise detection of small target damages but focusing solely on insulator verification. Koohmishi et al. [28] developed a GPR-InSAR-integrated railway track health monitoring method with machine learning, achieving network-wide defect diagnosis and improved underlying layer detection. Qiao et al. [29] developed a rail fastener damage detection model leveraging the Transformer architecture and local feature fusion, improving detection accuracy. Li et al. [30] presented the PEME threshold method for extracting defective regions, enabling effective separation of damage from background. Sresakoolchai et al. [31] developed a digital twin-enhanced deep reinforcement learning system for railway maintenance optimization, achieving a 21% reduction in maintenance workload and a 68% decrease in track defects. He et al. [32] enhanced the quality of rail damage images by combining an improved Perona–Malik diffusion coefficient method with adaptive threshold binarization. Mandriota et al. [33] devised a Gabor filter-based approach for the feature extraction and classification of rail surface damage, strengthening the ability to identify rail damage textures. Augusto Costa et al. [34] proposed a YOLO convolutional neural network-based method for rail damage detection, achieving 98% accuracy but validating only the detection of spalls. Shang et al. [35] introduced a railway image recognition method combining target localization with convolutional neural networks, significantly improving the recall rate of rail damage detection.

However, current research on image-based rail damage detection is limited by the scarcity of training samples, which significantly constrains the generalization capability of rail damage detection models. Moreover, existing target detection models for rail damage are predominantly designed for single-type damages. In complex environments involving multiple types of rail damage, further improvements and optimizations in detection performance remain necessary. In this study, four typical rail damage images were collected using a self-developed rail damage detection device, and the dataset was expanded through data augmentation techniques. Additionally, the comprehensive detection capabilities of the YOLO, SSD, and Faster R-CNN models under complex conditions with multiple types of rail damage were systematically compared. Finally, the MHSA module was integrated into the YOLO model to enhance its detection performance for rail damage.

2. On-Site Acquisition of Rail Damage

2.1. Acquisition Equipment for Rail Damage

Chongqing Metro is selected as the primary line for rail damage data acquisition in this study. A self-developed detection device is employed for image acquisition, which mainly consists of a support module, a travel module, and an acquisition module, as illustrated in Figure 1 [36]. First, the support module, constructed from a lightweight aluminum alloy, serves as the backbone of the entire rail damage detection device. It is designed to mount the components of both the travel module and the acquisition module. Second, the travel module comprises travel wheels that run on the rail surface and guide wheels that clamp onto either side of the rail waist. These components are responsible for the movement and guidance of the entire rail damage detection device, enabling it to operate at a speed of 2.5 km/h. The clamping wheels are spring-loaded to ensure adaptability to variations in the actual track gauge. Additionally, the acquisition module constitutes the core of the rail damage detection system, featuring an adjustable light source and an industrial camera. The light source is positioned on both sides of the acquisition box to stabilize illumination, while the industrial camera is mounted above the rail surface to capture images of rail damage.

2.2. Rail Damage Dataset

Through the tracking measurement of rail damage on Chongqing Metro Line 1, Line 9, and Line 10, four primary types of rail damage were identified: rail corrugation, rail spall, rail crack, and rail scratch, with rail corrugation being particularly prevalent. Rail corrugation appears as periodic wavy deformation with wavelengths in the range of 30–100 mm. Rail spall is manifested as the partial detachment of surface material, with a length of about 15 mm and significant depth differences. Rail cracks are more than 4 mm deep and more than 15 mm long, showing typical bifurcation characteristics [37]. Rail scratches are 0.5–1.5 mm deep and about 100 mm long. A total of 298 rail damage images were collected, including 134 images of rail corrugations, 69 images of rail spalls, 42 images of rail cracks, and 53 images of rail scratches. Because so few images could hinder the effective training of target detection models for rail damage, the initial dataset of 298 images was augmented to 1556 images through image processing techniques such as rotation, translation, shearing, and mirroring. This resulted in an enhanced sample database for the four typical types of rail damage, as illustrated in Figure 2. Specifically, the clipping operation starts from the minimum enclosing area of the target bounding box and expands randomly into the surrounding area, with the cut positions chosen at random according to the distance from the bounding box to the edge of the image. The rotation angle is drawn at random within ±5°. The translation is drawn at random in the horizontal and vertical directions within 1/3 of the distance from the bounding box to the image edge. The mirroring operation uses horizontal, vertical, and diagonal flipping, each with a 33% execution probability. 
To address the imbalance caused by the high proportion of rail corrugation images in the original dataset, efforts were made during the augmentation process to ensure the uniformity of the number of images for each type of damage as much as possible. Furthermore, the rail damage image dataset was divided into training, validation, and testing sets in a ratio of 6:2:2 to guarantee the comprehensiveness and fairness of subsequent model training and evaluation [36].
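
As a rough illustration of the augmentation and split described above, the following sketch shows how a bounding box could be updated under mirroring and a bounded random translation, and how 1556 images divide under a 6:2:2 ratio. The function names, the (x_min, y_min, x_max, y_max) box format, the reading of "diagonal flipping" as a combined horizontal and vertical mirror, and the rounding used for the split are all assumptions made for illustration; the paper does not publish its implementation or the exact per-split counts.

```python
import random

def flip_box(box, img_w, img_h, mode):
    """Mirror an (x_min, y_min, x_max, y_max) box.

    mode: 'h' (horizontal), 'v' (vertical) or 'd' (both axes, assumed
    here to be what the text calls diagonal flipping).
    """
    x0, y0, x1, y1 = box
    if mode in ("h", "d"):  # horizontal mirror: x -> img_w - x
        x0, x1 = img_w - x1, img_w - x0
    if mode in ("v", "d"):  # vertical mirror: y -> img_h - y
        y0, y1 = img_h - y1, img_h - y0
    return (x0, y0, x1, y1)

def random_flip(box, img_w, img_h, rng):
    """Pick one of the three mirror modes with equal probability."""
    return flip_box(box, img_w, img_h, rng.choice(["h", "v", "d"]))

def translation_limits(box, img_w, img_h, fraction=1 / 3):
    """Largest allowed shift per direction: a fraction of the gap to each edge."""
    x0, y0, x1, y1 = box
    return {
        "left": fraction * x0,
        "right": fraction * (img_w - x1),
        "up": fraction * y0,
        "down": fraction * (img_h - y1),
    }

def random_translate(box, img_w, img_h, rng):
    """Shift the box by a random offset that keeps it inside the image."""
    lim = translation_limits(box, img_w, img_h)
    dx = rng.uniform(-lim["left"], lim["right"])
    dy = rng.uniform(-lim["up"], lim["down"])
    x0, y0, x1, y1 = box
    return (x0 + dx, y0 + dy, x1 + dx, y1 + dy)

def split_counts(n, ratio=(6, 2, 2)):
    """Train/validation/test sizes; the remainder goes to the test set."""
    total = sum(ratio)
    n_train = n * ratio[0] // total
    n_val = n * ratio[1] // total
    return n_train, n_val, n - n_train - n_val

if __name__ == "__main__":
    rng = random.Random(0)
    print(random_flip((10, 10, 50, 30), 100, 80, rng))
    print(random_translate((10, 10, 50, 30), 100, 80, rng))
    print(split_counts(1556))
```

Under this rounding, the 1556 augmented images would divide into 933 training, 311 validation, and 312 testing images; a different rounding rule would shift these counts by one or two images.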

3. Deep Learning Networks for Rail Damage Detection

3.1. Object Detection Models

Effective rail damage detection requires not only determining the type of damage but also accurately locating it. In this section, two classic single-stage object detection models, YOLO and SSD, and the multi-stage object detection model Faster R-CNN are selected for evaluation. Single-stage models are particularly suited for real-time monitoring tasks due to their fast inference speed. Conversely, the multi-stage model performs better in complex environments and on small targets, albeit at the cost of slower detection speed.

3.1.1. YOLO Model

The YOLO model is a single-stage object detection framework. Its core idea is to frame the object detection task as a unified regression problem, enabling end-to-end mapping from image pixels to bounding box coordinates and category probabilities [38,39,40]. The detection workflow of the YOLO model is illustrated in Figure 3 [41]. Initially, the input rail damage image is partitioned into an S × S grid structure. Each grid cell simultaneously predicts the location, confidence score, and categorical probability of potential rail damage bounding boxes [42]. Subsequently, the model predicts the targets and probabilities for each possible damage category within the grid and applies non-maximum suppression to eliminate redundant bounding boxes. Ultimately, the precise bounding boxes for various rail damage detection targets are generated as output [43]. In this study, the YOLOv8 model is primarily selected due to its superior detection efficiency and real-time performance.
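
The non-maximum suppression step mentioned above can be sketched in a few lines. This is a minimal, generic greedy NMS and not the YOLOv8 implementation itself; the function names, the (x_min, y_min, x_max, y_max) box format, and the 0.5 overlap threshold are assumptions. In practice the procedure would typically be applied separately to each damage category.

```python
def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(detections, iou_threshold=0.5):
    """Greedy NMS over a list of (box, score) pairs.

    Repeatedly keeps the highest-scoring box and discards every remaining
    box that overlaps it by more than iou_threshold.
    """
    remaining = sorted(detections, key=lambda d: d[1], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [d for d in remaining if iou(best[0], d[0]) <= iou_threshold]
    return kept

if __name__ == "__main__":
    candidates = [
        ((0, 0, 10, 10), 0.9),    # strong detection
        ((1, 1, 11, 11), 0.8),    # near-duplicate of the first box
        ((50, 50, 60, 60), 0.7),  # separate damage region
    ]
    for box, score in nms(candidates):
        print(box, score)
```

In the toy example the second box overlaps the first with an IoU of about 0.68 and is suppressed, while the distant third box is kept.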

3.1.2. SSD Model

The Single Shot MultiBox Detector (SSD) is a convolutional neural network designed for fast and accurate object detection. The architecture of the network is illustrated in Figure 4 [44]. The process begins with the input image depicting rail damage. Initially, ResNet50 is employed as part of the backbone network to extract damage features up to the conv4_3 layer and prior to the fc7 layer. Following the backbone network, feature extraction continues through a sequence of convolutional layers, yielding damage feature maps with dimensions of 10 × 10 × 512, 5 × 5 × 256, 3 × 3 × 256, and 1 × 1 × 256 after average pooling. These feature maps capture information at different levels within the damage image: smaller maps carry stronger semantic information, while larger maps retain more detail. Subsequently, Detector and Classifier modules (Classifier1–Classifier6) are attached to the feature map at each scale. These modules detect and classify rail damage on the feature map of the corresponding scale. Finally, the outputs from all detection and classification modules are aggregated and forwarded to the non-maximum suppression (NMS) module, where, among overlapping candidates, the bounding box with the highest confidence score is retained, yielding the final rail damage detection results.
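
To give a feel for how the multi-scale feature maps translate into candidate boxes, the sketch below counts default boxes per feature map. The layout used is that of the original SSD300 design (six maps, with 4 or 6 default boxes per cell and a 300 × 300 input); this is an assumption, since the text above lists only the 10, 5, 3, and 1 maps and does not state the boxes per cell or the input size used in this study.

```python
# (feature-map side length, default boxes per cell) for the original SSD300
# layout. Assumed for illustration; not taken from the paper's configuration.
SSD300_LAYOUT = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]

def boxes_per_map(layout):
    """Number of default boxes contributed by each feature map."""
    return [side * side * k for side, k in layout]

def cell_size(input_size, side):
    """Approximate image extent in pixels covered by one feature-map cell."""
    return input_size / side

if __name__ == "__main__":
    counts = boxes_per_map(SSD300_LAYOUT)
    for (side, k), n in zip(SSD300_LAYOUT, counts):
        print(f"{side:>2} x {side:<2} map, {k} boxes/cell, "
              f"cell ~{cell_size(300, side):.1f} px -> {n} boxes")
    print("total default boxes:", sum(counts))
```

The large maps contribute most of the candidates and cover small image regions per cell, which is why they matter for small damage such as spalls, whereas the 1 × 1 map sees the whole image at once.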

3.1.3. Faster R-CNN Model

Faster R-CNN is a representative multi-stage object detection model. Its core idea is to achieve end-to-end object detection via the Region Proposal Network (RPN). The detection pipeline of the Faster R-CNN model is illustrated in Figure 5 [45]. First, ResNet-50 serves as the backbone network to extract feature maps from the input rail damage images. Then, the RPN generates region proposals and uses a softmax classifier to determine whether each anchor contains a rail damage target, along with predicting bounding box coordinates. Subsequently, high-quality candidate regions are generated by regressing the anchor positions for more accurate localization. Next, the candidate regions of varying sizes are transformed into fixed-size feature maps through region-of-interest (RoI) pooling, preserving spatial information for further processing. Finally, precise rail damage detection results are obtained by classifying and refining the candidate regions using fully connected layers, which output the exact bounding box positions.
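
The RoI pooling step, which turns candidate regions of different sizes into fixed-size feature maps, can be illustrated with a small pure-Python sketch. The function name, the end-exclusive (row0, col0, row1, col1) region format, and the binning rule (floor for bin starts, ceiling for bin ends) are assumptions for illustration; production implementations operate on multi-channel tensors and scale image coordinates to feature-map coordinates first.

```python
def roi_max_pool(feature, roi, out_h, out_w):
    """Max-pool a region of a 2D feature map into an out_h x out_w grid.

    feature: list of rows (list of lists of numbers).
    roi: (row0, col0, row1, col1), end-exclusive, at least 1 x 1 in size.
    """
    r0, c0, r1, c1 = roi
    h, w = r1 - r0, c1 - c0
    pooled = []
    for i in range(out_h):
        # Bin i spans rows [floor(i*h/out_h), ceil((i+1)*h/out_h)) of the region.
        rs = r0 + (i * h) // out_h
        re = r0 + ((i + 1) * h + out_h - 1) // out_h
        row = []
        for j in range(out_w):
            cs = c0 + (j * w) // out_w
            ce = c0 + ((j + 1) * w + out_w - 1) // out_w
            row.append(max(feature[r][c]
                           for r in range(rs, re)
                           for c in range(cs, ce)))
        pooled.append(row)
    return pooled

if __name__ == "__main__":
    fmap = [
        [1, 2, 3, 4, 5, 6],
        [7, 8, 9, 10, 11, 12],
        [13, 14, 15, 16, 17, 18],
        [19, 20, 21, 22, 23, 24],
    ]
    # Two regions of different size both become 2 x 2 outputs.
    print(roi_max_pool(fmap, (0, 0, 4, 6), 2, 2))
    print(roi_max_pool(fmap, (1, 1, 4, 4), 2, 2))
```

Because every region is reduced to the same output size, the subsequent fully connected layers can process all candidate regions with a single set of weights.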

3.2. Detecting Performance Evaluation Indicators

In the quantitative evaluation of the rail damage detection models, detection accuracy, missed detections, and training efficiency are selected as the evaluation indicators to comprehensively assess the detection performance of each model [46].

Detection accuracy is measured by the average precision (AP) and the mean average precision (mAP), which together describe the detection performance of a model across all categories. The AP measures the detection accuracy for a single rail damage category, and the mAP is the mean of the AP values over all categories. In both cases, a higher value indicates higher detection accuracy.

AP = \int_{0}^{1} P(r) \, dr \qquad (1)

mAP = \frac{1}{N} \sum_{i=1}^{N} AP_{i} \qquad (2)

Here, P denotes precision and r denotes recall; P(r) is the precision at recall r. N is the total number of damage categories in the object detection task, that is, the number of rail damage types that the model needs to detect. AP_i is the average precision of the i-th damage category, computed with Equation (1).
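
Equations (1) and (2) can be made concrete with a short numerical sketch. The AP routine below approximates the integral of P(r) from a ranked list of detections using the common "monotone envelope" interpolation; this particular interpolation, the function names, and the toy input are assumptions, since the paper does not state how the integral was evaluated. The mAP example averages the per-category values reported for MHSA-YOLOv8 in Section 4.2.2, treating those reported accuracies as AP values.

```python
def average_precision(is_true_positive, num_ground_truth):
    """Approximate AP = integral of P(r) dr over recall in [0, 1].

    is_true_positive: hit/miss flags for detections sorted by descending
    confidence. num_ground_truth: number of real targets of this category.
    """
    tp = fp = 0
    recalls, precisions = [], []
    for hit in is_true_positive:
        if hit:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))
    # Make precision non-increasing in recall (usual interpolation step).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum rectangle areas wherever recall increases.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap

def mean_average_precision(ap_per_class):
    """Equation (2): mean of the per-category AP values."""
    return sum(ap_per_class.values()) / len(ap_per_class)

if __name__ == "__main__":
    # Toy ranking: hit, miss, hit, with two ground-truth targets.
    print(average_precision([True, False, True], 2))
    reported = {"corrugation": 0.95, "scratch": 0.91, "spall": 0.88, "crack": 0.82}
    print(mean_average_precision(reported))
```

Averaging the four reported per-category values gives 0.89, which agrees with the mAP stated for MHSA-YOLOv8.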

Missed detections are measured by the mAR, which reflects the ability of the object detection model to recall the true targets across the rail damage categories. The higher the mAR, the better the overall recall of true targets and the lower the missed detection rate.

mAR = \frac{1}{N} \sum_{i=1}^{N} AR_{i} \qquad (3)

N is the total number of categories in the object detection task, and AR_i is the AR value of the i-th category.
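
A minimal sketch of Equation (3) follows. It assumes that the AR of a category is simply its recall, TP / (TP + FN), at a single matching threshold; some benchmarks instead average recall over several IoU thresholds, and the paper does not say which convention it uses. The per-category counts are hypothetical and serve only to show the arithmetic.

```python
def recall(true_positives, false_negatives):
    """Fraction of real targets that were detected."""
    total = true_positives + false_negatives
    return true_positives / total if total else 0.0

def mean_average_recall(ar_per_class):
    """Equation (3): mean of the per-category AR values."""
    return sum(ar_per_class.values()) / len(ar_per_class)

if __name__ == "__main__":
    # Hypothetical (detected, missed) counts per damage category.
    counts = {
        "corrugation": (9, 1),
        "spall": (6, 4),
        "crack": (5, 5),
        "scratch": (8, 2),
    }
    ar = {name: recall(tp, fn) for name, (tp, fn) in counts.items()}
    m_ar = mean_average_recall(ar)
    print("mAR:", m_ar)
    print("mean missed-detection share:", 1 - m_ar)
```

With these made-up counts the mAR is about 0.7, meaning that on average about 30% of the true targets in a category would be missed.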

In the evaluation of training time, comparisons are made based on the duration of training. Training time is closely associated with the complexity of the target detection model, and achieving a reasonable balance between speed and accuracy remains an urgent issue to consider in optimizing the rail damage target detection model.

4. Comparative Study on Rail Damage Recognition Methods

4.1. Comparative Analysis

Using the evaluation indicators defined above and the self-built augmented database of rail damage images and labels, this section compares the detection accuracy, missed detections, and training time of three typical rail damage detection models: SSD, Faster R-CNN, and YOLOv8 [47]. The relevant training parameters and settings are listed in Table 1. First, from the perspective of detection accuracy, the performance of these three models on the four typical types of rail damage is compared, as shown in Figure 6. The results indicate that the YOLOv8 model exhibits slightly higher detection accuracy than the Faster R-CNN and SSD models. The detection accuracy of the Faster R-CNN model is nearly identical to that of the SSD model, with the former being marginally superior in detecting certain types of rail damage, such as rail corrugation and rail scratch. For rail corrugation detection, all three models achieve high accuracy: 0.9 for YOLOv8, 0.88 for Faster R-CNN, and 0.85 for SSD. This outcome may be attributed to the fact that rail corrugation images constitute the largest proportion of the dataset used in the early construction of the rail damage database, reflecting the prevalence of rail corrugation issues in field measurements.

In addition, to more intuitively demonstrate the detection performance of different target detection models on four typical types of rail damages, the application effects of rail damage target detection are presented in Figure 7. Specifically, the detection results for the four types of rail damage using the SSD model are shown in Figure 7a. The detection results for the same damage types using the Faster R-CNN model are illustrated in Figure 7b, and those using the YOLO model are depicted in Figure 7c. Observations reveal that the SSD model exhibits some missed detections when identifying rail corrugations. The Faster R-CNN model demonstrates the highest accuracy for rail corrugation detection at 99%, while the YOLO model also achieves highly accurate results for this type of damage. Regarding rail scratch detection, the SSD model achieves an accuracy of 56%, whereas the Faster R-CNN model fails to detect rail scratches entirely due to missed detections. In contrast, the YOLO model successfully detects rail scratches with higher accuracy than the SSD model. For rail spall detection, which predominantly involves small-scale damages, the SSD model performs the worst, showing a higher rate of missed detections. Conversely, both the Faster R-CNN and YOLO models achieve satisfactory detection results for rail spalls. Finally, for rail crack detection, both the SSD and Faster R-CNN models fail to detect the cracks, while the YOLO model accurately identifies rail cracks and achieves good detection outcomes.

Furthermore, by considering detection accuracy, missed detections, and training time, the performance of three target detection models for rail damage is compared, as illustrated in Figure 8. Regarding detection accuracy, the Faster R-CNN model converges first as the number of training rounds increases. The SSD model exhibits a steadily growing overall curve and gradually converges. Initially, the YOLO model demonstrates the lowest growth rate with a highly oscillatory curve; however, it still achieves convergence in subsequent stages. Among these models, the mAP values are as follows: Faster R-CNN at 0.76, SSD at 0.74, and YOLOv8 achieving the highest value of 0.79, indicating the superior overall detection capability of the YOLO model. In terms of missed detections, the mAR values are as follows: Faster R-CNN at 0.55, SSD at 0.49, and YOLOv8 at 0.69. These results confirm that YOLOv8 has fewer missed detections, consistent with its previously demonstrated application effectiveness. Concerning training time, the Faster R-CNN model requires 0.86 h, the SSD model requires 0.51 h, and the YOLO model has the shortest training time at 0.28 h.

In summary, by comparing the mAP, mAR, and training time of the Faster R-CNN, SSD, and YOLO models, it is evident that the YOLO model demonstrates significant advantages over the Faster R-CNN and SSD models in terms of mAP, mAR, and training efficiency. Specifically, among various types of rail damage detection, the YOLO model achieves the best performance in detecting rail corrugation. However, the detection accuracy for other types of rail damage still requires further improvement.

4.2. Optimization of YOLO

4.2.1. Optimization Method

Although the YOLO model demonstrates superior detection performance compared to the Faster R-CNN and SSD models, its detection accuracy still requires further improvement due to its limited feature representation capabilities. Moreover, the MHSA module enhances the feature association of the detection model across different locations, effectively addressing the insufficient modeling capability of single-stage models within a finite feature subspace [48]. Consequently, this paper improves the YOLO model by integrating the MHSA module, as illustrated in Figure 9 [49]. The MHSA module is inserted between the convolutional layer and the fully connected layer with dimensions of 7 × 7 × 1024 because rich semantic and spatial features of rail damage are extracted after the preceding series of convolutional layers. Before entering the fully connected layer for classification and regression prediction, the MHSA module models the global relationships among features and integrates feature information from different regions. Within the MHSA module, queries, keys, and values generated by small blocks with positional encoding undergo linear transformations, producing multiple sets of submatrices [50]. These submatrices are then fed into multiple parallel scaled dot-product attention modules, and the outputs of each module are concatenated and linearly transformed to generate the final output. Additionally, the MHSA-YOLO model replaces the C3 module in the backbone with the C2f module to embed the MHSA module and adjusts the number of blocks from 3-6-9-3 to 3-6-6-3. In summary, the MHSA-YOLOv8 combines the object detection capabilities of the YOLO model with the global feature extraction advantages of the MHSA module, significantly enhancing the detection performance of rail damage in complex environments.
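
The attention computation described above (linear projections into queries, keys, and values, parallel scaled dot-product attention heads, concatenation, and a final linear transformation) can be sketched in plain Python. This is a toy illustration, not the MHSA-YOLOv8 code: the function names are hypothetical, positional encoding is omitted, identity matrices stand in for learned projection weights, and the sizes are tiny. For scale, a 7 × 7 × 1024 feature map would correspond to 49 tokens of dimension 1024.

```python
import math

def matmul(a, b):
    """Matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax of one row."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def attention(q, k, v):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(q[0])
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)

def split_heads(x, num_heads):
    """Split the feature dimension of x (tokens x d_model) into equal chunks."""
    d_head = len(x[0]) // num_heads
    return [[row[h * d_head:(h + 1) * d_head] for row in x]
            for h in range(num_heads)]

def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Project x to Q, K, V, attend per head, concatenate, project with w_o."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    heads = [attention(qh, kh, vh)
             for qh, kh, vh in zip(split_heads(q, num_heads),
                                   split_heads(k, num_heads),
                                   split_heads(v, num_heads))]
    # Concatenate the head outputs token by token.
    concat = [sum((head[t] for head in heads), []) for t in range(len(x))]
    return matmul(concat, w_o)

def identity(n):
    """n x n identity matrix, used here as a placeholder projection."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

if __name__ == "__main__":
    tokens = [[1.0, 0.0, 0.5, 0.2], [0.0, 1.0, 0.1, 0.9], [0.3, 0.3, 0.3, 0.3]]
    eye = identity(4)
    for row in multi_head_self_attention(tokens, eye, eye, eye, eye, num_heads=2):
        print([round(v, 3) for v in row])
```

Each output token is a weighted mix of all input tokens, which is how the module relates features from different regions of the image before classification and regression.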

4.2.2. Optimization Results

In this section, a detailed comparison is made between the YOLO model and the MHSA-YOLO model in terms of detection accuracy, missed detections, and training time. First, regarding detection accuracy, the performance of the YOLO and MHSA-YOLO models on four typical types of rail damage, including rail corrugation, is compared, as illustrated in Figure 10. It can be observed that the MHSA-YOLO model outperforms the YOLO model in detection accuracy. Specifically, both models demonstrate the highest accuracy for detecting rail corrugation, achieving accuracies of 0.9 and 0.95, respectively. For rail scratch detection, the MHSA-YOLO model improves accuracy by 11%, reaching 0.91. In the case of rail spall detection, the MHSA-YOLOv8 model achieves an accuracy of 0.88, which is 15% higher than the YOLOv8 model’s accuracy. Lastly, for rail crack detection, the MHSA-YOLOv8 model attains an accuracy of 0.82, representing an 8% improvement over the YOLOv8 model.

In addition, Figure 11 illustrates the actual detection performance of the optimized MHSA-YOLOv8 model on four typical types of rail damage. Specifically, for rail corrugation detection, the MHSA-YOLOv8 model demonstrates superior accuracy compared to the YOLOv8 model, achieving an accuracy rate of 92%. Regarding rail scratch detection, the MHSA-YOLOv8 model exhibits enhanced precision, with an accuracy rate of 78%, which is notably higher than that of the YOLOv8 model. For rail spall detection, the MHSA-YOLOv8 model achieves a significant improvement in accuracy, increasing the overall detection accuracy by approximately 10% compared to the YOLOv8 model. Lastly, in terms of rail crack detection, the MHSA-YOLOv8 model outperforms the YOLOv8 model substantially, attaining an overall accuracy rate of approximately 90%.

Furthermore, the detection accuracy, missed detection rate, and training time of the YOLO model before and after optimization were comprehensively compared, as shown in Figure 12. In Figure 12a, there is little difference between the YOLO model and the MHSA-YOLO model in the early stages of training. However, during the later convergence stage, the MHSA-YOLO model demonstrates significantly higher performance than the YOLO model. Specifically, the mAP reaches 0.89, which is 10% higher than that of the YOLO model. In Figure 12b, the mAR of the MHSA-YOLO model remains consistently higher than that of the YOLO model throughout the entire training process, indicating a substantial improvement in reducing missed detections for the optimized model. Additionally, regarding training time, as illustrated in Figure 12c, the MHSA-YOLO model requires only 0.14 h for training, which is 50% less than the training time required by the YOLOv8 model.
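
The improvement figures quoted above can be checked with simple arithmetic. The sketch below distinguishes an absolute gain (difference in metric units) from a relative change (difference divided by the baseline). The interpretation that the 10% mAP gain is an absolute difference of 0.10, while the 50% training-time reduction is relative, is an assumption inferred from the reported values 0.79, 0.89, 0.28 h, and 0.14 h.

```python
def absolute_gain(before, after):
    """Difference in the metric's own units."""
    return after - before

def relative_change(before, after):
    """Change expressed as a fraction of the baseline value."""
    return (after - before) / before

if __name__ == "__main__":
    map_before, map_after = 0.79, 0.89      # YOLOv8 vs. MHSA-YOLOv8 mAP
    time_before, time_after = 0.28, 0.14    # training time in hours

    print(f"mAP absolute gain:   {absolute_gain(map_before, map_after):.2f}")
    print(f"mAP relative change: {relative_change(map_before, map_after):.1%}")
    print(f"time relative change: {relative_change(time_before, time_after):.0%}")
```

Read this way, the mAP rises by 0.10 in absolute terms, which is about 12.7% relative to the YOLOv8 baseline, and the training time is halved.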

In summary, integrating the MHSA module into the YOLO model significantly enhances the detection of rail damage targets, as evidenced by substantial improvements in mAP and mAR and reduced training time.

5. Conclusions

This study conducts a comparative analysis of deep learning models for rail damage recognition, utilizing an augmented dataset of four typical damage types (corrugation, spall, crack, scratch). The YOLOv8, SSD, and Faster R-CNN models were evaluated under unified metrics (mAP, mAR, training time). The YOLOv8-based MHSA optimization further enhanced detection capabilities. The key conclusions are as follows:

(1) Among the three rail damage detection models of YOLO, SSD, and Faster R-CNN, the YOLO model exhibits the highest detection accuracy and fastest training speed. Its mAP value reaches 0.79, higher than the values of 0.76 for Faster R-CNN and 0.74 for SSD. The training time for YOLO is only 0.28 h, much less than 0.86 h for Faster R-CNN and 0.51 h for SSD.

(2) The integration of the MHSA module into the YOLO model significantly enhances its detection performance. The mAP of the optimized MHSA-YOLO model increases by 10% compared to the original YOLO model. The mAR also shows substantial improvement, and the training time is reduced by 50%.

(3) For different types of rail damage, the YOLO model and its optimized version both achieve the highest detection accuracy for rail corrugation. However, the detection accuracy for other types of rail damage still needs further improvement.

(4) Future research will focus on expanding the dataset with more diverse rail damage images, including images from different operating environments and weather conditions. Multiple deep learning models will be combined with optimized algorithms to improve detection accuracy and generalization across the various types of rail damage. The impact of different environmental conditions on rail damage detection will also be systematically investigated, with the aim of developing a more robust and reliable detection system. Finally, we plan to conduct comparative experiments on different parameter combinations to determine the optimal detection parameters, addressing the current gaps in parameter research and correlation analysis.
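As a quick check on conclusion (1), the baseline figures can be tabulated and ranked in a few lines. This is a sketch for illustration only: the numbers are those reported above, and the variable names are not taken from the authors' code.

```python
# Baseline results reported in conclusion (1): (mAP, training time in hours).
baselines = {
    "YOLOv8": (0.79, 0.28),
    "Faster R-CNN": (0.76, 0.86),
    "SSD": (0.74, 0.51),
}

# Rank models by mAP, highest first.
ranking = sorted(baselines, key=lambda name: baselines[name][0], reverse=True)

# Training time of each model as a multiple of the fastest model's time.
fastest = min(t for _, t in baselines.values())
slowdown = {name: t / fastest for name, (_, t) in baselines.items()}

for name in ranking:
    m, t = baselines[name]
    print(f"{name:13s} mAP={m:.2f}  time={t:.2f} h  ({slowdown[name]:.1f}x fastest)")
```

On these figures, Faster R-CNN takes about three times as long to train as YOLOv8, and SSD a little under twice as long.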

Author Contributions

Conceptualization, methodology and writing—original draft preparation, W.G.; validation and data curation, R.G.; investigation and visualization, H.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Innovation and Entrepreneurship Training Project of Chongqing Jiaotong University, grant number X202410618053.

Data Availability Statement

The data used in this study are confidential.

Acknowledgments

We thank Yicai Li and Jiawei Wang from Chongqing Jiaotong University for their guidance on code implementation in the process of algorithm optimization. We are grateful for the patient guidance of Xiaolu Cui from Chongqing Jiaotong University, as she provided valuable suggestions for the framework and content presentation of the article in the process of conceiving and writing the paper.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figure 1. Acquisition equipment for rail damage.

Figure 2. Rail damage dataset: (a) typical rail damage; (b) dataset.

Figure 3. YOLO detection process.

Figure 4. SSD detection process.

Figure 5. Faster R-CNN detection process.

Figure 6. Comparison of detection accuracy of four typical types of rail damage based on different detection models.

Figure 7. Intuitive effects of four typical rail damage detections based on different detection models: (a) SSD model; (b) Faster R-CNN model; (c) YOLOv8 model.

Figure 8. Comparative analysis of rail damage recognition methods: (a) mAP, (b) mAR, and (c) training time.

Figure 9. Optimization method of MHSA.

Figure 10. Comparison of detection accuracy for four typical rail damage types between the MHSA-YOLOv8 model and the YOLOv8 model.

Figure 11. Intuitive effects of four typical rail damage detections based on MHSA-YOLOv8 model.

Figure 12. Comparison of detection results of the model before and after optimization: (a) mAP, (b) mAR, and (c) training time.

Table 1. Model training parameters.

Parameters	Values
Image-size	640
Epochs	200
Batch-size	16
Close-mosaic	10
GPU model	RTX 4090D/24 GB
Cache	False
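The parameter names in Table 1 resemble the training arguments of the Ultralytics YOLO package, although the paper does not state which implementation was used. Under that assumption, the settings could be expressed as in the sketch below. The weights file `yolov8n.pt` and the dataset description file passed as `data_yaml` are hypothetical placeholders, since neither the model size nor the dataset file is given in the paper.

```python
# Settings from Table 1, mapped to Ultralytics-style argument names.
# The mapping itself is an assumption; the paper does not name the framework.
TRAIN_ARGS = {
    "imgsz": 640,        # Image-size
    "epochs": 200,       # Epochs
    "batch": 16,         # Batch-size
    "close_mosaic": 10,  # Close-mosaic: disable mosaic augmentation for the last 10 epochs
    "cache": False,      # Cache
}


def train(data_yaml: str, weights: str = "yolov8n.pt"):
    """Train with the Table 1 settings; requires the `ultralytics` package."""
    # Imported lazily so the configuration can be inspected without the package.
    from ultralytics import YOLO

    model = YOLO(weights)  # hypothetical choice of pretrained weights
    return model.train(data=data_yaml, **TRAIN_ARGS)
```

A call such as `train("rail_damage.yaml")` would then start a run with these settings, where `rail_damage.yaml` stands for a dataset file that lists the image folders and the four damage classes.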

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
