UG-Net: An Unsupervised-Guided Framework for Railway Foreign Object Detection
Abstract
1. Introduction
1.1. Background Introduction
1.2. Challenges and Bottlenecks of Existing Methods
1.2.1. The Intrinsic Difficulty of Detection Tasks
1.2.2. Limitations of Existing Research
1.3. UG-Net
1.4. Contributions
1.4.1. Methodological Contribution
1.4.2. Architectural Contribution
1.4.3. Practical Contribution
2. Related Work
2.1. Detection Methods Based on Traditional Sensors
2.2. Detection Methods Based on Classical Computer Vision
2.3. Vision-Based Detection via Supervised Deep Learning
2.4. Self-Supervised Anomaly Detection
3. Method
3.1. Unsupervised Normality Modeling Based on MAE
3.1.1. Normal Reconstruction Loss
3.1.2. Abnormal Suppression Loss
3.1.3. Feature Contrast Loss
3.1.4. Total Loss Function
3.2. Attention Mask Generation Based on Deep Feature Difference
3.3. Attention-Guided Label-Efficient Foreign Object Detection
4. Experiments
4.1. Experimental Setup
4.2. Data Acquisition and Processing
4.3. Dataset Composition
4.3.1. Dataset Splitting and Annotation
4.3.2. Data Augmentation
4.4. Evaluation Metrics
4.5. Experimental Results and Analysis
4.6. Comparison with Mainstream Algorithms
Statistical Analysis of Stability
4.7. Ablation Study
5. Conclusions and Future Work
5.1. Conclusions
5.2. Limitations and Failure Analysis
5.3. Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Qu, J.; Li, S.; Li, Y.; Liu, L. Research on Railway Obstacle Detection Method Based on Developed Euclidean Clustering. Electronics 2023, 12, 1175.
2. Kim, J.H.; Patil, V.; Chun, J.M.; Park, H.S.; Seo, S.W.; Kim, Y.S. Design of Near Infrared Reflective Effective Pigment for LiDAR Detectable Paint—Addendum. MRS Adv. 2020, 5, 2535.
3. Berg, A.; Öfjäll, K.; Ahlberg, J.; Felsberg, M. Detecting Rails and Obstacles Using a Train-Mounted Thermal Camera. In Image Analysis. SCIA 2015; Paulsen, R., Pedersen, K., Eds.; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2015; Volume 9127.
4. Agarwal, V.; Murali, N.V.; Chandramouli, C. A Cost-Effective Ultrasonic Sensor-Based Driver-Assistance System for Congested Traffic Conditions. IEEE Trans. Intell. Transp. Syst. 2009, 10, 486–498.
5. Amaral, V.; Marques, F.; Lourenço, A.; Barata, J.; Santana, P. Laser-Based Obstacle Detection at Railway Level Crossings. J. Sens. 2016, 2016, 1719230.
6. Adam, A.; Rivlin, E.; Shimshoni, I.; Reinitz, D. Robust Real-Time Unusual Event Detection Using Multiple Fixed-Location Monitors. IEEE Trans. Pattern Anal. Mach. Intell. 2008, 30, 555–560.
7. Mukojima, H.; Deguchi, D.; Kawanishi, Y.; Ide, I.; Murase, H.; Ukai, M.; Nagamine, N.; Nakasone, R. Moving Camera Background-Subtraction for Obstacle Detection on Railway Tracks. In Proceedings of the 2016 IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA, 25–28 September 2016; pp. 3967–3971.
8. Nakasone, R.; Nagamine, N.; Ukai, M.; Mukojima, H.; Deguchi, D.; Murase, H. Frontal Obstacle Detection Using Background Subtraction and Frame Registration. Q. Rep. RTRI 2017, 58, 298–302.
9. Tian, Q.; Zhuang, Y.; Yao, C. Efficient railway tracks detection and turnouts recognition method using HOG features. Neural Comput. Appl. 2013, 23, 245–254.
10. Schölkopf, B.; Williamson, R.C.; Smola, A.J.; Shawe-Taylor, J.; Platt, J. Support Vector Method for Novelty Detection. In Proceedings of the Advances in Neural Information Processing Systems 12 (NIPS), Denver, CO, USA, 29 November–4 December 1999; pp. 582–588.
11. Núñez, M.; Hernández, F.C.L.; Granados, J.J.R. Automatic Surveillance of People and Objects on Railway Tracks. Int. J. Interact. Multimed. Artif. Intell. 2025, 9, 107–116.
12. Niu, H.; Feng, D.; Hou, T. Research on foreign object intrusion detection in railway tracks based on MSL-YOLO. J. Eng. Appl. Sci. 2025, 72, 136.
13. Ye, T.; Zhang, X.; Zhang, Y.; Liu, J. Railway Traffic Object Detection Using Differential Feature Fusion Convolution Neural Network. IEEE Trans. Intell. Transp. Syst. 2021, 22, 1701–1711.
14. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475.
15. Wang, C.; He, W.; Nie, Y.; Guo, J.; Liu, C.; Wang, Y.; Han, K. Gold-YOLO: Efficient Object Detector via Gather-and-Distribute Mechanism. In Proceedings of the 37th International Conference on Neural Information Processing Systems, New Orleans, LA, USA, 10–16 December 2023; Volume 36, pp. 51094–51112.
16. Zhang, S.; Chang, Y.; Wang, S.; Li, Y.; Gu, T. An Improved Lightweight YOLOv5 Algorithm for Detecting Railway Catenary Hanging String. IEEE Access 2023, 11, 114061–114070.
17. Yan, P.; Jia, L.; Wang, J.; Xin, Y.; Huang, K. High-speed railway foreign object intrusion detection algorithm based on improved YOLOv7. Radio Eng. 2024, 54, 1099–1109.
18. Liu, Z.; Li, Z.; Mofor, R.N.; Ning, D. Unsupervised Anomaly Detection in Railway Catenary Condition Monitoring Using Autoencoders. In Proceedings of the 2020 IEEE Industrial Electronics Society Annual Conference (IECON), Singapore, 18–21 October 2020; pp. 3390–3395.
19. He, K.; Chen, X.; Xie, S.; Li, Y.; Dollár, P.; Girshick, R. Masked Autoencoders Are Scalable Vision Learners. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 16000–16009.
20. Li, C.L.; Sohn, K.; Yoon, J.; Pfister, T. CutPaste: Self-Supervised Learning for Anomaly Detection and Localization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 9664–9674.
21. Schlegl, T.; Seeböck, P.; Waldstein, S.M.; Schmidt-Erfurth, U.; Langs, G. Unsupervised Anomaly Detection with Generative Adversarial Networks to Guide Marker Discovery. In Information Processing in Medical Imaging (IPMI 2017); Niethammer, M., Styner, M., Aylward, S., Zhu, H., Oguz, I., Yap, P.T., Alberola-López, C., Eds.; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2017; Volume 10265, pp. 146–157.
22. Lyu, Y.; Han, Z.; Zhong, J.; Li, C.; Liu, Z. A GAN-Based Anomaly Detection Method for Isoelectric Line in High-Speed Railway. In Proceedings of the 2019 IEEE International Instrumentation and Measurement Technology Conference (I2MTC), Auckland, New Zealand, 20–23 May 2019; pp. 1–6.
23. Gu, A.; Dao, T. Mamba: Linear-Time Sequence Modeling with Selective State Spaces. arXiv 2023, arXiv:2312.00752.
24. Zhu, L.; Liao, B.; Zhang, Q.; Wang, X.; Liu, W.; Wang, X. Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model. In Proceedings of the 41st International Conference on Machine Learning (ICML), Vienna, Austria, 21–27 July 2024.
25. Johnson, J.; Alahi, A.; Fei-Fei, L. Perceptual losses for real-time style transfer and super-resolution. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 11–14 October 2016; pp. 694–711.








| Method Type | Representative Work | Core Strategy | Annotation Need | Robustness to Light |
|---|---|---|---|---|
| Traditional Sensor | LiDAR/Radar [1,5] | Physical Echo/ToF | Low | Low (Rain/Fog) |
| Classical Vision | Background Subtraction [7] | Pixel Difference | None (for motion) | Low (Shadows) |
| Supervised DL | YOLO/MSL-YOLO [12] | End-to-End CNN | High (Full) | High |
| Proposed | UG-Net | Unsupervised + label-efficient | Low (10%) | High |

| Parameter | MAE Stage (Pre-Training) | YOLO Stage (Detection) |
|---|---|---|
| CPU | Intel Core i5-13490F | Intel Core i5-13490F |
| GPU | NVIDIA A10 (24 GB) | NVIDIA RTX 4060 Ti (8 GB) |
| Framework | torch 1.9.0+cu111; timm 0.3.2; numpy 1.21.5 | torch 1.9.0+cu111; numpy 1.26.4 |
| Hyperparameters | batch size = 64; warmup = 20; epochs = 200; blr = 1.5 × 10⁻⁴; mask ratio = 0.75 | batch size = 8; epochs = 200; numwork = 0 |

| Model | P (%) | R (%) | mAP@0.50 (%) | mAP@0.50:0.95 (%) | Params (M) |
|---|---|---|---|---|---|
| RCNN | 60.12 | 78.3 | 75.8 | 61.2 | 41.3 |
| SSD | 82.6 | 63.2 | 70.1 | 48.7 | 24.0 |
| YOLOv5 | 87.3 | 91 | 77.2 | 61 | 7.2 |
| YOLOv7 | 88.2 | 94 | 80.23 | 62.3 | 6.2 |
| YOLOv8n | 90.1 | 92 | 86.91 | 64.87 | 3.2 |
| UG-Net | 94 | 95.1 | 94.56 | 79.76 | 3.2 (+0.04) |

| Method | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Mean ± Std (%) |
|---|---|---|---|---|---|---|
| Baseline (3-Channel) | 86.91 | 86.54 | 87.12 | 86.80 | 86.65 | 86.80 ± 0.23 |
| UG-Net (4-Channel) | 94.56 | 94.32 | 94.81 | 94.45 | 94.60 | 94.55 ± 0.19 |
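
The Mean ± Std column can be recomputed from the five per-run scores in the table above. The following is a minimal sketch using only Python's standard library. It assumes the sample (n − 1) standard deviation; the source does not state which estimator was used, and the names `runs` and `summarize` are illustrative only.

```python
import statistics

# Per-run scores (%) copied from the stability table above.
runs = {
    "Baseline (3-Channel)": [86.91, 86.54, 87.12, 86.80, 86.65],
    "UG-Net (4-Channel)": [94.56, 94.32, 94.81, 94.45, 94.60],
}


def summarize(values):
    """Return (mean, sample standard deviation) for a list of run scores.

    Assumption: the n - 1 (sample) estimator; swap in statistics.pstdev
    if the population estimator is intended.
    """
    return statistics.mean(values), statistics.stdev(values)


summary = {name: summarize(values) for name, values in runs.items()}

for name, (mean, std) in summary.items():
    print(f"{name}: {mean:.2f} ± {std:.2f}")
```

Under this assumption the recomputed means match the tabulated 86.80 and 94.55, and the spreads land within roughly 0.01 of the reported values, a gap that rounding of the per-run scores could explain.
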

| Difference Strategy | Metric Space | Robustness to Light/Noise | False Alarm Rate (%) | mAP@0.50 (%) |
|---|---|---|---|---|
| Pixel-level | Raw Pixel Intensity | Low | 42.3 | 68.45 |
| SSIM | Structural Window | Medium | 28.5 | 76.20 |
| Deep Feature (ResNet-50) | Semantic Feature | High | 8.4 | 91.12 |
| Deep Feature (VGG-16) | Semantic Feature | Very High | 4.1 | 94.56 |

(A) Contribution of Loss Components.

| Model Variant | Normal Reconstruction Loss | Abnormal Suppression Loss | Feature Contrast Loss | Recall (%) | mAP@0.50 (%) |
|---|---|---|---|---|---|
| Baseline (Standard MAE) | √ | × | × | 74.5 | 81.15 |
| + Abnormal Suppression | √ | √ | × | 91.2 | 89.40 |
| + Feature Contrast | √ | × | √ | 78.4 | 88.75 |
| UG-Net (Full Method) | √ | √ | √ | 95.1 | 94.56 |

(B) Impact of Thresholding Strategies.

| Strategy | Mechanism | mAP@0.50 (%) | Observation |
|---|---|---|---|
| Fixed Threshold (…) | Global Constant | 85.20 | High False Positive Rate (Noise) |
| Fixed Threshold (…) | Global Constant | 81.45 | High False Negative Rate (Misses) |
| Otsu’s Method | Variance-based | 89.60 | Unstable on ballast textures |
| Peak Relative (Ours) | Adaptive (…) | 94.56 | Robust to contrast variations |

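
The contrast between a global constant and the "Peak Relative" row can be illustrated with a small sketch. It assumes that "Peak Relative" means binarizing each difference map at a fixed fraction of that map's own maximum; the paper's exact rule and ratio are not recoverable from the table, so the function names and `alpha = 0.5` are hypothetical placeholders, not the authors' settings.

```python
def peak_relative_mask(diff_map, alpha=0.5):
    """Mark cells whose value is at least alpha times the map's peak.

    Assumption: alpha is a hypothetical ratio chosen for illustration.
    """
    peak = max(max(row) for row in diff_map)
    if peak <= 0:  # flat map: nothing stands out
        return [[False] * len(row) for row in diff_map]
    cutoff = alpha * peak
    return [[value >= cutoff for value in row] for row in diff_map]


def fixed_mask(diff_map, threshold):
    """Mark cells at or above one global constant, whatever the contrast."""
    return [[value >= threshold for value in row] for row in diff_map]


def count_true(mask):
    """Number of cells flagged in a boolean mask."""
    return sum(sum(row) for row in mask)


# Toy 4x4 difference map: weak background response, 2x2 "object" in the middle.
low_contrast = [
    [0.05, 0.05, 0.05, 0.05],
    [0.05, 0.30, 0.30, 0.05],
    [0.05, 0.30, 0.30, 0.05],
    [0.05, 0.05, 0.05, 0.05],
]
# Same scene with every response scaled up, as under a contrast change.
high_contrast = [[3 * value for value in row] for row in low_contrast]

for name, diff in (("low", low_contrast), ("high", high_contrast)):
    print(name,
          "fixed:", count_true(fixed_mask(diff, 0.1)),
          "peak-relative:", count_true(peak_relative_mask(diff)))
```

In this toy case the fixed cutoff flags the whole map once the contrast rises, while the peak-relative cutoff keeps selecting the same four cells, which is the kind of behavior the "Robust to contrast variations" observation describes.
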
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Tian, Z.; Zou, J. UG-Net: An Unsupervised-Guided Framework for Railway Foreign Object Detection. Appl. Sci. 2026, 16, 689. https://doi.org/10.3390/app16020689

