Real-Time Visual Perception and Explainable Fault Diagnosis for Railway Point Machines at the Edge
Abstract
1. Introduction
1.1. Research Background and Motivation
1.2. Review of Existing Methods and Limitations
1.3. Contributions
- A Hardware-Aware Optimization Strategy for Real-Time Edge Deployment;
- Multimodal Diagnostic Paradigm via Hard-Constrained Knowledge Embedding.
2. Related Work
2.1. Signal-Based Fault Diagnosis
2.2. Computer Vision-Based Inspection
2.3. From Visual Perception to Semantic Reasoning
3. System Design
3.1. Overall System Architecture
- Edge Perception Layer;
- Cloud Semantic Layer;
- Client Interaction Layer.
3.2. Deployment Optimization of YOLO11-Seg on the RK3588S Platform
3.2.1. Model Conversion and Quantization
- The backbone and selected neck layers are quantized to INT8 to maximize NPU parallel acceleration;
- Key detection and segmentation head layers retain FP16 precision to mitigate boundary degradation caused by aggressive quantization.
| Algorithm 1: Sensitivity-Aware Mixed-Precision Quantization (SMPQ) |
|---|
| Input: M_float: pre-trained floating-point model; D_cal: calibration dataset; T_sens: sensitivity threshold for precision selection |
| Output: M_quant: optimized mixed-precision model for RK3588S |
| Procedure SMPQ(M_float, D_cal, T_sens): |
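
The procedure body of Algorithm 1 is not reproduced above, so the following is only a minimal sketch of how a sensitivity-aware precision assignment could work, not the authors' procedure. It assumes each layer is a plain weight matrix, measures sensitivity as the relative output error caused by simulated symmetric per-tensor INT8 weight quantization on a calibration batch, and keeps a layer in FP16 when that error exceeds the threshold. The function names, the error metric, the threshold value, and the toy layers are all hypothetical.

```python
import numpy as np

def fake_quant_int8(w):
    """Simulate symmetric per-tensor INT8 quantization (quantize, then dequantize)."""
    max_abs = float(np.max(np.abs(w)))
    if max_abs == 0.0:
        return w.copy()
    scale = max_abs / 127.0
    return np.clip(np.round(w / scale), -127, 127) * scale

def layer_sensitivity(w, x_cal):
    """Relative L2 output error caused by quantizing this layer's weights."""
    ref = x_cal @ w
    err = x_cal @ fake_quant_int8(w) - ref
    return float(np.linalg.norm(err) / (np.linalg.norm(ref) + 1e-12))

def assign_precision(layers, x_cal, t_sens):
    """Keep a layer in FP16 if its sensitivity exceeds t_sens, else use INT8."""
    return {
        name: ("fp16" if layer_sensitivity(w, x_cal) > t_sens else "int8")
        for name, w in layers.items()
    }

# Toy data (hypothetical): one well-conditioned layer, and one layer whose
# single outlier weight stretches the INT8 range and crushes the small weights.
rng = np.random.default_rng(0)
x_cal = rng.normal(size=(64, 16))
backbone_w = rng.uniform(-1.0, 1.0, size=(16, 8))
head_w = rng.uniform(-0.01, 0.01, size=(16, 8))
head_w[0, 0] = 1.0

plan = assign_precision({"backbone": backbone_w, "head": head_w}, x_cal, t_sens=0.01)
print(plan)
```

In this toy setup the well-conditioned layer is assigned INT8 and the outlier-dominated layer stays in FP16, which mirrors the INT8 backbone and FP16 head split described above. On real hardware the per-layer decision would be applied through the vendor toolchain rather than through simulated quantization.
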
3.2.2. Heterogeneous Task Scheduling
- Preprocessing and I/O tasks: Image acquisition, image preprocessing, Flask communication, and ZeroTier networking run on the low-power A55 cores because they are I/O-bound and computationally light, which keeps the high-performance A76 cores free for inference-related work;
- Feature-extraction tasks: The backbone and selected neck layers are mapped to the NPU, leveraging quantized tensor computation to accelerate convolution, BatchNorm, and activation operations;
- Postprocessing tasks: Decoding and mask generation (feature map decoding, threshold filtering, non-maximum suppression, mask binarization, and depth estimation) run on the A76 CPU cores with multithreading, because these non-linear, branch-heavy operations are poorly suited to NPU execution.
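
The CPU side of this allocation can be sketched with Linux thread affinity. The example below is an illustration under stated assumptions, not the system's actual scheduler: it assumes the common RK3588S numbering in which CPUs 0-3 are the A55 cores and CPUs 4-7 are the A76 cores (this should be verified on the target board), and the helpers `pick_cores` and `run_pinned` are hypothetical names. NPU dispatch is left to the inference runtime and is not shown.

```python
import os
import threading

# Assumed RK3588S core numbering (verify on the target board):
# CPUs 0-3 are Cortex-A55, CPUs 4-7 are Cortex-A76.
A55_CORES = {0, 1, 2, 3}
A76_CORES = {4, 5, 6, 7}

def pick_cores(preferred, available):
    """Return the preferred cores that exist, or all available cores as a fallback."""
    usable = set(preferred) & set(available)
    return usable if usable else set(available)

def run_pinned(target, preferred_cores, *args):
    """Run target(*args) in a new thread pinned to preferred_cores (Linux only)."""
    def worker():
        if hasattr(os, "sched_setaffinity"):
            cores = pick_cores(preferred_cores, os.sched_getaffinity(0))
            os.sched_setaffinity(0, cores)  # on Linux, pid 0 means the calling thread
        target(*args)
    thread = threading.Thread(target=worker)
    thread.start()
    return thread

# Placeholder stages standing in for the real pre- and postprocessing code.
results = []
results_lock = threading.Lock()

def stage(name):
    with results_lock:
        results.append(name)

threads = [
    run_pinned(stage, A55_CORES, "preprocess"),   # I/O-bound work on the little cores
    run_pinned(stage, A76_CORES, "postprocess"),  # NMS and mask decoding on the big cores
]
for thread in threads:
    thread.join()
```

The fallback in `pick_cores` keeps the sketch runnable on machines that do not expose eight CPUs; on such machines the threads simply keep their existing affinity.
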
3.3. Geometric Prior-Based Depth Estimation and Large-Model-Based Diagnosis
3.3.1. Depth Calculation Based on Geometric Priors
3.3.2. Intelligent Diagnosis Based on Vision-Language Models
4. Experiments and Results Analysis
4.1. Experimental Environment
4.1.1. Hardware and Software Platform
4.1.2. Experimental Dataset
4.1.3. Evaluation Metrics
4.2. Performance Evaluation of the Optimized YOLO11-Seg
4.2.1. Ablation Study
4.2.2. Model Comparison Experiments
4.2.3. Visualization Analysis
4.3. Accuracy Evaluation of Depth Estimation
4.4. Evaluation of VLM-Based Diagnostic Effectiveness
4.5. System Latency and Network Robustness Analysis
5. Discussion
5.1. Applicability and Generalization
5.2. Robustness Under Environmental Variations
5.3. Impact of Acquisition Conditions on Measurement Precision
5.4. Performance Trade-Offs and Diagnostic Paradigm Analysis
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest

| Category | Data Modality | Key Characteristics | Advantages | Limitations |
|---|---|---|---|---|
| Manual Inspection (Mainstream) | Tool inspection | Periodic Patrolling; Physical Checks | High flexibility; No sensor deployment cost | Labor-intensive; Discontinuous monitoring |
| Signal-Based | 1D Time-series | Deep learning features; Entropy analysis | Sensitive to internal state | Electromagnetic interference; Installation restrictions |
| Computer Vision | 2D Images | Semantic Segmentation; Single-stage detection | High accuracy; Non-contact | Limited semantic reasoning capability; High edge-side computational overhead |
| VLM/LLM | Image + Text | Vision-Language Alignment; Instruction Tuning | Strong semantic reasoning | High compute cost; Heavy training |
| Proposed | Image + Text (Edge-Cloud) | Edge-Cloud Collaboration; Zero-shot Instruction Following | Interpretable diagnosis; Low deployment cost | Network dependent |
| Prompt Module | Core Instructions |
|---|---|
| Role and Task | Please perform the analysis task based on the provided images of the moving and static contact points and the engagement depth values (unit: mm). |
| Visual Quality Inspection | Identify moving contacts, static contacts, bolts, and contact areas; confirm image clarity. If there is blurriness or occlusion preventing detection, return “Detection Failed” immediately. |
| Status Determination | Determine the current status based on the input depth (Normal/Too Shallow/Too Deep/Not Contacted); prioritize error reporting if the image quality is poor. |
| Maintenance Recommendation | Generate the corresponding O&M operation instructions based on the status. |
| Output Constraints | Must strictly follow the JSON format output, containing three fields: detection_status (detection status), maintenance_advice (maintenance recommendation), and risk_level (risk level: low/medium/high). |
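
The output constraints in the last row lend themselves to a mechanical check before a reply is shown to the operator. The sketch below is one possible validator, not the system's actual code: the field names and the low/medium/high risk levels come from the table, while the function name `parse_vlm_reply`, the requirement that all three fields be strings, and the sample replies are assumptions made for illustration.

```python
import json

REQUIRED_FIELDS = ("detection_status", "maintenance_advice", "risk_level")
RISK_LEVELS = {"low", "medium", "high"}

def parse_vlm_reply(raw):
    """Return the parsed report dict, or None if the reply violates the output constraints."""
    try:
        report = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return None  # not JSON at all, e.g. free-form text
    if not isinstance(report, dict):
        return None
    # Assumption: every required field must be present and be a string.
    if any(not isinstance(report.get(key), str) for key in REQUIRED_FIELDS):
        return None
    if report["risk_level"].lower() not in RISK_LEVELS:
        return None
    return report

# Hypothetical replies used only to exercise the validator.
good_reply = (
    '{"detection_status": "Too Shallow", '
    '"maintenance_advice": "Adjust the contact engagement.", '
    '"risk_level": "medium"}'
)
bad_reply = "The contact looks slightly shallow."

print(parse_vlm_reply(good_reply) is not None)
print(parse_vlm_reply(bad_reply) is None)
```

A reply that fails this check could be retried or replaced by a rule-based result; which of the two the system does is not stated here.
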
| Functional Module | Hardware Device | Software Platform | Functional Description |
|---|---|---|---|
| Edge Computing Terminal | Orange Pi 5 Pro (Shenzhen Xunlong Software Co., Ltd., Shenzhen, China) | Ubuntu 20.04 LTS, Python 3.8, OpenCV 4.8.0, FFmpeg 4.3 | Image acquisition and preprocessing, YOLO11-seg inference, depth estimation, data forwarding, FFmpeg streaming |
| Cloud Inference Server | RTX 3090 Server (NVIDIA, Santa Clara, CA, USA) | Ubuntu 20.04 LTS, Python 3.8, PyTorch 2.0, Transformers 4.30.2, FastAPI 0.95.1 | Qwen2.5-VL-7B inference and data routing |
| Client Terminal | ASUS Tianxuan Laptop (ASUS, Taipei, Taiwan) | Windows 10, Python 3.8, PyQt5 5.15.9, Matplotlib 3.7.1, PyAV 10.0.0 | Real-time monitoring, detection result visualization, remote device control, data storage and historical query |
| Environmental Condition | Normal | Shallow | Deep | Non-Contact | Total |
|---|---|---|---|---|---|
| Normal Illumination | 480 | 110 | 110 | 50 | 750 |
| Low-Light | 150 | 60 | 60 | 30 | 300 |
| Dust Interference | 125 | 60 | 60 | 30 | 275 |
| Vibration Interference | 120 | 60 | 60 | 35 | 275 |
| Total | 875 | 290 | 290 | 145 | 1600 |
| Model | Model Format | Quantization Strategy | Model Size (MB) | Inference Time (ms) | FPS | mAP@0.5 | mIoU |
|---|---|---|---|---|---|---|---|
| Baseline | .pt | FP32 | 6.5 | 180 ± 3 | 5.6 | 0.932 | 0.854 |
| ONNX | .onnx | FP32 | 12.6 | 160 ± 2 | 6.3 | 0.931 | 0.851 |
| RKNN-INT8 | .rknn | INT8 | 3.2 | 22 ± 0.8 | 45.5 | 0.926 | 0.846 |
| RKNN-FP16 head | .rknn | backbone INT8 + head FP16 | 3.6 | 26 ± 1.1 | 38.5 | 0.929 | 0.849 |
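
As a quick arithmetic cross-check of this table, the FPS column should equal 1000 divided by the per-frame inference time, and the cost of the mixed-precision model relative to the FP32 baseline can be read off directly. The snippet below only re-derives numbers already in the table; the variable and function names are illustrative.

```python
# Latency (ms), reported FPS, mAP@0.5, and mIoU copied from the table above.
rows = {
    "Baseline": (180.0, 5.6, 0.932, 0.854),
    "ONNX": (160.0, 6.3, 0.931, 0.851),
    "RKNN-INT8": (22.0, 45.5, 0.926, 0.846),
    "RKNN-FP16 head": (26.0, 38.5, 0.929, 0.849),
}

def fps_from_latency(latency_ms):
    """Single-stream throughput implied by a per-frame latency."""
    return 1000.0 / latency_ms

for name, (latency_ms, reported_fps, _, _) in rows.items():
    # Reported FPS should match 1000 / latency up to one-decimal rounding.
    assert abs(fps_from_latency(latency_ms) - reported_fps) < 0.06, name

base_ms, _, base_map, base_miou = rows["Baseline"]
mix_ms, _, mix_map, mix_miou = rows["RKNN-FP16 head"]
speedup = base_ms / mix_ms
map_drop = base_map - mix_map
miou_drop = base_miou - mix_miou
print(f"speedup {speedup:.1f}x, mAP drop {map_drop:.3f}, mIoU drop {miou_drop:.3f}")
```

With the table's values, the mixed-precision model is about 6.9 times faster than the FP32 baseline at a cost of 0.003 mAP@0.5 and 0.005 mIoU, and it recovers 0.003 mAP@0.5 over full INT8 for 4 ms of extra latency.
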
| Scheduling Strategy | Allocation Description | Inference Time (ms) | FPS | mAP@0.5 | mIoU |
|---|---|---|---|---|---|
| NPU First | All supported operators are mapped to NPU execution | 22.1 ± 0.9 | 45.3 | 0.926 | 0.846 |
| NPU + A76 Collaborative | Backbone runs on NPU; detection/segmentation head runs on A76 | 26.0 ± 1.1 | 38.5 | 0.929 | 0.849 |
| NPU Fallback to CPU | Unsupported operators fall back to CPU execution | 35.6 ± 2.0 | 28.1 | 0.928 | 0.848 |
| Model | Edge-Side Implementation | Model Size (MB) | Inference Time (ms) | FPS | mAP@0.5 | mIoU |
|---|---|---|---|---|---|---|
| YOLOv5-Seg-s | ONNX Runtime | 27.5 | 120.3 ± 3.4 | 8.3 | 0.914 | 0.838 |
| YOLOv8-Seg-s | ONNX Runtime | 25.3 | 105.1 ± 2.8 | 9.5 | 0.921 | 0.834 |
| Mask R-CNN | ONNX Runtime | 97.2 | 382.6 ± 9.7 | 2.6 | 0.935 | 0.861 |
| DeepLabV3+ | ONNX Runtime | 88.5 | 309.4 ± 9.7 | 3.2 | 0.928 | 0.857 |
| Proposed Method | ONNX Runtime | 12.6 | 160.0 ± 2.0 | 6.3 | 0.931 | 0.851 |
| Proposed Method | RKNN | 3.6 | 26.0 ± 1.1 | 38.5 | 0.929 | 0.849 |
| Operating Condition | Number of Samples | MAE/mm | RMSE/mm | Relative Error/% |
|---|---|---|---|---|
| Normal | 48 | 0.15 | 0.23 | 1.9 |
| Shallow | 36 | 0.19 | 0.29 | 2.4 |
| Deep | 24 | 0.26 | 0.39 | 3.1 |
| Non-contact | 12 | 0.28 | 0.44 | 3.0 |
| Average | 120 | 0.21 | 0.34 | 2.7 |
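
For reference, the three error measures in this table can be computed as follows. This is a sketch using the standard definitions; in particular, taking the relative error as the mean of per-sample absolute error divided by the reference depth is an assumption, since the exact definition is not shown here, and it requires non-zero reference values. The sample depths are hypothetical and are not taken from the experiments.

```python
import math

def depth_errors(pred_mm, true_mm):
    """Return (MAE in mm, RMSE in mm, mean relative error in percent)."""
    if len(pred_mm) != len(true_mm) or not pred_mm:
        raise ValueError("need two equally sized, non-empty sequences")
    diffs = [p - t for p, t in zip(pred_mm, true_mm)]
    n = len(diffs)
    mae = sum(abs(d) for d in diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    # Assumed definition: per-sample |error| / |reference|, averaged, in percent.
    rel = 100.0 * sum(abs(d) / abs(t) for d, t in zip(diffs, true_mm)) / n
    return mae, rmse, rel

# Hypothetical measurements (mm), for illustration only.
true_mm = [8.0, 8.0, 10.0, 5.0]
pred_mm = [8.2, 7.9, 10.3, 5.0]
mae, rmse, rel = depth_errors(pred_mm, true_mm)
print(f"MAE={mae:.2f} mm, RMSE={rmse:.3f} mm, relative error={rel:.2f}%")
```

RMSE is never smaller than MAE, and the gap between them grows when a few samples have much larger errors than the rest.
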
| Method | Classification Accuracy/% | Macro-F1 | Structured Output Usability/% | Description |
|---|---|---|---|---|
| Baseline-Num | 86.5 | 0.821 | 100.0 | Fixed rules, no semantic explanation |
| VLM | 88.0 | 0.842 | 92.5 | A small portion reverts after consistency checking |
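
The "reverts after consistency checking" behaviour noted in the last row could take the following shape. This is a hedged sketch rather than the system's implementation: the status labels come from the prompt design, but the rule that a non-positive depth means "Not Contacted", the threshold parameters `min_ok_mm` and `max_ok_mm`, and the example threshold values are hypothetical.

```python
def rule_status(depth_mm, min_ok_mm, max_ok_mm):
    """Fixed-rule status from the measured engagement depth (thresholds are hypothetical)."""
    if depth_mm <= 0:
        return "Not Contacted"
    if depth_mm < min_ok_mm:
        return "Too Shallow"
    if depth_mm > max_ok_mm:
        return "Too Deep"
    return "Normal"

def checked_status(vlm_status, depth_mm, min_ok_mm, max_ok_mm):
    """Return (final_status, reverted). Revert to the rule result on disagreement."""
    if vlm_status == "Detection Failed":
        # Image-quality failures take priority over the numeric rule.
        return vlm_status, False
    expected = rule_status(depth_mm, min_ok_mm, max_ok_mm)
    if vlm_status != expected:
        return expected, True
    return vlm_status, False

# Hypothetical thresholds, chosen only to exercise the logic.
MIN_OK_MM, MAX_OK_MM = 4.0, 8.0
print(checked_status("Normal", 3.2, MIN_OK_MM, MAX_OK_MM))
print(checked_status("Too Deep", 9.1, MIN_OK_MM, MAX_OK_MM))
```

The first call disagrees with the numeric rule and reverts to "Too Shallow"; the second agrees and keeps the model's answer.
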
| Stage | Operation Description | Avg. Time (ms) | Std. Dev (ms) |
|---|---|---|---|
| Edge Processing | Edge Perception and Calculation | 45.2 | ±3.1 |
| Uplink Transmission | Uploading corrected crop images and depth data (JSON) to Cloud | 120.5 | ±15.4 |
| Cloud Reasoning | Qwen2.5-VL Visual Encoding + Text Generation | 1150.8 | ±42.6 |
| Downlink Feedback | Sending Diagnostic Report (JSON) to Qt Client | 30.2 | ±5.2 |
| Total | Sum of all stages | ~1346.7 | ±66.3 |
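
The latency budget in this table can be re-derived in a few lines. The stage means and deviations below are copied from the table; the dictionary keys are illustrative names. The root-sum-square figure is an extra illustration that assumes the stages vary independently, which is not stated in the source.

```python
import math

# (mean ms, std dev ms) per stage, copied from the table above.
stages = {
    "edge": (45.2, 3.1),
    "uplink": (120.5, 15.4),
    "cloud": (1150.8, 42.6),
    "downlink": (30.2, 5.2),
}

total_ms = sum(mean for mean, _ in stages.values())
std_linear_sum = sum(std for _, std in stages.values())
# Assumption: if stage latencies were independent, deviations would add in quadrature.
std_rss = math.sqrt(sum(std ** 2 for _, std in stages.values()))
cloud_share = stages["cloud"][0] / total_ms

print(f"total {total_ms:.1f} ms, cloud share {cloud_share:.1%}")
print(f"std: linear sum {std_linear_sum:.1f} ms, root-sum-square {std_rss:.1f} ms")
```

The totals reproduce the table's 1346.7 ms and ±66.3 ms, the latter being the linear sum of the stage deviations. Cloud reasoning accounts for roughly 85% of the end-to-end latency, and under the independence assumption the combined deviation would be closer to 45.7 ms.
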
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Zhai, Y.; Wei, L. Real-Time Visual Perception and Explainable Fault Diagnosis for Railway Point Machines at the Edge. Electronics 2026, 15, 230. https://doi.org/10.3390/electronics15010230
