VSM-UNet: A Visual State Space Reconstruction Network for Anomaly Detection of Catenary Support Components
Abstract
1. Introduction
- Based on the UNet structure, a new “visual state space reconstruction network (VSM-UNet)” is constructed from visual state space (VSS) blocks and the CBAM attention mechanism for detecting loosening anomalies of CSCs (a minimal layout sketch follows this list).
- An anomaly score calculation module based on an MLP network is designed to help the model quantify anomaly levels, and a new loss function is proposed to guide model training and refine this module.
- The effectiveness of the method is verified on an abnormal-CSC dataset covering positioning clamp nuts, U-shaped hoop nuts, and cotter pins.
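Since only the outline is reproduced here, the following is a minimal sketch, under stated assumptions, of how such a reconstruction network could be wired: a UNet-style encoder/decoder whose blocks stand in for VSS blocks (the real VSS block applies layer normalization and a 2D selective-scan state space model via a cross-scan module, which is not implemented here), plus an MLP head that maps pooled reconstruction-error statistics to an anomaly score. The class names, channel widths, and pooled-error score head are illustrative assumptions, not the authors' implementation; CBAM (sketched later in this document) would be inserted as the attention stage.

```python
import torch
import torch.nn as nn


class VSSBlockStub(nn.Module):
    """Stand-in for a visual state space (VSS) block.

    The paper's block uses layer normalization and a 2D selective-scan state
    space model (with a cross-scan module, CSM); a depthwise convolution plus
    a pointwise MLP is substituted here purely so the UNet wiring is runnable.
    """

    def __init__(self, ch):
        super().__init__()
        self.norm1 = nn.GroupNorm(1, ch)   # LayerNorm-like normalization for NCHW tensors
        self.norm2 = nn.GroupNorm(1, ch)
        self.spatial_mix = nn.Conv2d(ch, ch, 3, padding=1, groups=ch)  # placeholder for the SSM scan
        self.channel_mix = nn.Sequential(
            nn.Conv2d(ch, 2 * ch, 1), nn.GELU(), nn.Conv2d(2 * ch, ch, 1)
        )

    def forward(self, x):
        x = x + self.spatial_mix(self.norm1(x))     # spatial mixing with a residual connection
        return x + self.channel_mix(self.norm2(x))  # channel mixing with a residual connection


class VSMUNetSketch(nn.Module):
    """UNet-style reconstruction network with an MLP head for the anomaly score."""

    def __init__(self, in_ch=3, widths=(32, 64, 128)):
        super().__init__()
        c1, c2, c3 = widths
        self.stem = nn.Conv2d(in_ch, c1, 3, padding=1)
        self.enc1, self.enc2, self.enc3 = VSSBlockStub(c1), VSSBlockStub(c2), VSSBlockStub(c3)
        self.down1 = nn.Conv2d(c1, c2, 2, stride=2)
        self.down2 = nn.Conv2d(c2, c3, 2, stride=2)
        self.up2 = nn.ConvTranspose2d(c3, c2, 2, stride=2)
        self.up1 = nn.ConvTranspose2d(c2, c1, 2, stride=2)
        self.fuse2 = nn.Conv2d(2 * c2, c2, 1)
        self.fuse1 = nn.Conv2d(2 * c1, c1, 1)
        self.dec2, self.dec1 = VSSBlockStub(c2), VSSBlockStub(c1)
        self.out = nn.Conv2d(c1, in_ch, 3, padding=1)
        # Anomaly-score head: an MLP over pooled reconstruction-error statistics.
        self.score_mlp = nn.Sequential(nn.Linear(in_ch, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())

    def forward(self, x):
        s1 = self.enc1(self.stem(x))    # full-resolution features
        s2 = self.enc2(self.down1(s1))  # 1/2 resolution
        b = self.enc3(self.down2(s2))   # 1/4 resolution (bottleneck)
        d2 = self.dec2(self.fuse2(torch.cat([self.up2(b), s2], dim=1)))  # skip connection
        d1 = self.dec1(self.fuse1(torch.cat([self.up1(d2), s1], dim=1)))
        recon = self.out(d1)
        err = (recon - x) ** 2                        # pixel-wise reconstruction error
        score = self.score_mlp(err.mean(dim=(2, 3)))  # per-image anomaly score in [0, 1]
        return recon, score


if __name__ == "__main__":
    recon, score = VSMUNetSketch()(torch.randn(1, 3, 256, 256))
    print(recon.shape, score.shape)  # torch.Size([1, 3, 256, 256]) torch.Size([1, 1])
```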
2. The Proposed Methods
2.1. Visual State Space Block (VSS Block)
2.2. Convolutional Block Attention Module (CBAM)
2.3. Visual State Space Reconstruction Network (VSM-UNet)
2.4. Learning Strategy and Anomaly Score Calculation
3. Experimental Results and Analysis
3.1. Description of Experiments
3.1.1. Experimental Data and Parameter Settings
3.1.2. Experimental Evaluation Index
3.2. Experimental Analysis
3.2.1. The Analysis of the Impact of Background Suppression on Detection Results
3.2.2. Comparison with Other Fault Detection Methods
3.2.3. The Analysis of Attention Mechanism
3.2.4. The Analysis of Anomaly Detection Visualization Effects
3.2.5. The Analysis of the Impact of Score Processing Methods and Thresholds on Anomaly Recognition Results
3.2.6. The Analysis of Ablation Experiments
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Full Term
---|---
CSCs | Catenary Support Components
SSM | State Space Model
VSM-UNet | Visual State Space Reconstruction Network
VSS block | Visual State Space Block
POFT | Phase-Only Fourier Transform
OC-SVM | One-Class Support Vector Machine
SVDD | Support Vector Data Description
Deep-SVDD | Deep Support Vector Data Description
OOD | Out-of-Distribution Detection
AE | Autoencoder
GAN | Generative Adversarial Network
CNNs | Convolutional Neural Networks
CBAM | Convolutional Block Attention Module
ODE | Ordinary Differential Equation
CSM | Cross-Scan Module
MLP | Multi-Layer Perceptron
LN | Layer Normalization
MSE | Mean Square Error
AUROC | Area Under the Receiver Operating Characteristic Curve
ROC | Receiver Operating Characteristic Curve
FP | False Positive
TP | True Positive
SE | Squeeze-and-Excitation (block)
NL | Non-Local
DA | Dual-Attention
References
- Vaikundam, S.; Hung, T.-Y.; Chia, L.T. Anomaly region detection and localization in metal surface inspection. In Proceedings of the 2016 IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA, 25–28 September 2016; pp. 759–763.
- Reed, I.; Yu, X. Adaptive multiple-band CFAR detection of an optical pattern with unknown spectral distribution. IEEE Trans. Acoust. Speech Signal Process. 1990, 38, 1760–1770.
- Li, C.; Liu, C.; Gao, G.; Liu, Z.; Wang, Y. Robust low-rank decomposition of multi-channel feature matrices for fabric defect detection. Multimedia Tools Appl. 2018, 78, 7321–7339.
- Guo, C.; Ma, Q.; Zhang, L. Spatio-temporal saliency detection using phase spectrum of quaternion Fourier transform. In Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA, 23–28 June 2008; pp. 1–8.
- Liang, L.-Q.; Li, D.; Fu, X.; Zhang, W.-J. Touch screen defect inspection based on sparse representation in low resolution images. Multimedia Tools Appl. 2015, 75, 2655–2666.
- Schölkopf, B.; Williamson, R.C.; Smola, A.; Shawe-Taylor, J.; Platt, J.C. Support vector method for novelty detection. Adv. Neural Inf. Process. Syst. 1999, 12, 582–588.
- Amraee, S.; Vafaei, A.; Jamshidi, K.; Adibi, P. Abnormal event detection in crowded scenes using one-class SVM. Signal Image Video Process. 2018, 12, 1115–1123.
- Tax, D.M.; Duin, R.P. Support Vector Data Description. Mach. Learn. 2004, 54, 45–66.
- Yang, H.; Liu, Z.; Ma, N.; Wang, X.; Liu, W.; Wang, H.; Zhan, D.; Hu, Z. CSRM-MIM: A Self-Supervised Pretraining Method for Detecting Catenary Support Components in Electrified Railways. IEEE Trans. Transp. Electrif. 2025, 11, 10025–10037.
- Wang, H.; Han, Z.; Wang, X.; Wu, Y.; Liu, Z. Contrastive Learning-Based Bayes-Adaptive Meta-Reinforcement Learning for Active Pantograph Control in High-Speed Railways. IEEE Trans. Transp. Electrif. 2023, 10, 2045–2056.
- Yan, J.; Cheng, Y.; Zhang, F.; Li, M.; Zhou, N.; Jin, B.; Wang, H.; Yang, H.; Zhang, W. Research on multimodal techniques for arc detection in railway systems with limited data. Struct. Health Monit. 2025.
- Wang, X.; Song, Y.; Yang, H.; Wang, H.; Lu, B.; Liu, Z. A time-frequency dual-domain deep learning approach for high-speed pantograph-catenary dynamic performance prediction. Mech. Syst. Signal Process. 2025, 238, 113258.
- Yang, S.; Yang, J.; Zhou, M.; Huang, Z.; Zheng, W.-S.; Yang, X.; Ren, J. Learning From Human Educational Wisdom: A Student-Centered Knowledge Distillation Method. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 4188–4205.
- Huang, Z.; Yang, S.; Zhou, M.; Li, Z.; Gong, Z.; Chen, Y. Feature Map Distillation of Thin Nets for Low-Resolution Object Recognition. IEEE Trans. Image Process. 2022, 31, 1364–1379.
- Fan, D.; Zhu, X.; Xiang, Z.; Lu, Y.; Quan, L. Dimension-Reduction Many-Objective Optimization Design of Multimode Double-Stator Permanent Magnet Motor. IEEE Trans. Transp. Electrif. 2024, 11, 1984–1994.
- Wu, S.; Liu, Z.; Zhang, B.; Zimmermann, R.; Ba, Z.; Zhang, X.; Ren, K. Do as I Do: Pose Guided Human Motion Copy. IEEE Trans. Dependable Secur. Comput. 2024, 21, 5293–5307.
- Ruff, L.; Vandermeulen, R.; Goernitz, N.; Deecke, L.; Siddiqui, S.A.; Binder, A.; Müller, E.; Kloft, M. Deep One-Class Classification. In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; pp. 4393–4402.
- Wu, P.; Liu, J.; Shen, F. A Deep One-Class Neural Network for Anomalous Event Detection in Complex Scenes. IEEE Trans. Neural Netw. Learn. Syst. 2019, 31, 2609–2622.
- Perera, P.; Patel, V.M. Learning Deep Features for One-Class Classification. IEEE Trans. Image Process. 2019, 28, 5450–5463.
- Lee, K.; Lee, K.; Lee, H.; Shin, J. A simple unified framework for detecting out-of-distribution samples and adversarial attacks. In Proceedings of the 32nd International Conference on Neural Information Processing Systems (NIPS’18), Montreal, QC, Canada, 3–8 December 2018; pp. 7167–7177.
- Golan, I.; El-Yaniv, R. Deep anomaly detection using geometric transformations. In Proceedings of the 32nd International Conference on Neural Information Processing Systems (NIPS’18), Montreal, QC, Canada, 3–8 December 2018; pp. 9781–9791.
- Theis, L.; Shi, W.; Cunningham, A.; Huszár, F. Lossy Image Compression with Compressive Autoencoders. arXiv 2017, arXiv:1703.00395. Available online: https://arxiv.org/abs/1703.00395 (accessed on 1 March 2017).
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Networks. Commun. ACM 2020, 63, 139–144.
- Akçay, S.; Atapour-Abarghouei, A.; Breckon, T.P. GANomaly: Semi-supervised Anomaly Detection via Adversarial Training. In Computer Vision–ACCV 2018, Proceedings of the 14th Asian Conference on Computer Vision, Perth, Australia, 2–6 December 2018; Springer: Berlin/Heidelberg, Germany, 2018; pp. 622–637.
- He, S.; Zhao, G.; Chen, J.; Zhang, S.; Mishra, D.; Yuen, M.M.-F. Weakly-aligned cross-modal learning framework for subsurface defect segmentation on building façades using UAVs. Autom. Constr. 2025, 170, 105946.
- Liu, W.-Q.; Wang, S.-M. The Low-Illumination Catenary Component Detection Model Based on Semi-Supervised Learning and Adversarial Domain Adaptation. IEEE Trans. Instrum. Meas. 2025, 74, 1–11.
- Xu, S.; Yu, H.; Wang, H.; Chai, H.; Ma, M.; Chen, H.; Zheng, W.X. Simultaneous Diagnosis of Open-Switch and Current Sensor Faults of Inverters in IM Drives Through Reduced-Order Interval Observer. IEEE Trans. Ind. Electron. 2024, 72, 6485–6496.
- Zhu, L.; Liao, B.; Zhang, Q.; Wang, X.; Liu, W.; Wang, X. Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model. arXiv 2024, arXiv:2401.09417. Available online: https://arxiv.org/abs/2401.09417 (accessed on 14 November 2024).
- Wu, S.; Zhang, H.; Liu, Z.; Chen, H.; Jiao, Y. Enhancing Human Pose Estimation in Internet of Things via Diffusion Generative Models. IEEE Internet Things J. 2025, 12, 13556–13567.
- Chen, H.; Wu, S.; Wang, Z.; Yin, Y.; Jiao, Y.; Lyu, Y.; Liu, Z. Causal-Inspired Multitask Learning for Video-Based Human Pose Estimation. Proc. AAAI Conf. Artif. Intell. 2025, 39, 2052–2060.
- Li, Z.; Yan, Y.; Zhao, Z.; Xu, Y.; Hu, Y. Numerical study on hydrodynamic effects of intermittent or sinusoidal coordination of pectoral fins to achieve spontaneous nose-up pitching behavior in dolphins. Ocean Eng. 2025, 337, 121854.
- Li, Z.; Gai, Q.; Lei, M.; Yan, H.; Xia, D. Development of a multi-tentacled collaborative underwater robot with adjustable roll angle for each tentacle. Ocean Eng. 2024, 308, 118376.
- Li, Z.; Gai, Q.; Yan, H.; Lei, M.; Zhou, Z.; Xia, D. The effect of the four-tentacled collaboration on the self-propelled performance of squid robot. Phys. Fluids 2024, 36, 041909.
- Li, Z.; Xia, D.; Kang, S.; Li, Y.; Li, T. A comparative study of multi-tentacled underwater robot with different self-steering behaviors: Maneuvering and cruising modes. Phys. Fluids 2024, 36, 115118.
- Wang, H.; Liu, Z.; Han, Z.; Wu, Y.; Liu, D. Rapid Adaptation for Active Pantograph Control in High-Speed Railway via Deep Meta Reinforcement Learning. IEEE Trans. Cybern. 2023, 54, 2811–2823.
- Wang, H.; Han, Z.; Liu, Z.; Wu, Y. Deep Reinforcement Learning Based Active Pantograph Control Strategy in High-Speed Railway. IEEE Trans. Veh. Technol. 2022, 72, 227–238.
- Wang, H.; Liu, Z.; Hu, G.; Wang, X.; Han, Z. Offline Meta-Reinforcement Learning for Active Pantograph Control in High-Speed Railways. IEEE Trans. Ind. Inform. 2024, 20, 10669–10679.
- Song, Y.; Lu, X.; Yin, Y.; Liu, Y.; Liu, Z. Optimization of Railway Pantograph-Catenary Systems for Over 350 km/h Based on an Experimentally Validated Model. IEEE Trans. Ind. Inform. 2024, 20, 7654–7664.
- Liu, Z.; Song, Y.; Gao, S.; Wang, H. Review of Perspectives on Pantograph-Catenary Interaction Research for High-Speed Railways Operating at 400 km/h and Above. IEEE Trans. Transp. Electrif. 2023, 10, 7236–7257.
- Zhang, R.; Zhu, F.; Liu, J.; Liu, G. Depth-Wise Separable Convolutions and Multi-Level Pooling for an Efficient Spatial CNN-Based Steganalysis. IEEE Trans. Inf. Forensics Secur. 2019, 15, 1138–1150.
- Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
- Li, X.; Chen, H.; Qi, X.; Dou, Q.; Fu, C.-W.; Heng, P.-A. H-DenseUNet: Hybrid Densely Connected UNet for Liver and Tumor Segmentation From CT Volumes. IEEE Trans. Med. Imaging 2018, 37, 2663–2674.
- Cao, H.; Wang, Y.; Chen, J.; Jiang, D.; Zhang, X.; Tian, Q.; Wang, M. Swin-Unet: Unet-Like Pure Transformer for Medical Image Segmentation. In Computer Vision–ECCV 2022 Workshops, Proceedings of the ECCV 2022, Tel Aviv, Israel, 23–27 October 2022; Lecture Notes in Computer Science; Karlinsky, L., Michaeli, T., Nishino, K., Eds.; Springer: Cham, Switzerland, 2023; Volume 13803, pp. 205–218.
- Lou, H.; Duan, X.; Guo, J.; Liu, H.; Gu, J.; Bi, L.; Chen, H. DC-YOLOv8: Small-Size Object Detection Algorithm Based on Camera Sensor. Electronics 2023, 12, 2323.
- Chen, L.; You, Z.; Zhang, N.; Xi, J.; Le, X. UTRAD: Anomaly detection and localization with U-Transformer. Neural Netw. 2022, 147, 53–62.
- Nesovic, K.; Koh, R.G.; Sereshki, A.A.; Zadeh, F.S.; Popovic, M.R.; Kumbhare, D. Ultrasound Image Quality Evaluation using a Structural Similarity Based Autoencoder. In Proceedings of the 2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), Virtual, 1–5 November 2021; pp. 4002–4005.
- Tan, D.S.; Chen, Y.C.; Chen, T.P.C.; Chen, W. TrustMAE: A Noise-Resilient Defect Classification Framework using Memory-Augmented Auto-Encoders with Trust Regions. In Proceedings of the 2021 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 3–8 January 2021; pp. 276–285.
- Zavrtanik, V.; Kristan, M.; Skocaj, D. DRAEM—A discriminatively trained reconstruction embedding for surface anomaly detection. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 8310–8319.
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141.
- Wang, X.; Girshick, R.; Gupta, A.; He, K. Non-local Neural Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018.
- Fu, J.; Liu, J.; Tian, H.; Li, Y.; Bao, Y.; Fang, Z.; Lu, H. Dual attention network for scene segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 3141–3149.
- Versaci, M.; Morabito, F.C.; Angiulli, G. Adaptive Image Contrast Enhancement by Computing Distances into a 4-Dimensional Fuzzy Unit Hypercube. IEEE Access 2017, 5, 26922–26931.
Categories | Training Set (Normal Samples) | Testing Set (Normal Samples) | Testing Set (Abnormal Samples)
---|---|---|---
Positioning clamp nuts | 1000 | 300 | 125
U-shaped hoop nuts | 1000 | 300 | 146
Cotter pins | 850 | 300 | 173
Total samples | 2850 | 900 | 444
Categories | Original UNet (Before Suppression) | Original UNet (After Suppression) | VSM-UNet (Before Suppression) | VSM-UNet (After Suppression)
---|---|---|---|---
Positioning clamp nuts | 0.647 | 0.928 | 0.715 | 0.963
U-shaped hoop nuts | 0.865 | 0.947 | 0.927 | 0.976
Cotter pins | 0.948 | — | 0.966 | —
Methods | Recall | Precision | F1-Score | AUROC | FPS |
---|---|---|---|---|---|
SSIM-AE | 0.719 | 0.746 | 0.731 | 0.8 | 30.66 |
Trust-MAE | 0.748 | 0.773 | 0.767 | 0.831 | 31.76 |
GANomaly | 0.741 | 0.757 | 0.756 | 0.819 | 27.42 |
DRAEM | 0.796 | 0.817 | 0.807 | 0.871 | 61.28 |
Ours | 0.904 | 0.928 | 0.916 | 0.981 | 26.56 |
Methods | Recall | Precision | F1-Score | AUROC |
---|---|---|---|---|
SE | 0.89 | 0.914 | 0.902 | 0.965 |
NL | 0.885 | 0.908 | 0.896 | 0.962 |
DA | 0.898 | 0.927 | 0.912 | 0.974 |
CBAM | 0.902 | 0.926 | 0.914 | 0.979 |
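For context on the CBAM variant compared above: CBAM applies channel attention (a shared MLP over globally average- and max-pooled descriptors) followed by spatial attention (a convolution over channel-wise average and max maps), following Woo et al. The sketch below is a generic PyTorch-style implementation of that formulation; the reduction ratio of 16 and the 7 × 7 spatial kernel are the common defaults and are assumed here rather than taken from this paper.

```python
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        # Shared MLP applied to both the average-pooled and max-pooled descriptors.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))  # global average pooling branch
        mx = self.mlp(x.amax(dim=(2, 3)))   # global max pooling branch
        return torch.sigmoid(avg + mx).view(b, c, 1, 1)


class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)   # channel-wise average map
        mx = x.amax(dim=1, keepdim=True)    # channel-wise max map
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))


class CBAM(nn.Module):
    """Channel attention followed by spatial attention, as in Woo et al. (ECCV 2018)."""

    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        self.ca = ChannelAttention(channels, reduction)
        self.sa = SpatialAttention(kernel_size)

    def forward(self, x):
        x = x * self.ca(x)     # reweight channels
        return x * self.sa(x)  # reweight spatial locations


if __name__ == "__main__":
    y = CBAM(64)(torch.randn(2, 64, 32, 32))
    print(y.shape)  # torch.Size([2, 64, 32, 32])
```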
Processing Method | Positioning Clamp Nuts | U-Shaped Hoop Nuts | Cotter Pins |
---|---|---|---|
Max value | 0.967 | 0.979 | 0.959 |
4 × 4 | 0.969 | 0.981 | 0.96 |
16 × 16 | 0.968 | 0.983 | 0.959 |
32 × 32 | 0.966 | 0.984 | 0.96 |
64 × 64 | 0.971 | 0.988 | 0.965 |
72 × 72 | 0.964 | 0.988 | 0.969 |
128 × 128 | 0.963 | 0.987 | 0.974 |
Average value | 0.82 | 0.99 | 0.971 |
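The rows of the table above compare how a per-pixel anomaly (reconstruction-error) map is collapsed into one image-level score: the single maximum value, the global average, and intermediate k × k settings. One plausible reading, assumed here rather than quoted from the paper, is that a k × k setting average-pools the error map with a k-sized window and then takes the maximum pooled response, which suppresses isolated noisy pixels while keeping sensitivity to small defects. The helper below (`image_score` and its method strings are hypothetical names) sketches that interpretation:

```python
import torch
import torch.nn.functional as F


def image_score(error_map, method="64x64"):
    """Collapse a per-pixel anomaly map of shape (B, 1, H, W) into one score per image.

    "max"   -> largest single pixel value,
    "mean"  -> global average,
    "k x k" -> average-pool with a k-by-k window, then take the maximum pooled response
               (assumed interpretation of the k x k rows in the table above).
    """
    if method == "max":
        return error_map.amax(dim=(1, 2, 3))
    if method == "mean":
        return error_map.mean(dim=(1, 2, 3))
    k = int(method.split("x")[0])        # e.g. "64x64" -> 64
    pooled = F.avg_pool2d(error_map, kernel_size=k, stride=1)
    return pooled.amax(dim=(1, 2, 3))


# Example: a 128x128 error map scored with the settings compared above.
err = torch.rand(2, 1, 128, 128)
for m in ["max", "4x4", "64x64", "128x128", "mean"]:
    print(m, image_score(err, m).shape)  # a tensor of shape (2,) for each setting
```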
Category | TN | FN | FP | TP | Recall (%) | Precision (%) | F1-Score (%) | FPS
---|---|---|---|---|---|---|---|---
Positioning clamp nuts | 297 | 28 | 3 | 97 | 77.6 | 97.0 | 86.2 | 25.9
U-shaped hoop nuts | 294 | 6 | 6 | 140 | 95.9 | 95.9 | 95.9 | 26.7
Cotter pins | 261 | 5 | 39 | 168 | 97.1 | 81.2 | 88.5 | 26.9
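The percentage metrics in this table follow directly from the confusion-matrix counts, with recall = TP/(TP + FN), precision = TP/(TP + FP), and F1 as their harmonic mean; the short check below recomputes them and matches the listed values to within 0.1:

```python
def prf(tp, fp, fn):
    """Recall, precision, and F1-score (all in %) from confusion-matrix counts."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return round(100 * recall, 1), round(100 * precision, 1), round(100 * f1, 1)


# Counts taken from the table above (TP, FP, FN):
print(prf(97, 3, 28))   # positioning clamp nuts -> (77.6, 97.0, 86.2)
print(prf(140, 6, 6))   # U-shaped hoop nuts     -> (95.9, 95.9, 95.9)
print(prf(168, 39, 5))  # cotter pins            -> (97.1, 81.2, 88.4)
```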
Methods | Category | Recall | Precision | F1-Score | AUROC | FPS
---|---|---|---|---|---|---
(-) VSS block | Positioning clamp nuts | 0.765 | 0.905 | 0.829 | 0.954 | 34.43
(-) VSS block | U-shaped hoop nuts | 0.909 | 0.911 | 0.91 | 0.965 | 35.65
(-) VSS block | Cotter pins | 0.928 | 0.859 | 0.892 | 0.958 | 35.87
(-) CBAM | Positioning clamp nuts | 0.782 | 0.917 | 0.844 | 0.961 | 29.36
(-) CBAM | U-shaped hoop nuts | 0.929 | 0.927 | 0.928 | 0.977 | 30.33
(-) CBAM | Cotter pins | 0.947 | 0.879 | 0.912 | 0.964 | 30.89
(-) None | Positioning clamp nuts | 0.803 | 0.937 | 0.865 | 0.972 | 25.96
(-) None | U-shaped hoop nuts | 0.94 | 0.94 | 0.94 | 0.991 | 26.76
(-) None | Cotter pins | 0.962 | 0.901 | 0.931 | 0.975 | 26.95
Share and Cite
Xu, S.; Fei, J.; Yang, H.; Zhao, X.; Liu, X.; Li, H. VSM-UNet: A Visual State Space Reconstruction Network for Anomaly Detection of Catenary Support Components. Sensors 2025, 25, 5967. https://doi.org/10.3390/s25195967