PPFS-YOLO: Physics-Prior Frequency-Spatial Fusion for Robust Container Surface Damage Detection
Abstract
1. Introduction
- Pseudo-texture interference. Container surfaces exhibit complex visual patterns—rust stains, paint peeling, specular reflections, and embossed logos—that share low-level feature characteristics with genuine damage. Purely spatial-domain convolutions struggle to disentangle these confounding textures from structural defects, leading to elevated false-positive rates. From a signal-processing perspective, pseudo-textures occupy characteristic frequency bands [19] that overlap with, but are distinct from, genuine damage signatures; this distinction is invisible to standard spatial convolutions.
- Minority-class instability. Puncture-type defects (Hole) are inherently rare in service and in available datasets (approximately 12% of annotated instances), causing detectors to under-represent this safety-critical category during training. Approaches such as focal loss [3], seesaw loss [20], and hard example mining [21] partially alleviate class imbalance but do not leverage the distinctive physical signatures of hole-type damage.
- We design the Frequency-Spatial Fusion (FSF) module, which performs learnable spectral masking in the 2D Fourier domain followed by gated spatial-frequency feature fusion, enabling the network to selectively suppress pseudo-texture frequency components while preserving damage-related signals.
- We propose the Edge-Guided Auxiliary Supervision Module (FIM), which encodes Sobel-derived edge priors as a differentiable physics-prior loss and applies edge-guided residual refinement, steering the network toward physically plausible damage representations.
- We demonstrate through comprehensive ablation that the physics-prior loss acts as the critical catalyst for the synergy between FSF and FIM: without it, the structural modules alone yield only +0.83 pp mAP@50, but with it the improvement reaches +12.10 pp, a roughly 14.6-fold amplification (12.10/0.83) that reveals how the physics prior activates the latent potential of frequency-spatial fusion.
- PPFS-YOLO achieves state-of-the-art performance on a container damage dataset (64.86% mAP@50), surpassing five competitive baselines including RT-DETR-l and YOLOv10n, with particularly notable gains over YOLO12s on the minority Hole class (+22.19 pp) and on Precision (+14.52 pp).
Novelty Positioning
2. Related Work
2.1. Evolution of Real-Time Object Detectors
2.2. YOLO-Based Industrial Defect Detection
2.3. Frequency-Domain Feature Analysis in Visual Recognition
2.4. Physics-Aware and Edge-Guided Learning for Defect Analysis
- Physics-aware signal-processing methods. In non-destructive testing (NDT), such approaches have demonstrated substantial gains: Pan et al. [32] achieved 0.498 mm RMSE for fatigue crack quantification; GuwNet [33] reduced guided-wave microcrack quantification errors by over 80%; DfedResNet [35] improved magnetic flux leakage depth estimation by 1–2 orders of magnitude; and an end-to-end approach [34] applied rotating alternating current field measurements to achieve 3D defect reconstruction.
- Edge-guided visual learning. Edge-guided feature refinement, which encodes the physical constraint that genuine structural damage exhibits sharp, continuous boundaries [37], has not, to the best of our knowledge, been explored within end-to-end YOLO detection frameworks.
3. Method
3.1. Overall Architecture
- P4 Neck (Layers 12–13): After the first A2C2f block in the top-down path (512 channels).
- P3 Neck (Layers 17–18): After the second A2C2f block in the top-down path (256 channels).
- P4 Head (Layers 22–23): After the A2C2f block in the bottom-up path (512 channels).
3.2. Frequency-Spatial Fusion Module
3.2.1. Design Rationale
3.2.2. Forward Computation
3.2.3. Gate Initialization Analysis
Algorithm 1 FSF Module Forward Pass
Require: input feature; learnable mask; channel scale; gate bias
Ensure: fused feature
1: Ortho-normalized 2D FFT
2: Amplitude and phase
3: Upsample mask
4: Spectral masking
5: Reconstruct complex spectrum
6: Inverse FFT
7–8: Gated fusion
9: return fused feature
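The steps listed in Algorithm 1 can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the exact masking and gating expressions are not recoverable from the text, so the `2 * sigmoid` mask parameterization, the nearest-neighbour upsampling, and the scalar gate (in place of the gate convolution) are all hypothetical choices, as are the names `fsf_forward`, `mask_logits`, and `channel_scale`.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fsf_forward(x, mask_logits, channel_scale, gate_bias=1.0):
    """Sketch of an FSF-style forward pass for one feature map x of shape (C, H, W)."""
    _, h, w = x.shape

    # 1: ortho-normalized 2D FFT over the two spatial axes.
    spec = np.fft.fft2(x, axes=(-2, -1), norm="ortho")

    # 2: amplitude and phase.
    amp, phase = np.abs(spec), np.angle(spec)

    # 3: nearest-neighbour upsampling of the low-resolution mask to (H, W).
    mh, mw = mask_logits.shape
    rows = np.arange(h) * mh // h
    cols = np.arange(w) * mw // w
    # ASSUMPTION: 2 * sigmoid maps zero logits to a neutral multiplier of 1.
    mask = 2.0 * sigmoid(mask_logits[np.ix_(rows, cols)])

    # 4: spectral masking of the amplitude, with a per-channel scale.
    amp_masked = amp * mask[None, :, :] * channel_scale[:, None, None]

    # 5: rebuild the complex spectrum from masked amplitude and original phase.
    spec_masked = amp_masked * np.exp(1j * phase)

    # 6: inverse FFT; keep the real part, since the mask need not be symmetric.
    x_freq = np.fft.ifft2(spec_masked, axes=(-2, -1), norm="ortho").real

    # 7: gated fusion. ASSUMPTION: a scalar gate stands in for the gate conv.
    gate = sigmoid(gate_bias)
    return gate * x + (1.0 - gate) * x_freq
```

With zero mask logits and unit channel scales, the frequency branch reproduces the input, so the module starts as an identity mapping in this sketch; a strongly negative mask suppresses the frequency branch and leaves only the gated spatial path.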
3.3. Edge-Guided Auxiliary Supervision Module
| Symbol | Dimensions | Description |
|---|---|---|
| | | Input feature map |
| | | Refined output feature map |
| | | Sobel edge prior (fixed) |
| | | Predicted edge map (learnable) |
| | | Horizontal/vertical Sobel gradients |
| | scalar | Physics-prior alignment loss |
| | scalar | Learnable residual scale (init. 0.1) |
| | | Edge-guided refinement residual |
3.3.1. Edge Prior via Gradient Operators
3.3.2. Learnable Edge Prediction
3.3.3. Physics-Prior Loss and Gradient Analysis
- Why stop-gradient on the input feature? The edge prior is computed from the detached feature map. If gradients were allowed to flow through the edge prior, the network could trivially minimize the physics-prior loss by making the feature map smooth (zero-gradient features everywhere), which would destroy the representation quality. By detaching the feature map, the physics loss exclusively trains the edge predictor to predict edge structure, creating a knowledge distillation-like setup in which the Sobel operator acts as a fixed “teacher” and the edge predictor is the “student.”
3.3.4. Edge-Guided Residual Refinement
Algorithm 2 FIM Module Forward Pass
Require: input feature; Sobel kernels; edge predictor; residual scale
Ensure: refined feature; physics loss
1: Stop gradient on feature
2: Sobel depthwise conv
3: Edge prior map
4: Learnable edge prediction
5: Physics-prior loss
6: Two-stage DW–PW refinement
7: Edge-guided residual
8: return refined feature and physics loss
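The steps of Algorithm 2 can be illustrated with a small NumPy sketch. It is a hedged approximation rather than the authors' code: the channel-mean gradient magnitude, the max-normalization of the prior, the 1×1 projection used as edge predictor, the mean-squared-error form of the physics loss, and the single depthwise plus pointwise stage standing in for the two-stage DW–PW refinement are all assumptions, and the names `sobel_edge_prior`, `fim_forward`, `w_pred`, and `w_pw` are hypothetical.

```python
import numpy as np

SOBEL_X = np.array([[-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0], [-1.0, 0.0, 1.0]])
SOBEL_Y = SOBEL_X.T


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def depthwise_conv3x3(x, kernel):
    """Correlate every channel of x (C, H, W) with one 3x3 kernel, edge-padded."""
    _, h, w = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)), mode="edge")
    out = np.zeros(x.shape, dtype=float)
    for i in range(3):
        for j in range(3):
            out += kernel[i, j] * xp[:, i:i + h, j:j + w]
    return out


def sobel_edge_prior(x):
    """Steps 1-3: fixed edge prior in [0, 1] computed from a detached copy of x."""
    x_sg = x.copy()  # stand-in for stop-gradient; NumPy has no autograd
    gx = depthwise_conv3x3(x_sg, SOBEL_X)
    gy = depthwise_conv3x3(x_sg, SOBEL_Y)
    mag = np.sqrt(gx ** 2 + gy ** 2).mean(axis=0)  # ASSUMPTION: channel mean
    return mag / (mag.max() + 1e-8)                # ASSUMPTION: max-normalized


def fim_forward(x, w_pred, w_pw, alpha=0.1):
    """Steps 4-8 for x of shape (C, H, W); returns (refined feature, physics loss)."""
    prior = sobel_edge_prior(x)
    # 4: learnable edge prediction. ASSUMPTION: 1x1 projection followed by sigmoid.
    pred = sigmoid(np.tensordot(w_pred, x, axes=(0, 0)))
    # 5: physics-prior loss. ASSUMPTION: mean squared error against the fixed prior.
    loss = float(np.mean((pred - prior) ** 2))
    # 6: one depthwise (3x3 mean) and one pointwise stage as a simplified refinement.
    dw = depthwise_conv3x3(x, np.full((3, 3), 1.0 / 9.0))
    refined = np.tensordot(w_pw, dw, axes=(1, 0))
    # 7: edge-guided residual, scaled by alpha (0.1 at initialization).
    return x + alpha * pred[None, :, :] * refined, loss
```

On a vertical step edge the prior peaks at the two columns adjacent to the step and is zero elsewhere, which is the behaviour the physics loss asks the edge predictor to imitate.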
3.4. Training Objective and Optimization
- Gradient magnitude balancing. The physics loss coefficient (0.5 in our setting) is selected to balance the gradient magnitudes of the detection and physics-prior losses. At convergence, with this coefficient and three FIM instances, the total physics gradient magnitude is comparable to, but does not dominate, the detection gradient.
- Learning rate schedule. PPFS module parameters use a learning rate multiplier relative to the backbone, accelerating adaptation of the newly initialized FSF masks and FIM predictors while the pretrained backbone parameters fine-tune at the standard rate.
Computational Complexity Analysis
Algorithm 3 PPFS-YOLO Training Procedure
Require: training set; pretrained YOLOv12s weights; number of epochs T; physics loss weight; LR boost factor
Ensure: trained PPFS-YOLO model
1: Initialize backbone and neck with the pretrained weights; randomly initialize FSF and FIM parameters
2: Set the initial values of the module hyperparameters
3: for each epoch up to T do
4: for each mini-batch do
5–6: Forward pass
7: Total loss (Equation (14))
8: Update backbone parameters with the base learning rate
9: Update PPFS parameters with the boosted learning rate
10: end for
11: Cosine-anneal the learning rate
12: end for
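The two-rate schedule and the combined loss in Algorithm 3 can be sketched in plain Python. This is an illustrative sketch only: the initial learning rate (`0.01`) and boost factor (`5.0`) used in the usage lines are placeholder values that do not come from the paper, the additive form of the total loss is an assumption about Equation (14), and the function names are hypothetical. The epoch count (200) and the physics loss weight (0.5) follow the hyperparameter table.

```python
import math


def cosine_lr(epoch, total_epochs, lr_init, lr_final=0.0):
    """Cosine annealing from lr_init at epoch 0 to lr_final at the last epoch."""
    t = epoch / max(total_epochs - 1, 1)
    return lr_final + 0.5 * (lr_init - lr_final) * (1.0 + math.cos(math.pi * t))


def param_group_lrs(epoch, total_epochs, lr_init, boost):
    """Backbone uses the base rate; PPFS (FSF and FIM) parameters use boost * base."""
    base = cosine_lr(epoch, total_epochs, lr_init)
    return {"backbone": base, "ppfs": boost * base}


def total_loss(det_loss, physics_losses, weight=0.5):
    """ASSUMED form: detection loss plus weighted sum over the FIM physics losses."""
    return det_loss + weight * sum(physics_losses)


# Placeholder values for illustration only (not taken from the paper).
start = param_group_lrs(epoch=0, total_epochs=200, lr_init=0.01, boost=5.0)
end = param_group_lrs(epoch=199, total_epochs=200, lr_init=0.01, boost=5.0)
```

Because both groups share the same cosine curve, the ratio between the PPFS and backbone learning rates stays fixed at the boost factor throughout training.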
4. Experiments
4.1. Dataset
4.2. Implementation Details
4.3. Evaluation Metrics
4.4. Comparison with State-of-the-Art Methods
4.5. Per-Class Analysis
4.6. Ablation Study
4.7. Training Dynamics
4.8. Learned Spectral Mask Visualization
4.9. Efficiency Analysis
4.10. Inference Latency
4.11. Cross-Domain Validation on Kolektor SDD2
5. Discussion
5.1. The Catalyst Effect of the Physics-Prior Loss
5.2. Why Frequency-Domain Fusion Benefits Container Damage Detection
5.3. Minority Class Benefits
- The Sobel-derived edge prior produces strong, consistent responses at hole boundaries;
- the edge-guided refinement amplifies features in regions exhibiting sharp boundaries;
- the frequency-domain filtering in FSF preserves the high-frequency edge components that define hole perimeters.
5.4. Comparison with Larger Models
5.5. Limitations and Future Work
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| PPFS | Physics-Prior Frequency-Spatial |
| FSF | Frequency-Spatial Fusion |
| FIM | Edge-Guided Auxiliary Supervision Module (in PPFS-YOLO) |
| YOLO | You Only Look Once |
| FFT | Fast Fourier Transform |
| DFT | Discrete Fourier Transform |
| mAP | mean Average Precision |
| AP | Average Precision |
| GFLOPs | Giga Floating-Point Operations |
| FPS | Frames per Second |
| SGD | Stochastic Gradient Descent |
| AMP | Automatic Mixed Precision |
| IoU | Intersection over Union |
| NDT | Non-Destructive Testing |
References
1. United Nations Conference on Trade and Development. Review of Maritime Transport 2024; Technical report; United Nations: Geneva, Switzerland, 2024.
2. Nguyen Thi Phuong, T.; Cho, G.S.; Chatterjee, I. Automating container damage detection with the YOLO-NAS deep learning model. Sci. Prog. 2025, 108, 00368504251314084.
3. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal Loss for Dense Object Detection. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2980–2988.
4. Jocher, G.; Chaurasia, A.; Qiu, J. Ultralytics YOLOv8. 2023. Available online: https://github.com/ultralytics/ultralytics (accessed on 1 March 2026).
5. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475.
6. Wang, C.Y.; Liao, H.Y.M. YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information. In Proceedings of the European Conference on Computer Vision (ECCV); Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2024.
7. Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J.; Ding, G. YOLOv10: Real-Time End-to-End Object Detection. arXiv 2024, arXiv:2405.14458.
8. Jocher, G.; Qiu, J. Ultralytics YOLO11. 2024. Available online: https://docs.ultralytics.com/models/yolo11/ (accessed on 1 March 2026).
9. Tian, Y.; Li, H.; Wang, H.; Chen, Y.; Ling, H. YOLOv12: Attention-Centric Real-Time Object Detectors. arXiv 2025, arXiv:2502.12524.
10. Hou, W.; Wei, Y.; Guo, J.; Jin, Y.; Zhu, C. MSFT-YOLO: Improved YOLOv5 Based on Transformer for Detecting Defects of Steel Surface. Sensors 2022, 22, 3467.
11. Guo, Z.; Wang, C.; Yang, G.; Huang, Z.; Li, G. Surface defect detection of steel strips based on improved YOLOv4. Comput. Electr. Eng. 2022, 102, 108208.
12. Zhang, H.; Li, S.; Miao, Q.; Fang, R.; Xue, S.; Hu, Q.; Hu, J.; Chan, S. Surface defect detection of hot rolled steel based on multi-scale feature fusion and attention mechanism residual block. Sci. Rep. 2024, 14, 7671.
13. Jeon, C.H.; Kim, J.H. YOLOv4-MN3 for PCB Surface Defect Detection. Appl. Sci. 2021, 11, 11701.
14. Tang, J.; Liu, S.; Zhao, D.; Tang, L.; Zou, W.; Zheng, B. PCB-YOLO: An Improved Detection Algorithm of PCB Surface Defects Based on YOLOv5. Sustainability 2023, 15, 5963.
15. Zhang, C.; Yang, T.; Yang, J. Image Recognition of Wind Turbine Blade Defects Using Attention-Based MobileNetv1-YOLOv4 and Transfer Learning. Sensors 2022, 22, 6009.
16. Sun, X.; Jia, X.; Liang, Y.; Wang, M.; Chi, X. A Defect Detection Method for a Boiler Inner Wall Based on an Improved YOLO-v5 Network and Data Augmentation Technologies. IEEE Access 2022, 10, 93845–93858.
17. Zhao, Y.; Lv, W.; Xu, S.; Wei, J.; Wang, G.; Dang, Q.; Liu, Y.; Chen, J. DETRs Beat YOLOs on Real-time Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–22 June 2024; pp. 16965–16974.
18. Zhang, H.; Li, F.; Liu, S.; Zhang, L.; Su, H.; Zhu, J.; Ni, L.M.; Shum, H.Y. DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection. arXiv 2022, arXiv:2203.03605.
19. Gonzalez, R.C.; Woods, R.E. Digital Image Processing, 4th ed.; Pearson: Hong Kong, China, 2018.
20. Wang, J.; Zhang, W.; Zang, Y.; Cao, Y.; Pang, J.; Gong, T.; Chen, K.; Liu, Z.; Loy, C.C.; Lin, D. Seesaw Loss for Long-Tailed Instance Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 9695–9704.
21. Shrivastava, A.; Gupta, A.; Girshick, R. Training Region-Based Object Detectors with Online Hard Example Mining. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 761–769.
22. Chi, L.; Jiang, B.; Mu, Y. Fast Fourier Convolution. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Virtual, 6–12 December 2020; pp. 4479–4488.
23. Qin, Z.; Zhang, P.; Wu, F.; Li, X. FcaNet: Frequency Channel Attention Networks. arXiv 2020, arXiv:2012.11879.
24. Zhong, Y.; Li, B.; Tang, L.; Kuang, S.; Wu, S.; Ding, S. Detecting Camouflaged Object in Frequency Domain. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 4504–4513.
25. Lin, J.; Tan, X.; Xu, K.; Ma, L.; Lau, R.W. Frequency-aware Camouflaged Object Detection. ACM Trans. Multimed. Comput. Commun. Appl. 2022, 19, 61.
26. Zheng, S.; Wu, Z.; Xu, Y.; Wei, Z. Instance-Aware Spatial-Frequency Feature Fusion Detector for Oriented Object Detection in Remote-Sensing Images. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5606513.
27. Sun, X.; Yu, Y.; Cheng, Q. Adaptive multimodal feature fusion with frequency domain gate for remote sensing object detection. Remote Sens. Lett. 2024, 15, 133–144.
28. Li, H.; Yi, Z.; Wang, Z.; Wang, Y.; Ge, L.; Cao, W.; Mei, L.; Yang, W.; Sun, Q. FDADNet: Detection of Surface Defects in Wood-Based Panels Based on Frequency Domain Transformation and Adaptive Dynamic Downsampling. Processes 2024, 12, 2134.
29. Zou, G.; Li, T.; Li, G.; Peng, X.; Fu, G. A visual detection method of tile surface defects based on spatial-frequency domain image enhancement and region growing. In Proceedings of the Chinese Automation Congress (CAC), Hangzhou, China, 22–24 November 2019.
30. Raissi, M.; Perdikaris, P.; Karniadakis, G.E. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 2019, 378, 686–707.
31. Cuomo, S.; Di Cola, V.S.; Giampaolo, F.; Rozza, G.; Raissi, M.; Piccialli, F. Scientific Machine Learning Through Physics–Informed Neural Networks: Where we are and What’s Next. J. Sci. Comput. 2022, 92, 88.
32. Pan, Y.; Khodaei, Z.S.; Aliabadi, F.M. In-service fatigue crack monitoring through baseline-free automated detection and physics-informed neural network quantification. NDT E Int. 2025, 151, 103360.
33. Sun, H.; Peng, L.; Lin, J.; Wang, S.; Zhao, W.; Huang, S. Microcrack Defect Quantification Using a Focusing High-Order SH Guided Wave EMAT: The Physics-Informed Deep Neural Network GuwNet. IEEE Trans. Ind. Inform. 2021, 18, 3235–3247.
34. Zhao, J.; Li, W.; Yuan, X.A.; Yin, X.; Li, X.; Chen, Q.; Ding, J. An End-to-End Physics-Informed Neural Network for Defect Identification and 3-D Reconstruction Using Rotating Alternating Current Field Measurement. IEEE Trans. Ind. Inform. 2022, 19, 8340–8350.
35. Sun, H.; Peng, L.; Huang, S.; Li, S.; Long, Y.; Wang, S.; Zhao, W. Development of a Physics-Informed Doubly Fed Cross-Residual Deep Neural Network for High-Precision Magnetic Flux Leakage Defect Size Estimation. IEEE Trans. Ind. Inform. 2021, 18, 1629–1640.
36. Zhang, E.; Dao, M.; Karniadakis, G.E.; Suresh, S. Analyses of internal structures and defects in materials using physics-informed neural networks. Sci. Adv. 2022, 8, eabk0644.
37. Canny, J. A Computational Approach to Edge Detection. IEEE Trans. Pattern Anal. Mach. Intell. 1986, PAMI-8, 679–698.
38. Sobel, I. History and Definition of the So-Called “Sobel Operator”, More Appropriately Named the Sobel-Feldman Operator. 2014. Available online: https://www.researchgate.net/profile/Irwin-Sobel/publication/285159837 (accessed on 7 April 2026).
39. Finder, S.E.; Arar, R.; Averbuch-Elor, H.; Zrigui, S.; Yariv, U.; Gurevich, T. WTConv: Rethinking Large Kernel Convolutions with Wavelet Decomposition. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 10–15 December 2024; Volume 37.
40. Chen, L.; Fu, Y.; Gu, L.; Yan, C.; Harada, T.; Huang, G. Frequency-aware Feature Fusion for Dense Image Prediction. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 10763–10780.
41. Li, X.; Wang, W.; Wu, L.; Chen, S.; Hu, X.; Li, J.; Tang, J.; Yang, J. Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection. arXiv 2020, arXiv:2006.04388.
42. Feng, C.; Zhong, Y.; Gao, Y.; Scott, M.R.; Huang, W. TOOD: Task-aligned One-stage Object Detection. arXiv 2021, arXiv:2108.07755.
43. Pang, J.; Chen, K.; Shi, J.; Feng, H.; Ouyang, W.; Lin, D. Libra R-CNN: Towards Balanced Learning for Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 821–830.
44. Zhang, S.; Chi, C.; Yao, Y.; Lei, Z.; Li, S.Z. Bridging the Gap Between Anchor-Based and Anchor-Free Detection via Adaptive Training Sample Selection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 9756–9765.
45. Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 936–944.
46. Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path Aggregation Network for Instance Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 8759–8768.
47. Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and Efficient Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 10778–10787.
48. Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 2011–2023.
49. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the European Conference on Computer Vision (ECCV); Springer: Berlin/Heidelberg, Germany, 2018; pp. 3–19.
50. Xia, K.; Lv, Z.; Zhou, C.; Gu, G.; Zhao, Z.; Liu, K.; Li, Z. Mixed Receptive Fields Augmented YOLO with Multi-Path Spatial Pyramid Pooling for Steel Surface Defect Detection. Sensors 2023, 23, 5114.
51. Wang, X.; Zhuang, K. An improved YOLOX method for surface defect detection of steel strips. In Proceedings of the IEEE International Conference on Power Electronics, Computer Applications (ICPECA), Shenyang, China, 29–31 January 2023.
52. Zeng, K.; Xia, Z.; Qian, J.; Du, X.; Xiao, P.; Zhu, L. Steel Surface Defect Detection Technology Based on YOLOv8-MGVS. Metals 2025, 15, 109.
53. Fang, W.; Yang, Y.; Zhang, W.; Wang, T.; Feng, J.; Liu, G. ASD-YOLO: A lightweight multi-module collaboratively optimized model for steel surface defect detection. Meas. Sci. Technol. 2025, 36, 095411.
54. Le, H.F.; Zhang, L.J.; Liu, Y.X. Surface Defect Detection of Industrial Parts Based on YOLOv5. IEEE Access 2022, 10, 130784–130794.
55. Zhao, Z.; Yang, X.; Zhou, Y.; Sun, Q.; Ge, Z.; Liu, D. Real-time detection of particleboard surface defects based on improved YOLOV5 target detection. Sci. Rep. 2021, 11, 21777.
56. Niu, W.; Lv, C.; Zhang, E.; Wei, Z. YOLO-RDM: A high accuracy and efficient algorithm for magnetic tile surface defect detection with practical applications. PLoS ONE 2025, 20, e0328815.
57. Ding, L.; Xu, H.; Du, P.; Cui, Y. ACS-YOLO: A lightweight bearing surface defect detection algorithm. J. Eng. Appl. Sci. 2025, 72, 818.
58. Bao, N.; Lin, J.; Fan, Y.; Bao, R.; Simeone, A. FabricMamba: A fabric surface defect detection system based on large kernel attention and visual state space. Eng. Appl. Artif. Intell. 2025, 162, 112558.
59. Yin, X.; Zhao, Z.; Weng, L. MAS-YOLO: A Lightweight Detection Algorithm for PCB Defect Detection Based on Improved YOLOv12. Appl. Sci. 2025, 15, 6238.
60. Ji, Y.; Ma, T.; Shen, H.; Feng, H.; Zhang, Z.; Li, D.; He, Y. Transmission Line Defect Detection Algorithm Based on Improved YOLOv12. Electronics 2025, 14, 2432.
61. Shukla, V.; Shukla, A.; S.K., S.P.; Shukla, S. A systematic survey: Role of deep learning-based image anomaly detection in industrial inspection contexts. Front. Robot. AI 2025, 12, 1554196.
62. Yang, W. A Survey of Surface Defect Detection Based on Deep Learning. In 2022 7th International Conference on Modern Management and Education Technology (MMET 2022); Atlantis Press: Dordrecht, The Netherlands, 2022.
63. Parseval des Chênes, M.A. Mémoire sur les séries et sur l’intégration complète. Mem. Present. L’Inst. Sci. Lett. Arts 1806, 1, 638–648.
64. Zhu, Z.; Zhu, Y.; Wang, H.; Wang, N.; Ye, J.; Ling, X. FDTNet: Enhancing frequency-aware representation for prohibited object detection from X-ray images via dual-stream transformers. Eng. Appl. Artif. Intell. 2024, 133, 108076.
65. Rippel, O.; Snoek, J.; Adams, R.P. Spectral Representations for Convolutional Neural Networks. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; Volume 28.
66. Rao, Y.; Zhao, W.; Tang, Y.; Zhou, J.; Lim, S.N.; Lu, J. HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions. In Proceedings of the Advances in Neural Information Processing Systems, New Orleans, LA, USA, 28 November–9 December 2022; Volume 35, pp. 10353–10366.
67. Li, H.; Chen, Q.; Kalkofen, D.; Chen, H.-T. OUGS: Active View Selection via Object-aware Uncertainty Estimation in 3DGS. arXiv 2025, arXiv:2511.09397.
68. Božič, J.; Tabernik, D.; Skočaj, D. Mixed supervision for surface-defect detection: From weakly to fully supervised learning. Comput. Ind. 2021, 129, 103459.
69. Bergmann, P.; Batzner, K.; Fauser, M.; Sattlegger, D.; Steger, C. The MVTec Anomaly Detection Dataset: A Comprehensive Real-World Dataset for Unsupervised Anomaly Detection. Int. J. Comput. Vis. 2021, 129, 1038–1059.
70. Bergmann, P.; Fauser, M.; Sattlegger, D.; Steger, C. MVTec AD—A Comprehensive Real-World Dataset for Unsupervised Anomaly Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 9584–9592.
71. Li, H.; Li, Y.; Chi, Y.; Deslandes, A.; Leonardi, M.; Freger, S.; Zhang, Y.; Avery, J.; Hull, M.L.; Chen, H.-T. Who Fails Where? LLM and Human Error Patterns in Endometriosis Ultrasound Report Extraction. In Proceedings of the Extended Abstracts of the 2026 CHI Conference on Human Factors in Computing Systems, Barcelona, Spain, 13–16 April 2026; pp. 1–6.
72. Li, H.; Zhao, Y.; Li, Y.; Deslandes, A.; Avery, J.; Leonardi, M.; Hull, M.L.; Chen, H.-T. EndoExtract: Co-Designing Structured Text Extraction from Endometriosis Ultrasound Reports. arXiv 2026, arXiv:2601.18154.

| Method | Domain | Freq. Fusion | Physics Prior | End-to-End Det. |
|---|---|---|---|---|
| FcaNet [23] | General | ✔ | ✗ | ✗ |
| FFC [22] | General | ✔ | ✗ | ✗ |
| FreqCOD [24] | Camouflage | ✔ | ✗ | ✗ |
| SFFD [26] | Remote Sens. | ✔ | ✗ | ✔ |
| FDADNet [28] | Wood Defect | ✔ | ✗ | ✔ |
| GuwNet [33] | | ✗ | ✔ | ✗ |
| DfedResNet [35] | | ✗ | ✔ | ✗ |
| YOLO-NAS [2] | Container | ✗ | ✗ | ✔ |
| MAS-YOLO [59] | PCB | ✗ | ✗ | ✔ |
| PPFS-YOLO (Ours) | Container | ✔ | ✔ | ✔ |
“” denotes the preceding layer; bracketed indices denote concatenation sources.

| Layer | Module | From | Channels | Role |
|---|---|---|---|---|
| Backbone | | | | |
| 0 | Conv s2 | image | 64 | stem |
| 1 | Conv s2 | | 128 | down |
| 2 | C3k2 | | 256 | feature |
| 3 | Conv s2 | | 256 | down |
| 4 | C3k2 | | 512 | feature |
| 5 | Conv s2 | | 512 | down |
| 6 | A2C2f | | 512 | attn |
| 7 | Conv s2 | | 1024 | down |
| 8 | A2C2f | | 1024 | attn |
| Neck (top-down) | | | | |
| 9 | Upsample | | 1024 | up |
| 10 | Concat | | 1536 | fuse |
| 11 | A2C2f | | 512 | refine |
| 12 | FreqSpatialFusion | | 512 | FSF (P4) |
| 13 | EdgeGuidedModule | | 512 | FIM (P4) |
| 14 | Upsample | | 512 | up |
| 15 | Concat | | 768 | fuse |
| 16 | A2C2f | | 256 | refine |
| 17 | FreqSpatialFusion | | 256 | FSF (P3) |
| 18 | EdgeGuidedModule | | 256 | FIM (P3) |
| Head (bottom-up) | | | | |
| 19 | Conv s2 | | 256 | down |
| 20 | Concat | | 768 | fuse |
| 21 | A2C2f | | 512 | refine |
| 22 | FreqSpatialFusion | | 512 | FSF (P4-head) |
| 23 | EdgeGuidedModule | | 512 | FIM (P4-head) |
| 24 | Conv s2 | | 512 | down |
| 25 | Concat | | 1536 | fuse |
| 26 | A2C2f | | 1024 | refine |
| 27 | Detect | | — | output |

| Module | C | Params (K) | GFLOPs | Component Details |
|---|---|---|---|---|
| FSF (P3) | 256 | 33.5 | 0.11 | mask, channel scale, gate conv |
| FSF (P4, P4-head) | 512 | 132.4 | 0.10 | mask, channel scale, gate conv |
| FIM (P3) | 256 | 71.4 | 0.17 | edge predictor, DW–PW refine |
| FIM (P4, P4-head) | 512 | 283.9 | 0.51 | edge predictor, DW–PW refine |
| Total (3 pairs) | — | 790 | 1.70 | — |

| Class | Original Instances | Ratio (%) | Train Instances (After Augmentation) | Aug. Factor |
|---|---|---|---|---|
| Dent | 4438 | 48.8 | 4438 | |
| Hole | 1098 | 12.1 | 2553 | |
| Rusty | 3568 | 39.2 | 3568 | |
| Total | 9104 | 100.0 | 10,559 | — |

| Split | Images | Negatives | Resolution |
|---|---|---|---|
| Train | 3300 | +3300 neg. | variable |
| Val/Test | 413 | — | variable |
| Total | 7013 | — | resized |

| Hyperparameter | Value | Description |
|---|---|---|
| Input resolution | | standard YOLO input |
| Epochs | 200 | training duration |
| Optimizer (all) | SGD | unified for fair comparison |
| Learning rate | | initial, cosine annealed |
| Weight decay | | regularization |
| Batch size per GPU | 16 | effective |
| AMP | enabled | mixed precision |
| Seed | 42 | reproducibility |
| † Gate bias | 1.0 | initial value |
| † Residual scale | 0.1 | FIM init |
| † LR boost factor | | PPFS module params |
| † Physics loss weight | 0.5 | weight of the physics-prior loss |
| † Mask base res. | | FSF spectral mask |

| Method | Params (M) | GFLOPs | mAP@50 | mAP@50:95 | Precision | Recall |
|---|---|---|---|---|---|---|
| YOLOv10n [7] | 2.27 | 4.4 | 46.55 | 24.93 | 58.04 | 48.31 |
| YOLO11n [8] | 2.58 | 3.3 | 48.10 | 25.20 | 62.46 | 47.03 |
| RT-DETR-l [17] | 32.00 | 54.2 | 52.49 | 30.83 | 68.27 | 55.31 |
| YOLOv8s [4] | 11.13 | 14.4 | 52.56 | 29.67 | 66.33 | 52.45 |
| YOLO12s [9] | 9.23 | 10.8 | 52.51 | 29.58 | 63.77 | 54.42 |
| PPFS-YOLO | 10.02 | 12.5 | 64.86 | 37.49 | 78.29 | 64.82 |

| Method | Dent | Hole | Rusty | mAP@50 |
|---|---|---|---|---|
| YOLOv10n [7] | 50.10 | 59.81 | 29.73 | 46.55 |
| YOLO11n [8] | 54.14 | 56.88 | 33.28 | 48.10 |
| RT-DETR-l [17] | 57.06 | 64.15 | 36.27 | 52.49 |
| YOLOv8s [4] | 54.32 | 65.44 | 37.93 | 52.56 |
| YOLO12s [9] | 58.61 | 60.87 | 38.04 | 52.51 |
| PPFS-YOLO | 66.85 | 83.06 | 44.66 | 64.86 |
| vs. YOLO12s (SGD) | +8.24 | +22.19 | +6.62 | +12.35 |

| Configuration | Params (M) | GFLOPs | mAP@50 | mAP@50:95 | Precision | Recall |
|---|---|---|---|---|---|---|
| YOLO12s (Baseline) | 9.23 | 10.8 | 52.76 | 30.65 | 65.33 | 53.86 |
| +FSF only | 9.35 | 11.1 | 54.55 (+1.79) | 31.88 | 69.66 | 53.92 |
| +FIM only | 9.91 | 12.3 | 53.71 (+0.95) | 31.58 | 66.65 | 53.50 |
| +FSF+FIM (no physics-prior loss) | 10.02 | 12.5 | 53.59 (+0.83) | 31.00 | 68.04 | 52.64 |
| Full PPFS | 10.02 | 12.5 | 64.86 (+12.10) | 37.49 | 78.29 | 64.82 |
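
The gains in the ablation table can be re-derived directly from its mAP@50 column. The short sketch below does that arithmetic; the dictionary keys and variable names are hypothetical labels, and the "naive additive" figure is only a reference point for comparison, not a quantity reported by the authors.

```python
# mAP@50 values copied from the ablation table.
ablation = {
    "baseline": 52.76,
    "fsf_only": 54.55,
    "fim_only": 53.71,
    "fsf_fim_no_physics": 53.59,
    "full_ppfs": 64.86,
}

# Gain of each configuration over the YOLO12s baseline, in percentage points.
gains = {name: round(value - ablation["baseline"], 2) for name, value in ablation.items()}

# Reference point: what a purely additive combination of the two modules would give.
naive_additive = round(gains["fsf_only"] + gains["fim_only"], 2)

# Ratio between the full-model gain and the gain without the physics-prior loss.
amplification = gains["full_ppfs"] / gains["fsf_fim_no_physics"]

print(gains)
print(f"naive additive gain: {naive_additive} pp, amplification: {amplification:.1f}x")
```

The structural modules alone fall short of even the additive reference (0.83 pp against 2.74 pp), while the full model exceeds it several times over, which is the pattern the catalyst argument rests on.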
| Method | Params (M) | Latency (ms) | FPS |
|---|---|---|---|
| YOLOv10n | 2.78 | 10.1 | 99.4 |
| YOLO11n | 2.62 | 8.7 | 115.4 |
| RT-DETR-l | 32.97 | 31.3 | 32.0 |
| YOLOv8s | 11.17 | 6.2 | 162.6 |
| YOLO12s | 9.29 | 14.3 | 70.0 |
| PPFS-YOLO | 10.08 | 17.2 | 58.3 |
| Method | Best mAP@50 | Best mAP@50:95 | Final mAP@50 | vs. YOLO12s |
|---|---|---|---|---|
| YOLO12s (baseline) | 95.6 | 93.8 | 92.6 | — |
| PPFS-YOLO | 93.6 | 91.9 | 92.3 | |

| Method | mAP@50 | GFLOPs | mAP/GFLOPs | Params (M) |
|---|---|---|---|---|
| YOLOv10n | 46.55 | 4.4 | 10.58 | 2.27 |
| YOLO11n | 48.10 | 3.3 | 14.58 | 2.58 |
| RT-DETR-l | 52.49 | 54.2 | 0.97 | 32.00 |
| YOLOv8s | 52.56 | 14.4 | 3.65 | 11.13 |
| YOLO12s | 52.51 | 10.8 | 4.86 | 9.23 |
| PPFS-YOLO | 64.86 | 12.5 | 5.19 | 10.02 |
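
The mAP/GFLOPs column of the efficiency table is a simple ratio, and it can be reproduced as a check. The sketch below uses the table's own mAP@50 and GFLOPs values; the variable names are hypothetical, and the split into "small-scale and larger" models is an illustrative grouping rather than one defined in the paper.

```python
# (mAP@50, GFLOPs) pairs copied from the efficiency table.
models = {
    "YOLOv10n": (46.55, 4.4),
    "YOLO11n": (48.10, 3.3),
    "RT-DETR-l": (52.49, 54.2),
    "YOLOv8s": (52.56, 14.4),
    "YOLO12s": (52.51, 10.8),
    "PPFS-YOLO": (64.86, 12.5),
}

# Accuracy per unit of compute, rounded to two decimals as in the table.
efficiency = {name: round(m_ap / gflops, 2) for name, (m_ap, gflops) in models.items()}

# Illustrative grouping: exclude the two nano-scale models from the comparison.
larger = {k: v for k, v in efficiency.items() if k not in ("YOLOv10n", "YOLO11n")}
best_larger = max(larger, key=larger.get)

print(efficiency)
print("best among small-scale and larger models:", best_larger)
```

The nano models keep the highest raw ratios because of their very low compute cost, while PPFS-YOLO has the best ratio among the remaining models.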
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Liu, J.; Gao, F. PPFS-YOLO: Physics-Prior Frequency-Spatial Fusion for Robust Container Surface Damage Detection. Sensors 2026, 26, 3224. https://doi.org/10.3390/s26103224

