A Static-to-Temporal Framework for Interpretable Camera Lens Soiling Severity Estimation in Autonomous Driving
Abstract
1. Introduction
2. Related Work
2.1. Soiling Detection and Quantification
2.2. Synthetic Soiling Generation and Data Augmentation
2.3. Temporal Smoothing and Modeling
3. Materials and Methods
3.1. Data Resources Used in the Framework
3.1.1. WoodScape: Primary Strongly Annotated Dataset
3.1.2. External Test and OccNuScenes-Dirt: Independent External Evaluation Sets
3.2. Static Foundation: Structured Dual-Head Model
3.2.1. Architecture of the SDSM
3.2.2. Structured Severity Score Formulation
3.2.3. Dual-Head Learning and Consistency Constraint
3.2.4. Static Variants Used in the Ablation Study
3.3. TS-SD for SD-Seq Temporal Supervision
3.3.1. Motivation for TS-SD
3.3.2. Two-Stage Controllable Soiling Synthesis
3.3.3. Construction of SD-Seq for Temporal Learning
3.4. Adaptive EMA Module for Temporal Refinement
3.4.1. Formulation of the Adaptive EMA Module
3.4.2. Frozen-Ruler Training Strategy and Objectives
3.4.3. Mechanism-Oriented Probing Design for Dynamic
3.5. Evaluation Protocols and Metrics
3.5.1. Strongly Supervised Static Evaluation on WoodScape Test
3.5.2. Cluster-Level Weakly Supervised Static Evaluation on External Test
3.5.3. Supplementary Triplet-Monotonicity Evaluation on OccNuScenes-Dirt
3.5.4. Evaluation Metrics
4. Dataset Severity Protocol Validity
4.1. Validity of the External Test Weak Severity Protocol
4.2. Validity of the OccNuScenes-Dirt Controlled Severity Protocol
5. Experiments and Results
5.1. Main Static Results of the SDSM
5.2. Ablation on the Structured Severity Score
5.3. Supplementary Controlled-Protocol Validation on OccNuScenes-Dirt
5.4. Assessment of Single-Frame TS-SD Augmentation
5.5. Main Temporal Stabilization Results on External Test
5.6. Mechanism Verification of the Learned Dynamic
6. Discussion
6.1. Trade-Off Analysis
6.2. Methodological Value and Robustness of the Structured Severity Representation
6.3. Engineering Value
6.4. Scope of Robustness and Future Work
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| SDSM | Structured Dual-Head Static Model |
| TS-SD | Two-Stage Stable Diffusion |
| SD-Seq | Stable Diffusion Sequence |

Appendix A. Qualitative TS-SD Single-Frame Synthesis Examples

| Severity Level | No. of Images | Target Detections | Actual Detections | Detection Rate (%) | Fuzzy Detections | Fuzzy Rate (%) |
|---|---|---|---|---|---|---|
| Level-1 | 2020 | 3060 | 2715 | 88.7 | 660 | 24.3 |
| Level-2 | 1492 | 2348 | 1928 | 82.1 | 632 | 32.8 |
| Level-3 | 64 | 204 | 116 | 56.9 | 76 | 65.5 |
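
The two rate columns in the table above are consistent with a detection rate equal to detected targets divided by the target count, and a fuzzy rate equal to fuzzy detections divided by the *detected* targets (not by all targets). These formulas are inferred from the counts rather than stated in this excerpt, so the minimal Python sketch below, which recomputes the percentages from the count columns, is a consistency check under that assumption.

```python
from typing import Tuple

# (target count, detected count, fuzzy-detection count) copied from the table above
COUNTS = {
    "Level-1": (3060, 2715, 660),
    "Level-2": (2348, 1928, 632),
    "Level-3": (204, 116, 76),
}


def rates(targets: int, detected: int, fuzzy: int) -> Tuple[float, float]:
    """Return (detection rate %, fuzzy rate %), rounded to one decimal place.

    Assumed definitions: the detection rate is normalized by all targets, while
    the fuzzy rate is normalized by the targets that were actually detected.
    """
    if targets <= 0 or detected <= 0:
        raise ValueError("counts must be positive")
    return round(100.0 * detected / targets, 1), round(100.0 * fuzzy / detected, 1)


for level, counts in COUNTS.items():
    det, fuz = rates(*counts)
    print(f"{level}: detection {det}%, fuzzy {fuz}%")
```

Read this way, the Level-3 row shows two effects compounding: only 56.9% of targets are detected, and 65.5% of those detections are fuzzy.
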

| Method | Sensors | Dirt-0.1 | Dirt-0.2 | Dirt-0.3 | Relative Decrease (%) |
|---|---|---|---|---|---|
| SimpleBEV | Camera only | 38.61 | 23.42 | 5.85 | 84.85 |
| SimpleBEV | Cam + RADAR | 49.84 | 23.42 | 5.85 | 77.91 |
| SimpleBEV | Cam + LiDAR | 49.84 | 32.86 | 11.01 | 71.03 |
| SimpleBEV | Cam + LiDAR + RADAR | 60.43 | 42.16 | 16.69 | 62.57 |

| Model | Global MAE ↓ | Gap MAE ↓ | ↑ | ↑ |
|---|---|---|---|---|
| | 0.0217 | 0.0145 | 0.9980 | 0.9929 |

| Model | ρ ↑ | 95% CI | SE ↓ | CI Width ↓ |
|---|---|---|---|---|
| | 0.7876 | [0.7448, 0.8205] | 0.0196 | 0.0757 |

| Variant | Opacity | Spatial | Dominance | Transp. Discount | Consistency | Reweighted | ρ ↑ | 95% CI | SE ↓ | CI Width ↓ |
|---|---|---|---|---|---|---|---|---|---|---|
| | | | | | | | 0.3174 | [0.2345, 0.3974] | 0.0417 | 0.1629 |
| ✔ | ✔ | 0.7020 | [0.6550, 0.7445] | 0.0237 | 0.0895 | |||||
| ✔ | ✔ | ✔ | 0.7190 | [0.6714, 0.7592] | 0.0231 | 0.0879 | ||||
| ✔ | ✔ | ✔ | ✔ | ✔ | 0.7236 | [0.6754, 0.7644] | 0.0227 | 0.0890 | ||
| ✔ | ✔ | ✔ | ✔ | 0.6774 | [0.6229, 0.7246] | 0.0261 | 0.1017 | |||
| ✔ | ✔ | ✔ | ✔ | ✔ | 0.7461 | [0.7017, 0.7835] | 0.0206 | 0.0818 | ||
| | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | 0.7876 | [0.7448, 0.8205] | 0.0196 | 0.0757 |
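
In the ablation table above, each CI Width is the upper bound minus the lower bound of the 95% CI, up to independent rounding of the reported values (for example, 0.8205 − 0.7448 = 0.0757). When such tables are regenerated, a short Python check like the following can flag transcription errors. The bracketed interval format it parses is taken from how the intervals are printed here, and the 2e-4 tolerance is an assumption chosen to absorb four-decimal rounding.

```python
import re

# (reported CI string, reported CI width) pairs copied from the ablation table
ROWS = [
    ("[0.2345, 0.3974]", 0.1629),
    ("[0.6550, 0.7445]", 0.0895),
    ("[0.6714, 0.7592]", 0.0879),
    ("[0.6754, 0.7644]", 0.0890),
    ("[0.6229, 0.7246]", 0.1017),
    ("[0.7017, 0.7835]", 0.0818),
    ("[0.7448, 0.8205]", 0.0757),
]

# Matches "[lo, hi]" with optional spaces and optional leading minus signs
_CI = re.compile(r"\[\s*(-?\d*\.?\d+)\s*,\s*(-?\d*\.?\d+)\s*\]")


def ci_width(ci: str) -> float:
    """Parse a bracketed interval and return hi - lo."""
    m = _CI.fullmatch(ci.strip())
    if m is None:
        raise ValueError(f"unrecognized interval: {ci!r}")
    lo, hi = float(m.group(1)), float(m.group(2))
    if lo > hi:
        raise ValueError(f"lower bound exceeds upper bound: {ci!r}")
    return hi - lo


def check_widths(rows, tol=2e-4):
    """Return the rows whose reported width differs from hi - lo by more than tol."""
    return [(ci, w) for ci, w in rows if abs(ci_width(ci) - w) > tol]


print(check_widths(ROWS))  # an empty list means every reported width is consistent
```
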

| Model | ↑ | Strict-Triplet Acc ↑ | Pairwise-Triplet Acc ↑ |
|---|---|---|---|
| | 0.5234 | 0.8086 | 0.8952 |
| | 0.7564 | 0.9427 | 0.9713 |
| | 0.7673 | 0.9600 | 0.9800 |
| | 0.7892 | 0.9884 | 0.9940 |
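
The OccNuScenes-Dirt protocol compares predicted severities across the Dirt-0.1, Dirt-0.2, and Dirt-0.3 versions of the same frame. This excerpt does not define the two triplet accuracies. A natural reading, assumed in the Python sketch below, is that Strict-Triplet Acc counts a triplet as correct only if all three predictions increase strictly with dirt level, while Pairwise-Triplet Acc pools the three ordered pairs of every triplet. Under these definitions pairwise accuracy can never fall below strict accuracy, which holds in every row of the table above.

```python
from typing import Sequence, Tuple

# Predicted severity for the same frame at Dirt-0.1, Dirt-0.2, Dirt-0.3
Triplet = Tuple[float, float, float]


def triplet_accuracies(triplets: Sequence[Triplet]) -> Tuple[float, float]:
    """Return (strict-triplet accuracy, pairwise-triplet accuracy) under ASSUMED definitions.

    Strict: a triplet counts only if s1 < s2 < s3.
    Pairwise: fraction of the pairs (s1, s2), (s1, s3), (s2, s3) that are ordered
    correctly, pooled over all triplets. Ties count as errors in both metrics.
    """
    if not triplets:
        raise ValueError("need at least one triplet")
    strict = 0
    correct_pairs = 0
    for s1, s2, s3 in triplets:
        pairs = (s1 < s2, s1 < s3, s2 < s3)
        correct_pairs += sum(pairs)
        strict += all(pairs)  # all three pairs correct  <=>  s1 < s2 < s3
    n = len(triplets)
    return strict / n, correct_pairs / (3 * n)


preds = [
    (0.20, 0.35, 0.50),  # fully ordered: 3/3 pairs
    (0.30, 0.25, 0.60),  # s1 > s2: 2/3 pairs
    (0.40, 0.40, 0.45),  # tie s1 == s2: 2/3 pairs
    (0.55, 0.50, 0.45),  # fully reversed: 0/3 pairs
]
strict, pairwise = triplet_accuracies(preds)
print(strict, pairwise)  # strict = 1/4, pairwise = 7/12
```

The pairwise metric gives partial credit to a triplet with a single inversion, which is why it sits above the strict metric for every model.
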

| Settings | WoodScape MAE ↓ | ρ ↑ | 95% CI |
|---|---|---|---|
| Real-only | 0.0192 | 0.7876 | [0.7448, 0.8205] |
| Real + SD (989) | 0.0188 | 0.7672 | [0.7239, 0.8023] |
| Real + SD (1260) | 0.0187 | 0.7106 | [0.6590, 0.7525] |
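
The ρ values above are reported with 95% CIs that are asymmetric around the point estimate (for Real-only, 0.7876 lies 0.0428 above the lower bound and 0.0329 below the upper bound), which is typical of resampling-based intervals. This excerpt does not state how the intervals or SEs were obtained. The Python sketch below assumes a percentile bootstrap over image-level (prediction, label) pairs, computes Spearman ρ as the Pearson correlation of average ranks, and takes the SE as the standard deviation of the bootstrap replicates; the function names and the synthetic data are illustrative only.

```python
import random
import statistics
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    if va == 0 or vb == 0:
        return float("nan")  # undefined for a constant input
    return cov / (va * vb) ** 0.5


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    return pearson(average_ranks(x), average_ranks(y))


def bootstrap_ci(pred, target, n_boot=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI and bootstrap SE for Spearman rho (assumed protocol)."""
    rng = random.Random(seed)
    n = len(pred)
    reps = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample pairs jointly
        r = spearman_rho([pred[i] for i in idx], [target[i] for i in idx])
        if r == r:  # skip NaN from degenerate (constant) resamples
            reps.append(r)
    reps.sort()
    lo = reps[int(alpha / 2 * (len(reps) - 1))]
    hi = reps[int((1 - alpha / 2) * (len(reps) - 1))]
    return spearman_rho(pred, target), (lo, hi), statistics.stdev(reps)


# Illustrative synthetic data, not WoodScape
gen = random.Random(42)
target = [gen.random() for _ in range(200)]
pred = [t + gen.gauss(0.0, 0.15) for t in target]
rho, (lo, hi), se = bootstrap_ci(pred, target)
print(f"rho={rho:.4f}  95% CI=[{lo:.4f}, {hi:.4f}]  SE={se:.4f}  width={hi - lo:.4f}")
```

Resampling the pairs jointly keeps each prediction attached to its label, and the fixed seed makes the interval reproducible across runs.
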

| Method | Ranking Preservation: ρ ↑ | Temporal Stability: MAD ↓ | Temporal Stability: MeanAbsDiff ↓ | Temporal Stability: Variance ↓ | Temporal Stability: Range ↓ |
|---|---|---|---|---|---|
| Static Anchor | 0.7876 | 0.0330 | 0.0870 | 0.0040 | 0.1700 |
| + Adaptive EMA | 0.7829 | 0.0160 | 0.0400 | 0.0009 | 0.0770 |
| Δ | −0.0047 | −0.0170 | −0.0470 | −0.0031 | −0.0930 |
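
The stability metrics in the table above are not defined in this excerpt. The Python sketch below assumes common definitions for one severity trace s_1, ..., s_T: MeanAbsDiff is the mean absolute frame-to-frame change, Variance the population variance, Range max − min, and MAD the mean absolute deviation from the trace mean (MAD could equally denote the median absolute deviation, so check the paper's definition before comparing numbers). The last row of the table is the adaptive-EMA value minus the static-anchor value, which the `deltas` helper reproduces from the reported numbers.

```python
import statistics
from typing import Dict, Sequence


def stability_metrics(s: Sequence[float]) -> Dict[str, float]:
    """Stability metrics for one severity trace (ASSUMED definitions)."""
    if len(s) < 2:
        raise ValueError("need at least two frames")
    mean = statistics.fmean(s)
    return {
        # Mean absolute deviation from the trace mean (assumption; could be median-based)
        "MAD": statistics.fmean(abs(v - mean) for v in s),
        # Mean absolute frame-to-frame change: penalizes flicker
        "MeanAbsDiff": statistics.fmean(abs(b - a) for a, b in zip(s, s[1:])),
        "Variance": statistics.pvariance(s),
        "Range": max(s) - min(s),
    }


def deltas(temporal: Dict[str, float], static: Dict[str, float]) -> Dict[str, float]:
    """Per-metric change, temporal minus static, rounded to four decimals as in the table."""
    return {k: round(temporal[k] - static[k], 4) for k in static}


# Values copied from the table above
static_anchor = {"rho": 0.7876, "MAD": 0.0330, "MeanAbsDiff": 0.0870, "Variance": 0.0040, "Range": 0.1700}
adaptive_row = {"rho": 0.7829, "MAD": 0.0160, "MeanAbsDiff": 0.0400, "Variance": 0.0009, "Range": 0.0770}
print(deltas(adaptive_row, static_anchor))
print(stability_metrics([0.2, 0.3, 0.2, 0.3]))
```

The toy trace [0.2, 0.3, 0.2, 0.3] shows why MeanAbsDiff is the most flicker-sensitive of the four: its value (0.1) is twice the MAD (0.05), because every frame flips direction.
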

| Method | Δρ vs. Static | ΔMAD | ΔMeanAbsDiff | ΔVariance | ΔRange |
|---|---|---|---|---|---|
| Fixed EMA, | −0.0051 | −0.0185 | −0.0496 | −0.0029 | −0.0930 |
| Fixed EMA, | −0.0045 | −0.0171 | −0.0473 | −0.0029 | −0.0890 |
| Fixed EMA, | −0.0044 | −0.0159 | −0.0449 | −0.0028 | −0.0843 |
| Adaptive EMA | −0.0046 | −0.0172 | −0.0466 | −0.0029 | −0.0892 |
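
The comparison above contrasts fixed-coefficient EMA smoothing with the adaptive EMA module. The fixed recursion is standard: y_t = α·s_t + (1 − α)·y_{t−1}. How the module predicts its per-frame coefficient is not given in this excerpt, so the adaptive variant in the Python sketch below uses a hypothetical gate, α_t = σ(w·|s_t − y_{t−1}| + b), which raises the update weight when the new static score departs strongly from the running estimate; `w` and `b` are illustrative values, not the paper's learned parameters. The `output_gap` helper computes the mean and peak absolute gap between two smoothed traces, the kind of quantity reported as Peak |Adaptive − Fixed| in the probing tables.

```python
import math
from typing import List, Sequence, Tuple


def fixed_ema(scores: Sequence[float], alpha: float) -> List[float]:
    """Fixed EMA: y_t = alpha * s_t + (1 - alpha) * y_{t-1}, with y_0 = s_0."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    out: List[float] = []
    for t, s in enumerate(scores):
        out.append(s if t == 0 else alpha * s + (1.0 - alpha) * out[-1])
    return out


def adaptive_ema(scores: Sequence[float], w: float = 8.0, b: float = -2.0) -> List[float]:
    """Adaptive EMA with a HYPOTHETICAL gate alpha_t = sigmoid(w * |s_t - y_{t-1}| + b).

    Small frame-to-frame jitter gives alpha_t near sigmoid(b) (strong smoothing);
    a large deviation pushes alpha_t toward 1 (fast tracking).
    """
    out: List[float] = []
    for t, s in enumerate(scores):
        if t == 0:
            out.append(s)
            continue
        alpha_t = 1.0 / (1.0 + math.exp(-(w * abs(s - out[-1]) + b)))
        out.append(alpha_t * s + (1.0 - alpha_t) * out[-1])
    return out


def mean_abs_diff(seq: Sequence[float]) -> float:
    """Mean absolute frame-to-frame change (a flicker measure)."""
    return sum(abs(y - x) for x, y in zip(seq, seq[1:])) / (len(seq) - 1)


def output_gap(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Mean and peak |a_t - b_t| between two smoothed traces."""
    diffs = [abs(x - y) for x, y in zip(a, b)]
    return sum(diffs) / len(diffs), max(diffs)


# Illustrative trace: jitter around 0.3, then a step to about 0.8 (new soiling)
noisy = [0.30, 0.34, 0.27, 0.33, 0.29, 0.31, 0.80, 0.78, 0.82, 0.79]
fixed = fixed_ema(noisy, alpha=0.3)
adaptive = adaptive_ema(noisy)
print(f"flicker raw={mean_abs_diff(noisy[:6]):.4f} fixed={mean_abs_diff(fixed[:6]):.4f}")
print(f"first post-step frame: fixed={fixed[6]:.3f} adaptive={adaptive[6]:.3f}")
print("mean/peak |adaptive - fixed|:", output_gap(adaptive, fixed))
```

On this toy trace both smoothers suppress the jitter, but the gated variant follows the step to 0.8 within one frame while the fixed α = 0.3 filter lags; that trade-off between flicker suppression and responsiveness is what a per-frame coefficient is meant to manage.
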

| Group | Mean Static MAD | Mean Temporal MAD | MeanAbsDiff Reduction ↑ | Mean Relative Reduction (%) ↑ |
|---|---|---|---|---|
| Top-20 Clusters | 0.0738 | 0.0276 | 0.0462 | 66.4 |
| Bottom-20 Clusters | 0.0254 | 0.0253 | 0.00008 | 1.8 |
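
In the cluster-level table above, the absolute reduction matches Mean Static MAD minus Mean Temporal MAD up to rounding (0.0738 − 0.0276 = 0.0462), but the Mean Relative Reduction is not the ratio of those means (0.0462 / 0.0738 ≈ 62.6%, versus the reported 66.4%). This is expected if, as the column name suggests, the relative reduction is computed per cluster and then averaged: a mean of ratios differs from a ratio of means whenever clusters differ in baseline. The Python sketch below illustrates the gap; the per-cluster values are hypothetical and chosen only to make the effect visible.

```python
from statistics import fmean
from typing import Sequence, Tuple


def reductions(static: Sequence[float], temporal: Sequence[float]) -> Tuple[float, float, float]:
    """Return (mean absolute reduction, ratio of means in %, mean of per-cluster ratios in %)."""
    if len(static) != len(temporal) or not static:
        raise ValueError("need matching, non-empty per-cluster lists")
    if any(s <= 0 for s in static):
        raise ValueError("static MAD must be positive to form a relative reduction")
    abs_red = fmean([s - t for s, t in zip(static, temporal)])
    ratio_of_means = 100.0 * abs_red / fmean(static)
    mean_of_ratios = 100.0 * fmean([(s - t) / s for s, t in zip(static, temporal)])
    return abs_red, ratio_of_means, mean_of_ratios


# HYPOTHETICAL per-cluster MADs (not from the paper): one jittery cluster, two calmer ones
static_mad = [0.20, 0.02, 0.02]
temporal_mad = [0.10, 0.005, 0.005]
abs_red, ratio_of_means, mean_of_ratios = reductions(static_mad, temporal_mad)
print(f"abs={abs_red:.4f}  ratio of means={ratio_of_means:.1f}%  mean of ratios={mean_of_ratios:.1f}%")
```

Here the ratio of means is about 54.2% while the mean of per-cluster ratios is about 66.7%, because the calmer clusters, which carry little weight in the mean MAD, are reduced proportionally more.
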

| Setting | Mean (Insert) | Mean (Non-Insert) | Peak \|Adaptive − Fixed\| |
|---|---|---|---|
| Single-frame | 0.24844 | 0.24882 | 0.000300 |
| Two-frame | 0.24865 | 0.24922 | 0.000325 |

| Ablation Mode | Mean Abs Shift | Peak Abs Shift |
|---|---|---|
| No | 0.001094 | 0.002140 |
| No | 0.000225 | 0.000300 |
| No | 0.000048 | 0.000143 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Yang, F.; Duan, X.; Li, F.; Zhang, L. A Static-to-Temporal Framework for Interpretable Camera Lens Soiling Severity Estimation in Autonomous Driving. Sensors 2026, 26, 3533. https://doi.org/10.3390/s26113533
Yang F, Duan X, Li F, Zhang L. A Static-to-Temporal Framework for Interpretable Camera Lens Soiling Severity Estimation in Autonomous Driving. Sensors. 2026; 26(11):3533. https://doi.org/10.3390/s26113533
Chicago/Turabian Style: Yang, Fan, Xingyu Duan, Fan Li, and Luolin Zhang. 2026. "A Static-to-Temporal Framework for Interpretable Camera Lens Soiling Severity Estimation in Autonomous Driving" Sensors 26, no. 11: 3533. https://doi.org/10.3390/s26113533
APA Style: Yang, F., Duan, X., Li, F., & Zhang, L. (2026). A Static-to-Temporal Framework for Interpretable Camera Lens Soiling Severity Estimation in Autonomous Driving. Sensors, 26(11), 3533. https://doi.org/10.3390/s26113533
