Enhancing Object Detection for Autonomous Vehicles in Low-Resolution Environments Using a Super-Resolution Transformer-Based Preprocessing Framework
Abstract
1. Introduction
- We formulate the problem of object detection in low-resolution imagery and demonstrate how it can be mitigated by introducing Super-Resolution as a preprocessing step.
- We empirically show that SR preprocessing improves object detection performance without requiring modifications to the detection model, particularly within autonomous vehicle vision systems.
- We propose an efficient Transformer-based SR architecture, the Dense Residual Connected Transformer (DRCT), which enhances image resolution while preserving contextual and structural detail for robust object detection.
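The decoupled pipeline implied by these contributions — super-resolve first, then run an unmodified detector — can be sketched as below. Both `upscale` (standing in for DRCT, here a nearest-neighbor placeholder) and `detect` (standing in for YOLOv11n) are illustrative stubs, not the authors' implementation.

```python
import numpy as np

def upscale(lr_image: np.ndarray, scale: int) -> np.ndarray:
    """Placeholder for the DRCT super-resolution stage.

    Nearest-neighbor repetition is used here purely to illustrate the
    interface: an (H, W, C) LR frame in, an (H*scale, W*scale, C) frame out.
    """
    return lr_image.repeat(scale, axis=0).repeat(scale, axis=1)

def detect(image: np.ndarray) -> list:
    """Placeholder for an unmodified off-the-shelf detector (e.g., YOLOv11n).

    The point of the design is that this stage needs no changes: it simply
    receives the SR output instead of the raw LR frame.
    """
    return []  # a real detector would return (class, confidence, box) tuples

def sr_detection_pipeline(lr_image: np.ndarray, scale: int) -> list:
    """SR as a preprocessing step: enhance first, then detect."""
    return detect(upscale(lr_image, scale))

# A 160x120 LR frame upscaled x4 reaches 640x480 before detection.
print(upscale(np.zeros((120, 160, 3), dtype=np.uint8), 4).shape)  # (480, 640, 3)
```

Because the detector only sees the SR output, any detector or SR model can be swapped in without retraining the other stage — the property the second contribution relies on.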
2. Literature Review
2.1. Super-Resolution in Computer Vision
2.2. Transformer-Based Super-Resolution
2.3. Object Detection
2.4. Contribution of This Work
3. Problem Formulation
4. Methodology
4.1. Super-Resolution-Based Approach
4.2. Super-Resolution as a Preprocessing Stage for Object Detection
4.3. Object Detection Model
5. Experiments and Results
5.1. Experimental Dataset
5.2. Experimental Environment
5.3. Quantitative Results
5.4. Qualitative Results
5.5. Model Complexity Comparison
5.6. Ablation Study
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| SR | Super-Resolution |
| CNN | Convolutional Neural Network |
| GAN | Generative Adversarial Network |
| HAT | Hybrid Attention Transformer |
| DRCT | Dense Residual Connected Transformer |
| MLP | Multi-layer Perceptron |
| YOLO | You Only Look Once |
| SSD | Single-Shot Multibox Detector |
| PSNR | Peak Signal-to-Noise Ratio |
| SSIM | Structural Similarity |
| mAP | Mean Average Precision |
| LR | Low Resolution |
| HR | High Resolution |
| RDG | Residual Dense Groups |
| STL | Swin Transformer Layer |
| C2PSA | Cross-Stage Partial Self-Attention |
| CSP | Cross-Stage Partial |
| SPPF | Spatial Pyramid Pooling—Fast |
| SiLU | Sigmoid Linear Unit |
| ReLU | Rectified Linear Unit |
| IoU | Intersection over Union |
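Several abbreviations above denote standard evaluation quantities; IoU, for instance, is the box-overlap criterion underlying mAP. A minimal reference implementation (illustrative only; the `(x1, y1, x2, y2)` box format is an assumption, not taken from the paper):

```python
def iou(box_a: tuple, box_b: tuple) -> float:
    """Intersection over Union for axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)  # zero if boxes are disjoint
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# Two 10x10 boxes overlapping by half their width: inter = 50, union = 150.
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # 1/3
```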
References
- Liu, L.; Ouyang, W.; Wang, X.; Fieguth, P.; Chen, J.; Liu, X.; Pietikäinen, M. Deep Learning for Generic Object Detection: A Survey. Int. J. Comput. Vis. 2020, 128, 261–318. [Google Scholar] [CrossRef]
- Zou, Z.; Chen, K.; Shi, Z.; Guo, Y.; Ye, J. Object Detection in 20 Years: A Survey. Proc. IEEE 2023, 111, 257–276. [Google Scholar] [CrossRef]
- Jiao, L.; Zhang, F.; Liu, F.; Yang, S.; Li, L.; Feng, Z.; Qu, R. A survey of deep learning-based object detection. IEEE Access 2019, 7, 128837–128868. [Google Scholar] [CrossRef]
- Diwan, T.; Anirudh, G.; Tembhurne, J.V. Object detection using YOLO: Challenges, architectural successors, datasets and applications. Multimed Tools Appl. 2023, 82, 9243–9275. [Google Scholar] [CrossRef]
- Li, G.; Xie, H.; Yan, W.; Chang, Y.; Qu, X. Detection of Road Objects with Small Appearance in Images for Autonomous Driving in Various Traffic Situations Using a Deep Learning Based Approach. IEEE Access 2020, 8, 211164–211172. [Google Scholar] [CrossRef]
- Bagloee, S.A.; Tavana, M.; Asadi, M.; Oliver, T. Autonomous vehicles: Challenges, opportunities, and future implications for transportation policies. J. Mod. Transp. 2016, 24, 284–303. [Google Scholar] [CrossRef]
- Wan, L.; Sun, Y.; Sun, L.; Ning, Z.; Rodrigues, J.J.P.C. Deep Learning Based Autonomous Vehicle Super Resolution DOA Estimation for Safety Driving. IEEE Trans. Intell. Transp. Syst. 2021, 22, 4301–4315. [Google Scholar] [CrossRef]
- Shan, T.; Wang, J.; Chen, F.; Szenher, P.; Englot, B. Simulation-based lidar super-resolution for ground vehicles. Rob. Auton. Syst. 2020, 134, 103647. [Google Scholar] [CrossRef]
- Liang, D.; Geng, Q.; Wei, Z.; Vorontsov, D.A.; Kim, E.L.; Wei, M.; Zhou, H. Anchor Retouching via Model Interaction for Robust Object Detection in Aerial Images. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5619213. [Google Scholar] [CrossRef]
- Deng, C.; Jing, D.; Han, Y.; Chanussot, J. Toward Hierarchical Adaptive Alignment for Aerial Object Detection in Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5615515. [Google Scholar] [CrossRef]
- Liang, D.; Zhang, J.W.; Tang, Y.P.; Huang, S.J. MUS-CDB: Mixed Uncertainty Sampling With Class Distribution Balancing for Active Annotation in Aerial Object Detection. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5613013. [Google Scholar] [CrossRef]
- Ingle, P.Y.; Kim, Y.G. Real-Time Abnormal Object Detection for Video Surveillance in Smart Cities. Sensors 2022, 22, 3862. [Google Scholar] [CrossRef]
- Alsubaei, F.S.; Al-Wesabi, F.N.; Hilal, A.M. Deep Learning-Based Small Object Detection and Classification Model for Garbage Waste Management in Smart Cities and IoT Environment. Appl. Sci. 2022, 12, 2281. [Google Scholar] [CrossRef]
- Abdul-Khalil, S.; Abdul-Rahman, S.; Mutalib, S.; Kamarudin, S.I.; Kamaruddin, S.S. A review on object detection for autonomous mobile robot. IAES Int. J. Artif. Intell. 2023, 3, 1033–1043. [Google Scholar] [CrossRef]
- Xu, Z.; Zhan, X.; Xiu, Y.; Suzuki, C.; Shimada, K. Onboard Dynamic-Object Detection and Tracking for Autonomous Robot Navigation With RGB-D Camera. IEEE Robot. Autom. Lett. 2024, 9, 651–658. [Google Scholar] [CrossRef]
- Kim, H.; Kim, H.; Lee, S.; Lee, H. Autonomous Exploration in a Cluttered Environment for a Mobile Robot With 2D-Map Segmentation and Object Detection. IEEE Robot. Autom. Lett. 2022, 7, 6343–6350. [Google Scholar] [CrossRef]
- Rostianingsih, S.; Setiawan, A.; Halim, C.I. COCO (Creating Common Object in Context) Dataset for Chemistry Apparatus. Procedia Comput. Sci. 2020, 171, 2445–2452. [Google Scholar] [CrossRef]
- Tong, K.; Wu, Y. Rethinking PASCAL-VOC and MS-COCO dataset for small object detection. J. Vis. Commun. Image Represent. 2023, 93, 103830. [Google Scholar] [CrossRef]
- Na, B.; Fox, G.C. Object classifications by image super-resolution preprocessing for convolutional neural networks. Adv. Sci. Technol. Eng. Syst. 2020, 5, 476–483. [Google Scholar] [CrossRef]
- Shahriar, T.; Li, H. A Study of Image Pre-processing for Faster Object Recognition. arXiv 2020, arXiv:2011.06928. [Google Scholar] [CrossRef]
- Krishna, H.; Jawahar, C.V. Improving small object detection. In Proceedings of the 4th Asian Conference on Pattern Recognition, ACPR 2017, Nanjing, China, 26–29 November 2017; Institute of Electrical and Electronics Engineers Inc.: New York, NY, USA, 2017; pp. 346–351. [Google Scholar] [CrossRef]
- Pang, Y.; Cao, J.; Wang, J.; Han, J. JCS-Net: Joint Classification and Super-Resolution Network for Small-Scale Pedestrian Detection in Surveillance Images. IEEE Trans. Inf. Forensics Secur. 2019, 14, 3322–3331. [Google Scholar] [CrossRef]
- Yang, Z.; Chai, X.; Wang, R.; Guo, W.; Wang, W.; Pu, L.; Chen, X. Prior Knowledge Guided Small Object Detection on High-Resolution Images. In Proceedings of the International Conference on Image Processing, ICIP, Taipei, Taiwan, 22–25 September 2019. [Google Scholar] [CrossRef]
- Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-End Object Detection with Transformers. In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: Cham, Switzerland, 2020. [Google Scholar] [CrossRef]
- Zhu, X.; Su, W.; Lu, L.; Li, B.; Wang, X.; Dai, J. Deformable Detr: Deformable Transformers for End-To-End Object Detection. In Proceedings of the ICLR 2021—9th International Conference on Learning Representations, Vienna, Austria, 3–7 May 2021. [Google Scholar]
- Zhang, H.; Mao, F.; Xue, M.; Fang, G.; Feng, Z.; Song, J.; Song, M. Knowledge Amalgamation for Object Detection With Transformers. IEEE Trans. Image Process. 2023, 32, 2093–2106. [Google Scholar] [CrossRef]
- Dai, Z.; Cai, B.; Lin, Y.; Chen, J. UP-DETR: Unsupervised Pre-training for Object Detection with Transformers. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Virtual, 19–25 June 2021. [Google Scholar] [CrossRef]
- Dai, L.; Liu, H.; Tang, H.; Wu, Z.; Song, P. AO2-DETR: Arbitrary-Oriented Object Detection Transformer. IEEE Trans. Circuits Syst. Video Technol. 2023, 33, 2342–2356. [Google Scholar] [CrossRef]
- Grigorescu, S.; Trasnea, B.; Cocias, T.; Macesanu, G. A survey of deep learning techniques for autonomous driving. J. Field Robot. 2020, 37, 362–386. [Google Scholar] [CrossRef]
- Badue, C.; Guidolini, R.; Carneiro, R.V.; Azevedo, P.; Cardoso, V.B.; Forechi, A.; Jesus, L.; Berriel, R.; Paixão, T.M.; Mutz, F.; et al. Self-driving cars: A survey. Expert Syst. Appl. 2021, 165, 113816. [Google Scholar] [CrossRef]
- Geiger, A.; Lenz, P.; Urtasun, R. Are we ready for autonomous driving? The KITTI vision benchmark suite. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012. [Google Scholar] [CrossRef]
- Dong, C.; Loy, C.C.; He, K.; Tang, X. Image Super-Resolution Using Deep Convolutional Networks. IEEE Trans. Pattern. Anal. Mach. Intell. 2016, 38, 295–307. [Google Scholar] [CrossRef]
- Zhang, Y.; Tian, Y.; Kong, Y.; Zhong, B.; Fu, Y. Residual Dense Network for Image Super-Resolution. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018. [Google Scholar] [CrossRef]
- Tai, Y.; Yang, J.; Liu, X. Image super-resolution via deep recursive residual network. In Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, 21–26 July 2017. [Google Scholar] [CrossRef]
- Wang, X.; Yu, K.; Wu, S.; Gu, J.; Liu, Y.; Dong, C.; Qiao, Y.; Loy, C.C. ESRGAN: Enhanced super-resolution generative adversarial networks. In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: Cham, Switzerland, 2019. [Google Scholar] [CrossRef]
- Intodia, S.; Gupta, S.; Yeramalli, Y.; Bhat, A. Literature Review: Super Resolution for Autonomous Vehicles using Generative Adversarial Networks. In Proceedings of the 7th International Conference on Intelligent Computing and Control Systems, ICICCS 2023, Madurai, India, 17–19 May 2023. [Google Scholar] [CrossRef]
- Ledig, C.; Theis, L.; Huszár, F.; Caballero, J.; Cunningham, A.; Acosta, A.; Aitken, A.P.; Tejani, A.; Totz, J.; Wang, Z.; et al. Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, 21–26 July 2017; pp. 105–114. [Google Scholar] [CrossRef]
- Chen, X.; Wang, X.; Zhou, J.; Qiao, Y.; Dong, C. Activating More Pixels in Image Super-Resolution Transformer. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–19 June 2023. [Google Scholar] [CrossRef]
- Hsu, C.-C.; Lee, C.-M.; Chou, Y.-S. DRCT: Saving Image Super-resolution away from Information Bottleneck. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 17–21 June 2024; Available online: http://arxiv.org/abs/2404.00722 (accessed on 23 November 2024).
- Liang, J.; Cao, J.; Sun, G.; Zhang, K.; Van Gool, L.; Timofte, R. SwinIR: Image Restoration Using Swin Transformer. In Proceedings of the IEEE International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021. [Google Scholar] [CrossRef]
- Zhang, D.; Huang, F.; Liu, S.; Wang, X.; Jin, Z. SwinFIR: Revisiting the SwinIR with Fast Fourier Convolution and Improved Training for Image Super-Resolution. arXiv 2022, arXiv:2208.11247. [Google Scholar]
- Haris, M.; Shakhnarovich, G.; Ukita, N. Task-Driven Super Resolution: Object Detection in Low-Resolution Images. In Communications in Computer and Information Science; Springer: Cham, Switzerland, 2021. [Google Scholar] [CrossRef]
- Liu, K.; Fu, Z.; Jin, S.; Chen, Z.; Zhou, F.; Jiang, R.; Chen, Y.; Ye, J. ESOD: Efficient Small Object Detection on High-Resolution Images. IEEE Trans. Image Process. 2024, 14, 183–195. [Google Scholar] [CrossRef]
- Musunuri, Y.R.; Kwon, O.S.; Kung, S.Y. SRODNet: Object Detection Network Based on Super Resolution for Autonomous Vehicles. Remote Sens. 2022, 14, 6270. [Google Scholar] [CrossRef]
- Yang, Q.; Huang, C.; Cao, L.; Song, Q.; Jiang, X.; Liu, X.; Yuan, C. CLAHR: Cascaded Label Assignment Head for High-Resolution Small Object Detection. IEEE Access 2024, 12, 15447–15457. [Google Scholar] [CrossRef]
- Chen, Z.; Ji, H.; Zhang, Y.; Zhu, Z.; Li, Y. High-Resolution Feature Pyramid Network for Small Object Detection on Drone View. IEEE Trans. Circuits Syst. Video Technol. 2024, 34, 475–489. [Google Scholar] [CrossRef]
- Li, J.; Zhang, Z.; Tian, Y.; Xu, Y.; Wen, Y.; Wang, S. Target-Guided Feature Super-Resolution for Vehicle Detection in Remote Sensing Images. IEEE Geosci. Remote Sens. Lett. 2022, 19, 8020805. [Google Scholar] [CrossRef]
- Truong, N.Q.; Nguyen, P.H.; Nam, S.H.; Park, K.R. Deep Learning-Based Super-Resolution Reconstruction and Marker Detection for Drone Landing. IEEE Access 2019, 7, 61639–61655. [Google Scholar] [CrossRef]
- Ma, S.; Xu, M.; Feng, W. Dam Crack Instance Segmentation Algorithm Based on Improved YOLOv8. IEEE Access 2025, 13, 84271–84283. [Google Scholar] [CrossRef]
- Terven, J.; Córdova-Esparza, D.M.; Romero-González, J.A. A Comprehensive Review of YOLO Architectures in Computer Vision: From YOLOv1 to YOLOv8 and YOLO-NAS. Mach. Learn. Knowl. Extr. 2023, 5, 1680–1716. [Google Scholar] [CrossRef]
- Zheng, X.; Bi, J.; Li, K.; Zhang, G.; Jiang, P. SMN-YOLO: Lightweight YOLOv8-Based Model for Small Object Detection in Remote Sensing Images. IEEE Geosci. Remote Sens. Lett. 2025, 22, 8001305. [Google Scholar] [CrossRef]
- Terven, J.; Cordova-Esparza, D. A Comprehensive Review of YOLO: From YOLOv1 and Beyond. Mach. Learn. Knowl. Extr. 2023, 5, 1680–1716. [Google Scholar] [CrossRef]
- Li, B.; Huang, S.; Zhong, G. LTEA-YOLO: An Improved YOLOv5s Model for Small Object Detection. IEEE Access 2024, 12, 99768–99778. [Google Scholar] [CrossRef]
- Qiu, J.; Cai, F.; Fu, N.; Yao, Y. YOLO-Air: An Efficient Deep Learning Network for Small Object Detection in Drone-Based Imagery. IEEE Access 2025, 13, 79718–79735. [Google Scholar] [CrossRef]
- Yang, Y.; Wang, H.; Pang, P. SAIR-YOLO: An Improved YOLOv8 Network for Sea-Air Background IR Small Object Detection. IEEE Geosci. Remote Sens. Lett. 2025, 22, 7000505. [Google Scholar] [CrossRef]
- Lou, H.; Duan, X.; Guo, J.; Liu, H.; Gu, J.; Bi, L.; Chen, H. DC-YOLOv8: Small-Size Object Detection Algorithm Based on Camera Sensor. Electronics 2023, 12, 2323. [Google Scholar] [CrossRef]
- Deng, C.; Wang, M.; Liu, L.; Liu, Y.; Jiang, Y. Extended Feature Pyramid Network for Small Object Detection. IEEE Trans. Multimed. 2022, 24, 1968–1979. [Google Scholar] [CrossRef]
- Mirzaei, B.; Nezamabadi-pour, H.; Raoof, A.; Derakhshani, R. Small Object Detection and Tracking: A Comprehensive Review. Sensors 2023, 23, 6887. [Google Scholar] [CrossRef]
- Palwankar, T.; Kothari, K. Real Time Object Detection using SSD and MobileNet. Int. J. Res. Appl. Sci. Eng. Technol. 2022, 10, 831–834. [Google Scholar] [CrossRef]
- Li, W.; Liu, K.; Zhang, L.; Cheng, F. Object detection based on an adaptive attention mechanism. Sci. Rep. 2020, 10, 11307. [Google Scholar] [CrossRef]
- Zhang, L.; Wang, M.; Jiang, Y.; Li, D.; Zhou, Y. SSRDet: Small Object Detection Based on Feature Pyramid Network. IEEE Access 2023, 11, 96743–96752. [Google Scholar] [CrossRef]
- Song, Z.; Zhang, Y.; Liu, Y.; Yang, K.; Sun, M. MSFYOLO: Feature fusion-based detection for small objects. IEEE Lat. Am. Trans. 2022, 20, 823–830. [Google Scholar] [CrossRef]
- Cao, C.; Wang, B.; Zhang, W.; Zeng, X.; Yan, X.; Feng, Z.; Liu, Y.; Wu, Z. An Improved Faster R-CNN for Small Object Detection. IEEE Access 2019, 7, 106838–106846. [Google Scholar] [CrossRef]
- Keys, R.G. Cubic Convolution Interpolation for Digital Image Processing. IEEE Trans. Acoust. 1981, 29, 1153–1160. [Google Scholar] [CrossRef]
- Duchon, C.E. Lanczos Filtering in One and Two Dimensions. J. Appl. Meteorol. 1979, 18, 1016–1022. [Google Scholar] [CrossRef]
- Lim, B.; Son, S.; Kim, H.; Nah, S.; Lee, K.M. Enhanced Deep Residual Networks for Single Image Super-Resolution. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017. [Google Scholar] [CrossRef]
- Haykir, A.A.; Öksuz, I. Enhancing Object Detection in Aerial Images Using Transformer-Based Super-Resolution. In Proceedings of the UBMK 2024—9th International Conference on Computer Science and Engineering, Antalya, Turkiye, 26–28 October 2024; pp. 966–971. [Google Scholar] [CrossRef]
- Zhai, S.; Shang, D.; Wang, S.; Dong, S. DF-SSD: An Improved SSD Object Detection Algorithm Based on DenseNet and Feature Fusion. IEEE Access 2020, 8, 24344–24357. [Google Scholar] [CrossRef]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef]
- Everingham, M.; Van Gool, L.; Williams, C.K.I.; Winn, J.; Zisserman, A. The pascal visual object classes (VOC) challenge. Int. J. Comput. Vis. 2010, 88, 303–338. [Google Scholar] [CrossRef]
- Musunuri, Y.R.; Kim, C.; Kwon, O.S.; Kung, S.Y. Object Detection Using ESRGAN With a Sequential Transfer Learning on Remote Sensing Embedded Systems. IEEE Access 2024, 12, 102313–102327. [Google Scholar] [CrossRef]
- Zheng, Z.; Cheng, Y.; Xin, Z.; Yu, Z.; Zheng, B. Robust Perception Under Adverse Conditions for Autonomous Driving Based on Data Augmentation. IEEE Trans. Intell. Transp. Syst. 2023, 24, 13916–13929. [Google Scholar] [CrossRef]
- Mostofa, M.; Ferdous, S.N.; Riggan, B.S.; Nasrabadi, N.M. Joint-SRVDNet: Joint super resolution and vehicle detection network. IEEE Access 2020, 8, 13916–13929. [Google Scholar] [CrossRef]
- Li, A.; Pan, Y.; Xu, Z.; Bi, H.; Gao, B.; Li, K.; Yu, H.; Chen, Y. MaTVT: A Transformer-Based Approach for Multi-Agent Prediction in Complex Traffic Scenarios. IEEE Trans. Veh. Technol. 2025, 99, 1–13. [Google Scholar] [CrossRef]
- Li, M.; Gao, J.; Zhao, L.; Shen, X. Adaptive Computing Scheduling for Edge-Assisted Autonomous Driving. IEEE Trans. Veh. Technol. 2021, 70, 5318–5331. [Google Scholar] [CrossRef]
- Tang, S.; Chen, B.; Iwen, H.; Hirsch, J.; Fu, S.; Yang, Q.; Palacharla, P.; Wang, N.; Wang, X.; Shi, W. VECFrame: A Vehicular Edge Computing Framework for Connected Autonomous Vehicles. In Proceedings of the IEEE International Conference on Edge Computing, Chicago, IL, USA, 5–10 September 2021; Institute of Electrical and Electronics Engineers Inc.: New York, NY, USA, 2021; pp. 68–77. [Google Scholar] [CrossRef]
- Zhang, Y.; Li, K.; Li, K.; Wang, L.; Zhong, B.; Fu, Y. Image super-resolution using very deep residual channel attention networks. In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: Cham, Switzerland, 2018. [Google Scholar] [CrossRef]
- Bing, X.; Zhang, W.; Zheng, L.; Zhang, Y. Medical Image Super Resolution Using Improved Generative Adversarial Networks. IEEE Access 2019, 7, 145030–145038. [Google Scholar] [CrossRef]
- Gui, J.; Sun, Z.; Wen, Y.; Tao, D.; Ye, J. A Review on Generative Adversarial Networks: Algorithms, Theory, and Applications. IEEE Trans. Knowl. Data Eng. 2023, 35, 3313–3332. [Google Scholar] [CrossRef]
- Wang, H.; Sun, J.; Diao, W.; Li, J.; Zhang, K. TAGAN: Texture and Attention Guided Generative Adversarial Network for Image Super Resolution. In Proceedings of the IEEE International Symposium on Circuits and Systems, Austin, TX, USA, 27 May–1 June 2022; pp. 3269–3273. [Google Scholar] [CrossRef]
- Asry, C.E.L.; Benchaji, I.; Douzi, S.; Ouahidi, B.E.L. A robust intrusion detection system based on a shallow learning model and feature extraction techniques. PLoS ONE 2024, 19, e0295801. [Google Scholar] [CrossRef]
- Jiang, T.; Cheng, J. Target Recognition Based on CNN with LeakyReLU and PReLU Activation Functions. In Proceedings of the 2019 International Conference on Sensing, Diagnostics, Prognostics, and Control, SDPC, Beijing, China, 15–17 August 2019. [Google Scholar] [CrossRef]
- El Mellouki, O.; Khedher, M.I.; El-Yacoubi, M.A. Abstract Layer for LeakyReLU for Neural Network Verification Based on Abstract Interpretation. IEEE Access 2023, 11, 33401–33413. [Google Scholar] [CrossRef]
- Xu, G.; Wang, Y.; Cheng, J.; Tang, J.; Yang, X. Accurate and Efficient Stereo Matching via Attention Concatenation Volume. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 2461–2474. [Google Scholar] [CrossRef]
- Li, W.; Li, Y.; Chen, D.; Chan, J.C.W. Thin cloud removal with residual symmetrical concatenation network. J. Photogramm. Remote Sens. 2019, 153, 137–150. [Google Scholar] [CrossRef]
- Jatmika, S.; Patmanthara, S.; Wibawa, A.P.; Kurniawan, F. The Model of Local Wisdom for Smart Wellness Tourism with Optimization Multilayer Perceptron. J. Theor. Appl. Inf. Technol. 2024, 102, 640–652. [Google Scholar]
- Mao, M.; Hong, M. YOLO Object Detection for Real-Time Fabric Defect Inspection in the Textile Industry: A Review of YOLOv1 to YOLOv11. Sensors 2025, 25, 2270. [Google Scholar] [CrossRef] [PubMed]
- Tej, A.R.; Halder, S.S.; Shandeelya, A.P.; Pankajakshan, V. Enhancing Perceptual Loss with Adversarial Feature Matching for Super-Resolution. In Proceedings of the International Joint Conference on Neural Networks, Glasgow, UK, 19–24 July 2020. [Google Scholar] [CrossRef]
- Xie, L.; Wang, X.; Chen, X.; Li, G.; Shan, Y.; Zhou, J.; Dong, C. DeSRA: Detect and Delete the Artifacts of GAN-based Real-World Super-Resolution Models. Proc. Mach. Learn. Res. 2023, arXiv:2307.02457. [Google Scholar]

| Ref. | Core Method | Task Domain | Key Contribution | Limitation/Gap |
|---|---|---|---|---|
| [72] | Visual augmentation and fusion techniques based on unpaired image-to-image (I2I) translation for adverse weather conditions | Visual perception for autonomous vehicles under diverse adverse weather conditions (rain, fog, nighttime rain, and low illumination) | Integrates unpaired I2I synthesis for visual enhancement and augmentation, combined with a dual-branch architecture that processes both original and synthesized images. This approach strengthens visual perception and significantly improves object recognition accuracy across multiple adverse weather scenarios. | Although effective across various adverse conditions, the method does not specifically target extreme low-resolution scenarios and still relies on relatively adequate base image quality. |
| [44] | CNN-based SR + Object Detection using YOLO | Surveillance/vehicle detection | Integrates super-resolution and detection to improve accuracy on low-resolution images. | Has not been evaluated under complex real-road driving conditions. |
| [73] | Joint Super-Resolution and Vehicle Detection Network (Joint-SRVDNet), combining Multi-scale GAN (MsGAN) for super-resolution with a jointly trained vehicle detector | Super-resolution of aerial images and vehicle detection on low-resolution aerial imagery | Demonstrates that the method provides superior visual quality and improves vehicle detection accuracy by jointly optimizing SR loss and detection loss, enabling hierarchical and discriminative feature learning. | Has not been evaluated under complex real-world driving conditions. |
| [74] | Multi-agent Trajectory Vector Transformer (MaTVT), consisting of a dual-level encoder (low-level and high-level) and a multi-modal decoder | Multi-agent trajectory prediction in complex traffic scenarios to support autonomous vehicle motion planning | Models future trajectories more accurately through hierarchical encodings of motion features, agent interactions, and environmental constraints. Evaluation on the Argoverse dataset shows that MaTVT outperforms benchmark methods in accuracy, efficiency, and robustness. | Although highly effective for trajectory prediction, the method does not address image processing or restoration tasks and is therefore not applicable to visual perception or super-resolution problems. |
| [67] | Transformer-based super-resolution using the Hybrid Attention Transformer for Image Restoration (HAT-L), integrated with YOLOv8 OBB for object detection | Aerial image super-resolution and object detection enhancement | Demonstrates that transformer-based super-resolution can improve visual quality and strengthen object detection accuracy. Using HAT-L, the method achieves high PSNR and SSIM on the DOTA validation set, and yields improved mAP performance when combined with YOLOv8 OBB. | The approach is tailored to aerial imagery and the DOTA dataset, which limits its applicability to real-world ground-level autonomous driving scenarios. |
| Attribute | Description |
|---|---|
| Dataset Name | Vehicles Dataset |
| Annotation Format | COCO JSON, Pascal VOC XML, YOLO TXT |
| Number of Classes | 3 (Car, Bus, Truck) |
| Number of Images | 4058 |
| Image Resolution | 640 × 480 |
| Augmentation | No |
| Preprocessing | No |
| Vehicle Condition | Front and Back Views |
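Low-resolution inputs for the ×2/×3/×4 settings can be derived from the 640 × 480 source images by integer-factor downscaling. The block-averaging sketch below is illustrative only — the paper does not state its exact degradation kernel (bicubic is the usual choice in SR benchmarks) — but it reproduces the LR frame heights of 240/160/120 used in the experiments.

```python
import numpy as np

def downscale(image: np.ndarray, factor: int) -> np.ndarray:
    """Downscale a 2-D image by averaging factor-by-factor blocks.

    Stands in for the (unspecified) LR degradation; block averaging keeps
    the sketch dependency-free while producing the expected output sizes.
    """
    h = image.shape[0] - image.shape[0] % factor  # crop to a multiple of factor
    w = image.shape[1] - image.shape[1] % factor
    blocks = image[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

hr = np.zeros((480, 640))  # dataset resolution, height x width
for s in (2, 3, 4):
    print(s, downscale(hr, s).shape)  # heights 240 / 160 / 120
```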
| Environment | Configuration |
|---|---|
| CPU | Intel(R) Core™ i7-10700F (Intel Corporation, Santa Clara, CA, USA) |
| Memory | 32 GB |
| Graphics Card | NVIDIA GeForce RTX 3050 (NVIDIA Corporation, Santa Clara, CA, USA) |
| Operating System | Windows 10 |
| Programming Language | Python 3.13.7 |
| Deep Learning Framework | Ultralytics 8.2.103 |
| Integrated Development Environment | Visual Studio Code (version 1.85.1, Microsoft Corporation, Redmond, WA, USA) |
| CUDA | 11.8 |
| cuDNN | 9.1.0 |
| Model | Using SR | Scale | PSNR/SSIM | mAP@50 | mAP@50-95 | Precision | Recall | F1-Score |
|---|---|---|---|---|---|---|---|---|
| YOLOv8n [55] | No | 2 | - | 0.77965 | 0.61305 | 0.79032 | 0.75759 | 0.72573 |
| YOLOv8n [55] | No | 3 | - | 0.74522 | 0.51852 | 0.77152 | 0.72522 | 0.70235 |
| YOLOv8n [55] | No | 4 | - | 0.71496 | 0.43852 | 0.74985 | 0.69595 | 0.68193 |
| YOLOv11n [87] | No | 2 | - | 0.77765 | 0.59904 | 0.77619 | 0.75532 | 0.72859 |
| YOLOv11n [87] | No | 3 | - | 0.748785 | 0.52476 | 0.77404 | 0.71522 | 0.71165 |
| YOLOv11n [87] | No | 4 | - | 0.72396 | 0.44571 | 0.77323 | 0.68252 | 0.69609 |
| Bicubic Interpolation+YOLOv11n | Yes | 2 | 29.14/0.911 | 0.7841 | 0.60776 | 0.781318 | 0.76245 | 0.77113 |
| Bicubic Interpolation+YOLOv11n | Yes | 3 | 27.45/0.8955 | 0.7582 | 0.53313 | 0.773425 | 0.72131 | 0.74621 |
| Bicubic Interpolation+YOLOv11n | Yes | 4 | 25.43/0.8752 | 0.7313 | 0.45213 | 0.766121 | 0.6912 | 0.72734 |
| EDSR+YOLOv5 [44] | Yes | 2 | 38.04/0.9446 | 0.83709 | 0.54692 | 0.78624 | 0.77762 | 0.76594 |
| EDSR+YOLOv5 [44] | Yes | 3 | 34.29/0.9145 | 0.80521 | 0.51516 | 0.77915 | 0.74515 | 0.73155 |
| EDSR+YOLOv5 [44] | Yes | 4 | 32.31/0.8887 | 0.74086 | 0.48183 | 0.77179 | 0.70811 | 0.69806 |
| MsSRGAN+YOLOv3 [73] | Yes | 2 | 36.54/0.9336 | 0.83819 | 0.54802 | 0.78734 | 0.77872 | 0.76704 |
| MsSRGAN+YOLOv3 [73] | Yes | 3 | 32.79/0.9035 | 0.80631 | 0.51626 | 0.78015 | 0.74625 | 0.73305 |
| MsSRGAN+YOLOv3 [73] | Yes | 4 | 30.81/0.8717 | 0.74196 | 0.48293 | 0.77289 | 0.70921 | 0.69996 |
| DRCT+YOLOv11n (Ours) | Yes | 2 | 39.05/0.9647 | 0.88219 | 0.61094 | 0.82782 | 0.82656 | 0.81208 |
| DRCT+YOLOv11n (Ours) | Yes | 3 | 35.3/0.9346 | 0.84029 | 0.56675 | 0.83112 | 0.79012 | 0.77951 |
| DRCT+YOLOv11n (Ours) | Yes | 4 | 33.32/0.9089 | 0.80335 | 0.5433 | 0.83964 | 0.74302 | 0.75119 |
| Ground Truth+YOLOv11n | No | 2 | - | 0.85834 | 0.63945 | 0.84267 | 0.81511 | 0.82802 |
| Ground Truth+YOLOv11n | No | 3 | - | 0.84323 | 0.61747 | 0.82442 | 0.79685 | 0.80854 |
| Ground Truth+YOLOv11n | No | 4 | - | 0.83112 | 0.59674 | 0.80522 | 0.77578 | 0.79114 |
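The PSNR figures in the table above follow the standard definition PSNR = 10·log10(MAX²/MSE). A minimal sketch of the computation (illustrative only, not the authors' evaluation code; the 480 × 640 test arrays are made up for the example):

```python
import numpy as np

def psnr(reference: np.ndarray, test: np.ndarray, max_val: float = 255.0) -> float:
    """Peak Signal-to-Noise Ratio in dB between two same-shape images."""
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

ref = np.full((480, 640), 128.0)
noisy = ref + 4.0                  # constant error of 4 -> MSE = 16
print(round(psnr(ref, noisy), 2))  # 10*log10(255^2/16) ~ 36.09
```

Higher PSNR means lower reconstruction error; the ~39 dB of DRCT at ×2 versus ~29 dB for bicubic thus reflects roughly a tenfold lower MSE.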
| Name of the Model | Scale | Size (Pixel) | Params (M) | FLOPs (G) | Inference Time (ms) |
|---|---|---|---|---|---|
| Low Resolution Images + YOLOv8n [55] | 2 | 240 | 3.21 | 8.72 | 19.18 |
| Low Resolution Images + YOLOv8n [55] | 3 | 160 | 3.15 | 8.61 | 18.94 |
| Low Resolution Images + YOLOv8n [55] | 4 | 120 | 3.11 | 8.49 | 18.68 |
| Low Resolution Images + YOLOv11n [87] | 2 | 240 | 2.71 | 6.52 | 14.34 |
| Low Resolution Images + YOLOv11n [87] | 3 | 160 | 2.57 | 6.21 | 13.66 |
| Low Resolution Images + YOLOv11n [87] | 4 | 120 | 2.45 | 6.19 | 13.15 |
| Bicubic Interpolation + YOLOv11n | 2 | 240 | 2.98 | 15.81 | 34.76 |
| Bicubic Interpolation + YOLOv11n | 3 | 160 | 2.87 | 15.82 | 33.67 |
| Bicubic Interpolation + YOLOv11n | 4 | 120 | 2.45 | 15.03 | 28.98 |
| EDSR+YOLOv5 [44] | 2 | 480 | 11.62 | 15.8 | 34.76 |
| EDSR+YOLOv5 [44] | 3 | 480 | 12.55 | 15.8 | 34.76 |
| EDSR+YOLOv5 [44] | 4 | 480 | 12.91 | 15.9 | 34.98 |
| MsSRGAN+YOLOv3 [73] | 2 | 480 | 23.72 | 18.95 | 41.69 |
| MsSRGAN+YOLOv3 [73] | 3 | 480 | 23.91 | 19.08 | 41.98 |
| MsSRGAN+YOLOv3 [73] | 4 | 480 | 24.77 | 19.25 | 42.29 |
| DRCT+YOLOv11n (Ours) | 2 | 480 | 30.18 | 17.57 | 38.65 |
| DRCT+YOLOv11n (Ours) | 3 | 480 | 30.14 | 17.52 | 38.54 |
| DRCT+YOLOv11n (Ours) | 4 | 480 | 30.11 | 17.49 | 38.48 |
| Ground Truth Image + YOLOv11n | 2 | 480 | 2.98 | 15.81 | 34.76 |
| Ground Truth Image + YOLOv11n | 3 | 480 | 2.97 | 15.81 | 34.76 |
| Ground Truth Image + YOLOv11n | 4 | 480 | 2.98 | 15.82 | 34.76 |
| Name of the Model | Upsampling Strategy | DRCT Components | Technical Description | mAP@50 | Total Parameters (M) | Inference Time (ms) |
|---|---|---|---|---|---|---|
| Lower Bound | None (Low-Resolution Input) | - | Direct detection from the LR input without SR or DRCT enhancement. | 0.77765 | 2.71 | 23.8 |
| Baseline | Bicubic ×2 | - | Conventional interpolation; the LR image is upscaled before YOLO detection, without an SR model. | 0.7841 | 2.71 | 25.1 |
| Proposed | ×2 | Full Architecture | Full version of the proposed SR model combining Dense Residual blocks and Transformer modules. | 0.88219 | 30.18 | 35.7 |
| Ablation 1 | ×2 | Without Dense Connections | Removes the dense connectivity blocks to evaluate their contribution to overall performance. | 0.84029 | 25.44 | 34.5 |
| Ablation 2 | ×2 | Without Transformer Mechanism | Eliminates the Transformer component, retaining only the convolutional pathways. | 0.80335 | 29.75 | 37.01 |
| Ablation 3 | ×2 | Reduced Transformer Depth | Uses a lightweight Transformer (fewer MSA layers) to reduce computational complexity. | 0.8211 | 27.01 | 33.21 |
| Ablation 4 | ×2 | Without Skip Connections | Removes residual skip connections to examine stability and error propagation in SR reconstruction. | 0.79245 | 28.55 | 36.47 |
| Upper Bound | None (High-Resolution Input) | - | Theoretical upper limit: direct detection on the original high-resolution images without SR. | 0.85834 | 2.71 | 23.8 |
© 2025 by the authors. Published by MDPI on behalf of the World Electric Vehicle Association. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Haqiqi, M.M.E.; Arifin, A.S.; Satyawan, A.S. Enhancing Object Detection for Autonomous Vehicles in Low-Resolution Environments Using a Super-Resolution Transformer-Based Preprocessing Framework. World Electr. Veh. J. 2025, 16, 678. https://doi.org/10.3390/wevj16120678