UNet–Transformer Hybrid Architecture for Enhanced Underwater Image Processing and Restoration
Abstract
1. Introduction
- Unified Local–Global Feature Modeling. We propose a novel framework that integrates the local detail-capturing prowess of UNet-inspired structures with the global context modeling capabilities of a MaxViT-based module, resulting in a coherent and holistic approach to underwater image enhancement.
- Efficient Global Information Integration. By adopting MaxViT’s multi-axis attention—which combines blocked local and dilated global attention—we enable effective global information capture even at early network stages, ensuring that long-range dependencies are efficiently modeled with linear computational complexity.
- Robust Enhancement Performance. Extensive evaluations on benchmark datasets such as UIEB and EUVP demonstrate that our framework substantially improves color correction, contrast enhancement, and fine detail preservation, outperforming conventional methods and providing a reliable foundation for subsequent high-level vision tasks.
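The blocked-local plus dilated-global decomposition behind MaxViT's multi-axis attention (Max-SA) can be illustrated by the two token partitions it attends over. The sketch below is a minimal NumPy illustration of those partitions only (no attention weights); the function names, the toy window/grid size, and the tensor layout are our assumptions, not the paper's implementation.

```python
import numpy as np

def block_partition(x, p):
    # Blocked local attention: split an (H, W, C) map into non-overlapping
    # p x p windows -> (num_windows, p*p, C); attention runs inside each window.
    H, W, C = x.shape
    x = x.reshape(H // p, p, W // p, p, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, p * p, C)

def grid_partition(x, g):
    # Dilated global attention: sample a fixed g x g grid with stride H//g,
    # so each group of g*g tokens spans the whole image -> (num_groups, g*g, C).
    H, W, C = x.shape
    x = x.reshape(g, H // g, g, W // g, C)
    return x.transpose(1, 3, 0, 2, 4).reshape(-1, g * g, C)

# Toy 4x4 feature map: block groups are contiguous patches,
# grid groups are strided samples covering the full extent.
x = np.arange(16).reshape(4, 4, 1)
print(block_partition(x, 2)[0, :, 0])  # contiguous 2x2 window
print(grid_partition(x, 2)[0, :, 0])   # strided 2x2 grid
```

Because each token attends only within its p*p window or g*g grid group, the cost grows linearly with the number of pixels rather than quadratically, which is what permits global mixing at early, high-resolution stages.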
2. Related Work
2.1. CNN-Based and GAN-Based Methods
2.2. Transformer-Based and Meta/Self-Supervised Methods
3. Background
3.1. Transformer
3.2. Loss Function and Evaluation Metrics
3.3. UNet
4. Proposed Method for Underwater Image Enhancement
- Wavelength-dependent light absorption and scattering, which vary smoothly across regions and require large receptive fields to correct;
- Color casts and contrast distortion, which are often scene-dependent and influenced by global illumination context;
- Texture and edge degradation, which are local and require high-resolution feature preservation.
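These degradations are commonly modeled with the revised underwater image formation model of Akkaynak and Treibitz [9]; a standard simplified form (our notation, given here only to fix ideas before the problem formulation) is:

```latex
I_c(x) = J_c(x)\, e^{-\beta_c^{D} z(x)} \;+\; B_c^{\infty}\!\left(1 - e^{-\beta_c^{B} z(x)}\right), \qquad c \in \{R, G, B\},
```

where $I_c$ is the observed image, $J_c$ the unattenuated scene radiance, $z$ the scene range, $B_c^{\infty}$ the veiling (background) light, and $\beta_c^{D}, \beta_c^{B}$ the wavelength-dependent attenuation and backscatter coefficients. The direct term explains texture and edge degradation at range, while the backscatter term produces the global color cast and contrast loss listed above.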
4.1. Problem Formulation
4.2. MaxViT Blocks: Coupling Convolution and Attention
4.3. Mobile Inverted Bottleneck Convolution (MBConv)
4.4. Multi-Axis Self-Attention (Max-SA)
4.5. Hierarchical Encoder–Decoder
4.6. Positional Encoding and Scalability
4.7. SKFusion Module
5. Experimental Results
5.1. Analysis of Experimental Results
5.1.1. Quantitative Comparison on UIEB and EUVP
5.1.2. UFO-120 Dataset
5.2. Ablation Study
5.3. Computational Efficiency
5.4. Visualization of Results
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Definition
---|---
UIE | Underwater Image Enhancement
UIEB | Underwater Image Enhancement Benchmark
EUVP | Enhanced Underwater Vision Project
PSNR | Peak Signal-to-Noise Ratio
SSIM | Structural Similarity Index Measure
PCQI | Perception-based Color Quality Index
UCIQE | Underwater Color Image Quality Evaluation
UIQM | Underwater Image Quality Measure
MBConv | Mobile Inverted Bottleneck Convolution
Max-SA | Multi-Axis Self-Attention
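Of the metrics listed above, PSNR is the simplest full-reference fidelity measure used throughout the result tables. A minimal NumPy implementation (the function name and 8-bit peak value are our assumptions; libraries such as scikit-image provide equivalent routines):

```python
import numpy as np

def psnr(ref, test, max_val=255.0):
    # Peak Signal-to-Noise Ratio: 10 * log10(MAX^2 / MSE), higher is better.
    ref = ref.astype(np.float64)
    test = test.astype(np.float64)
    mse = np.mean((ref - test) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10((max_val ** 2) / mse)

a = np.zeros((8, 8), dtype=np.uint8)
b = np.full((8, 8), 10, dtype=np.uint8)
print(round(psnr(a, b), 2))  # -> 28.13
```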
References
- Yang, M.; Hu, J.; Li, C.; Rohde, G.; Du, Y.; Hu, K. An In-Depth Survey of Underwater Image Enhancement and Restoration. IEEE Access 2019, 7, 123638–123657. [Google Scholar] [CrossRef]
- Sahu, P.; Gupta, N.; Sharma, N. A Survey on Underwater Image Enhancement Techniques. Int. J. Comput. Appl. 2014, 87, 19–23. [Google Scholar] [CrossRef]
- Xu, T.; Xu, S.; Chen, X.; Chen, F.; Li, H. Multi-core token mixer: A novel approach for underwater image enhancement. Mach. Vis. Appl. 2025, 36, 1–16. [Google Scholar] [CrossRef]
- Liu, R.; Fan, X.; Zhu, M.; Hou, M.; Luo, Z. Real-World Underwater Enhancement: Challenges, Benchmarks, and Solutions Under Natural Light. IEEE Trans. Circuits Syst. Video Technol. 2020, 30, 4861–4875. [Google Scholar] [CrossRef]
- Wang, H.; Zhang, W.; Ren, P. Self-organized underwater image enhancement. ISPRS J. Photogramm. Remote Sens. 2024, 215, 1–14. [Google Scholar] [CrossRef]
- Chandrasekar, A.; Sreenivas, M.; Biswas, S. PhISH-Net: Physics inspired system for high resolution underwater image enhancement. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–8 January 2024. [Google Scholar]
- Desai, C.; Benur, S.; Patil, U.; Mudenagudi, U. Rsuigm: Realistic synthetic underwater image generation with image formation model. ACM Trans. Multimed. Comput. Commun. Appl. 2024, 21, 1–22. [Google Scholar] [CrossRef]
- Takao, S. Underwater image sharpening and color correction via dataset based on revised underwater image formation model. Vis. Comput. 2025, 41, 975–990. [Google Scholar] [CrossRef]
- Akkaynak, D.; Treibitz, T. A revised underwater image formation model. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 6723–6732. [Google Scholar]
- Zhang, M.; Peng, J. Underwater image restoration based on a new underwater image formation model. IEEE Access 2018, 6, 58634–58644. [Google Scholar] [CrossRef]
- An, S.; Xu, L.; Deng, Z.; Zhang, H. HFM: A hybrid fusion method for underwater image enhancement. Eng. Appl. Artif. Intell. 2024, 127, 107219. [Google Scholar] [CrossRef]
- Du, D.; Li, E.; Si, L.; Zhai, W.; Xu, F.; Niu, J.; Sun, F. UIEDP: Boosting underwater image enhancement with diffusion prior. Expert Syst. Appl. 2025, 259, 125271. [Google Scholar] [CrossRef]
- Zhang, W.; Liu, Q.; Feng, Y.; Cai, L.; Zhuang, P. Underwater image enhancement via principal component fusion of foreground and background. IEEE Trans. Circuits Syst. Video Technol. 2024, 34, 10930–10943. [Google Scholar] [CrossRef]
- Tu, Z.; Talebi, H.; Zhang, H.; Yang, F.; Milanfar, P.; Bovik, A.; Li, Y. Maxvit: Multi-axis vision transformer. In Computer Vision–ECCV 2022, Proceedings of the 17th European Conference, Tel Aviv, Israel, 23–27 October 2022; Proceedings, Part XXIV; Springer: Berlin/Heidelberg, Germany, 2022; pp. 459–479. [Google Scholar]
- Li, C.; Guo, C.; Ren, W.; Cong, R.; Hou, J.; Kwong, S.; Tao, D. An underwater image enhancement benchmark dataset and beyond. IEEE Trans. Image Process. 2019, 29, 4376–4389. [Google Scholar] [CrossRef]
- Islam, M.J.; Luo, P.; Sattar, J. Simultaneous Enhancement and Super-Resolution of Underwater Imagery for Improved Visual Perception. In Proceedings of the Robotics: Science and Systems (RSS), Corvalis, OR, USA, 12–16 July 2020. [Google Scholar] [CrossRef]
- Tang, Y.; Yi, J.; Tan, F. Facial micro-expression recognition method based on CNN and transformer mixed model. Int. J. Biom. 2024, 16, 463–477. [Google Scholar] [CrossRef]
- Tan, F.; Zhai, M.; Zhai, C. Foreign object detection in urban rail transit based on deep differentiation segmentation neural network. Heliyon 2024, 10, e37072. [Google Scholar] [CrossRef] [PubMed]
- Li, C.; Anwar, S.; Hou, J.; Cong, R.; Guo, C.; Ren, W. Underwater image enhancement via medium transmission-guided multi-color space embedding. IEEE Trans. Image Process. 2021, 30, 4985–5000. [Google Scholar] [CrossRef] [PubMed]
- Fabbri, C.; Islam, M.J.; Sattar, J. Enhancing underwater imagery using generative adversarial networks. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, 21–25 May 2018; pp. 7159–7165. [Google Scholar]
- Islam, M.J.; Xia, Y.; Sattar, J. Fast underwater image enhancement for improved visual perception. IEEE Robot. Autom. Lett. 2020, 5, 3227–3234. [Google Scholar] [CrossRef]
- Ren, T.; Xu, H.; Jiang, G.; Yu, M.; Zhang, X.; Wang, B.; Luo, T. Reinforced swin-convs transformer for simultaneous underwater sensing scene image enhancement and super-resolution. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–16. [Google Scholar] [CrossRef]
- Peng, L.; Zhu, C.; Bian, L. U-shape transformer for underwater image enhancement. IEEE Trans. Image Process. 2023, 32, 3066–3079. [Google Scholar] [CrossRef]
- Li, K.; Wu, L.; Qi, Q.; Liu, W.; Gao, X.; Zhou, L.; Song, D. Beyond single reference for training: Underwater image enhancement via comparative learning. IEEE Trans. Circuits Syst. Video Technol. 2022, 33, 2561–2576. [Google Scholar] [CrossRef]
- Zhang, Z.; Yan, H.; Tang, K.; Duan, Y. MetaUE: Model-based Meta-learning for Underwater Image Enhancement. arXiv 2023, arXiv:2303.06543. [Google Scholar]
- Huang, S.; Wang, K.; Liu, H.; Chen, J.; Li, Y. Contrastive semi-supervised learning for underwater image restoration via reliable bank. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 18145–18155. [Google Scholar]
- Wang, Z.; Tao, H.; Zhou, H.; Deng, Y.; Zhou, P. A content-style control network with style contrastive learning for underwater image enhancement. Multimed. Syst. 2025, 31, 60. [Google Scholar] [CrossRef]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 6000–6010. [Google Scholar]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16 × 16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 10012–10022. [Google Scholar]
- Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015, Proceedings of the 18th International Conference, Munich, Germany, 5–9 October 2015; Proceedings, Part III 18; Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241. [Google Scholar]
- Li, X.; Wang, W.; Hu, X.; Yang, J. Selective kernel networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 510–519. [Google Scholar]
- Chu, X.; Tian, Z.; Zhang, B.; Wang, X.; Wei, X.; Xia, H.; Shen, C. Conditional positional encodings for vision transformers. arXiv 2021, arXiv:2102.10882. [Google Scholar]
- Zhang, W.; Zhuang, P.; Sun, H.H.; Li, G.; Kwong, S.; Li, C. Underwater image enhancement via minimal color loss and locally adaptive contrast enhancement. IEEE Trans. Image Process. 2022, 31, 3997–4010. [Google Scholar] [CrossRef] [PubMed]
- Li, C.; Anwar, S.; Porikli, F. Underwater scene prior inspired deep underwater image and video enhancement. Pattern Recognit. 2020, 98, 107038. [Google Scholar] [CrossRef]
- Wang, N.; Zhou, Y.; Han, F.; Zhu, H.; Yao, J. UWGAN: Underwater GAN for real-world underwater color restoration and dehazing. arXiv 2019, arXiv:1912.10269. [Google Scholar]
- Zhuang, P.; Wu, J.; Porikli, F.; Li, C. Underwater image enhancement with hyper-laplacian reflectance priors. IEEE Trans. Image Process. 2022, 31, 5442–5455. [Google Scholar] [CrossRef]
- Peng, Y.T.; Cosman, P.C. Underwater image restoration based on image blurriness and light absorption. IEEE Trans. Image Process. 2017, 26, 1579–1594. [Google Scholar] [CrossRef]
- Jiang, Z.; Li, Z.; Yang, S.; Fan, X.; Liu, R. Target oriented perceptual adversarial fusion network for underwater image enhancement. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 6584–6598. [Google Scholar] [CrossRef]
- Drews, P.L.; Nascimento, E.R.; Botelho, S.S.; Campos, M.F.M. Underwater depth estimation and image restoration based on single images. IEEE Comput. Graph. Appl. 2016, 36, 24–35. [Google Scholar] [CrossRef]
- Hou, G.; Li, N.; Zhuang, P.; Li, K.; Sun, H.; Li, C. Non-uniform illumination underwater image restoration via illumination channel sparsity prior. IEEE Trans. Circuits Syst. Video Technol. 2023, 34, 799–814. [Google Scholar] [CrossRef]
Method | PSNR (↑) | PSNR-L (↑) | SSIM (↑) | PCQI (↑) | UCIQE (↑) | UIQM (↑) | UICM (↑) | UIConM (↑) | CCF (↑) |
---|---|---|---|---|---|---|---|---|---|
UIEB | |||||||||
FUnIE-GAN [21] | 17.3828 | 20.0596 | 0.7285 | 0.6393 | 0.5459 | 1.3993 | 5.7775 | 1.1605 | 21.0605 |
UW-GAN [36] | 16.2281 | 19.1524 | 0.7644 | 0.7187 | 0.5655 | 1.3467 | 5.5550 | 1.1219 | 22.8449 |
UWCNN [35] | 12.0247 | 13.7820 | 0.6469 | 0.3921 | 0.5058 | 1.0647 | 1.9216 | 0.9398 | 11.0019 |
HLRP [37] | 13.0317 | 13.7891 | 0.2874 | 0.2326 | 0.6356 | 1.6477 | 9.5089 | 1.1915 | 36.9312 |
MLLE [34] | 18.1079 | 19.4733 | 0.7985 | 0.9105 | 0.6044 | 1.6287 | 4.8271 | 1.0124 | 36.2740 |
IBLA [38] | 15.5610 | 17.7756 | 0.7390 | 0.6949 | 0.6020 | 1.4397 | 7.3312 | 1.0143 | 28.9752 |
TOPAL [39] | 20.5871 | 22.6706 | 0.8674 | 0.7148 | 0.5841 | 1.2209 | 5.0093 | 1.0321 | 21.2175 |
UDCP [40] | 11.9286 | 12.7288 | 0.6441 | 0.6117 | 0.5956 | 1.5725 | 7.1986 | 1.1748 | 27.1441 |
Water-Net [15] | 18.7003 | 19.7212 | 0.8623 | 0.6926 | 0.5711 | 1.2637 | 5.0644 | 1.0633 | 16.5005 |
ICSP [41] | 11.7942 | 13.1037 | 0.6340 | 0.7323 | 0.5636 | 1.4759 | 6.3012 | 1.0484 | 26.6930 |
PhISH-Net [6] | 21.1390 | 23.4312 | 0.8686 | 0.9294 | 0.6405 | 1.5968 | 8.8169 | 1.1513 | 37.2389 |
U-Shape Transformer [23] | 21.2700 | 22.0546 | 0.7400 | 0.9395 | 0.6817 | 1.3966 | 7.5764 | 1.0686 | 36.8498 |
URSCT-SESR [22] | 22.7200 | 23.0156 | 0.9100 | 0.9189 | 0.6658 | 1.4896 | 7.6944 | 1.0217 | 36.2467 |
Our Method | 22.9100 | 23.5286 | 0.9020 | 0.9341 | 0.6460 | 1.6020 | 9.6405 | 1.1865 | 37.9325 |
EUVP | |||||||||
FUnIE-GAN [21] | 20.5600 | 27.4704 | 0.8100 | 0.8927 | 0.5086 | 1.5549 | 4.0977 | 1.2502 | 29.2321 |
UW-GAN [36] | 15.7630 | 22.8411 | 0.9155 | 0.9653 | 0.5262 | 1.4741 | 3.3911 | 1.2164 | 29.3064 |
UWCNN [35] | 15.5175 | 18.4511 | 0.8439 | 0.6451 | 0.5427 | 1.4212 | 1.6375 | 1.2607 | 19.5666 |
HLRP [37] | 12.4673 | 13.3926 | 0.2213 | 0.1722 | 0.5748 | 1.5591 | 4.0038 | 1.2885 | 30.0306 |
MLLE [34] | 14.2530 | 16.1892 | 0.6125 | 1.0295 | 0.5879 | 1.7296 | 2.9907 | 0.7756 | 36.2180 |
IBLA [38] | 16.9223 | 23.0862 | 0.8643 | 0.9891 | 0.5895 | 1.5616 | 4.6178 | 1.1100 | 39.5418 |
TOPAL [39] | 18.3044 | 24.4843 | 0.9335 | 0.9942 | 0.5826 | 1.4905 | 3.2523 | 1.1990 | 34.9025 |
UDCP [40] | 14.4190 | 18.2478 | 0.8140 | 0.8572 | 0.5908 | 1.6489 | 4.7064 | 1.1962 | 35.1508 |
Water-Net [15] | 18.2595 | 24.3506 | 0.7200 | 0.8831 | 0.5793 | 1.4975 | 3.1465 | 1.2309 | 25.6186 |
ICSP [41] | 12.1254 | 14.5056 | 0.6710 | 0.9797 | 0.5750 | 1.5896 | 4.0923 | 0.9728 | 41.2878 |
PhISH-Net [6] | 20.9197 | 27.4717 | 0.8559 | 1.0378 | 0.5918 | 1.5925 | 4.3570 | 1.1512 | 38.8619 |
U-Shape Transformer [23] | 25.5900 | 26.8789 | 0.7800 | 1.2862 | 0.5227 | 1.4163 | 5.1185 | 1.0341 | 38.7448 |
URSCT-SESR [22] | 27.0100 | 28.1574 | 0.8300 | 1.4148 | 0.5749 | 1.5080 | 5.6304 | 1.1375 | 42.6193 |
Our Method | 26.1200 | 29.5431 | 0.8600 | 1.1203 | 0.6014 | 1.6849 | 4.0854 | 1.2012 | 40.5065 |
Method | PSNR | SSIM |
---|---|---|
WaterNet [15] | 22.46 | 0.79 |
UWCNN [35] | 20.50 | 0.78 |
Deep-SESR [16] | 27.15 | 0.84 |
UGAN [20] | 23.45 | 0.80 |
FUnIE-GAN [21] | 25.15 | 0.82 |
URSCT-SESR [22] | 25.96 | 0.80 |
U-Shape Transformer [23] | 22.86 | 0.67 |
CLUIE [24] | 18.86 | 0.74 |
Our method | 27.13 | 0.83 |
Methods | UIEB PSNR | UIEB SSIM | EUVP PSNR | EUVP SSIM | UFO-120 PSNR | UFO-120 SSIM |
---|---|---|---|---|---|---|
Our method (CNN) | 22.11 | 0.90 | 25.84 | 0.84 | 26.96 | 0.82 |
Our method (self-attention) | 22.91 | 0.90 | 26.12 | 0.86 | 27.13 | 0.83 |
Method | PSNR | SSIM | FLOPs (G) | Time (s) |
---|---|---|---|---|
URSCT-SESR [22] | 22.72 | 0.91 | 11.15 | 0.023 |
U-Shape Transformer [23] | 21.25 | 0.84 | 26.10 | 0.029 |
WaterNet [15] | 17.73 | 0.82 | 72.18 | 0.164 |
Our method | 22.91 | 0.90 | 10.82 | 0.015 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Ji, J.; Man, J. UNet–Transformer Hybrid Architecture for Enhanced Underwater Image Processing and Restoration. Mathematics 2025, 13, 2535. https://doi.org/10.3390/math13152535