A Deep Feature Fusion Underwater Image Enhancement Model Based on Perceptual Vision Swin Transformer
Abstract
1. Introduction
- We propose a perception-guided U-shaped Swin Transformer framework to model hierarchical representations for underwater image enhancement. The proposed framework explicitly addresses the spatially non-uniform and spectrally dependent attenuation inherent in underwater imagery.
- We introduce a Global-Aware Attention Map (GAMP) to emphasize attenuated color channels and spatial regions. GAMP jointly models multi-scale spatial degradation and channel-wise attenuation to guide degradation-aware feature modulation.
- We develop a Dual-Window Multi-Head Self-Attention (DWMSA) that integrates small-window and overlapping-window attention, unifying global context modeling with fine-grained texture preservation.
- We design a Feature-Augmentation Residual Network (FARN) to stabilize deep optimization and enhance the recovery of high-frequency details and chromatic fidelity across diverse underwater conditions.
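
The dual-window idea behind DWMSA can be illustrated with a minimal NumPy sketch. It only shows how one feature map might be split into small non-overlapping windows and into overlapping windows before attention is computed within each window. The function name `partition_windows`, the window size, the stride, and the 8×8 toy feature map are assumptions made for illustration; the list above does not specify these values or how the two branches are fused.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def partition_windows(x: np.ndarray, window: int, stride: int) -> np.ndarray:
    """Split an (H, W, C) feature map into flattened square windows.

    Returns an array of shape (num_windows, window * window, C).
    stride == window gives non-overlapping windows; stride < window overlaps.
    """
    # All window positions: (H - window + 1, W - window + 1, C, window, window)
    view = sliding_window_view(x, (window, window), axis=(0, 1))
    # Keep every `stride`-th position along both spatial axes.
    view = view[::stride, ::stride]
    # Move channels last, then flatten each window into a token sequence.
    tokens = view.transpose(0, 1, 3, 4, 2)
    return tokens.reshape(-1, window * window, x.shape[-1])


# Hypothetical sizes, chosen for illustration only.
feature_map = np.arange(8 * 8 * 3, dtype=np.float32).reshape(8, 8, 3)
small = partition_windows(feature_map, window=4, stride=4)        # non-overlapping
overlapping = partition_windows(feature_map, window=4, stride=2)  # 50% overlap
print(small.shape, overlapping.shape)
```

With these toy sizes the 8×8 map yields 4 non-overlapping windows and 9 overlapping windows, each holding 16 tokens. Self-attention restricted to each token group would then give the two branches that the module combines.
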
2. Related Works
2.1. Physics-Based Models
2.2. Non-Physics-Based Models
2.3. Deep Learning-Based Models
3. Methodology
3.1. Model Architecture
3.1.1. Encoding Module
3.1.2. Bottleneck Module
3.1.3. Decoder Module
3.2. GAMP
3.2.1. Multi-Scale Degradation Feature Extraction
3.2.2. Channel Attention Generation
3.2.3. Spatial Attention Generation
3.3. DWMSA
3.4. FARN
3.4.1. Channel Attention
3.4.2. Pixel Attention
4. Experiments and Results Analysis
4.1. Experimental Preparation
4.1.1. Dataset Selection
4.1.2. Selection of Metrics
4.2. Simulated Training
4.3. Experimental Results Analysis
4.3.1. Subjective Analysis and Comparison
4.3.2. Objective Evaluation
4.4. Ablation Experiment
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References

| Parameter | Value |
|---|---|
| GPU | NVIDIA GeForce RTX 4060 (8 GB VRAM) |
| Framework | PyTorch 2.1.0 |
| RAM | 32 GB |
| CUDA & cuDNN | CUDA 11.8 |
| Optimizer | Adam |
| Learning rate | 0.0001 |
| Batch size | 8 |

| Method | UWCNN | UGAN | FUnIE_GAN | WaterNet | Deep_SESR | URSCT_SESR | Ours |
|---|---|---|---|---|---|---|---|
| Mean | 4.562 | 5.352 | 6.853 | 7.027 | 7.572 | 7.548 | 8.539 |

| Dataset | Metric | UWCNN | UGAN | FUnIE_GAN | WaterNet | Deep_SESR | URSCT_SESR | Ours |
|---|---|---|---|---|---|---|---|---|
| UFO-120 | PSNR | 25.43 | 24.34 | 26.73 | 23.86 | 26.82 | 27.55 | 29.57 |
| UFO-120 | SSIM | 0.723 | 0.744 | 0.685 | 0.733 | 0.694 | 0.862 | 0.945 |
| UFO-120 | LPIPS | 0.342 | 0.215 | 0.312 | 0.354 | 0.257 | 0.253 | 0.185 |
| UFO-120 | UIQM | 2.53 | 2.74 | 2.92 | 2.47 | 2.75 | 3.24 | 3.66 |
| UFO-120 | UCIQE | 0.498 | 0.534 | 0.56 | 0.552 | 0.56 | 0.543 | 0.598 |
| EUVP | PSNR | 25.24 | 24.87 | 25.68 | 26.34 | 26.66 | 27.87 | 29.35 |
| EUVP | SSIM | 0.628 | 0.727 | 0.636 | 0.787 | 0.749 | 0.828 | 0.943 |
| EUVP | LPIPS | 0.316 | 0.193 | 0.286 | 0.315 | 0.301 | 0.273 | 0.152 |
| EUVP | UIQM | 2.454 | 2.766 | 3.054 | 2.365 | 2.886 | 3.164 | 3.575 |
| EUVP | UCIQE | 0.513 | 0.517 | 0.502 | 0.505 | 0.479 | 0.516 | 0.585 |
| UIEB-90 | UIQM | 2.462 | 2.256 | 2.856 | 2.743 | 2.953 | 3.068 | 3.462 |
| UIEB-90 | UCIQE | 0.521 | 0.538 | 0.527 | 0.527 | 0.563 | 0.624 | 0.621 |
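
For readers unfamiliar with the full-reference scores in the table above, the following is a minimal sketch of the standard PSNR definition, 10·log10(MAX² / MSE). It assumes 8-bit images (peak value 255) stored as NumPy arrays; the function name `psnr` and the toy arrays are illustrative, and the evaluation code actually used for the reported numbers is not given here.

```python
import numpy as np


def psnr(reference: np.ndarray, enhanced: np.ndarray, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    diff = reference.astype(np.float64) - enhanced.astype(np.float64)
    mse = float(np.mean(diff ** 2))
    if mse == 0.0:
        # Identical images: PSNR is unbounded.
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / mse))


# Toy example (not real data): a uniform error of 5 grey levels per pixel.
reference = np.full((4, 4, 3), 100, dtype=np.uint8)
enhanced = np.full((4, 4, 3), 105, dtype=np.uint8)
score = psnr(reference, enhanced)
print(f"PSNR: {score:.2f} dB")
```

Higher PSNR means a smaller mean squared error relative to the reference image. The cast to `float64` avoids unsigned-integer wraparound when subtracting `uint8` arrays.
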

| Model | Baseline | DWMSA | GAMP | FARN | PSNR | SSIM | LPIPS | UIQM | UCIQE |
|---|---|---|---|---|---|---|---|---|---|
| Model A | √ | × | × | × | 22.56 | 0.756 | 0.342 | 2.964 | 0.503 |
| Model B | × | √ | √ | × | 23.24 | 0.753 | 0.197 | 2.975 | 0.549 |
| Model C | × | × | √ | √ | 23.36 | 0.776 | 0.205 | 3.087 | 0.566 |
| Model D | × | √ | × | √ | 24.46 | 0.824 | 0.210 | 3.124 | 0.541 |
| Model E | × | √ | √ | √ | 26.88 | 0.896 | 0.181 | 3.464 | 0.589 |

| Model | FLOPs (G) | Parameters (M) | Inference time (ms) | FPS |
|---|---|---|---|---|
| Model A | 16.52 | 16.67 | 9.6 | 104.2 |
| Model B | 20.26 | 18.48 | 13.8 | 72.5 |
| Model C | 21.33 | 19.79 | 14.6 | 68.5 |
| Model D | 18.76 | 18.64 | 12.1 | 82.6 |
| Model E | 19.37 | 19.04 | 11.8 | 84.7 |
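
The FPS column in the table above appears to follow directly from the inference time. The short check below assumes that each image is processed on its own (batch size 1) and that FPS = 1000 / latency in milliseconds. The helper name `fps_from_latency_ms` is illustrative; the values are copied from the table.

```python
# Per-image inference time (ms) and FPS as reported in the table above.
reported = {
    "Model A": (9.6, 104.2),
    "Model B": (13.8, 72.5),
    "Model C": (14.6, 68.5),
    "Model D": (12.1, 82.6),
    "Model E": (11.8, 84.7),
}


def fps_from_latency_ms(latency_ms: float) -> float:
    """Frames per second, assuming one image per forward pass."""
    return 1000.0 / latency_ms


derived = {name: fps_from_latency_ms(ms) for name, (ms, _) in reported.items()}
for name, (ms, fps) in reported.items():
    print(f"{name}: {ms} ms -> derived {derived[name]:.1f} FPS, reported {fps} FPS")
```

Under this assumption the derived values agree with the reported FPS to within rounding, so the two columns carry the same information.
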

| Model | | | | PSNR | SSIM | LPIPS | UIQM | UCIQE |
|---|---|---|---|---|---|---|---|---|
| Model F | √ | √ | × | 22.32 | 0.765 | 0.192 | 2.989 | 0.525 |
| Model G | × | √ | √ | 22.52 | 0.736 | 0.210 | 2.983 | 0.537 |
| Model H | √ | × | √ | 23.21 | 0.785 | 0.195 | 3.087 | 0.542 |
| Model I | √ | √ | √ | 25.79 | 0.856 | 0.186 | 3.358 | 0.594 |

© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Tian, S.; Sirikham, A.; Konpang, J.; Wang, C. A Deep Feature Fusion Underwater Image Enhancement Model Based on Perceptual Vision Swin Transformer. J. Imaging 2026, 12, 44. https://doi.org/10.3390/jimaging12010044

