MixRformer: Dual-Branch Network for Underwater Image Enhancement in Wavelet Domain
Abstract
1. Introduction
- This work proposes MixRformer, to our knowledge the first underwater image enhancement architecture to combine the wavelet transform with a hybrid CNN-Transformer design. The wavelet transform not only aids recovery of fine image detail but also halves the spatial resolution of the feature maps, reducing GPU memory consumption and the computational cost of the Transformer, so that MixRformer can better restore image color and texture.
- We propose a dual-branch feature capture block (DFCB) that pairs a lightweight convolutional branch (ConvBlock) with a novel Rectangle GLU-Window (RGLUWin) Transformer block; the former extracts local surface detail while the latter captures global features.
- We construct a multi-term loss function combining MSE loss and VGG perceptual loss to train the model to restore distorted underwater images. Compared with several state-of-the-art underwater image enhancement (UIE) methods, the proposed method performs strongly on both objective metrics and visual quality, with clear advantages in removing color casts and improving image clarity.
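As a standalone illustration of the resolution reduction that motivates the wavelet branch (a minimal sketch, not the authors' implementation), a single-level 2-D Haar DWT decomposes an H × W image into four H/2 × W/2 sub-bands, so attention computed on a sub-band touches only a quarter of the original pixels:

```python
import numpy as np

def haar_dwt2(x):
    """Single-level 2-D Haar DWT: returns (LL, LH, HL, HH) sub-bands,
    each half the height and width of the input image."""
    a = (x[0::2, :] + x[1::2, :]) / 2.0   # vertical average
    d = (x[0::2, :] - x[1::2, :]) / 2.0   # vertical difference
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Inverse transform: reconstructs the original image exactly."""
    a = np.zeros((ll.shape[0], ll.shape[1] * 2))
    a[:, 0::2] = ll + lh                  # recover averaged rows
    a[:, 1::2] = ll - lh
    d = np.zeros_like(a)
    d[:, 0::2] = hl + hh                  # recover difference rows
    d[:, 1::2] = hl - hh
    x = np.zeros((a.shape[0] * 2, a.shape[1]))
    x[0::2, :] = a + d
    x[1::2, :] = a - d
    return x

img = np.random.rand(256, 256)
ll, lh, hl, hh = haar_dwt2(img)
assert ll.shape == (128, 128)             # a quarter of the pixels per sub-band
assert np.allclose(haar_idwt2(ll, lh, hl, hh), img)  # lossless round trip
```

Because the transform is invertible, enhanced sub-bands can be recombined without information loss, which is what makes wavelet-domain processing attractive for detail restoration.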
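The gating referred to by the GLU in the RGLUWin Transformer block can be sketched as a gated feed-forward layer. This is a minimal numpy illustration under assumed shapes; the weight names and dimensions are hypothetical, and in the actual block this sits inside rectangle-window self-attention:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def glu_ffn(x, w_gate, w_value, w_out):
    """Gated Linear Unit feed-forward: the value projection is modulated
    element-wise by a learned sigmoid gate, then projected back."""
    gated = (x @ w_value) * sigmoid(x @ w_gate)  # element-wise gating
    return gated @ w_out

# Hypothetical token matrix and weights, used only for a shape check.
rng = np.random.default_rng(0)
n_tokens, d_model, d_hidden = 64, 32, 128
x = rng.standard_normal((n_tokens, d_model))
w_g = 0.1 * rng.standard_normal((d_model, d_hidden))
w_v = 0.1 * rng.standard_normal((d_model, d_hidden))
w_o = 0.1 * rng.standard_normal((d_hidden, d_model))
y = glu_ffn(x, w_g, w_v, w_o)
assert y.shape == (n_tokens, d_model)
```

Relative to a plain MLP feed-forward (the w/MLP row in the ablation tables), the learned gate lets the block suppress uninformative channels per token.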
2. Related Works
2.1. Underwater Image Restoration Task
2.2. CNN-Based UIE Methods
2.3. Transformer-Based UIE Methods
2.4. Wavelet-Based UIE Methods
2.5. Style and Keypoint Learning-Based UIE Methods
3. Proposed Approach
3.1. Network Architecture
3.2. Wavelet-Based Image Enhancement
3.3. Dual-Branch Feature Capture Block
Rectangle GLU-Window Transformer Block
3.4. Loss Function
3.4.1. MSE Loss
3.4.2. VGG Perceptual Loss
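The two loss terms above are typically combined as a weighted sum, L = L_MSE + λ·L_VGG. The following numpy sketch shows only that structure; the weight `lam` and the `grad_features` extractor are illustrative stand-ins, since the actual perceptual term is computed on activations of a pretrained VGG network:

```python
import numpy as np

def mse_loss(pred, target):
    """Pixel-wise mean squared error."""
    return np.mean((pred - target) ** 2)

def grad_features(img):
    """Toy feature map, a stand-in for phi(.); the paper extracts
    features with a pretrained VGG network instead."""
    gy = np.abs(np.diff(img, axis=0))[:, :-1]   # vertical gradients
    gx = np.abs(np.diff(img, axis=1))[:-1, :]   # horizontal gradients
    return gy + gx

def total_loss(pred, target, lam=0.1):
    """L = L_MSE + lam * L_perceptual (lam is a hypothetical weight)."""
    perc = np.mean((grad_features(pred) - grad_features(target)) ** 2)
    return mse_loss(pred, target) + lam * perc

rng = np.random.default_rng(0)
gt = rng.random((64, 64))
noisy = gt + 0.05 * rng.standard_normal((64, 64))
assert total_loss(gt, gt) == 0.0      # identical images incur no loss
assert total_loss(noisy, gt) > 0.0    # distortion is penalized
```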
4. Experiments
4.1. Implementation Details
4.1.1. Datasets
4.1.2. Experimental Settings
4.2. Comparative Experiments and Analysis
4.2.1. Qualitative Evaluation
4.2.2. Quantitative Evaluation
4.3. Ablation Study
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- He, K.; Sun, J.; Tang, X. Single image haze removal using dark channel prior. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 33, 2341–2353. [Google Scholar] [PubMed]
- Drews, P.; Nascimento, E.; Moraes, F.; Botelho, S.; Campos, M. Transmission estimation in underwater single images. In Proceedings of the IEEE International Conference on Computer Vision Workshops, Sydney, Australia, 2–8 December 2013; pp. 825–830. [Google Scholar]
- Zhang, S.; Wang, T.; Dong, J.; Yu, H. Underwater image enhancement via extended multi-scale Retinex. Neurocomputing 2017, 245, 1–9. [Google Scholar] [CrossRef]
- Drews, P.L.J.; Nascimento, E.R.; Botelho, S.S.C.; Campos, M.F.M. Underwater depth estimation and image restoration based on single images. IEEE Comput. Graph. Appl. 2016, 36, 24–35. [Google Scholar] [CrossRef]
- Chen, X.; Zhang, P.; Quan, L.; Yi, C.; Lu, C. Underwater image enhancement based on deep learning and image formation model. arXiv 2021, arXiv:2101.00991. [Google Scholar]
- Li, Z.; Liu, F.; Yang, W.; Peng, S.; Zhou, J. A survey of convolutional neural networks: Analysis, applications, and prospects. IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 6999–7019. [Google Scholar] [CrossRef] [PubMed]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
- Zhang, D. Wavelet transform. In Fundamentals of Image Data Mining: Analysis, Features, Classification and Retrieval; Springer International Publishing: Cham, Switzerland, 2019; pp. 35–44. [Google Scholar]
- Chen, Z.; Zhang, Y.; Gu, J.; Kong, L.; Yuan, X. Cross aggregation transformer for image restoration. Adv. Neural Inf. Process. Syst. 2022, 35, 25478–25490. [Google Scholar]
- Hou, G.; Pan, Z.; Wang, G.; Yang, H.; Duan, J. An efficient nonlocal variational method with application to underwater image restoration. Neurocomputing 2019, 369, 106–121. [Google Scholar] [CrossRef]
- Zhou, Y.; Wu, Q.; Yan, K.; Feng, L.; Xiang, W. Underwater image restoration using color-line model. IEEE Trans. Circuits Syst. Video Technol. 2018, 29, 907–911. [Google Scholar] [CrossRef]
- Song, W.; Wang, Y.; Huang, D.; Liotta, A.; Perra, C. Enhancement of underwater images with statistical model of background light and optimization of transmission map. IEEE Trans. Broadcast. 2020, 66, 153–169. [Google Scholar] [CrossRef]
- Hou, G.; Li, J.; Wang, G.; Yang, H.; Huang, B.; Pan, Z. A novel dark channel prior guided variational framework for underwater image restoration. J. Vis. Commun. Image Represent. 2020, 66, 102732. [Google Scholar] [CrossRef]
- Li, C.; Anwar, S.; Porikli, F. Underwater scene prior inspired deep underwater image and video enhancement. Pattern Recognit. 2020, 98, 107038. [Google Scholar] [CrossRef]
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 10012–10022. [Google Scholar]
- Haiyang, Y.; Ruige, G.; Zhongda, Z.; Yuzhang, Z.; Xiaobo, Z.; Tao, L.; Haiyan, W. U-TransCNN: A U-shape transformer-CNN fusion model for underwater image enhancement. Displays 2025, 88, 103047. [Google Scholar] [CrossRef]
- Chen, H.Q.; Shen, X.; Zhao, Z.; Yan, Y. Hybrid CNN-Transformer Network for Two-Stage Underwater Image Enhancement with Contrastive Learning. In Proceedings of the 2024 IEEE International Conference on Signal Processing, Communications and Computing (ICSPCC), Bali, Indonesia, 19–22 August 2024; pp. 1–5. [Google Scholar]
- Wang, Y.; Hu, S.; Yin, S.; Deng, Z.; Yang, Y.-H. A multi-level wavelet-based underwater image enhancement network with color compensation prior. Expert Syst. Appl. 2024, 242, 122710. [Google Scholar] [CrossRef]
- Zhao, C.; Cai, W.; Dong, C.; Hu, C. Wavelet-based fourier information interaction with frequency diffusion adjustment for underwater image restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 8281–8291. [Google Scholar]
- Liu, S.; Fan, H.; Wang, Q.; Han, Z.; Guan, Y.; Tang, Y. Wavelet–pixel domain progressive fusion network for underwater image enhancement. Knowl.-Based Syst. 2024, 299, 112049. [Google Scholar] [CrossRef]
- Wang, Z.; Tao, H.; Zhou, H.; Deng, Y.; Zhou, P. A content-style control network with style contrastive learning for underwater image enhancement. Multimed. Syst. 2025, 31, 1–13. [Google Scholar] [CrossRef]
- Chen, X.; Tao, H.; Zhou, H.; Zhou, P.; Deng, Y. Hierarchical and progressive learning with key point sensitive loss for sonar image classification. Multimed. Syst. 2024, 30, 1–16. [Google Scholar] [CrossRef]
- Cao, H.; Wang, Y.; Chen, J.; Jiang, D.; Zhang, X.; Tian, Q.; Wang, M. Swin-unet: Unet-like pure transformer for medical image segmentation. In European Conference on Computer Vision; Springer Nature: Cham, Switzerland, 2022; pp. 205–218. [Google Scholar]
- Chen, Y.; Zhang, X.; Peng, L.; He, Y.; Sun, F.; Sun, H. Medical image segmentation network based on multi-scale frequency domain filter. Neural Netw. 2024, 175, 106280. [Google Scholar] [CrossRef] [PubMed]
- Shi, D. Transnext: Robust foveal visual perception for vision transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 17773–17783. [Google Scholar]
- Yang, Q.; Yan, P.; Zhang, Y.; Yu, H.; Shi, Y.; Mou, X.; Kalra, M.K.; Zhang, Y.; Sun, L.; Wang, G. Low-dose CT image denoising using a generative adversarial network with Wasserstein distance and perceptual loss. IEEE Trans. Med. Imaging 2018, 37, 1348–1357. [Google Scholar] [CrossRef]
- Shin, Y.S.; Cho, Y.; Pandey, G.; Kim, A. Estimation of ambient light and transmission map with common convolutional architecture. In Proceedings of the OCEANS 2016 MTS/IEEE Monterey, Monterey, CA, USA, 19–23 September 2016; pp. 1–7. [Google Scholar]
- Hore, A.; Ziou, D. Image quality metrics: PSNR vs. SSIM. In Proceedings of the 2010 20th International Conference on Pattern Recognition, Istanbul, Turkey, 23–26 August 2010; pp. 2366–2369. [Google Scholar]
- Zhuang, P.; Li, C.; Wu, J. Bayesian retinex underwater image enhancement. Eng. Appl. Artif. Intell. 2021, 101, 104171. [Google Scholar] [CrossRef]
- Peng, Y.T.; Cosman, P.C. Underwater image restoration based on image blurriness and light absorption. IEEE Trans. Image Process. 2017, 26, 1579–1594. [Google Scholar] [CrossRef]
- Naik, A.; Swarnakar, A.; Mittal, K. Shallow-uwnet: Compressed model for underwater image enhancement (student abstract). In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 19–21 May 2021; Volume 35, pp. 15853–15854. [Google Scholar]
- Wang, Y.; Guo, J.; Gao, H.; Yue, H. UIEC^2-Net: CNN-based underwater image enhancement using two color space. Signal Process. Image Commun. 2021, 96, 116250. [Google Scholar] [CrossRef]
- Fabbri, C.; Islam, M.J.; Sattar, J. Enhancing underwater imagery using generative adversarial networks. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, 21–25 May 2018; pp. 7159–7165. [Google Scholar]
- Islam, M.J.; Xia, Y.; Sattar, J. Fast underwater image enhancement for improved visual perception. IEEE Robot. Autom. Lett. 2020, 5, 3227–3234. [Google Scholar] [CrossRef]
- Peng, L.; Zhu, C.; Bian, L. U-shape transformer for underwater image enhancement. IEEE Trans. Image Process. 2023, 32, 3066–3079. [Google Scholar] [CrossRef] [PubMed]
- Tang, Y.; Iwaguchi, T.; Kawasaki, H.; Sagawa, R.; Furukawa, R. AutoEnhancer: Transformer on U-Net architecture search for underwater image enhancement. In Proceedings of the Asian Conference on Computer Vision, Macao, China, 4–8 December 2022; pp. 1403–1420. [Google Scholar]
- Wang, D.; Sun, Z. Frequency domain based learning with transformer for underwater image restoration. In Pacific Rim International Conference on Artificial Intelligence; Springer Nature: Cham, Switzerland, 2022; pp. 218–232. [Google Scholar]
Model | PSNR | SSIM | UIQM | UCIQE | NIQE |
---|---|---|---|---|---|
BRUE | 17.58 | 0.67 | 3.17 | 0.42 | 4.92 |
UDCP | 15.84 | 0.57 | 2.00 | 0.52 | 5.66 |
IBLA | 16.55 | 0.43 | 2.25 | 0.49 | 5.50 |
ShallowUWnet | 26.75 | 0.79 | 2.91 | 0.398 | 6.04 |
Uice2Net | 26.83 | 0.86 | 2.99 | 0.396 | 5.83 |
UGAN | 25.42 | 0.81 | 2.94 | 0.41 | 5.15 |
FunieGAN | 26.24 | 0.81 | 2.89 | 0.45 | 5.66 |
U-Trans | 22.57 | 0.74 | 3.00 | 0.38 | 5.04 |
AutoEnhancer | 28.78 | 0.86 | 2.85 | 0.41 | 5.06 |
Frequency | 28.81 | 0.86 | 2.82 | 0.42 | 4.95 |
Ours | 29.01 | 0.87 | 2.88 | 0.42 | 4.916 |
Model | PSNR | SSIM | UIQM | UCIQE | NIQE |
---|---|---|---|---|---|
BRUE | 17.05 | 0.64 | 3.09 | 0.43 | 4.96 |
UDCP | 19.12 | 0.66 | 2.25 | 0.51 | 5.11 |
IBLA | 20.06 | 0.71 | 2.45 | 0.49 | 5.06 |
ShallowUWnet | 27.02 | 0.81 | 2.89 | 0.43 | 5.16 |
Uice2Net | 21.00 | 0.75 | 3.02 | 0.45 | 5.03 |
FunieGAN | 26.97 | 0.83 | 2.73 | 0.45 | 4.88 |
U-Trans | 25.29 | 0.80 | 2.96 | 0.42 | 4.99 |
AutoEnhancer | 28.13 | 0.84 | 2.79 | 0.43 | 4.88 |
Frequency | 28.33 | 0.84 | 2.73 | 0.44 | 4.91 |
Ours | 28.43 | 0.85 | 2.84 | 0.43 | 4.68 |
Model | Patch Size | GPU Memory | Time/Epoch |
---|---|---|---|
w/o WT | 3 | 21,976 MiB | 66 m |
MixRformer | 12 | 23,418 MiB | 17 m |
Model | PSNR | SSIM | UIQM |
---|---|---|---|
w/o WT | 28.083 ± 3.369 | 0.835 ± 0.062 | 2.859 ± 0.388 |
w/o ConvBlock | 28.521 ± 3.08 | 0.847 ± 0.054 | 2.845 ± 0.393 |
w/o RGLUWin Transformer | 28.59 ± 3.163 | 0.846 ± 0.052 | 2.88 ± 0.371 |
MixRformer | 29.01 ± 3.158 | 0.87 ± 0.054 | 2.88 ± 0.387 |
Model | PSNR | SSIM | UIQM |
---|---|---|---|
w/MLP | 28.89 ± 3.222 | 0.854 ± 0.053 | 2.873 ± 0.402 |
MixRformer w/GLU | 29.01 ± 3.158 | 0.87 ± 0.054 | 2.88 ± 0.387 |
Model | PSNR | SSIM | UIQM |
---|---|---|---|
w/o L_VGG | 28.669 ± 3.136 | 0.85 ± 0.052 | 2.841 ± 0.412 |
MixRformer w/GLU | 29.01 ± 3.158 | 0.87 ± 0.054 | 2.88 ± 0.387 |
Model | Parameters | FLOPs |
---|---|---|
w/o ConvBlock | 10.10 M | 11.58 G |
w/o RGLUWin Transformer | 12.97 M | 15.29 G |
w/o WT | 20.05 M | 100.16 G |
MixRformer | 20.06 M | 25.17 G |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Li, J.; Zhao, L.; Li, H.; Xue, X.; Liu, H. MixRformer: Dual-Branch Network for Underwater Image Enhancement in Wavelet Domain. Sensors 2025, 25, 3302. https://doi.org/10.3390/s25113302