O-Transformer-Mamba: An O-Shaped Transformer-Mamba Framework for Remote Sensing Image Haze Removal
Highlights
- We propose an enhanced visual state space model and an adaptive self-attention mechanism that jointly improve global-local feature modeling and selective feature enhancement.
- We develop a novel O-shaped Transformer-Mamba framework for dehazing remote sensing images.
- Enhancing global-local feature interaction and selective feature refinement is essential for achieving robust remote sensing image restoration.
- On both real and synthetic remote sensing datasets, our method achieves state-of-the-art performance.
Abstract
1. Introduction
- We propose a novel O-shaped topology that fosters comprehensive multi-scale and cross-level feature integration, thereby enhancing the network’s capability to address large-scale and spatially uneven haze distributions in remote sensing imagery.
- We design a sparse-enhanced self-attention (SE-SA) mechanism that adaptively focuses on degraded regions, effectively recovering structural elements, including edges and surface patterns.
- We develop a mixed visual state space model (Mix-VSSM) that combines convolution with state space modeling to balance local spatial awareness and global sequence modeling (a conceptual sketch follows this list).
- Our proposed network outperforms existing methods on both real-world and synthetic datasets and remains robust across diverse scales and haze density levels.
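As a rough illustration of the Mix-VSSM idea referenced in the third contribution, the sketch below pairs a depthwise-convolution branch (local spatial awareness) with a Mamba block applied to the flattened pixel sequence (global sequence modeling) and fuses the two with a residual sum. The mamba_ssm dependency, the branch layout, and the additive fusion are assumptions for illustration, not the block defined in Section 2.4.

```python
import torch
import torch.nn as nn
from mamba_ssm import Mamba  # assumed stand-in for the paper's state space block


class MixVSSMBlock(nn.Module):
    """Conceptual sketch of a mixed visual state space block.

    A depthwise-convolution branch supplies local spatial awareness, while a
    Mamba branch models long-range dependencies over the flattened pixel
    sequence. The branch layout and the additive fusion are illustrative
    assumptions, not the block defined in the paper.
    """

    def __init__(self, channels: int):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        # Local branch: depthwise 3x3 convolution keeps per-channel spatial context.
        self.local = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, groups=channels),
            nn.GELU(),
            nn.Conv2d(channels, channels, kernel_size=1),
        )
        # Global branch: selective state space model over the token sequence.
        self.ssm = Mamba(d_model=channels, d_state=16, d_conv=4, expand=2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        seq = x.flatten(2).transpose(1, 2)           # (B, H*W, C) token sequence
        seq = self.ssm(self.norm(seq))               # global sequence modeling
        global_feat = seq.transpose(1, 2).reshape(b, c, h, w)
        local_feat = self.local(x)                   # local spatial modeling
        return x + local_feat + global_feat          # residual fusion (assumed)
```

A full implementation would likely scan the feature map along multiple directions before the state space step; the single flattening here is only for brevity.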
2. Materials and Methods
2.1. Motivation
2.2. Overview of Proposed O-Transformer-Mamba Architecture
2.3. Sparse-Enhanced Self-Attention
| Algorithm 1: Sparse Enhancement Operation |
|---|
| Input: input feature map. Output: attention-enhanced feature map Y. |
| Step 1: Reduce the channels. |
| Step 2: Generate the mask M. |
| Step 3: Compute the gap between the attention scores and M. |
| Step 4: Enlarge and smooth the gaps (scale is the adjustable scaling factor). |
| Step 5: Compute the attention output and return Y. |
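To make the flow of Algorithm 1 concrete, the following PyTorch-style sketch follows the five steps above. Because the exact formulas are not reproduced here, the 1x1 convolutions for channel reduction, the choice of M as the mean attention score, the scaled-softmax gap enlargement, and the residual connection are illustrative assumptions rather than the authors' implementation.

```python
import torch
import torch.nn as nn


class SparseEnhancedAttention(nn.Module):
    """Illustrative sketch of the sparse enhancement flow in Algorithm 1.

    The channel reduction, the definition of M, the gap enlargement, and the
    residual connection are assumptions for demonstration only.
    """

    def __init__(self, channels: int, reduction: int = 4, scale: float = 2.0):
        super().__init__()
        self.reduce = nn.Conv2d(channels, channels // reduction, kernel_size=1)  # Step 1
        self.expand = nn.Conv2d(channels // reduction, channels, kernel_size=1)
        self.scale = scale  # adjustable scaling factor used in Step 4

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, _, h, w = x.shape
        # Step 1: reduce channels to lower the cost of the attention map.
        tokens = self.reduce(x).flatten(2)                      # (B, C/r, H*W)
        # Raw spatial attention scores (illustrative dot-product form).
        scores = torch.softmax(
            tokens.transpose(1, 2) @ tokens / (tokens.shape[1] ** 0.5), dim=-1
        )                                                       # (B, H*W, H*W)
        # Step 2: generate M, here taken as the mean score of each row.
        m = scores.mean(dim=-1, keepdim=True)
        # Step 3: gap between every attention score and M.
        gap = scores - m
        # Step 4: enlarge and smooth the gaps; a larger scale suppresses weak
        # responses and emphasizes strong ones more aggressively.
        enhanced = torch.softmax(self.scale * gap, dim=-1)
        # Step 5: compute the attention output and map back to the input channels.
        out = (enhanced @ tokens.transpose(1, 2)).transpose(1, 2).reshape(b, -1, h, w)
        return self.expand(out) + x                             # return Y
```

In this form, scale plays the role of the adjustable scaling factor in Step 4: larger values sharpen the attention map so that weakly relevant positions are suppressed more strongly.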
2.4. Mixed Visual State Space Model
2.5. Loss Function
3. Results
3.1. Dataset
3.2. Implementation Details
3.3. Quantitative Results
3.4. Qualitative Results
3.5. Ablation Study
4. Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Wang, N.; Yang, A.; Cui, Z.; Ding, Y.; Xue, Y.; Su, Y. Capsule attention network for hyperspectral image classification. Remote Sens. 2024, 16, 4001.
- Wang, N.; Cui, Z.; Lan, Y.; Zhang, C.; Xue, Y.; Su, Y.; Li, A. Large-Scale Hyperspectral Image-Projected Clustering via Doubly Stochastic Graph Learning. Remote Sens. 2025, 17, 1526.
- Liu, Y.; Yan, Z.; Tan, J.; Li, Y. Multi-purpose oriented single nighttime image haze removal based on unified variational retinex model. IEEE Trans. Circuits Syst. Video Technol. 2022, 33, 1643–1657.
- Liu, Y.; Yan, Z.; Chen, S.; Ye, T.; Ren, W.; Chen, E. Nighthazeformer: Single nighttime haze removal using prior query transformer. In Proceedings of the 31st ACM International Conference on Multimedia, Ottawa, ON, Canada, 29 October–3 November 2023; pp. 4119–4128.
- Zhou, H.; Chen, Z.; Liu, Y.; Sheng, Y.; Ren, W.; Xiong, H. Physical-priors-guided DehazeFormer. Knowl.-Based Syst. 2023, 266, 110410.
- Liu, Y.; Wang, X.; Hu, E.; Wang, A.; Shiri, B.; Lin, W. VNDHR: Variational single nighttime image Dehazing for enhancing visibility in intelligent transportation systems via hybrid regularization. IEEE Trans. Intell. Transp. Syst. 2025, 26, 10189–10203.
- Zhou, H.; Wang, Y.; Zhang, Q.; Tao, T.; Ren, W. A Dual-Stage Residual Diffusion Model with Perceptual Decoding for Remote Sensing Image Dehazing. IEEE Trans. Geosci. Remote Sens. 2025, 63, 4109312.
- Rong, Z.; Jun, W.L. Improved wavelet transform algorithm for single image dehazing. Optik 2014, 125, 3064–3066.
- Wang, J.; Lu, K.; Xue, J.; He, N.; Shao, L. Single image dehazing based on the physical model and MSRCR algorithm. IEEE Trans. Circuits Syst. Video Technol. 2017, 28, 2190–2199.
- He, K.; Sun, J.; Tang, X. Single image haze removal using dark channel prior. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 33, 2341–2353.
- Zhu, Q.; Mai, J.; Shao, L. A fast single image haze removal algorithm using color attenuation prior. IEEE Trans. Image Process. 2015, 24, 3522–3533.
- Dong, H.; Pan, J.; Xiang, L.; Hu, Z.; Zhang, X.; Wang, F.; Yang, M.H. Multi-scale boosted dehazing network with dense feature fusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–18 June 2020; pp. 2157–2167.
- Cai, B.; Xu, X.; Jia, K.; Qing, C.; Tao, D. Dehazenet: An end-to-end system for single image haze removal. IEEE Trans. Image Process. 2016, 25, 5187–5198.
- Li, B.; Peng, X.; Wang, Z.; Xu, J.; Feng, D. Aod-net: All-in-one dehazing network. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 4770–4778.
- Liu, X.; Ma, Y.; Shi, Z.; Chen, J. Griddehazenet: Attention-based multi-scale network for image dehazing. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 7314–7323.
- Li, Y.; Chen, X. A coarse-to-fine two-stage attentive network for haze removal of remote sensing images. IEEE Geosci. Remote Sens. Lett. 2020, 18, 1751–1755.
- Qin, X.; Wang, Z.; Bai, Y.; Xie, X.; Jia, H. FFA-Net: Feature fusion attention network for single image dehazing. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 11908–11915.
- Song, Y.; He, Z.; Qian, H.; Du, X. Vision transformers for single image dehazing. IEEE Trans. Image Process. 2023, 32, 1927–1941.
- Wang, Z.; Cun, X.; Bao, J.; Zhou, W.; Liu, J.; Li, H. Uformer: A general u-shaped transformer for image restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–24 June 2022; pp. 17683–17693.
- Zamir, S.W.; Arora, A.; Khan, S.; Hayat, M.; Khan, F.S.; Yang, M.H. Restormer: Efficient transformer for high-resolution image restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–24 June 2022; pp. 5728–5739.
- Nie, J.; Xie, J.; Sun, H. Remote sensing image dehazing via a local context-enriched transformer. Remote Sens. 2024, 16, 1422.
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
- Guan, X.; He, R.; Wang, L.; Zhou, H.; Liu, Y.; Xiong, H. DWTMA-Net: Discrete Wavelet Transform and Multi-Dimensional Attention Network for Remote Sensing Image Dehazing. Remote Sens. 2025, 17, 2033.
- Zhou, H.; Wang, L.; Li, Q.; Guan, X.; Tao, T. Multi-Dimensional and Multi-Scale Physical Dehazing Network for Remote Sensing Images. Remote Sens. 2024, 16, 4780.
- Wu, J.; Ai, H.; Zhou, P.; Wang, H.; Zhang, H.; Zhang, G.; Chen, W. Low-Light Image Dehazing and Enhancement via Multi-Feature Domain Fusion. Remote Sens. 2025, 17, 2944.
- Zhou, H.; Wang, Y.; Peng, W.; Guan, X.; Tao, T. ScaleViM-PDD: Multi-Scale EfficientViM with Physical Decoupling and Dual-Domain Fusion for Remote Sensing Image Dehazing. Remote Sens. 2025, 17, 2664.
- Wang, H.; Ding, Y.; Zhou, X.; Yuan, G.; Sun, C. Dehazing of Panchromatic Remote Sensing Images Based on Histogram Features. Remote Sens. 2025, 17, 3479.
- Lu, L.; Xiong, Q.; Xu, B.; Chu, D. Mixdehazenet: Mix structure block for image dehazing network. In Proceedings of the 2024 International Joint Conference on Neural Networks (IJCNN), Yokohama, Japan, 30 June–5 July 2024; pp. 1–10.
- Cui, Y.; Ren, W.; Knoll, A. Omni-Kernel Network for Image Restoration. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2024; pp. 1426–1434.
- Cui, Y.; Zamir, S.W.; Khan, S.; Knoll, A.; Shah, M.; Khan, F.S. AdaIR: Adaptive All-in-One Image Restoration via Frequency Mining and Modulation. In Proceedings of the Thirteenth International Conference on Learning Representations, Singapore, 24–28 April 2025.
- Hamilton, J.D. State-space models. Handb. Econom. 1994, 4, 3039–3080.
- Gu, A.; Dao, T. Mamba: Linear-time sequence modeling with selective state spaces. arXiv 2023, arXiv:2312.00752.
- Liu, Y.; Tian, Y.; Zhao, Y.; Yu, H.; Xie, L.; Wang, Y.; Ye, Q.; Jiao, J.; Liu, Y. Vmamba: Visual state space model. Adv. Neural Inf. Process. Syst. 2024, 37, 103031–103063.
- Guo, H.; Li, J.; Dai, T.; Ouyang, Z.; Ren, X.; Xia, S.T. Mambair: A simple baseline for image restoration with state-space model. In Proceedings of the European Conference on Computer Vision, Milan, Italy, 29 September–4 October 2024; Springer: Berlin/Heidelberg, Germany, 2024; pp. 222–241.
- Sun, S.; Ren, W.; Zhou, J.; Gan, J.; Wang, R.; Cao, X. A hybrid transformer-mamba network for single image deraining. arXiv 2024, arXiv:2409.00410.
- Hatamizadeh, A.; Kautz, J. Mambavision: A hybrid mamba-transformer vision backbone. In Proceedings of the Computer Vision and Pattern Recognition Conference, Nashville, TN, USA, 11–15 June 2025; pp. 25261–25270.
- Su, X.; Li, S.; Cui, Y.; Cao, M.; Zhang, Y.; Chen, Z.; Wu, Z.; Wang, Z.; Zhang, Y.; Yuan, X. Prior-guided hierarchical harmonization network for efficient image dehazing. In Proceedings of the AAAI Conference on Artificial Intelligence, Philadelphia, PA, USA, 25 February–4 March 2025; pp. 7042–7050.
- Cui, Y.; Wang, Q.; Li, C.; Ren, W.; Knoll, A. EENet: An effective and efficient network for single image dehazing. Pattern Recognit. 2025, 158, 111074.
- Wang, T.; Du, L.; Yi, W.; Hong, J.; Zhang, L.; Zheng, J.; Li, C.; Ma, X.; Zhang, D.; Fang, W.; et al. An adaptive atmospheric correction algorithm for the effective adjacency effect correction of submeter-scale spatial resolution optical satellite images: Application to a WorldView-3 panchromatic image. Remote Sens. Environ. 2021, 259, 112412.
- Huang, B.; Zhi, L.; Yang, C.; Sun, F.; Song, Y. Single satellite optical imagery dehazing using SAR image prior based on conditional generative adversarial networks. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Snowmass Village, CO, USA, 1–5 March 2020; pp. 1806–1813.
- Zhang, L.; Wang, S. Dense haze removal based on dynamic collaborative inference learning for remote sensing images. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5631016.
- Zhu, Z.H.; Lu, W.; Chen, S.B.; Ding, C.H.Q.; Tang, J.; Luo, B. Real-World Remote Sensing Image Dehazing: Benchmark and Baseline. IEEE Trans. Geosci. Remote Sens. 2025, 63, 4705014.
- Feng, C.; Chen, Z.; Kou, R.; Gao, G.; Wang, C.; Li, X.; Shu, X.; Dai, Y.; Fu, Q.; Yang, J. HazyDet: Open-source Benchmark for Drone-view Object Detection with Depth-cues in Hazy Scenes. arXiv 2024, arXiv:2409.19833.
| Methods | Thin PSNR ↑ | Thin SSIM ↑ | Moderate PSNR ↑ | Moderate SSIM ↑ | Thick PSNR ↑ | Thick SSIM ↑ | Average PSNR ↑ | Average SSIM ↑ |
|---|---|---|---|---|---|---|---|---|
| DCP [10] | 20.15 | 0.8645 | 20.51 | 0.8932 | 15.77 | 0.7117 | 18.81 | 0.8241 |
| FCTF-Net [16] | 19.13 | 0.8532 | 22.32 | 0.9107 | 17.78 | 0.7617 | 19.74 | 0.8419 |
| AOD-Net [14] | 15.97 | 0.8169 | 15.39 | 0.7442 | 14.44 | 0.7013 | 15.27 | 0.7541 |
| GridDehaze-Net [15] | 19.81 | 0.8556 | 22.75 | 0.9085 | 17.94 | 0.7551 | 20.17 | 0.8397 |
| Dehazeformer [18] | 24.90 | 0.9104 | 27.13 | 0.9431 | 22.68 | 0.8497 | 24.90 | 0.9011 |
| MixDehaze-Net [28] | 22.12 | 0.8822 | 23.92 | 0.9040 | 19.96 | 0.7950 | 22.00 | 0.8604 |
| FFA-Net [17] | 24.04 | 0.9130 | 25.62 | 0.9336 | 21.70 | 0.8422 | 23.79 | 0.8963 |
| OK-Net [29] | 20.68 | 0.8860 | 25.39 | 0.9406 | 20.21 | 0.8186 | 22.09 | 0.8817 |
| AdaIR [30] | 24.14 | 0.9092 | 25.26 | 0.9376 | 21.01 | 0.8209 | 23.47 | 0.8892 |
| Ours | ||||||||
| Methods | Thick Fog PSNR ↑ | Thick Fog SSIM ↑ | Moderate Fog PSNR ↑ | Moderate Fog SSIM ↑ | Thin Fog PSNR ↑ | Thin Fog SSIM ↑ | Average PSNR ↑ | Average SSIM ↑ |
|---|---|---|---|---|---|---|---|---|
| DCP [10] | 12.43 | 0.4493 | 13.32 | 0.4771 | 16.16 | 0.4762 | 13.97 | 0.4675 |
| FCTF-Net [16] | 22.43 | 0.6686 | 21.72 | 0.6313 | 21.20 | 0.5579 | 21.78 | 0.6193 |
| AOD-Net [14] | 15.10 | 0.3847 | 15.26 | 0.3923 | 18.22 | 0.4850 | 16.19 | 0.4207 |
| GridDehaze-Net [15] | 23.84 | 0.7125 | 22.54 | 0.6475 | 22.54 | 0.6203 | 22.97 | 0.6601 |
| Dehazeformer [18] | 24.71 | 0.7309 | 20.93 | 0.6183 | 23.09 | 0.6452 | 22.92 | 0.6648 |
| MixDehaze-Net [28] | 23.64 | 0.6728 | 22.43 | 0.6233 | 22.02 | 0.5666 | 22.70 | 0.6209 |
| FFA-Net [17] | 25.09 | 0.7397 | 23.24 | 0.6713 | 23.09 | 0.6409 | 23.81 | 0.6840 |
| OK-Net [29] | 24.76 | 0.7415 | 24.18 | 0.7025 | 23.55 | 0.6599 | 24.16 | 0.7013 |
| AdaIR [30] | 23.98 | 0.7146 | 23.12 | 0.6671 | 22.54 | 0.6007 | 23.21 | 0.6608 |
| Ours | ||||||||
| Methods | LHID PSNR ↑ | LHID SSIM ↑ | DHID PSNR ↑ | DHID SSIM ↑ | Average PSNR ↑ | Average SSIM ↑ |
|---|---|---|---|---|---|---|
| DCP [10] | 21.34 | 0.7976 | 19.15 | 0.8195 | 20.25 | 0.8086 |
| FCTF-Net [16] | 28.55 | 0.8727 | 22.43 | 0.8482 | 25.49 | 0.8605 |
| AOD-Net [14] | 21.91 | 0.8144 | 16.03 | 0.7291 | 18.97 | 0.7718 |
| GridDehaze-Net [15] | 25.80 | 0.8584 | 26.77 | 0.8851 | 26.29 | 0.8718 |
| MixDehaze-Net [28] | 29.47 | 0.8631 | 27.36 | 0.8864 | 28.42 | 0.8748 |
| FFA-Net [17] | 29.33 | 0.8755 | 24.62 | 0.8657 | 26.98 | 0.8706 |
| OK-Net [29] | 29.03 | 0.8766 | 27.80 | 0.8973 | 28.42 | 0.8870 |
| AdaIR [30] | 28.08 | 0.8776 | 24.01 | 0.8676 | 26.05 | 0.8726 |
| Ours | ||||||
| Methods | HazyDet PSNR ↑ | HazyDet SSIM ↑ |
|---|---|---|
| DCP [10] | 17.03 | 0.8024 |
| FCTF-Net [16] | 24.89 | 0.8552 |
| AOD-Net [14] | 18.99 | 0.7808 |
| GridDehaze-Net [15] | 26.66 | 0.8801 |
| MixDehaze-Net [28] | 28.75 | 0.9068 |
| FFA-Net [17] | 27.12 | 0.8782 |
| OK-Net [29] | 27.76 | 0.8875 |
| AdaIR [30] | 27.60 | 0.8857 |
| Ours | ||
| Method | FLOPs | Parameters |
|---|---|---|
| DCP | - | - |
| FCTF-Net | 40.19 G | 163.48 K |
| AOD-Net | 457.70 M | 1.76 K |
| GridDehaze-Net | 85.72 G | 955.75 K |
| MixDehaze-Net | 114.30 G | 3.17 M |
| FFA-Net | 624.20 G | 4.68 M |
| Dehazeformer | 358.89 G | 9.68 M |
| OK-Net | 158.20 G | 4.43 M |
| AdaIR | 588.73 G | 28.74 M |
| O-Transformer-Mamba | 395.98 G | 6.20 M |
| Methods | Thin Haze PSNR ↑ | Thin Haze SSIM ↑ |
|---|---|---|
| SA | 22.62 | 0.8986 |
| SE-SA | 23.36 | 0.9046 |
| VSSM | 23.01 | 0.8991 |
| Mix-VSSM | 24.07 | 0.9060 |
| O(SE-SA + Mix-VSSM) | 24.53 | 0.9121 |
| SE-SA + Mix-VSSM + FAM | 24.80 | 0.9152 |
| Structures | Thin Haze PSNR ↑ | Thin Haze SSIM ↑ |
|---|---|---|
| parallel | 23.68 | 0.9051 |
| U-shaped | 24.11 | 0.9078 |
| O-shaped | 24.80 | 0.9152 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Guan, X.; He, R.; Wang, L.; Zhou, H.; Liu, Y.; Xiong, H. O-Transformer-Mamba: An O-Shaped Transformer-Mamba Framework for Remote Sensing Image Haze Removal. Remote Sens. 2026, 18, 191. https://doi.org/10.3390/rs18020191