Fine-Grained Segmentation Method of Ground-Based Cloud Images Based on Improved Transformer
Abstract
1. Introduction
2. Related Work
3. Methods
3.1. Swin Transformer
3.2. Overall Framework
3.3. BiFormer Block
3.4. Multi-Scale Dual-Attention
3.5. MLP Bottleneck Layer
4. Experiment
4.1. Parameter Settings
4.2. Dataset
4.3. Evaluation Metrics
4.4. Comparative Experiment
4.5. Ablation Experiment
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Zhang, J.; Zhao, L.; Deng, S.; Xu, W.; Zhang, Y. A critical review of the models used to estimate solar radiation. Renew. Sustain. Energy Rev. 2017, 70, 314–329.
2. Tan, Z.; Zhang, H.; Xu, J. Photovoltaic power generation in China: Development potential, benefits of energy conservation and emission reduction. J. Energy Eng. 2012, 138, 73–86.
3. Kabir, E.; Kumar, P.; Kumar, S.; Adelodun, A.A. Solar energy: Potential and future prospects. Renew. Sustain. Energy Rev. 2018, 82, 894–900.
4. Zhang, Z.; Yang, S.; Liu, S.; Xiao, B.; Cao, X. Ground-based cloud detection using multiscale attention convolutional neural network. IEEE Geosci. Remote Sens. Lett. 2022, 19, 8019605.
5. Hong, D.; Yokoya, N.; Chanussot, J.; Zhu, X.X. An augmented linear mixing model to address spectral variability for hyperspectral unmixing. IEEE Trans. Image Process. 2019, 28, 1923–1938.
6. Shi, C.; Zhou, Y.; Qiu, B.; Guo, D.; Li, M. CloudU-Net: A deep convolutional neural network architecture for daytime and nighttime cloud images segmentation. IEEE Geosci. Remote Sens. Lett. 2021, 18, 1688–1692.
7. Zhen, Z.; Zhang, X.; Mei, S.; Chang, X.; Chai, H. Ultra short-term irradiance forecasting model based on ground-based cloud image and deep learning algorithm. IET Renew. Power Gener. 2022, 16, 2604–2616.
8. Dev, S.; Nautiyal, A.; Lee, Y.H. CloudSegNet: A deep network for nychthemeron cloud image segmentation. IEEE Geosci. Remote Sens. Lett. 2019, 16, 1814–1818.
9. Ye, L.; Cao, Z.; Yang, Z. CCAD-Net: A cascade cloud attribute discrimination network for cloud genera segmentation in whole-sky images. IEEE Geosci. Remote Sens. Lett. 2022, 19, 6512105.
10. Dev, S.; Lee, Y.H.; Winkler, S. Color-based segmentation of sky/cloud images from ground-based cameras. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2016, 10, 231–242.
11. Gacal, G.F.B.; Antioquia, C.; Lagrosas, N. Ground-based detection of nighttime clouds above Manila Observatory (14.64° N, 121.07° E) using a digital camera. Appl. Opt. 2016, 55, 6040–6045.
12. Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
13. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015; Springer: Cham, Switzerland, 2015; pp. 234–241.
14. Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
15. Shi, C.; Zhou, Y.; Qiu, B. CloudU-Netv2: A cloud segmentation method for ground-based cloud images based on deep learning. Neural Process. Lett. 2021, 53, 2715–2728.
16. Shi, C.; Zhou, Y.; Qiu, B. CloudRaednet: Residual attention-based encoder-decoder network for ground-based cloud images segmentation in nychthemeron. Int. J. Remote Sens. 2022, 43, 2059–2075.
17. Liu, S.; Zhang, J.; Zhang, Z. TransCloudSeg: Ground-based cloud image segmentation with transformer. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 6121–6132.
18. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H. Swin Transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 9992–10002.
19. Zhu, L.; Wang, X.; Ke, Z. BiFormer: Vision transformer with bi-level routing attention. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 10323–10333.
20. Sun, G.; Pan, Y.; Kong, W. DA-TransUNet: Integrating spatial and channel dual attention with transformer U-Net for medical image segmentation. Front. Bioeng. Biotechnol. 2024, 12, 1398237.
21. Sun, K.; Xiao, B.; Liu, D. Deep high-resolution representation learning for human pose estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 5693–5703.
22. Cao, H.; Wang, Y.; Chen, J. Swin-Unet: U-Net-like pure transformer for medical image segmentation. In Computer Vision—ECCV 2022 Workshops; Springer: Cham, Switzerland, 2022; pp. 205–218.
23. Xie, E.; Wang, W.; Yu, Z.; Anandkumar, A.; Alvarez, J.M.; Luo, P. SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers. In Proceedings of the 35th Conference on Neural Information Processing Systems (NeurIPS 2021), Online Conference, 6–14 December 2021; pp. 12077–12090.
24. Chen, J.; Lu, Y.; Yu, Q.; Luo, X.; Adeli, E.; Wang, Y.; Lu, L.; Yuille, A.; Zhou, Y. TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation. In Proceedings of the 24th International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), Cham, Switzerland, 8 February 2021; pp. 485–495.

| Methods | mFscore | Accuracy | Precision | mIoU |
|---|---|---|---|---|
| U-Net [13] | 39.30 | 40.75 | 37.19 | 36.52 |
| HRNet [21] | 59.48 | 57.30 | 59.79 | 52.44 |
| Swin-Unet [22] | 60.71 | 62.03 | 60.54 | 57.02 |
| SegFormer [23] | 62.58 | 66.90 | 63.74 | 59.34 |
| TransUNet [24] | 64.48 | 67.53 | 65.90 | 61.60 |
| Ours | 69.85 | 71.47 | 70.20 | 65.18 |

| Methods | Params (M) | FLOPs (G) | Inference Time (ms/image) |
|---|---|---|---|
| U-Net [13] | 31.0 | 55.2 | 12.8 |
| HRNet [21] | 65.9 | 84.7 | 18.4 |
| Swin-Unet [22] | 41.3 | 96.5 | 21.6 |
| SegFormer [23] | 27.5 | 62.3 | 14.2 |
| TransUNet [24] | 105.3 | 122.8 | 27.5 |
| Ours | 58.2 | 110.7 | 23.8 |

| MSDA | BiFormer Block | mFscore | Accuracy | Precision | mIoU |
|---|---|---|---|---|---|
| — | — | 62.10 | 63.56 | 59.48 | 56.82 |
| √ | — | 63.42 | 64.70 | 64.56 | 61.31 |
| — | √ | 65.80 | 66.37 | 66.68 | 62.94 |
| √ | √ | 69.85 | 71.47 | 70.20 | 65.18 |
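
The comparison and ablation tables report mFscore, Accuracy, Precision, and mIoU, but their formulas are not reproduced in this excerpt. The sketch below assumes the common class-averaged definitions computed from a pixel confusion matrix and reports values in percent. Because it is not stated here whether the "Accuracy" column is overall pixel accuracy or mean per-class accuracy, the sketch returns both, under the hypothetical keys `aAcc` and `mAcc`. The function names `confusion_matrix` and `segmentation_metrics` are illustrative and are not the authors' evaluation code.

```python
from typing import Dict, List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Pixel confusion matrix: rows are ground-truth classes, columns are predictions."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def segmentation_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Class-averaged precision, F-score and IoU, plus two accuracy variants, in percent.

    Assumption: classes that occur in neither the ground truth nor the
    prediction are skipped when averaging (their ratios would be 0/0).
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    precisions, recalls, fscores, ious = [], [], [], []
    for c in range(n):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted as c, labelled otherwise
        fn = sum(cm[c]) - tp                       # labelled c, predicted otherwise
        if tp + fp + fn == 0:
            continue  # class absent from both maps
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        fscores.append(f1)
        ious.append(tp / (tp + fp + fn))

    def pct_mean(values: List[float]) -> float:
        return 100.0 * sum(values) / len(values) if values else 0.0

    return {
        "mFscore": pct_mean(fscores),
        "Precision": pct_mean(precisions),
        "mIoU": pct_mean(ious),
        "mAcc": pct_mean(recalls),  # mean per-class accuracy (equal to mean recall)
        "aAcc": 100.0 * correct / total if total else 0.0,  # overall pixel accuracy
    }


if __name__ == "__main__":
    # Toy example: six pixels, three hypothetical cloud/sky classes.
    truth = [0, 0, 1, 1, 2, 2]
    pred = [0, 1, 1, 1, 2, 0]
    metrics = segmentation_metrics(confusion_matrix(truth, pred, num_classes=3))
    for name, value in metrics.items():
        print(f"{name}: {value:.2f}")
```

Averaging per class rather than per pixel means that small cloud categories count as much as large sky regions. This is the usual reason mIoU and mFscore are preferred over pixel accuracy for fine-grained cloud segmentation, where class areas are highly imbalanced.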
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Zhang, L.; Shi, D.; Li, P.; Liu, B.; Sun, T.; Jiao, B.; Wang, C.; Zhang, R.; Shi, C. Fine-Grained Segmentation Method of Ground-Based Cloud Images Based on Improved Transformer. Electronics 2026, 15, 156. https://doi.org/10.3390/electronics15010156

