SIG-ShapeFormer: A Multi-Scale Spatiotemporal Feature Fusion Network for Satellite Cloud Image Classification
Abstract
1. Introduction
- We propose a novel satellite cloud image classification framework, SIG-ShapeFormer, which independently extracts shape, spatial, and temporal features from cloud images and effectively fuses them through an EAFM (Ensemble-Attention Feature Mixer) block, thereby enhancing the classification accuracy across diverse cloud types;
- We develop a multi-scale feature extraction architecture based on the Inception framework, which integrates parallel 1D convolutional branches with varying kernel sizes (from 1 × 1 to 1 × 7) into a multi-level feature pyramid, enabling the model to capture local texture details and global temporal sequence features of cloud systems simultaneously (a minimal sketch follows this list);
- We design a differential-enhanced GASF transformation module that combines first-order and second-order difference information to convert time-series cloud data into spatially meaningful 2D representations, significantly improving the model's ability to characterize cloud boundaries and internal structures (see the second sketch after this list);
- The proposed SIG-ShapeFormer model demonstrates outstanding classification performance, achieving 99.36% overall accuracy on the LSCIDMR-S dataset, a significant improvement over the original ShapeFormer. It also outperforms ShapeFormer on 6 of the 8 benchmark datasets drawn from the UCM remote sensing dataset and the UEA archive, showcasing robust generalization across diverse data domains.
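To make the second contribution concrete, below is a minimal PyTorch sketch (our own, not the authors' code) of a multi-scale Inception-style 1D block: four parallel convolutional branches with kernel sizes 1, 3, 5, and 7, matching the 1 × 1 to 1 × 7 range above, whose outputs are concatenated along the channel axis into a 4C-channel feature map. The class and parameter names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MultiScaleInception1D(nn.Module):
    """Parallel 1D convolutions with kernel sizes 1/3/5/7, concatenated
    along the channel axis to form a simple multi-level feature pyramid."""
    def __init__(self, in_channels: int, branch_channels: int):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv1d(in_channels, branch_channels, kernel_size=k, padding=k // 2)
            for k in (1, 3, 5, 7)  # narrow kernels: local texture; wide: global context
        ])
        self.act = nn.GELU()  # GELU is listed in the paper's symbol table

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, V, T) multivariate time series; output: (B, 4*C, T).
        # Unsqueezing a height axis would give the (B, 4C, 1, T) shape
        # listed in the architecture table in Section 3.
        return self.act(torch.cat([b(x) for b in self.branches], dim=1))

x = torch.randn(8, 12, 64)                 # (B=8, V=12, T=64) dummy input
out = MultiScaleInception1D(12, 32)(x)
print(out.shape)                           # torch.Size([8, 128, 64])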
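The differential-enhanced GASF transform of the third contribution can be sketched in the same hedged spirit: the raw series and its first- and second-order differences are each converted to a Gramian Angular Summation Field and combined as image channels. Stacking the three fields as channels is our assumption; the paper may fuse the difference orders differently.

```python
import numpy as np

def gasf(x: np.ndarray) -> np.ndarray:
    """Gramian Angular Summation Field of a 1D series."""
    # Rescale to [-1, 1] so arccos is defined; the epsilon guards flat series.
    x = 2 * (x - x.min()) / (x.max() - x.min() + 1e-8) - 1
    phi = np.arccos(np.clip(x, -1.0, 1.0))
    return np.cos(phi[:, None] + phi[None, :])        # (T, T)

def differential_gasf(x: np.ndarray) -> np.ndarray:
    """Stack GASF images of the raw series and its first/second-order
    differences (padded back to length T) as a 3-channel image."""
    d1 = np.diff(x, n=1, prepend=x[0])
    d2 = np.diff(x, n=2, prepend=(x[0], x[0]))
    return np.stack([gasf(x), gasf(d1), gasf(d2)])    # (3, T, T)

series = np.sin(np.linspace(0, 4 * np.pi, 64))
img = differential_gasf(series)
print(img.shape)   # (3, 64, 64)
```

The differences act as discrete derivatives, so the extra channels emphasize abrupt transitions, which is the stated motivation for sharper cloud-boundary characterization.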
2. Data
2.1. Original Data Description
2.2. Data Preprocessing
3. Methods
3.1. Problem Formulation and Overall Model Structure
3.2. Shapelet-Based Feature Extraction Channel (S-Channel)
3.2.1. Shapelet Transformation
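This subsection's title names the classic shapelet transform: each candidate shapelet is slid across the series and the minimum subsequence distance becomes one feature. Below is a minimal NumPy sketch of that generic operation, not the paper's exact shapelet-discovery pipeline (which also yields the (B, K, d) embeddings listed in the architecture table).

```python
import numpy as np

def shapelet_distance(series: np.ndarray, shapelet: np.ndarray) -> float:
    """Minimum Euclidean distance between a shapelet and all equal-length
    subsequences of the series (the standard shapelet-transform feature)."""
    L = len(shapelet)
    windows = np.lib.stride_tricks.sliding_window_view(series, L)  # (T-L+1, L)
    return float(np.min(np.linalg.norm(windows - shapelet, axis=1)))

def shapelet_transform(series: np.ndarray, shapelets: list) -> np.ndarray:
    """Map one series to a K-dimensional vector of shapelet distances."""
    return np.array([shapelet_distance(series, s) for s in shapelets])

t = np.linspace(0, 2 * np.pi, 100)
x = np.sin(t)
print(shapelet_transform(x, [np.sin(t[:20]), np.cos(t[:20])]))
# small distance to the sine-shaped shapelet, larger to the cosine one
```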
3.2.2. Positional Encoding and Temporal Modeling
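Both the S- and I-channels attach positional encodings before their Transformer encoders (this heading and Section 3.3.2). As a reference point, here is the standard fixed sinusoidal encoding; whether SIG-ShapeFormer uses this fixed form or a learned variant is not stated in this outline, so treat it as an assumption.

```python
import torch

def sinusoidal_positional_encoding(length: int, dim: int) -> torch.Tensor:
    """Standard fixed sinusoidal encoding (Vaswani et al.), added to token
    embeddings before the Transformer encoder."""
    pos = torch.arange(length).unsqueeze(1).float()               # (T, 1)
    div = torch.exp(torch.arange(0, dim, 2).float()
                    * (-torch.log(torch.tensor(10000.0)) / dim))  # (dim/2,)
    pe = torch.zeros(length, dim)
    pe[:, 0::2] = torch.sin(pos * div)   # even dims: sine
    pe[:, 1::2] = torch.cos(pos * div)   # odd dims: cosine
    return pe

print(sinusoidal_positional_encoding(64, 128).shape)  # torch.Size([64, 128])
```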
3.3. Inception-Based Feature Extraction Channel (I-Channel)
3.3.1. Multi-Scale Inception Module Design
3.3.2. Positional Encoding and Temporal Modeling
3.4. GASF Transformation Channel with Differential Feature Fusion (G-Channel)
3.5. EAFM (Ensemble-Attention Feature Mixer) Module
4. Experiments and Results
4.1. Implementation Details
4.2. Evaluation Metrics
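The result tables in this section report Precision, Recall, F1-Score, and overall accuracy (OA). A small helper built on scikit-learn (our tooling choice, not necessarily the authors') reproduces those four columns; the use of macro averaging for the per-class metrics is an assumption, since the averaging scheme is not restated in this outline.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def evaluate(y_true, y_pred):
    """Overall accuracy plus (assumed macro-averaged) precision/recall/F1,
    matching the four columns of the result tables, in percent."""
    oa = accuracy_score(y_true, y_pred)
    p, r, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro", zero_division=0)
    return {"Precision": 100 * p, "Recall": 100 * r,
            "F1-Score": 100 * f1, "OA": 100 * oa}

print(evaluate([0, 1, 2, 2, 1], [0, 1, 2, 1, 1]))
```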
4.3. Comparative Experiment
4.4. Ablation Experiment
- Shapelet Feature Channel: The weight of this channel gradually decreases over training. It contributes most in the early stages, when its class-discriminative subsequences (such as unique shapelets of weather systems) help the model identify critical spatiotemporal patterns; its relative importance then diminishes as the other channels, GASF in particular, learn more robust multi-scale features.
- Multi-Scale Inception Channel: From epoch 100 onwards, the weight of this channel stabilizes and converges to approximately 0.3, indicating that its multi-scale convolutional kernels provide consistent support for capturing both fine and coarse spatiotemporal features.
- GASF Transformation Channel: As training progresses, the weight of this channel increases and stabilizes at around 0.6, demonstrating that its integration of first/second-order differential information with spatial structural features is crucial for the final classification accuracy. (A minimal sketch of how such normalized channel weights can be realized follows this list.)
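The weight trajectories above (roughly 0.1 / 0.3 / 0.6 at convergence) are consistent with the EAFM maintaining normalized attention over the three base classifiers. Below is a deliberately minimal stand-in, assuming a single learnable logit per channel followed by a softmax; the actual EAFM module is richer than this sketch, and the names are ours.

```python
import torch
import torch.nn as nn

class EAFM(nn.Module):
    """Minimal ensemble-attention fusion: softmax-normalized learnable
    weights over the three per-channel classifier outputs of shape (B, N).
    A stand-in for the paper's EAFM, not its full design."""
    def __init__(self, num_channels: int = 3):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(num_channels))

    def forward(self, c_shape, c_temporal, c_space):
        stacked = torch.stack([c_shape, c_temporal, c_space])  # (3, B, N)
        w = torch.softmax(self.logits, dim=0)                  # channel weights
        return torch.einsum("c,cbn->bn", w, stacked)           # fused (B, N)

# The converged weights reported above (~0.1 / 0.3 / 0.6) would correspond
# to w after training; at initialization w is uniform (1/3 each).
fused = EAFM()(torch.randn(8, 11), torch.randn(8, 11), torch.randn(8, 11))
print(fused.shape)  # torch.Size([8, 11])
```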
4.5. Generalization Experiment
5. Discussion and Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Zhuang, Z.; Wang, M.; Wang, K.; Li, S.; Wu, J. Research progress of deep learning-based cloud classification. Nanjing Xinxi Gongcheng Daxue Xuebao 2022, 14, 566–578.
- Yashuai, F.; Wenhao, Z.; Yongtao, J.; Qiyue, L.; Lili, Z.; Fangfei, B.; Yu, M. Research progress of cloud classification based on optical satellite remote sensing. J. Atmos. Environ. Opt. 2025, 20, 1.
- Cheng, Z.; Wang, H.; Bai, J. A Review of Ground-Based Retrieval of Cloud Microphysical Parameters. Meteorol. Sci. Technol. 2007, 35, 9–14.
- Ding, L.; Xia, M.; Lin, H.; Hu, K. Multi-level attention interactive network for cloud and snow detection segmentation. Remote Sens. 2023, 16, 112.
- Wang, Y.; Shu, Z.; Feng, Y.; Liu, R.; Cao, Q.; Li, D.; Wang, L. Enhancing Cross-Domain Remote Sensing Scene Classification by Multi-Source Subdomain Distribution Alignment Network. Remote Sens. 2025, 17, 1302.
- Yang, C.; Yuan, Z.; Gu, S. Cloud classification of GMS-5 satellite imagery by the use of multispectral threshold technique. J. Nanjing Inst. Meteorol. 2002, 25, 747–754.
- Zhou, X.; Yang, X.; Yao, X. The study of cloud classification and detection in remote sensing image. J. Graph. 2014, 35, 768.
- Li, J.; Menzel, W.P.; Yang, Z.; Frey, R.A.; Ackerman, S.A. High-spatial-resolution surface and cloud-type classification from MODIS multispectral band measurements. J. Appl. Meteorol. 2003, 42, 204–226.
- Wu, Y.; Zhang, R.; Jiang, G.; Sun, Z.; Niu, S. A fuzzy clustering method for multi-spectral satellite images. J. Trop. Meteorol. 2004, 20, 689–696.
- Li, Q.; Zhang, Z.; Lu, W.; Yang, J.; Ma, Y.; Yao, W. From pixels to patches: A cloud classification method based on a bag of micro-structures. Atmos. Meas. Tech. 2016, 9, 753–764.
- Yu, Z.; Ma, S.; Han, D.; Li, G.; Gao, D.; Yan, W. A cloud classification method based on random forest for FY-4A. Int. J. Remote Sens. 2021, 42, 3353–3379.
- Tan, Z.; Liu, C.; Ma, S.; Wang, X.; Shang, J.; Wang, J.; Ai, W.; Yan, W. Detecting multilayer clouds from the geostationary advanced Himawari imager using machine learning techniques. IEEE Trans. Geosci. Remote Sens. 2021, 60, 4103112.
- Zhang, C.; Zhuge, X.; Yu, F. Development of a high spatiotemporal resolution cloud-type classification approach using Himawari-8 and CloudSat. Int. J. Remote Sens. 2019, 40, 6464–6481.
- Shan, W.; Chu-yi, X.; Chun-xiang, S.; Ying, Z. Study on Cloud Classification Method of Satellite Cloud Images Based on CNN-LSTM. Comput. Sci. 2022, 49, 675–679.
- Zhang, J.; Liu, P.; Zhang, F.; Song, Q. CloudNet: Ground-based cloud classification with deep convolutional neural network. Geophys. Res. Lett. 2018, 45, 8665–8672.
- Liu, S.; Li, M.; Zhang, Z.; Cao, X.; Durrani, T.S. Ground-based cloud classification using task-based graph convolutional network. Geophys. Res. Lett. 2020, 47, e2020GL087338.
- Afzali Gorooh, V.; Kalia, S.; Nguyen, P.; Hsu, K.-L.; Sorooshian, S.; Ganguly, S.; Nemani, R.R. Deep neural network cloud-type classification (DeepCTC) model and its application in evaluating PERSIANN-CCS. Remote Sens. 2020, 12, 316.
- Chen, L.; Xu, D.; Yang, L.; Ng, C.T.; Fu, J.; He, Y.; He, Y. Classification and identification of extreme wind events by CNNs based on Shapelets and improved GASF-GADF. J. Wind Eng. Ind. Aerodyn. 2024, 253, 105852.
- Li, K.; Yan, H.; Gu, J.; Su, L.; Su, W.; Xue, Z. A Multi-Source Transfer Learning Method for Rolling Bearing Fault Diagnosis Based on Shapelet Time Series. Chin. J. Mech. Eng. 2022, 33, 2990–2996.
- Arul, M.; Kareem, A. Applications of shapelet transform to time series classification of earthquake, wind and wave data. Eng. Struct. 2021, 228, 111564.
- Roscher, R.; Waske, B. Shapelet-based sparse representation for landcover classification of hyperspectral images. IEEE Trans. Geosci. Remote Sens. 2015, 54, 1623–1634.
- He, W.; Cheng, M.; Liu, Q.; Li, Z. ShapeWordNet: An interpretable shapelet neural network for physiological signal classification. In Proceedings of the International Conference on Database Systems for Advanced Applications, Tianjin, China, 17–20 April 2023; pp. 353–369.
- Duan, Y.; Song, C.; Zhang, Y.; Cheng, P.; Mei, S. STMSF: Swin Transformer with Multi-Scale Fusion for Remote Sensing Scene Classification. Remote Sens. 2025, 17, 668.
- Zheng, F.; Lin, S.; Zhou, W.; Huang, H. A lightweight dual-branch Swin Transformer for remote sensing scene classification. Remote Sens. 2023, 15, 2865.
- Dang, G.; Mao, Z.; Zhang, T.; Liu, T.; Wang, T.; Li, L.; Gao, Y.; Tian, R.; Wang, K.; Han, L. Joint superpixel and Transformer for high resolution remote sensing image classification. Sci. Rep. 2024, 14, 5054.
- Chen, K.; Liu, C.; Chen, H.; Zhang, H.; Li, W.; Zou, Z.; Shi, Z. RSPrompter: Learning to prompt for remote sensing instance segmentation based on visual foundation model. IEEE Trans. Geosci. Remote Sens. 2024, 62, 1–17.
- Chen, K.; Liu, C.; Chen, B.; Li, W.; Zou, Z.; Shi, Z. DynamicVis: An efficient and general visual foundation model for remote sensing image understanding. arXiv 2025, arXiv:2503.16426.
- Chen, K.; Chen, B.; Liu, C.; Li, W.; Zou, Z.; Shi, Z. RSMamba: Remote sensing image classification with state space model. IEEE Geosci. Remote Sens. Lett. 2024, 21, 8002605.
- Bao, M.; Lyu, S.; Xu, Z.; Zhou, H.; Ren, J.; Xiang, S.; Li, X.; Cheng, G. Vision Mamba in Remote Sensing: A Comprehensive Survey of Techniques, Applications and Outlook. arXiv 2025, arXiv:2505.00630.
- Paranjape, J.N.; De Melo, C.; Patel, V.M. A Mamba-based Siamese network for remote sensing change detection. In Proceedings of the 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Tucson, AZ, USA, 28 February–4 March 2025; pp. 1186–1196.
- Bai, C.; Zhao, D.; Zhang, M.; Zhang, J. Multimodal information fusion for weather systems and clouds identification from satellite images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 7333–7345.
- An, S.; Zhang, L.; Li, X.; Zhang, G.; Li, P.; Zhao, K.; Ma, H.; Lian, Z. Global–Local Feature Fusion of Swin Kansformer Novel Network for Complex Scene Classification in Remote Sensing Images. Remote Sens. 2025, 17, 1137.
- Zhao, H.; Lu, Z.; Sun, S.; Wang, P.; Jia, T.; Xie, Y.; Xu, F. Classification of Large Scale Hyperspectral Remote Sensing Images Based on LS3EU-Net++. Remote Sens. 2025, 17, 872.
- Bai, C.; Zhang, M.; Zhang, J.; Zheng, J.; Chen, S. LSCIDMR: Large-scale satellite cloud image database for meteorological research. IEEE Trans. Cybern. 2021, 52, 12538–12550.
- Yuan, L.; Chen, Y.; Wang, T.; Yu, W.; Shi, Y.; Jiang, Z.H.; Tay, F.E.; Feng, J.; Yan, S. Tokens-to-Token ViT: Training vision transformers from scratch on ImageNet. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 558–567.
- Le, X.M.; Luo, L.; Aickelin, U.; Tran, M.T. ShapeFormer: Shapelet transformer for multivariate time series classification. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Barcelona, Spain, 25–29 August 2024; pp. 1484–1494.
- Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; Zhang, W. Informer: Beyond efficient transformer for long sequence time-series forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, Online, 2–9 February 2021; Volume 35, pp. 11106–11115.
- Cui, Y.; Wang, R.; Si, Y.; Zhang, S.; Wang, Y.; Lin, A. T-type inverter fault diagnosis based on GASF and improved AlexNet. Energy Rep. 2023, 9, 2718–2731.
- Su, B.; Liu, D.; Chen, Q.; Han, D.; Wu, J. A Method for Identifying Wheat Stripe Rust Resistance Levels Based on Time Series Vegetation Index. Trans. Chin. Soc. Agric. Eng. 2024, 40, 155–165.
- Yu, D.; Zhang, B.; Zhao, C.; Guo, H.; Lu, J. Remote Sensing Image Scene Classification Using Convolutional Neural Networks and Ensemble Learning. J. Remote Sens. 2021, 24, 717–727.
- Juba, B.; Le, H.S. Precision-recall versus accuracy and the role of large data sets. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 4039–4048.
- Sekrecka, A.; Karwowska, K. Classical vs. Machine Learning-Based Inpainting for Enhanced Classification of Remote Sensing Image. Remote Sens. 2025, 17, 1305.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708.
- Chen, X.; Zheng, X.; Zhang, Y.; Lu, X. Remote Sensing Scene Classification by Local–Global Mutual Learning. IEEE Geosci. Remote Sens. Lett. 2022, 19, 6506405.
- Shi, C.; Wang, T.; Wang, L. Branch feature fusion convolution network for remote sensing scene classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 5194–5210.
- Tang, F.; Ding, J.; Wang, L.; Ning, C.; Zhou, S.K. CMUNeXt: An Efficient Medical Image Segmentation Network based on Large Kernel and Skip Fusion. arXiv 2023, arXiv:2308.01239.
- Li, Y.; Yan, B.; Hou, J.; Bai, B.; Huang, X.; Xu, C.; Fang, L. UNet based on dynamic convolution decomposition and triplet attention. Sci. Rep. 2024, 14, 271.
- Zhang, C.; Wang, B. Progressive Feature Fusion Framework Based on Graph Convolutional Network for Remote Sensing Scene Classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 3270–3284.
- Zhang, X.; Gao, Y.; Lin, J.; Lu, C.T. TapNet: Multivariate time series classification with attentional prototypical network. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 6845–6852.
- Karim, F.; Majumdar, S.; Darabi, H.; Harford, S. Multivariate LSTM-FCNs for time series classification. Neural Netw. 2019, 116, 237–245.
- Foumani, N.M.; Tan, C.W.; Webb, G.I.; Salehi, M. Improving position encoding of transformers for multivariate time series classification. Data Min. Knowl. Discov. 2024, 38, 22–48.
- Zerveas, G.; Jayaraman, S.; Patel, D.; Bhamidipaty, A.; Eickhoff, C. A transformer-based framework for multivariate time series representation learning. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, Online, 14–18 August 2021; pp. 2114–2124.
- Zuo, R.; Li, G.; Choi, B.; Bhowmick, S.S.; Mah, D.N.-Y.; Wong, G.L. SVP-T: A shape-level variable-position transformer for multivariate time series classification. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; Volume 37, pp. 11497–11505.
- Bagnall, A.; Dau, H.A.; Lines, J.; Flynn, M.; Large, J.; Bostrom, A.; Southam, P.; Keogh, E. The UEA multivariate time series classification archive. arXiv 2018, arXiv:1811.00075.
Dataset Type | Dataset | Classes | Total | Sample Size |
---|---|---|---|---|
Remote Sensing | LSCIDMR-S | 11 | 104,390 | 256 × 256 |
Remote Sensing | UCM | 21 | 2100 | 256 × 256 |
UEA | BasicMotions | 4 | 80 | 6 × 100 |
UEA | Epilepsy | 4 | 275 | 3 × 206 |
UEA | Heartbeat | 2 | 409 | 61 × 405 |
UEA | StandWalkJump | 3 | 27 | 4 × 2500 |
UEA | Libras | 15 | 360 | 2 × 45 |
UEA | DuckDuckGeese | 5 | 100 | 1345 × 270 |
UEA | RacketSports | 4 | 303 | 6 × 30 |
Type | Number in LSCIDMR-S | Ratio |
---|---|---|
Desert | 4518 | 4.32% |
Extratropical Cyclone | 4984 | 4.77% |
Frontal Surface | 634 | 0.61% |
High-Ice Cloud | 5278 | 5.05% |
Low-Water Cloud | 1774 | 1.70% |
Ocean | 4042 | 3.87% |
Snow | 7631 | 7.31% |
Tropical Cyclone | 3305 | 3.17% |
Vegetation | 7831 | 7.50% |
Westerly Jet | 628 | 0.60% |
PatchElse | 63,765 | 61.07% |
Total Number | 104,390 | 100% |
Processing Stage | S-Channel | I-Channel | G-Channel |
---|---|---|---|
Input Shape | X: (B, V, T) | X: (B, V, T) | X: (B, V, T) |
Feature Extraction | Shapelet transform, I: (B, K, d) | Inception, V: (B, 4C, 1, T) | Differential GASF images: (B, V, T, T) |
Feature Encoding | Transformer encoder, F_shape: (B, E) | Transformer encoder, F_temporal: (B, E) | ResNet, F_space: (B, E) |
Base Classifier | C_shape: (B, N) | C_temporal: (B, N) | C_space: (B, N) |
Feature Fusion | EAFM, F_fused: (B, N) (shared across all three channels) | | |
Model Output | Y: (B, N) (shared) | | |
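To make the shape bookkeeping in this table concrete, the snippet below threads dummy tensors through the three channels using the symbols defined in the next table; every module is a placeholder, and the concrete values of B, V, T, K, d, C, E, and N are illustrative only.

```python
import torch

B, V, T = 8, 12, 64                 # batch, variables, time steps (illustrative)
K, d, C, E, N = 20, 32, 32, 128, 11

x = torch.randn(B, V, T)                      # model input X: (B, V, T)
shapelet_tokens = torch.randn(B, K, d)        # S-channel: shapelet embeddings
inception_maps = torch.randn(B, 4 * C, 1, T)  # I-channel: 4 parallel branches
gasf_images = torch.randn(B, V, T, T)         # G-channel: one T x T image per variable

f_shape = torch.randn(B, E)      # after the S-channel Transformer encoder
f_temporal = torch.randn(B, E)   # after the I-channel Transformer encoder
f_space = torch.randn(B, E)      # after the G-channel ResNet

c_shape, c_temporal, c_space = (torch.randn(B, N) for _ in range(3))
y = (c_shape + c_temporal + c_space) / 3      # EAFM stand-in: fused (B, N)
print(y.shape)                                # torch.Size([8, 11])
```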
Symbol | Definition |
---|---|
B | Batch size, i.e., the number of samples in each mini-batch during model training. |
V | Feature dimension of the input multivariate time series, i.e., the number of features obtained from satellite cloud images. |
T | Temporal length of the time series. Each time-step corresponds to one satellite cloud image. |
K | Total number of Shapelets extracted per class. |
d | Embedding dimension of Shapelet features. |
E | Dimension of feature vectors output by the encoder from each branch, used as input to base classifiers. |
N | Total number of target classes (11 in this study). |
X | Model input: a multivariate time-series representation of satellite image sequences, with shape (B, V, T). |
F_shape, F_temporal, F_space | Feature vectors output by the Shapelet, Inception, and GASF branches, respectively; each has dimension E. |
F_fused | Final fused representation obtained using the Ensemble-Attention Feature Mixer (EAFM). |
Y | Predicted output label vector of the model. |
GASF | Gramian Angular Summation Field—converts time series into 2D spatial images. |
GELU | Gaussian Error Linear Unit—a smooth, non-linear activation function commonly used in deep learning. |
Type | Train | Test | Total Number |
---|---|---|---|
Desert | 400 | 100 | 500 |
Extratropical Cyclone | 400 | 100 | 500 |
Frontal Surface | 200 | 50 | 250 |
High-Ice Cloud | 400 | 100 | 500 |
Low-Water Cloud | 400 | 100 | 500 |
Ocean | 400 | 100 | 500 |
Snow | 400 | 100 | 500 |
Tropical Cyclone | 400 | 100 | 500 |
Vegetation | 400 | 100 | 500 |
Westerly Jet | 200 | 50 | 250 |
PatchElse | 400 | 100 | 500 |
Category | Method | Precision (%) | Recall (%) | F1-Score (%) | OA (%) |
---|---|---|---|---|---|
CNN | ResNet-50 | 87.72 ± 0.17 | 85.30 ± 0.17 | 86.23 ± 0.15 | 85.30 ± 0.23 |
CNN | DenseNet | 88.31 ± 0.10 | 87.65 ± 0.13 | 87.75 ± 0.11 | 87.65 ± 0.07 |
CNN | PCNet | 86.81 ± 0.22 | 73.52 ± 0.23 | 77.51 ± 0.20 | 73.52 ± 0.21 |
CNN | LCNN-BFF | 89.07 ± 0.24 | 84.47 ± 0.35 | 86.05 ± 0.20 | 84.47 ± 0.26 |
CNN | CMUNeXt | 90.80 ± 0.13 | 88.40 ± 0.12 | 87.30 ± 0.12 | 88.40 ± 0.11 |
CNN | DTA-Unet | 92.20 ± 0.22 | 90.60 ± 0.29 | 89.90 ± 0.17 | 90.57 ± 0.20 |
CNN | PFFGCN | 93.48 ± 0.41 | 93.50 ± 0.48 | 93.50 ± 0.39 | 93.50 ± 0.40 |
LSTM | TapNet | 90.16 ± 0.29 | 82.74 ± 0.31 | 85.76 ± 0.28 | 82.74 ± 0.35 |
LSTM | MLSTM-FCNs | 80.24 ± 0.35 | 76.35 ± 0.42 | 78.01 ± 0.40 | 76.35 ± 0.42 |
Transformer | ViT | 83.81 ± 0.21 | 72.31 ± 0.20 | 77.38 ± 0.16 | 72.31 ± 0.13 |
Transformer | ConvTran | 89.36 ± 0.20 | 85.94 ± 0.22 | 87.12 ± 0.19 | 85.94 ± 0.20 |
Transformer | TST | 91.20 ± 0.31 | 93.60 ± 0.24 | 92.30 ± 0.20 | 93.57 ± 0.16 |
Transformer | SVP-T | 93.40 ± 0.19 | 93.90 ± 0.15 | 93.80 ± 0.16 | 93.92 ± 0.18 |
Transformer | ShapeFormer | 96.88 ± 0.27 | 97.10 ± 0.21 | 96.90 ± 0.20 | 97.14 ± 0.19 |
Ours | SIG-ShapeFormer | 99.46 ± 0.23 | 99.40 ± 0.17 | 99.40 ± 0.19 | 99.36 ± 0.20 |
Feature Fusion Strategy | Precision (%) | Recall (%) | F1-Score (%) | OA (%) |
---|---|---|---|---|
Trichannel + Concat | 98.57 | 98.50 | 98.46 | 98.28 |
Trichannel + Stacking | 99.00 | 98.85 | 98.80 | 98.85 |
EAFM (Proposed) | 99.46 | 99.40 | 99.40 | 99.36 |
Method | Training Time (min) | Trainable Parameters (M) |
---|---|---|
ResNet-50 | 1.6633 | 23.53 |
DenseNet | 0.4267 | 0.77 |
PCNet | 0.6477 | 42.94 |
LCNN-BFF | 2.1755 | 9.42 |
CMUNeXt | 0.7548 | 1.25 |
DTA-Unet | 28.0667 | 82.43 |
PFFGCN | 37.3333 | 105.73 |
TapNet | 0.3267 | 159.40 |
MLSTM-FCNs | 0.1813 | 6.63 |
ViT | 0.8757 | 85.81 |
ConvTran | 3.4000 | 16.80 |
TST | 1.2000 | 1.08 |
SVP-T | 0.8010 | 0.06 |
ShapeFormer | 1.5000 | 5.20 |
Ours | 3.5000 | 7.98 |
Inception | GASF | FD-Order | EAFM | Precision (%) | Recall (%) | F1-Score (%) | OA (%) |
---|---|---|---|---|---|---|---|
× | × | ✓ | ✓ | 99.29 | 99.30 | 99.30 | 99.28 |
✓ | × | × | ✓ | 98.70 | 98.60 | 98.60 | 98.57 |
✓ | ✓ | × | ✓ | 99.30 | 99.28 | 99.28 | 99.29 |
✓ | × | ✓ | ✓ | 99.46 | 99.40 | 99.40 | 99.36 |
✓ | × | ✓ | × | 98.80 | 98.60 | 98.60 | 98.57 |
Method | Precision (%) | Recall (%) | F1-Score (%) | OA (%) |
---|---|---|---|---|
ResNet-50 | 89.27 | 87.86 | 88.08 | 87.86 |
DenseNet | 92.38 | 93.22 | 92.39 | 92.38 |
PCNet | 88.50 | 87.60 | 87.30 | 87.60 |
LCNN-BFF | 87.48 | 88.02 | 87.70 | 87.73 |
CMUNeXt | 92.52 | 92.31 | 92.41 | 92.52 |
DTA-Unet | 91.43 | 92.28 | 91.33 | 91.43 |
PFFGCN | 89.33 | 88.10 | 88.14 | 88.10 |
ViT | 79.52 | 80.14 | 78.38 | 79.52 |
ShapeFormer | 92.30 | 92.25 | 92.31 | 92.30 |
Ours | 93.50 | 94.10 | 93.52 | 93.50 |
Method | BasicMotions | Epilepsy | DuckDuckGeese | Heartbeat | StandWalkJump | Libras | RacketSports |
---|---|---|---|---|---|---|---|
TapNet | 94.50 | 88.60 | 57.10 | 71.80 | 42.20 | 83.50 | 84.10 |
MLSTM-FCNs | 89.20 | 76.10 | 62.50 | 66.30 | 12.60 | 85.60 | 80.30 |
ConvTran | 98.30 | 93.00 | 61.20 | 72.42 | 33.30 | 87.00 | 86.20 |
TST | 94.80 | 86.70 | 42.40 | 68.40 | 20.00 | 84.44 | 83.10 |
SVP-T | 95.10 | 92.80 | 51.42 | 71.20 | 46.60 | 85.10 | 81.00 |
ShapeFormer | 97.30 | 93.47 | 60.42 | 72.68 | 52.22 | 89.44 | 84.21 |
Ours | 98.17 | 89.13 | 58.42 | 77.42 | 53.33 | 94.23 | 86.84 |