Search Results (3)

Search Parameters:
Keywords = static transformer temporal differential network

32 pages, 8835 KB  
Article
SIG-ShapeFormer: A Multi-Scale Spatiotemporal Feature Fusion Network for Satellite Cloud Image Classification
by Xuan Liu, Zhenyu Lu, Bingjian Lu, Zhuang Li, Zhongfeng Chen and Yongjie Ma
Remote Sens. 2025, 17(12), 2034; https://doi.org/10.3390/rs17122034 - 12 Jun 2025
Viewed by 1843
Abstract
Satellite cloud images exhibit complex multidimensional characteristics, including spectral, textural, and spatiotemporal dynamics. The temporal evolution of cloud systems plays a crucial role in accurate classification, particularly when multiple weather systems coexist. However, most existing models—such as those based on convolutional neural networks (CNNs), Transformer architectures, and their variants like Swin Transformer—primarily focus on spatial modeling of static images and do not explicitly incorporate temporal information, thereby limiting their ability to integrate spatiotemporal features effectively. To address this limitation, we propose SIG-ShapeFormer, a novel classification model specifically designed for satellite cloud images with temporal continuity. To the best of our knowledge, this work is the first to transform satellite cloud data into multivariate time series and introduce a unified framework for multi-scale and multimodal feature fusion. SIG-ShapeFormer consists of three core components: (1) a Shapelet-based module that captures discriminative and interpretable local temporal patterns; (2) a multi-scale Inception module combining 1D convolutions and Transformer encoders to extract temporal features across different scales; and (3) a differentially enhanced Gramian Angular Summation Field (GASF) module that converts time series into 2D texture representations, significantly improving the recognition of internal cloud structures. Experimental results demonstrate that SIG-ShapeFormer achieves a classification accuracy of 99.36% on the LSCIDMR-S dataset, outperforming the original ShapeFormer by 2.2% and surpassing other CNN- and Transformer-based models. Moreover, the model generalizes well to the UCM remote sensing dataset and several benchmark tasks from the UEA time-series archive. SIG-ShapeFormer is particularly suitable for remote sensing applications involving continuous temporal sequences, such as extreme weather warnings and dynamic cloud-system monitoring. However, it relies on temporally coherent input data and may perform suboptimally on datasets with limited or irregular temporal resolution.
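The GASF transform mentioned in component (3) is a standard time-series-to-image technique, so a compact illustration may help. The following is a minimal sketch of how a 1D series can be rescaled, mapped to polar angles, and expanded into the 2D texture image such a module consumes; the function name, rescaling choice, and toy input are assumptions for illustration, not the paper's code.

```python
# Minimal sketch of a Gramian Angular Summation Field (GASF) transform.
import numpy as np

def gasf(series: np.ndarray) -> np.ndarray:
    """Convert a 1D series into its Gramian Angular Summation Field image."""
    x = np.asarray(series, dtype=float)
    # Rescale to [-1, 1] so arccos is defined.
    x_scaled = 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0
    # Guard against tiny numerical overshoot before arccos.
    phi = np.arccos(np.clip(x_scaled, -1.0, 1.0))
    # GASF[i, j] = cos(phi_i + phi_j): a symmetric 2D "texture" of the series.
    return np.cos(phi[:, None] + phi[None, :])

# Example: a 64-step synthetic series becomes a 64x64 image.
image = gasf(np.sin(np.linspace(0, 4 * np.pi, 64)))
print(image.shape)  # (64, 64)
```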

19 pages, 1789 KB  
Article
Optimization of Temporal Feature Attribution and Sequential Dependency Modeling for High-Precision Multi-Step Resource Forecasting: A Methodological Framework and Empirical Evaluation
by Jiaqi Shen, Peiwen Qin, Rui Zhong and Peiyao Han
Mathematics 2025, 13(8), 1339; https://doi.org/10.3390/math13081339 - 19 Apr 2025
Viewed by 751
Abstract
This paper presents a comprehensive time-series analysis framework leveraging the Temporal Fusion Transformer (TFT) architecture to address the challenge of multi-horizon forecasting in complex ecological systems, specifically focusing on global fishery resources. Using global fishery data spanning 70 years (1950–2020), enhanced with key climate indicators, we develop a methodology for predicting time-dependent patterns across three-year, five-year, and extended seven-year horizons. Our approach integrates static metadata with temporal features, including historical catch and climate data, through a specialized architecture incorporating variable selection networks, multi-head attention mechanisms, and bidirectional encoding layers. A comparative analysis demonstrates the TFT model’s robust performance against traditional methods (ARIMA), standard deep learning models (MLP, LSTM), and contemporary architectures (TCN, XGBoost). While competitive across all horizons, TFT excels in the seven-year forecast, achieving a mean absolute percentage error (MAPE) of 13.7% and outperforming the next best model (LSTM, 15.1%). Through a sensitivity analysis, we identify the optimal temporal granularity and historical context length for maximizing prediction accuracy. The variable selection component reveals differential weighting, with recent market observations (past 1-year catch: 31%) and climate signals (ONI index: 15%, SST anomaly: 10%) playing significant roles. A species-specific analysis uncovers variations in predictability patterns, and ablation experiments quantify the contributions of the architectural components. The proposed methodology offers practical applications for resource management and theoretical insights into modeling temporal dependencies in complex ecological data.
(This article belongs to the Special Issue Deep Neural Network: Theory, Algorithms and Applications)
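For reference, the headline metric in the abstract (MAPE) is straightforward to reproduce. The sketch below shows how such a multi-horizon error would typically be computed in NumPy; the function name and the example arrays are illustrative assumptions, not data or code from the paper.

```python
# Minimal sketch of mean absolute percentage error (MAPE) over a forecast horizon.
import numpy as np

def mape(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """MAPE in percent, averaged over all horizon steps."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)

# Example with made-up catch values for a seven-step (seven-year) horizon.
actual = np.array([92.4, 90.1, 88.7, 87.2, 86.0, 85.5, 84.9])
predicted = np.array([90.0, 91.5, 86.0, 88.0, 84.5, 87.0, 83.0])
print(f"MAPE: {mape(actual, predicted):.1f}%")
```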

18 pages, 1165 KB  
Article
Appearance-Based Gaze Estimation Method Using Static Transformer Temporal Differential Network
by Yujie Li, Longzhao Huang, Jiahui Chen, Xiwen Wang and Benying Tan
Mathematics 2023, 11(3), 686; https://doi.org/10.3390/math11030686 - 29 Jan 2023
Cited by 9 | Viewed by 4229
Abstract
Gaze behavior is an important and non-invasive source of human–computer interaction information that plays a significant role in many fields, including skills transfer, psychology, and human–computer interaction. Recently, improving the performance of appearance-based gaze estimation with deep learning techniques has attracted increasing attention; however, several key problems in these deep-learning-based methods remain. First, the feature fusion stage is not fully considered: existing methods simply concatenate the different extracted features into one feature vector, without modeling their internal relationships. Second, dynamic features are difficult to learn because the extraction process for ambiguously defined dynamic features is unstable. In this study, we propose a novel method that addresses both the feature fusion and the dynamic feature extraction problems. We propose the static transformer module (STM), which uses a multi-head self-attention mechanism to fuse fine-grained eye features and coarse-grained facial features, and an innovative recurrent neural network (RNN) cell, the temporal differential module (TDM), which extracts dynamic features. We integrate the STM and the TDM into the static transformer temporal differential network (STTDN). We evaluate STTDN on two publicly available datasets (MPIIFaceGaze and Eyediap) and demonstrate the effectiveness of the STM and the TDM. Our results show that the proposed STTDN outperforms state-of-the-art methods, including on the Eyediap dataset (by 2.9%).
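The STM's core idea, fusing eye and face features with attention rather than plain concatenation, can be sketched compactly. The following minimal illustration uses PyTorch's built-in multi-head attention; the token layout, feature dimension, and class name are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of attention-based fusion of eye and face feature vectors.
import torch
import torch.nn as nn

class FeatureFusion(nn.Module):
    def __init__(self, dim: int = 128, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, eye_left, eye_right, face):
        # Stack per-source feature vectors as a 3-token sequence: (B, 3, dim).
        tokens = torch.stack([eye_left, eye_right, face], dim=1)
        # Self-attention lets each source attend to the others before fusion.
        fused, _ = self.attn(tokens, tokens, tokens)
        # Pool the attended tokens into one fused representation: (B, dim).
        return fused.mean(dim=1)

# Example with batch size 2 and hypothetical 128-dim backbone features.
fusion = FeatureFusion()
out = fusion(torch.randn(2, 128), torch.randn(2, 128), torch.randn(2, 128))
print(out.shape)  # torch.Size([2, 128])
```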
