Search Results (248)

Search Parameters:
Keywords = spatio-temporal image fusion

19 pages, 709 KiB  
Article
Fusion of Multimodal Spatio-Temporal Features and 3D Deformable Convolution Based on Sign Language Recognition in Sensor Networks
by Qian Zhou, Hui Li, Weizhi Meng, Hua Dai, Tianyu Zhou and Guineng Zheng
Sensors 2025, 25(14), 4378; https://doi.org/10.3390/s25144378 - 13 Jul 2025
Viewed by 127
Abstract
Sign language is a complex and dynamic visual language that requires the coordinated movement of various body parts, such as the hands, arms, and limbs, making it an ideal application domain for sensor networks to capture and interpret human gestures accurately. To address the intricate task of precise and expedient sign language recognition (SLR) from raw videos, this study introduces a novel deep learning approach built on a multimodal framework. Specifically, feature extraction models are built for two modalities: skeleton and RGB images. We first propose a Multi-Stream Spatio-Temporal Graph Convolutional Network (MSGCN) that relies on three modules: a decoupling graph convolutional network, a self-emphasizing temporal convolutional network, and a spatio-temporal joint attention module. Together, these modules capture the spatio-temporal information in multi-stream skeleton features. Second, we propose a 3D ResNet model based on deformable convolution (D-ResNet) to model complex spatial and temporal sequences in the raw images. Finally, a gating-mechanism-based Multi-Stream Fusion Module (MFM) merges the results of the two modalities. Extensive experiments on the public AUTSL and WLASL datasets achieve results competitive with state-of-the-art systems.
(This article belongs to the Special Issue Intelligent Sensing and Artificial Intelligence for Image Processing)
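The gating-based Multi-Stream Fusion Module described in the abstract combines skeleton and RGB streams. The listing does not include the authors' implementation, so the following is only a minimal sketch of a generic gated fusion of two modality embeddings; the class name, feature size, and sigmoid gate are illustrative assumptions, not the paper's code.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Toy gated fusion of two modality embeddings (illustrative only)."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, skel_feat, rgb_feat):
        # Per-channel gate in [0, 1] computed from both modalities.
        g = self.gate(torch.cat([skel_feat, rgb_feat], dim=-1))
        # Convex combination of the two streams.
        return g * skel_feat + (1 - g) * rgb_feat

fused = GatedFusion(512)(torch.randn(8, 512), torch.randn(8, 512))
print(fused.shape)  # torch.Size([8, 512])
```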

20 pages, 26018 KiB  
Article
An Accuracy Assessment of the ESTARFM Data-Fusion Model in Monitoring Lake Dynamics
by Can Peng, Yuanyuan Liu, Liwen Chen, Yanfeng Wu, Jingxuan Sun, Yingna Sun, Guangxin Zhang, Yuxuan Zhang, Yangguang Wang, Min Du and Peng Qi
Water 2025, 17(14), 2057; https://doi.org/10.3390/w17142057 - 9 Jul 2025
Viewed by 209
Abstract
High-spatiotemporal-resolution remote sensing data are of great significance for surface monitoring. However, existing remote sensing data cannot simultaneously meet the demands for high temporal and spatial resolution, and spatiotemporal fusion algorithms are an effective solution to this problem. Among these, the ESTARFM (Enhanced Spatial and Temporal Adaptive Reflectance Fusion Model) algorithm has been widely used to fuse multi-source remote sensing data into high-spatiotemporal-resolution imagery, owing to its robustness. However, most existing studies have applied ESTARFM only to single-surface-element data and have paid little attention to multi-band fusion and its accuracy. For this reason, this study selects Chagan Lake as the study area and conducts a detailed evaluation of the performance of ESTARFM in fusing six bands (visible, near-infrared, infrared, and far-infrared) using metrics such as the correlation coefficient and Root Mean Square Error (RMSE). The results show that (1) the ESTARFM fusion image is highly consistent with the clear-sky Landsat image, with coefficients of determination (R2) exceeding 0.8 for all six bands; (2) the Normalized Difference Vegetation Index (NDVI) (R2 = 0.87, RMSE = 0.023) and the Normalized Difference Water Index (NDWI) (R2 = 0.93, RMSE = 0.022) derived from the ESTARFM fusion data are closely aligned with the real values; and (3) the evaluation of the individual bands across land-use types shows generally favorable R2 values. This study extends the application of ESTARFM to inland water monitoring and can be applied to scenarios similar to Chagan Lake, facilitating the acquisition of high-frequency water-quality information.
(This article belongs to the Special Issue Drought Evaluation Under Climate Change Condition)
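The evaluation above rests on standard band indices and agreement metrics. As a reference only, here is a small generic sketch of NDVI, NDWI, RMSE, and R2 on synthetic reflectance arrays; the arrays are placeholders, not the study's ESTARFM or Landsat data.

```python
import numpy as np

def ndvi(nir, red):
    return (nir - red) / (nir + red + 1e-9)

def ndwi(green, nir):
    # McFeeters NDWI for open-water mapping.
    return (green - nir) / (green + nir + 1e-9)

def rmse(pred, ref):
    return np.sqrt(np.mean((pred - ref) ** 2))

def r_squared(pred, ref):
    ss_res = np.sum((ref - pred) ** 2)
    ss_tot = np.sum((ref - ref.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic reflectance standing in for a fused band and a clear-sky reference band.
rng = np.random.default_rng(0)
ref = rng.uniform(0.0, 0.6, size=(256, 256))
fused = ref + rng.normal(0.0, 0.02, size=ref.shape)
print(f"R2 = {r_squared(fused, ref):.3f}, RMSE = {rmse(fused, ref):.3f}")
```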

25 pages, 4232 KiB  
Article
Multimodal Fusion Image Stabilization Algorithm for Bio-Inspired Flapping-Wing Aircraft
by Zhikai Wang, Sen Wang, Yiwen Hu, Yangfan Zhou, Na Li and Xiaofeng Zhang
Biomimetics 2025, 10(7), 448; https://doi.org/10.3390/biomimetics10070448 - 7 Jul 2025
Viewed by 355
Abstract
This paper presents FWStab, a specialized video stabilization dataset tailored for flapping-wing platforms. The dataset encompasses five typical flight scenarios, featuring 48 video clips with intense dynamic jitter. The corresponding Inertial Measurement Unit (IMU) sensor data are synchronously collected, which jointly provide reliable support for multimodal modeling. Based on this, to address the issue of poor image acquisition quality due to severe vibrations in aerial vehicles, this paper proposes a multi-modal signal fusion video stabilization framework. This framework effectively integrates image features and inertial sensor features to predict smooth and stable camera poses. During the video stabilization process, the true camera motion originally estimated based on sensors is warped to the smooth trajectory predicted by the network, thereby optimizing the inter-frame stability. This approach maintains the global rigidity of scene motion, avoids visual artifacts caused by traditional dense optical flow-based spatiotemporal warping, and rectifies rolling shutter-induced distortions. Furthermore, the network is trained in an unsupervised manner by leveraging a joint loss function that integrates camera pose smoothness and optical flow residuals. When coupled with a multi-stage training strategy, this framework demonstrates remarkable stabilization adaptability across a wide range of scenarios. The entire framework employs Long Short-Term Memory (LSTM) to model the temporal characteristics of camera trajectories, enabling high-precision prediction of smooth trajectories.
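The framework above uses an LSTM over fused image and IMU features to predict smooth camera poses. As a rough illustration of that idea only, the sketch below maps a sequence of per-frame fused features to one pose per frame; the feature size, hidden size, and 6-parameter pose head are assumptions, not FWStab's architecture.

```python
import torch
import torch.nn as nn

class PoseSmoother(nn.Module):
    """Illustrative LSTM that maps fused per-frame features to a smoothed camera pose."""
    def __init__(self, feat_dim=128, pose_dim=6):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, 64, batch_first=True)
        self.head = nn.Linear(64, pose_dim)  # e.g. 3 rotation + 3 translation parameters

    def forward(self, fused_feats):          # (batch, time, feat_dim)
        h, _ = self.lstm(fused_feats)
        return self.head(h)                  # (batch, time, pose_dim)

poses = PoseSmoother()(torch.randn(2, 30, 128))  # two 30-frame clips
print(poses.shape)  # torch.Size([2, 30, 6])
```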

28 pages, 35973 KiB  
Article
SFT-GAN: Sparse Fast Transformer Fusion Method Based on GAN for Remote Sensing Spatiotemporal Fusion
by Zhaoxu Ma, Wenxing Bao, Wei Feng, Xiaowu Zhang, Xuan Ma and Kewen Qu
Remote Sens. 2025, 17(13), 2315; https://doi.org/10.3390/rs17132315 - 5 Jul 2025
Viewed by 258
Abstract
Multi-source remote sensing spatiotemporal fusion aims to enhance the temporal continuity of high-spatial, low-temporal-resolution images. In recent years, deep learning-based spatiotemporal fusion methods have achieved significant progress in this field. However, existing methods face three major challenges. First, large differences in spatial resolution among heterogeneous remote sensing images hinder the reconstruction of high-quality texture details. Second, most current deep learning-based methods prioritize spatial information while overlooking spectral information. Third, these methods often depend on complex network architectures, resulting in high computational costs. To address these challenges, this article proposes a Sparse Fast Transformer fusion method based on a Generative Adversarial Network (SFT-GAN). The method first introduces a multi-scale feature extraction and fusion architecture to capture temporal variation features and spatial detail features across multiple scales, with a channel attention mechanism designed to integrate these heterogeneous features adaptively. Second, two information compensation modules are introduced: a detail compensation module, which enhances high-frequency information to improve the fidelity of spatial details, and a spectral compensation module, which improves spectral fidelity by leveraging the intrinsic spectral correlation of the image. In addition, the proposed sparse fast transformer significantly reduces both the computational and memory complexity of the method. Experimental results on four publicly available benchmark datasets show that SFT-GAN achieves the best fusion accuracy compared with state-of-the-art methods while reducing computational cost by approximately 70%. Additional classification experiments further validate the practical effectiveness of SFT-GAN. Overall, this approach presents a new paradigm for balancing accuracy and efficiency in spatiotemporal fusion.
(This article belongs to the Special Issue Remote Sensing Data Fusion and Applications (2nd Edition))
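SFT-GAN uses a channel attention mechanism to weight heterogeneous features adaptively. The snippet below is a generic squeeze-and-excitation style channel attention block, shown only to make the mechanism concrete; it is not the paper's module, and the channel count and reduction ratio are arbitrary.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention (generic sketch, not SFT-GAN itself)."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # global average pool per channel
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                          # x: (B, C, H, W)
        w = self.mlp(self.pool(x).flatten(1))      # per-channel weights in [0, 1]
        return x * w.view(x.size(0), -1, 1, 1)     # rescale each feature map

out = ChannelAttention(64)(torch.randn(1, 64, 32, 32))
print(out.shape)  # torch.Size([1, 64, 32, 32])
```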

20 pages, 4929 KiB  
Article
Remote Sensing Image-Based Building Change Detection: A Case Study of the Qinling Mountains in China
by Lei Fu, Yunfeng Zhang, Keyun Zhao, Lulu Zhang, Ying Li, Changjing Shang and Qiang Shen
Remote Sens. 2025, 17(13), 2249; https://doi.org/10.3390/rs17132249 - 30 Jun 2025
Viewed by 269
Abstract
With the widespread application of deep learning in Earth observation, remote sensing image-based building change detection has achieved numerous groundbreaking advancements. However, differences across time periods caused by temporal variations in land cover, as well as the complex spatial structures of remote sensing scenes, significantly constrain the performance of change detection. To address these challenges, a change detection algorithm based on spatio-spectral information aggregation is proposed, consisting of two key modules: the Cross-Scale Heterogeneous Convolution module (CSHConv) and the Spatio-Spectral Information Fusion module (SSIF). CSHConv mitigates information loss caused by scale heterogeneity, thereby enhancing the effective utilization of multi-scale features, while SSIF models spatial and spectral information jointly, capturing interactions across spatial scales and spectral domains. The approach is illustrated with a case study on the real-world QL-CD (Qinling change detection) dataset, constructed as part of this work from 12,724 image pairs captured by the Gaofen-1 satellite over the Qinling region of China. Experimental results demonstrate that the proposed approach outperforms a wide range of state-of-the-art algorithms.
(This article belongs to the Special Issue Artificial Intelligence Remote Sensing for Earth Observation)

22 pages, 27201 KiB  
Article
Spatiotemporal Interactive Learning for Cloud Removal Based on Multi-Temporal SAR–Optical Images
by Chenrui Xu, Zhenfei Wang, Liang Chen and Xiangchao Meng
Remote Sens. 2025, 17(13), 2169; https://doi.org/10.3390/rs17132169 - 24 Jun 2025
Viewed by 361
Abstract
Optical remote sensing images suffer from information loss due to cloud interference, while Synthetic Aperture Radar (SAR), with its all-weather, day-and-night imaging capability, provides crucial auxiliary data for cloud removal and reconstruction. However, existing cloud removal methods face two key challenges: insufficient utilization of the spatiotemporal information in multi-temporal data, and fusion difficulties arising from the fundamentally different imaging mechanisms of optical and SAR sensors. To address these challenges, a spatiotemporal feature interaction-based cloud removal method is proposed to effectively fuse SAR and optical images. Built upon a conditional generative adversarial network framework, the method incorporates three key modules: a multi-temporal spatiotemporal feature joint extraction module, a spatiotemporal information interaction module, and a spatiotemporal discriminator module. Together, these components establish a many-to-many spatiotemporal interactive learning network that separately extracts and then fuses spatiotemporal features from multi-temporal SAR–optical image pairs to generate temporally consistent, cloud-free image sequences. Experiments on both simulated and real datasets demonstrate the superior performance of the proposed method.

28 pages, 114336 KiB  
Article
Mamba-STFM: A Mamba-Based Spatiotemporal Fusion Method for Remote Sensing Images
by Qiyuan Zhang, Xiaodan Zhang, Chen Quan, Tong Zhao, Wei Huo and Yuanchen Huang
Remote Sens. 2025, 17(13), 2135; https://doi.org/10.3390/rs17132135 - 21 Jun 2025
Viewed by 464
Abstract
Spatiotemporal fusion techniques can generate remote sensing imagery with high spatial and temporal resolutions, thereby facilitating Earth observation. However, traditional methods are constrained by linear assumptions; generative adversarial networks suffer from mode collapse; convolutional neural networks struggle to capture global context; and Transformers are hard to scale due to quadratic computational complexity and high memory consumption. To address these challenges, this study introduces an end-to-end remote sensing image spatiotemporal fusion approach based on the Mamba architecture (Mamba-spatiotemporal fusion model, Mamba-STFM), marking the first application of Mamba in this domain and presenting a novel paradigm for spatiotemporal fusion model design. Mamba-STFM consists of a feature extraction encoder and a feature fusion decoder. At the core of the encoder is the visual state space-FuseCore-AttNet block (VSS-FCAN block), which deeply integrates linear complexity cross-scan global perception with a channel attention mechanism, significantly reducing quadratic-level computation and memory overhead while improving inference throughput through parallel scanning and kernel fusion techniques. The decoder’s core is the spatiotemporal mixture-of-experts fusion module (STF-MoE block), composed of our novel spatial expert and temporal expert modules. The spatial expert adaptively adjusts channel weights to optimize spatial feature representation, enabling precise alignment and fusion of multi-resolution images, while the temporal expert incorporates a temporal squeeze-and-excitation mechanism and selective state space model (SSM) techniques to efficiently capture short-range temporal dependencies, maintain linear sequence modeling complexity, and further enhance overall spatiotemporal fusion throughput. Extensive experiments on public datasets demonstrate that Mamba-STFM outperforms existing methods in fusion quality; ablation studies validate the effectiveness of each core module; and efficiency analyses and application comparisons further confirm the model’s superior performance.
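Mamba builds on selective state space models with linear-time sequence scanning. The sketch below shows only the plain linear state-space recurrence that underlies such layers; the selective (input-dependent) parameterisation, hardware-aware scan, and all Mamba-STFM specifics are omitted, and the matrix sizes are arbitrary.

```python
import numpy as np

def linear_ssm_scan(u, A, B, C):
    """O(T) state-space recurrence x_t = A x_{t-1} + B u_t, y_t = C x_t.
    Illustrates the sequential scan behind (non-selective) SSM layers only."""
    x = np.zeros(A.shape[0])
    ys = []
    for u_t in u:                 # one step per time index
        x = A @ x + B @ u_t
        ys.append(C @ x)
    return np.stack(ys)

rng = np.random.default_rng(1)
y = linear_ssm_scan(rng.normal(size=(16, 4)),   # 16 time steps, 4 input channels
                    0.9 * np.eye(8),            # stable state transition
                    0.1 * rng.normal(size=(8, 4)),
                    0.1 * rng.normal(size=(2, 8)))
print(y.shape)  # (16, 2)
```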

17 pages, 548 KiB  
Article
Enhanced Localisation and Handwritten Digit Recognition Using ConvCARU
by Sio-Kei Im and Ka-Hou Chan
Appl. Sci. 2025, 15(12), 6772; https://doi.org/10.3390/app15126772 - 16 Jun 2025
Viewed by 291
Abstract
Predicting the motion of handwritten digits in video sequences is challenging due to complex spatiotemporal dependencies, variable writing styles, and the need to preserve fine-grained visual details, all of which are essential for real-time handwriting recognition and digital learning applications. In this context, our study aims to develop a robust predictive framework that can accurately forecast digit trajectories while preserving structural integrity. To address these challenges, we propose a novel video prediction architecture that integrates ConvCARU with a modified DCGAN to effectively separate the background from the foreground. This ensures enhanced extraction and preservation of spatial and temporal features through convolution-based gating and adaptive fusion mechanisms. In extensive experiments on the MNIST dataset, which comprises 70,000 images, our approach achieves an SSIM of 0.901 and a PSNR of 29.31 dB. This reflects a statistically significant improvement in PSNR of +0.20 dB (p < 0.05) over current state-of-the-art models, demonstrating its superior capability in maintaining consistent structural fidelity in predicted video frames. Furthermore, our framework offers better computational efficiency, with lower memory consumption than most other approaches, underscoring its practicality for deployment in real-time, resource-constrained applications. These results validate the effectiveness of the integrated ConvCARU–DCGAN approach in capturing fine-grained spatiotemporal dependencies, positioning it as a compelling solution for video-based handwriting recognition and sequence forecasting and paving the way for its adoption in applications that require high-resolution, efficient motion prediction.
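The reported gains are in SSIM and PSNR. For reference, a minimal PSNR computation on synthetic frames is shown below (SSIM is omitted for brevity); the noise level and image size are placeholders, not the paper's evaluation setup.

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_val]."""
    mse = np.mean((pred - target) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)

rng = np.random.default_rng(0)
frame = rng.uniform(size=(64, 64))                                  # "ground-truth" frame
noisy = np.clip(frame + rng.normal(0, 0.03, frame.shape), 0, 1)     # "predicted" frame
print(f"PSNR = {psnr(noisy, frame):.2f} dB")
```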

23 pages, 5598 KiB  
Article
A Bidirectional Cross Spatiotemporal Fusion Network with Spectral Restoration for Remote Sensing Imagery
by Dandan Zhou, Ke Wu and Gang Xu
Appl. Sci. 2025, 15(12), 6649; https://doi.org/10.3390/app15126649 - 13 Jun 2025
Viewed by 381
Abstract
Existing deep learning-based spatiotemporal fusion (STF) methods for remote sensing imagery often focus exclusively on capturing temporal changes or enhancing spatial details while failing to fully leverage spectral information from coarse images. To address these limitations, we propose a Bidirectional Cross Spatiotemporal Fusion Network with Spectral Restoration (BCSR-STF). The network integrates temporal and spatial information using a Bidirectional Cross Fusion (BCF) module and restores spectral fidelity through a Global Spectral Restoration and Feature Enhancement (GSRFE) module, which combines Adaptive Instance Normalization and spatial attention mechanisms. Additionally, a Progressive Spatiotemporal Feature Fusion and Restoration (PSTFR) module employs multi-scale iterative optimization to enhance the interaction between high- and low-level features. Experiments on three datasets demonstrate the superiority of BCSR-STF, achieving significant improvements in capturing seasonal variations and handling abrupt land cover changes compared to state-of-the-art methods.
(This article belongs to the Section Earth Sciences)
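The GSRFE module above combines Adaptive Instance Normalization with spatial attention. The following is a generic AdaIN function, included only to illustrate the normalization step; it is not the BCSR-STF implementation, and the tensor shapes are illustrative.

```python
import torch

def adain(content, style, eps=1e-5):
    """Adaptive instance normalization: shift the per-channel mean/std of `content`
    feature maps (B, C, H, W) to match those of `style` (generic building block)."""
    b, c = content.shape[:2]
    cf, sf = content.view(b, c, -1), style.view(b, c, -1)
    c_mean, c_std = cf.mean(dim=2, keepdim=True), cf.std(dim=2, keepdim=True) + eps
    s_mean, s_std = sf.mean(dim=2, keepdim=True), sf.std(dim=2, keepdim=True) + eps
    out = (cf - c_mean) / c_std * s_std + s_mean
    return out.view_as(content)

print(adain(torch.randn(2, 16, 32, 32), torch.randn(2, 16, 32, 32)).shape)
```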

32 pages, 8835 KiB  
Article
SIG-ShapeFormer: A Multi-Scale Spatiotemporal Feature Fusion Network for Satellite Cloud Image Classification
by Xuan Liu, Zhenyu Lu, Bingjian Lu, Zhuang Li, Zhongfeng Chen and Yongjie Ma
Remote Sens. 2025, 17(12), 2034; https://doi.org/10.3390/rs17122034 - 12 Jun 2025
Viewed by 1440
Abstract
Satellite cloud images exhibit complex multidimensional characteristics, including spectral, textural, and spatiotemporal dynamics. The temporal evolution of cloud systems plays a crucial role in accurate classification, particularly when multiple weather systems coexist. However, most existing models, such as those based on convolutional neural networks (CNNs), Transformer architectures, and their variants like the Swin Transformer, primarily focus on spatial modeling of static images and do not explicitly incorporate temporal information, limiting their ability to integrate spatiotemporal features effectively. To address this limitation, we propose SIG-ShapeFormer, a novel classification model specifically designed for satellite cloud images with temporal continuity. To the best of our knowledge, this work is the first to transform satellite cloud data into multivariate time series and to introduce a unified framework for multi-scale and multimodal feature fusion. SIG-ShapeFormer consists of three core components: (1) a Shapelet-based module that captures discriminative and interpretable local temporal patterns; (2) a multi-scale Inception module combining 1D convolutions and Transformer encoders to extract temporal features across different scales; and (3) a differentially enhanced Gramian Angular Summation Field (GASF) module that converts time series into 2D texture representations, significantly improving the recognition of internal cloud structures. Experimental results demonstrate that SIG-ShapeFormer achieves a classification accuracy of 99.36% on the LSCIDMR-S dataset, outperforming the original ShapeFormer by 2.2% as well as other CNN- and Transformer-based models. Moreover, the model generalizes well to the UCM remote sensing dataset and several benchmark tasks from the UEA time-series archive. SIG-ShapeFormer is particularly suitable for remote sensing applications involving continuous temporal sequences, such as extreme weather warnings and dynamic cloud system monitoring; however, it relies on temporally coherent input data and may perform suboptimally on datasets with limited or irregular temporal resolution.
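The GASF module converts time series into 2D texture images. A plain (non-differentially-enhanced) Gramian Angular Summation Field can be sketched as below; the input is a toy sine wave, and the paper's differential enhancement is not reproduced.

```python
import numpy as np

def gasf(series):
    """Gramian Angular Summation Field: encode a 1D series as a 2D texture image.
    Values are rescaled to [-1, 1], mapped to angles, and GASF[i, j] = cos(phi_i + phi_j)."""
    x = np.asarray(series, dtype=float)
    x = 2 * (x - x.min()) / (x.max() - x.min() + 1e-12) - 1.0   # rescale to [-1, 1]
    phi = np.arccos(np.clip(x, -1.0, 1.0))                      # polar-angle encoding
    return np.cos(phi[:, None] + phi[None, :])

img = gasf(np.sin(np.linspace(0, 4 * np.pi, 64)))
print(img.shape, float(img.min()), float(img.max()))  # (64, 64), values in [-1, 1]
```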

31 pages, 8699 KiB  
Article
Transformer-Based Dual-Branch Spatial–Temporal–Spectral Feature Fusion Network for Paddy Rice Mapping
by Xinxin Zhang, Hongwei Wei, Yuzhou Shao, Haijun Luan and Da-Han Wang
Remote Sens. 2025, 17(12), 1999; https://doi.org/10.3390/rs17121999 - 10 Jun 2025
Viewed by 365
Abstract
Deep neural network fusion approaches utilizing multimodal remote sensing are essential for crop mapping. However, challenges such as insufficient spatiotemporal feature extraction and ineffective fusion strategies remain, reducing mapping accuracy and robustness when these approaches are applied across spatial and temporal regions. In this study, we propose a novel rice mapping approach based on dual-branch transformer fusion networks, named RDTFNet. Specifically, we implement a dual-branch encoder based on two improved transformer architectures: a multiscale transformer block that extracts spatial–spectral features from a single-phase optical image, and a Restormer block that extracts spatial–temporal features from time-series synthetic aperture radar (SAR) images. The extracted features are combined in a feature fusion module (FFM) to generate fully fused spatial–temporal–spectral (STS) features, which are finally fed into a U-Net-style decoder for rice mapping. The model's performance was evaluated on Sentinel-1 and Sentinel-2 datasets from the United States. Compared with conventional models, RDTFNet achieved the best performance, with an overall accuracy (OA), intersection over union (IoU), precision, recall, and F1-score of 96.95%, 88.12%, 95.14%, 92.27%, and 93.68%, respectively, improvements of 1.61%, 5.37%, 5.16%, 1.12%, and 2.53% over the baseline model. Furthermore, in cross-regional and cross-temporal tests, RDTFNet outperformed other classical models, with improvements of 7.11% and 12.10% in F1-score and 11.55% and 18.18% in IoU, respectively, further confirming its robustness. The proposed RDTFNet can therefore effectively fuse STS features from multimodal images and exhibits strong generalization, providing valuable information for governments in agricultural management.
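The accuracy figures above (OA, IoU, precision, recall, F1) are standard binary-mapping metrics. For clarity, a generic confusion-matrix-based computation on synthetic masks is sketched below; the data are random placeholders, not the Sentinel-1/2 results.

```python
import numpy as np

def seg_metrics(pred, target):
    """Binary mapping metrics from a confusion matrix (pred/target are 0/1 arrays)."""
    tp = np.sum((pred == 1) & (target == 1))
    tn = np.sum((pred == 0) & (target == 0))
    fp = np.sum((pred == 1) & (target == 0))
    fn = np.sum((pred == 0) & (target == 1))
    oa = (tp + tn) / (tp + tn + fp + fn)
    iou = tp / (tp + fp + fn + 1e-12)
    precision = tp / (tp + fp + 1e-12)
    recall = tp / (tp + fn + 1e-12)
    f1 = 2 * precision * recall / (precision + recall + 1e-12)
    return dict(OA=oa, IoU=iou, precision=precision, recall=recall, F1=f1)

rng = np.random.default_rng(0)
gt = (rng.uniform(size=(128, 128)) > 0.7).astype(int)     # synthetic rice mask
pred = gt.copy()
flip = rng.uniform(size=gt.shape) < 0.05                  # flip 5% of pixels as "errors"
pred[flip] = 1 - pred[flip]
print({k: round(v, 3) for k, v in seg_metrics(pred, gt).items()})
```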

23 pages, 1894 KiB  
Article
ViViT-Prob: A Radar Echo Extrapolation Model Based on Video Vision Transformer and Spatiotemporal Sparse Attention
by Yunan Qiu, Bingjian Lu, Wenrui Xiong, Zhenyu Lu, Le Sun and Yingjie Cui
Remote Sens. 2025, 17(12), 1966; https://doi.org/10.3390/rs17121966 - 6 Jun 2025
Viewed by 426
Abstract
Weather radar, as a crucial source of remote sensing data, plays a vital role in convective weather forecasting through radar echo extrapolation. To address the limitations of existing deep learning methods in this task, this paper proposes a radar echo extrapolation model based on the video vision transformer and spatiotemporal sparse attention (ViViT-Prob). The model takes historical sequences as input and first maps them into a fixed-dimensional vector space through 3D convolutional patch encoding. A multi-head spatiotemporal fusion module with sparse attention then encodes these vectors, effectively capturing spatiotemporal relationships between different regions in the sequences; the sparsity constraint enables better use of the data's structural information, a stronger focus on critical regions, and reduced computational complexity. Finally, a parallel output decoder generates all time-step predictions simultaneously and maps them back to the prediction space through a deconvolution module to reconstruct high-resolution images. Experimental results on Moving MNIST and a real radar echo dataset demonstrate that the proposed model achieves superior performance in spatiotemporal sequence prediction and improves prediction accuracy while maintaining structural consistency in radar echo extrapolation, providing an effective solution for short-term precipitation forecasting.
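ViViT-Prob first maps input sequences into tokens via 3D convolutional patch encoding. The sketch below shows a generic ViViT-style tubelet embedding only; the channel count, embedding dimension, and patch size are assumptions rather than the paper's settings.

```python
import torch
import torch.nn as nn

class TubeletEmbed(nn.Module):
    """Map a radar-echo clip (B, C, T, H, W) to a sequence of spatio-temporal patch
    tokens via a strided 3D convolution (illustrative sizes only)."""
    def __init__(self, in_ch=1, dim=96, patch=(2, 8, 8)):
        super().__init__()
        self.proj = nn.Conv3d(in_ch, dim, kernel_size=patch, stride=patch)

    def forward(self, clip):
        tokens = self.proj(clip)                  # (B, dim, T/2, H/8, W/8)
        return tokens.flatten(2).transpose(1, 2)  # (B, num_tokens, dim)

tok = TubeletEmbed()(torch.randn(1, 1, 10, 64, 64))  # 10-frame, 64x64 clip
print(tok.shape)  # torch.Size([1, 320, 96])
```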

22 pages, 9648 KiB  
Article
Three-Dimensional Real-Scene-Enhanced GNSS/Intelligent Vision Surface Deformation Monitoring System
by Yuanrong He, Weijie Yang, Qun Su, Qiuhua He, Hongxin Li, Shuhang Lin and Shaochang Zhu
Appl. Sci. 2025, 15(9), 4983; https://doi.org/10.3390/app15094983 - 30 Apr 2025
Viewed by 566
Abstract
With the acceleration of urbanization, surface deformation monitoring has become crucial. Existing monitoring systems face several challenges, such as data singularity, the poor nighttime monitoring quality of video surveillance, and fragmented visual data. To address these issues, this paper presents a 3D real-scene (3DRS)-enhanced GNSS/intelligent vision surface deformation monitoring system. The system integrates GNSS monitoring terminals and multi-source meteorological sensors to accurately capture minute displacements at monitoring points and multi-source Internet of Things (IoT) data, which are then automatically stored in MySQL databases. To enhance the functionality of the system, the visual sensor data are fused with 3D models through streaming media technology, enabling 3D real-scene augmented reality to support dynamic deformation monitoring and visual analysis. WebSocket-based remote lighting control is implemented to enhance the quality of video data at night. The spatiotemporal fusion of UAV aerial data with 3D models is achieved through Blender image-based rendering, while edge detection is employed to extract crack parameters from intelligent inspection vehicle data. The 3DRS model is constructed through UAV oblique photography, 3D laser scanning, and the combined use of SVSGeoModeler and SketchUp. A visualization platform for surface deformation monitoring is built on the 3DRS foundation, adopting an “edge collection–cloud fusion–terminal interaction” approach. This platform dynamically superimposes GNSS and multi-source IoT monitoring data onto the 3D spatial base, enabling spatiotemporal correlation analysis of millimeter-level displacements and early risk warning.
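The system extracts crack parameters from inspection-vehicle imagery with edge detection. As a stand-in illustration only, the snippet below thresholds a Sobel gradient magnitude on a synthetic image; a deployed pipeline would use a tuned detector (e.g. Canny) on the actual imagery.

```python
import numpy as np

def sobel_edges(img, thresh=0.25):
    """Tiny edge detector: Sobel gradient magnitude followed by a threshold."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
    ky = kx.T
    pad = np.pad(img, 1, mode="edge")
    gx, gy = np.zeros_like(img), np.zeros_like(img)
    for i in range(3):                 # explicit 3x3 correlation to avoid extra dependencies
        for j in range(3):
            win = pad[i:i + img.shape[0], j:j + img.shape[1]]
            gx += kx[i, j] * win
            gy += ky[i, j] * win
    mag = np.hypot(gx, gy)
    return (mag / (mag.max() + 1e-12)) > thresh

img = np.zeros((64, 64))
img[:, 30:34] = 1.0                    # synthetic vertical "crack"
print(sobel_edges(img).sum(), "edge pixels")
```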

36 pages, 11592 KiB  
Article
A Novel Approach Based on Hypergraph Convolutional Neural Networks for Cartilage Shape Description and Longitudinal Prediction of Knee Osteoarthritis Progression
by John B. Theocharis, Christos G. Chadoulos and Andreas L. Symeonidis
Mach. Learn. Knowl. Extr. 2025, 7(2), 40; https://doi.org/10.3390/make7020040 - 26 Apr 2025
Viewed by 712
Abstract
Knee osteoarthritis (KOA) is a highly prevalent musculoskeletal joint disorder affecting a significant portion of the population worldwide. Accurate predictions of KOA progression can assist clinicians in drawing up preventive strategies for patients. In this paper, we present an integrated approach based on hypergraph convolutional networks (HGCNs) for longitudinal prediction of KOA grades and progression from MRI images. We propose two novel models: C_Shape.Net and a predictor network. C_Shape.Net operates on a hypergraph of volumetric nodes, specifically designed to represent the surface and volumetric features of the cartilage; it combines deep HGCN convolutions, graph pooling, and readout operations in a hierarchy of layers, providing expressive 3D shape descriptors of the cartilage volume at its output. The predictor is a spatio-temporal HGCN network (ST_HGCN) that follows the sequence-to-sequence learning scheme, transforming sequences of knee representations at the historical stage into sequences of KOA predictions at the prediction stage. It includes spatial HGCN convolutions, attention-based temporal fusion of feature embeddings at multiple layers, and a transformer module that generates longitudinal predictions at follow-up times. We present comprehensive experiments on the Osteoarthritis Initiative (OAI) cohort to evaluate the performance of our methodology on various tasks, including node classification, longitudinal KL grading, and progression. The basic finding is that the greater the depth of the historical stage, the higher the accuracy of the predictions in all tasks. For the maximum historical depth of four years, our method yielded an average balanced accuracy (BA) of 85.94% in KOA grading, and accuracies of 91.89% (+1), 88.11% (+2), 84.35% (+3), and 79.41% (+4) for the four consecutive follow-up visits. Under the same setting, we also achieved an average Area Under the Curve (AUC) of 0.94 for the prediction of progression incidence, and follow-up AUC values of 0.81 (+1), 0.77 (+2), 0.73 (+3), and 0.68 (+4), respectively.
(This article belongs to the Section Network)
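Both models above are built from hypergraph convolutions. The sketch below implements the standard HGNN propagation rule X' = Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2} X Theta on a toy incidence matrix; it is a generic layer for orientation, not C_Shape.Net or ST_HGCN, and identity hyperedge weights are assumed.

```python
import numpy as np

def hypergraph_conv(X, H, Theta):
    """One hypergraph-convolution step with identity hyperedge weights.
    X: (nodes, feat), H: (nodes, hyperedges) incidence matrix, Theta: (feat, out)."""
    W = np.eye(H.shape[1])                       # hyperedge weights
    De_inv = np.diag(1.0 / H.sum(axis=0))        # inverse hyperedge degrees
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(H @ np.ones(H.shape[1])))  # node degrees^(-1/2)
    return Dv_inv_sqrt @ H @ W @ De_inv @ H.T @ Dv_inv_sqrt @ X @ Theta

# 6 nodes grouped into 3 hyperedges (every node belongs to at least one).
H = np.array([[1, 0, 0], [1, 1, 0], [0, 1, 0],
              [0, 1, 1], [0, 0, 1], [1, 0, 1]], dtype=float)
rng = np.random.default_rng(0)
X, Theta = rng.normal(size=(6, 4)), rng.normal(size=(4, 2))
print(hypergraph_conv(X, H, Theta).shape)  # (6, 2)
```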

27 pages, 2001 KiB  
Review
Recent Research Progress of Graph Neural Networks in Computer Vision
by Zhiyong Jia, Chuang Wang, Yang Wang, Xinrui Gao, Bingtao Li, Lifeng Yin and Huayue Chen
Electronics 2025, 14(9), 1742; https://doi.org/10.3390/electronics14091742 - 24 Apr 2025
Cited by 2 | Viewed by 2109
Abstract
Graph neural networks (GNNs) have demonstrated significant potential in the field of computer vision in recent years, particularly in handling non-Euclidean data and capturing complex spatial and semantic relationships. This paper provides a comprehensive review of the latest research on GNNs in computer vision, with a focus on their applications in image processing, video analysis, and multimodal data fusion. First, we briefly introduce common GNN models, such as graph convolutional networks (GCN) and graph attention networks (GAT), and analyze their advantages in image and video data processing. Subsequently, this paper delves into the applications of GNNs in tasks such as object detection, image segmentation, and video action recognition, particularly in capturing inter-region dependencies and spatiotemporal dynamics. Finally, the paper discusses the applications of GNNs in multimodal data fusion tasks such as image–text matching and cross-modal retrieval, and highlights the main challenges faced by GNNs in computer vision, including computational complexity, dynamic graph modeling, heterogeneous graph processing, and interpretability issues. This paper provides a comprehensive understanding of the applications of GNNs in computer vision for both academia and industry and envisions future research directions.
(This article belongs to the Special Issue AI Synergy: Vision, Language, and Modality)
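The review surveys GCN and GAT models. For readers new to them, a single GCN propagation step (the Kipf and Welling rule) is sketched below in plain numpy on a three-node path graph; the weights are random and the example is purely illustrative.

```python
import numpy as np

def gcn_layer(A, X, W):
    """Single GCN propagation step: H' = relu(D^{-1/2} (A + I) D^{-1/2} X W)."""
    A_hat = A + np.eye(A.shape[0])                          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]  # symmetric normalization
    return np.maximum(A_norm @ X @ W, 0.0)                  # ReLU activation

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], float)      # 3-node path graph
X = np.eye(3)                                               # one-hot node features
W = np.random.default_rng(0).normal(size=(3, 2))
print(gcn_layer(A, X, W).shape)  # (3, 2)
```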
