Search Results (228)

Search Parameters:
Keywords = face GAN

19 pages, 1816 KiB  
Article
Rethinking Infrared and Visible Image Fusion from a Heterogeneous Content Synergistic Perception Perspective
by Minxian Shen, Gongrui Huang, Mingye Ju and Kai-Kuang Ma
Sensors 2025, 25(15), 4658; https://doi.org/10.3390/s25154658 - 27 Jul 2025
Viewed by 263
Abstract
Infrared and visible image fusion (IVIF) endeavors to amalgamate the thermal radiation characteristics from infrared images with the fine-grained texture details from visible images, aiming to produce fused outputs that are more robust and information-rich. Among the existing methodologies, those based on generative adversarial networks (GANs) have demonstrated considerable promise. However, such approaches are frequently constrained by their reliance on homogeneous discriminators possessing identical architectures, a limitation that can precipitate the emergence of undesirable artifacts in the resultant fused images. To surmount this challenge, this paper introduces HCSPNet, a novel GAN-based framework. HCSPNet distinctively incorporates heterogeneous dual discriminators, meticulously engineered for the fusion of disparate source images inherent in the IVIF task. This architectural design ensures the steadfast preservation of critical information from the source inputs, even when faced with scenarios of image degradation. Specifically, the two structurally distinct discriminators within HCSPNet are augmented with adaptive salient information distillation (ASID) modules, each uniquely structured to align with the intrinsic properties of infrared and visible images. This mechanism impels the discriminators to concentrate on pivotal components during their assessment of whether the fused image has proficiently inherited significant information from the source modalities—namely, the salient thermal signatures from infrared imagery and the detailed textural content from visible imagery—thereby markedly diminishing the occurrence of unwanted artifacts. Comprehensive experimentation conducted across multiple publicly available datasets substantiates the preeminence and generalization capabilities of HCSPNet, underscoring its significant potential for practical deployment. Additionally, we prove that our proposed heterogeneous dual discriminators can serve as a plug-and-play structure to improve the performance of existing GAN-based methods. Full article
(This article belongs to the Section Sensing and Imaging)
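
To make the heterogeneous-discriminator idea concrete, here is a minimal PyTorch sketch of two structurally distinct critics judging one fused image: a global, pooling-based branch standing in for the infrared side and a PatchGAN-style branch for the visible side. The layer choices and class names are illustrative assumptions, not the paper's ASID design.

```python
import torch
import torch.nn as nn

class ThermalDiscriminator(nn.Module):
    """Global critic biased toward salient, image-level intensity structure
    (a stand-in for the infrared-side branch; layout assumed)."""
    def __init__(self, ch=32):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, ch, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(ch, ch * 2, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1),  # global pooling: one holistic view per image
        )
        self.head = nn.Linear(ch * 2, 1)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))  # one real/fake score per image

class TextureDiscriminator(nn.Module):
    """PatchGAN-style critic biased toward local texture
    (a stand-in for the visible-side branch; layout assumed)."""
    def __init__(self, ch=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, ch, 3, stride=1, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(ch, ch, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(ch, 1, 3, stride=1, padding=1),  # per-patch real/fake map
        )

    def forward(self, x):
        return self.net(x)

# The same fused image is judged by both structurally distinct critics:
fused = torch.randn(4, 1, 128, 128)
score_ir = ThermalDiscriminator()(fused)       # (4, 1) image-level scores
score_map_vis = TextureDiscriminator()(fused)  # (4, 1, 64, 64) patch scores
```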

31 pages, 960 KiB  
Review
Generative AI as a Pillar for Predicting 2D and 3D Wildfire Spread: Beyond Physics-Based Models and Traditional Deep Learning
by Haowen Xu, Sisi Zlatanova, Ruiyu Liang and Ismet Canbulat
Fire 2025, 8(8), 293; https://doi.org/10.3390/fire8080293 - 24 Jul 2025
Viewed by 834
Abstract
Wildfires increasingly threaten human life, ecosystems, and infrastructure, with events like the 2025 Palisades and Eaton fires in Los Angeles County underscoring the urgent need for more advanced prediction frameworks. Existing physics-based and deep-learning models struggle to capture dynamic wildfire spread across both 2D and 3D domains, especially when incorporating real-time, multimodal geospatial data. This paper explores how generative artificial intelligence (AI) models—such as GANs, VAEs, and transformers—can serve as transformative tools for wildfire prediction and simulation. These models offer superior capabilities in managing uncertainty, integrating multimodal inputs, and generating realistic, scalable wildfire scenarios. We adopt a new paradigm that leverages large language models (LLMs) for literature synthesis, classification, and knowledge extraction, conducting a systematic review of recent studies applying generative AI to fire prediction and monitoring. We highlight how generative approaches uniquely address challenges faced by traditional simulation and deep-learning methods. Finally, we outline five key future directions for generative AI in wildfire management, including unified multimodal modeling of 2D and 3D dynamics, agentic AI systems and chatbots for decision intelligence, and real-time scenario generation on mobile devices, along with a discussion of critical challenges. Our findings advocate for a paradigm shift toward multimodal generative frameworks to support proactive, data-informed wildfire response. Full article
(This article belongs to the Special Issue Fire Risk Assessment and Emergency Evacuation)

35 pages, 1231 KiB  
Review
Toward Intelligent Underwater Acoustic Systems: Systematic Insights into Channel Estimation and Modulation Methods
by Imran A. Tasadduq and Muhammad Rashid
Electronics 2025, 14(15), 2953; https://doi.org/10.3390/electronics14152953 - 24 Jul 2025
Viewed by 306
Abstract
Underwater acoustic (UWA) communication supports many critical applications but still faces several physical-layer signal processing challenges. In response, recent advances in machine learning (ML) and deep learning (DL) offer promising solutions to improve signal detection, modulation adaptability, and classification accuracy. These developments highlight the need for a systematic evaluation to compare various ML/DL models and assess their performance across diverse underwater conditions. However, most existing reviews on ML/DL-based UWA communication focus on isolated approaches rather than integrated system-level perspectives, which limits cross-domain insights and reduces their relevance to practical underwater deployments. Consequently, this systematic literature review (SLR) synthesizes 43 studies (2020–2025) on ML and DL approaches for UWA communication, covering channel estimation, adaptive modulation, and modulation recognition across both single- and multi-carrier systems. The findings reveal that models such as convolutional neural networks (CNNs), long short-term memory networks (LSTMs), and generative adversarial networks (GANs) enhance channel estimation performance, achieving error reductions and bit error rate (BER) gains ranging from 10³ to 10⁶. Adaptive modulation techniques incorporating support vector machines (SVMs), CNNs, and reinforcement learning (RL) attain classification accuracies exceeding 98% and throughput improvements of up to 25%. For modulation recognition, architectures like sequence CNNs, residual networks, and hybrid convolutional–recurrent models achieve up to 99.38% accuracy with latency below 10 ms. These performance metrics underscore the viability of ML/DL-based solutions in optimizing physical-layer tasks for real-world UWA deployments. Finally, the SLR identifies key challenges in UWA communication, including high complexity, limited data, fragmented performance metrics, deployment realities, energy constraints, and poor scalability. It also outlines future directions like lightweight models, physics-informed learning, advanced RL strategies, intelligent resource allocation, and robust feature fusion to build reliable and intelligent underwater systems. Full article
(This article belongs to the Section Artificial Intelligence)

20 pages, 4920 KiB  
Article
Martian Skylight Identification Based on the Deep Learning Model
by Lihong Li, Lingli Mu, Wei Zhang, Weihua Dong and Yuqing He
Remote Sens. 2025, 17(15), 2571; https://doi.org/10.3390/rs17152571 - 24 Jul 2025
Viewed by 286
Abstract
As a type of distinctive pit on Mars, skylights are entrances to subsurface lava caves. They are very important for studying volcanic activity and potential preserved water ice, and are also considered as potential sites for human extraterrestrial bases in the future. Most skylights are manually identified, which has low efficiency and is highly subjective. Although deep learning methods have recently been used to identify skylights, they face challenges of few effective samples and low identification accuracy. In this article, 151 positive samples and 920 negative samples based on the MRO-HiRISE image data were used to create an initial skylight dataset, which contained few positive samples. To augment the initial dataset, StyleGAN2-ADA was selected to synthesize additional positive samples, producing an augmented dataset of 896 samples. On the basis of the augmented skylight dataset, we proposed YOLOv9-Skylight for skylight identification by incorporating Inner-EIoU loss and DySample to enhance localization accuracy and feature extraction ability. Compared with YOLOv9, the precision (P), recall (R), and F1-score (F1) of YOLOv9-Skylight improved by about 9.1%, 2.8%, and 5.6%, respectively. Compared with other mainstream models such as YOLOv5, YOLOv10, Faster R-CNN, Mask R-CNN, and DETR, YOLOv9-Skylight achieved the highest accuracy (F1 = 92.5%), which shows a strong performance in skylight identification. Full article
(This article belongs to the Special Issue Remote Sensing and Photogrammetry Applied to Deep Space Exploration)
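
For reference, the P, R, and F1 figures quoted above follow the standard detection definitions; a minimal helper, with purely hypothetical counts:

```python
def detection_prf(tp: int, fp: int, fn: int):
    """Precision, recall, and F1 from true/false positive and false negative counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# Hypothetical counts for a validation split (not the paper's numbers):
print(detection_prf(tp=87, fp=6, fn=8))
```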

20 pages, 1647 KiB  
Article
Research on the Enhancement of Provincial AC/DC Ultra-High Voltage Power Grid Security Based on WGAN-GP
by Zheng Shi, Yonghao Zhang, Zesheng Hu, Yao Wang, Yan Liang, Jiaojiao Deng, Jie Chen and Dingguo An
Electronics 2025, 14(14), 2897; https://doi.org/10.3390/electronics14142897 - 19 Jul 2025
Viewed by 238
Abstract
With the advancement of the “dual carbon” strategy and the integration of high proportions of renewable energy sources, AC/DC ultra-high-voltage power grids are facing new security challenges such as commutation failure and multi-infeed coupling effects. Fault diagnosis, as an important tool for assisting power grid dispatching, is essential for maintaining the grid’s long-term stable operation. Traditional fault diagnosis methods encounter challenges such as limited samples and data quality issues under complex operating conditions. To overcome these problems, this study proposes a fault sample data enhancement method based on the Wasserstein Generative Adversarial Network with Gradient Penalty (WGAN-GP). First, a simulation model of the AC/DC hybrid system is constructed to obtain the original fault sample data. Then, by adopting the Wasserstein distance measure and the gradient penalty strategy, an improved WGAN-GP architecture suited to feature learning on the AC/DC hybrid system is designed. Finally, by comparing the fault diagnosis performance of different data models, the proposed method achieves up to 100% accuracy on certain fault types and improves the average accuracy by 6.3% compared to SMOTE and vanilla GAN, particularly under limited-sample conditions. These results confirm that the proposed approach can effectively extract fault characteristics from complex fault data. Full article
(This article belongs to the Special Issue Applications of Computational Intelligence, 3rd Edition)
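
The gradient penalty at the heart of WGAN-GP is well documented; below is a minimal PyTorch sketch of the standard formulation (the study's actual critic architecture and data pipeline are not public, so this only illustrates the technique):

```python
import torch

def gradient_penalty(critic, real, fake):
    """WGAN-GP term: penalizes deviation of the critic's gradient norm from 1
    on random interpolates between real and generated samples."""
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)  # NCHW batches
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    scores = critic(interp)
    grads = torch.autograd.grad(
        outputs=scores, inputs=interp,
        grad_outputs=torch.ones_like(scores),
        create_graph=True,
    )[0]
    grad_norm = grads.flatten(1).norm(2, dim=1)
    return ((grad_norm - 1.0) ** 2).mean()

# Critic objective (lambda = 10 is the value from the original WGAN-GP paper):
# loss_d = critic(fake).mean() - critic(real).mean() \
#          + 10.0 * gradient_penalty(critic, real, fake)
```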

21 pages, 31171 KiB  
Article
Local Information-Driven Hierarchical Fusion of SAR and Visible Images via Refined Modal Salient Features
by Yunzhong Yan, La Jiang, Jun Li, Shuowei Liu and Zhen Liu
Remote Sens. 2025, 17(14), 2466; https://doi.org/10.3390/rs17142466 - 16 Jul 2025
Viewed by 202
Abstract
Compared to other multi-source image fusion tasks, visible and SAR image fusion suffers from a lack of training data for deep learning-based methods, so introducing structural priors into the fusion network design is a viable solution. We incorporated the feature hierarchy concept from computer vision, dividing deep features into low-, mid-, and high-level tiers, and, based on the complementary modal characteristics of SAR and visible images, designed a fusion architecture that fully analyzes and utilizes the differences among hierarchical features. Specifically, our framework has two stages. In the cross-modal enhancement stage, a CycleGAN generator-based method for cross-modal interaction and input data enhancement is employed to generate pseudo-modal images. In the fusion stage, we make three innovations: (1) We designed level-specific feature extraction branches and fusion strategies, according to the characteristics of each level and the complementary modal features of SAR and visible images, to fully exploit cross-modal complementary features. (2) We proposed the Layered Strictly Nested Framework (LSNF), which emphasizes hierarchical differences and uses hierarchical characteristics to reduce feature redundancy. (3) Based on visual saliency theory, we proposed a Gradient-weighted Pixel Loss (GWPL), which dynamically assigns higher weights to regions with significant gradient magnitudes, emphasizing high-frequency detail preservation during fusion. Experiments on the YYX-OPT-SAR and WHU-OPT-SAR datasets show that our method outperforms 11 state-of-the-art methods. Ablation studies confirm each component’s contribution. This framework effectively meets remote sensing applications’ high-precision image fusion needs. Full article
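
As a rough illustration of the GWPL idea described above, the sketch below weights a pixelwise L1 error by the target's Sobel gradient magnitude, so high-frequency regions dominate the loss. The Sobel-based weighting is an assumption for illustration; the paper's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def gradient_weighted_pixel_loss(fused, target):
    """L1 error weighted by the target's gradient magnitude (single-channel NCHW)."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
    kx = kx.to(target)
    ky = kx.transpose(2, 3)                 # Sobel-y is the transpose of Sobel-x
    gx = F.conv2d(target, kx, padding=1)
    gy = F.conv2d(target, ky, padding=1)
    w = 1.0 + torch.sqrt(gx ** 2 + gy ** 2)  # higher weight where gradients are strong
    return (w * (fused - target).abs()).mean()

loss = gradient_weighted_pixel_loss(torch.rand(2, 1, 64, 64), torch.rand(2, 1, 64, 64))
```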

15 pages, 2473 KiB  
Article
Self-Calibrating TSEP for Junction Temperature and RUL Prediction in GaN HEMTs
by Yifan Cui, Yutian Gan, Kangyao Wen, Yang Jiang, Chunzhang Chen, Qing Wang and Hongyu Yu
Nanomaterials 2025, 15(14), 1102; https://doi.org/10.3390/nano15141102 - 16 Jul 2025
Viewed by 343
Abstract
Gallium nitride high-electron-mobility transistors (GaN HEMTs) are critical for high-power applications like AI power supplies and robotics but face reliability challenges due to increased dynamic ON-resistance (RDS_ON) from electrical and thermomechanical stresses. This paper presents a novel self-calibrating temperature-sensitive electrical parameter (TSEP) model that uses gate leakage current (IG) to estimate junction temperature with high accuracy, uniquely addressing aging effects overlooked in prior studies. By integrating IG, aging-induced degradation, and failure-in-time (FIT) models, the approach achieves a junction temperature estimation error of less than 1%. Long-term hard-switching tests confirm its effectiveness, with calibrated RDS_ON measurements enabling precise remaining useful life (RUL) predictions. This methodology significantly improves GaN HEMT reliability assessment, enhancing their performance in resilient power electronics systems. Full article
(This article belongs to the Section Nanoelectronics, Nanosensors and Devices)
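
The TSEP workflow described above amounts to calibrating an electrical parameter against known junction temperatures and then inverting the fit at run time; a schematic sketch with purely illustrative numbers (the paper's aging compensation and FIT modeling are not reproduced here):

```python
import numpy as np

# Hypothetical calibration pairs: junction temperature (degC) vs. gate leakage (A),
# as would be measured under thermostat control. Values are illustrative only.
T_cal = np.array([25.0, 50.0, 75.0, 100.0, 125.0])
Ig_cal = np.array([1.2e-8, 3.0e-8, 7.1e-8, 1.6e-7, 3.5e-7])

# Gate leakage grows roughly exponentially with temperature,
# so a linear fit of T_j against ln(I_G) is a reasonable first-order model:
coeffs = np.polyfit(np.log(Ig_cal), T_cal, deg=1)

def estimate_tj(ig_measured: float) -> float:
    """Map a measured gate leakage current (A) to junction temperature (degC)."""
    return float(np.polyval(coeffs, np.log(ig_measured)))

print(f"Estimated T_j: {estimate_tj(1.0e-7):.1f} degC")
```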

24 pages, 7849 KiB  
Article
Face Desensitization for Autonomous Driving Based on Identity De-Identification of Generative Adversarial Networks
by Haojie Ji, Liangliang Tian, Jingyan Wang, Yuchi Yao and Jiangyue Wang
Electronics 2025, 14(14), 2843; https://doi.org/10.3390/electronics14142843 - 15 Jul 2025
Viewed by 270
Abstract
Automotive intelligent agents are increasingly collecting facial data for applications such as driver behavior monitoring and identity verification. This excessive collection of facial data brings serious risks of sensitive information leakage to autonomous driving. Facial information is explicitly required to be anonymized, but most desensitized facial data has poor usability, which greatly limits its application in autonomous driving. This paper proposes an automotive sensitive information anonymization method that generates high-quality facial images, balancing data availability with privacy protection. After comparing K-Same methods and Generative Adversarial Networks (GANs), this paper proposes a hierarchical self-attention mechanism in StyleGAN3 to enhance the feature perception of face images. Synchronous regularization of sample data is applied to optimize the loss function of the StyleGAN3 discriminator, thereby improving the convergence stability of the model. The experimental results demonstrate that the proposed facial desensitization model reduces the Fréchet inception distance (FID) and structural similarity index measure (SSIM) by 95.8% and 24.3%, respectively. The image quality and privacy desensitization of the facial data generated by the StyleGAN3 model are fully verified in this work. This research provides an efficient and robust facial privacy protection solution for autonomous driving, supporting the security of automotive data. Full article
(This article belongs to the Special Issue Development and Advances in Autonomous Driving Technology)
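
The abstract does not spell out the "synchronous regularization" of the discriminator; for orientation, below is the standard R1-regularized non-saturating discriminator loss used across the StyleGAN family, as a hedged stand-in rather than the paper's exact objective:

```python
import torch
import torch.nn.functional as F

def d_loss_with_r1(disc, real, fake, gamma=10.0):
    """Non-saturating GAN discriminator loss plus R1 regularization:
    a gradient penalty on real samples that stabilizes convergence."""
    real = real.detach().requires_grad_(True)
    real_scores = disc(real)
    fake_scores = disc(fake.detach())
    loss = F.softplus(-real_scores).mean() + F.softplus(fake_scores).mean()
    grads = torch.autograd.grad(real_scores.sum(), real, create_graph=True)[0]
    r1 = grads.flatten(1).pow(2).sum(1).mean()   # squared gradient norm on reals
    return loss + 0.5 * gamma * r1
```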

18 pages, 2200 KiB  
Article
A Self-Supervised Adversarial Deblurring Face Recognition Network for Edge Devices
by Hanwen Zhang, Myun Kim, Baitong Li and Yanping Lu
J. Imaging 2025, 11(7), 241; https://doi.org/10.3390/jimaging11070241 - 15 Jul 2025
Viewed by 346
Abstract
With the advancement of information technology, human activity recognition (HAR) has been widely applied in fields such as intelligent surveillance, health monitoring, and human–computer interaction. As a crucial component of HAR, facial recognition plays a key role, especially in vision-based activity recognition. However, current facial recognition models on the market perform poorly in handling blurry images and dynamic scenarios, limiting their effectiveness in real-world HAR applications. This study aims to construct a fast and accurate facial recognition model based on novel adversarial learning and deblurring theory to enhance its performance in human activity recognition. The model employs a generative adversarial network (GAN) as the core algorithm, optimizing its generation and recognition modules by decomposing the global loss function and incorporating a feature pyramid, thereby solving the balance challenge in GAN training. Additionally, deblurring techniques are introduced to improve the model’s ability to handle blurry and dynamic images. Experimental results show that the proposed model achieves high accuracy and recall rates across multiple facial recognition datasets, with an average recall rate of 87.40% and accuracy rates of 81.06% and 79.77% on the YTF, IMDB-WIKI, and WiderFace datasets, respectively. These findings confirm that the model effectively addresses the challenges of recognizing faces in dynamic and blurry conditions in human activity recognition, demonstrating significant application potential. Full article
(This article belongs to the Special Issue Techniques and Applications in Face Image Analysis)
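
The "feature pyramid" incorporated into the recognition module most likely follows the generic FPN pattern of merging deep, coarse features into shallow, fine ones through lateral connections; a minimal sketch under that assumption (channel sizes are invented):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FPNFuse(nn.Module):
    """Merge a deeper, low-resolution map into a shallower, high-resolution one
    via 1x1 lateral projections and upsampling (the generic FPN building block)."""
    def __init__(self, c_low: int, c_high: int, c_out: int = 128):
        super().__init__()
        self.lateral = nn.Conv2d(c_low, c_out, kernel_size=1)
        self.reduce = nn.Conv2d(c_high, c_out, kernel_size=1)

    def forward(self, low, high):  # low: high-res shallow map, high: low-res deep map
        up = F.interpolate(self.reduce(high), size=low.shape[-2:], mode="nearest")
        return self.lateral(low) + up

merged = FPNFuse(64, 256)(torch.randn(1, 64, 56, 56), torch.randn(1, 256, 14, 14))
```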

16 pages, 2365 KiB  
Article
Fast Inference End-to-End Speech Synthesis with Style Diffusion
by Hui Sun, Jiye Song and Yi Jiang
Electronics 2025, 14(14), 2829; https://doi.org/10.3390/electronics14142829 - 15 Jul 2025
Viewed by 504
Abstract
In recent years, deep learning-based end-to-end Text-To-Speech (TTS) models have made significant progress in enhancing speech naturalness and fluency. However, existing Variational Inference Text-to-Speech (VITS) models still face challenges such as insufficient pitch modeling, inadequate contextual dependency capture, and low inference efficiency in the decoder. To address these issues, this paper proposes an improved TTS framework named Q-VITS. Q-VITS incorporates Rotary Position Embedding (RoPE) into the text encoder to enhance long-sequence modeling, adopts a frame-level prior modeling strategy to optimize one-to-many mappings, and designs a style extractor based on a diffusion model for controllable style rendering. Additionally, the proposed decoder ConfoGAN integrates explicit F0 modeling, Pseudo-Quadrature Mirror Filter (PQMF) multi-band synthesis and Conformer structure. The experimental results demonstrate that Q-VITS outperforms the VITS in terms of speech quality, pitch accuracy, and inference efficiency in both subjective Mean Opinion Score (MOS) and objective Mel-Cepstral Distortion (MCD) and Root Mean Square Error (RMSE) evaluations on a single-speaker dataset, achieving performance close to ground-truth audio. These improvements provide an effective solution for efficient and controllable speech synthesis. Full article
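
Rotary Position Embedding, which Q-VITS adds to the text encoder, rotates each feature pair by a position-dependent angle so that query–key dot products depend only on relative position; a compact sketch of the standard formulation:

```python
import torch

def apply_rope(x: torch.Tensor) -> torch.Tensor:
    """Rotary position embedding on (batch, seq, dim) activations, dim even.
    Feature halves (x1, x2) are rotated pairwise by position-dependent angles."""
    b, n, d = x.shape
    half = d // 2
    freqs = 10000 ** (-torch.arange(0, half, dtype=torch.float32) / half)
    angles = torch.arange(n, dtype=torch.float32)[:, None] * freqs[None, :]  # (n, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

q = torch.randn(2, 100, 64)
q_rot = apply_rope(q)  # applied to queries and keys before attention
```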

47 pages, 814 KiB  
Systematic Review
Generative Adversarial Networks in Histological Image Segmentation: A Systematic Literature Review
by Yanna Leidy Ketley Fernandes Cruz, Antonio Fhillipi Maciel Silva, Ewaldo Eder Carvalho Santana and Daniel G. Costa
Appl. Sci. 2025, 15(14), 7802; https://doi.org/10.3390/app15147802 - 11 Jul 2025
Viewed by 423
Abstract
Histological image analysis plays a crucial role in understanding and diagnosing various diseases, but manually segmenting these images is often complex, time-consuming, and heavily reliant on expert knowledge. Generative adversarial networks (GANs) have emerged as promising tools to assist in this task, enhancing the accuracy and efficiency of segmentation in histological images. This systematic literature review aims to explore how GANs have been utilized for segmentation in this field, highlighting the latest trends, key challenges, and opportunities for future research. The review was conducted across multiple digital libraries, including IEEE, Springer, Scopus, MDPI, and PubMed, with combinations of the keywords “generative adversarial network” or “GAN”, “segmentation” or “image segmentation” or “semantic segmentation”, and “histology” or “histological” or “histopathology” or “histopathological”. We reviewed 41 GAN-based histological image segmentation articles published between December 2014 and February 2025. We summarized and analyzed these papers based on the segmentation regions, datasets, GAN tasks, segmentation tasks, and commonly used metrics. Additionally, we discussed advantages, challenges, and future research directions. The analyzed studies demonstrated the versatility of GANs in handling challenges like stain variability, multi-task segmentation, and data scarcity—all crucial challenges in the analysis of histopathological images. Nevertheless, the field still faces important challenges, such as the need for standardized datasets, robust evaluation metrics, and better generalization across diverse tissues and conditions. Full article
(This article belongs to the Section Computing and Artificial Intelligence)

32 pages, 12851 KiB  
Article
Research on Autonomous Vehicle Lane-Keeping and Navigation System Based on Deep Reinforcement Learning: From Simulation to Real-World Application
by Chia-Hsin Cheng, Hsiang-Hao Lin and Yu-Yong Luo
Electronics 2025, 14(13), 2738; https://doi.org/10.3390/electronics14132738 - 7 Jul 2025
Viewed by 436
Abstract
In recent years, with the rapid development of science and technology and the substantial improvement of computing power, research on many deep learning topics has advanced rapidly. Fields such as computer vision, natural language processing, and medical imaging have accelerated their development on this wave, and the field of self-driving cars is no exception. The trend toward self-driving cars is unstoppable: many technology companies and automobile manufacturers have invested substantial resources in the research and development of self-driving technology, and with the emergence of different levels of self-driving cars, most car manufacturers have already reached the L2 level of the self-driving classification standards and are moving towards L3 and L4. However, existing autonomous driving technologies still face significant challenges in achieving robust lane-keeping and navigation performance, especially when transferring learned models from simulation to real-world environments due to environmental complexity and domain gaps. This study applies deep reinforcement learning (DRL) to train autonomous vehicles with lane-keeping and navigation capabilities. Through simulation training and Sim2Real strategies, including domain randomization and CycleGAN, the trained models are evaluated in real-world environments to validate performance. The results demonstrate the feasibility of DRL-based autonomous driving and highlight the challenges in transferring models from simulation to reality. Full article
(This article belongs to the Special Issue Autonomous and Connected Vehicles)
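
Domain randomization, one of the Sim2Real strategies named above, simply resamples simulator parameters for every training episode so the policy cannot overfit a single visual or dynamics configuration; a sketch with invented parameter names and ranges (not the authors' settings):

```python
import random

def randomize_episode() -> dict:
    """Sample a randomized simulator configuration for one training episode."""
    return {
        "light_intensity": random.uniform(0.4, 1.6),    # lighting variation
        "road_texture_id": random.randrange(10),        # visual texture swap
        "camera_pitch_deg": random.uniform(-3.0, 3.0),  # mounting tolerance
        "lane_friction": random.uniform(0.7, 1.0),      # dynamics variation
    }

# for episode in range(num_episodes):
#     env.reset(**randomize_episode())   # hypothetical environment API
```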

28 pages, 35973 KiB  
Article
SFT-GAN: Sparse Fast Transformer Fusion Method Based on GAN for Remote Sensing Spatiotemporal Fusion
by Zhaoxu Ma, Wenxing Bao, Wei Feng, Xiaowu Zhang, Xuan Ma and Kewen Qu
Remote Sens. 2025, 17(13), 2315; https://doi.org/10.3390/rs17132315 - 5 Jul 2025
Viewed by 340
Abstract
Multi-source remote sensing spatiotemporal fusion aims to enhance the temporal continuity of high-spatial-, low-temporal-resolution images. In recent years, deep learning-based spatiotemporal fusion methods have achieved significant progress in this field. However, existing methods face three major challenges. First, large differences in spatial resolution among heterogeneous remote sensing images hinder the reconstruction of high-quality texture details. Second, most current deep learning-based methods prioritize spatial information while overlooking spectral information. Third, these methods often depend on complex network architectures, resulting in high computational costs. To address these challenges, this article proposes a Sparse Fast Transformer fusion method based on a Generative Adversarial Network (SFT-GAN). First, the method introduces a multi-scale feature extraction and fusion architecture to capture temporal variation features and spatial detail features across multiple scales, with a channel attention mechanism designed to integrate these heterogeneous features adaptively. Second, two information compensation modules are introduced: a detail compensation module, which enhances high-frequency information to improve the fidelity of spatial details, and a spectral compensation module, which improves spectral fidelity by leveraging the intrinsic spectral correlation of the image. In addition, the proposed sparse fast transformer significantly reduces both the computational and memory complexity of the method. Experimental results on four publicly available benchmark datasets showed that SFT-GAN achieved the best fusion accuracy among state-of-the-art methods while reducing computational cost by approximately 70%. Additional classification experiments further validated the practical effectiveness of SFT-GAN. Overall, this approach presents a new paradigm for balancing accuracy and efficiency in spatiotemporal fusion. Full article
(This article belongs to the Special Issue Remote Sensing Data Fusion and Applications (2nd Edition))
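
The channel attention mechanism mentioned above is commonly realized as a squeeze-and-excitation block that reweights feature channels by their global statistics; a minimal sketch under that assumption (SFT-GAN's exact block may differ):

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation-style channel attention: global-average-pool each
    channel, pass through a small bottleneck MLP, and rescale the channels."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                           # x: (B, C, H, W)
        w = x.mean(dim=(2, 3))                      # squeeze: per-channel statistics
        w = self.fc(w).unsqueeze(-1).unsqueeze(-1)  # excite: per-channel weights
        return x * w                                # adaptively reweight features

out = ChannelAttention(64)(torch.randn(2, 64, 32, 32))
```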

30 pages, 23006 KiB  
Article
RaDiT: A Differential Transformer-Based Hybrid Deep Learning Model for Radar Echo Extrapolation
by Wenda Zhu, Zhenyu Lu, Yuan Zhang, Ziqi Zhao, Bingjian Lu and Ruiyi Li
Remote Sens. 2025, 17(12), 1976; https://doi.org/10.3390/rs17121976 - 6 Jun 2025
Viewed by 560
Abstract
Radar echo extrapolation, a critical spatiotemporal sequence forecasting task, requires precise modeling of motion trajectories and intensity evolution from sequential radar reflectivity inputs. Contemporary deep learning implementations face two operational limitations: progressive attenuation of predicted echo intensities during autoregressive inference and spectral leakage-induced diffusion at high-intensity echo boundaries. This study presents RaDiT, a hybrid architecture combining differential transformer with adversarial training for radar echo extrapolation. The framework employs a U-Net backbone augmented with vision transformer blocks, utilizing differential attention mechanisms to govern spatiotemporal interactions. Our differential attention mechanism enhances noise suppression under high-threshold conditions, effectively minimizing spurious feature generation while improving metric reliability. A conditional GAN discriminator is integrated to maintain microphysical consistency in generated sequences, simultaneously addressing spectral blurring and intensity dissipation. Comprehensive evaluations demonstrate RaDiT’s superior performance in preserving spatiotemporal coherence and intensity across 0–90 min forecasting horizons. The proposed architecture achieves CSI improvements of 10.23% and 2.88% at 4 × 4 and 16 × 16 spatial pooling scales, respectively, for ≥30 dBZ thresholds on the CMARC dataset compared to PreDiff. To our knowledge, this represents the first successful implementation of differential transformers for radar echo extrapolation. Full article
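
Differential attention, as introduced in the differential-transformer literature, takes the difference of two softmax attention maps so that common-mode attention noise cancels; a simplified single-head sketch (RaDiT's exact variant, including how the subtraction weight is learned, may differ):

```python
import torch
import torch.nn.functional as F

def differential_attention(x, wq1, wk1, wq2, wk2, wv, lam=0.5):
    """Single-head differential attention: attn = softmax(A1) - lam * softmax(A2).
    Here lam is a fixed scalar; the published formulation re-parameterizes it
    as a learnable quantity."""
    d = wq1.shape[1]
    a1 = F.softmax((x @ wq1) @ (x @ wk1).transpose(-2, -1) / d ** 0.5, dim=-1)
    a2 = F.softmax((x @ wq2) @ (x @ wk2).transpose(-2, -1) / d ** 0.5, dim=-1)
    return (a1 - lam * a2) @ (x @ wv)   # noise-suppressed attention output

n, dm, dh = 64, 128, 32
x = torch.randn(2, n, dm)
ws = [torch.randn(dm, dh) * dm ** -0.5 for _ in range(5)]
out = differential_attention(x, *ws)    # (2, 64, 32)
```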

17 pages, 1829 KiB  
Article
Research on Improved Occluded-Face Restoration Network
by Shangzhen Pang, Tzer Hwai Gilbert Thio, Fei Lu Siaw, Mingju Chen and Li Lin
Symmetry 2025, 17(6), 827; https://doi.org/10.3390/sym17060827 - 26 May 2025
Viewed by 361
Abstract
The natural features of the face exhibit significant symmetry. In practical applications, faces may be partially occluded due to factors like wearing masks or glasses, or the presence of other objects. Occluded-face restoration has broad application prospects in fields such as augmented reality, virtual reality, healthcare, security, etc. It is also of significant practical importance in enhancing public safety and providing efficient services. This research establishes an improved occluded-face restoration network based on facial feature points and Generative Adversarial Networks. A facial landmark prediction network is constructed based on an improved MobileNetV3-small network. On the foundation of U-Net, dilated convolutions and residual blocks are introduced to form an enhanced generator network. Additionally, an improved discriminator network is built based on Patch-GAN. Compared to the Contextual Attention network, under various occlusions, the improved face restoration network shows a maximum increase in the Peak Signal-to-Noise Ratio of 24.47%, and in the Structural Similarity Index of 24.39%, and a decrease in the Fréchet Inception Distance of 81.1%. Compared to the Edge Connect network, under various occlusions, the improved network shows a maximum increase in the Peak Signal-to-Noise Ratio of 7.89% and in the Structural Similarity Index of 10.34%, and a decrease in the Fréchet Inception Distance of 27.2%. Compared to the LaFIn network, under various occlusions, the improved network shows a maximum increase in the Peak Signal-to-Noise Ratio of 3.4% and in the Structural Similarity Index of 3.31%, and a decrease in the Fréchet Inception Distance of 9.19%. These experiments show that the improved face restoration network yields better restoration results. Full article
(This article belongs to the Section Physics)
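
The dilated-convolution residual blocks added to the U-Net generator widen the receptive field so that occluded regions can draw on distant, often symmetric, facial context; a minimal sketch (channel counts and dilation rate are illustrative, not the paper's configuration):

```python
import torch
import torch.nn as nn

class DilatedResBlock(nn.Module):
    """Residual block with dilated convolutions: same parameter count as a
    standard 3x3 block but a larger receptive field for inpainting context."""
    def __init__(self, ch: int = 64, dilation: int = 2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=dilation, dilation=dilation),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=dilation, dilation=dilation),
        )

    def forward(self, x):
        return torch.relu(x + self.body(x))  # identity shortcut stabilizes training

y = DilatedResBlock()(torch.randn(1, 64, 64, 64))
```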