Search Results (132)

Search Parameters:
Keywords = dual-attention residual network

25 pages, 6911 KiB  
Article
Image Inpainting Algorithm Based on Structure-Guided Generative Adversarial Network
by Li Zhao, Tongyang Zhu, Chuang Wang, Feng Tian and Hongge Yao
Mathematics 2025, 13(15), 2370; https://doi.org/10.3390/math13152370 - 24 Jul 2025
Abstract
To address the challenges of image inpainting in scenarios with extensive or irregular missing regions—particularly detail oversmoothing, structural ambiguity, and textural incoherence—this paper proposes an Image Structure-Guided (ISG) framework that hierarchically integrates structural priors with semantic-aware texture synthesis. The proposed methodology advances a two-stage restoration paradigm: (1) Structural Prior Extraction, where adaptive edge detection algorithms identify residual contours in corrupted regions, and a transformer-enhanced network reconstructs globally consistent structural maps through contextual feature propagation; (2) Structure-Constrained Texture Synthesis, wherein a multi-scale generator with hybrid dilated convolutions and channel attention mechanisms iteratively refines high-fidelity textures under explicit structural guidance. The framework introduces three innovations: (1) a hierarchical feature fusion architecture that synergizes multi-scale receptive fields with spatial-channel attention to preserve long-range dependencies and local details simultaneously; (2) spectral-normalized Markovian discriminator with gradient-penalty regularization, enabling adversarial training stability while enforcing patch-level structural consistency; and (3) dual-branch loss formulation combining perceptual similarity metrics with edge-aware constraints to align synthesized content with both semantic coherence and geometric fidelity. Our experiments on the two benchmark datasets (Places2 and CelebA) have demonstrated that our framework achieves more unified textures and structures, bringing the restored images closer to their original semantic content. Full article
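The channel attention mentioned in this abstract can be illustrated with a squeeze-and-excitation-style gate: global-average-pool each channel to a scalar, squash it through a sigmoid, and reweight the channel. This is a generic minimal sketch on nested lists (the learned bottleneck MLP is omitted), not the authors' exact module:

```python
import math

def channel_attention(feature_maps):
    """Minimal squeeze-and-excitation-style channel attention.
    feature_maps: list of 2D channels (lists of rows of floats).
    The learned excitation MLP is replaced by a bare sigmoid gate
    for illustration."""
    # Squeeze: global average pool each channel to one scalar
    squeezed = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
                for ch in feature_maps]
    # Excite: sigmoid gate per channel (learned weights omitted)
    gates = [1.0 / (1.0 + math.exp(-s)) for s in squeezed]
    # Reweight each channel by its gate
    return [[[v * g for v in row] for row in ch]
            for ch, g in zip(feature_maps, gates)]
```

A channel whose average activation is high receives a gate near 1 and is kept; a weakly activated channel is attenuated toward its sigmoid-of-zero baseline.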

21 pages, 5616 KiB  
Article
Symmetry-Guided Dual-Branch Network with Adaptive Feature Fusion and Edge-Aware Attention for Image Tampering Localization
by Zhenxiang He, Le Li and Hanbin Wang
Symmetry 2025, 17(7), 1150; https://doi.org/10.3390/sym17071150 - 18 Jul 2025
Abstract
When faced with diverse types of image tampering and image quality degradation in real-world scenarios, traditional image tampering localization methods often struggle to balance boundary accuracy and robustness. To address these issues, this paper proposes a symmetry-guided dual-branch image tampering localization network—FENet (Fusion-Enhanced Network)—that integrates adaptive feature fusion and edge attention mechanisms. This method is based on a structurally symmetric dual-branch architecture, which extracts RGB semantic features and SRM noise residual information to comprehensively capture the fine-grained differences in tampered regions at the visual and statistical levels. To effectively fuse different features, this paper designs a self-calibrating fusion module (SCF), which introduces a content-aware dynamic weighting mechanism to adaptively adjust the importance of different feature branches, thereby enhancing the discriminative power and expressiveness of the fused features. Furthermore, considering that image tampering often involves abnormal changes in edge structures, we further propose an edge-aware coordinate attention mechanism (ECAM). By jointly modeling spatial position information and edge-guided information, the model is guided to focus more precisely on potential tampering boundaries, thereby enhancing its boundary detection and localization capabilities. Experiments on public datasets such as Columbia, CASIA, and NIST16 demonstrate that FENet achieves significantly better results than existing methods. We also analyze the model’s performance under various image quality conditions, such as JPEG compression and Gaussian blur, demonstrating its robustness in real-world scenarios. Experiments in Facebook, Weibo, and WeChat scenarios show that our method achieves average F1 scores that are 2.8%, 3%, and 5.6% higher than those of existing state-of-the-art methods, respectively. Full article
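The SRM noise-residual branch described above is built on fixed high-pass filters from steganalysis. The sketch below applies one classic second-order SRM kernel (with its conventional 1/4 normalization) to a grayscale image with "valid" padding; the paper's full filter bank and any learned weights are assumptions not reproduced here:

```python
def srm_residual(img):
    """Apply one standard second-order SRM high-pass kernel to a
    2D grayscale image (list of rows), 'valid' padding. Flat
    regions map to zero; local noise/tampering artifacts survive."""
    k = [[-1,  2, -1],
         [ 2, -4,  2],
         [-1,  2, -1]]
    h, w = len(img), len(img[0])
    out = []
    for i in range(h - 2):
        row = []
        for j in range(w - 2):
            s = sum(k[a][b] * img[i + a][j + b]
                    for a in range(3) for b in range(3))
            row.append(s / 4.0)  # conventional SRM normalization
        out.append(row)
    return out
```

Because the kernel's coefficients sum to zero, constant image content is suppressed entirely, which is exactly why such residuals expose statistical (rather than visual) traces of tampering.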

20 pages, 2926 KiB  
Article
SonarNet: Global Feature-Based Hybrid Attention Network for Side-Scan Sonar Image Segmentation
by Juan Lei, Huigang Wang, Liming Fan, Qingyue Gu, Shaowei Rong and Huaxia Zhang
Remote Sens. 2025, 17(14), 2450; https://doi.org/10.3390/rs17142450 - 15 Jul 2025
Abstract
With the rapid advancement of deep learning techniques, side-scan sonar image segmentation has become a crucial task in underwater scene understanding. However, the complex and variable underwater environment poses significant challenges for salient object detection, with traditional deep learning approaches often suffering from inadequate feature representation and the loss of global context during downsampling, thus compromising the segmentation accuracy of fine structures. To address these issues, we propose SonarNet, a Global Feature-Based Hybrid Attention Network specifically designed for side-scan sonar image segmentation. SonarNet features a dual-encoder architecture that leverages residual blocks and a self-attention mechanism to simultaneously capture both global structural and local contextual information. In addition, an adaptive hybrid attention module is introduced to effectively integrate channel and spatial features, while a global enhancement block fuses multi-scale global and spatial representations from the dual encoders, mitigating information loss throughout the network. Comprehensive experiments on a dedicated underwater sonar dataset demonstrate that SonarNet outperforms ten state-of-the-art saliency detection methods, achieving a mean absolute error as low as 2.35%. These results highlight the superior performance of SonarNet in challenging sonar image segmentation tasks. Full article

46 pages, 5911 KiB  
Article
Leveraging Prior Knowledge in Semi-Supervised Learning for Precise Target Recognition
by Guohao Xie, Zhe Chen, Yaan Li, Mingsong Chen, Feng Chen, Yuxin Zhang, Hongyan Jiang and Hongbing Qiu
Remote Sens. 2025, 17(14), 2338; https://doi.org/10.3390/rs17142338 - 8 Jul 2025
Abstract
Underwater acoustic target recognition (UATR) is challenged by complex marine noise, scarce labeled data, and inadequate multi-scale feature extraction in conventional methods. This study proposes DART-MT, a semi-supervised framework that integrates a Dual Attention Parallel Residual Network Transformer with a mean teacher paradigm, enhanced by domain-specific prior knowledge. The architecture employs a Convolutional Block Attention Module (CBAM) for localized feature refinement, a lightweight New Transformer Encoder for global context modeling, and a novel TriFusion Block to synergize spectral–temporal–spatial features through parallel multi-branch fusion, addressing the limitations of single-modality extraction. Leveraging the mean teacher framework, DART-MT optimizes consistency regularization to exploit unlabeled data, effectively mitigating class imbalance and annotation scarcity. Evaluations on the DeepShip and ShipsEar datasets demonstrate state-of-the-art accuracy: with 10% labeled data, DART-MT achieves 96.20% (DeepShip) and 94.86% (ShipsEar), surpassing baseline models by 7.2–9.8% in low-data regimes, while reaching 98.80% (DeepShip) and 98.85% (ShipsEar) with 90% labeled data. Under varying noise conditions (−20 dB to 20 dB), the model maintained a robust performance (F1-score: 92.4–97.1%) with 40% lower variance than its competitors, and ablation studies validated each module’s contribution (TriFusion Block alone improved accuracy by 6.9%). This research advances UATR by (1) resolving multi-scale feature fusion bottlenecks, (2) demonstrating the efficacy of semi-supervised learning in marine acoustics, and (3) providing an open-source implementation for reproducibility. In future work, we will extend cross-domain adaptation to diverse oceanic environments. Full article
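The mean teacher paradigm used by DART-MT maintains a teacher model whose weights are an exponential moving average (EMA) of the student's. A minimal sketch of that update on flat parameter lists (the decay value here is illustrative, not the paper's):

```python
def ema_update(teacher, student, decay=0.99):
    """Mean-teacher EMA update: each teacher parameter drifts
    slowly toward the corresponding student parameter.
    teacher/student: flat lists of floats; decay in (0, 1)."""
    return [decay * t + (1.0 - decay) * s
            for t, s in zip(teacher, student)]
```

Applied after every student optimizer step, this gives the teacher a smoothed trajectory whose predictions serve as consistency targets on unlabeled data.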

22 pages, 670 KiB  
Article
LDC-GAT: A Lyapunov-Stable Graph Attention Network with Dynamic Filtering and Constraint-Aware Optimization
by Liping Chen, Hongji Zhu and Shuguang Han
Axioms 2025, 14(7), 504; https://doi.org/10.3390/axioms14070504 - 27 Jun 2025
Abstract
Graph attention networks are pivotal for modeling non-Euclidean data, yet they face dual challenges: training oscillations induced by projection-based high-dimensional constraints and gradient anomalies due to poor adaptation to heterophilic structure. To address these issues, we propose LDC-GAT (Lyapunov-Stable Graph Attention Network with Dynamic Filtering and Constraint-Aware Optimization), which jointly optimizes both forward and backward propagation processes. In the forward path, we introduce Dynamic Residual Graph Filtering, which integrates a tunable self-loop coefficient to balance neighborhood aggregation and self-feature retention. This filtering mechanism, constrained by a lower bound on Dirichlet energy, improves multi-head attention via multi-scale fusion and mitigates overfitting. In the backward path, we design the Fro-FWNAdam, a gradient descent algorithm guided by a learning-rate-aware perceptron. An explicit Frobenius norm bound on weights is derived from Lyapunov theory to form the basis of the perceptron. This stability-aware optimizer is embedded within a Frank–Wolfe framework with Nesterov acceleration, yielding a projection-free constrained optimization strategy that stabilizes training dynamics. Experiments on six benchmark datasets show that LDC-GAT outperforms GAT by 10.54% in classification accuracy, which demonstrates strong robustness on heterophilic graphs. Full article
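The projection-free idea behind the Frank–Wolfe framework mentioned above can be shown with a single step constrained to a Frobenius-norm ball: the linear minimization oracle over {‖W‖_F ≤ R} is simply −R·G/‖G‖_F, and the iterate moves convexly toward it, so no projection is ever needed. This is a generic Frank–Wolfe sketch on flat weight lists — the paper's Fro-FWNAdam optimizer (with Nesterov acceleration and the learning-rate-aware perceptron) is not reproduced, and the radius/step values are illustrative stand-ins for the Lyapunov-derived bound:

```python
import math

def frank_wolfe_step(W, grad, radius, step):
    """One Frank-Wolfe step over a Frobenius-norm ball.
    The oracle solution S = -radius * grad / ||grad||_F lies on
    the ball's boundary; the convex update keeps W feasible
    without any projection."""
    gnorm = math.sqrt(sum(g * g for g in grad))
    S = [-radius * g / gnorm for g in grad]
    return [(1.0 - step) * w + step * s for w, s in zip(W, S)]
```

Since each iterate is a convex combination of feasible points, the norm constraint holds automatically throughout training, which is the source of the stability the abstract claims.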
(This article belongs to the Section Mathematical Analysis)

24 pages, 2802 KiB  
Article
MSDCA: A Multi-Scale Dual-Branch Network with Enhanced Cross-Attention for Hyperspectral Image Classification
by Ning Jiang, Shengling Geng, Yuhui Zheng and Le Sun
Remote Sens. 2025, 17(13), 2198; https://doi.org/10.3390/rs17132198 - 26 Jun 2025
Abstract
The high dimensionality of hyperspectral data, coupled with limited labeled samples and complex scene structures, makes spatial–spectral feature learning particularly challenging. To address these limitations, we propose a dual-branch deep learning framework named MSDCA, which performs spatial–spectral joint modeling under limited supervision. First, a multiscale 3D spatial–spectral feature extraction module (3D-SSF) employs parallel 3D convolutional branches with diverse kernel sizes and dilation rates, enabling hierarchical modeling of spatial–spectral representations from large-scale patches and effectively capturing both fine-grained textures and global context. Second, a multi-branch directional feature module (MBDFM) enhances the network’s sensitivity to directional patterns and long-range spatial relationships. It achieves this by applying axis-aware depthwise separable convolutions along both horizontal and vertical axes, thereby significantly improving the representation of spatial features. Finally, the enhanced cross-attention Transformer encoder (ECATE) integrates a dual-branch fusion strategy, where a cross-attention stream learns semantic dependencies across multi-scale tokens, and a residual path ensures the preservation of structural integrity. The fused features are further refined through lightweight channel and spatial attention modules. This adaptive alignment process enhances the discriminative power of heterogeneous spatial–spectral features. The experimental results on three widely used benchmark datasets demonstrate that the proposed method consistently outperforms state-of-the-art approaches in terms of classification accuracy and robustness. Notably, the framework is particularly effective for small-sample classes and complex boundary regions, while maintaining high computational efficiency. Full article

22 pages, 1422 KiB  
Article
MA-YOLO: A Pest Target Detection Algorithm with Multi-Scale Fusion and Attention Mechanism
by Yongzong Lu, Pengfei Liu and Chong Tan
Agronomy 2025, 15(7), 1549; https://doi.org/10.3390/agronomy15071549 - 25 Jun 2025
Abstract
Agricultural pest detection is critical for crop protection and food security, yet existing methods suffer from low computational efficiency and poor generalization due to imbalanced data distribution, minimal inter-class variations among pest categories, and significant intra-class differences. To address the high computational complexity and inadequate feature representation in traditional convolutional networks, this study proposes MA-YOLO, an agricultural pest detection model based on multi-scale fusion and attention mechanisms. The SDConv module reduces computational costs through depthwise separable convolution and dynamic group convolution while enhancing local feature extraction. The LDSPF module captures multi-scale information via parallel dilated convolutions with spatial attention mechanisms and dual residual connections. The ASCC module improves feature discriminability by establishing an adaptive triple-weight system for global, channel, and spatial semantic responses. The MDF module balances efficiency and multi-scale feature extraction using multi-branch depthwise separable convolution and soft attention-based dynamic weighting. Experimental results demonstrate detection accuracies of 65.4% and 73.9% on the IP102 and Pest24 datasets, respectively, representing improvements of 2% and 1.8% over the original YOLOv11s network. These results establish MA-YOLO as an effective solution for automated agricultural pest monitoring with applications in precision agriculture and crop protection systems. Full article
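The cost saving that depthwise separable convolution brings to SDConv-style blocks is plain arithmetic: a standard k×k convolution costs C_in·C_out·k² parameters, while the depthwise (C_in·k²) plus pointwise (C_in·C_out) factorization is far cheaper. Illustrative counts only (no bias terms), not the paper's exact layer sizes:

```python
def conv_params(c_in, c_out, k):
    """Parameter counts (bias-free) for a standard k x k conv
    versus a depthwise-separable conv (depthwise k x k followed
    by a pointwise 1 x 1)."""
    standard = c_in * c_out * k * k
    separable = c_in * k * k + c_in * c_out
    return standard, separable
```

For a typical 64-to-128-channel 3×3 layer the separable form needs roughly 8× fewer parameters, which is where models like MA-YOLO recover their efficiency budget.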
(This article belongs to the Collection Advances of Agricultural Robotics in Sustainable Agriculture 4.0)

34 pages, 18851 KiB  
Article
Dual-Branch Multi-Dimensional Attention Mechanism for Joint Facial Expression Detection and Classification
by Cheng Peng, Bohao Li, Kun Zou, Bowen Zhang, Genan Dai and Ah Chung Tsoi
Sensors 2025, 25(12), 3815; https://doi.org/10.3390/s25123815 - 18 Jun 2025
Abstract
This paper addresses the central issue arising from the simultaneous detection and classification (SDAC) of facial expressions, namely, balancing the competing demands of good global features for detection and fine features for accurate facial expression classification. It does so by replacing the feature extraction part of the “neck” network in the feature pyramid network of the You Only Look Once X (YOLOX) framework with a novel architecture involving three attention mechanisms—batch, channel, and neighborhood—which respectively explore the three input dimensions: batch, channel, and spatial. Correlations across a batch of images in each of the dual incoming paths are first extracted by a self-attention mechanism in the batch dimension; the two paths are then fused to consolidate their information and split again into two separate paths. The information along the channel dimension is extracted using a generalized form of channel attention, an adaptive graph channel attention, which assigns each element of the incoming signal a weight adapted to that signal. The combination of these two paths, together with two skip connections from the input of the batch attention to the output of the adaptive channel attention, then passes into a residual network with neighborhood attention to extract fine features in the spatial dimension. This novel dual-path architecture has been shown experimentally to achieve a better balance between the competing demands of an SDAC problem than competing approaches. Ablation studies determine the relative importance of the three attention mechanisms. Competitive results are obtained on two non-aligned facial expression recognition datasets, RAF-DB and SFEW, when compared with other state-of-the-art methods. Full article

19 pages, 2812 KiB  
Article
Component Generation Network-Based Image Enhancement Method for External Inspection of Electrical Equipment
by Xiong Liu, Juan Zhang, Qiushi Cui, Yingyue Zhou, Qian Wang, Zining Zhao and Yong Li
Electronics 2025, 14(12), 2419; https://doi.org/10.3390/electronics14122419 - 13 Jun 2025
Abstract
For external inspection of electrical equipment, poor lighting conditions often lead to problems such as uneven illumination, insufficient brightness, and detail loss, which directly affect subsequent analysis. To solve this problem, the Retinex image enhancement method based on the Component Generation Network (CGNet) is proposed in this paper. It employs CGNet to accurately estimate and generate the illumination and reflection components of the target image. The CGNet, based on UNet, integrates Residual Branch Dual-convolution blocks (RBDConv) and the Channel Attention Mechanism (CAM) to improve the feature-learning capability. By setting different numbers of network layers, the optimal estimation of the illumination and reflection components is achieved. To obtain the ideal enhancement results, gamma correction is applied to adjust the estimated illumination component, while the HSV transformation model preserves color information. Finally, the effectiveness of the proposed method is verified on a dataset of poorly illuminated images from external inspection of electrical equipment. The results show that this method not only requires no external datasets for training but also improves the detail clarity and color richness of the target image, effectively addressing poor lighting of images in external inspection of electrical equipment. Full article
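The gamma-correction step applied to the estimated illumination component is standard Retinex-pipeline practice: raising illumination values in [0, 1] to a power below 1 lifts dark regions while leaving highlights nearly untouched. A minimal sketch on a nested-list image; the exponent here is an illustrative choice, not necessarily the paper's:

```python
def gamma_correct(illumination, gamma=0.45):
    """Gamma-correct an estimated illumination component.
    illumination: 2D list of floats in [0, 1]; gamma < 1
    brightens shadows more aggressively than highlights."""
    return [[v ** gamma for v in row] for row in illumination]
```

The enhanced image is then recomposed from the corrected illumination and the reflection component, with the HSV transform preserving color as the abstract describes.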

29 pages, 4553 KiB  
Article
X-FuseRLSTM: A Cross-Domain Explainable Intrusion Detection Framework in IoT Using the Attention-Guided Dual-Path Feature Fusion and Residual LSTM
by Adel Alabbadi and Fuad Bajaber
Sensors 2025, 25(12), 3693; https://doi.org/10.3390/s25123693 - 12 Jun 2025
Abstract
Due to domain variability and evolving attack tactics, intrusion detection in heterogeneous and dynamic IoT systems remains a crucial challenge. For cross-domain intrusion detection, this paper proposes a novel algorithm, X-FuseRLSTM, an attention-guided dual-path feature fusion framework coupled with a residual LSTM architecture. The proposed algorithm combines four major steps: first, feature extraction using a deep encoder and a sparse transformer; second, fusion of the extracted features and reduction of the fused features; third, the classification model; and last, explainable artificial intelligence (XAI). The classification model is a deep neural network with residual long short-term memory (RLSTM). The model effectively incorporates both spatial and temporal correlations in network traffic data, which improves its detection capability. The model predictions are explained using XAI techniques. Extensive experiments on datasets including TON_IoT Network, NSL-KDD, and CICIoMT 2024 with both 19-class and 6-class variations show that X-FuseRLSTM achieves the highest accuracy: 99.40% on TON_IoT Network, 99.72% on NSL-KDD, and 97.66% (19-class) and 98.05% (6-class) on CICIoMT 2024. The suggested method is appropriate for practical IoT security applications since it provides strong domain generalization and explainability while preserving computational efficiency. Full article
(This article belongs to the Section Internet of Things)

15 pages, 4420 KiB  
Article
Single-Pixel Imaging Reconstruction Network with Hybrid Attention and Enhanced U-Net
by Bingrui Xiao, Huibin Wang and Yang Bu
Photonics 2025, 12(6), 607; https://doi.org/10.3390/photonics12060607 - 12 Jun 2025
Abstract
Single-pixel imaging has the characteristics of a simple structure and low cost, which means it has potential applications in many fields. This paper proposes an image reconstruction method for single-pixel imaging (SPI) based on deep learning. This method takes the Generative Adversarial Network (GAN) as the basic architecture, combines the dense residual structure and the deep separable attention mechanism, and reduces the parameters while ensuring the diversity of feature extraction. It also reduces the amount of computation and improves the computational efficiency. In addition, dual-skip connections between the encoder and decoder parts are used to combine the original detailed information with the overall information processed by the network structure. This approach enables a more comprehensive and efficient reconstruction of the target image. Both simulations and experiments have confirmed that the proposed method can effectively reconstruct images at low sampling rates and also achieve good reconstruction results on natural images not seen during training, demonstrating a strong generalization capability. Full article

16 pages, 1439 KiB  
Article
An Underwater Acoustic Communication Signal Modulation-Style Recognition Algorithm Based on Dual-Feature Fusion and ResNet–Transformer Dual-Model Fusion
by Fanyu Zhou, Haoran Wu, Zhibin Yue and Han Li
Appl. Sci. 2025, 15(11), 6234; https://doi.org/10.3390/app15116234 - 1 Jun 2025
Abstract
Traditional underwater acoustic reconnaissance technologies are limited in directly detecting underwater acoustic communication signals. This paper proposes a dual-feature ResNet–Transformer model with two innovative breakthroughs: (1) A dual-modal fusion architecture of ResNet and Transformer is constructed using residual connections to alleviate gradient degradation in deep networks and combining multi-head self-attention to enhance long-distance dependency modeling. (2) The time–frequency representation obtained from the smooth pseudo-Wigner–Ville distribution is used as the first input branch, and higher-order statistics are introduced as the second input branch to enhance phase feature extraction and cope with channel interference. Experiments on the Danjiangkou measured dataset show that the model improves the accuracy by 6.67% compared with the existing Convolutional Neural Network (CNN)–Transformer model in long-distance ranges, providing an efficient solution for modulation recognition in complex underwater acoustic environments. Full article
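The higher-order-statistics branch mentioned above can be illustrated with the simplest fourth-order feature, excess kurtosis, which is insensitive to the signal's mean and scale and is a common modulation-recognition feature. This is a generic sketch of one such statistic; the paper's exact cumulant set is an assumption not reproduced here:

```python
def excess_kurtosis(x):
    """Excess kurtosis of a 1-D signal (population moments):
    m4 / m2^2 - 3. Zero for a Gaussian; negative for flat,
    two-level signals such as BPSK-like waveforms."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / (m2 * m2) - 3.0
```

Feeding such moment-based features as a second input branch alongside a time–frequency image gives the classifier a view of the signal statistics that survives phase distortion in the channel.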

19 pages, 2321 KiB  
Article
Dual-Branch Network with Hybrid Attention for Multimodal Ophthalmic Diagnosis
by Xudong Wang, Anyu Cao, Caiye Fan, Zuoping Tan and Yuanyuan Wang
Bioengineering 2025, 12(6), 565; https://doi.org/10.3390/bioengineering12060565 - 25 May 2025
Abstract
In this paper, we propose a deep learning model based on dual-branch learning with a hybrid attention mechanism to address the underutilization of features in ophthalmic image diagnosis and the limited generalization ability of traditional single-modal deep learning models on imbalanced data. First, a dual-branch architecture is designed in which the left and right branches use residual blocks to process the features of a 2D image and a 3D volume, respectively. Second, a frequency-domain-transform-driven hybrid attention module is introduced, consisting of frequency-domain attention, spatial attention, and channel attention, to address inefficiency in network feature extraction. Finally, through a multi-scale grouped attention fusion mechanism, the local details and global structure information of the two modalities are integrated, resolving the inefficient fusion caused by the heterogeneity of modal features. The experimental results show that the accuracy of MOD-Net improved by 1.66% and 1.14% over GeCoM-Net and ViT-2SPN, respectively. The model effectively mines the deep correlation features of multimodal images through the hybrid attention mechanism, providing a new paradigm for the intelligent diagnosis of ophthalmic diseases. Full article
(This article belongs to the Special Issue AI in OCT (Optical Coherence Tomography) Image Analysis)

16 pages, 3751 KiB  
Article
Improved Face Image Super-Resolution Model Based on Generative Adversarial Network
by Qingyu Liu, Yeguo Sun, Lei Chen and Lei Liu
J. Imaging 2025, 11(5), 163; https://doi.org/10.3390/jimaging11050163 - 19 May 2025
Abstract
Image super-resolution (SR) models based on the generative adversarial network (GAN) face challenges such as unnatural facial detail restoration and local blurring. This paper proposes an improved GAN-based model to address these issues. First, a Multi-scale Hybrid Attention Residual Block (MHARB) is designed, which dynamically enhances feature representation in critical face regions through dual-branch convolution and channel-spatial attention. Second, an Edge-guided Enhancement Block (EEB) is introduced, generating adaptive detail residuals by combining edge masks and channel attention to accurately recover high-frequency textures. Furthermore, a multi-scale discriminator with a weighted sub-discriminator loss is developed to balance global structural and local detail generation quality. Additionally, a phase-wise training strategy with dynamic adjustment of learning rate (Lr) and loss function weights is implemented to improve the realism of super-resolved face images. Experiments on the CelebA-HQ dataset demonstrate that the proposed model achieves a PSNR of 23.35 dB, an SSIM of 0.7424, and an LPIPS of 24.86, outperforming classical models and delivering superior visual quality in high-frequency regions. Notably, this model also surpasses the SwinIR model (PSNR: 23.28 dB → 23.35 dB, SSIM: 0.7340 → 0.7424, and LPIPS: 30.48 → 24.86), validating the effectiveness of the improved model and the training strategy in preserving facial details. Full article
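The PSNR figure reported above follows the standard definition, 10·log10(peak²/MSE). A minimal sketch over flat pixel sequences (the evaluation pipelines of such papers typically operate on full images and may crop borders, which this sketch omits):

```python
import math

def psnr(ref, test, peak=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length
    pixel sequences, with the given peak value (1.0 for images
    normalized to [0, 1], 255.0 for 8-bit images)."""
    mse = sum((r - t) ** 2 for r, t in zip(ref, test)) / len(ref)
    return 10.0 * math.log10(peak * peak / mse)
```

Note that PSNR is a pixel-fidelity metric; the LPIPS comparison in the abstract exists precisely because perceptual quality and PSNR can disagree on face detail.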
(This article belongs to the Section AI in Imaging)

19 pages, 2256 KiB  
Article
Multi-Scale Residual Convolutional Neural Network with Hybrid Attention for Bearing Fault Detection
by Yanping Zhu, Wenlong Chen, Sen Yan, Jianqiang Zhang, Chenyang Zhu, Fang Wang and Qi Chen
Machines 2025, 13(5), 413; https://doi.org/10.3390/machines13050413 - 14 May 2025
Abstract
This paper proposes an advanced deep convolutional neural network model for motor bearing fault detection, designed to overcome the limitations of traditional models in feature extraction, accuracy, and generalization under complex operating conditions. The model combines multi-scale residuals, hybrid attention mechanisms, and dual global pooling to enhance performance. Convolutional layers efficiently extract features, while hybrid attention mechanisms strengthen the feature representation. The multi-scale residual network structure captures features at various scales, and fault classification is performed using global average and max pooling. The model was trained with the Adam optimizer and sparse categorical cross-entropy loss, incorporating a learning-rate decay mechanism to refine the training process. Experiments on the University of Paderborn bearing dataset across four conditions showed superior performance, achieving a diagnostic accuracy of 99.7% and surpassing traditional models such as AMCNN, LeNet5, and AlexNet. Comparative experiments on rolling bearing vibration and motor current datasets across four bearing conditions highlighted the model’s effectiveness and broad applicability, making it a reliable solution for motor bearing fault diagnosis with significant potential for real-world applications. Full article
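The dual global pooling used before the classification head concatenates a global-average and a global-max summary of each channel, so the classifier sees both the typical and the peak response. A minimal sketch on nested lists, not the authors' exact layer:

```python
def dual_global_pool(feature_maps):
    """Concatenate global average pooling and global max pooling
    over each channel's 2D map: for C channels, returns a flat
    vector of length 2*C (averages first, then maxima)."""
    avg = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
           for ch in feature_maps]
    mx = [max(max(row) for row in ch) for ch in feature_maps]
    return avg + mx
```

Average pooling is robust to isolated spikes while max pooling preserves the sharp transients typical of bearing faults, which is why combining them helps classification.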
(This article belongs to the Section Electrical Machines and Drives)
