Search Results (2,398)

Search Parameters:
Keywords = semantic attention

25 pages, 1866 KiB  
Article
A Spatio-Temporal Evolutionary Embedding Approach for Geographic Knowledge Graph Question Answering
by Chunju Zhang, Chaoqun Chu, Kang Zhou, Shu Wang, Yunqiang Zhu, Jianwei Huang, Zhaofu Wu and Fei Gao
ISPRS Int. J. Geo-Inf. 2025, 14(8), 295; https://doi.org/10.3390/ijgi14080295 - 28 Jul 2025
Abstract
In recent years, geographic knowledge graphs (GeoKGs) have shown great promise in representing spatio-temporal and event-driven knowledge. However, existing knowledge graph embedding approaches mainly focus on structural patterns and often overlook the dynamic evolution of entities in both time and space, which limits their effectiveness in downstream reasoning tasks. To address this, we propose a spatio-temporal evolutionary knowledge embedding approach (ST-EKA) that enhances entity representations by modeling their evolution through type-aware encoding, temporal and spatial decay mechanisms, and context aggregation. ST-EKA integrates four core components, including an entity encoder constrained by relational type consistency, a temporal encoder capable of handling both time points and intervals through unified sampling and feedforward encoding, a multi-scale spatial encoder that combines geometric coordinates with semantic attributes, and an evolutionary knowledge encoder that employs attention-based spatio-temporal weighting to capture contextual dynamics. We evaluate ST-EKA on three representative GeoKG datasets—GDELT, ICEWS, and HAD. The results demonstrate that ST-EKA achieves an average improvement of 6.5774% in AUC and 5.0992% in APR on representation learning tasks. In question answering tasks, it yields a maximum average increase of 1.7907% in AUC and 0.5843% in APR. Notably, it exhibits superior performance in chain queries and complex spatio-temporal reasoning, validating its strong robustness, good interpretability, and practical application value. Full article
(This article belongs to the Special Issue Spatial Data Science and Knowledge Discovery)
20 pages, 9953 KiB  
Article
Dual-Branch Occlusion-Aware Semantic Part-Features Extraction Network for Occluded Person Re-Identification
by Bo Sun, Yulong Zhang, Jianan Wang and Chunmao Jiang
Mathematics 2025, 13(15), 2432; https://doi.org/10.3390/math13152432 - 28 Jul 2025
Abstract
Occlusion remains a major challenge in person re-identification, as it often leads to incomplete or misleading visual cues. To address this issue, we propose a dual-branch occlusion-aware network (DOAN), which explicitly and implicitly enhances the model’s capability to perceive and handle occlusions. The proposed DOAN framework comprises two synergistic branches. In the first branch, we introduce an Occlusion-Aware Semantic Attention (OASA) module to extract semantic part features, incorporating a parallel channel and spatial attention (PCSA) block to precisely distinguish between pedestrian body regions and occlusion noise. We also generate occlusion-aware parsing labels by combining external human parsing annotations with occluder masks, providing structural supervision to guide the model in focusing on visible regions. In the second branch, we develop an occlusion-aware recovery (OAR) module that reconstructs occluded pedestrians to their original, unoccluded form, enabling the model to recover missing semantic information and enhance occlusion robustness. Extensive experiments on occluded, partial, and holistic benchmark datasets demonstrate that DOAN consistently outperforms existing state-of-the-art methods. Full article
31 pages, 103100 KiB  
Article
Semantic Segmentation of Small Target Diseases on Tobacco Leaves
by Yanze Zou, Zhenping Qiang, Shuang Zhang and Hong Lin
Agronomy 2025, 15(8), 1825; https://doi.org/10.3390/agronomy15081825 - 28 Jul 2025
Abstract
The application of image recognition technology plays a vital role in agricultural disease identification. Existing approaches primarily rely on image classification, object detection, or semantic segmentation. However, a major challenge in current semantic segmentation methods lies in accurately identifying small target objects. In this study, common tobacco leaf diseases—such as frog-eye disease, climate spots, and wildfire disease—are characterized by small lesion areas, with an average target size of only 32 pixels. This poses significant challenges for existing techniques to achieve precise segmentation. To address this issue, we propose integrating two attention mechanisms, namely cross-feature map attention and dual-branch attention, which are incorporated into the semantic segmentation network to enhance performance on small lesion segmentation. Moreover, considering the lack of publicly available datasets for tobacco leaf disease segmentation, we constructed a training dataset via image splicing. Extensive experiments were conducted on baseline segmentation models, including UNet, DeepLab, and HRNet. Experimental results demonstrate that the proposed method improves the mean Intersection over Union (mIoU) by 4.75% on the constructed dataset, with only a 15.07% increase in computational cost. These results validate the effectiveness of our novel attention-based strategy in the specific context of tobacco leaf disease segmentation. Full article
(This article belongs to the Section Pest and Disease Management)
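The mIoU figure quoted in the abstract above is a standard segmentation metric. As a rough illustration only (this is not the authors' evaluation code; the function name `mean_iou` and the toy label lists are hypothetical), per-class IoU can be computed and averaged like this:

```python
def mean_iou(pred, target, num_classes):
    """Mean Intersection over Union, averaged over classes that occur
    in the prediction or the ground truth (assumed convention)."""
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy flattened label maps: 0 = background, 1 = lesion (illustrative values).
pred = [0, 0, 1, 1, 1, 0]
target = [0, 0, 1, 1, 0, 0]
print(round(mean_iou(pred, target, num_classes=2), 4))  # 0.7083
```

Here class 0 scores 3/4 and class 1 scores 2/3, so a single mislabeled pixel costs the small class proportionally more, which is one reason small-target segmentation is hard to score well on.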
25 pages, 2518 KiB  
Article
An Efficient Semantic Segmentation Framework with Attention-Driven Context Enhancement and Dynamic Fusion for Autonomous Driving
by Jia Tian, Peizeng Xin, Xinlu Bai, Zhiguo Xiao and Nianfeng Li
Appl. Sci. 2025, 15(15), 8373; https://doi.org/10.3390/app15158373 - 28 Jul 2025
Abstract
In recent years, a growing number of real-time semantic segmentation networks have been developed to improve segmentation accuracy. However, these advancements often come at the cost of increased computational complexity, which limits their inference efficiency, particularly in scenarios such as autonomous driving, where strict real-time performance is essential. Achieving an effective balance between speed and accuracy has thus become a central challenge in this field. To address this issue, we present a lightweight semantic segmentation model tailored for the perception requirements of autonomous vehicles. The architecture follows an encoder–decoder paradigm, which not only preserves the capability for deep feature extraction but also facilitates multi-scale information integration. The encoder leverages a high-efficiency backbone, while the decoder introduces a dynamic fusion mechanism designed to enhance information interaction between different feature branches. Recognizing the limitations of convolutional networks in modeling long-range dependencies and capturing global semantic context, the model incorporates an attention-based feature extraction component. This is further augmented by positional encoding, enabling better awareness of spatial structures and local details. The dynamic fusion mechanism employs an adaptive weighting strategy, adjusting the contribution of each feature channel to reduce redundancy and improve representation quality. To validate the effectiveness of the proposed network, experiments were conducted on a single RTX 3090 GPU. The Dynamic Real-time Integrated Vision Encoder–Segmenter Network (DriveSegNet) achieved a mean Intersection over Union (mIoU) of 76.9% and an inference speed of 70.5 FPS on the Cityscapes test dataset, 74.6% mIoU and 139.8 FPS on the CamVid test dataset, and 35.8% mIoU with 108.4 FPS on the ADE20K dataset. 
The experimental results demonstrate that the proposed method achieves an excellent balance between inference speed, segmentation accuracy, and model size. Full article
24 pages, 3480 KiB  
Article
MFPI-Net: A Multi-Scale Feature Perception and Interaction Network for Semantic Segmentation of Urban Remote Sensing Images
by Xiaofei Song, Mingju Chen, Jie Rao, Yangming Luo, Zhihao Lin, Xingyue Zhang, Senyuan Li and Xiao Hu
Sensors 2025, 25(15), 4660; https://doi.org/10.3390/s25154660 - 27 Jul 2025
Abstract
To improve semantic segmentation performance for complex urban remote sensing images with multi-scale object distribution, class similarity, and small object omission, this paper proposes MFPI-Net, an encoder–decoder-based semantic segmentation network. It includes four core modules: a Swin Transformer backbone encoder, a diverse dilation rates attention shuffle decoder (DDRASD), a multi-scale convolutional feature enhancement module (MCFEM), and a cross-path residual fusion module (CPRFM). The Swin Transformer efficiently extracts multi-level global semantic features through its hierarchical structure and window attention mechanism. The DDRASD’s diverse dilation rates attention (DDRA) block combines convolutions with diverse dilation rates and channel-coordinate attention to enhance multi-scale contextual awareness, while Shuffle Block improves resolution via pixel rearrangement and avoids checkerboard artifacts. The MCFEM enhances local feature modeling through parallel multi-kernel convolutions, forming a complementary relationship with the Swin Transformer’s global perception capability. The CPRFM employs multi-branch convolutions and a residual multiplication–addition fusion mechanism to enhance interactions among multi-source features, thereby improving the recognition of small objects and similar categories. Experiments on the ISPRS Vaihingen and Potsdam datasets show that MFPI-Net outperforms mainstream methods, achieving 82.57% and 88.49% mIoU, validating its superior segmentation performance in urban remote sensing. Full article
(This article belongs to the Section Sensing and Imaging)
26 pages, 3125 KiB  
Article
Tomato Leaf Disease Identification Framework FCMNet Based on Multimodal Fusion
by Siming Deng, Jiale Zhu, Yang Hu, Mingfang He and Yonglin Xia
Plants 2025, 14(15), 2329; https://doi.org/10.3390/plants14152329 - 27 Jul 2025
Abstract
Precisely recognizing diseases in tomato leaves plays a crucial role in enhancing the health, productivity, and quality of tomato crops. However, disease identification methods that rely on single-mode information often face the problems of insufficient accuracy and weak generalization ability. Therefore, this paper proposes a tomato leaf disease recognition framework FCMNet based on multimodal fusion, which combines tomato leaf disease image and text description to enhance the ability to capture disease characteristics. In this paper, the Fourier-guided Attention Mechanism (FGAM) is designed, which systematically embeds the Fourier frequency-domain information into the spatial-channel attention structure for the first time, enhances the stability and noise resistance of feature expression through spectral transform, and realizes more accurate lesion location by means of multi-scale fusion of local and global features. In order to realize the deep semantic interaction between image and text modality, a Cross Vision–Language Alignment module (CVLA) is further proposed. This module generates visual representations compatible with Bert embeddings by utilizing block segmentation and feature mapping techniques. Additionally, it incorporates a probability-based weighting mechanism to achieve enhanced multimodal fusion, significantly strengthening the model’s comprehension of semantic relationships across different modalities. Furthermore, to enhance both training efficiency and parameter optimization capabilities of the model, we introduce a Multi-strategy Improved Coati Optimization Algorithm (MSCOA). This algorithm integrates Good Point Set initialization with a Golden Sine search strategy, thereby boosting global exploration, accelerating convergence, and effectively preventing entrapment in local optima. Consequently, it exhibits robust adaptability and stable performance within high-dimensional search spaces. 
The experimental results show that the FCMNet model has increased the accuracy and precision by 2.61% and 2.85%, respectively, compared with the baseline model on the self-built dataset of tomato leaf diseases, and the recall and F1 score have increased by 3.03% and 3.06%, respectively, which is significantly superior to the existing methods. This research provides a new solution for the identification of tomato leaf diseases and has broad potential for agricultural applications. Full article
(This article belongs to the Special Issue Advances in Artificial Intelligence for Plant Research)
25 pages, 4296 KiB  
Article
StripSurface-YOLO: An Enhanced Yolov8n-Based Framework for Detecting Surface Defects on Strip Steel in Industrial Environments
by Haomin Li, Huanzun Zhang and Wenke Zang
Electronics 2025, 14(15), 2994; https://doi.org/10.3390/electronics14152994 - 27 Jul 2025
Abstract
Recent advances in precision manufacturing and high-end equipment technologies have imposed ever more stringent requirements on the accuracy, real-time performance, and lightweight design of online steel strip surface defect detection systems. To reconcile the persistent trade-off between detection precision and inference efficiency in complex industrial environments, this study proposes StripSurface–YOLO, a novel real-time defect detection framework built upon YOLOv8n. The core architecture integrates an Efficient Cross-Stage Local Perception module (ResGSCSP), which synergistically combines GSConv lightweight convolutions with a one-shot aggregation strategy, thereby markedly reducing both model parameters and computational complexity. To further enhance multi-scale feature representation, this study introduces an Efficient Multi-Scale Attention (EMA) mechanism at the feature-fusion stage, enabling the network to more effectively attend to critical defect regions. Moreover, conventional nearest-neighbor upsampling is replaced by DySample, which produces deeper, high-resolution feature maps enriched with semantic content, improving both inference speed and fusion quality. To heighten sensitivity to small-scale and low-contrast defects, the model adopts Focal Loss, dynamically adjusting to sample difficulty. Extensive evaluations on the NEU-DET dataset demonstrate that StripSurface–YOLO reduces FLOPs by 11.6% and parameter count by 7.4% relative to the baseline YOLOv8n, while achieving respective improvements of 1.4%, 3.1%, 4.1%, and 3.0% in precision, recall, mAP50, and mAP50:95. Under adverse conditions (contrast variations, brightness fluctuations, and Gaussian noise), StripSurface–YOLO outperforms the baseline model, delivering improvements of 5.0% in mAP50 and 4.7% in mAP50:95, attesting to the model’s robust interference resistance. 
These findings underscore the potential of StripSurface–YOLO to meet the rigorous performance demands of real-time surface defect detection in the metal forging industry. Full article
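The abstract above mentions Focal Loss "dynamically adjusting to sample difficulty". A minimal sketch of the binary form, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), is given below. It is not the paper's training code: the function name is hypothetical, and `gamma=2.0` and `alpha=0.25` are assumed defaults, not values reported in the abstract.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss for one binary prediction.

    p: predicted probability of the positive class, in (0, 1)
    y: ground-truth label, 0 or 1
    gamma, alpha: assumed hyperparameters (not taken from the abstract)
    """
    # p_t is the probability the model assigned to the true class.
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    p_t = min(max(p_t, 1e-12), 1.0)  # guard against log(0)
    # (1 - p_t)^gamma shrinks the loss of well-classified (easy) samples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = binary_focal_loss(0.9, 1)  # confident and correct
hard = binary_focal_loss(0.1, 1)  # confident and wrong
print(easy < hard)  # True
```

With `gamma=0` the modulating factor disappears and the expression reduces to alpha-weighted cross-entropy, which makes the role of `gamma` easy to check in isolation.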
21 pages, 5205 KiB  
Article
SGNet: A Structure-Guided Network with Dual-Domain Boundary Enhancement and Semantic Fusion for Skin Lesion Segmentation
by Haijiao Yun, Qingyu Du, Ziqing Han, Mingjing Li, Le Yang, Xinyang Liu, Chao Wang and Weitian Ma
Sensors 2025, 25(15), 4652; https://doi.org/10.3390/s25154652 - 27 Jul 2025
Abstract
Segmentation of skin lesions in dermoscopic images is critical for the accurate diagnosis of skin cancers, particularly malignant melanoma, yet it is hindered by irregular lesion shapes, blurred boundaries, low contrast, and artifacts, such as hair interference. Conventional deep learning methods, typically based on UNet or Transformer architectures, often face limitations in regard to fully exploiting lesion features and incur high computational costs, compromising precise lesion delineation. To overcome these challenges, we propose SGNet, a structure-guided network, integrating a hybrid CNN–Mamba framework for robust skin lesion segmentation. The SGNet employs the Visual Mamba (VMamba) encoder to efficiently extract multi-scale features, followed by the Dual-Domain Boundary Enhancer (DDBE), which refines boundary representations and suppresses noise through spatial and frequency-domain processing. The Semantic-Texture Fusion Unit (STFU) adaptively integrates low-level texture with high-level semantic features, while the Structure-Aware Guidance Module (SAGM) generates coarse segmentation maps to provide global structural guidance. The Guided Multi-Scale Refiner (GMSR) further optimizes boundary details through a multi-scale semantic attention mechanism. Comprehensive experiments based on the ISIC2017, ISIC2018, and PH2 datasets demonstrate SGNet’s superior performance, with average improvements of 3.30% in terms of the mean Intersection over Union (mIoU) value and 1.77% in regard to the Dice Similarity Coefficient (DSC) compared to state-of-the-art methods. Ablation studies confirm the effectiveness of each component, highlighting SGNet’s exceptional accuracy and robust generalization for computer-aided dermatological diagnosis. Full article
(This article belongs to the Section Biomedical Sensors)
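The abstract above reports both mIoU and the Dice Similarity Coefficient (DSC). The two are closely related; the sketch below computes them for one binary mask pair. It is illustrative only: the function name and the toy masks are hypothetical, and treating two empty masks as a perfect match is an assumed convention.

```python
def dice_and_iou(pred, target):
    """Dice coefficient and IoU for two binary masks given as 0/1 lists."""
    inter = sum(p & t for p, t in zip(pred, target))
    p_sum, t_sum = sum(pred), sum(target)
    union = p_sum + t_sum - inter
    # Assumed convention: two empty masks count as a perfect match.
    dice = 2 * inter / (p_sum + t_sum) if (p_sum + t_sum) else 1.0
    iou = inter / union if union else 1.0
    return dice, iou


pred = [1, 1, 0, 0, 1]
target = [1, 0, 0, 1, 1]
dice, iou = dice_and_iou(pred, target)
print(round(dice, 4), iou)  # 0.6667 0.5
```

For a single mask pair the identity Dice = 2 * IoU / (1 + IoU) holds, so Dice is never lower than IoU; improvements reported in the two metrics are therefore not directly comparable in size.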
27 pages, 3868 KiB  
Article
Swin-ReshoUnet: A Seismic Profile Signal Reconstruction Method Integrating Hierarchical Convolution, ORCA Attention, and Residual Channel Attention Mechanism
by Jie Rao, Mingju Chen, Xiaofei Song, Chen Xie, Xueyang Duan, Xiao Hu, Senyuan Li and Xingyue Zhang
Appl. Sci. 2025, 15(15), 8332; https://doi.org/10.3390/app15158332 - 26 Jul 2025
Viewed by 57
Abstract
This study proposes a Swin-ReshoUnet architecture with a three-level enhancement mechanism to address inefficiencies in multi-scale feature extraction and gradient degradation in deep networks for high-precision seismic exploration. The encoder uses a hierarchical convolution module to build a multi-scale feature pyramid, enhancing cross-scale geological signal representation. The decoder replaces traditional self-attention with ORCA attention to enable global context modeling with lower computational cost. Skip connections integrate a residual channel attention module, mitigating gradient degradation via dual-pooling feature fusion and activation optimization, forming a full-link optimization from low-level feature enhancement to high-level semantic integration. Simulated and real dataset experiments show that at decimation ratios of 0.1–0.5, the method significantly outperforms SwinUnet, TransUnet, etc., in reconstruction performance. Residual signals and F-K spectra verify high-fidelity reconstruction. Despite increased difficulty with higher sparsity, it maintains optimal performance with notable margins, demonstrating strong robustness. The proposed hierarchical feature enhancement and cross-scale attention strategies offer an efficient seismic profile signal reconstruction solution and show generality for migration to complex visual tasks, advancing geophysics-computer vision interdisciplinary innovation. Full article
20 pages, 77932 KiB  
Article
Image Alignment Based on Deep Learning to Extract Deep Feature Information from Images
by Lin Zhu, Yuxing Mao and Jianyu Pan
Sensors 2025, 25(15), 4628; https://doi.org/10.3390/s25154628 - 26 Jul 2025
Viewed by 93
Abstract
To overcome the limitations of traditional image alignment methods in capturing deep semantic features, a deep feature information image alignment network (DFA-Net) is proposed. This network aims to enhance image alignment performance through multi-level feature learning. DFA-Net is based on the deep residual architecture and introduces spatial pyramid pooling to achieve cross-scale feature fusion, effectively enhancing the features’ adaptability to scale. A feature enhancement module based on the self-attention mechanism is designed; through a dynamic weight allocation strategy, it emphasizes key features that exhibit geometric invariance and high discriminative power. This improves the network’s robustness to multimodal image deformation. Experiments on two public datasets, MSRS and RoadScene, show that the method performs well in terms of alignment accuracy, with the RMSE reduced by 0.661 and 0.473, and the SSIM, MI, and NCC improved by 0.155, 0.163, and 0.211; and 0.108, 0.226, and 0.114, respectively, compared with the benchmark model. The visualization results validate the significant improvement in the features’ visual quality and confirm the method’s advantages in terms of stability and discriminative properties of deep feature extraction. Full article
(This article belongs to the Section Sensing and Imaging)
18 pages, 516 KiB  
Article
A Nested Named Entity Recognition Model Robust in Few-Shot Learning Environments Using Label Description Information
by Hyunsun Hwang, Youngjun Jung, Changki Lee and Wooyoung Go
Appl. Sci. 2025, 15(15), 8255; https://doi.org/10.3390/app15158255 - 24 Jul 2025
Viewed by 142
Abstract
Nested named entity recognition (NER) is a task that identifies hierarchically structured entities, where one entity can contain other entities within its span. This study introduces a nested NER model for few-shot learning environments, addressing the difficulty of building extensive datasets for general named entities. We enhance the Biaffine nested NER model by modifying its output layer to incorporate label semantic information through a novel label description embedding (LDE) approach, improving performance with limited training data. Our method replaces the traditional biaffine classifier with a label attention mechanism that leverages comprehensive natural language descriptions of entity types, encoded using BERT to capture rich semantic relationships between labels and input spans. We conducted comprehensive experiments on four benchmark datasets: GENIA (nested NER), ACE 2004 (nested NER), ACE 2005 (nested NER), and CoNLL 2003 English (flat NER). Performance was evaluated across multiple few-shot scenarios (1-shot, 5-shot, 10-shot, and 20-shot) using F1-measure as the primary metric, with five different random seeds to ensure robust evaluation. We compared our approach against strong baselines including BERT-LSTM-CRF with nested tags, the original Biaffine model, and recent few-shot NER methods (FewNER, FIT, LPNER, SpanNER). Results demonstrate significant improvements across all few-shot scenarios. On GENIA, our LDE model achieves 45.07% F1 in five-shot learning compared to 30.74% for the baseline Biaffine model (46.4% relative improvement). On ACE 2005, we obtain 44.24% vs. 32.38% F1 in five-shot scenarios (36.6% relative improvement). The model shows consistent gains in 10-shot (57.19% vs. 49.50% on ACE 2005) and 20-shot settings (64.50% vs. 58.21% on ACE 2005). Ablation studies confirm that semantic information from label descriptions is the key factor enabling robust few-shot performance. 
Transfer learning experiments demonstrate the model’s ability to leverage knowledge from related domains. Our findings suggest that incorporating label semantic information can substantially enhance NER models in low-resource settings, opening new possibilities for applying NER in specialized domains or languages with limited annotated data. Full article
(This article belongs to the Special Issue Applications of Natural Language Processing to Data Science)
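The abstract above describes scoring input spans against encoded label descriptions with a label attention mechanism. The toy sketch below shows the general idea using a scaled dot product followed by a softmax. Everything in it is an assumption for illustration: the scoring function, the two-dimensional vectors, and the names `softmax` and `label_attention` are hypothetical, and the actual model encodes label descriptions with BERT rather than hand-written vectors.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def label_attention(span_vec, label_vecs):
    """Distribution over labels for one span (assumed scoring scheme).

    span_vec: representation of a candidate span
    label_vecs: one embedding per label description, same dimension
    """
    scale = math.sqrt(len(span_vec))
    scores = [
        sum(a * b for a, b in zip(span_vec, label_vec)) / scale
        for label_vec in label_vecs
    ]
    return softmax(scores)


# Hypothetical embeddings; real ones would come from a text encoder.
span = [1.0, 0.0]
labels = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
probs = label_attention(span, labels)
print(probs.index(max(probs)))  # 0
```

Because the label side is an embedding of a natural-language description rather than a learned class index, a label with very few training examples can still be scored, which is the intuition behind the few-shot gains the abstract reports.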
17 pages, 2072 KiB  
Article
Barefoot Footprint Detection Algorithm Based on YOLOv8-StarNet
by Yujie Shen, Xuemei Jiang, Yabin Zhao and Wenxin Xie
Sensors 2025, 25(15), 4578; https://doi.org/10.3390/s25154578 - 24 Jul 2025
Viewed by 185
Abstract
This study proposes an optimized footprint recognition model based on an enhanced StarNet architecture for biometric identification in the security, medical, and criminal investigation fields. Conventional image recognition algorithms exhibit limitations in processing barefoot footprint images characterized by concentrated feature distributions and rich texture patterns. To address this, our framework integrates an improved StarNet into the backbone of the YOLOv8 architecture. Leveraging the unique advantages of element-wise multiplication, the redesigned backbone efficiently maps inputs to a high-dimensional nonlinear feature space without increasing channel dimensions, achieving enhanced representational capacity with low computational latency. Subsequently, an Encoder layer facilitates feature interaction within the backbone through multi-scale feature fusion and attention mechanisms, effectively extracting rich semantic information while maintaining computational efficiency. In the feature fusion part, a feature modulation block processes multi-scale features by synergistically combining global and local information, thereby reducing redundant computations and decreasing both parameter count and computational complexity to achieve model lightweighting. Experimental evaluations on a proprietary barefoot footprint dataset demonstrate that the proposed model exhibits significant advantages in terms of parameter efficiency, recognition accuracy, and computational complexity. The number of parameters is reduced by 0.73 million, further improving the model’s speed. GFLOPs are reduced by 1.5, lowering the performance requirements for computational hardware during model deployment. Recognition accuracy reaches 99.5%, with further improvements in model precision. 
Future research will explore how to capture shoeprint images with complex backgrounds from shoes worn at crime scenes, aiming to further enhance the model’s recognition capabilities in more forensic scenarios. Full article
(This article belongs to the Special Issue Transformer Applications in Target Tracking)
25 pages, 6911 KiB  
Article
Image Inpainting Algorithm Based on Structure-Guided Generative Adversarial Network
by Li Zhao, Tongyang Zhu, Chuang Wang, Feng Tian and Hongge Yao
Mathematics 2025, 13(15), 2370; https://doi.org/10.3390/math13152370 - 24 Jul 2025
Viewed by 195
Abstract
To address the challenges of image inpainting in scenarios with extensive or irregular missing regions—particularly detail oversmoothing, structural ambiguity, and textural incoherence—this paper proposes an Image Structure-Guided (ISG) framework that hierarchically integrates structural priors with semantic-aware texture synthesis. The proposed methodology advances a two-stage restoration paradigm: (1) Structural Prior Extraction, where adaptive edge detection algorithms identify residual contours in corrupted regions, and a transformer-enhanced network reconstructs globally consistent structural maps through contextual feature propagation; (2) Structure-Constrained Texture Synthesis, wherein a multi-scale generator with hybrid dilated convolutions and channel attention mechanisms iteratively refines high-fidelity textures under explicit structural guidance. The framework introduces three innovations: (1) a hierarchical feature fusion architecture that synergizes multi-scale receptive fields with spatial-channel attention to preserve long-range dependencies and local details simultaneously; (2) spectral-normalized Markovian discriminator with gradient-penalty regularization, enabling adversarial training stability while enforcing patch-level structural consistency; and (3) dual-branch loss formulation combining perceptual similarity metrics with edge-aware constraints to align synthesized content with both semantic coherence and geometric fidelity. Our experiments on the two benchmark datasets (Places2 and CelebA) have demonstrated that our framework achieves more unified textures and structures, bringing the restored images closer to their original semantic content. Full article
1 page, 126 KiB
Correction
Correction: Li et al. HSAA-CD: A Hierarchical Semantic Aggregation Mechanism and Attention Module for Non-Agricultural Change Detection in Cultivated Land. Remote Sens. 2024, 16, 1372
by Fangting Li, Fangdong Zhou, Guo Zhang, Jianfeng Xiao and Peng Zeng
Remote Sens. 2025, 17(15), 2566; https://doi.org/10.3390/rs17152566 - 24 Jul 2025
Viewed by 78
Abstract
The authors would like to make a correction to the published paper [...] Full article
21 pages, 2919 KiB  
Article
A Feasible Domain Segmentation Algorithm for Unmanned Vessels Based on Coordinate-Aware Multi-Scale Features
by Zhengxun Zhou, Weixian Li, Yuhan Wang, Haozheng Liu and Ning Wu
J. Mar. Sci. Eng. 2025, 13(8), 1387; https://doi.org/10.3390/jmse13081387 - 22 Jul 2025
Viewed by 108
Abstract
The accurate extraction of navigational regions from images of navigational waters plays a key role in ensuring on-water safety and the automation of unmanned vessels. However, current methods struggle with fluctuations in water surface illumination, reflective disturbances, surface undulations, and other disruptions, which makes rapid and precise boundary segmentation difficult. To cope with these challenges, in this paper, we propose a coordinate-aware multi-scale feature network (GASF-ResNet) for water segmentation. The method integrates the Global Grouping Coordinate Attention (GGCA) module into the four downsampling branches of ResNet-50, thus enhancing the model's ability to capture target features and improving the feature representation. To expand the model's receptive field and boost its capability in extracting features of multi-scale targets, Atrous Spatial Pyramid Pooling (ASPP) is used. Combined with multi-scale feature fusion, this enhances the expression of semantic information at different scales and improves the segmentation accuracy of the model in complex water environments. The experimental results show that the mean pixel accuracy (mPA) and mean intersection over union (mIoU) of the proposed method on a self-built dataset and on the USVInland unmanned ship dataset are 99.31% and 98.61%, and 98.55% and 99.27%, respectively, significantly better results than those obtained for the existing mainstream models. These results help to overcome the background interference caused by water surface reflection and uneven lighting in the aquatic environment and to achieve accurate segmentation of the water area for the safe navigation of unmanned vessels, which is of great value for the stable operation of unmanned vessels in complex environments. Full article
(This article belongs to the Section Ocean Engineering)
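The two metrics reported in this abstract, mPA and mIoU, can be illustrated with a minimal sketch. It assumes the standard definitions (per-class pixel accuracy and per-class intersection over union, each averaged over classes); it is not the authors' evaluation code, and the function names `confusion_matrix` and `mpa_and_miou` are hypothetical.

```python
from typing import List, Sequence, Tuple


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Count pixels: rows are true labels, columns are predicted labels."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def mpa_and_miou(cm: List[List[int]]) -> Tuple[float, float]:
    """Return (mPA, mIoU) under the standard definitions (an assumption here).

    Classes with no ground-truth pixels are skipped for mPA, and classes
    with an empty union are skipped for mIoU. At least one class is
    assumed to be present.
    """
    n = len(cm)
    accuracies, ious = [], []
    for c in range(n):
        tp = cm[c][c]                               # correctly labelled pixels
        gt_total = sum(cm[c])                       # pixels whose true label is c
        pred_total = sum(cm[r][c] for r in range(n))  # pixels predicted as c
        union = gt_total + pred_total - tp
        if gt_total > 0:
            accuracies.append(tp / gt_total)
        if union > 0:
            ious.append(tp / union)
    return sum(accuracies) / len(accuracies), sum(ious) / len(ious)


# Toy example: four pixels, class 0 = background, class 1 = water.
cm = confusion_matrix([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
mpa, miou = mpa_and_miou(cm)
```

In the toy example, one background pixel is mislabelled as water, so both metrics drop below 1 even though every water pixel is found.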