Search Results (841)

Search Parameters:
Keywords = Lightweight Transformer

21 pages, 1383 KiB  
Article
Enhancing Underwater Images with LITM: A Dual-Domain Lightweight Transformer Framework
by Wang Hu, Zhuojing Rong, Lijun Zhang, Zhixiang Liu, Zhenhua Chu, Lu Zhang, Liping Zhou and Jingxiang Xu
J. Mar. Sci. Eng. 2025, 13(8), 1403; https://doi.org/10.3390/jmse13081403 - 23 Jul 2025
Abstract
Underwater image enhancement (UIE) technology plays a vital role in marine resource exploration, environmental monitoring, and underwater archaeology. However, due to the absorption and scattering of light in underwater environments, images often suffer from blurred details, color distortion, and low contrast, which seriously affect the usability of underwater images. To address the above limitations, a lightweight transformer-based model (LITM) is proposed for improving degraded underwater images. Firstly, our proposed method utilizes a lightweight RGB transformer enhancer (LRTE) that uses efficient channel attention blocks to capture local detail features in the RGB domain. Subsequently, a lightweight HSV transformer encoder (LHTE) is utilized to extract global brightness, color, and saturation from the hue–saturation–value (HSV) domain. Finally, we propose a multi-modal integration block (MMIB) to effectively fuse enhanced information from the RGB and HSV pathways, as well as the input image. Our proposed LITM method significantly outperforms state-of-the-art methods, achieving a peak signal-to-noise ratio (PSNR) of 26.70 and a structural similarity index (SSIM) of 0.9405 on the LSUI dataset. Furthermore, the designed method also exhibits good generality and adaptability on unpaired datasets.
(This article belongs to the Section Ocean Engineering)
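
The dual-domain idea above (enhance in RGB and HSV, then fuse with the input) can be sketched as a minimal PyTorch skeleton. Everything below is an illustrative assumption, not the authors' LITM: the branch and fusion modules are stand-ins, and kornia is assumed only for the RGB-to-HSV conversion.

```python
# Minimal sketch of a dual-domain (RGB + HSV) enhancement skeleton with a
# fusion block, loosely following the LITM description in the abstract.
# Module names, channel widths, and the kornia-based HSV conversion are
# assumptions for illustration, not the authors' implementation.
import torch
import torch.nn as nn
import kornia.color as kc  # assumed dependency for the RGB -> HSV conversion


class TinyBranch(nn.Module):
    """Stand-in for the lightweight RGB/HSV transformer branches."""
    def __init__(self, channels: int = 16):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.GELU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.GELU(),
        )

    def forward(self, x):
        return self.body(x)


class FusionBlock(nn.Module):
    """Stand-in for the multi-modal integration block (MMIB)."""
    def __init__(self, channels: int = 16):
        super().__init__()
        self.mix = nn.Conv2d(2 * channels + 3, 3, 1)  # fuse both branches plus the input

    def forward(self, rgb_feat, hsv_feat, x):
        return torch.sigmoid(self.mix(torch.cat([rgb_feat, hsv_feat, x], dim=1)))


class DualDomainEnhancer(nn.Module):
    def __init__(self):
        super().__init__()
        self.rgb_branch, self.hsv_branch = TinyBranch(), TinyBranch()
        self.fusion = FusionBlock()

    def forward(self, x):           # x: degraded RGB image in [0, 1]
        hsv = kc.rgb_to_hsv(x)      # second pathway works in the HSV domain
        return self.fusion(self.rgb_branch(x), self.hsv_branch(hsv), x)


out = DualDomainEnhancer()(torch.rand(1, 3, 64, 64))
print(out.shape)  # torch.Size([1, 3, 64, 64])
```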

17 pages, 1927 KiB  
Article
ConvTransNet-S: A CNN-Transformer Hybrid Disease Recognition Model for Complex Field Environments
by Shangyun Jia, Guanping Wang, Hongling Li, Yan Liu, Linrong Shi and Sen Yang
Plants 2025, 14(15), 2252; https://doi.org/10.3390/plants14152252 - 22 Jul 2025
Viewed by 38
Abstract
To address the challenges of low recognition accuracy and substantial model complexity in crop disease identification models operating in complex field environments, this study proposed a novel hybrid model named ConvTransNet-S, which integrates Convolutional Neural Networks (CNNs) and transformers for crop disease identification tasks. Unlike existing hybrid approaches, ConvTransNet-S uniquely introduces three key innovations: First, a Local Perception Unit (LPU) and Lightweight Multi-Head Self-Attention (LMHSA) modules were introduced to synergistically enhance the extraction of fine-grained plant disease details and model global dependency relationships, respectively. Second, an Inverted Residual Feed-Forward Network (IRFFN) was employed to optimize the feature propagation path, thereby enhancing the model’s robustness against interferences such as lighting variations and leaf occlusions. This novel combination of an LPU, LMHSA, and an IRFFN achieves a dynamic equilibrium between local texture perception and global context modeling, effectively resolving the trade-offs inherent in standalone CNNs or transformers. Finally, through a phased architecture design, efficient fusion of multi-scale disease features is achieved, which enhances feature discriminability while reducing model complexity. The experimental results indicated that ConvTransNet-S achieved a recognition accuracy of 98.85% on the PlantVillage public dataset. This model operates with only 25.14 million parameters, a computational load of 3.762 GFLOPs, and an inference time of 7.56 ms. Testing on a self-built in-field complex scene dataset comprising 10,441 images revealed that ConvTransNet-S achieved an accuracy of 88.53%, which represents improvements of 14.22%, 2.75%, and 0.34% over EfficientNetV2, Vision Transformer, and Swin Transformer, respectively. Furthermore, the ConvTransNet-S model achieved up to 14.22% higher disease recognition accuracy under complex background conditions while reducing the parameter count by 46.8%. This confirms that its unique multi-scale feature mechanism can effectively distinguish disease from background features, providing a novel technical approach for disease diagnosis in complex agricultural scenarios and demonstrating significant application value for intelligent agricultural management.
(This article belongs to the Section Plant Modeling)
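
A rough sketch of the CNN-plus-attention pattern the abstract describes (local convolutional perception followed by global self-attention and a feed-forward block). Layer choices, dimensions, and names are assumptions for illustration, not the published ConvTransNet-S blocks.

```python
# Hybrid block sketch: a depthwise convolution for local detail, then
# multi-head self-attention over flattened spatial tokens for global context.
import torch
import torch.nn as nn


class HybridBlock(nn.Module):
    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.local = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)  # local perception (depthwise)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, dim * 2), nn.GELU(), nn.Linear(dim * 2, dim))

    def forward(self, x):                      # x: (B, C, H, W)
        x = x + self.local(x)                  # fine-grained local details
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)  # (B, H*W, C): global dependencies
        tokens = tokens + self.attn(self.norm(tokens), self.norm(tokens), self.norm(tokens))[0]
        tokens = tokens + self.ffn(self.norm(tokens))
        return tokens.transpose(1, 2).reshape(b, c, h, w)


print(HybridBlock()(torch.rand(2, 64, 14, 14)).shape)  # torch.Size([2, 64, 14, 14])
```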

20 pages, 22580 KiB  
Article
Life-Threatening Ventricular Arrhythmia Identification Based on Multiple Complex Networks
by Zhipeng Cai, Menglin Yu, Jiawen Yu, Xintao Han, Jianqing Li and Yangyang Qu
Electronics 2025, 14(15), 2921; https://doi.org/10.3390/electronics14152921 - 22 Jul 2025
Viewed by 54
Abstract
Ventricular arrhythmias (VAs) are critical cardiovascular diseases that require rapid and accurate detection. Conventional approaches relying on multi-lead ECG or deep learning models have limitations in computational cost, interpretability, and real-time applicability on wearable devices. To address these issues, a lightweight and interpretable framework based on multiple complex networks was proposed for the detection of life-threatening VAs using short-term single-lead ECG signals. The input signals were decomposed using the fixed-frequency-range empirical wavelet transform, and sub-bands were subsequently analyzed through multiscale visibility graphs, recurrence networks, cross-recurrence networks, and joint recurrence networks. Eight topological features were extracted and input into an XGBoost classifier for VA identification. Ten-fold cross-validation results on the MIT-BIH VFDB and CUDB databases demonstrated that the proposed method achieved a sensitivity of 99.02 ± 0.53%, a specificity of 98.44 ± 0.43%, and an accuracy of 98.73 ± 0.02% for 10 s ECG segments. The model also maintained robust performance on shorter segments, with 97.23 ± 0.76% sensitivity, 98.85 ± 0.95% specificity, and 96.62 ± 0.02% accuracy on 2 s segments. The results outperformed existing feature-based and deep learning approaches while preserving model interpretability. Furthermore, the proposed method supports mobile deployment, facilitating real-time use in wearable healthcare applications.
(This article belongs to the Special Issue Smart Bioelectronics, Wearable Systems and E-Health)
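
The final classification stage (topological features fed to an XGBoost classifier with 10-fold cross-validation) could look roughly like the sketch below; the features here are synthetic placeholders rather than the eight graph-derived features from the paper.

```python
# Illustrative sketch: per-segment feature vectors -> XGBoost -> 10-fold CV.
import numpy as np
from sklearn.model_selection import cross_val_score, StratifiedKFold
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))        # 8 features per ECG segment (placeholder values)
y = rng.integers(0, 2, size=500)     # 1 = life-threatening VA, 0 = other rhythm (placeholder)

clf = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
print(f"10-fold accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```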

22 pages, 6496 KiB  
Article
Real-Time Search and Rescue with Drones: A Deep Learning Approach for Small-Object Detection Based on YOLO
by Francesco Ciccone and Alessandro Ceruti
Drones 2025, 9(8), 514; https://doi.org/10.3390/drones9080514 - 22 Jul 2025
Viewed by 53
Abstract
Unmanned aerial vehicles are increasingly used in civil Search and Rescue operations due to their rapid deployment and wide-area coverage capabilities. However, detecting missing persons from aerial imagery remains challenging due to small object sizes, cluttered backgrounds, and limited onboard computational resources, especially when managed by civil agencies. In this work, we present a comprehensive methodology for optimizing YOLO-based object detection models for real-time Search and Rescue scenarios. A two-stage transfer learning strategy was employed using VisDrone for general aerial object detection and Heridal for Search and Rescue-specific fine-tuning. We explored various architectural modifications, including enhanced feature fusion (FPN, BiFPN, PB-FPN), additional detection heads (P2), and modules such as CBAM, Transformers, and deconvolution, analyzing their impact on performance and computational efficiency. The best-performing configuration (YOLOv5s-PBfpn-Deconv) achieved a mAP@50 of 0.802 on the Heridal dataset while maintaining real-time inference on embedded hardware (Jetson Nano). Further tests at different flight altitudes and explainability analyses using EigenCAM confirmed the robustness and interpretability of the model in real-world conditions. The proposed solution offers a viable framework for deploying lightweight, interpretable AI systems for UAV-based Search and Rescue operations managed by civil protection authorities. Limitations and future directions include the integration of multimodal sensors and adaptation to broader environmental conditions.
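
A minimal sketch of the two-stage transfer-learning recipe (VisDrone pre-training, then Heridal fine-tuning), written against the ultralytics training API as an assumption. The YOLOv8 checkpoint stands in for the paper's modified YOLOv5s, the heridal.yaml dataset file is hypothetical, and none of the architectural changes (PB-FPN, P2 head, deconvolution) are reproduced.

```python
# Two-stage transfer learning sketch with the ultralytics API (assumed setup).
from ultralytics import YOLO

# Stage 1: general aerial object detection on VisDrone (config ships with ultralytics).
model = YOLO("yolov8s.pt")                     # stand-in for the YOLOv5s baseline
model.train(data="VisDrone.yaml", epochs=50, imgsz=640)

# Stage 2: Search-and-Rescue fine-tuning on Heridal (hypothetical user-provided YAML).
model.train(data="heridal.yaml", epochs=100, imgsz=1280, lr0=0.001)

# Evaluate mAP@50 on the Heridal validation split.
metrics = model.val(data="heridal.yaml")
print(metrics.box.map50)
```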

19 pages, 2016 KiB  
Article
A Robust and Energy-Efficient Control Policy for Autonomous Vehicles with Auxiliary Tasks
by Yabin Xu, Chenglin Yang and Xiaoxi Gong
Electronics 2025, 14(15), 2919; https://doi.org/10.3390/electronics14152919 - 22 Jul 2025
Viewed by 149
Abstract
We present a lightweight autonomous driving method that uses a low-cost camera, a simple end-to-end convolutional neural network architecture, and smoother driving techniques to achieve energy-efficient vehicle control. Instead of directly constructing a mapping from raw sensory input to the action, our network takes the frame-to-frame visual difference as one of the crucial inputs to produce control commands, including the steering angle and the speed value at each time step. This choice of input allows highlighting the most relevant parts on raw image pairs to decrease the unnecessary visual complexity caused by different road and weather conditions. Additionally, our network achieves the prediction of the vehicle’s upcoming control commands by incorporating a view synthesis component into the model. The view synthesis, as an auxiliary task, aims to infer a novel view for the future from the historical environment transformation cue. By combining both the current and upcoming control commands, our framework achieves driving smoothness, which is highly associated with energy efficiency. We perform experiments on benchmarks to evaluate the reliability under different driving conditions in terms of control accuracy. We deploy a mobile robot outdoors to evaluate the power consumption of different control policies. The quantitative results demonstrate that our method can achieve energy efficiency in the real world.
(This article belongs to the Special Issue Simultaneous Localization and Mapping (SLAM) of Mobile Robots)
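
The frame-difference input described above can be illustrated with a tiny regression network that consumes the current frame plus the difference image and outputs a steering angle and speed. The architecture and sizes are assumptions, not the authors' network.

```python
# Sketch: concatenate the current frame with the frame-to-frame difference and
# regress [steering angle, speed]. Shapes and layers are illustrative only.
import torch
import torch.nn as nn


class DiffControlNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(6, 16, 5, stride=2), nn.ReLU(),   # 6 channels: frame + difference
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32, 2)                    # [steering angle, speed]

    def forward(self, prev_frame, curr_frame):
        diff = curr_frame - prev_frame                  # highlight what changed between frames
        return self.head(self.encoder(torch.cat([curr_frame, diff], dim=1)))


prev_f, curr_f = torch.rand(1, 3, 120, 160), torch.rand(1, 3, 120, 160)
print(DiffControlNet()(prev_f, curr_f))  # tensor of shape (1, 2)
```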

22 pages, 2514 KiB  
Article
High-Accuracy Recognition Method for Diseased Chicken Feces Based on Image and Text Information Fusion
by Duanli Yang, Zishang Tian, Jianzhong Xi, Hui Chen, Erdong Sun and Lianzeng Wang
Animals 2025, 15(15), 2158; https://doi.org/10.3390/ani15152158 - 22 Jul 2025
Viewed by 169
Abstract
Poultry feces, a critical biomarker for health assessment, requires timely and accurate pathological identification for food safety. Conventional visual-only methods face limitations due to environmental sensitivity and high visual similarity among feces from different diseases. To address this, we propose MMCD (Multimodal Chicken-feces Diagnosis), a ResNet50-based multimodal fusion model leveraging semantic complementarity between images and descriptive text to enhance diagnostic precision. Key innovations include the following: (1) Integrating MASA (Manhattan self-attention) and DSconv (depthwise separable convolution) into the backbone network to mitigate feature confusion. (2) Utilizing a pre-trained BERT to extract textual semantic features, reducing annotation dependency and cost. (3) Designing a lightweight Gated Cross-Attention (GCA) module for dynamic multimodal fusion, achieving a 41% parameter reduction versus cross-modal transformers. Experiments demonstrate that MMCD significantly outperforms single-modal baselines in Accuracy (+8.69%), Recall (+8.72%), Precision (+8.67%), and F1 score (+8.72%). It surpasses simple feature concatenation by 2.51–2.82% and reduces parameters by 7.5 M and computations by 1.62 GFLOPs versus the base ResNet50. This work validates multimodal fusion’s efficacy in pathological fecal detection, providing a theoretical and technical foundation for agricultural health monitoring systems.
(This article belongs to the Section Animal Welfare)
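
A small sketch of a gated cross-attention style fusion of image and text features, in the spirit of the GCA module described above; dimensions, the gate design, and the toy inputs are assumptions made for illustration.

```python
# Gated cross-attention sketch: image tokens attend to text tokens, and a
# learned gate blends the attended features with the original image features.
import torch
import torch.nn as nn


class GatedCrossAttention(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, img_feat, txt_feat):
        attended, _ = self.attn(img_feat, txt_feat, txt_feat)   # image queries, text keys/values
        g = self.gate(torch.cat([img_feat, attended], dim=-1))  # per-feature gate
        return g * attended + (1 - g) * img_feat                # gated fusion


img = torch.rand(8, 49, 256)   # e.g. flattened image feature tokens (placeholder)
txt = torch.rand(8, 32, 256)   # e.g. projected BERT token embeddings (placeholder)
print(GatedCrossAttention()(img, txt).shape)  # torch.Size([8, 49, 256])
```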

30 pages, 10173 KiB  
Article
Integrated Robust Optimization for Lightweight Transformer Models in Low-Resource Scenarios
by Hui Huang, Hengyu Zhang, Yusen Wang, Haibin Liu, Xiaojie Chen, Yiling Chen and Yuan Liang
Symmetry 2025, 17(7), 1162; https://doi.org/10.3390/sym17071162 - 21 Jul 2025
Viewed by 162
Abstract
With the rapid proliferation of artificial intelligence (AI) applications, an increasing number of edge devices—such as smartphones, cameras, and embedded controllers—are being tasked with performing AI-based inference. Due to constraints in storage capacity, computational power, and network connectivity, these devices are often categorized as operating in resource-constrained environments. In such scenarios, deploying powerful Transformer-based models like ChatGPT and Vision Transformers is highly impractical because of their large parameter sizes and intensive computational requirements. While lightweight Transformer models, such as MobileViT, offer a promising solution to meet storage and computational limitations, their robustness remains insufficient. This poses a significant security risk for AI applications, particularly in critical edge environments. To address this challenge, our research focuses on enhancing the robustness of lightweight Transformer models under resource-constrained conditions. First, we propose a comprehensive robustness evaluation framework tailored for lightweight Transformer inference. This framework assesses model robustness across three key dimensions: noise robustness, distributional robustness, and adversarial robustness. It further investigates how model size and hardware limitations affect robustness, thereby providing valuable insights for robustness-aware model design. Second, we introduce a novel adversarial robustness enhancement strategy that integrates lightweight modeling techniques. This approach leverages methods such as gradient clipping and layer-wise unfreezing, as well as decision boundary optimization techniques like TRADES and SMART. Together, these strategies effectively address challenges related to training instability and decision boundary smoothness, significantly improving model robustness. Finally, we deploy the robust lightweight Transformer models in real-world resource-constrained environments and empirically validate their inference robustness. The results confirm the effectiveness of our proposed methods in enhancing the robustness and reliability of lightweight Transformers for edge AI applications.
(This article belongs to the Section Mathematics)
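
Two of the named ingredients, a TRADES-style robustness term and gradient clipping, can be sketched in a single training step as below. The model, the perturbation (a random stand-in for a PGD attack), and beta are placeholders, not the paper's full procedure.

```python
# TRADES-style step: cross-entropy on clean inputs plus a KL term between the
# predictions on perturbed and clean inputs, followed by gradient clipping.
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))  # stand-in model
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
beta = 6.0  # TRADES trade-off weight (assumed value)

x = torch.rand(16, 3, 32, 32)
y = torch.randint(0, 10, (16,))
x_adv = (x + 0.03 * torch.randn_like(x)).clamp(0, 1)  # placeholder for a PGD attack

logits_clean, logits_adv = model(x), model(x_adv)
loss = F.cross_entropy(logits_clean, y) + beta * F.kl_div(
    F.log_softmax(logits_adv, dim=1), F.softmax(logits_clean, dim=1),
    reduction="batchmean",
)
opt.zero_grad()
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # gradient clipping
opt.step()
print(float(loss))
```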

28 pages, 2518 KiB  
Article
Enhancing Keyword Spotting via NLP-Based Re-Ranking: Leveraging Semantic Relevance Feedback in the Handwritten Domain
by Stergios Papazis, Angelos P. Giotis and Christophoros Nikou
Electronics 2025, 14(14), 2900; https://doi.org/10.3390/electronics14142900 - 20 Jul 2025
Viewed by 137
Abstract
Handwritten Keyword Spotting (KWS) remains a challenging task, particularly in segmentation-free scenarios where word images must be retrieved and ranked based on their similarity to a query without relying on prior page-level segmentation. Traditional KWS methods primarily focus on visual similarity, often overlooking the underlying semantic relationships between words. In this work, we propose a novel NLP-driven re-ranking approach that refines the initial ranked lists produced by state-of-the-art KWS models. By leveraging semantic embeddings from pre-trained BERT-like Large Language Models (LLMs, e.g., RoBERTa, MPNet, and MiniLM), we introduce a relevance feedback mechanism that improves both verbatim and semantic keyword spotting. Our framework operates in two stages: (1) projecting retrieved word image transcriptions into a semantic space via LLMs and (2) re-ranking the retrieval list using a weighted combination of semantic and exact relevance scores based on pairwise similarities with the query. We evaluate our approach on the widely used George Washington (GW) and IAM collections using two cutting-edge segmentation-free KWS models, which are further integrated into our proposed pipeline. Our results show consistent gains in Mean Average Precision (mAP), with improvements of up to 2.3% (from 94.3% to 96.6%) on GW and 3% (from 79.15% to 82.12%) on IAM. Even when mAP gains are smaller, qualitative improvements emerge: semantically relevant but inexact matches are retrieved more frequently without compromising exact match recall. We further examine the effect of fine-tuning transformer-based OCR (TrOCR) models on historical GW data to align textual and visual features more effectively. Overall, our findings suggest that semantic feedback can enhance retrieval effectiveness in KWS pipelines, paving the way for lightweight hybrid vision-language approaches in handwritten document analysis.
(This article belongs to the Special Issue AI Synergy: Vision, Language, and Modality)
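
The two-stage re-ranking can be sketched with a sentence-level encoder and a weighted mix of semantic and exact scores; the model name, the weight alpha, and the toy retrieval list below are assumptions, not the paper's configuration.

```python
# Re-ranking sketch: embed query and retrieved transcriptions, then combine
# the original KWS score with the semantic similarity to the query.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed MiniLM checkpoint
query = "orders"
retrieved = ["orders", "order", "instructions", "ordered", "afternoon"]  # toy list
kws_scores = [0.95, 0.90, 0.40, 0.80, 0.35]        # toy scores from the visual KWS model

q_emb = encoder.encode(query, convert_to_tensor=True)
d_emb = encoder.encode(retrieved, convert_to_tensor=True)
semantic = util.cos_sim(q_emb, d_emb)[0]           # semantic relevance to the query

alpha = 0.6                                        # weight between exact and semantic scores
final = [alpha * k + (1 - alpha) * float(s) for k, s in zip(kws_scores, semantic)]
for word, score in sorted(zip(retrieved, final), key=lambda t: -t[1]):
    print(f"{word}: {score:.3f}")
```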

18 pages, 2930 KiB  
Article
Eye in the Sky for Sub-Tidal Seagrass Mapping: Leveraging Unsupervised Domain Adaptation with SegFormer for Multi-Source and Multi-Resolution Aerial Imagery
by Satish Pawar, Aris Thomasberger, Stefan Hein Bengtson, Malte Pedersen and Karen Timmermann
Remote Sens. 2025, 17(14), 2518; https://doi.org/10.3390/rs17142518 - 19 Jul 2025
Viewed by 191
Abstract
The accurate and large-scale mapping of seagrass meadows is essential, as these meadows form primary habitats for marine organisms and large sinks for blue carbon. Image data available for mapping these habitats are often scarce or are acquired through multiple surveys and instruments, resulting in images of varying spatial and spectral characteristics. This study presents an unsupervised domain adaptation (UDA) strategy that combines histogram-matching with the transformer-based SegFormer model to address these challenges. Unoccupied aerial vehicle (UAV)-derived imagery (3-cm resolution) was used for training, while orthophotos from airplane surveys (12.5-cm resolution) served as the target domain. The method was evaluated across three Danish estuaries (Horsens Fjord, Skive Fjord, and Lovns Broad) using one-to-one, leave-one-out, and all-to-one histogram matching strategies. The highest performance was observed at Skive Fjord, achieving an F1-score/IoU = 0.52/0.48 for the leave-one-out test, corresponding to 68% of the benchmark model that was trained on both domains. These results demonstrate the potential of this lightweight UDA approach to generalization across spatial, temporal, and resolution domains, enabling the cost-effective and scalable mapping of submerged vegetation in data-scarce environments. This study also sheds light on contrast as a significant property of target domains that impacts image segmentation.
(This article belongs to the Special Issue High-Resolution Remote Sensing Image Processing and Applications)
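
The histogram-matching step that aligns target-domain orthophotos to the UAV training domain is essentially scikit-image's match_histograms applied channel-wise; the tiles below are synthetic placeholders.

```python
# Align a target-domain tile's colour statistics to the source domain before
# passing it to a segmentation model trained on the source (UAV) imagery.
import numpy as np
from skimage.exposure import match_histograms

rng = np.random.default_rng(0)
source = rng.integers(0, 256, (512, 512, 3), dtype=np.uint8)   # UAV tile, 3 cm (placeholder)
target = rng.integers(40, 200, (512, 512, 3), dtype=np.uint8)  # orthophoto tile, 12.5 cm (placeholder)

adapted = match_histograms(target, source, channel_axis=-1)    # per-channel matching
print(adapted.shape, adapted.dtype)
```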

28 pages, 43087 KiB  
Article
LWSARDet: A Lightweight SAR Small Ship Target Detection Network Based on a Position–Morphology Matching Mechanism
by Yuliang Zhao, Yang Du, Qiutong Wang, Changhe Li, Yan Miao, Tengfei Wang and Xiangyu Song
Remote Sens. 2025, 17(14), 2514; https://doi.org/10.3390/rs17142514 - 19 Jul 2025
Viewed by 175
Abstract
The all-weather imaging capability of synthetic aperture radar (SAR) confers unique advantages for maritime surveillance. However, ship detection under complex sea conditions still faces challenges, such as high-frequency noise interference and the limited computational power of edge computing platforms. To address these challenges, we propose a lightweight SAR small ship detection network, LWSARDet, which mitigates feature redundancy and reduces computational complexity in existing models. Specifically, based on the YOLOv5 framework, a dual strategy for the lightweight network is adopted as follows: On the one hand, to address the limited nonlinear representation ability of the original network, a global channel attention mechanism is embedded and a feature extraction module, GCCR-GhostNet, is constructed, which can effectively enhance the network’s feature extraction capability and high-frequency noise suppression, while reducing computational cost. On the other hand, to reduce feature dilution and computational redundancy in traditional detection heads when focusing on small targets, we replace conventional convolutions with simple linear transformations and design a lightweight detection head, LSD-Head. Furthermore, we propose a Position–Morphology Matching IoU loss function, P-MIoU, which integrates center distance constraints and morphological penalty mechanisms to more precisely capture the spatial and structural differences between predicted and ground truth bounding boxes. Extensive experiments conducted on the High-Resolution SAR Image Dataset (HRSID) and the SAR Ship Detection Dataset (SSDD) demonstrate that LWSARDet achieves superior overall performance compared to existing state-of-the-art (SOTA) methods.
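
A rough sketch of an IoU loss with a centre-distance constraint, in the spirit of the position part of P-MIoU (the morphology penalty is not reproduced; this is closer to a DIoU-style term and is only an illustration).

```python
# IoU loss with a centre-distance penalty normalised by the enclosing box diagonal.
import torch


def iou_center_loss(pred, target):
    """pred, target: (N, 4) boxes as (x1, y1, x2, y2)."""
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + 1e-7)

    # centre-distance constraint, scaled by the diagonal of the enclosing box
    cp = (pred[:, :2] + pred[:, 2:]) / 2
    ct = (target[:, :2] + target[:, 2:]) / 2
    enc_tl = torch.min(pred[:, :2], target[:, :2])
    enc_br = torch.max(pred[:, 2:], target[:, 2:])
    diag2 = ((enc_br - enc_tl) ** 2).sum(dim=1) + 1e-7
    center_penalty = ((cp - ct) ** 2).sum(dim=1) / diag2

    return (1 - iou + center_penalty).mean()


pred = torch.tensor([[10., 10., 50., 40.]])
gt = torch.tensor([[12., 12., 48., 42.]])
print(iou_center_loss(pred, gt))
```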

20 pages, 47683 KiB  
Article
Multi-Faceted Adaptive Token Pruning for Efficient Remote Sensing Image Segmentation
by Chuge Zhang and Jian Yao
Remote Sens. 2025, 17(14), 2508; https://doi.org/10.3390/rs17142508 - 18 Jul 2025
Viewed by 260
Abstract
Global context information is essential for semantic segmentation of remote sensing (RS) images. Due to their remarkable capability to capture global context information and model long-range dependencies, vision transformers have demonstrated great performance on semantic segmentation. However, the high computational complexity of vision transformers impedes their broad application in resource-constrained environments for RS image segmentation. To address this challenge, we propose multi-faceted adaptive token pruning (MATP) to reduce computational cost while maintaining relatively high accuracy. MATP is designed to prune well-learned tokens that do not have a close relation to other tokens. To quantify these two metrics, MATP employs multi-faceted scores: entropy, to evaluate the learning progression of tokens; and attention weight, to assess token correlations. Specifically, MATP utilizes adaptive criteria for each score that are automatically adjusted based on specific input features. A token is pruned only when both criteria are satisfied. Overall, MATP facilitates the utilization of vision transformers in resource-constrained environments. Experiments conducted on three widely used datasets reveal that MATP reduces the computation cost by about 67–70% with about 3–6% accuracy degradation, achieving a superior trade-off between accuracy and computational cost compared to the state of the art.
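
The pruning rule can be illustrated with a toy example: score each token by prediction entropy and by the attention it receives, and drop it only when both indicate it is well learned and weakly connected. The fixed median thresholds below are a simplification of MATP's adaptive, input-dependent criteria.

```python
# Toy token pruning: keep a token if it is not yet well learned (high entropy)
# OR still strongly attended to by other tokens; prune otherwise.
import torch

tokens = torch.rand(1, 196, 64)                        # (batch, tokens, dim)
logits = torch.rand(1, 196, 10)                        # per-token class logits (placeholder)
attn = torch.softmax(torch.rand(1, 196, 196), dim=-1)  # attention matrix (placeholder)

probs = torch.softmax(logits, dim=-1)
entropy = -(probs * probs.clamp_min(1e-9).log()).sum(-1)  # low entropy => well learned
received = attn.mean(dim=1)                               # mean attention each token receives

keep = (entropy > entropy.median()) | (received > received.median())
pruned = tokens[:, keep[0], :]
print(tokens.shape, "->", pruned.shape)  # fewer tokens are passed to later layers
```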

21 pages, 2308 KiB  
Article
Forgery-Aware Guided Spatial–Frequency Feature Fusion for Face Image Forgery Detection
by Zhenxiang He, Zhihao Liu and Ziqi Zhao
Symmetry 2025, 17(7), 1148; https://doi.org/10.3390/sym17071148 - 18 Jul 2025
Viewed by 201
Abstract
The rapid development of deepfake technologies has led to the widespread proliferation of facial image forgeries, raising significant concerns over identity theft and the spread of misinformation. Although recent dual-domain detection approaches that integrate spatial and frequency features have achieved noticeable progress, they still suffer from limited sensitivity to local forgery regions and inadequate interaction between spatial and frequency information in practical applications. To address these challenges, we propose a novel forgery-aware guided spatial–frequency feature fusion network. A lightweight U-Net is employed to generate pixel-level saliency maps by leveraging structural symmetry and semantic consistency, without relying on ground-truth masks. These maps dynamically guide the fusion of spatial features (from an improved Swin Transformer) and frequency features (via Haar wavelet transforms). Cross-domain attention, channel recalibration, and spatial gating are introduced to enhance feature complementarity and regional discrimination. Extensive experiments conducted on two benchmark face forgery datasets, FaceForensics++ and Celeb-DFv2, show that the proposed method consistently outperforms existing state-of-the-art techniques in terms of detection accuracy and generalization capability. Future work includes improving robustness under compression, incorporating temporal cues, extending to multimodal scenarios, and evaluating model efficiency for real-world deployment.
(This article belongs to the Section Computer)
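
The frequency branch's Haar wavelet step can be sketched with PyWavelets: one decomposition level splits a face crop into an approximation band and three detail bands that can expose manipulation traces. The input is a random placeholder and the fusion with spatial features is not shown.

```python
# Single-level 2-D Haar wavelet decomposition as a frequency-feature sketch.
import numpy as np
import pywt

image = np.random.rand(256, 256)                # grayscale face crop (placeholder)
ll, (lh, hl, hh) = pywt.dwt2(image, "haar")     # approximation + detail sub-bands
freq_features = np.stack([ll, lh, hl, hh])      # 4 x 128 x 128 frequency tensor
print(freq_features.shape)
```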

21 pages, 5917 KiB  
Article
VML-UNet: Fusing Vision Mamba and Lightweight Attention Mechanism for Skin Lesion Segmentation
by Tang Tang, Haihui Wang, Qiang Rao, Ke Zuo and Wen Gan
Electronics 2025, 14(14), 2866; https://doi.org/10.3390/electronics14142866 - 17 Jul 2025
Viewed by 340
Abstract
Deep learning has advanced medical image segmentation, yet existing methods struggle with complex anatomical structures. Mainstream models, such as CNN, Transformer, and hybrid architectures, face challenges including insufficient information representation and redundant complexity, which limit their clinical deployment. Developing efficient and lightweight networks is crucial for accurate lesion localization and optimized clinical workflows. We propose the VML-UNet, a lightweight segmentation network with core innovations including the CPMamba module and the multi-scale local supervision module (MLSM). The CPMamba module integrates the visual state space (VSS) block and a channel prior attention mechanism to enable efficient modeling of spatial relationships with linear computational complexity through dynamic channel-space weight allocation, while preserving channel feature integrity. The MLSM enhances local feature perception and reduces the inference burden. Comparative experiments were conducted on three public datasets, including ISIC2017, ISIC2018, and PH2, with ablation experiments performed on ISIC2017. VML-UNet requires only 0.53 M parameters, 2.18 MB of memory, and 1.24 GFLOPs of computation, and it outperforms the comparison networks on these datasets, validating its effectiveness. This study provides valuable references for developing lightweight, high-performance skin lesion segmentation networks.
(This article belongs to the Section Bioelectronics)
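
For context on the lightweight figures quoted above, the parameter count and weight memory of any PyTorch network can be measured as below; the tiny network is a placeholder, not VML-UNet, and GFLOPs would need a separate profiler.

```python
# Count parameters and estimate parameter memory for a stand-in network.
import torch.nn as nn

net = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(16, 1, 1))            # placeholder segmentation head

n_params = sum(p.numel() for p in net.parameters())
mem_mb = sum(p.numel() * p.element_size() for p in net.parameters()) / 2**20
print(f"{n_params / 1e6:.3f} M parameters, {mem_mb:.3f} MB of weights")
```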

19 pages, 2785 KiB  
Article
Implementing an AI-Based Digital Twin Analysis System for Real-Time Decision Support in a Custom-Made Sportswear SME
by Tõnis Raamets, Kristo Karjust, Jüri Majak and Aigar Hermaste
Appl. Sci. 2025, 15(14), 7952; https://doi.org/10.3390/app15147952 - 17 Jul 2025
Viewed by 165
Abstract
Small and medium-sized enterprises (SMEs) in the manufacturing sector often struggle to make effective use of production data due to fragmented systems and limited digital infrastructure. This paper presents a case study of implementing an AI-enhanced digital twin in a custom sportswear manufacturing SME developed under the AI and Robotics Estonia (AIRE) initiative. The solution integrates real-time production data collection using the Digital Manufacturing Support Application (DIMUSA); data processing and control; clustering-based data analysis; and virtual simulation for evaluating improvement scenarios. The framework was applied in a live production environment to analyze workstation-level performance, identify recurring bottlenecks, and provide interpretable visual insights for decision-makers. K-means clustering and DBSCAN were used to group operational states and detect process anomalies, while simulation was employed to model production flow and assess potential interventions. The results demonstrate how even a lightweight AI-driven system can support human-centered decision-making, improve process transparency, and serve as a scalable foundation for Industry 5.0-aligned digital transformation in SMEs.
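
The clustering step (K-means for operating states, DBSCAN for anomalies) might look roughly like the sketch below; the two features and all numbers are synthetic placeholders, not DIMUSA data.

```python
# Group workstation log records into operating states and flag outliers.
import numpy as np
from sklearn.cluster import KMeans, DBSCAN
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
# columns: cycle time (s), queue length in front of the workstation (placeholder data)
records = np.column_stack([rng.normal(60, 8, 300), rng.poisson(4, 300)]).astype(float)

X = StandardScaler().fit_transform(records)
states = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
anomaly = DBSCAN(eps=0.8, min_samples=5).fit_predict(X) == -1  # -1 marks outliers

print("records per state:", np.bincount(states))
print("flagged anomalies:", int(anomaly.sum()))
```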

22 pages, 4882 KiB  
Article
Dual-Branch Spatio-Temporal-Frequency Fusion Convolutional Network with Transformer for EEG-Based Motor Imagery Classification
by Hao Hu, Zhiyong Zhou, Zihan Zhang and Wenyu Yuan
Electronics 2025, 14(14), 2853; https://doi.org/10.3390/electronics14142853 - 17 Jul 2025
Viewed by 173
Abstract
The decoding of motor imagery (MI) electroencephalogram (EEG) signals is crucial for motor control and rehabilitation. However, as feature extraction is the core component of the decoding process, traditional methods, often limited to single-feature domains or shallow time-frequency fusion, struggle to comprehensively capture the spatio-temporal-frequency characteristics of the signals, thereby limiting decoding accuracy. To address these limitations, this paper proposes a dual-branch neural network architecture with multi-domain feature fusion, the dual-branch spatio-temporal-frequency fusion convolutional network with Transformer (DB-STFFCNet). The DB-STFFCNet model consists of three modules: the spatiotemporal feature extraction module (STFE), the frequency feature extraction module (FFE), and the feature fusion and classification module. The STFE module employs a lightweight multi-dimensional attention network combined with a temporal Transformer encoder, capable of simultaneously modeling local fine-grained features and global spatiotemporal dependencies, effectively integrating spatiotemporal information and enhancing feature representation. The FFE module constructs a hierarchical feature refinement structure by leveraging the fast Fourier transform (FFT) and multi-scale frequency convolutions, while a frequency-domain Transformer encoder captures the global dependencies among frequency domain features, thus improving the model’s ability to represent key frequency information. Finally, the fusion module effectively consolidates the spatiotemporal and frequency features to achieve accurate classification. To evaluate the feasibility of the proposed method, experiments were conducted on the BCI Competition IV-2a and IV-2b public datasets, achieving accuracies of 83.13% and 89.54%, respectively, outperforming existing methods. This study provides a novel solution for joint time-frequency representation learning in EEG analysis.
(This article belongs to the Special Issue Artificial Intelligence Methods for Biomedical Data Processing)
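
The FFE module's FFT step can be illustrated by taking a one-sided FFT of each EEG channel and keeping the magnitude spectrum; the sampling rate, channel count, and shapes below are assumptions.

```python
# Magnitude spectrum per EEG channel as a simple frequency-domain feature.
import torch

eeg = torch.randn(8, 22, 1000)            # (batch, channels, time), e.g. 4 s at 250 Hz (assumed)
spectrum = torch.fft.rfft(eeg, dim=-1)    # one-sided FFT along the time axis
freq_features = spectrum.abs()            # magnitude spectrum per channel
print(freq_features.shape)                # torch.Size([8, 22, 501])
```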
