Search Results (88)

Search Parameters:
Keywords = fusion time localization encoding

19 pages, 772 KB  
Article
EVformer: A Spatio-Temporal Decoupled Transformer for Citywide EV Charging Load Forecasting
by Mengxin Jia and Bo Yang
World Electr. Veh. J. 2026, 17(2), 71; https://doi.org/10.3390/wevj17020071 - 31 Jan 2026
Viewed by 61
Abstract
Accurate forecasting of citywide electric vehicle (EV) charging load is critical for alleviating station-level congestion, improving energy dispatching, and supporting the stability of intelligent transportation systems. However, large-scale EV charging networks exhibit complex and heterogeneous spatio-temporal dependencies, and existing approaches often struggle to scale with increasing station density or long forecasting horizons. To address these challenges, we develop a modular spatio-temporal prediction framework that decouples temporal sequence modeling from spatial dependency learning under an encoder–decoder paradigm. For temporal representation, we introduce a global aggregation mechanism that compresses multi-station time-series signals into a shared latent context, enabling efficient modeling of long-range interactions while mitigating the computational burden of cross-channel correlation learning. For spatial representation, we design a dynamic multi-scale attention module that integrates graph topology with data-driven neighbor selection, allowing the model to adaptively capture both localized charging dynamics and broader regional propagation patterns. In addition, a cross-step transition bridge and a gated fusion unit are incorporated to improve stability in multi-horizon forecasting. The cross-step transition bridge maps historical information to future time steps, reducing error propagation. The gated fusion unit adaptively merges the temporal and spatial features, dynamically adjusting their contributions based on the forecast horizon, ensuring effective balance between the two and enhancing prediction accuracy across multiple time steps. Extensive experiments on a real-world dataset of 18,061 charging piles in Shenzhen demonstrate that the proposed framework achieves superior performance over state-of-the-art baselines in terms of MAE, RMSE, and MAPE. 
Ablation and sensitivity analyses verify the effectiveness of each module, while efficiency evaluations indicate significantly reduced computational overhead compared with existing attention-based spatio-temporal models. Full article
(This article belongs to the Section Vehicle Management)
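The gated fusion unit described in the EVformer abstract can be illustrated in a few lines. This is a minimal NumPy sketch, not the paper's implementation: the concatenation-based gate, the weight shapes, and the names `gated_fusion`/`W_g` are assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(h_temporal, h_spatial, W_g, b_g):
    """Merge temporal and spatial features with a learned sigmoid gate:
    g = sigmoid([h_t ; h_s] W_g + b_g), fused = g * h_t + (1 - g) * h_s.
    The gate adjusts each channel's contribution adaptively."""
    z = np.concatenate([h_temporal, h_spatial], axis=-1)
    g = sigmoid(z @ W_g + b_g)                 # per-channel gate in (0, 1)
    return g * h_temporal + (1.0 - g) * h_spatial

rng = np.random.default_rng(0)
d = 8
h_t = rng.standard_normal((4, d))              # 4 stations, d-dim temporal features
h_s = rng.standard_normal((4, d))              # matching spatial features
W_g = rng.standard_normal((2 * d, d)) * 0.1
fused = gated_fusion(h_t, h_s, W_g, np.zeros(d))   # shape (4, 8)
```

Because the gate is a sigmoid, the output is an elementwise convex combination of the two inputs, which is what lets the model shift weight between temporal and spatial evidence as the horizon grows.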
21 pages, 1289 KB  
Article
A Multi-Branch CNN–Transformer Feature-Enhanced Method for 5G Network Fault Classification
by Jiahao Chen, Yi Man and Yao Cheng
Appl. Sci. 2026, 16(3), 1433; https://doi.org/10.3390/app16031433 - 30 Jan 2026
Viewed by 114
Abstract
The deployment of 5G (Fifth-Generation) networks in industrial Internet of Things (IoT), intelligent transportation, and emergency communications introduces heterogeneous and dynamic network states, leading to frequent and diverse faults. Traditional fault detection methods typically emphasize either local temporal anomalies or global distributional characteristics, but rarely achieve an effective balance between the two. In this paper, we propose a parallel multi-branch convolutional neural network (CNN)–Transformer framework (MBCT) to improve fault diagnosis accuracy in 5G networks. Specifically, MBCT takes time-series network key performance indicator (KPI) data as input for training and performs feature extraction through three parallel branches: a CNN branch for local patterns and short-term fluctuations, a Transformer encoder branch for cross-layer and long-term dependencies, and a statistical branch for global features describing quality-of-experience (QoE) metrics. A gating mechanism and feature-weighted fusion are applied outside the branches to adjust inter-branch weights and intra-branch feature sensitivity. The fused representation is then nonlinearly mapped and fed into a classifier to generate the fault category. This paper evaluates the performance of the proposed model on both the publicly available TelecomTS multi-modal 5G network observability dataset and a self-collected SDR5GFD dataset based on software-defined radio (SDR). Experimental results demonstrate that the proposed model achieves superior performance in fault classification, achieving 87.7% accuracy on the TelecomTS dataset and 86.3% on the SDR5GFD dataset, outperforming the baseline models CNN, Transformer, and Random Forest. Moreover, the model contains approximately 0.57M parameters and requires about 0.3 MFLOPs per sample for inference, making it suitable for large-scale online fault diagnosis. Full article
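The inter-branch weighting that MBCT's gating mechanism performs can be sketched as a softmax gate over the three branch outputs. This is a hedged illustration under the assumption of one scalar weight per branch per sample; the paper's actual gate and feature-weighted fusion may be structured differently.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_branches(f_cnn, f_trans, f_stat, W_gate):
    """Combine CNN, Transformer, and statistical branch features with
    softmax gate weights computed from their concatenation."""
    z = np.concatenate([f_cnn, f_trans, f_stat], axis=-1)  # (B, 3d)
    w = softmax(z @ W_gate, axis=-1)                       # (B, 3), rows sum to 1
    stacked = np.stack([f_cnn, f_trans, f_stat], axis=1)   # (B, 3, d)
    return (w[..., None] * stacked).sum(axis=1), w         # fused: (B, d)

rng = np.random.default_rng(1)
B, d = 2, 6
branches = [rng.standard_normal((B, d)) for _ in range(3)]
W_gate = rng.standard_normal((3 * d, 3)) * 0.1
fused, w = fuse_branches(*branches, W_gate)
```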

23 pages, 3475 KB  
Article
YOLO-GSD-seg: YOLO for Guide Rail Surface Defect Segmentation and Detection
by Shijun Lai, Zuoxi Zhao, Yalong Mi, Kai Yuan and Qian Wang
Appl. Sci. 2026, 16(3), 1261; https://doi.org/10.3390/app16031261 - 26 Jan 2026
Viewed by 255
Abstract
To address the challenges of accurately extracting features from elongated scratches, irregular defects, and small-scale surface flaws on high-precision linear guide rails, this paper proposes a novel instance segmentation algorithm tailored for guide rail surface defect detection. The algorithm integrates the YOLOv8 instance segmentation framework with deformable convolutional networks and multi-scale feature fusion to enhance defect feature extraction and segmentation performance. A dedicated guide rail surface Defect (GSD) segmentation dataset is constructed to support model training and evaluation. In the backbone, the DCNv3 module is incorporated to strengthen the extraction of elongated and irregular defect features while simultaneously reducing model parameters. In the feature fusion network, a multi-scale feature fusion module and a triple-feature encoding module are introduced to jointly capture global contextual information and preserve fine-grained local defect details. Furthermore, a Channel and Position Attention Module (CPAM) is employed to integrate global and local features, improving the model’s sensitivity to channel and positional cues of small-target defects and thereby enhancing segmentation accuracy. Experimental results show that, compared with the original YOLOv8n-Seg, the proposed method achieves improvements of 3.9% and 3.8% in Box and Mask mAP50, while maintaining a real-time inference speed of 148 FPS. Additional evaluations on the public MSD dataset further demonstrate the model’s strong versatility and robustness. Full article
(This article belongs to the Special Issue Deep Learning-Based Computer Vision Technology and Its Applications)

36 pages, 12414 KB  
Article
A Replication-Competent Flavivirus Genome with a Stable GFP Insertion at the NS1-NS2A Junction
by Pavel Tarlykov, Bakytkali Ingirbay, Dana Auganova, Tolganay Kulatay, Viktoriya Keyer, Sabina Atavliyeva, Maral Zhumabekova, Arman Abeev and Alexandr V. Shustov
Biology 2026, 15(3), 220; https://doi.org/10.3390/biology15030220 - 24 Jan 2026
Viewed by 306
Abstract
The flavivirus NS1 protein is a component of the viral replication complex and plays diverse, yet poorly understood, roles in the viral life cycle. To enable real-time visualization of the developing replication organelle and biochemical analysis of tagged NS1 and its interacting partners, we engineered a replication-competent yellow fever virus (YFV) replicon encoding a C-terminal fusion of NS1 with green fluorescent protein (NS1–GFP). The initial variant was non-viable in the absence of trans-complementation with wild-type NS1; however, viability was partially restored through the introduction of co-adaptive mutations in GFP (Q204R/A206V) and NS4A (M108L). Subsequent cell culture adaptation generated a 17-nucleotide frameshift within the NS1–GFP linker, resulting in a more flexible and less hydrophobic linker sequence. The optimized genome, in the form of a replicon, replicates in packaging cells that produce YFV structural proteins, as well as in naive BHK-21 cells. In the packaging cells, the adapted NS1–GFP replicon produces titers of infectious particles of approximately 10⁶ FFU/mL and is genetically stable over five passages. The expressed NS1–GFP fusion protein localizes to the endoplasmic reticulum and co-fractionates with detergent-resistant heavy membranes, a hallmark of flavivirus replication organelles. This NS1–GFP replicon provides a novel platform for studying NS1 functions and can be further adapted for proximity-labeling strategies aimed at identifying the still-unknown protease responsible for NS1–NS2A cleavage. Full article
22 pages, 1781 KB  
Article
Multimodal Hybrid CNN-Transformer with Attention Mechanism for Sleep Stages and Disorders Classification Using Bio-Signal Images
by Innocent Tujyinama, Bessam Abdulrazak and Rachid Hedjam
Signals 2026, 7(1), 4; https://doi.org/10.3390/signals7010004 - 8 Jan 2026
Viewed by 406
Abstract
Background and Objective: The accurate detection of sleep stages and disorders in older adults is essential for the effective diagnosis and treatment of sleep disorders affecting millions worldwide. Although Polysomnography (PSG) remains the primary method for monitoring sleep in medical settings, it is costly and time-consuming. Recent automated models have not fully explored and effectively fused the sleep features that are essential to identify sleep stages and disorders. This study proposes a novel automated model for detecting sleep stages and disorders in older adults by analyzing PSG recordings. PSG data include multiple channels, and the use of our proposed advanced methods reveals the potential correlations and complementary features across EEG, EOG, and EMG signals. Methods: In this study, we employed three novel advanced architectures, (1) CNNs, (2) CNNs with Bi-LSTM, and (3) CNNs with a transformer encoder, for the automatic classification of sleep stages and disorders using multichannel PSG data. The CNN extracts local features from RGB spectrogram images of EEG, EOG, and EMG signals individually, followed by an appropriate column-wise feature fusion block. The Bi-LSTM and transformer encoder are then used to learn and capture intra-epoch feature transition rules and dependencies. A residual connection is also applied to preserve the characteristics of the original joint feature maps and prevent gradient vanishing. Results: The experimental results in the CAP sleep database demonstrated that our proposed CNN with transformer encoder method outperformed standalone CNN, CNN with Bi-LSTM, and other advanced state-of-the-art methods in sleep stages and disorders classification. It achieves an accuracy of 95.2%, Cohen’s kappa of 93.6%, MF1 of 91.3%, and MGm of 95% for sleep staging, and an accuracy of 99.3%, Cohen’s kappa of 99.1%, MF1 of 99.2%, and MGm of 99.6% for disorder detection. 
Our model also outperforms other state-of-the-art approaches in classifying N1, a stage known for its difficulty. Conclusions: To the best of our knowledge, this is the first work to go beyond the standard approach and develop a model architecture that is accurate and robust for classifying sleep stages and disorders in the elderly, for both patient and non-patient subjects. Given its high performance, our method has the potential to be integrated into routine clinical care settings. Full article
(This article belongs to the Special Issue Advanced Methods of Biomedical Signal Processing II)
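The column-wise feature fusion and the residual connection this abstract describes can be sketched as follows. This is a minimal NumPy sketch under stated assumptions: per-modality CNN features are taken as `(epochs, d)` matrices, and the residual block is a generic two-layer MLP, not the paper's exact fusion block.

```python
import numpy as np

def columnwise_fuse(eeg, eog, emg):
    """Column-wise fusion: per-modality feature matrices, each of shape
    (epochs, d), are stacked along the feature (column) axis."""
    return np.concatenate([eeg, eog, emg], axis=-1)        # (epochs, 3d)

def residual_mlp(x, W1, W2):
    """A residual (identity shortcut) connection preserves the original
    joint feature map and helps prevent vanishing gradients."""
    h = np.maximum(x @ W1, 0.0)                            # ReLU hidden layer
    return x + h @ W2                                      # identity shortcut

rng = np.random.default_rng(2)
d = 4
feats = [rng.standard_normal((5, d)) for _ in range(3)]    # EEG, EOG, EMG features
joint = columnwise_fuse(*feats)                            # (5, 12)
out = residual_mlp(joint, rng.standard_normal((3 * d, 8)) * 0.1,
                   rng.standard_normal((8, 3 * d)) * 0.1)
```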

25 pages, 8372 KB  
Article
CAFE-DETR: A Sesame Plant and Weed Classification and Detection Algorithm Based on Context-Aware Feature Enhancement
by Pengyu Hou, Linjing Wei, Haodong Liu and Tianxiang Zhou
Agronomy 2026, 16(2), 146; https://doi.org/10.3390/agronomy16020146 - 7 Jan 2026
Viewed by 228
Abstract
Weed competition represents a primary constraint in sesame production, causing substantial yield losses typically ranging from 18 to 68% under inadequate control measures. Precise crop–weed discrimination remains challenging due to morphological similarities, complex field conditions, and vegetation overlapping. To address these issues, we developed Context-Aware Feature-Enhanced Detection Transformer (CAFE-DETR), an enhanced Real-Time Detection Transformer (RT-DETR) architecture optimized for sesame–weed identification. First, the C2f with a Unified Attention-Gating (C2f-UAG) module integrates unified head attention with convolutional gating mechanisms to enhance morphological discrimination capabilities. Second, the Hierarchical Context-Adaptive Fusion Network (HCAF-Net) incorporates hierarchical context extraction and spatial–channel enhancement to achieve multi-scale feature representation. Furthermore, the Polarized Linear Spatial Multi-scale Fusion Network (PLSM-Encoder) reduces computational complexity from O(N²) to O(N) through polarized linear attention while maintaining global semantic modeling. Additionally, the Focaler-MPDIoU loss function improves localization accuracy through point distance constraints and adaptive sample focusing. Experimental results on the sesame–weed dataset demonstrate that CAFE-DETR achieves 90.0% precision, 89.5% mAP50, and 59.5% mAP50–95, representing improvements of 13.07%, 4.92%, and 2.06% above the baseline RT-DETR, respectively, while reducing computational cost by 23.73% (43.4 GFLOPs) and parameter count by 10.55% (17.8 M). These results suggest that CAFE-DETR is a viable alternative for implementation in intelligent spraying systems and precision agriculture platforms. Notably, this study lacks external validation, cross-dataset testing, and field trials, which limits the generalizability of the model to diverse real-world agricultural scenarios. Full article
(This article belongs to the Collection AI, Sensors and Robotics for Smart Agriculture)
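The O(N²)-to-O(N) reduction claimed for the PLSM-Encoder rests on the general kernel trick behind linear attention, which can be shown compactly. This sketch uses the common elu(x)+1 feature map, not the paper's polarized variant, and all names here are illustrative.

```python
import numpy as np

def linear_attention(Q, K, V, eps=1e-6):
    """Kernelized attention: with a positive feature map phi,
    softmax(Q K^T) V is approximated by phi(Q) (phi(K)^T V), dropping the
    cost from O(N^2 d) to O(N d^2) -- linear in sequence length N."""
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))    # elu(x) + 1, strictly positive
    Qp, Kp = phi(Q), phi(K)
    KV = Kp.T @ V                                          # (d, d_v): no N x N matrix
    Z = Qp @ Kp.sum(axis=0)                                # (N,) per-query normalizer
    return (Qp @ KV) / (Z[:, None] + eps)

rng = np.random.default_rng(3)
N, d = 16, 4
Q, K, V = (rng.standard_normal((N, d)) for _ in range(3))
out = linear_attention(Q, K, V)                            # shape (16, 4)
```

Since the implicit attention weights are positive and normalized, each output row is (up to eps) a convex combination of value rows, just as in softmax attention.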

22 pages, 2074 KB  
Article
Traffic Flow Prediction Model Based on Attention Mechanism Spatio-Temporal Graph Convolutional Network on U.S. Highways
by Ruiying Zhang and Yin Han
Appl. Sci. 2026, 16(1), 559; https://doi.org/10.3390/app16010559 - 5 Jan 2026
Viewed by 304
Abstract
Traffic flow prediction is a fundamental component of intelligent transportation systems and plays a critical role in traffic management and autonomous driving. However, accurately modeling highway traffic remains challenging due to dynamic congestion propagation, lane-level heterogeneity, and non-recurrent traffic events. To address these challenges, this paper proposes an improved attention-mechanism spatio-temporal graph convolutional network, termed AMSGCN, for highway traffic flow prediction. AMSGCN introduces an adaptive adjacency matrix learning mechanism to overcome the limitations of static graphs and capture time-varying spatial correlations and congestion propagation paths. A hierarchical multi-scale spatial attention mechanism is further designed to jointly model local congestion diffusion and long-range bottleneck effects, enabling an adaptive spatial receptive field under congested conditions. To enhance temporal modeling, a gating-based fusion strategy dynamically balances periodic patterns and recent observations, allowing effective prediction under both regular and abnormal traffic scenarios. In addition, direction-aware encoding is incorporated to suppress interference from opposite-direction lanes, which is essential for directional highway traffic systems. Extensive experiments on multiple benchmark datasets, including PeMS and PEMSF, demonstrate the effectiveness and robustness of AMSGCN. In particular, on the I-24 MOTION dataset, AMSGCN achieves an RMSE reduction of 11.0% compared to ASTGCN and 17.4% relative to the strongest STGCN baseline. Ablation studies further confirm that dynamic and multi-scale spatial attention provides the primary performance gains, while temporal gating and direction-aware modeling offer complementary improvements. These results indicate that AMSGCN is a robust and effective solution for highway traffic flow prediction. Full article
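The adaptive adjacency matrix learning that AMSGCN uses to move beyond static graphs can be sketched with the well-known node-embedding construction from Graph WaveNet. This is an assumed, illustrative mechanism; the paper's exact formulation is not given in the abstract.

```python
import numpy as np

def adaptive_adjacency(E_src, E_dst):
    """Data-driven adjacency: two learned node-embedding tables yield a
    dense directed graph A = softmax(relu(E_src @ E_dst.T), axis=1), so
    spatial correlations are learned rather than fixed by the road graph."""
    logits = np.maximum(E_src @ E_dst.T, 0.0)              # relu keeps weak links at 0
    e = np.exp(logits - logits.max(axis=1, keepdims=True)) # stable row-wise softmax
    return e / e.sum(axis=1, keepdims=True)                # each row sums to 1

rng = np.random.default_rng(4)
n_sensors, emb = 10, 3
A = adaptive_adjacency(rng.standard_normal((n_sensors, emb)),
                       rng.standard_normal((n_sensors, emb)))
```

In training, `E_src` and `E_dst` would be learned jointly with the rest of the network, letting the graph track time-varying congestion propagation paths.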

17 pages, 6410 KB  
Article
IESS-FusionNet: Physiologically Inspired EEG-EMG Fusion with Linear Recurrent Attention for Infantile Epileptic Spasms Syndrome Detection
by Junyuan Feng, Zhenzhen Liu, Linlin Shen, Xiaoling Luo, Yan Chen, Lin Li and Tian Zhang
Bioengineering 2026, 13(1), 57; https://doi.org/10.3390/bioengineering13010057 - 31 Dec 2025
Viewed by 599
Abstract
Infantile Epileptic Spasms Syndrome (IESS) is a devastating epileptic encephalopathy of infancy that carries a high risk of lifelong neurodevelopmental disability. Timely diagnosis is critical, as every week of delay in effective treatment is associated with worse cognitive outcomes. Although synchronized electroencephalogram (EEG) and surface electromyography (EMG) recordings capture both the electrophysiological and motor signatures of spasms, accurate automated detection remains challenging due to the non-stationary nature of the signals and the absence of physiologically plausible inter-modal fusion in current deep learning approaches. We introduce IESS-FusionNet, an end-to-end dual-stream framework specifically designed for accurate, real-time IESS detection from simultaneous EEG and EMG. Each modality is processed by a dedicated Unimodal Encoder that hierarchically integrates Continuous Wavelet Transform, Spatio-Temporal Convolution, and Bidirectional Mamba to efficiently extract frequency-specific, spatially structured, local and long-range temporal features within a compact module. A novel Cross Time-Mixing module, built upon the linear recurrent attention of the Receptance Weighted Key Value (RWKV) architecture, subsequently performs efficient, time-decaying, bidirectional cross-modal integration that explicitly respects the causal and physiological properties of cortico-muscular coupling during spasms. Evaluated on an in-house clinical dataset of synchronized EEG-EMG recordings from infants with confirmed IESS, IESS-FusionNet achieves 89.5% accuracy, 90.7% specificity, and 88.3% sensitivity, significantly outperforming recent unimodal and multimodal baselines. Comprehensive ablation studies validate the contribution of each component, while the proposed cross-modal fusion requires approximately 60% fewer parameters than equivalent quadratic cross-attention mechanisms, making it suitable for real-time clinical deployment. 
IESS-FusionNet delivers an accurate, computationally efficient solution with physiologically inspired cross-modal fusion for the automated detection of infantile epileptic spasms, offering promise for future clinical applications in reducing diagnostic delay. Full article
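The time-decaying linear recurrent attention underlying RWKV, which the Cross Time-Mixing module builds on, can be shown in a drastically simplified form: a causal, exponentially decayed weighted average computed with an O(T) recurrence. This is a unidirectional, scalar-decay sketch only; the paper's module is bidirectional and cross-modal.

```python
import numpy as np

def time_decay_mix(k, v, w=0.5):
    """Simplified RWKV-style wkv mixing:
    out_t = sum_{s<=t} exp(-w*(t-s) + k_s) v_s / sum_{s<=t} exp(-w*(t-s) + k_s),
    evaluated with a linear-time recurrence instead of O(T^2) attention."""
    T, d = v.shape
    num, den = np.zeros(d), 0.0
    out = np.empty_like(v)
    decay = np.exp(-w)                         # older steps fade by exp(-w) per step
    for t in range(T):
        a = np.exp(k[t])                       # scalar salience of step t
        num = decay * num + a * v[t]
        den = decay * den + a
        out[t] = num / den
    return out

rng = np.random.default_rng(5)
T, d = 12, 4
k = rng.standard_normal(T)
v = rng.standard_normal((T, d))
out = time_decay_mix(k, v)
```

Because the state is a single running numerator/denominator pair, the memory and compute per step are constant, which is what makes this fusion roughly 60% cheaper than quadratic cross-attention at comparable sequence lengths.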

29 pages, 5902 KB  
Article
MSLCP-DETR: A Multi-Scale Linear Attention and Sparse Fusion Framework for Infrared Small Target Detection in Vehicle-Mounted Systems
by Fu Li, Meimei Zhu, Ming Zhao, Yuxin Sun and Wangyu Wu
Mathematics 2026, 14(1), 67; https://doi.org/10.3390/math14010067 - 24 Dec 2025
Viewed by 289
Abstract
Detecting small infrared targets in vehicle-mounted systems remains challenging due to weak thermal radiation, cross-scale feature loss, and dynamic background interference. To address these issues, this paper proposes MSLCP-DETR, an enhanced RT-DETR-based framework that integrates multi-scale linear attention and sparse fusion mechanisms. The model introduces three novel components: a Multi-Scale Linear Attention Encoder (MSLA-AIFI), which combines multi-branch depth-wise convolution with linear attention to efficiently capture cross-scale features while reducing computational complexity; a Cross-Scale Small Object Feature Optimization module (CSOFO), which enhances the localization of small targets in dense scenes through spatial rearrangement and dynamic modeling; and a Pyramid Sparse Transformer (PST), which replaces traditional dense fusion with a dual-branch sparse attention mechanism to improve both accuracy and real-time performance. Extensive experiments on the M3FD and FLIR datasets demonstrate that MSLCP-DETR achieves an excellent balance between accuracy and efficiency, with its precision, mAP@50, and mAP@50:95 reaching 90.3%, 79.5%, and 86.0%, respectively. Ablation studies and visual analysis further validate the effectiveness of the proposed modules and the overall design strategy. Full article

21 pages, 3813 KB  
Article
HMRM: A Hybrid Motion and Region-Fused Mamba Network for Micro-Expression Recognition
by Zhe Guo, Yi Liu, Rui Luo, Jiayi Liu and Lan Wei
Sensors 2025, 25(24), 7672; https://doi.org/10.3390/s25247672 - 18 Dec 2025
Viewed by 448
Abstract
Micro-expression recognition (MER), as an important branch of intelligent visual sensing, enables the analysis of subtle facial movements for applications in emotion understanding, human–computer interaction and security monitoring. However, existing methods struggle to capture fine-grained spatiotemporal dynamics under limited data and computational resources, making them difficult to deploy in real-world sensing systems. To address this limitation, we propose HMRM, a hybrid motion and region-fused Mamba network designed for efficient and accurate MER. HMRM enhances motion representation through a hybrid feature augmentation module that integrates gated recurrent unit (GRU)-attention optical flow estimation with a regional MotionMix enhancement strategy to increase motion diversity. Furthermore, it employs a fine-grained Mamba encoder to achieve lightweight and effective long-range temporal modeling. Additionally, a region feature fusion strategy is introduced to strengthen the representation of localized expression dynamics. Experiments on multiple MER benchmark datasets demonstrate that HMRM achieves state-of-the-art performance with strong generalization and low computational cost, highlighting its potential for integration into compact, real-time visual sensing and emotion analysis systems. Full article
(This article belongs to the Special Issue Emotion Recognition and Cognitive Behavior Analysis Based on Sensors)

23 pages, 40152 KB  
Article
Leveraging Time–Frequency Distribution Priors and Structure-Aware Adaptivity for Wideband Signal Detection and Recognition in Wireless Communications
by Xikang Wang, Hua Xu, Zisen Qi, Qingwei Meng, Hongcheng Fan, Yunhao Shi and Wenran Le
Sensors 2025, 25(24), 7650; https://doi.org/10.3390/s25247650 - 17 Dec 2025
Viewed by 405
Abstract
Wideband signal detection and recognition (WSDR) is considered an effective technical means for monitoring and analyzing spectra. The mainstream technical route involves constructing time–frequency representations for wideband sampled signals and then achieving signal detection and recognition through deep learning-based object detection models. However, existing methods exhibit insufficient attention on the prior information contained in the time–frequency domain and the structural features of signals, leaving ample room for further exploration and optimization. In this paper, we propose a novel model called TFDP-SANet for the WSDR task, which is based on time–frequency distribution priors and structure-aware adaptivity. Initially, considering the horizontal directionality and banded structure characteristics of the signal in the time–frequency representation, we introduce both the Strip Pooling Module (SPM) and Coordinate Attention (CA) mechanism during the feature extraction and fusion stages. These components enable the model to aggregate long-distance dependencies along horizontal and vertical directions, mitigate noise interference outside local windows, and enhance focus on the spatial distributions and shape characteristics of signals. Furthermore, we adopt an adaptive elliptical Gaussian encoding strategy to generate heatmaps, which enhances the adaptability of the effective guidance region for center-point localization to the target shape. During inference, we design a Time–Frequency Clustering Optimizer (TFCO) that leverages prior information to adjust the class of predicted bounding boxes, further improving accuracy. We conduct a series of ablation experiments and comparative experiments on the WidebandSig53 (WBSig53) dataset, and the results demonstrate that our proposed method outperforms existing approaches on most metrics. Full article
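The adaptive elliptical Gaussian encoding for center-point heatmaps can be sketched concretely: a CenterNet-style Gaussian whose per-axis sigmas are tied to the bounding-box width and height, so the guidance region matches the banded, horizontally elongated shape of signals in a time–frequency map. The `alpha` scale factor and function names here are assumptions, not the paper's parameters.

```python
import numpy as np

def elliptical_gaussian(H, W, cx, cy, box_w, box_h, alpha=0.15):
    """Center heatmap with per-axis sigmas proportional to box size:
    wide boxes get a wide ellipse along x, flat boxes a thin one along y,
    instead of the isotropic circle of a standard Gaussian target."""
    sx = max(alpha * box_w, 1e-6)
    sy = max(alpha * box_h, 1e-6)
    ys, xs = np.mgrid[0:H, 0:W]
    return np.exp(-((xs - cx) ** 2 / (2 * sx ** 2)
                    + (ys - cy) ** 2 / (2 * sy ** 2)))

# A wide, flat box (like a narrowband signal occupying many time bins):
hm = elliptical_gaussian(64, 64, cx=40, cy=20, box_w=30, box_h=6)
```

The heatmap peaks at the center and decays much faster vertically than horizontally, so the loss tolerates small horizontal localization errors while staying strict in frequency.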

24 pages, 596 KB  
Article
Deep Learning-Based Fusion of Multimodal MRI Features for Brain Tumor Detection
by Bakhita Salman, Eithar Yassin, Deepak Ganta and Hermes Luna
Appl. Sci. 2025, 15(24), 13155; https://doi.org/10.3390/app152413155 - 15 Dec 2025
Viewed by 1097
Abstract
Despite advances in deep learning, brain tumor detection from MRI continues to face major challenges, including the limited robustness of single-modality models, the computational burden of transformer-based architectures, opaque fusion strategies, and the lack of efficient binary screening tools. To address these issues, we propose a lightweight multimodal CNN framework that integrates T1, T2, and FLAIR MRI sequences using modality-specific encoders and a channel-wise fusion module (concatenation followed by a 1 × 1 convolution). The pipeline incorporates U-Net-based segmentation for tumor-focused patch extraction, improving localization and reducing irrelevant background. Evaluated on the BraTS 2020 dataset (7500 slices; 70/15/15 patient-level split), the proposed model achieves 93.8% accuracy, 94.1% F1-score, and 19 ms inference time. It outperforms all single-modality ablations by up to 5% and achieves competitive or superior performance to transformer-based baselines while using over 98% fewer parameters. Grad-CAM and LIME visualizations further confirm clinically meaningful tumor-region activation. Overall, this efficient and interpretable multimodal framework advances scalable brain tumor screening and supports integration into real-time clinical workflows. Full article
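The channel-wise fusion module this abstract specifies (concatenation followed by a 1 × 1 convolution) is simple enough to show directly. The sketch below is a NumPy illustration with assumed shapes: a 1 × 1 convolution over an `(H, W, C)` map is exactly a per-pixel linear map over the channel axis.

```python
import numpy as np

def fuse_modalities(t1, t2, flair, W):
    """Channel-wise fusion: concatenate the T1, T2, and FLAIR feature maps
    along the channel axis, then mix channels with a 1x1 convolution,
    implemented as a matmul applied independently at every pixel."""
    x = np.concatenate([t1, t2, flair], axis=-1)           # (H, W, 3C)
    return x @ W                                           # (H, W, C_out)

rng = np.random.default_rng(6)
H, W_, C, C_out = 8, 8, 4, 16
maps = [rng.standard_normal((H, W_, C)) for _ in range(3)]
fused = fuse_modalities(*maps, rng.standard_normal((3 * C, C_out)) * 0.1)
```

Keeping the fusion this cheap (one matmul per pixel) is consistent with the paper's claim of using over 98% fewer parameters than transformer-based baselines.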

26 pages, 2632 KB  
Article
CAGM-Seg: A Symmetry-Driven Lightweight Model for Small Object Detection in Multi-Scenario Remote Sensing
by Hao Yao, Yancang Li, Wenzhao Feng, Ji Zhu, Haiming Yan, Shijun Zhang and Hanfei Zhao
Symmetry 2025, 17(12), 2137; https://doi.org/10.3390/sym17122137 - 12 Dec 2025
Viewed by 417
Abstract
In order to address challenges in small object recognition for remote sensing imagery—including high model complexity, overfitting with small samples, and insufficient cross-scenario generalization—this study proposes CAGM-Seg, a lightweight recognition model integrating multi-attention mechanisms. The model systematically enhances the U-Net architecture: First, the encoder adopts a pre-trained MobileNetV3-Large as the backbone network, incorporating a coordinate attention mechanism to strengthen spatial localization of small targets. Second, an attention gating module is introduced in skip connections to achieve adaptive fusion of cross-level features. Finally, the decoder fully employs depthwise separable convolutions to significantly reduce model parameters. This design embodies a symmetry-aware philosophy, which is reflected in two aspects: the structural symmetry between the encoder and decoder facilitates multi-scale feature fusion, while the coordinate attention mechanism performs symmetric decomposition of spatial context (i.e., along height and width directions) to enhance the perception of geometrically regular small targets. Regarding training strategy, a hybrid loss function combining Dice Loss and Focal Loss, coupled with the AdamW optimizer, effectively enhances the model’s sensitivity to small objects while suppressing overfitting. Experimental results on the Xingtai black and odorous water body identification task demonstrate that CAGM-Seg outperforms comparison models in key metrics including precision (97.85%), recall (98.08%), and intersection-over-union (96.01%). Specifically, its intersection-over-union surpassed SegNeXt by 11.24 percentage points and PIDNet by 8.55 percentage points; its F1 score exceeded SegFormer by 2.51 percentage points.
Regarding model efficiency, CAGM-Seg features a total of 3.489 million parameters, with 517,000 trainable parameters—approximately 80% fewer than the baseline U-Net—achieving a favorable balance between recognition accuracy and computational efficiency. Further cross-task validation demonstrates the model’s robust cross-scenario adaptability: it achieves 82.77% intersection-over-union and 90.57% F1 score in landslide detection, while maintaining 87.72% precision and 86.48% F1 score in cloud detection. The main contribution of this work is the effective resolution of key challenges in few-shot remote sensing small-object recognition—notably inadequate feature extraction and limited model generalization—via the strategic integration of multi-level attention mechanisms within a lightweight architecture. The resulting model, CAGM-Seg, establishes an innovative technical framework for real-time image interpretation under edge-computing constraints, demonstrating strong potential for practical deployment in environmental monitoring and disaster early warning systems.
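The hybrid Dice + Focal loss mentioned above can be sketched as follows. This is a minimal, framework-free illustration of the two terms and their weighted sum; the equal weighting (0.5/0.5) and the focal parameters (gamma = 2.0, alpha = 0.25) are common defaults assumed here, not the settings reported by the authors.

```python
import math

def dice_loss(pred, target, eps=1e-6):
    # Soft Dice loss over a flattened mask: 1 - 2|P∩T| / (|P| + |T|).
    # pred holds predicted foreground probabilities, target holds 0/1 labels.
    inter = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 - (2.0 * inter + eps) / (total + eps)

def focal_loss(pred, target, gamma=2.0, alpha=0.25, eps=1e-6):
    # Focal loss down-weights easy pixels via the (1 - pt)^gamma factor,
    # which keeps abundant background from dominating rare small objects.
    loss = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)      # clamp for numerical safety
        pt = p if t == 1 else 1.0 - p        # probability of the true class
        a = alpha if t == 1 else 1.0 - alpha
        loss += -a * (1.0 - pt) ** gamma * math.log(pt)
    return loss / len(pred)

def hybrid_loss(pred, target, w_dice=0.5, w_focal=0.5):
    # Weighted combination: Dice handles region overlap, Focal handles
    # the pixel-level class imbalance typical of small targets.
    return w_dice * dice_loss(pred, target) + w_focal * focal_loss(pred, target)
```

A perfect prediction drives both terms toward zero, while a poorly calibrated one is penalized by both the overlap term and the focal term, which matches the abstract’s rationale for combining them.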
22 pages, 28862 KB  
Article
Efficient Global–Local Context Fusion with Mobile-Optimized Transformers for Concrete Dam Crack Inspection
by Jiarui Hu, Ben Huang and Fei Kang
Buildings 2025, 15(24), 4487; https://doi.org/10.3390/buildings15244487 - 11 Dec 2025
Viewed by 288
Abstract
To address the difficulties in characterizing fine crack morphology, the limitations of detection accuracy, and the challenge of real-time deployment caused by large model parameter counts in concrete dam crack detection, this paper makes three contributions. First, it constructs DamCrackSet-1K, a high-resolution dataset with pixel-level annotations covering multiple crack scenarios. Second, it proposes a lightweight semantic segmentation framework, MTC-Net, which integrates a MobileNetV2 encoder with Enhanced Transformer modules to achieve global–local feature fusion and enhance feature extraction. Third, it designs a geometry-sensitive Curvature-Aware loss function that effectively mitigates pixel-level class imbalance for fine cracks. Experiments show that the method substantially improves crack detection accuracy and inference speed while significantly reducing the number of model parameters, providing a feasible solution for efficient, real-time crack detection in dams.
(This article belongs to the Section Building Structures)
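The lightweight encoders used by MTC-Net (MobileNetV2) and CAGM-Seg (MobileNetV3) both rely on depthwise separable convolutions. A quick parameter count shows where the savings come from; the channel sizes below are chosen purely for illustration, and bias terms are omitted.

```python
def conv_params(c_in, c_out, k):
    # Standard convolution: one k x k x c_in kernel per output channel.
    return k * k * c_in * c_out

def depthwise_separable_params(c_in, c_out, k):
    # Depthwise stage: one k x k filter per input channel (k*k*c_in),
    # followed by a 1x1 pointwise projection to c_out (c_in*c_out).
    return k * k * c_in + c_in * c_out

# Illustrative layer: 32 -> 64 channels with a 3x3 kernel.
std = conv_params(32, 64, 3)                   # 18432 weights
dws = depthwise_separable_params(32, 64, 3)    # 2336 weights
reduction = std / dws                          # roughly 8x fewer parameters
```

The roughly 8x reduction per layer is what lets such decoders and encoders shrink total parameter counts so aggressively while keeping receptive fields intact.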
35 pages, 3744 KB  
Review
Intelligent Fault Diagnosis for HVDC Systems Based on Knowledge Graph and Pre-Trained Models: A Critical and Comprehensive Review
by Qiang Li, Yue Ma, Jinyun Yu, Shenghui Cao, Shihong Zhang, Pengwang Zhang and Bo Yang
Energies 2025, 18(24), 6438; https://doi.org/10.3390/en18246438 - 9 Dec 2025
Viewed by 530
Abstract
High-voltage direct-current (HVDC) systems are essential for large-scale renewable integration and asynchronous interconnection, yet their complex topologies and multi-type faults expose the limits of threshold- and signal-based diagnostics. These methods degrade under noisy, heterogeneous measurements acquired under dynamic operating conditions, resulting in poor adaptability, reduced accuracy, and high latency. To overcome these shortcomings, the synergistic use of knowledge graphs (KGs) and pre-trained models (PTMs) is emerging as a next-generation paradigm. KGs encode equipment parameters, protection logic, and fault propagation paths in an explicit, human-readable structure, while PTMs provide transferable representations that remain effective under label scarcity and data diversity. Within a perception–cognition–decision loop, PTMs first extract latent fault signatures from multi-modal records; KGs then enable interpretable causal inference, yielding both precise localization and transparent explanations. This work systematically reviews the theoretical foundations, fusion strategies, and implementation pipelines of KG-PTM frameworks tailored to HVDC systems, benchmarking them against traditional diagnostic schemes. The paradigm demonstrates superior noise robustness, few-shot generalization, and decision explainability. However, open challenges remain, such as automated, conflict-free knowledge updating; principled integration of electromagnetic physical constraints; real-time, resource-constrained deployment; and quantifiable trustworthiness. Future research should therefore advance autonomous knowledge engineering, physics-informed pre-training, lightweight model compression, and standardized evaluation platforms to translate KG-PTM prototypes into dependable industrial tools for intelligent HVDC operation and maintenance.
(This article belongs to the Special Issue Energy, Electrical and Power Engineering: 5th Edition)
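The idea of a KG encoding fault propagation paths for interpretable localization can be sketched with a toy directed graph and a breadth-first traversal. All component names and edges below are hypothetical examples, not taken from the review; a production KG would use a triple store and typed relations rather than a plain dictionary.

```python
from collections import deque

# Hypothetical miniature fault-propagation graph: each component maps to
# the downstream components its fault can affect along the HVDC link.
propagation = {
    "converter_transformer": ["rectifier_valve"],
    "rectifier_valve": ["dc_line"],
    "dc_line": ["inverter_valve"],
    "inverter_valve": [],
}

def affected_components(source):
    # Breadth-first traversal: enumerate, in propagation order, every
    # component reachable from the faulted source. The visited path
    # itself serves as a human-readable explanation of the inference.
    seen, queue, order = set(), deque([source]), []
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        queue.extend(propagation.get(node, []))
    return order
```

In a KG-PTM pipeline, a PTM would first flag the likely source from raw measurements, and a traversal like this would then supply the explainable propagation chain.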
