Search Results (337)

Search Parameters:
Keywords = dual self-attention

30 pages, 6739 KB  
Article
A Fusion Algorithm for Pedestrian Anomaly Detection and Tracking on Urban Roads Based on Multi-Module Collaboration and Cross-Frame Matching Optimization
by Wei Zhao, Xin Gong, Lanlan Li and Luoyang Zuo
Sensors 2026, 26(2), 400; https://doi.org/10.3390/s26020400 - 8 Jan 2026
Abstract
Amid rapid advancements in artificial intelligence, the detection of abnormal human behaviors in complex traffic environments has garnered significant attention. However, detection errors frequently occur due to interference from complex backgrounds, small targets, and other factors. Therefore, this paper proposes a research methodology that integrates the anomaly detection YOLO-SGCF algorithm with the tracking BoT-SORT-ReID algorithm. The detection module uses YOLOv8 as the baseline model, incorporating Swin Transformer to enhance global feature modeling capabilities in complex scenes. CBAM and CA attention are embedded into the Neck and backbone, respectively: CBAM enables dual-dimensional channel-spatial weighting, while CA precisely captures object location features by encoding coordinate information. The Neck layer incorporates GSConv convolutional modules to reduce computational load while expanding feature receptive fields. The loss function is replaced with Focal-EIoU to address sample imbalance issues and precisely optimize bounding box regression. For tracking, to enhance long-term tracking stability, ReID feature distances are incorporated during the BoT-SORT data association phase. This integrates behavioral category information from YOLO-SGCF, enabling the identification and tracking of abnormal pedestrian behaviors in complex environments. Evaluations on our self-built dataset (covering four abnormal behaviors: Climb, Fall, Fight, Phone) show mAP@50%, precision, and recall reaching 92.2%, 90.75%, and 86.57% respectively—improvements of 3.4%, 4.4%, and 6% over the original model—while maintaining an inference speed of 328.49 FPS. Additionally, generalization testing on the UCSD Ped1 dataset (covering six abnormal behaviors: Biker, Skater, Car, Wheelchair, Lawn, Runner) yielded an mAP score of 92.7%, representing a 1.5% improvement over the original model and outperforming existing mainstream models. Furthermore, the tracking algorithm achieved an MOTA of 90.8% and an MOTP of 92.6%, with a 47.6% reduction in IDS, demonstrating superior tracking performance compared to existing mainstream algorithms. Full article
(This article belongs to the Section Intelligent Sensors)
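The CBAM mechanism named in this abstract applies channel weighting followed by spatial weighting. The following NumPy sketch illustrates that dual-dimensional idea only; it is not the paper's implementation. The function name `cbam_sketch` is ours, the shared MLP uses random weights, and CBAM's 7x7 spatial convolution is replaced by a fixed equal-weight mix of the pooled maps.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cbam_sketch(feature, reduction=2, seed=0):
    """Toy CBAM: channel attention first, then spatial attention.
    feature: (C, H, W) array."""
    C, H, W = feature.shape
    rng = np.random.default_rng(seed)
    # Channel attention: spatial avg/max pooling -> shared two-layer MLP.
    W1 = rng.standard_normal((C // reduction, C)) * 0.1
    W2 = rng.standard_normal((C, C // reduction)) * 0.1
    mlp = lambda v: W2 @ np.maximum(W1 @ v, 0.0)
    ch = sigmoid(mlp(feature.mean(axis=(1, 2))) + mlp(feature.max(axis=(1, 2))))
    feature = feature * ch[:, None, None]
    # Spatial attention: channel-wise avg/max maps, mixed here with fixed
    # equal weights instead of CBAM's learned 7x7 convolution.
    sp = sigmoid(0.5 * feature.mean(axis=0) + 0.5 * feature.max(axis=0))
    return feature * sp[None, :, :]

out = cbam_sketch(np.random.default_rng(1).standard_normal((4, 8, 8)))
```

The two stages are sequential on purpose: the spatial map is computed from the already channel-reweighted feature, mirroring CBAM's ordering.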

27 pages, 4932 KB  
Article
Automated Facial Pain Assessment Using Dual-Attention CNN with Clinically Calibrated High-Reliability and Reproducibility Framework
by Albert Psatrick Sankoh, Ali Raza, Khadija Parwez, Wesam Shishah, Ayman Alharbi, Mubeen Javed and Muhammad Bilal
Biomimetics 2026, 11(1), 51; https://doi.org/10.3390/biomimetics11010051 - 8 Jan 2026
Abstract
Accurate and quantitative pain assessment remains a major challenge in clinical medicine, especially for patients unable to verbalize discomfort. Conventional methods based on self-reports or clinician observation are subjective and inconsistent. This study introduces a novel automated facial pain assessment framework built on a dual-attention convolutional neural network (CNN) that achieves clinically calibrated, high-reliability performance and interpretability. The architecture combines multi-head spatial attention to localize pain-relevant facial regions with an enhanced channel attention block employing triple-pooling (average, max, and standard deviation) to capture discriminative intensity features. Regularization through label smoothing (α = 0.1) and AdamW optimization ensures calibrated, stable convergence. Evaluated on a clinically annotated dataset using subject-wise stratified sampling, the proposed model achieved a test accuracy of 90.19% ± 0.94%, with an average 5-fold cross-validation accuracy of 83.60% ± 1.55%. The model further attained an F1-score of 0.90 and Cohen’s κ = 0.876, with macro- and micro-AUCs of 0.991 and 0.992, respectively. The evaluation covers five pain classes (No Pain, Mid Pain, Moderate Pain, Severe Pain, and Very Pain) using subject-wise splits comprising 5840 total images and 1160 test samples. Comparative benchmarking and ablation experiments confirm each module’s contribution, while Grad-CAM visualizations highlight physiologically relevant facial regions. The results demonstrate a robust, explainable, and reproducible framework suitable for integration into real-world automated pain-monitoring systems. Inspired by biological pain perception mechanisms and human facial muscle responses, the proposed framework aligns with biomimetic sensing principles by emulating how localized facial cues contribute to pain interpretation. Full article
(This article belongs to the Special Issue Artificial Intelligence (AI) in Biomedical Engineering: 2nd Edition)
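The enhanced channel attention block above pools with average, max, and standard deviation. A minimal NumPy sketch of that triple-pooling idea follows; a fixed equal-weight combination stands in for the learned projection the paper would use, so this is illustrative only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def triple_pool_channel_attention(feature):
    """Per-channel gates built from three spatial pooling statistics
    (average, max, standard deviation). A learned projection would mix
    the descriptors in the real block; equal weights are used here
    purely for illustration. feature: (C, H, W)."""
    avg = feature.mean(axis=(1, 2))
    mx = feature.max(axis=(1, 2))
    std = feature.std(axis=(1, 2))
    gates = sigmoid((avg + mx + std) / 3.0)   # (C,), each in (0, 1)
    return feature * gates[:, None, None]

x = np.random.default_rng(0).standard_normal((4, 8, 8))
y = triple_pool_channel_attention(x)
```

The std descriptor is the addition over standard avg+max channel attention: it captures intensity spread within a channel, which the abstract ties to discriminating pain intensities.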

16 pages, 1561 KB  
Article
TSAformer: A Traffic Flow Prediction Model Based on Cross-Dimensional Dependency Capture
by Haoning Lv, Xi Chen and Weijie Xiu
Electronics 2026, 15(1), 231; https://doi.org/10.3390/electronics15010231 - 4 Jan 2026
Abstract
Accurate multivariate traffic flow forecasting is critical for intelligent transportation systems yet remains challenging due to the complex interplay of temporal dynamics and spatial interactions. While Transformer-based models have shown promise in capturing long-range temporal dependencies, most existing approaches compress multidimensional observations into flattened sequences—thereby neglecting explicit modeling of cross-dimensional (i.e., spatial or inter-variable) relationships, which are essential for capturing traffic propagation, network-wide congestion, and node-specific behaviors. To address this limitation, we propose TSAformer, a novel Transformer architecture that explicitly preserves and jointly models time and dimension as dual structural axes. TSAformer begins with a multimodal input embedding layer that encodes raw traffic values alongside temporal context (time-of-day and day-of-week) and node-specific positional features, ensuring rich semantic representation. The core of TSAformer is the Two-Stage Attention (TSA) module, which first models intra-dimensional temporal evolution via time-axis self-attention then captures inter-dimensional spatial interactions through a lightweight routing mechanism—avoiding quadratic complexity while enabling all-to-all cross-node communication. Built upon TSA, a hierarchical encoder–decoder (HED) structure further enhances forecasting by modeling traffic patterns across multiple temporal scales, from fine-grained fluctuations to macroscopic trends, and fusing predictions via cross-scale attention. Extensive experiments on three real-world traffic datasets—including urban road networks and highway systems—demonstrate that TSAformer consistently outperforms state-of-the-art baselines across short-term and long-term forecasting horizons. Notably, it achieves top-ranked performance in 36 out of 58 critical evaluation scenarios, including peak-hour and event-driven congestion prediction. By explicitly modeling both temporal and dimensional dependencies without structural compromise, TSAformer provides a scalable, interpretable, and high-performance solution for spatiotemporal traffic forecasting. Full article
(This article belongs to the Special Issue Artificial Intelligence for Traffic Understanding and Control)
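The two-stage pattern described above (time-axis self-attention per node, then cross-node exchange through a small set of router tokens) can be sketched generically. This is a toy, not TSAformer itself: the router tokens below are random placeholders for learned parameters, and all query/key/value projections are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attn(X):
    """Scaled dot-product self-attention over the rows of X (no projections)."""
    d = X.shape[-1]
    return softmax(X @ X.T / np.sqrt(d)) @ X

def two_stage_attention(X, n_routers=2, seed=0):
    """Stage 1: temporal self-attention independently per node.
    Stage 2: a few router tokens gather from all nodes, then each node
    reads back from the routers, so the node-axis cost is linear in N
    rather than quadratic. X: (T, N, d)."""
    T, N, d = X.shape
    X = np.stack([self_attn(X[:, n, :]) for n in range(N)], axis=1)
    routers = np.random.default_rng(seed).standard_normal((n_routers, d))
    out = np.empty_like(X)
    for t in range(T):
        nodes = X[t]                                           # (N, d)
        agg = softmax(routers @ nodes.T / np.sqrt(d)) @ nodes  # routers gather
        out[t] = softmax(nodes @ agg.T / np.sqrt(d)) @ agg     # nodes read back
    return out

Y = two_stage_attention(np.random.default_rng(1).standard_normal((5, 3, 4)))
```

Routing through `n_routers` tokens replaces the N-by-N node attention matrix with two N-by-R ones, which is the complexity argument the abstract makes.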

20 pages, 3293 KB  
Article
From Data to Autonomy: Integrating Demographic Factors and AI Models for Expert-Free Exercise Coaching
by Uğur Özbalkan and Özgür Can Turna
Appl. Sci. 2026, 16(1), 488; https://doi.org/10.3390/app16010488 - 3 Jan 2026
Abstract
This study investigates the performance of three deep learning architectures—LSTM with Attention, GRU with Attention, and Transformer—in the context of real-time, self-guided exercise classification, using coordinate data collected from 103 participants via a dual-camera system. Each model was evaluated over ten randomized runs to ensure robustness and statistical validity. The GRU + Attention and LSTM + Attention models demonstrated consistently high test accuracy (mean ≈ 98.9%), while the Transformer model yielded significantly lower accuracy (mean ≈ 96.6%) with greater variance. Paired t-tests confirmed that the difference between LSTM and GRU models was not statistically significant (p = 0.9249), while both models significantly outperformed the Transformer architecture (p < 0.01). In addition, participant-specific features, such as athletic experience and BMI, were found to affect classification accuracy. These findings support the feasibility of AI-based feedback systems in enhancing unsupervised training, offering a scalable solution to bridge the gap between expert supervision and autonomous physical practice. Full article

19 pages, 3550 KB  
Article
CAG-Net: A Novel Change Attention Guided Network for Substation Defect Detection
by Dao Xiang, Xiaofei Du and Zhaoyang Liu
Mathematics 2026, 14(1), 178; https://doi.org/10.3390/math14010178 - 2 Jan 2026
Abstract
Timely detection and handling of substation defects plays a foundational role in ensuring the stable operation of power systems. Existing substation defect detection methods fail to make full use of the temporal information contained in substation inspection samples, resulting in problems such as weak generalization ability and susceptibility to background interference. To address these issues, a change attention guided substation defect detection algorithm (CAG-Net) based on a dual-temporal encoder–decoder framework is proposed. The encoder module employs a Siamese backbone network composed of efficient local-global context aggregation modules to extract multi-scale features, balancing local details and global semantics, and designs a change attention guidance module that takes feature differences as attention weights to dynamically enhance the saliency of defect regions and suppress background interference. The decoder module adopts an improved FPN structure to fuse high-level and low-level features, supplement defect details, and improve the model’s ability to detect small targets and multi-scale defects. Experimental results on the self-built substation multi-phase defect dataset (SMDD) show that the proposed method achieves 81.76% in terms of mAP, which is 3.79% higher than that of Faster R-CNN and outperforms mainstream detection models such as GoldYOLO and YOLOv10. Ablation experiments and visualization analysis demonstrate that the method can effectively focus on defect regions in complex environments, improving the positioning accuracy of multi-scale targets. Full article
(This article belongs to the Section E1: Mathematics and Computer Science)

16 pages, 1260 KB  
Article
DAR-Swin: Dual-Attention Revamped Swin Transformer for Intelligent Vehicle Perception Under NVH Disturbances
by Xinglong Zhang, Zhiguo Zhang, Huihui Zuo, Chaotan Xue, Zhenjiang Wu, Zhiyu Cheng and Yan Wang
Machines 2026, 14(1), 51; https://doi.org/10.3390/machines14010051 - 31 Dec 2025
Abstract
In recent years, deep learning-based image classification has made significant progress, especially in safety-critical perception fields such as intelligent vehicles. Factors such as vibrations caused by NVH (noise, vibration, and harshness), sensor noise, and road surface roughness pose challenges to robustness and real-time deployment. The Transformer architecture has become a fundamental component of high-performance models. However, in complex visual environments, shifted window attention mechanisms exhibit inherent limitations: although computationally efficient, local window constraints impede cross-region semantic integration, while deep feature processing obstructs robust representation learning. To address these challenges, we propose DAR-Swin (Dual-Attention Revamped Swin Transformer), enhancing the framework through two complementary attention mechanisms. First, Scalable Self-Attention universally substitutes the standard Window-based Multi-head Self-Attention via sub-quadratic complexity operators. These operators decouple spatial positions from feature associations, enabling position-adaptive receptive fields for comprehensive contextual modeling. Second, Latent Proxy Attention integrated before the classification head adopts a learnable spatial proxy to integrate global semantic information into a fixed-size representation, while preserving relational semantics and achieving linear computational complexity through efficient proxy interactions. Extensive experiments demonstrate significant improvements over Swin Transformer Base, achieving 87.3% top-1 accuracy on CIFAR-100 (+1.5% absolute improvement) and 57.0% mAP on COCO2017 (+1.3% absolute improvement). These characteristics are particularly important for the active and passive safety features of intelligent vehicles. Full article

29 pages, 15342 KB  
Article
GS-BiFPN-YOLO: A Lightweight and Efficient Method for Segmenting Cotton Leaves in the Field
by Weiqing Wu and Liping Chen
Agriculture 2026, 16(1), 102; https://doi.org/10.3390/agriculture16010102 - 31 Dec 2025
Abstract
Instance segmentation of cotton leaves in complex field environments presents challenges including low accuracy, high computational complexity, and costly data annotation. This paper presents GS-BiFPN-YOLO, a lightweight instance segmentation method that integrates SAM for semi-automatic labeling and enhances YOLOv11n-seg with GSConv, BiFPN, and CBAMs to reduce annotation cost and improve accuracy. To streamline parameters, the YOLOv11-seg architecture incorporates the lightweight GSConv module, utilizing group convolution and channel shuffle. Integration of a Bidirectional Feature Pyramid Network (BiFPN) enhances multi-scale feature fusion, while a Convolutional Block Attention Module (CBAM) boosts discriminative focus on leaf regions through dual-channel and spatial attention mechanisms. Experimental results on a self-built cotton leaf dataset reveal that GS-BiFPN-YOLO achieves a bounding box and mask mAP@0.5 of 0.988 and a recall of 0.972, maintaining a computational cost of 9.0 GFLOPs and achieving an inference speed of 322 FPS. In comparison to other lightweight models (YOLOv8n-seg to YOLOv12n-seg), the proposed approach achieves superior segmentation accuracy while preserving high real-time performance. This research offers a practical solution for precise and efficient cotton leaf instance segmentation, thereby facilitating the advancement of intelligent monitoring systems for cotton production. Full article
(This article belongs to the Section Artificial Intelligence and Digital Agriculture)
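GSConv-style blocks combine convolutions with a channel shuffle, and the shuffle is the step that lets information cross group boundaries. It is easy to show in isolation; the grouped and depthwise convolutions it normally follows are omitted in this sketch.

```python
import numpy as np

def channel_shuffle(x, groups):
    """Channel shuffle as used after grouped convolution: split the C
    channels into `groups` groups, interleave them, and flatten back,
    so later layers see channels from every group. x: (C, H, W)."""
    C, H, W = x.shape
    return (x.reshape(groups, C // groups, H, W)
             .transpose(1, 0, 2, 3)
             .reshape(C, H, W))

x = np.arange(16).reshape(4, 2, 2)
y = channel_shuffle(x, groups=2)   # channel order becomes [0, 2, 1, 3]
```

With 4 channels in 2 groups, channels [0, 1 | 2, 3] are interleaved to [0, 2, 1, 3]: each output pair now mixes one channel from each group.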

34 pages, 5124 KB  
Article
A Deep Ship Trajectory Clustering Method Based on Feature Embedded Representation Learning
by Yifei Liu, Zhangsong Shi, Bing Fu, Jiankang Ke, Huihui Xu and Xuan Wang
J. Mar. Sci. Eng. 2026, 14(1), 81; https://doi.org/10.3390/jmse14010081 - 31 Dec 2025
Abstract
Trajectory clustering is of great significance for identifying behavioral patterns and vessel types of non-cooperative ships. However, existing trajectory clustering methods suffer from limitations in extracting cross-spatiotemporal scale features and modeling the coupling relationship between positional and motion features, which restricts clustering performance. To address this, this study proposes a deep ship trajectory clustering method based on feature embedding representation learning (ERL-DTC). The method designs a Temporal Attention-based Multi-scale feature Aggregation Network (TA-MAN) to achieve dynamic fusion of trajectory features from micro to macro scales. A Dual-feature Self-attention Fusion Encoder (DualSFE) is employed to decouple and jointly represent the spatiotemporal position and motion features of trajectories. A two-stage optimization strategy of “pre-training and joint training” is adopted, combining contrastive loss and clustering loss to jointly constrain the embedding representation learning, ensuring it preserves trajectory similarity relationships while being adapted to the clustering task. Experiments on a public vessel trajectory dataset show that for a four-class task (K = 4), ERL-DTC improves ACC by approximately 14.1% compared to the current best deep clustering method, with NMI and ARI increasing by about 28.9% and 30.2%, respectively. It achieves the highest Silhouette Coefficient (SC) and the lowest Davies-Bouldin Index (DBI), indicating a tighter and more clearly separated cluster structure. Furthermore, its inference efficiency is improved by two orders of magnitude compared to traditional point-matching-based methods, without significantly increasing runtime due to model complexity. Ablation studies and parameter sensitivity analysis further validate the necessity of each module design and the rationality of hyperparameter settings. This research provides an efficient and robust solution for feature learning and clustering of vessel trajectories across spatiotemporal scales. Full article
(This article belongs to the Section Ocean Engineering)

23 pages, 2359 KB  
Article
Short-Term Frost Prediction During Apple Flowering in Luochuan Using a 1D-CNN–BiLSTM Network with Attention Mechanism
by Chenxi Yang and Huaibo Song
Horticulturae 2026, 12(1), 47; https://doi.org/10.3390/horticulturae12010047 - 30 Dec 2025
Abstract
Early spring frost is a major meteorological hazard during the Apple Flowering period. To improve frost event prediction, this study proposes a hybrid 1D-CNN-BiLSTM-Attention model, with its core novelty lying in the integrated dual attention mechanism (Self-attention and Cross-variable Attention) and hybrid architecture. The 1D-CNN extracts extreme points and mutation features from meteorological factors, while BiLSTM captures long-term patterns such as cold wave accumulation. The dual attention mechanisms dynamically weight key frost precursors (low temperature, high humidity, calm wind), aiming to enhance the model’s focus on critical information. Using 1997–2016 data from Luochuan (four variables: Ground Surface Temperature (GST), Air Temperature (TEM), Wind Speed (WS), Relative Humidity (RH)), a segmented interpolation method increased temporal resolution to 4 h, and an adaptive Savitzky–Golay Filter reduced noise. For frost classification, Recall, Precision, and F1-score were higher than those of baseline models, and the model showed good agreement with the actual frost events in Luochuan on 6, 9, and 10 April 2013. The 4 h lead time could provide growers with timely guidance to take mitigation measures, alleviating potential losses. This research may offer modest technical references for frost prediction during the Apple Flowering period in similar regions. Full article
(This article belongs to the Section Fruit Production Systems)

19 pages, 6358 KB  
Article
AFCLNet: An Attention and Feature-Consistency-Loss-Based Multi-Task Learning Network for Affective Matching Prediction in Music–Video Clips
by Zhibin Su, Jinyu Liu, Luyue Zhang, Yiming Feng and Hui Ren
Sensors 2026, 26(1), 123; https://doi.org/10.3390/s26010123 - 24 Dec 2025
Abstract
Emotion matching prediction between music and video segments is essential for intelligent mobile sensing systems, where multimodal affective cues collected from smart devices must be jointly analyzed for context-aware media understanding. However, traditional approaches relying on single-modality feature extraction struggle to capture complex cross-modal dependencies, resulting in a gap between low-level audiovisual signals and high-level affective semantics. To address these challenges, a dual-driven framework that integrates perceptual characteristics with objective feature representations is proposed for audiovisual affective matching prediction. The framework incorporates fine-grained affective states of audiovisual data to better characterize cross-modal correlations from an emotional distribution perspective. Moreover, a decoupled Deep Canonical Correlation Analysis approach is developed, incorporating discriminative sample-pairing criteria (matched/mismatched data discrimination) and separate modality-specific component extractors, which dynamically refine the feature projection space. To further enhance multimodal feature interaction, an Attention and Feature-Consistency-Loss-Based Multi-Task Learning Network is proposed. In addition, a feature-consistency loss function is introduced to impose joint constraints across dual semantic embeddings, ensuring both affective consistency and matching accuracy. Experiments on a self-collected benchmark dataset demonstrate that the proposed method achieves a mean absolute error of 0.109 in music–video matching score prediction, significantly outperforming existing approaches. Full article
(This article belongs to the Special Issue Recent Advances in Smart Mobile Sensing Technology)

29 pages, 5908 KB  
Article
Dual-Linear Attention Network for Multi-Object Tracking and Segmentation
by Yiqing Ren, Xuedong Wu and Haohao Fu
Appl. Sci. 2026, 16(1), 65; https://doi.org/10.3390/app16010065 - 20 Dec 2025
Abstract
Multi-object tracking and segmentation (MOTS) is a critical task in video analysis with applications spanning autonomous driving, robot navigation, and scene understanding. MOTS has made significant progress but still faces persistent challenges, such as crowded scenes, abnormal illumination, and small objects. Several trackers have implemented attention mechanisms to overcome these difficulties. However, many attention mechanisms have quadratic computational complexity and use little spatio-temporal information. This paper proposes a Dual-Linear Attention Network (DLAN), a novel approach that effectively integrates both appearance and spatio-temporal information while maintaining linear attention complexity. DLAN employs recursive linear self-attention to strengthen the appearance representation and prototypical linear cross-attention to condense rich spatio-temporal information, which can compensate for missing pixel information. DLAN optimizes both image features and segmentation, with the refined segmentation guiding frame-level memory updates to improve instance consistency. Extensive experiments on BDD100K MOT, BDD100K MOTS, and KITTI MOTS datasets demonstrate the following: (1) The three main challenges of object occlusion, illumination variation, and distant objects have been successfully mitigated by integrating DLAN. (2) DLAN has achieved an overall competitive performance when compared to state-of-the-art trackers, with a 26% reduction in identity switches (IDS) when compared to QDTrack-mots-fix. Full article
(This article belongs to the Special Issue Advances in Autonomous Driving: Detection and Tracking)
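The linear attention complexity this abstract targets typically comes from a kernel feature map: with a positive map phi, attention factorizes as phi(Q)(phi(K)^T V) with per-query normalization, so cost scales linearly in sequence length. The sketch below shows that generic trick only, not DLAN's specific recursive or prototypical variants.

```python
import numpy as np

def linear_attention(Q, K, V):
    """Kernelized linear attention: phi(Q) @ (phi(K).T @ V), normalized
    per query. Cost is O(n * d^2) versus softmax attention's O(n^2 * d).
    phi(x) = elu(x) + 1 keeps every attention weight positive."""
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))
    Qp, Kp = phi(Q), phi(K)
    Z = Qp @ Kp.sum(axis=0)            # per-query normalizer, shape (n,)
    return (Qp @ (Kp.T @ V)) / Z[:, None]

rng = np.random.default_rng(0)
Q = rng.standard_normal((6, 4))
K = rng.standard_normal((6, 4))
V = rng.standard_normal((6, 3))
out = linear_attention(Q, K, V)
```

Because the weights are positive and normalized, each output row is a convex combination of the rows of V, just as in softmax attention.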

24 pages, 911 KB  
Article
Lightweight Remote Sensing Image Change Caption with Hierarchical Distillation and Dual-Constrained Attention
by Xiude Wang, Xiaolan Xie and Zhongyi Zhai
Electronics 2026, 15(1), 17; https://doi.org/10.3390/electronics15010017 - 19 Dec 2025
Abstract
Remote sensing image change captioning (RSICC) fuses computer vision and natural language processing to translate visual differences between bi-temporal remote sensing images into interpretable text, with applications in environmental monitoring, urban planning, and disaster assessment. Multimodal Large Language Models (MLLMs) boost RSICC performance but suffer from inefficient inference due to massive parameters, whereas lightweight models enable fast inference yet lack generalization across diverse scenes, which creates a critical timeliness-generalization trade-off. To address this, we propose the Dual-Constrained Transformer (DCT), an end-to-end lightweight RSICC model with three core modules and a decoder. Full-Level Feature Distillation (FLFD) transfers hierarchical knowledge from a pre-trained Dinov3 teacher to a Generalizable Lightweight Visual Encoder (GLVE), enhancing generalization while retaining compactness. Key Change Region Adaptive Weighting (KCR-AW) generates Region Difference Weights (RDW) to emphasize critical changes and suppress backgrounds. Hierarchical encoding and Difference weight Constrained Attention (HDC-Attention) refine multi-scale features via hierarchical encoding and RDW-guided noise suppression; these features are fused by multi-head self-attention and fed into a Transformer decoder for accurate descriptions. The DCT resolves three core issues: lightweight encoder generalization, key change recognition, and multi-scale feature-text association noise, achieving a dynamic balance between inference efficiency and description quality. Experiments on the public LEVIR-CC dataset show our method attains SOTA among lightweight approaches and matches advanced MLLM-based methods with only 0.98% of their parameters. Full article
(This article belongs to the Section Artificial Intelligence)

16 pages, 1381 KB  
Article
Dual Routing Mixture-of-Experts for Multi-Scale Representation Learning in Multimodal Emotion Recognition
by Da-Eun Chae and Seok-Pil Lee
Electronics 2025, 14(24), 4972; https://doi.org/10.3390/electronics14244972 - 18 Dec 2025
Abstract
Multimodal emotion recognition (MER) often relies on single-scale representations that fail to capture the hierarchical structure of emotional signals. This paper proposes a Dual Routing Mixture-of-Experts (MoE) model that dynamically selects between local (fine-grained) and global (contextual) representations extracted from speech and text encoders. The framework first obtains local–global embeddings using WavLM and RoBERTa, then employs a scale-aware routing mechanism to activate the most informative expert before bidirectional cross-attention fusion. Experiments on the IEMOCAP dataset show that the proposed model achieves stable performance across all folds, reaching an average unweighted accuracy (UA) of 75.27% and weighted accuracy (WA) of 74.09%. The model consistently outperforms single-scale baselines and simple concatenation methods, confirming the importance of dynamic multi-scale cue selection. Ablation studies highlight that neither local-only nor global-only representations are sufficient, while routing behavior analysis reveals emotion-dependent scale preferences—such as strong reliance on local acoustic cues for anger and global contextual cues for low-arousal emotions. These findings demonstrate that emotional expressions are inherently multi-scale and that scale-aware expert activation provides a principled approach beyond conventional single-scale fusion. Full article
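The scale-aware routing step (a gate scores each expert's embedding, then mixes the experts by those scores) can be sketched generically. Everything here is illustrative: `gate_w` stands in for the learned gate vector, and the soft two-way mix is one common MoE routing choice, not necessarily the paper's exact scheme.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def dual_route(local_emb, global_emb, gate_w):
    """Dual routing sketch: a linear gate scores the local (fine-grained)
    and global (contextual) embeddings, and the two 'experts' are mixed
    by the resulting softmax weights."""
    weights = softmax(np.array([gate_w @ local_emb, gate_w @ global_emb]))
    fused = weights[0] * local_emb + weights[1] * global_emb
    return fused, weights

rng = np.random.default_rng(0)
fused, w = dual_route(rng.standard_normal(8), rng.standard_normal(8),
                      rng.standard_normal(8))
```

Inspecting `w` per utterance is what enables the routing-behavior analysis the abstract describes, such as anger leaning on the local acoustic expert.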

20 pages, 3287 KB  
Article
Dual-Branch Superpixel and Class-Center Attention Network for Efficient Semantic Segmentation
by Yunting Zhang, Hongbin Yu, Haonan Wang, Mengru Zhou, Tao Zhang and Yeh-Cheng Chen
Sensors 2025, 25(24), 7637; https://doi.org/10.3390/s25247637 - 16 Dec 2025
Viewed by 330
Abstract
With the advancement of deep learning, image semantic segmentation has achieved remarkable progress. However, the complexity and real-time requirements of practical applications pose greater challenges for segmentation algorithms. To address these, we propose a dual-branch network guided by attention mechanisms that tackles common limitations in existing methods, such as coarse edge segmentation, insufficient contextual understanding, and high computational overhead. Specifically, we introduce a superpixel sampling weighting module that models pixel dependencies based on different regional affiliations, thereby enhancing the network’s sensitivity to object boundaries while preserving local features. Furthermore, a class-center attention module is designed to extract class-centered features and facilitate category-aware modeling. This module reduces the computational overhead and redundancy of traditional self-attention mechanisms, thereby improving the network’s global feature representation. Additionally, learnable parameters are employed to adaptively fuse features from both branches, enabling the network to better focus on critical information. We validate our method on three benchmark datasets (PASCAL VOC 2012, Cityscapes, and ADE20K) by comparing it with mainstream models including FCN, DeepLabV3+, and DANet, with evaluation metrics of mIoU and PA. Our method delivers superior segmentation performance in these experiments. These results underscore the effectiveness of the proposed algorithm in balancing segmentation accuracy and model efficiency. Full article
(This article belongs to the Section Sensing and Imaging)
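The efficiency claim behind class-center attention — attending to K class centers instead of all N pixels, cutting the similarity cost from O(N²) to O(N·K) — can be illustrated as follows. This is a hedged NumPy sketch under assumed shapes, not the authors' module: in the real network the per-pixel class scores would come from a segmentation head, and the features from the backbone.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

N, C, K = 64, 8, 4                    # pixels, channels, classes (illustrative)
feats = rng.normal(size=(N, C))       # per-pixel features from a backbone

# Coarse per-pixel class scores (stand-in for a segmentation head's output).
class_logits = rng.normal(size=(N, K))
class_probs = softmax(class_logits, axis=0)   # each class column sums to 1

# Class centers: probability-weighted averages of pixel features, shape (K, C).
centers = class_probs.T @ feats

# Each pixel attends to the K centers instead of all N pixels:
# O(N*K) similarities rather than O(N^2) for full self-attention.
attn = softmax(feats @ centers.T / np.sqrt(C), axis=-1)   # (N, K)
out = attn @ centers                                      # (N, C)
```

Since K (number of classes) is far smaller than N (number of pixels), this is where the reduction in computational overhead and redundancy comes from.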

41 pages, 1635 KB  
Review
Photoresponsive TiO2/Graphene Hybrid Electrodes for Dual-Function Supercapacitors with Integrated Environmental Sensing Capabilities
by María C. Cotto, José Ducongé, Francisco Díaz, Iro García, Carlos Neira, Carmen Morant and Francisco Márquez
Batteries 2025, 11(12), 460; https://doi.org/10.3390/batteries11120460 - 15 Dec 2025
Viewed by 507
Abstract
This review critically examines photoresponsive supercapacitors based on TiO2/graphene hybrids, with a particular focus on their emerging dual role as energy-storage devices and environmental sensors. We first provide a concise overview of the electronic structure of TiO2 and the key attributes of graphene and related nanocarbons that enable efficient charge separation, transport, and interfacial engineering. We then summarize and compare reported device architectures and electrode designs, highlighting how morphology, graphene integration strategies, and illumination conditions govern specific capacitance, cycling stability, rate capability, and light-induced enhancement in performance. Particular attention is given to the underlying mechanisms of photo-induced capacitance enhancement—including photocarrier generation, interfacial polarization, and photodoping—and to how these processes can be exploited to embed sensing functionality in working supercapacitors. We review representative studies in which TiO2/graphene systems operate as capacitive sensors for humidity, gases, and volatile organic compounds, emphasizing quantitative figures of merit such as sensitivity, response/recovery times, and stability under repeated cycling. Finally, we outline current challenges in materials integration, device reliability, and benchmarking, and propose future research directions toward scalable, multifunctional TiO2/graphene platforms for self-powered and environmentally aware electronics. This work is intended as a state-of-the-art summary and critical guide for researchers developing next-generation photoresponsive supercapacitors with integrated sensing capability. Full article
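The quantitative figures of merit the review emphasizes — sensitivity and response/recovery time — are straightforward to compute from a capacitance transient. The sketch below uses synthetic data; the exponential transient shape, the units, and the 90%-of-total-change response criterion are common conventions assumed here for illustration, not values or definitions taken from the review.

```python
import numpy as np

# Synthetic capacitance transient of a capacitive sensor stepping from a
# baseline C0 toward C_final on analyte exposure (illustrative values).
t = np.linspace(0.0, 10.0, 1001)         # time, s
C0, C_final, tau = 100.0, 130.0, 1.5     # F/g (assumed units), time constant
C = C_final + (C0 - C_final) * np.exp(-t / tau)

# Sensitivity: relative capacitance change at saturation, S = (C_f - C0) / C0.
sensitivity = (C_final - C0) / C0        # 0.30, i.e. a 30% response

# Response time: first time the signal reaches 90% of the total change.
target = C0 + 0.9 * (C_final - C0)
t90 = t[np.argmax(C >= target)]          # ~tau * ln(10) for an exponential
```

The same computation on the recovery branch (after analyte removal) gives the recovery time; tracking `sensitivity` over repeated cycles quantifies the stability the review benchmarks.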
