Search Results (621)

Search Parameters:
Keywords = CNN-Transformer fusion

40 pages, 9354 KB  
Article
Temporal Gradient Attention Residual Vector-Driven Fusion Network for Wind Direction Prediction
by Molaka Maruthi, Munisamy Shyamala Devi, Sujeen Song and Chang-Yong Yi
Appl. Sci. 2026, 16(7), 3337; https://doi.org/10.3390/app16073337 - 30 Mar 2026
Abstract
Accurate prediction of wind direction is a critical requirement for coastal safety management, renewable energy optimization, and weather-driven risk mitigation, particularly in highly dynamic atmospheric environments where statistical and deep learning models often struggle to capture nonlinear interactions and temporal dependencies. Existing approaches typically rely on raw or weakly processed meteorological inputs and treat directional information implicitly, which limits their ability to exploit the underlying physical structure of wind evolution. To address these challenges, this research designs a novel Physics Vector Driven (PVD) data pre-processing framework that explicitly encodes physically meaningful gradients and directional dynamics from multivariate meteorological observations, transforming raw measurements into sequence-aware vector representations suitable for deep time-series learning. Building on this foundation, a novel Directional Temporal Gradient Vector Network (DTGVectorNet) is proposed, which fuses a Directional Gradient Attention ResNet (DGResNet 1D CNN) for spatial-directional feature extraction with a Temporal Gradient LSTM (TGLSTM) designed to model the temporal evolution of wind vectors. The tight integration of Directional Gradient Attention (DGA) and Temporal Gradient (TG) memory enables the network to jointly learn instantaneous directional cues and their temporal propagation, significantly enhancing predictive fidelity. An experimental evaluation of the Busan wind datasets demonstrates that the proposed DTGVectorNet achieves a wind direction prediction accuracy of 99.12%, substantially outperforming conventional state-of-the-art baselines. These results confirm that physics-aware vector preprocessing combined with directional-temporal gradient fusion provides a powerful and generalizable paradigm for high-precision wind direction forecasting. 
To ensure reproducibility and facilitate further research, the complete dataset and implementation details of DTGVectorNet are publicly available through an open-access repository, Zenodo.
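The physics-vector preprocessing is not detailed in this listing, but the core trick of representing wind direction as vector components rather than raw angles (so 359° and 1° are close, not far apart) can be sketched as follows; `encode_direction` and `decode_direction` are hypothetical helper names, not the authors' code:

```python
import numpy as np

def encode_direction(deg, speed):
    """Encode wind direction/speed as (u, v) vector components.

    Raw angles wrap at 360 degrees, so 359 and 1 look far apart to a
    regression model; sine/cosine components remove that discontinuity.
    """
    rad = np.deg2rad(np.asarray(deg, dtype=float))
    u = np.asarray(speed) * np.sin(rad)  # east-west component
    v = np.asarray(speed) * np.cos(rad)  # north-south component
    return u, v

def decode_direction(u, v):
    """Recover direction in degrees [0, 360) from vector components."""
    return np.rad2deg(np.arctan2(u, v)) % 360.0
```

A model trained on (u, v) targets predicts both components, and the angle is recovered with `decode_direction`, avoiding the 0°/360° seam entirely.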

22 pages, 7692 KB  
Article
SSF-TransUnet: Fine-Grained Crop Classification via Cross-Source Spatial Spectral Fusion
by Jian Yan, Xueke Chen, Rongrong Ren, Xiaofei Mi, Zhanliang Yuan, Jian Yang, Xianhong Meng, Zhenzhao Jiang, Hongbo Zhu and Yong Liu
Remote Sens. 2026, 18(7), 1034; https://doi.org/10.3390/rs18071034 - 30 Mar 2026
Abstract
Accurate exploitation of spatial structures and spectral characteristics is essential for fine-grained crop classification using remote sensing imagery. Although multi-source remote sensing data provide complementary information, most existing methods implicitly assume homogeneous data sources with consistent spatial resolution. In practice, high spatial resolution and rich spectral information are usually provided by different sensors, making cross-source spatial–spectral fusion a non-trivial challenge. To address this issue, we propose SSF-TransUnet, a dual-branch spatial–spectral joint modeling framework for fine crop classification. The proposed network explicitly decouples spatial structure extraction and spectral discriminability learning by jointly utilizing high spatial resolution imagery and multi-spectral observations acquired from different satellite sensors within a unified architecture. To support model training and evaluation, we construct SSCR-Agri, a spatial–spectral complementary resolution agricultural dataset integrating meter-level GF-2 imagery and multi-spectral Sentinel-2 data from five representative agricultural regions in northern China, covering five crop categories including corn, rice, wheat, potato, and others. Extensive experiments demonstrate that SSF-TransUnet consistently outperforms representative CNN-based and hybrid CNN–Transformer models. The proposed method achieves an overall accuracy (OA) of 81.84% and a mean Intersection over Union (mIoU) of 0.6954 in fine-grained crop classification, effectively distinguishing crops. These results highlight the effectiveness of spatial–spectral joint modeling for high-resolution crop mapping, demonstrate its potential for precision agriculture and large-scale agricultural monitoring, and suggest a promising direction when combined with multi-temporal observations.

23 pages, 10440 KB  
Article
MIFMNet: A Multimodal Interactions and Fusion Mamba for RGBT Tracking with UAV Platforms
by Runze Guo, Xiaoyong Sun, Bei Sun, Hanxiang Qian, Zhaoyang Dang, Peida Zhou, Feiyang Liu and Shaojing Su
Remote Sens. 2026, 18(7), 1026; https://doi.org/10.3390/rs18071026 - 29 Mar 2026
Abstract
RGBT tracking holds irreplaceable value in unmanned aerial vehicle (UAV) ground observation missions, effectively supporting scenarios such as nighttime monitoring and low-altitude reconnaissance. However, existing frameworks based on CNNs or Transformers face inherent trade-offs between interaction capabilities and computational efficiency. Furthermore, current methods perform poorly in challenging scenarios involving target scale variations and rapid motion from UAV perspectives. To address these issues, this paper proposes a novel multimodal interaction and fusion Mamba network (MIFMNet), which achieves fundamental innovations relative to existing RGB-T fusion trackers and recent Mamba-based tracking methods. Different from existing RGB-T trackers that rely on CNN’s local convolution or Transformer’s quadratic-complexity self-attention for cross-modal fusion, MIFMNet departs from these architectures and designs modality-adaptive interaction mechanisms based on Mamba, fully leveraging the complementary information while resolving the efficiency-accuracy trade-off. Specifically, this paper designs the scale differential enhanced Mamba (SDEM), which expands the receptive field through multiscale parallel convolutions while amplifying complementary information via differential strategies to enhance feature responses to scale-varying objects. Furthermore, we propose flow-guided multilayer interaction Mamba (FMIM), which integrates inter-frame motion information into scanning prediction. This enables the network to adaptively adjust interaction priorities between shallow texture and high-level semantic features based on motion intensity, mitigating early information forgetting and enhancing robustness in dynamic scenes. Extensive experiments on four major benchmarks demonstrate that MIFMNet achieves state-of-the-art performance on precision and success rate, particularly excelling in UAV scenarios involving occlusion, scale variations, and rapid motion. 
Simultaneously, it achieves an inference speed of 35.3 FPS, enabling efficient deployment on resource-constrained platforms, thereby providing robust support for UAV applications of RGBT tracking.
(This article belongs to the Section Remote Sensing Image Processing)

28 pages, 2486 KB  
Article
Physics-Guided Heterogeneous Dual-Path Adaptive Weighting Network: An Adaptive Framework for Fault Diagnosis of Air Conditioning Systems
by Ziyu Zhao, Caixia Wang, Xiangyu Jiang, Yanjie Zhao and Yongxing Song
Processes 2026, 14(7), 1101; https://doi.org/10.3390/pr14071101 - 29 Mar 2026
Abstract
Aiming to address the complex coupling of transient impulses and steady-state components in vibration signals of scroll compressors in air conditioning systems, this study proposes a physically driven heterogeneous dual-path adaptive weighting network (PDW-Net). The approach constructs a physics-inspired weighting module based on kurtosis and energy criteria, enabling adaptive reconstruction of transient impulses and steady-state vibration components. Feature extraction and decision-level fusion are achieved through a heterogeneous dual-branch network comprising a Fast Fourier Transform (FFT)-based one-dimensional convolutional neural network (1D-CNN) and a Short-Time Fourier Transform (STFT)-based two-dimensional convolutional neural network (2D-CNN). In experimental validation covering four typical fault conditions—condenser failure, refrigerant deficiency, refrigerant overcharge, and main shaft wear—the PDW-Net achieved an average diagnostic accuracy of 97.87% (standard deviation: 2.60%), with 100% accuracy in identifying refrigerant deficiency and normal operating states, demonstrating significant superiority over existing mainstream methods. Ablation studies reveal that the adaptive weighting mechanism contributes most substantially to performance, as its removal results in a 34.24 percentage point drop in accuracy. Replacing the heterogeneous dual-branch structure with a homogeneous counterpart reduces accuracy by 16.18 percentage points, robustly validating the efficacy of the physics-guided and heterogeneous fusion design.
(This article belongs to the Section Process Control and Monitoring)
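The kurtosis-and-energy weighting idea behind PDW-Net's preprocessing can be illustrated with a minimal NumPy sketch; the scoring rule and softmax normalization below are illustrative assumptions, not the paper's exact criterion:

```python
import numpy as np

def excess_kurtosis(x):
    """Excess kurtosis: large for impulsive (transient) segments."""
    x = np.asarray(x, dtype=float)
    z = x - x.mean()
    return (z ** 4).mean() / (z.std() ** 4) - 3.0

def adaptive_weights(transient, steady):
    """Softmax weights for the two reconstructed signal paths.

    The transient path is scored by kurtosis (impulsiveness) and the
    steady path by mean energy; both scoring choices are assumptions
    made for illustration only.
    """
    scores = np.array([max(excess_kurtosis(transient), 0.0),
                       float(np.mean(np.square(steady)))])
    e = np.exp(scores - scores.max())
    return e / e.sum()
```

An isolated impulse in an otherwise quiet signal drives the kurtosis score up, so the transient path dominates the fused reconstruction for that segment.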
25 pages, 4776 KB  
Article
FireMambaNet: A Multi-Scale Mamba Network for Tiny Fire Segmentation in Satellite Imagery
by Bo Song, Bo Li, Hong Huang, Zhiyong Zhang, Zhili Chen, Tao Yue and Yun Chen
Remote Sens. 2026, 18(7), 1021; https://doi.org/10.3390/rs18071021 - 29 Mar 2026
Abstract
Satellite remote sensing plays an essential role in wildfire monitoring due to its large-scale observation capability. However, fire targets in satellite imagery are typically extremely small, sparsely distributed, and embedded in complex backgrounds, making accurate segmentation highly challenging for existing methods. To address these challenges, this paper proposes a multi-scale Mamba-based network for tiny fire segmentation, named FireMambaNet. The network adopts a nested U-shaped encoder-decoder architecture, primarily consisting of three modules: the Cross-layer Gated Residual U-shaped module (CG-RSU), the Fire-aware Directional Context Modulation module (FDCM), and the Multi-scale Mamba Attention Module (M2AM). The CG-RSU, as the core building block, adaptively suppresses background redundancy and enhances weak fire responses by extracting multi-scale features through cross-layer gating. The FDCM explicitly enhances the network’s ability to perceive anisotropic expansion features of fire points, such as those along the wind direction and terrain orientation, by modeling multi-directional context. The M2AM employs a Mamba state-space model to suppress background interference through global context modeling during cross-scale feature fusion, while enhancing consistency among sparsely distributed tiny fire targets. In addition, experimental validation is conducted on two subsets of the Active Fire dataset with markedly sparse pixel-level fire annotations: Oceania and Asia4. The results show that the proposed method significantly outperforms various mainstream CNN, Transformer, and Mamba baseline models on both datasets. It achieves an IoU of 88.51% and F1 score of 93.76% on the Oceania dataset, and an IoU of 85.65% and F1 score of 92.26% on the Asia4 dataset. Compared to the best-performing CNN baseline model, the IoU is improved by 1.81% and 2.07%, respectively.
Overall, FireMambaNet demonstrates significant advantages in detecting tiny fire points in complex backgrounds.

19 pages, 1666 KB  
Article
MTLL: A Novel Multi-Task Learning Approach for Lymphocytic Leukemia Classification and Nucleus Segmentation
by Cuisi Ou, Zhigang Hu, Xinzheng Wang, Kaiwen Cao and Yipei Wang
Electronics 2026, 15(7), 1419; https://doi.org/10.3390/electronics15071419 - 28 Mar 2026
Abstract
Bone marrow cell classification and nucleus segmentation in microscopic images are fundamental tasks for computer-aided diagnosis of lymphocytic leukemia. However, bone marrow cells from different subtypes exhibit high morphological similarity, and structural information is often constrained under optical microscopic imaging, posing challenges for stable and effective feature representation. To address this issue, we propose MTLL (Multitask Model on Lymphocytic Leukemia), a novel multitask approach that performs cell classification and nucleus segmentation within a unified network to exploit their complementary information. The model constructs a hybrid backbone for shared feature representation based on a CNN-Transformer architecture, in which Fuse-MBConv modules are tightly integrated with multilayer multi-scale transformers to enable deep fusion of local texture and global semantic information. For the segmentation branch, we design an AM (Atrous Multilayer Perceptron) decoder that combines atrous spatial pyramid pooling with multilayer perceptrons to fuse multi-scale information and accurately delineate nucleus boundaries. The classification branch incorporates prior knowledge of cell nuclei structures to capture subtle variations in cellular morphology and texture, thereby enhancing the model’s ability to distinguish between leukemia subtypes. Experimental results demonstrate that the MTLL model significantly outperforms existing advanced single-task and multi-task models in both lymphocytic leukemia classification and cell nucleus segmentation. These results validate the effectiveness of the multi-task feature-sharing strategy for lymphocytic leukemia diagnosis using bone marrow microscopic images.

34 pages, 6554 KB  
Article
Syncretic Grad-CAM Integrated ViT-CNN Hybrids with Inherent Explainability for Early Thyroid Cancer Diagnosis from Ultrasound
by Ahmed Y. Alhafdhi, Gibrael Abosamra and Abdulrhman M. Alshareef
Diagnostics 2026, 16(7), 999; https://doi.org/10.3390/diagnostics16070999 - 26 Mar 2026
Abstract
Background/Objectives: Accurate detection of thyroid cancer using ultrasound remains a challenge, as malignant nodules can be microscopic and heterogeneous, easily confused with point clusters and borderline-featured tissues. Current studies in deep learning demonstrate good performance with convolutional neural networks (CNNs) and clustering; however, many approaches focus on local tissue and provide limited, non-quantitative interpretation, reducing clinical confidence. This study proposes an integrated framework combining enhanced convolutional feature encoders (DenseNet169 and VGG19) with an enhanced vision transformer (ViT-E) to integrate local feature and global relational context during learning, rather than delayed integration. Methods: The proposed framework integrates enhanced convolutional feature encoders (DenseNet169 and VGG19) with an enhanced vision transformer (ViT-E), enabling simultaneous learning of local feature representations and global relational context. This design allows feature fusion during the learning stage instead of delayed integration, aiming to improve diagnostic performance and interpretability in thyroid ultrasound image analysis. Results: The best-performing model, ViT-E–DenseNet169, achieved 98.5% accuracy, 98.9% sensitivity, 99.15% specificity, and 97.35% AUC, surpassing the robust basic hybrid model (CNN–XGBoost/ANN) and existing systems. A second contribution is improved interpretability, moving from mere illustration to validation. Gradient-weighted class activation mapping (Grad-CAM) maps demonstrated distinct and clinically understandable concentration patterns across various thyroid cancers: precise intralesional concentration for high-confidence malignancies (PTC = 0.968), edge/interface concentration for capsule risk patterns (PTC = 0.957), and broader-field activation consistent with infiltration concerns (PTC = 0.984), while benign scans showed low and diffuse activation (PTC = 0.002). 
Spatial audits reinforced this behavior (IoU/PAP: 0.72/91%, 0.65/78%, 0.58/62%). Conclusions: The integrated ViT-E–DenseNet169 framework provides highly accurate thyroid cancer detection while offering clinically meaningful interpretability through Grad-CAM-based spatial validation, supporting improved confidence in AI-assisted ultrasound diagnosis.
(This article belongs to the Special Issue Deep Learning Techniques for Medical Image Analysis)

29 pages, 1942 KB  
Article
Lightweight CNN–Mamba Hybrid Network for Multi-Scale Concrete Crack Segmentation Using Vision Sensors
by Jinfu Guan, Linzhao Cui, Yanjun Chen, Chenglin Yang, Jingwu Wang and Yinuo Huo
Electronics 2026, 15(7), 1362; https://doi.org/10.3390/electronics15071362 - 25 Mar 2026
Abstract
Surface cracking is a key visible indicator of deterioration in concrete infrastructure and is routinely captured by vision sensors during field inspections. To translate inspection imagery into actionable maintenance information, crack delineation must be accurate at the pixel level and robust to challenging conditions where cracks are slender, discontinuous, low-contrast, and easily confused with joints, stains, texture patterns, and illumination artifacts. This study proposes a lightweight CNN–Mamba hybrid segmentation framework built upon Vm-unet for reliable crack mapping under heterogeneous inspection scenarios and resource-constrained deployment. The framework couples boundary-sensitive convolutional features with long-range state-space representations via a spatially modulated convolution design, refines skip-connection features using reciprocal co-modulation attention to suppress background interference, and enhances cross-scale interactions through a decoder interaction fusion scheme to preserve fine-crack continuity and sharp boundaries. Experiments on a multi-source composite dataset and public benchmarks show consistent improvements over representative CNN-, Transformer-, and Mamba-based baselines. The proposed method achieves 80.11% mIoU and 82.05% Dice on the composite dataset, while maintaining an efficient accuracy–cost trade-off (36.049 GFLOPs, 25.991 M parameters). The resulting crack masks provide a dependable basis for inspection-driven quantitative assessment and maintenance decision support.

18 pages, 984 KB  
Article
Deep Multimodal Learning for Heart Sound Classification Using CNN, Transformer, and BiLSTM with Attention
by Ilyas Ait Ichou, Samir Elouaham, Boujemaa Nassiri and Jamal Isknan
Symmetry 2026, 18(4), 556; https://doi.org/10.3390/sym18040556 - 25 Mar 2026
Abstract
Phonocardiogram (PCG) signals offer a non-invasive, low-cost screening tool for cardiovascular diseases. However, their noisy and non-stationary nature makes automated classification challenging, and traditional methods often fail to capture complex spectral-temporal patterns. This study proposes a multimodal deep learning architecture for the binary classification of heart sounds (Healthy vs. Unhealthy). The hybrid model integrates Convolutional Neural Networks (CNNs), Transformer encoders, and Bidirectional Long Short-Term Memory (BiLSTM) networks with an attention mechanism. It utilizes an early-fusion feature extraction pipeline combining MFCCs, Mel-spectrograms, and Chroma descriptors. To ensure robust evaluation and prevent data leakage, SMOTE is applied exclusively to the training folds within a strict zero-leakage, patient-wise 5-fold cross-validation protocol. The proposed framework demonstrates exceptional performance, achieving an average accuracy of 91.67%, a sensitivity of 80.95%, a specificity of 94.46%, and an AUC-ROC of 96.50%. An ablation study confirms that integrating Transformer and BiLSTM modules significantly enhances diagnostic stability over baseline CNNs. Furthermore, with exactly 858,434 parameters (3.27 MB) and interpretable attention maps, this highly optimized model provides a robust assistive solution suitable for deployment in digital stethoscopes and mobile telemedicine systems.
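The zero-leakage protocol described in the abstract (patient-wise folds, with SMOTE fit only on the training portion of each fold) can be sketched generically; `patient_wise_folds` is a hypothetical helper, not the authors' pipeline:

```python
import numpy as np

def patient_wise_folds(patient_ids, n_folds=5, seed=0):
    """Split sample indices into folds with no patient on both sides.

    Splitting at the recording level would leak a patient's acoustic
    signature between train and test; splitting at the patient level
    avoids that. Any oversampling (e.g. SMOTE) is then fit only on
    each fold's training indices, never on test data.
    """
    patient_ids = np.asarray(patient_ids)
    rng = np.random.default_rng(seed)
    patients = np.unique(patient_ids)
    rng.shuffle(patients)
    folds = []
    for group in np.array_split(patients, n_folds):
        mask = np.isin(patient_ids, group)  # this group's recordings
        folds.append((np.flatnonzero(~mask), np.flatnonzero(mask)))
    return folds
```

With this split, metrics reflect generalization to unseen patients rather than to held-out recordings of already-seen patients.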

20 pages, 4497 KB  
Article
Remote Sensing Identification of Benggang Using a Two-Stream Network with Multimodal Feature Enhancement and Sparse Attention
by Xuli Rao, Qihao Chen, Kexin Zhu, Zhide Chen, Jinshi Lin and Yanhe Huang
Electronics 2026, 15(6), 1331; https://doi.org/10.3390/electronics15061331 - 23 Mar 2026
Abstract
Benggang, a typical erosional landform and geohazard in the red-soil hilly regions of southern China, is characterized by a fragmented texture, irregular boundaries, and high similarity to background objects such as bare soil and roads, which poses a dual challenge of “multiscale variability + strong noise” for automated identification at regional scales. To address insufficient information from a single modality and the limited representation of cross-scale features, this study proposes a dual-stream feature-fusion network (DF-Net) for multisource data consisting of a digital orthophoto map (DOM) and a digital elevation model (DEM). The method adopts ResNeSt50d as the backbone of the two branches: on the DOM side, a Canny-edge channel is stacked to enhance high-frequency boundary information; on the DEM side, derived terrain factors, including slope, aspect, curvature, and hillshade, are introduced to provide morphological constraints. In the cross-modal fusion stage, a multiscale sparse attention fusion module is designed, which acquires contextual information via multiwindow average pooling and suppresses noise interference through top-K sparsification. In the decision stage, a multibranch ensemble is employed to improve classification stability. Taking Anxi County, Fujian Province, as the study area, a coregistered dataset of GF-2 (1 m) DOM and ALOS (12.5 m) DEMs is constructed, and a zonal partitioning strategy is adopted to evaluate the model’s generalization ability. The experimental results show that DF-Net achieves 97.44% accuracy, 85.71% recall, and an 82.98% F1 score in the independent test zone, outperforming multiple mainstream CNN/transformer classification models.
This study indicates that the strategy of “multimodal feature enhancement + sparse attention fusion” tailored to Benggang erosional landforms can significantly improve recognition performance under complex backgrounds, providing technical support for rapid Benggang surveys and governance-effectiveness assessments.
(This article belongs to the Section Artificial Intelligence)

19 pages, 13660 KB  
Article
CA-GFNet: A Cross-Modal Adaptive Gated Fusion Network for Facial Emotion Recognition
by Sitara Afzal and Jong-Ha Lee
Mathematics 2026, 14(6), 1068; https://doi.org/10.3390/math14061068 - 21 Mar 2026
Abstract
Facial emotion recognition (FER) plays an important role in healthcare, human–computer interaction, and intelligent security systems. However, despite recent advances, many state-of-the-art FER methods depend on computationally intensive CNN or transformer backbones and large-scale annotated datasets while suffering noticeable performance degradation under cross-dataset evaluation because of domain shift. These limitations hinder practical usage in resource-constrained and real-world environments. To address this issue, we propose the Cross-Modal Adaptive Gated Fusion Network (CA-GFNet), a lightweight dual-stream FER framework that explicitly combines shallow structural features with deep semantic representations. The proposed architecture integrates domain-robust gradient-based descriptors with compact deep features extracted from a VGG-based backbone. After face detection and normalization, the structural stream captures fine-grained local appearance cues, whereas the semantic stream encodes high-level facial configurations. The two feature streams are projected into a shared latent space and adaptively fused using a gated fusion mechanism that learns sample-specific weights, allowing the model to prioritize the more reliable feature source under dataset shift. Extensive experiments on KDEF along with zero-shot cross-dataset evaluation on CK+ using a strict train-on-KDEF/test-on-CK+ protocol with subject-independent splits demonstrate the effectiveness of the proposed method. CA-GFNet achieves 99.30% accuracy on KDEF and 98.98% on CK+ while requiring significantly fewer parameters than conventional deep FER models. These results confirm that adaptive gated fusion of shallow and deep features can deliver both high recognition accuracy and strong cross-dataset robustness.
(This article belongs to the Special Issue Advanced Algorithms in Multimodal Affective Computing)
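The sample-specific gated fusion described in the abstract is a standard pattern that can be sketched in NumPy; `Wg` and `bg` stand in for trained gate parameters and are assumptions here, not values from the paper:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(shallow, deep, Wg, bg):
    """Fuse two feature streams with a learned, per-sample gate.

    The gate g in (0, 1) is predicted from both streams concatenated,
    and the output is an elementwise convex combination, so the model
    can lean on whichever stream is more reliable for each sample.
    """
    g = sigmoid(np.concatenate([shallow, deep], axis=-1) @ Wg + bg)
    return g * shallow + (1.0 - g) * deep
```

Because the gate depends on the input itself, a sample whose deep features are degraded by domain shift can be routed toward the shallow structural stream, which is the robustness argument the abstract makes.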

22 pages, 3299 KB  
Article
DualStream-RTNet: A Multimodal Deep Learning Framework for Grape Cultivar Classification and Soluble Solid Content Prediction
by Zhiguo Liu, Yufei Song, Aoran Liu, Xi Meng, Chang Liu, Shanshan Li, Xiangqing Wang and Guifa Teng
Foods 2026, 15(6), 1095; https://doi.org/10.3390/foods15061095 - 20 Mar 2026
Abstract
Accurate and non-destructive evaluation of grape quality is crucial for intelligent viticulture, yet most existing approaches address cultivar classification and soluble solid content (SSC) prediction as independent tasks based on single-modality data, limiting robustness and practical applicability. This study proposes DualStream-RTNet, a unified multimodal deep learning framework that simultaneously performs grape cultivar classification and SSC prediction by integrating RGB-HSV fused images and PCA-compressed hyperspectral spectra. The dual-stream architecture enables the complementary learning of external chromatic–textural cues and internal physicochemical information, while a Transformer-enhanced fusion module strengthens global representation and cross-modal correlation. A dataset of 864 berries from five grape cultivars was used to validate the model. DualStream-RTNet achieved 93.64% classification accuracy, outperforming ResNet18 and other CNN baselines, and produced more compact and consistent confusion-matrix patterns. For SSC prediction, it consistently yielded the highest performance across cultivars, with R2p values up to 0.9693 and RMSE as low as 0.2567, surpassing the PLSR, SVR, LSTM, and Transformer regression models. These results demonstrate the superiority of the proposed framework in capturing both visual and spectral characteristics. DualStream-RTNet provides an efficient and scalable solution for comprehensive grape quality assessment, offering strong potential for real-time sorting, precision grading, and smart agricultural applications.
(This article belongs to the Section Food Engineering and Technology)
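PCA compression of hyperspectral spectra, as used for the spectral input above, reduces hundreds of correlated bands to a handful of scores; a generic SVD-based sketch (not the paper's exact pipeline):

```python
import numpy as np

def pca_compress(spectra, n_components):
    """Project spectra (rows: samples, cols: bands) onto top PCs.

    The leading principal components capture most of the spectral
    variance at a fraction of the dimensionality, which keeps the
    downstream network's spectral branch small.
    """
    X = np.asarray(spectra, dtype=float)
    Xc = X - X.mean(axis=0)                  # center each band
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T          # PC scores
```

In a real pipeline the centering mean and the projection matrix `Vt` would be fit on training spectra only and reused at inference time.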

24 pages, 5799 KB  
Article
Robust Offshore Wind Power Forecasting Under Extreme Marine Conditions Using Multi-Source Feature Fusion and Kolmogorov–Arnold Networks
by Tongbo Zhu, Fan Cai and Dongdong Chen
J. Mar. Sci. Eng. 2026, 14(6), 573; https://doi.org/10.3390/jmse14060573 - 19 Mar 2026
Abstract
With the increasing penetration of offshore wind power, extreme marine conditions pose significant challenges to forecasting accuracy and grid stability. To address this issue, this study proposes a robust offshore wind power forecasting framework based on multi-source feature fusion and a hybrid TCN–BiLSTM–KAN architecture. Specifically, a Temporal Convolutional Network (TCN) is employed to extract local multi-scale temporal features and suppress high-frequency disturbances, followed by a Bidirectional Long Short-Term Memory (BiLSTM) network to capture long-term temporal dependencies. A Kolmogorov–Arnold Network (KAN) is further integrated as a nonlinear mapping module to approximate complex dynamics under extreme marine conditions. The model is validated using a real-world offshore wind power dataset with a 15 min forecasting horizon, where balanced samples are constructed across different operating conditions. Experimental results demonstrate that, under extreme conditions, the proposed model achieves an RMSE of 3.58 MW and an R² of 97.84%, with RMSE reductions of 56.8% and 42.3% compared to CNN-BiLSTM and Transformer-KAN, respectively. Furthermore, cross-site validation confirms that the model maintains stable predictive performance, indicating its preliminary spatial generalization capability. Overall, the proposed framework provides an effective solution for enhancing forecasting reliability and supporting secure grid integration of offshore wind power under extreme marine environments. Full article
(This article belongs to the Section Marine Energy)
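The TCN described above is built from causal dilated 1-D convolutions. A minimal single-channel NumPy sketch (no residual connection; the kernel values and dilation are illustrative assumptions, not the authors' configuration):

```python
import numpy as np

def causal_dilated_conv1d(x, w, dilation=1):
    """y[t] = sum_j w[j] * x[t - j*dilation], with zero padding on the left,
    so the output at time t never depends on future samples."""
    k = len(w)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), np.asarray(x, dtype=float)])
    return np.array([
        np.dot(w, xp[t + pad - np.arange(k) * dilation]) for t in range(len(x))
    ])

# Stacking layers with dilations 1, 2, 4, ... grows the receptive field
# exponentially, which is how a TCN captures multi-scale temporal context.
x = np.arange(10.0)
w = np.array([1.0, 0.5])
y = causal_dilated_conv1d(x, w, dilation=2)   # y[t] = x[t] + 0.5 * x[t-2]
```

The left-only padding is what makes the filter usable for forecasting: each 15 min prediction can only draw on past measurements.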

23 pages, 2351 KB  
Article
A Transformer–CNN Dual-Branch Image Classification Model—Cross-Layer Semantic Interaction and Discriminative Feature Enhancement Algorithm
by Longyan Qin, Hong Bao and Fanghua Liu
Symmetry 2026, 18(3), 527; https://doi.org/10.3390/sym18030527 - 19 Mar 2026
Abstract
PCB defect images suffer from tiny defects, subtle morphological differences and complex background wiring, making traditional single-feature classification unstable. This paper proposes a dual-branch image classification method combining a Transformer and CNN, which jointly models local anomalies and global semantic relationships. The model uses a convolutional branch and a Transformer branch to extract local defect features and global wiring dependencies, respectively. A cross-layer semantic interaction mechanism is adopted for multi-level information fusion, and a discriminative feature enhancement module is applied to highlight key defect regions and suppress background interference. Experiments show that the model improves overall accuracy by over 2%, with an F1-score of 0.930 and defect identification coverage of 0.927. It performs stably across different defect types and background complexities without obvious bias, providing new insights for hybrid deep model design in industrial defect image classification. Full article
(This article belongs to the Section Computer)
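The Transformer branch's global dependency modeling rests on scaled dot-product self-attention. A minimal single-head NumPy sketch (projections omitted; the shapes and the fusion-by-addition step are illustrative assumptions, not the paper's exact module):

```python
import numpy as np

def self_attention(tokens):
    """Scaled dot-product self-attention over token features (n, d).
    Returns the attended features and the (n, n) attention weights."""
    n, d = tokens.shape
    scores = tokens @ tokens.T / np.sqrt(d)
    scores -= scores.max(axis=1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # each row sums to 1
    return weights @ tokens, weights

# Dual-branch fusion in its simplest form: add globally attended features
# (Transformer branch) to local features (CNN branch) of the same shape.
rng = np.random.default_rng(0)
local_feats = rng.normal(size=(16, 32))    # stand-in for CNN patch features
global_feats, attn = self_attention(local_feats)
fused = local_feats + global_feats
```

Because every token attends to every other, a tiny defect region can be related to distant wiring context, which is precisely what a purely local convolutional branch cannot do on its own.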

26 pages, 3519 KB  
Article
Subject-Independent Depression Recognition from EEG Using an Improved Bidirectional LSTM with Dynamic Vector Routing
by Ziqi Ji, Kunye Liu, Weikai Ma, Xiaolin Ning and Yang Gao
Bioengineering 2026, 13(3), 358; https://doi.org/10.3390/bioengineering13030358 - 19 Mar 2026
Abstract
Electroencephalography (EEG) has become an increasingly important tool in depression research due to its ability to capture objective neurophysiological abnormalities associated with depressive disorders, offering high temporal resolution, non-invasiveness, and cost-effectiveness. However, existing methods often fail to fully exploit the multi-domain information in EEG signals, resulting in limited model generalization capabilities. This paper proposes an improved bidirectional long short-term memory (BiLSTM) model that segments continuous EEG into non-overlapping 2 s epochs and learns end-to-end from multi-channel temporal sequences. After band-pass filtering and resampling, each epoch is represented as a channel–time matrix X ∈ ℝ^(C×T) (with C = 128) and processed by a BiLSTM encoder followed by a dynamic-routing encapsulated-vector classifier. On the MODMA dataset under subject-independent five-fold cross-validation, the proposed method outperforms a set of reproduced representative baselines (SVM, EEGNet, InceptionNet, Self-attention-CNN, and CNN–LSTM) and achieves 84.8% accuracy with an AUC of 0.899. We further discuss recent contemporary directions (e.g., attention/Transformer-based and emotion-aware expert models) and clarify the scope of our empirical comparisons. Furthermore, experiments comparing different frequency bands and band combinations indicate that joint multi-frequency input can enhance classification performance. This study provides an effective multi-domain fusion approach for the automatic diagnosis of depression based on EEG. Full article
(This article belongs to the Section Biosignal Processing)
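The dynamic vector routing behind the encapsulated-vector classifier follows the capsule-style routing-by-agreement scheme. A minimal NumPy sketch (capsule dimensions and iteration count are illustrative assumptions, not the paper's configuration):

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    # Keeps direction, maps the norm into [0, 1): short vectors shrink,
    # long vectors saturate just below unit length.
    sq = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq / (1.0 + sq)) * s / np.sqrt(sq + eps)

def dynamic_routing(u_hat, n_iter=3):
    """Route prediction vectors u_hat (n_in, n_out, d_out) to n_out
    output capsules by iterative agreement."""
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))                            # routing logits
    for _ in range(n_iter):
        b_stable = b - b.max(axis=1, keepdims=True)
        c = np.exp(b_stable) / np.exp(b_stable).sum(axis=1, keepdims=True)
        s = np.einsum("ij,ijd->jd", c, u_hat)              # weighted sum
        v = squash(s)                                      # (n_out, d_out)
        b = b + np.einsum("ijd,jd->ij", u_hat, v)          # agreement update
    return v

# E.g. 16 input capsules routed to 2 class capsules (depressed / control);
# each class capsule's norm then acts as a class-presence score.
rng = np.random.default_rng(0)
caps = dynamic_routing(rng.normal(size=(16, 2, 8)))
```

Routing lets agreeing prediction vectors reinforce one another across iterations, a softer alternative to the winner-take-all pooling of a plain dense classifier head.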
