Search Results (2,678)

Search Parameters:
Keywords = feature-level fusion

22 pages, 920 KB  
Article
Early Detection of Fake News via Structured Social Interaction Simulation and Hierarchical Cross-Modal Fusion
by Ruihua Qi, Shuqin Chen, Weilong Li, Chenwei Zhang, Jiatai Lei, Haobo Lv and Yunhao Sun
Appl. Sci. 2026, 16(12), 6001; https://doi.org/10.3390/app16126001 (registering DOI) - 13 Jun 2026
Abstract
The widespread dissemination and societal impact of fake news underscore the critical need for effective detection. Existing methods remain limited: they often fail to learn joint representations from multi-modal data and rely heavily on complete social interaction signals, which are frequently unavailable in practice, especially during the early stages of propagation. To address early fake news detection on social media, this paper proposes a hierarchical cross-modal fusion framework with structured LLM-simulated social interaction (HCF-LSIM). The framework employs a progressive cross-modal attention mechanism to systematically align semantic representations across multiple levels, integrating textual, thematic, and visual features. Additionally, HCF-LSIM includes an LLM-powered social interaction simulator that generates structured triplets from adapted user profiles, compensating for missing real-time interaction data. Experiments on public benchmarks demonstrate strong performance, with accuracies of 93.5% on Weibo and 87.2% on X (formerly Twitter), ranking first on Weibo and second on X.
(This article belongs to the Section Computing and Artificial Intelligence)
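The progressive cross-modal attention described above can be pictured as a chain of attention steps. In this minimal sketch, text tokens first attend to thematic features and then attend to visual features. Each stage uses a residual connection. It assumes all three modalities have already been projected to a common dimension. The two-stage order and the function names are illustrative assumptions, not the authors' HCF-LSIM implementation.

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all key/value vectors."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

def add(xs, ys):
    # Element-wise residual addition of two token sequences.
    return [[a + b for a, b in zip(x, y)] for x, y in zip(xs, ys)]

def progressive_fusion(text, topic, visual):
    """Hypothetical two-stage fusion: text -> topic, then (text+topic) -> visual."""
    stage1 = add(text, cross_attention(text, topic, topic))
    stage2 = add(stage1, cross_attention(stage1, visual, visual))
    return stage2
```

With a single topic vector and a single visual vector, every attention weight is 1. Each text token therefore simply accumulates both vectors, which makes the sketch easy to check by hand.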
18 pages, 1484 KB  
Article
CLIP-BEV: A Late-Fusion Framework for Multimodal Scene Understanding Using Vision Language Models
by Fatemeh Daraee, Saeed Mozaffari and Shahpour Alirezaee
Electronics 2026, 15(12), 2615; https://doi.org/10.3390/electronics15122615 (registering DOI) - 13 Jun 2026
Abstract
Scene understanding is a fundamental task in autonomous driving, requiring effective integration of semantic and geometric information from heterogeneous sensors. Although vision–language models (VLMs) provide powerful semantic representations, their integration with LiDAR-based geometric perception remains challenging. This paper proposes a multimodal late-fusion framework for multi-label scene classification that combines semantic embeddings extracted from camera images using a frozen CLIP (ViT-B/32) encoder with geometric features derived from LiDAR Bird’s-Eye-View (BEV) representations. To improve multimodal compatibility, modality-specific adaptation networks are employed to refine visual and geometric features before fusion. The proposed framework was evaluated on an annotated subset of the nuScenes dataset containing synchronized camera–LiDAR samples and nine scene-level labels. Experimental results show that the proposed late-fusion architecture outperforms both unimodal and early-fusion baselines, achieving a Hamming accuracy of 0.950, a Micro-F1 score of 0.925, and a mean Average Precision (mAP) of 0.908. Additional experiments using a CLIP-based early-fusion baseline demonstrate that the observed performance gains are primarily attributable to the proposed modality-specific refinement and late-fusion strategy rather than to the visual encoder alone. These findings indicate that modality-aware late fusion of pretrained semantic representations and LiDAR geometric information provides an effective and scalable solution for multimodal perception in autonomous driving.
(This article belongs to the Special Issue Automated Driving Systems: Latest Advances and Prospects)
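The multi-label metrics reported above can be computed from binary label matrices. The sketch below uses the standard definitions and assumes predictions have already been thresholded to 0/1. Hamming accuracy is the fraction of individual label slots predicted correctly. Micro-F1 pools true positives, false positives, and false negatives over all labels before computing F1. It is a generic illustration, not the authors' evaluation code.

```python
def hamming_accuracy(y_true, y_pred):
    """Fraction of (sample, label) slots where prediction equals ground truth."""
    total = correct = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            total += 1
            correct += int(t == p)
    return correct / total

def micro_f1(y_true, y_pred):
    """F1 computed from TP/FP/FN pooled across every label and sample."""
    tp = fp = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            tp += int(t == 1 and p == 1)
            fp += int(t == 0 and p == 1)
            fn += int(t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0

# Two samples, three scene labels each.
y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 1]]
```

For the toy matrices, 4 of 6 slots are correct, and TP = 2, FP = 1, FN = 1, so both metrics equal 2/3.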
24 pages, 1936 KB  
Article
Warehouse Fire Detection System Based on Multi-Sensor Information Fusion
by Ziqiang Zhang, Yuxuan Ye, Xiaodong Wang, Xinqi Zhi, Xinpeng Zhang and Mingxing Zhang
Sensors 2026, 26(12), 3763; https://doi.org/10.3390/s26123763 (registering DOI) - 12 Jun 2026
Abstract
To address the problems of false negatives, false positives, and delayed response in traditional fire detection systems, this paper proposes a warehouse fire detection scheme based on multi-sensor information fusion. A ZigBee wireless sensor network integrating temperature, CO concentration, and smoke sensors is constructed to collect fire simulation data in the warehouse. At the data processing level, an improved Grubbs criterion is adopted to eliminate outliers, and the median is used instead of the mean to suppress the same-side shielding effect. At the feature-level fusion stage, a BP neural network model optimized by a cosine-decreasing inertia weight particle swarm optimization algorithm (CIW-PSO) is designed; dynamically adjusting the learning factors (c1, c2) and the inertia weight (w) significantly improves convergence speed and global optimization ability. At the decision level, a fuzzy logic reasoning mechanism integrates multi-parameter membership functions, reducing the probability of misjudgment. Field tests verified that the system achieves early fire warning in a 50 m × 100 m warehouse environment, with the false alarm rate reduced by 42% compared with a single sensor and the response time shortened by 35%, providing an efficient and reliable intelligent solution for warehouse fire safety.
(This article belongs to the Section Industrial Sensors)
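The idea behind CIW-PSO can be illustrated with a plain particle swarm whose inertia weight w falls along a cosine curve while the learning factors c1 and c2 change over time. Several assumptions apply here. The exact cosine formula, the bounds w_max = 0.9 and w_min = 0.4, the linear schedules for c1 (2.5 to 0.5) and c2 (0.5 to 2.5), and the velocity clamp are common choices, not values taken from the paper. The toy sphere objective also stands in for the BP network's training error.

```python
import math
import random

def cosine_inertia(t, t_max, w_max=0.9, w_min=0.4):
    """Assumed cosine-decreasing inertia weight: w_max at t = 0, w_min at t = t_max."""
    return w_min + (w_max - w_min) * (1 + math.cos(math.pi * t / t_max)) / 2

def linear_schedule(t, t_max, start, end):
    """Linearly interpolate a learning factor from start to end."""
    return start + (end - start) * t / t_max

def pso_minimize(f, dim, lo, hi, n_particles=30, iters=100, seed=0):
    rng = random.Random(seed)
    vmax = 0.2 * (hi - lo)  # velocity clamp keeps early steps bounded
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for t in range(iters):
        w = cosine_inertia(t, iters - 1)
        c1 = linear_schedule(t, iters - 1, 2.5, 0.5)  # cognitive factor decreases
        c2 = linear_schedule(t, iters - 1, 0.5, 2.5)  # social factor increases
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-vmax, min(vmax, v))
                pos[i][d] = max(lo, min(hi, pos[i][d] + vel[i][d]))
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def sphere(x):
    # Toy objective standing in for the BP network's training error.
    return sum(v * v for v in x)
```

The large early inertia favors exploration, and the smaller late inertia favors refinement around the global best. This trade-off is what the cosine schedule is meant to balance.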
32 pages, 7334 KB  
Article
Text Semantic Guided Spatial–Frequency Fusion Network for HSI–LiDAR Land-Cover Classification
by Aili Wang, Manman Yao, Haoran Lv and Haisong Chen
Remote Sens. 2026, 18(12), 1957; https://doi.org/10.3390/rs18121957 (registering DOI) - 12 Jun 2026
Abstract
Joint classification of hyperspectral images (HSI) and light detection and ranging (LiDAR) data is important for land-cover recognition, as it can exploit both spectral discrimination and structural elevation information. However, existing methods mainly focus on visual feature fusion and insufficiently utilize class-level semantic priors, which limits their discriminative capability at complex boundaries, among visually similar categories, and in limited-sample scenarios. To address these issues, this paper proposes a text-guided multimodal semantic fusion network for HSI–LiDAR classification. Specifically, a Channel-Modulated Mobile Convolution Module (CMMC) is designed to extract modality-specific features, a Spatial–Frequency Feature Enhancement Module (SFFE) is introduced to enhance spatial-boundary and frequency-domain structural representations, and a Bidirectional Cross-Modal Fusion Module (BCMF) is developed to promote complementary interaction between spectral and structural information. Meanwhile, class-level textual descriptions are constructed from class names, color attributes, and geographical contexts, and a text encoder is employed to obtain semantic prototypes. Furthermore, a multi-branch vision–text semantic alignment mechanism projects HSI features, LiDAR features, and fused features into a shared semantic space for joint constraints, improving semantic consistency and class separability. Experiments on the Houston2013, Augsburg, and Trento datasets demonstrate the effectiveness of the proposed method. It achieves an overall accuracy of 98.76% on Houston2013, with improvements over the best competing results of 0.62% in overall accuracy, 0.52% in average accuracy, and 0.67 in the Kappa coefficient (× 100). The proposed method also obtains the best overall metrics on Augsburg and Trento, and ablation studies verify the effectiveness of the proposed components.
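Overall accuracy (OA), average accuracy (AA), and the Kappa coefficient are standard land-cover classification metrics. All three can be derived from a confusion matrix. The sketch below uses the textbook definitions. OA is the trace divided by the total. AA is the mean per-class recall. Kappa corrects OA for chance agreement p_e. The toy two-class matrix is invented for illustration.

```python
def classification_metrics(cm):
    """cm[i][j] = number of samples of true class i predicted as class j."""
    k = len(cm)
    n = sum(sum(row) for row in cm)
    oa = sum(cm[i][i] for i in range(k)) / n
    aa = sum(cm[i][i] / sum(cm[i]) for i in range(k)) / k  # mean per-class recall
    row_tot = [sum(cm[i]) for i in range(k)]
    col_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    pe = sum(row_tot[i] * col_tot[i] for i in range(k)) / (n * n)  # chance agreement
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa

oa, aa, kappa = classification_metrics([[45, 5], [10, 40]])
```

Here OA = AA = 0.85 and p_e = 0.5, so Kappa = 0.7. Papers that report "Kappa × 100" would list this as 70.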
15 pages, 1682 KB  
Article
Discrepancy-Guided Complementary Fusion for Unsupervised Multimodal Anomaly Detection
by Taehui Lee, Seyoung Jeong and Sang Jun Lee
Sensors 2026, 26(12), 3757; https://doi.org/10.3390/s26123757 (registering DOI) - 12 Jun 2026
Abstract
In industrial inspection, subtle defects often appear as local variations in appearance or geometry, making reliable anomaly detection challenging. A single sensing modality can miss important defect cues, whereas multimodal inspection combines appearance and geometric information to represent industrial objects more comprehensively. Many existing multimodal anomaly detection methods adopt early fusion strategies that integrate features in the early stages of the network. Such early integration can dilute modality-specific anomaly responses and cause anomaly smoothing, degrading detection and localization performance. To address these challenges, we propose a reconstruction-based unsupervised multimodal anomaly detection framework integrating Discrepancy-Guided Complementary Fusion (DGCF) and Noise to Feature (N2F). Specifically, DGCF reduces anomaly smoothing by exploiting cross-modal discrepancies to extract complementary information, rather than directly summing or concatenating features from different modalities. Furthermore, N2F injects Gaussian noise into the feature space to regularize feature reconstruction and encourage the decoder to learn robust normal representations. Experimental results on the MVTec 3D-AD and Eyecandies datasets demonstrate the effectiveness of the proposed method, which achieves 97.3% I-AUROC, 99.6% P-AUROC, and 97.6% AUPRO on MVTec 3D-AD, and 94.8% I-AUROC, 98.6% P-AUROC, and 93.4% AUPRO on Eyecandies.
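The N2F idea, as summarized above, amounts to a denoising-style training pair. The decoder receives features corrupted with Gaussian noise and is asked to reconstruct the clean features. The sketch below only builds that pair and the reconstruction loss. The noise level `sigma` and the function names are hypothetical, and an identity mapping stands in for the decoder.

```python
import random

def noise_to_feature(features, sigma=0.1, rng=None):
    """Hypothetical N2F step: return (noisy_input, clean_target) for feature reconstruction."""
    if rng is None:
        rng = random.Random()
    noisy = [x + rng.gauss(0.0, sigma) for x in features]
    return noisy, list(features)

def mse(a, b):
    # Mean squared reconstruction error between decoder output and clean target.
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

rng = random.Random(0)
noisy, target = noise_to_feature([0.0] * 10000, sigma=0.1, rng=rng)
identity_loss = mse(noisy, target)  # identity "decoder": loss is close to sigma ** 2
```

A trained decoder should drive this loss below the identity baseline of about sigma² by learning what normal features look like. Anomalous regions are then expected to reconstruct poorly.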
26 pages, 3315 KB  
Article
Remote Tower Air Traffic Controller Fatigue Detection Based on Eye-Tracking and EEG Fusion
by Dajiang Song, Weijun Pan, Zirui Yin, Boyuan Han and Huafei Gao
Aerospace 2026, 13(6), 549; https://doi.org/10.3390/aerospace13060549 (registering DOI) - 12 Jun 2026
Abstract
Remote tower operations require air traffic controllers to maintain continuous visual monitoring and integrate information from panoramic displays, radar data, flight strips, and voice communication. Such screen-mediated and sustained surveillance tasks may lead to covert fatigue, which is difficult to capture using a single physiological or behavioral signal. To address this issue, this study proposes a Gated EEG–Eye Fusion Network (GEEF-Net) for window-level fatigue detection in remote tower controllers. EEG and eye-tracking signals were synchronously collected during simulated remote tower tasks and segmented into 5 s windows with a 2 s step. For each window, 53 EEG features and 47 eye-tracking features were extracted to construct a 100-dimensional multimodal representation. GEEF-Net adopts a lightweight modality-gating mechanism to adaptively weight EEG and eye-tracking representations before fatigue classification. Under the main subject-dependent validation setting, GEEF-Net achieved an accuracy of 0.883, an F1-score of 0.788, and a ROC-AUC of 0.944, outperforming EEG-only, eye-only, and early-fusion baselines on most overall metrics. The gating analysis indicated that eye-tracking features received a higher average weight than EEG features, suggesting the importance of visual behavior in remote tower fatigue detection. Cross-subject validation showed that individual differences remain a major challenge, while few-shot subject-specific calibration improved model adaptation when limited target-subject samples were available. These findings suggest that EEG–eye-tracking fusion with lightweight modality gating is a feasible approach for fatigue detection in simulated remote tower tasks. However, larger datasets and operationally realistic validation considering shift work, circadian effects, and operational pressure are still required before the approach can be considered operationally reliable.
(This article belongs to the Section Air Traffic and Transportation)
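The windowing and gating steps above can be sketched in two small functions. The 5 s window with a 2 s step and the 53 + 47 = 100 feature split come from the abstract. The softmax over two scalar gate scores is an assumed, minimal form of "modality gating". In a trained model those scores would come from learned projections, which are hypothetical here.

```python
import math

def sliding_windows(n_samples, fs, win_s=5.0, step_s=2.0):
    """(start, end) sample indices of fixed-length windows; incomplete tail windows are dropped."""
    win, step = int(round(win_s * fs)), int(round(step_s * fs))
    return [(s, s + win) for s in range(0, n_samples - win + 1, step)]

def modality_gate(eeg_feats, eye_feats, eeg_score, eye_score):
    """Softmax over two gate scores, then scale each modality and concatenate."""
    m = max(eeg_score, eye_score)
    e1, e2 = math.exp(eeg_score - m), math.exp(eye_score - m)
    w_eeg, w_eye = e1 / (e1 + e2), e2 / (e1 + e2)
    fused = [w_eeg * x for x in eeg_feats] + [w_eye * x for x in eye_feats]
    return fused, (w_eeg, w_eye)

# 20 s of signal sampled at 10 Hz -> windows starting at 0, 2, ..., 14 s.
windows = sliding_windows(200, fs=10)
fused, weights = modality_gate([1.0] * 53, [1.0] * 47, eeg_score=0.0, eye_score=0.0)
```

Because the two gate weights sum to one, they can be read directly as the relative importance of each modality. The abstract's gating analysis inspects exactly this kind of average weight.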
18 pages, 7317 KB  
Article
ASM-DBNet: Introducing Adaptive Differentiable Binarization, Spatial-Channel Self-Attention and Multi-Scale Context-Enhanced Dynamic Upsampling for Natural Scene Text Detection
by Xiaoliang Qian, Pengfei Wang, Li Zeng, Mengyang Chen, Wandian Chen, Jinchao Guo and Yanfang Mao
Information 2026, 17(6), 585; https://doi.org/10.3390/info17060585 - 12 Jun 2026
Abstract
Text detection models based on DBNet have demonstrated strong performance in natural scene text detection. However, these models still suffer from three issues. First, the amplifying-factor hyperparameter in differentiable binarization (DB) makes it difficult for the model to achieve optimal performance. Second, the integration of low-level and high-level features within the backbone’s feature pyramid lacks a dedicated optimization strategy. Third, the deconvolution operation in the prediction head may damage text contours. To tackle these issues, this paper presents a text detection model termed ASM-DBNet, which introduces three main innovations. For the first issue, an adaptive differentiable binarization (ADB) scheme is proposed. It independently predicts an amplifying factor for the feature point at each spatial location, replacing the original hyperparameter and thereby improving the overall optimization of the model. For the second issue, a spatial-channel self-attention (SCA) module is proposed to optimize the fusion of high-level and low-level features: spatial self-attention enhances the spatial localization ability of high-level features, while channel self-attention based on a grouped transformer optimizes the fusion of high-level and low-level features. For the third issue, a multi-scale context-enhanced dynamic upsampling (MC-DyUpS) module replaces the deconvolution operation in the prediction head. It enhances contextual perception around interpolation points through multi-scale context feature extraction and then predicts the coordinate offsets of these points; position correction based on the offsets suppresses the spatial deviation caused by deconvolution.
Ablation studies demonstrate the effectiveness of the SCA module, the MC-DyUpS module, the ADB scheme, and their various combinations. Comprehensive quantitative evaluations show that ASM-DBNet achieves competitive F1-scores of 84.1%, 84.2%, and 85.7% on the ICDAR 2015, Total-Text, and MSRA-TD500 datasets, respectively, with improvements of 1.8%, 1.4%, and 2.9% over the baseline model.
(This article belongs to the Section Artificial Intelligence)
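Differentiable binarization replaces a hard threshold with the steep sigmoid B = 1 / (1 + exp(-k(P - T))). Here P is the probability map, T the threshold map, and k the amplifying factor. The original DBNet fixes k = 50. The adaptive variant described above predicts k per location instead. The sketch below accepts either a scalar k or a per-pixel k map. How ASM-DBNet actually predicts that map is not shown.

```python
import math

def differentiable_binarization(prob, thresh, k):
    """Element-wise B = 1 / (1 + exp(-k * (P - T))).

    k is either a scalar (original DB) or a 2-D list with one amplifying
    factor per pixel (adaptive variant, values assumed to be given)."""
    out = []
    for i, (p_row, t_row) in enumerate(zip(prob, thresh)):
        row = []
        for j, (p, t) in enumerate(zip(p_row, t_row)):
            kij = k[i][j] if isinstance(k, list) else k
            row.append(1.0 / (1.0 + math.exp(-kij * (p - t))))
        out.append(row)
    return out

# Same margin P - T = 0.1 at both pixels, but different per-pixel amplifying factors.
adaptive = differentiable_binarization([[0.6, 0.6]], [[0.5, 0.5]], [[10.0, 50.0]])
```

A larger k pushes B closer to a hard 0/1 decision for the same margin P − T. Learning k per pixel lets the model choose how sharp that decision is at each location.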
25 pages, 2771 KB  
Article
Data Fusion and Machine Learning for Diagnosing Electrical and Mechanical Faults in BLDC Motors
by Marek Karbowniczyn and Jerzy Baranowski
Machines 2026, 14(6), 680; https://doi.org/10.3390/machines14060680 (registering DOI) - 11 Jun 2026
Viewed by 47
Abstract
One of the main challenges in BLDC motor diagnostics is identifying faults with different physical origins, especially in mixed states where the symptoms of multiple faults may overlap. In this work, a classification system based on feature-level fusion of current and rotational signals was developed, with a homogeneous Stacking Ensemble model as the main fault classification mechanism. The study was conducted on a dataset of 184 samples representing four operating conditions: healthy operation, mechanical faults, electrical faults associated with permanent magnet degradation, and their combined occurrence. The stability of the proposed classifier was evaluated using ten different data splits. The experiments showed that omitting PCA preserves more of the diagnostically relevant information contained in the raw features, yielding a classification accuracy of 97.3% with a standard deviation of 0.017; PCA consistently reduced performance across all considered data modalities. SHAP analysis further indicated that the model's decisions were driven by physically interpretable features from both the rotational and current domains.
(This article belongs to the Section Machine Design and Theory)
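Feature-level fusion, as opposed to signal-level or decision-level fusion, typically means concatenating the feature vectors extracted from each signal domain into one vector per sample before classification. The sketch below does that and then z-score standardizes each column, without PCA, in line with the finding above. The function names are illustrative, and the Stacking Ensemble that would consume the fused features is omitted. In practice, the standardization statistics would be fitted on the training split only.

```python
import statistics

def zscore_columns(rows):
    """Standardize each feature column to zero mean and unit population std."""
    cols = list(zip(*rows))
    stats = [(statistics.fmean(c), statistics.pstdev(c) or 1.0) for c in cols]
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in rows]

def feature_level_fusion(current_feats, rotational_feats):
    """Concatenate per-sample feature vectors from both domains, then standardize."""
    fused = [list(c) + list(r) for c, r in zip(current_feats, rotational_feats)]
    return zscore_columns(fused)

# Two samples: two current-domain features and one rotational-domain feature each.
fused = feature_level_fusion([[1, 2], [3, 4]], [[10], [20]])
```

Standardizing after concatenation keeps features with large numeric ranges, such as rotational speed, from dominating distance- or gradient-based base learners.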
22 pages, 27674 KB  
Article
SIRI-YOLO: A Foreign Object Detection Method for Belt Conveyors in High-Entropy Underground Scenes
by Yi Liu, Yi Liu, Rengang Xue, Zixian Zhao and Jinping Xiao
Entropy 2026, 28(6), 673; https://doi.org/10.3390/e28060673 (registering DOI) - 11 Jun 2026
Viewed by 113
Abstract
To address the poor detection performance on low-light underground coal mine belt conveyors caused by information entropy degradation and high background noise, as well as the difficulty of multi-scale target extraction due to uneven entropy distribution, this paper proposes an efficient foreign object detection model named SIRI-YOLO based on an improved YOLOv11n architecture. First, a Self-Calibrating Illumination Network (SCINet) is introduced to restore image information entropy and enhance low-light adaptability. Second, the C2PSA module is enhanced to C2PSA-IRMB by incorporating an Inverted Residual Mobile Block (IRMB), improving multi-scale feature utilization and reducing ineffective entropy increase. Third, an improved Reparameterized Generalized Feature Pyramid Network (RepGFPN) is adopted to strengthen the fusion of high-level semantics and low-level spatial features, reducing information entropy loss during feature pyramid transfer. Finally, the Inner-MPDIoU loss function replaces CIoU, achieving more accurate entropy minimization from a KL divergence perspective. Experimental results on a dataset containing large coal chunks and anchor rods show that SIRI-YOLO achieves 92.8% mAP@50, 59.4% mAP@50:95, 89.5% precision, and 87.2% recall, with only 2.92M parameters and 70.01 FPS, outperforming mainstream YOLO models. Furthermore, on the public ExDark low-light dataset, SIRI-YOLO improves mAP@50 by 4.2% over YOLOv11n, demonstrating strong generalization across different low-light and complex scenarios. The proposed method effectively handles uneven illumination, scale variation, and complex backgrounds, offering a practical solution for coal mine safety through system entropy reduction.
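MPDIoU augments IoU with penalties on the squared distances between matching box corners, normalized by the input image's width² + height². Inner-IoU computes IoU on auxiliary boxes rescaled about their centers by a ratio. The sketch below combines the two as L = (1 − MPDIoU) + IoU − IoU_inner, following the Inner-IoU formulation as we understand it. Treat that combination and the default ratio of 0.8 as assumptions rather than the SIRI-YOLO settings. Boxes are given as (x1, y1, x2, y2).

```python
def iou(a, b):
    """Intersection over union of boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def scale_box(box, ratio):
    """Auxiliary 'inner' box: same center, width and height scaled by ratio."""
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    hw, hh = (box[2] - box[0]) * ratio / 2, (box[3] - box[1]) * ratio / 2
    return (cx - hw, cy - hh, cx + hw, cy + hh)

def mpdiou(pred, gt, img_w, img_h):
    norm = img_w ** 2 + img_h ** 2
    d1 = (pred[0] - gt[0]) ** 2 + (pred[1] - gt[1]) ** 2  # top-left corners
    d2 = (pred[2] - gt[2]) ** 2 + (pred[3] - gt[3]) ** 2  # bottom-right corners
    return iou(pred, gt) - d1 / norm - d2 / norm

def inner_mpdiou_loss(pred, gt, img_w, img_h, ratio=0.8):
    inner = iou(scale_box(pred, ratio), scale_box(gt, ratio))
    return (1 - mpdiou(pred, gt, img_w, img_h)) + iou(pred, gt) - inner

gt = (10.0, 10.0, 50.0, 60.0)
shifted = (20.0, 20.0, 60.0, 70.0)
```

The corner-distance terms still produce a useful gradient when boxes share an aspect ratio but differ in position or size. The inner-box term changes how strongly high-IoU and low-IoU samples are weighted.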
26 pages, 2289 KB  
Article
VI-MSFFN: A Visible-Infrared Multi-Scale Feature Fusion Network for Cross-Modal Detection in Remote Sensing
by Yurong Yue, Weiwei Qin, Hao Chi, Baiwei An, Dingyi Wu, Wenxin Guo and Jingyi Xiong
Remote Sens. 2026, 18(12), 1938; https://doi.org/10.3390/rs18121938 - 11 Jun 2026
Viewed by 62
Abstract
To address the issues of insufficient single-modality robustness and limited multi-scale object detection accuracy in remote sensing image detection (RSID) in complex environments, this paper proposes a multimodal RSID network named VI-MSFFN. The model adopts a symmetric parallel dual-branch architecture to achieve independent extraction and collaborative modeling of visible and infrared modal features. A cross-modal multi-scale sparse cross-attention fusion module is proposed and applied to the P4 and P5 feature layers, and a high-low-level collaborative cross-modal fusion strategy is constructed to achieve efficient and robust cross-modal feature fusion while enhancing multi-scale object modeling and suppressing feature redundancy and noise. Additionally, a progressive feature interaction and fusion architecture is designed to combine spatial- and frequency-domain information and strengthen deep object representation. Experimental results on the VEDAI and Drone Vehicle datasets demonstrate that VI-MSFFN achieves state-of-the-art (SOTA) performance in detection accuracy, robustness, and generalization ability. The proposed method effectively addresses the detection challenges of RSID and has significant application value in multimodal RSID.
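One common way to make cross-attention "sparse" is top-k selection. Each query keeps only its k highest-scoring keys and renormalizes the softmax over them, so weak, noisy matches are discarded. The sketch below applies this to visible-branch queries attending to infrared keys and values. Top-k selection is an assumed sparsification scheme. The abstract does not say which one VI-MSFFN uses.

```python
import math

def sparse_cross_attention(queries, keys, values, top_k=2):
    """Each query (e.g. visible features) attends only to its top_k best keys (e.g. infrared)."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        keep = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
        m = max(scores[i] for i in keep)
        w = {i: math.exp(scores[i] - m) for i in keep}  # softmax restricted to kept keys
        z = sum(w.values())
        out.append([sum(w[i] / z * values[i][j] for i in keep)
                    for j in range(len(values[0]))])
    return out

keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
```

With top_k = 1, each query copies the value of its single best-matching key. With top_k equal to the number of keys, the function reduces to ordinary dense attention.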
20 pages, 4278 KB  
Article
Image Watermarking Algorithm Leveraging Dual-Attention Synergy and Adaptive Multi-Scale Fusion
by Zhenghan Yang, Huadong Sun and Nuohan Lv
Electronics 2026, 15(12), 2580; https://doi.org/10.3390/electronics15122580 - 11 Jun 2026
Viewed by 140
Abstract
Blind image watermarking models such as HiDDeN have laid an important foundation for end-to-end watermarking. Nevertheless, they still suffer from three major limitations: single-scale feature extraction, fixed fusion weights, and slow training convergence. To address these issues, this paper proposes an adaptive multi-scale watermarking algorithm based on collaborative dual-attention mechanisms. The algorithm designs an adaptive multi-scale feature fusion module (MA-FFM) with a dynamic gating network in the encoder, which flexibly combines local multi-scale textures with global contextual information, overcoming the limitation of fixed fusion weights. In the decoder, a multi-level channel attention module is embedded to strengthen the extraction of watermark signals. The two attention modules work synergistically: the encoder focuses on adaptive feature fusion, while the decoder leverages channel attention to selectively enhance watermark-related features, forming a dual-attention synergy that balances robustness and imperceptibility. Moreover, the dynamic gating network adaptively adjusts the contribution of local versus global features via learnable weights, whose evolution from approximately 0.51 to about 0.89 improves model interpretability. Experiments are conducted on the COCO 2017 dataset. Compared with HiDDeN, the proposed algorithm reduces the bit error rate (BER) from 0.1696 to 0.1538 under no attack (a relative reduction of 9.3%), increases PSNR by 0.61 dB, and improves SSIM from 0.9058 to 0.9077. Under various attacks, including JPEG compression, Gaussian noise, salt-and-pepper noise, and brightness/contrast adjustments, the BER remains consistently lower than that of HiDDeN. Ablation studies confirm the effectiveness of each module. Overall, the proposed algorithm preserves visual quality, improves the accuracy of watermark embedding and extraction, and exhibits strong robustness against common image distortions.
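The evaluation quantities and the gated fusion mentioned above are simple to state in code. BER is the fraction of watermark bits recovered incorrectly. PSNR for 8-bit images is 10·log10(255² / MSE). A gate g in [0, 1] forms a convex combination of local and global features. Which branch the gate weights, and the flattened-list image representation, are assumptions made for illustration.

```python
import math

def bit_error_rate(sent, recovered):
    """Fraction of watermark bits that differ after extraction."""
    return sum(s != r for s, r in zip(sent, recovered)) / len(sent)

def psnr(original, distorted, max_val=255.0):
    """PSNR in dB for flattened pixel lists (8-bit range by default)."""
    mse = sum((a - b) ** 2 for a, b in zip(original, distorted)) / len(original)
    return float("inf") if mse == 0 else 10 * math.log10(max_val ** 2 / mse)

def gated_fusion(local_feat, global_feat, g):
    """Assumed gate form: g weights the local branch, (1 - g) the global branch."""
    return [g * l + (1 - g) * gl for l, gl in zip(local_feat, global_feat)]

# Relative BER reduction quoted in the abstract: (0.1696 - 0.1538) / 0.1696.
relative_reduction = (0.1696 - 0.1538) / 0.1696
```

Under this reading, a gate value that moves from about 0.51 toward 0.89 would mean the encoder increasingly favors one branch as training proceeds.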
27 pages, 49694 KB  
Article
DUST-YOLO: A Deployable UAV Swin Transformer YOLO with Multi-Dimensional Pruning and Mixed-Precision Quantization for End-to-End Video Object Detection
by Gongxun Lin, Jincheng Jiang, Jiaheng Cai, Xingjian Luo, Zihao Wang, Hao Sun and Ziyuan Pu
Electronics 2026, 15(12), 2579; https://doi.org/10.3390/electronics15122579 - 11 Jun 2026
Viewed by 170
Abstract
Real-time video object detection on unmanned aerial vehicles (UAVs) is essential for urban inspection and autonomous perception, yet its deployment on edge devices is severely constrained by the high computational cost of accurate detectors, the quantization sensitivity of hybrid convolution-attention networks, and the system-level latency of full video processing pipelines. To address these challenges, we present DUST-YOLO, a deployment-oriented algorithm-hardware co-design framework in which structured pruning and mixed-precision quantization-aware training (QAT) are jointly optimized with TensorRT–DeepStream for efficient UAV small-object detection on edge platforms. First, we introduce a multi-dimensional structured pruning strategy that applies asymmetric channel pruning to convolutional and feature-fusion modules while compressing the Swin Transformer prediction heads and bottleneck stacks, thereby reducing parameters and computation with limited impact on multi-scale representation capability. Second, we develop a hardware-aware mixed-precision QAT scheme that maps computation-intensive backbone layers to INT8 while keeping the Transformer-related modules in FP16, improving inference efficiency while mitigating the accuracy loss caused by uniform low-bit quantization. Third, we compile the optimized network with TensorRT and integrate the resulting inference engine into a DeepStream-based asynchronous video pipeline on the edge platform, enabling end-to-end acceleration by reducing decoding, preprocessing, and memory-transfer overheads. Experimental results on the VisDrone2019-DET dataset and the NVIDIA Jetson Orin NX show that DUST-YOLO achieves a mAP@0.5 of 43.7% with an end-to-end latency of 36.3 ms and a throughput of 27.5 FPS. Compared with state-of-the-art methods, DUST-YOLO reduces end-to-end latency by 56.9% and improves end-to-end video throughput by 2.31×.
(This article belongs to the Section Artificial Intelligence)
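Mapping a layer "to INT8" rests on an affine quantization step. The simplest version is symmetric per-tensor quantization. It uses the scale s = max|x| / 127, rounds x/s, and clamps the result to [−127, 127]. The sketch below shows that round trip and the resulting error bound of s/2. It does not reproduce TensorRT's calibration or the paper's QAT procedure, which also simulates this rounding during training.

```python
def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization: q = clamp(round(x / s), -127, 127)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Python's round() uses round-half-to-even; real toolchains may round differently.
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Map INT8 codes back to floating point."""
    return [x * scale for x in q]

weights = [0.5, -1.27, 0.0, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
```

Keeping the Transformer modules in FP16, as the abstract describes, avoids applying this coarse 8-bit grid to attention activations. Those activations tend to have wide dynamic ranges and are sensitive to such rounding.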
24 pages, 3427 KB  
Article
A Multi-Class Classification Model for Text Related to Online Public Opinion Risks in Higher Education Institutions Based on Confidence-Aware Dynamic Fusion
by Xin Gu, Chengjun Wang, Kai Wang and Xiang Zhao
Information 2026, 17(6), 579; https://doi.org/10.3390/info17060579 - 10 Jun 2026
Viewed by 75
Abstract
With the widespread use of social media and online platforms in the dissemination of public opinion within universities, the multi-class classification of risk-related texts has become a critical component of online public opinion analysis in higher education institutions. Existing multi-class risk classification methods often focus on static semantic representations, making it difficult to effectively capture the emotional evolution within texts and the differences between samples, which in turn affects the accuracy of risk classification. To address this, this paper proposes a multi-class risk classification model for university online public opinion that integrates contextual semantic modeling, emotional evolution detection, and adaptive confidence-based feature fusion. The model employs pre-trained BERT for context encoding and, while preserving high-level semantic information, enhances the model’s adaptability to domain-specific features through a selective unfreezing strategy. First, a Bidirectional Gated Recurrent Unit (BiGRU) is introduced to model the emotional evolution trajectory within text sequences, and an emotional transition intensity metric is constructed by calculating the difference between adjacent hidden states, thereby explicitly capturing the magnitude of emotional changes. Additionally, a convolutional feature branch is designed to capture local emotional patterns, enhancing the model’s ability to perceive local risk cues and fine-grained emotional fluctuations. Finally, the Emotion-Adaptive Feature Mixer (EAFM) is introduced. This module adaptively weights global emotional evolution features and local emotional pattern features based on sample confidence to adjust the contributions of different feature branches in risk classification. 
Experimental results demonstrate that the proposed model exhibits good convergence characteristics in the university online public opinion scenario represented by the CUOPO dataset and demonstrates strong interpretability through attention visualization and confidence coefficient analysis.
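The two mechanisms described in this abstract can be sketched on plain vectors. The first is an emotional transition intensity computed from differences between adjacent BiGRU hidden states. The second is a confidence-weighted mix of the global and local branches. Several choices here are illustrative assumptions, not details confirmed for EAFM. These include the L2 norm as the measure of change, the maximum softmax probability as the confidence score, and the convex-combination form of the mixer.

```python
import math

def transition_intensity(hidden_states):
    """L2 norm of h_t - h_{t-1} for each pair of adjacent hidden states."""
    return [math.dist(h_prev, h) for h_prev, h in zip(hidden_states, hidden_states[1:])]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def confidence_mix(global_feat, local_feat, logits):
    """Hypothetical mixer: weight the global branch by the max class probability."""
    c = max(softmax(logits))
    mixed = [c * g + (1 - c) * l for g, l in zip(global_feat, local_feat)]
    return mixed, c

# Three time steps: a large change between steps 1 and 2, none between 2 and 3.
intensities = transition_intensity([[0.0, 0.0], [3.0, 4.0], [3.0, 4.0]])
```

A spike in the intensity sequence marks an abrupt shift in the hidden representation. The abstract interprets such shifts as changes in emotional tone.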

30 pages, 10130 KB  
Article
An Explainable Multi-Scale Deep Learning Framework for Multi-Class Brain MRI Classification
by Hamoud H. Alshammari and Mahmood A. Mahmood
Diagnostics 2026, 16(12), 1791; https://doi.org/10.3390/diagnostics16121791 - 10 Jun 2026
Abstract
Background/Objectives: Brain magnetic resonance imaging (MRI) is an important imaging modality for assessing neurological disorders. However, automatic multi-class MRI classification remains challenging because of visual similarity between disease categories, heterogeneous pathological patterns, class imbalance, and the need for reliable confidence estimation. This study aims to develop a comprehensive and well-calibrated deep learning framework for image-level brain MRI classification across multiple neurological categories.
Methods: This paper introduces MCND-ComputeNet++, a deep learning framework that classifies brain MRI images into eight image-level categories. It is developed on the MCND dataset, which comprises 16,400 two-dimensional brain MRI images in eight diagnostic categories: AD-MildDemented, AD-ModerateDemented, AD-VeryMildDemented, BT-glioma, BT-meningioma, BT-pituitary, MS, and Normal. The model uses a single pretrained EfficientNetV2-S backbone to extract hierarchical feature maps from three intermediate stages. These multi-level features are projected into a common latent space, spatially aligned, adaptively fused through learnable gated multi-scale fusion, refined by convolutional processing, and aggregated by spatial attention pooling before classification. To improve generalization and calibration, the training and inference strategy combines class-balanced focal loss with label smoothing, MixUp/CutMix regularization, exponential moving average weight smoothing, warmup cosine learning-rate scheduling, temperature scaling, and test-time augmentation. The framework was evaluated with accuracy, precision, recall, macro-F1, macro-AUC, macro-average precision, expected calibration error, and Brier score, supported by bootstrap confidence intervals, ablation analysis, McNemar tests, and comparisons against standard pretrained baseline models.
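Class-balanced focal loss, named in the training strategy above, can be sketched as follows. The abstract does not specify its formulation, so this sketch assumes a common combination: effective-number class weights (Cui et al., 2019) multiplied into the focal term of Lin et al. (2017). Label smoothing, which the paper also uses, is omitted here for brevity. The function names and the values `beta=0.999` and `gamma=2.0` are illustrative assumptions, not settings reported by the authors.

```python
import math
from typing import List, Sequence


def class_balanced_weights(counts: Sequence[int], beta: float = 0.999) -> List[float]:
    """Effective-number class weights, rescaled so they sum to the number of classes.

    w_c is proportional to (1 - beta) / (1 - beta ** n_c); every count must be positive.
    """
    raw = [(1.0 - beta) / (1.0 - beta ** n) for n in counts]
    scale = len(counts) / sum(raw)
    return [w * scale for w in raw]


def class_balanced_focal_loss(probs: Sequence[float], target: int,
                              weights: Sequence[float], gamma: float = 2.0,
                              eps: float = 1e-12) -> float:
    """Focal loss for one sample: -w_y * (1 - p_y) ** gamma * log(p_y).

    probs are predicted class probabilities; eps guards against log(0).
    """
    p_t = max(probs[target], eps)
    return -weights[target] * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    w = class_balanced_weights([1000, 10])  # the rare class gets the larger weight
    print(w[1] > w[0])  # True
    easy = class_balanced_focal_loss([0.1, 0.9], 1, [1.0, 1.0])
    hard = class_balanced_focal_loss([0.9, 0.1], 1, [1.0, 1.0])
    print(easy < hard)  # True: confident, correct samples are down-weighted
```

Setting `gamma=0` reduces the loss to class-weighted cross-entropy, which is a quick sanity check for an implementation like this one.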
Results: MCND-ComputeNet++ achieved mean accuracy, macro-F1, macro-AUC, and macro-average precision of 0.9738, 0.9771, 0.9993, and 0.9971, respectively, with narrow bootstrap confidence intervals indicating stable image-level performance. Because patient-level identifiers and subject-wise splitting were not available, these findings should be interpreted as image-level/slice-level performance on MCND. The model outperformed most of the evaluated baselines, including ResNet50, DenseNet121, EfficientNetB0, EfficientNetV2-S with a standard classifier, Swin-Tiny, and ConvNeXt-Tiny, across several discrimination and calibration metrics. Compared with ConvNeXt-Tiny, it achieved higher macro-AUC and macro-average precision together with lower expected calibration error (ECE) and Brier score, suggesting improved image-level discrimination and confidence reliability. Compared with the EfficientNetV2-S standard classifier, accuracy increased from 0.9308 to 0.9738 and the Brier score decreased from 0.1045 to 0.0400.
Conclusions: The results suggest that MCND-ComputeNet++ is a promising image-level brain MRI classification framework for the eight MCND categories. The model integrates hierarchical feature extraction, shared latent projection, gated multi-scale fusion, convolutional refinement, spatial attention pooling, and calibrated inference within a unified architecture. However, because the evaluation was conducted at the image/slice level without patient-level identifiers, the findings should not be interpreted as patient-level clinical diagnostic validation. Further studies using subject-wise splitting, external multi-center datasets, 3D volumetric modeling, and multimodal clinical information are required to assess generalizability and potential applicability to clinical decision support.
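Expected calibration error and the Brier score, the two calibration metrics used in the comparisons above, can be computed as in the following sketch. The abstract does not state the binning scheme or the Brier convention, so the sketch assumes top-label ECE with 15 equal-width confidence bins and the multi-class Brier score summed over classes and averaged over samples. Other conventions (for example, also dividing by the number of classes) give different absolute values, so numbers from this code may not be directly comparable with the reported figures.

```python
from typing import Sequence


def brier_score(probs: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Multi-class Brier score: mean over samples of sum_c (p_c - y_c) ** 2."""
    total = 0.0
    for p, y in zip(probs, labels):
        total += sum((pc - (1.0 if c == y else 0.0)) ** 2 for c, pc in enumerate(p))
    return total / len(labels)


def expected_calibration_error(probs: Sequence[Sequence[float]], labels: Sequence[int],
                               n_bins: int = 15) -> float:
    """Top-label ECE with equal-width confidence bins over [0, 1].

    ECE = sum_b (|B_b| / n) * |acc(B_b) - conf(B_b)|, which simplifies to
    sum_b |correct_b - confidence_sum_b| / n.
    """
    n = len(labels)
    counts = [0] * n_bins
    conf_sums = [0.0] * n_bins
    correct_sums = [0.0] * n_bins
    for p, y in zip(probs, labels):
        pred = max(range(len(p)), key=lambda c: p[c])  # predicted class
        conf = p[pred]  # probability assigned to the predicted class
        b = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 falls in the last bin
        counts[b] += 1
        conf_sums[b] += conf
        correct_sums[b] += 1.0 if pred == y else 0.0
    return sum(abs(correct_sums[b] - conf_sums[b]) for b in range(n_bins) if counts[b]) / n


if __name__ == "__main__":
    probs = [[0.8, 0.2], [0.3, 0.7], [0.6, 0.4]]
    labels = [0, 1, 1]
    print(brier_score(probs, labels))
    print(expected_calibration_error(probs, labels))
```

Equal-width bins are the most common choice for ECE; equal-mass (adaptive) bins are an alternative that is less sensitive to sparsely populated bins.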
(This article belongs to the Special Issue Brain MRI: Current Development and Applications)

47 pages, 1039 KB  
Review
Sensor-Driven Digital Twins for Bridge Infrastructure: A Critical Review of BIM-Enabled Integration, Monitoring Architectures, and Operational Maturity
by Alejandro Mungaray-Carrillo, Ye Xia, Fidel Lozano-Galant and José Antonio Lozano Galant
Appl. Sci. 2026, 16(12), 5873; https://doi.org/10.3390/app16125873 - 10 Jun 2026
Abstract
Digital Twin (DT) research in civil infrastructure has expanded rapidly, yet its practical maturity in bridge engineering remains uneven. This review examines sensor-driven DT research in bridge infrastructure through a combined bibliometric and systematic approach, with particular emphasis on implementation logic and operational maturity. First, a broad bibliometric analysis was conducted to map the thematic directions, technological clusters, and infrastructure domains that structure DT research across civil infrastructure. Second, a bridge-specific systematic review of implemented, sensor-supported cases was performed to characterize their dominant application domains, technological components, integration logic, and maturity level. The broader civil-infrastructure literature is organized around structural monitoring, lifecycle information management, cyber–physical connectivity, AI-enabled analytics, and digital representation. By contrast, the bridge-specific literature narrows toward model-asset coupling, structural health monitoring, response-based interpretation, and implementation-oriented integration. Across the reviewed bridge cases, the most frequently implemented layers are sensing, communication, digital representation, and analytical modelling, whereas the decisive features of robust operational twins (continuous or recurrent physical coupling, structured data fusion, effective update logic, and explicit decision-support use) are less consistently implemented and documented. By connecting bibliometric evolution, architectural configuration, and bridge-specific implementation evidence, the study provides a finer-grained, maturity-oriented interpretation of current bridge DT research.
(This article belongs to the Special Issue State-of-the-Art Structural Health Monitoring Application)