Search Results (1,851)

Search Parameters:
Keywords = cross-modality

19 pages, 5706 KB  
Article
Research on a Unified Multi-Type Defect Detection Method for Lithium Batteries Throughout Their Entire Lifecycle Based on Multimodal Fusion and Attention-Enhanced YOLOv8
by Zitao Du, Ziyang Ma, Yazhe Yang, Dongyan Zhang, Haodong Song, Xuanqi Zhang and Yijia Zhang
Sensors 2026, 26(2), 635; https://doi.org/10.3390/s26020635 (registering DOI) - 17 Jan 2026
Abstract
To address the limitations of traditional lithium battery defect detection—low efficiency, high missed detection rates for minute/composite defects, and inadequate multimodal fusion—this study develops an improved YOLOv8 model based on multimodal fusion and attention enhancement for unified full-lifecycle multi-type defect detection. Integrating visible-light and X-ray modalities, the model incorporates a Squeeze-and-Excitation (SE) module to dynamically weight channel features, suppressing redundancy and highlighting cross-modal complementarity. A Multi-Scale Fusion Module (MFM) is constructed to amplify subtle defect expression by fusing multi-scale features, building on established feature fusion principles. Experimental results show that the model achieves an mAP@0.5 of 87.5%, a minute defect recall rate (MRR) of 84.1%, and overall industrial recognition accuracy of 97.49%. It operates at 35.9 FPS (server) and 25.7 FPS (edge) with end-to-end latency of 30.9–38.9 ms, meeting high-speed production line requirements. Exhibiting strong robustness, the lightweight model outperforms YOLOv5/7/8/9-S in core metrics. Large-scale verification confirms stable performance across the battery lifecycle, providing a reliable solution for industrial defect detection and reducing production costs. Full article
(This article belongs to the Section Fault Diagnosis & Sensors)
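The SE channel-weighting step described above follows a well-known pattern. A minimal PyTorch sketch of a standard Squeeze-and-Excitation block (generic form of the idea, not the authors' exact module; the channel count and reduction ratio are assumptions):

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Standard Squeeze-and-Excitation channel attention: globally pool each
    channel, pass through a small bottleneck MLP, and reweight the channels.
    Generic form of the idea; not the authors' exact module."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)           # squeeze: global spatial average
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                             # excitation: per-channel weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                  # emphasize informative channels

# Example: reweight a fused visible-light / X-ray feature map (shapes assumed)
feat = torch.randn(2, 64, 40, 40)
print(SEBlock(64)(feat).shape)  # torch.Size([2, 64, 40, 40])
```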

17 pages, 2450 KB  
Article
Design, Fabrication and Characterization of Multi-Frequency MEMS Transducer for Photoacoustic Imaging
by Alberto Prud’homme and Frederic Nabki
Micromachines 2026, 17(1), 122; https://doi.org/10.3390/mi17010122 (registering DOI) - 17 Jan 2026
Abstract
This work presents the design, fabrication, and experimental characterization of microelectromechanical system (MEMS) ultrasonic transducers engineered for multi-frequency operation in photoacoustic imaging (PAI). The proposed devices integrate multiple resonant geometries, including circular diaphragms, floated crosses, anchored cross membranes, and cantilever arrays, within compact footprints to overcome the inherently narrow frequency response of conventional MEMS transducers. All devices were fabricated using the PiezoMUMPs commercial microfabrication process, with finite element simulations guiding modal optimization and laser Doppler vibrometry used for experimental validation in air. The circular diaphragm exhibited a narrowband response with a dominant resonance at 1.69 MHz and a quality factor (Q) of 268, confirming the bandwidth limitations of traditional geometries. In contrast, complex designs such as the floated cross and cantilever arrays achieved significantly broader spectral responses, with resonances spanning from 275 kHz to beyond 7.5 MHz. The cantilever array, with systematically varied arm lengths, achieved the highest modal density through asynchronous activation across the spectrum. Results demonstrate that structurally diverse MEMS devices can overcome the bandwidth constraints of traditional piezoelectric transducers. The integration of heterogeneous MEMS geometries offers a viable approach for broadband sensitivity in PAI, enabling improved spatial resolution and depth selectivity without compromising miniaturization or manufacturability. Full article
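The reported quality factor can be read as resonance frequency divided by the -3 dB bandwidth. A small NumPy sketch of that generic estimate on a synthetic resonance curve (illustration only, not the paper's characterization procedure; the curve parameters are assumptions):

```python
import numpy as np

def quality_factor(freq_hz: np.ndarray, amp: np.ndarray) -> float:
    """Estimate Q = f0 / (-3 dB bandwidth) from an amplitude response.
    Generic half-power method assuming a single dominant peak; not the
    paper's fitting procedure."""
    peak = int(np.argmax(amp))
    f0 = freq_hz[peak]
    half_power = amp[peak] / np.sqrt(2)               # -3 dB level
    above = np.where(amp >= half_power)[0]
    bandwidth = freq_hz[above[-1]] - freq_hz[above[0]]
    return f0 / bandwidth

# Synthetic Lorentzian-like peak near 1.69 MHz, for illustration only
f = np.linspace(1.6e6, 1.8e6, 20001)
resp = 1.0 / np.sqrt(1.0 + ((f - 1.69e6) / 3.15e3) ** 2)
print(round(quality_factor(f, resp)))  # 268 for this synthetic curve
```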

25 pages, 1708 KB  
Article
Distribution Network Electrical Equipment Defect Identification Based on Multi-Modal Image Voiceprint Data Fusion and Channel Interleaving
by An Chen, Junle Liu, Wenhao Zhang, Jiaxuan Lu, Jiamu Yang and Bin Liao
Processes 2026, 14(2), 326; https://doi.org/10.3390/pr14020326 - 16 Jan 2026
Abstract
With the explosive growth in the quantity of electrical equipment in distribution networks, traditional manual inspection struggles to achieve comprehensive coverage due to limited manpower and low efficiency. This has led to frequent equipment failures, including partial discharge, insulation aging, and poor contact. These issues seriously compromise the safe and stable operation of distribution networks. Real-time monitoring and defect identification of equipment operation status are therefore critical to ensuring the safety and stability of power systems. Currently, commonly used methods for defect identification in distribution network electrical equipment mainly rely on single-image or voiceprint data features. These methods lack consideration of the complementarity and interleaved nature between image and voiceprint features, resulting in reduced identification accuracy and reliability. To address the limitations of existing methods, this paper proposes a distribution network electrical equipment defect identification method based on multi-modal image voiceprint data fusion and channel interleaving. First, image and voiceprint feature models are constructed using two-dimensional principal component analysis (2DPCA) and the Mel scale, respectively. Multi-modal feature fusion is achieved using an improved transformer model that integrates intra-domain self-attention units and an inter-domain cross-attention mechanism. Second, an image and voiceprint multi-channel interleaving model is applied. It combines channel adaptability and confidence to dynamically adjust weights and generates defect identification results using a weighting approach based on output probability information content. Finally, simulation results show that, with a dataset of 3300 samples, the proposed algorithm achieves an 8.96–33.27% improvement in defect recognition accuracy compared with baseline algorithms, and maintains an accuracy of over 86.5% even under 20% random noise interference by using the improved transformer and multi-channel interleaving mechanism, verifying its advantages in accuracy and noise robustness. Full article
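A generic way to realize the confidence-based weighting of per-channel outputs described above is to weight each channel by the inverse entropy of its class probabilities. A minimal PyTorch sketch (an illustrative stand-in, not the paper's interleaving model; tensor shapes are assumptions):

```python
import torch

def entropy_weighted_fusion(probs_per_channel):
    """Fuse per-channel class-probability tensors of shape (batch, classes),
    giving more weight to confident (low-entropy) channels. A generic stand-in
    for confidence-based weighting; not the paper's interleaving model."""
    eps = 1e-8
    weights = []
    for p in probs_per_channel:
        entropy = -(p * (p + eps).log()).sum(dim=-1)       # (batch,)
        weights.append(1.0 / (entropy + eps))              # low entropy -> high confidence
    w = torch.stack(weights, dim=0)                        # (n_channels, batch)
    w = w / w.sum(dim=0, keepdim=True)                     # normalize across channels
    return sum(wi.unsqueeze(-1) * pi for wi, pi in zip(w, probs_per_channel))

# Example: one image channel and one voiceprint channel over 4 defect classes
img_probs = torch.softmax(torch.randn(2, 4), dim=-1)
voice_probs = torch.softmax(torch.randn(2, 4), dim=-1)
print(entropy_weighted_fusion([img_probs, voice_probs]).sum(dim=-1))  # ~1.0 per sample
```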
24 pages, 1302 KB  
Article
Do You Fail to Recognize Me with a Mask on? The Impact of Voice on Mask-Occluded Facial Identity Recognition
by Min Gao, Wenyu Duan, Tianhang Liu, Yulin Gao and Xiaoyu Tang
Behav. Sci. 2026, 16(1), 128; https://doi.org/10.3390/bs16010128 - 16 Jan 2026
Abstract
This research sought to examine differences in the cross-modal facilitation effect of voice on facial identity recognition under mask occlusion for both oneself and others. Employing a facial recognition paradigm, we examined the influence of voice on facial identity recognition under static and dynamic mask occlusion through two eye-tracking experiments. The behavioral results from Experiments 1 and 2 indicate that mask occlusion interfered with recognition for both static and dynamic faces, with greater interference observed for others’ faces than for self-faces. In addition, voice exerted cross-modal enhancement effects on faces, with greater enhancement observed for masked faces than for unmasked faces. Furthermore, voice provided stronger enhancement for others’ dynamic faces than for self-dynamic faces. Eye-tracking data from both experiments revealed that the difference in dynamic facial recognition between self-faces and others’ faces due to voice emerged in the early stages of dynamic facial recognition and persisted into later stages. However, in both the early and late stages of static facial recognition, the facilitation effect of voice did not differ between self-faces and others’ faces. This study revealed that the cross-modal facilitation of visual stimuli by voice is influenced by the self-advantage effect. Full article

15 pages, 912 KB  
Systematic Review
Does Paying the Same Sustain Telehealth? A Systematic Review of Payment Parity Laws
by Alina Doina Tanase, Malina Popa, Bogdan Hoinoiu, Raluca-Mioara Cosoroaba and Emanuela-Lidia Petrescu
Healthcare 2026, 14(2), 222; https://doi.org/10.3390/healthcare14020222 - 16 Jan 2026
Abstract
Background and Objectives: Payment parity laws require commercial health plans to pay for telehealth on the same basis as in-person care. We systematically reviewed open-access empirical studies to identify and synthesize empirical U.S. studies that explicitly evaluated state telehealth payment parity (distinct from coverage-only parity) and to summarize reported effects on telehealth utilization, modality mix, quality/adherence, equity/access, and expenditures. Methods: Following PRISMA 2020, we searched PubMed/MEDLINE, Scopus, and Web of Science for U.S. studies that explicitly modeled state payment parity or stratified results by payment parity vs. coverage-only vs. no parity. We included original quantitative or qualitative studies with a time or geographic comparator and free full-text availability. The primary outcome was telehealth utilization (share or odds of telehealth use); secondary outcomes were modality mix, quality and adherence, equity and access, and spending. Because designs were heterogeneous (interrupted time series [ITS], difference-in-differences [DiD], regression, qualitative), we used structured narrative synthesis. Results: Nine studies met inclusion criteria. In community health centers (CHCs), payment parity was associated with higher telehealth use (42% of visits in parity states vs. 29% without; Δ = +13.0 percentage points; adjusted odds ratio 1.74, 95% CI 1.49–2.03). Among patients with newly diagnosed cancer, adjusted telehealth rates were 23.3% in coverage + payment parity states vs. 19.1% in states without parity, while cross-state practice limits reduced telehealth use (14.9% vs. 17.8%). At the health-system level, parity mandates were linked to a +2.5-percentage-point telemedicine share in 2023, with mental-health (29%) and substance use disorder (SUD) care (21%) showing the highest telemedicine shares. A Medicaid coverage policy bundle increased live-video use by 6.0 points and the proportion “always able to access needed care” by 11.1 points. For hypertension, payment parity improved medication adherence, whereas early emergency department and hospital adoption studies found null associations. Direct spending evidence from open-access sources remained sparse. Conclusions: Across ambulatory settings—especially behavioral health and chronic disease management—state payment parity laws are consistently associated with modest but meaningful increases in telehealth use and some improvements in adherence and perceived access. Effects vary by specialty and are attenuated where cross-state practice limits persist, and the impact of payment parity on overall spending remains understudied. Full article

22 pages, 5928 KB  
Article
PromptTrace: A Fine-Grained Prompt Stealing Attack via CLIP-Guided Beam Search for Text-to-Image Models
by Shaofeng Ming, Yuhao Zhang, Yang Liu, Tianyu Han, Dengmu Liu, Tong Yu, Jieke Lu and Bo Xu
Symmetry 2026, 18(1), 161; https://doi.org/10.3390/sym18010161 - 15 Jan 2026
Viewed by 25
Abstract
The inherent semantic symmetry and cross-modal alignment between textual prompts and generated images have fueled the success of text-to-image (T2I) generation. However, this strong correlation also introduces security vulnerabilities, specifically prompt stealing attacks, where valuable prompts are reverse-engineered from images. In this paper, we address the challenge of information asymmetry in black-box attack scenarios and propose PromptTrace, a fine-grained prompt stealing framework via Contrastive Language-Image Pre-training (CLIP)-guided beam search. Unlike existing methods that rely on single-stage generation, PromptTrace structurally decomposes prompt reconstruction into subject generation, modifier extraction, and iterative search optimization to effectively restore the visual–textual correspondence. By leveraging a CLIP-guided beam search strategy, our method progressively optimizes candidate prompts based on image–text similarity feedback, ensuring the stolen prompt achieves high fidelity in both semantic intent and stylistic representation. Extensive evaluations across multiple datasets and T2I models demonstrate that PromptTrace outperforms existing methods, highlighting the feasibility of exploiting cross-modal symmetry for attacks and underscoring the urgent need for defense mechanisms in the T2I ecosystem. Full article
(This article belongs to the Section Computer)
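The search stage can be pictured as a beam search over modifier sequences scored by image-text similarity. A minimal Python sketch under that reading, with a toy scoring function standing in for CLIP similarity (all names and the scoring function are hypothetical, not the PromptTrace implementation):

```python
from typing import Callable, List, Tuple

def beam_search_prompt(subject: str,
                       candidate_modifiers: List[str],
                       score_fn: Callable[[str], float],
                       beam_width: int = 3,
                       max_modifiers: int = 4) -> str:
    """Keep the beam_width highest-scoring prompts while appending modifiers one
    at a time. score_fn is a hypothetical stand-in for CLIP image-text similarity
    against the target image; this is not the PromptTrace implementation."""
    beams: List[Tuple[float, str]] = [(score_fn(subject), subject)]
    for _ in range(max_modifiers):
        expanded = []
        for _, prompt in beams:
            for mod in candidate_modifiers:
                if mod not in prompt:                       # do not repeat a modifier
                    cand = f"{prompt}, {mod}"
                    expanded.append((score_fn(cand), cand))
        if not expanded:
            break
        beams = sorted(expanded + beams, key=lambda t: t[0], reverse=True)[:beam_width]
    return max(beams, key=lambda t: t[0])[1]

# Toy scorer for illustration: favors longer prompts that mention "oil painting"
toy_score = lambda p: len(p) / 100.0 + (1.0 if "oil painting" in p else 0.0)
print(beam_search_prompt("a castle at dusk",
                         ["oil painting", "golden light", "wide shot"], toy_score))
```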

23 pages, 2992 KB  
Article
Key-Value Mapping-Based Text-to-Image Diffusion Model Backdoor Attacks
by Lujia Chai, Yang Hou, Guozhao Liao and Qiuling Yue
Algorithms 2026, 19(1), 74; https://doi.org/10.3390/a19010074 - 15 Jan 2026
Viewed by 26
Abstract
Text-to-image (T2I) generation, a core component of generative artificial intelligence (AI), is increasingly important for creative industries and human–computer interaction. Despite impressive progress in realism and diversity, diffusion models still exhibit critical security blind spots, particularly in the Transformer key-value mapping mechanism that underpins cross-modal alignment. Existing backdoor attacks often rely on large-scale data poisoning or extensive fine-tuning, leading to low efficiency and limited stealth. To address these challenges, we propose two efficient backdoor attack methods, AttnBackdoor and SemBackdoor, grounded in the Transformer’s key-value storage principle. AttnBackdoor injects precise mappings between trigger prompts and target instances by fine-tuning the key-value projection matrices in U-Net cross-attention layers (≈5% of parameters). SemBackdoor establishes semantic-level mappings by editing the text encoder’s MLP projection matrix (≈0.3% of parameters). Both approaches achieve high attack success rates (>90%), with SemBackdoor reaching 98.6% and AttnBackdoor 97.2%. They also reduce parameter updates and training time by 1–2 orders of magnitude compared to prior work while preserving benign generation quality. Our findings reveal dual vulnerabilities at the visual and semantic levels and provide a foundation for developing next-generation defenses for secure generative AI. Full article
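The efficiency claim rests on updating only the cross-attention key/value projections. A minimal PyTorch sketch of that parameter-selection step, of the kind also used in benign model-customization methods (the name filters assume diffusers-style U-Net naming; this is not the paper's attack code):

```python
import torch
import torch.nn as nn

def select_cross_attn_kv_params(unet: nn.Module):
    """Freeze all parameters except the key/value projections of cross-attention
    layers. The name filters ("attn2", "to_k", "to_v") assume diffusers-style
    Stable Diffusion U-Net naming; this shows only the generic parameter-selection
    step, of the kind also used for benign model customization."""
    trainable = []
    for name, param in unet.named_parameters():
        is_kv = "attn2" in name and ("to_k" in name or "to_v" in name)
        param.requires_grad_(is_kv)
        if is_kv:
            trainable.append(param)
    total = sum(p.numel() for p in unet.parameters())
    selected = sum(p.numel() for p in trainable)
    print(f"selected {selected / max(total, 1):.1%} of parameters for fine-tuning")
    return trainable

# Usage sketch (assumes a loaded pipeline object named `pipe`, shown as a comment):
# params = select_cross_attn_kv_params(pipe.unet)
# optimizer = torch.optim.AdamW(params, lr=1e-5)
```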

20 pages, 5073 KB  
Article
SAWGAN-BDCMA: A Self-Attention Wasserstein GAN and Bidirectional Cross-Modal Attention Framework for Multimodal Emotion Recognition
by Ning Zhang, Shiwei Su, Haozhe Zhang, Hantong Yang, Runfang Hao and Kun Yang
Sensors 2026, 26(2), 582; https://doi.org/10.3390/s26020582 - 15 Jan 2026
Viewed by 82
Abstract
Emotion recognition from physiological signals is pivotal for advancing human–computer interaction, yet unimodal pipelines frequently underperform due to limited information, constrained data diversity, and suboptimal cross-modal fusion. Addressing these limitations, the Self-Attention Wasserstein Generative Adversarial Network with Bidirectional Cross-Modal Attention (SAWGAN-BDCMA) framework is proposed. This framework reorganizes the learning process around three complementary components: (1) a Self-Attention Wasserstein GAN (SAWGAN) that synthesizes high-quality Electroencephalography (EEG) and Photoplethysmography (PPG) to expand diversity and alleviate distributional imbalance; (2) a dual-branch architecture that distills discriminative spatiotemporal representations within each modality; and (3) a Bidirectional Cross-Modal Attention (BDCMA) mechanism that enables deep two-way interaction and adaptive weighting for robust fusion. Evaluated on the DEAP and ECSMP datasets, SAWGAN-BDCMA significantly outperforms multiple contemporary methods, achieving 94.25% accuracy for binary and 87.93% for quaternary classification on DEAP. Furthermore, it attains 97.49% accuracy for six-class emotion recognition on the ECSMP dataset. Compared with state-of-the-art multimodal approaches, the proposed framework achieves an accuracy improvement ranging from 0.57% to 14.01% across various tasks. These findings offer a robust solution to the long-standing challenges of data scarcity and modal imbalance, providing a profound theoretical and technical foundation for fine-grained emotion recognition and intelligent human–computer collaboration. Full article
(This article belongs to the Special Issue Advanced Signal Processing for Affective Computing)
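A minimal PyTorch sketch of bidirectional cross-modal attention between two physiological feature streams, in the spirit of the BDCMA component (generic form with assumed dimensions, not the authors' module):

```python
import torch
import torch.nn as nn

class BidirectionalCrossModalAttention(nn.Module):
    """Two-way cross-attention: EEG features attend to PPG features and vice
    versa, and the two attended streams are pooled and concatenated. Generic
    sketch with assumed dimensions; not the paper's BDCMA module."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.eeg_to_ppg = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ppg_to_eeg = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, eeg: torch.Tensor, ppg: torch.Tensor) -> torch.Tensor:
        eeg_att, _ = self.eeg_to_ppg(query=eeg, key=ppg, value=ppg)   # EEG queries PPG
        ppg_att, _ = self.ppg_to_eeg(query=ppg, key=eeg, value=eeg)   # PPG queries EEG
        return torch.cat([eeg_att.mean(dim=1), ppg_att.mean(dim=1)], dim=-1)

# Example: batch of 8 windows, 32 time steps per modality, 64-dim features
eeg, ppg = torch.randn(8, 32, 64), torch.randn(8, 32, 64)
print(BidirectionalCrossModalAttention(64)(eeg, ppg).shape)  # torch.Size([8, 128])
```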

27 pages, 24824 KB  
Article
UGFF-VLM: Uncertainty-Guided and Frequency-Fused Vision-Language Model for Remote Sensing Farmland Segmentation
by Kai Tan, Yanlan Wu, Hui Yang and Xiaoshuang Ma
Remote Sens. 2026, 18(2), 282; https://doi.org/10.3390/rs18020282 - 15 Jan 2026
Viewed by 132
Abstract
Vision-language models can leverage natural language descriptions to encode stable farmland characteristics, providing a new paradigm for farmland extraction, yet existing methods face challenges in ambiguous text-visual alignment and loss of high-frequency boundary details during fusion. To address these issues, this article utilizes the semantic prior knowledge provided by textual descriptions in vision–language models to enhance the model’s ability to recognize polymorphic features, and proposes an Uncertainty-Guided and Frequency-Fused Vision-Language Model (UGFF-VLM) for remote sensing farmland extraction. The UGFF-VLM builds on the semantic representation ability of vision-language models and further integrates an Uncertainty-Guided Adaptive Alignment (UGAA) module, which dynamically adjusts cross-modal fusion based on alignment confidence, and a Frequency-Enhanced Cross-Modal Fusion (FECF) mechanism, which preserves high-frequency boundary details in the frequency domain. Experimental results on the FarmSeg-VL dataset demonstrate that the proposed method delivers excellent and stable performance, achieving the highest mIoU across diverse geographical environments while showing significant improvements in boundary precision and robustness against false positives. Therefore, the proposed UGFF-VLM not only mitigates the issues of recognition confusion and poor generalization in purely vision-based models caused by farmland feature polymorphism but also effectively enhances boundary segmentation accuracy, providing a reliable method for the precise delineation of agricultural parcels in diverse landscapes. Full article
(This article belongs to the Special Issue Advanced AI Technology for Remote Sensing Analysis)
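The frequency-domain idea can be illustrated with a simple FFT high-pass step that re-injects high-frequency components into a feature map. A minimal PyTorch sketch (illustration of the principle only; the FECF module itself is not reproduced, and the radius and weighting are assumptions):

```python
import torch

def high_frequency_boost(feat: torch.Tensor, radius: int = 4, alpha: float = 0.5) -> torch.Tensor:
    """Re-inject the high-frequency (boundary-like) part of a feature map via a
    2-D FFT high-pass mask. Illustrates frequency-domain detail preservation in
    principle only; radius and alpha are arbitrary, and the FECF module itself
    is not reproduced here."""
    _, _, h, w = feat.shape
    spec = torch.fft.fftshift(torch.fft.fft2(feat), dim=(-2, -1))
    yy, xx = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    dist = (((yy - h // 2) ** 2 + (xx - w // 2) ** 2).float()).sqrt()
    mask = (dist > radius).float()                        # keep only high frequencies
    high = torch.fft.ifft2(torch.fft.ifftshift(spec * mask, dim=(-2, -1))).real
    return feat + alpha * high                            # add the detail back

print(high_frequency_boost(torch.randn(1, 8, 64, 64)).shape)  # torch.Size([1, 8, 64, 64])
```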

44 pages, 648 KB  
Systematic Review
A Systematic Review and Energy-Centric Taxonomy of Jamming Attacks and Countermeasures in Wireless Sensor Networks
by Carlos Herrera-Loera, Carolina Del-Valle-Soto, Leonardo J. Valdivia, Javier Vázquez-Castillo and Carlos Mex-Perera
Sensors 2026, 26(2), 579; https://doi.org/10.3390/s26020579 - 15 Jan 2026
Viewed by 55
Abstract
Wireless Sensor Networks (WSNs) operate under strict energy constraints and are therefore highly vulnerable to radio interference, particularly jamming attacks that directly affect communication availability and network lifetime. Although jamming and anti-jamming mechanisms have been extensively studied, energy is frequently treated as a secondary metric, and analyses are often conducted in partial isolation from system assumptions, protocol behavior, and deployment context. This fragmentation limits the interpretability and comparability of reported results. This article presents a systematic literature review (SLR) covering the period from 2004 to 2024, with a specific focus on energy-aware jamming and mitigation strategies in IEEE 802.15.4-based WSNs. To ensure transparency and reproducibility, the literature selection and refinement process is formalized through a mathematical search-and-filtering model. From an initial corpus of 482 publications retrieved from Scopus, 62 peer-reviewed studies were selected and analyzed across multiple dimensions, including jamming modality, affected protocol layers, energy consumption patterns, evaluation assumptions, and deployment scenarios. The review reveals consistent energy trends among constant, random, and reactive jamming strategies, as well as significant variability in the energy overhead introduced by defensive mechanisms at the physical (PHY), Medium Access Control (MAC), and network layers. It further identifies persistent methodological challenges, such as heterogeneous energy metrics, incomplete characterization of jamming intensity, and the limited use of real-hardware testbeds. To address these gaps, the paper introduces an energy-centric taxonomy that explicitly accounts for attacker–defender energy asymmetry, cross-layer interactions, and recurring experimental assumptions, and proposes a minimal set of standardized energy-related performance metrics suitable for IEEE 802.15.4 environments. By synthesizing energy behaviors, trade-offs, and application-specific implications, this review provides a structured foundation for the design and evaluation of resilient, energy-proportional WSNs operating under availability-oriented adversarial interference. Full article
(This article belongs to the Special Issue Security and Privacy in Wireless Sensor Networks (WSNs))

14 pages, 2106 KB  
Article
A Hierarchical Multi-Modal Fusion Framework for Alzheimer’s Disease Classification Using 3D MRI and Clinical Biomarkers
by Ting-An Chang, Chun-Cheng Yu, Yin-Hua Wang, Zi-Ping Lei and Chia-Hung Chang
Electronics 2026, 15(2), 367; https://doi.org/10.3390/electronics15020367 - 14 Jan 2026
Viewed by 114
Abstract
Accurate and interpretable staging of Alzheimer’s disease (AD) remains challenging due to the heterogeneous progression of neurodegeneration and the complementary nature of imaging and clinical biomarkers. This study implements and evaluates an optimized Hierarchical Multi-Modal Fusion Framework (HMFF) that systematically integrates 3D structural MRI with clinical assessment scales for robust three-class classification of cognitively normal (CN), mild cognitive impairment (MCI), and AD subjects. A standardized preprocessing pipeline, including N4 bias field correction, nonlinear registration to MNI space, ANTsNet-based skull stripping, voxel normalization, and spatial resampling, was employed to ensure anatomically consistent and high-quality MRI inputs. Within the proposed framework, volumetric imaging features were extracted using a 3D DenseNet-121 architecture, while structured clinical information was modeled via an XGBoost classifier to capture nonlinear clinical priors. These heterogeneous representations were hierarchically fused through a lightweight multilayer perceptron, enabling effective cross-modal interaction. To further enhance discriminative capability and model efficiency, a hierarchical feature selection strategy was incorporated to progressively refine high-dimensional imaging features. Experimental results demonstrated that performance consistently improved with feature refinement and reached an optimal balance at approximately 90 selected features. Under this configuration, the proposed HMFF achieved an accuracy of 0.94 (95% Confidence Interval: [0.918, 0.951]), a recall of 0.91, a precision of 0.94, and an F1-score of 0.92, outperforming unimodal and conventional multimodal baselines under comparable settings. Moreover, Grad-CAM visualization confirmed that the model focused on clinically relevant neuroanatomical regions, including the hippocampus and medial temporal lobe, enhancing interpretability and clinical plausibility. These findings indicate that hierarchical multimodal fusion with interpretable feature refinement offers a promising and extensible solution for reliable and explainable automated AD staging. Full article
(This article belongs to the Special Issue AI-Driven Medical Image/Video Processing)
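The hierarchical fusion stage amounts to feeding imaging features and clinical-model outputs into a small MLP. A minimal PyTorch sketch of such late fusion (dimensions follow the roughly 90 selected features mentioned above but are otherwise assumptions; not the authors' HMFF code):

```python
import torch
import torch.nn as nn

class LateFusionMLP(nn.Module):
    """Lightweight MLP that fuses a vector of selected imaging features with
    clinical-classifier outputs for 3-class staging (CN / MCI / AD). Generic
    late-fusion sketch with assumed dimensions; not the authors' HMFF code."""
    def __init__(self, img_dim: int = 90, clin_dim: int = 3, n_classes: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(img_dim + clin_dim, 64),
            nn.ReLU(inplace=True),
            nn.Dropout(0.3),
            nn.Linear(64, n_classes),
        )

    def forward(self, img_feat: torch.Tensor, clin_prob: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([img_feat, clin_prob], dim=-1))

# Example: ~90 selected imaging features plus 3 clinical-model class probabilities
logits = LateFusionMLP()(torch.randn(4, 90), torch.softmax(torch.randn(4, 3), dim=-1))
print(logits.shape)  # torch.Size([4, 3])
```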

26 pages, 1167 KB  
Review
A Review of Multimodal Sentiment Analysis in Online Public Opinion Monitoring
by Shuxian Liu and Tianyi Li
Informatics 2026, 13(1), 10; https://doi.org/10.3390/informatics13010010 - 14 Jan 2026
Viewed by 181
Abstract
With the rapid development of the Internet, online public opinion monitoring has emerged as a crucial task in the information era. Multimodal sentiment analysis, through the integration of multiple modalities such as text, images, and audio, combined with technologies including natural language processing and computer vision, offers novel technical means for online public opinion monitoring. Nevertheless, current research still faces many challenges, such as the scarcity of high-quality datasets, limited model generalization ability, and difficulties with cross-modal feature fusion. This paper reviews the current research progress of multimodal sentiment analysis in online public opinion monitoring, including its development history, key technologies, and application scenarios. Existing problems are analyzed and future research directions are discussed. In particular, we emphasize a fusion-architecture-centric comparison under online public opinion monitoring, and discuss cross-lingual differences that affect multimodal alignment and evaluation. Full article

17 pages, 3529 KB  
Article
Study on Multimodal Sensor Fusion for Heart Rate Estimation Using BCG and PPG Signals
by Jisheng Xing, Xin Fang, Jing Bai, Luyao Cui, Feng Zhang and Yu Xu
Sensors 2026, 26(2), 548; https://doi.org/10.3390/s26020548 - 14 Jan 2026
Viewed by 114
Abstract
Continuous heart rate monitoring is crucial for early cardiovascular disease detection. To overcome the discomfort and limitations of ECG in home settings, we propose a multimodal temporal fusion network (MM-TFNet) that integrates ballistocardiography (BCG) and photoplethysmography (PPG) signals. The network extracts temporal features from BCG and PPG signals through temporal convolutional networks (TCNs) and bidirectional long short-term memory networks (BiLSTMs), respectively, achieving cross-modal dynamic fusion at the feature level. First, bimodal features are projected into a unified dimensional space through fully connected layers. Subsequently, a cross-modal attention weight matrix is constructed for adaptive learning of the complementary correlation between BCG mechanical vibration and PPG volumetric flow features. Combined with dynamic focusing on key heartbeat waveforms through multi-head self-attention (MHSA), the model’s robustness under dynamic activity states is significantly enhanced. Experimental validation using a publicly available BCG-PPG-ECG simultaneous acquisition dataset comprising 40 subjects demonstrates that the model achieves excellent performance with a mean absolute error (MAE) of 0.88 BPM in heart rate prediction tasks, outperforming current mainstream deep learning methods. This study provides theoretical foundations and engineering guidance for developing contactless, low-power, edge-deployable home health monitoring systems, demonstrating the broad application potential of multimodal fusion methods in complex physiological signal analysis. Full article
(This article belongs to the Section Biomedical Sensors)
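A minimal PyTorch sketch of the dual-branch idea, with a dilated 1-D convolutional (TCN-style) branch for BCG and a BiLSTM branch for PPG projected to a shared dimension (layer sizes are assumptions, not the MM-TFNet configuration):

```python
import torch
import torch.nn as nn

class DualBranchEncoder(nn.Module):
    """BCG branch: dilated 1-D convolutions (TCN-style); PPG branch: BiLSTM.
    Both outputs are projected to a shared dimension for later cross-modal
    attention. Simplified sketch with assumed layer sizes; not MM-TFNet."""
    def __init__(self, hidden: int = 64):
        super().__init__()
        self.bcg_tcn = nn.Sequential(
            nn.Conv1d(1, hidden, kernel_size=3, padding=1, dilation=1), nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=3, padding=2, dilation=2), nn.ReLU(),
        )
        self.ppg_lstm = nn.LSTM(1, hidden // 2, batch_first=True, bidirectional=True)
        self.proj_bcg = nn.Linear(hidden, hidden)
        self.proj_ppg = nn.Linear(hidden, hidden)

    def forward(self, bcg: torch.Tensor, ppg: torch.Tensor):
        # bcg, ppg: (batch, time, 1)
        b = self.bcg_tcn(bcg.transpose(1, 2)).transpose(1, 2)   # (batch, time, hidden)
        p, _ = self.ppg_lstm(ppg)                               # (batch, time, hidden)
        return self.proj_bcg(b), self.proj_ppg(p)

bcg, ppg = torch.randn(2, 250, 1), torch.randn(2, 250, 1)
b_feat, p_feat = DualBranchEncoder()(bcg, ppg)
print(b_feat.shape, p_feat.shape)  # both torch.Size([2, 250, 64])
```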

17 pages, 2669 KB  
Article
Multimodal Guidewire 3D Reconstruction Based on Magnetic Field Data
by Wenbin Jiang, Qian Zheng, Dong Yang, Jiaqian Li and Wei Wei
Sensors 2026, 26(2), 545; https://doi.org/10.3390/s26020545 - 13 Jan 2026
Viewed by 85
Abstract
Accurate 3D reconstruction of guidewires is crucial in minimally invasive surgery and interventional procedures. Traditional biplanar X-ray–based reconstruction methods can achieve reasonable accuracy but involve high radiation doses, limiting their clinical applicability; meanwhile, single-view images inherently lack reliable depth cues. To address these issues, this paper proposes a multimodal guidewire 3D reconstruction approach that integrates magnetic field information. The method first employs the MiDaS v3 network to estimate an initial depth map from a single image and then incorporates tri-axial magnetic field measurements to enrich and refine the spatial information. To effectively fuse the two modalities, we design a multi-stage strategy combining k-nearest-neighbor (KNN) matching with a cross-modal attention mechanism (Cross-Attention), enabling accurate alignment and fusion of image and magnetic features. The fused representation is subsequently fed into a PointNet-based regressor to generate the final 3D coordinates of the guidewire. Experimental results demonstrate that our method achieves a root-mean-square error of 2.045 mm, a mean absolute error of 1.738 mm, and a z-axis MAE of 0.285 mm on the test set. These findings indicate that the proposed multimodal framework improves 3D reconstruction accuracy under single-view imaging and offers enhanced visualization support for interventional procedures. Full article
(This article belongs to the Section Biomedical Sensors)
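The first fusion stage can be read as attaching magnetic-field features to depth-lifted image points by nearest-neighbor matching. A minimal PyTorch sketch under that reading (input layouts are assumptions, not the paper's data format):

```python
import torch

def knn_match(depth_points: torch.Tensor, mag_samples: torch.Tensor, k: int = 3) -> torch.Tensor:
    """For every 3-D point lifted from the monocular depth map, find the k nearest
    magnetic-sensor sample positions and average their field vectors, attaching a
    magnetic feature to each image point. Assumed layouts: depth_points (N, 3),
    mag_samples (M, 6) = xyz position + tri-axial field; not the paper's format."""
    dists = torch.cdist(depth_points, mag_samples[:, :3])     # (N, M) pairwise distances
    idx = dists.topk(k, dim=1, largest=False).indices         # k nearest sensor samples
    matched_field = mag_samples[idx][:, :, 3:].mean(dim=1)    # (N, 3) averaged field vector
    return torch.cat([depth_points, matched_field], dim=1)    # (N, 6) fused per-point feature

print(knn_match(torch.randn(500, 3), torch.randn(64, 6)).shape)  # torch.Size([500, 6])
```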

24 pages, 5237 KB  
Article
DCA-UNet: A Cross-Modal Ginkgo Crown Recognition Method Based on Multi-Source Data
by Yunzhi Guo, Yang Yu, Yan Li, Mengyuan Chen, Wenwen Kong, Yunpeng Zhao and Fei Liu
Plants 2026, 15(2), 249; https://doi.org/10.3390/plants15020249 - 13 Jan 2026
Viewed by 207
Abstract
Wild ginkgo, as an endangered species, holds significant value for genetic resource conservation, yet its practical applications face numerous challenges. Traditional field surveys are inefficient in mountainous mixed forests, while satellite remote sensing is limited by spatial resolution. Current deep learning approaches relying on single-source data or merely simple multi-source fusion fail to fully exploit information, leading to suboptimal recognition performance. This study presents a multimodal ginkgo crown dataset, comprising RGB and multispectral images acquired by a UAV platform. To achieve precise crown segmentation with this data, we propose a novel dual-branch dynamic weighting fusion network, termed dual-branch cross-modal attention-enhanced UNet (DCA-UNet). We design a dual-branch encoder (DBE) with a two-stream architecture for independent feature extraction from each modality. We further develop a cross-modal interaction fusion module (CIF), employing cross-modal attention and learnable dynamic weights to boost multi-source information fusion. Additionally, we introduce an attention-enhanced decoder (AED) that combines progressive upsampling with a hybrid channel-spatial attention mechanism, thereby effectively utilizing multi-scale features and enhancing boundary semantic consistency. Evaluation on the ginkgo dataset demonstrates that DCA-UNet achieves a segmentation performance of 93.42% IoU (Intersection over Union), 96.82% PA (Pixel Accuracy), 96.38% Precision, and 96.60% F1-score. These results outperform the differential feature attention fusion network (DFAFNet) by 12.19%, 6.37%, 4.62%, and 6.95%, respectively, and surpass the single-modality baselines (RGB or multispectral) in all metrics. Superior performance on cross-flight-altitude data further validates the model’s strong generalization capability and robustness in complex scenarios. These results demonstrate the superiority of DCA-UNet in UAV-based multimodal ginkgo crown recognition, offering a reliable and efficient solution for monitoring wild endangered tree species. Full article
(This article belongs to the Special Issue Advanced Remote Sensing and AI Techniques in Agriculture and Forestry)
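A minimal PyTorch sketch of dynamic-weight fusion between RGB and multispectral feature maps, the simplest form of the weighting used in the CIF module (the cross-modal attention part is omitted; shapes are assumptions):

```python
import torch
import torch.nn as nn

class DynamicWeightFusion(nn.Module):
    """Fuse RGB and multispectral feature maps with learnable, per-sample gates
    that sum to one. The simplest form of dynamic weighting; the cross-modal
    attention used in the paper's CIF module is omitted, and shapes are assumed."""
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, 2, kernel_size=1),   # one logit per modality
        )

    def forward(self, rgb: torch.Tensor, ms: torch.Tensor) -> torch.Tensor:
        w = torch.softmax(self.gate(torch.cat([rgb, ms], dim=1)), dim=1)  # (B, 2, 1, 1)
        return w[:, :1] * rgb + w[:, 1:] * ms

rgb_feat, ms_feat = torch.randn(2, 32, 64, 64), torch.randn(2, 32, 64, 64)
print(DynamicWeightFusion(32)(rgb_feat, ms_feat).shape)  # torch.Size([2, 32, 64, 64])
```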
