Search Results (1,219)

Search Parameters:
Keywords = multimodal data fusion

29 pages, 24864 KB  
Article
Improving the Robustness of Odour Recognition with Odour-Image Data Fusion in Open-Air Settings
by Fanny Monori and Alin Tisan
Sensors 2026, 26(8), 2493; https://doi.org/10.3390/s26082493 - 17 Apr 2026
Abstract
Odour recognition with low-cost gas sensors is challenging in open-air settings due to the non-specificity of the sensors and environmental variability. This can be mitigated by incorporating additional information into the classification process. This paper investigates odour-image multimodality in two case studies of increasing complexity: banana ripening in an open-air environment and strawberry ripening in a glasshouse environment. Data were collected using custom acquisition platforms equipped with cameras and MOX gas sensors operated with temperature modulation. For the visual modality, image classification (MobileNetV3) and object detection (YOLOv5) models are trained. For the odour modality, established classical machine learning methods (Random Forest, XGBoost, SVM and Logistic Regression) and neural networks (1D-CNN, LSTM, MLP, and ELM) are employed. The modalities are analysed both independently and in combination to critically assess scenarios in which combining modalities provides a clear advantage over using either modality alone. Results show that models trained on odour data achieve high accuracy in controlled environments but underperform in more dynamic open-air settings. Image-based models are sensitive to image quality in all environments; however, they are more robust when deployed in different environments. Lastly, it is demonstrated that decision fusion consistently increases accuracy, by as much as +12.36% in the banana ripening scenario and +3.63% in the strawberry ripening scenario. Where decision fusion does not improve classification accuracy significantly, it is shown that the multimodal approach can still be leveraged to identify high-confidence predictions by selecting samples where both modalities agree on the label. Full article
(This article belongs to the Special Issue Recent Advances in Gas Sensors)
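The decision-fusion and agreement-filtering procedure this abstract describes can be sketched as follows. This is an illustrative reconstruction under assumed inputs (per-modality class probabilities, a 50/50 fusion weight), not the authors' implementation:

import numpy as np

def decision_fusion(p_odour, p_image, w_odour=0.5):
    # Weighted average of per-class probabilities from the two modalities.
    p_fused = w_odour * p_odour + (1.0 - w_odour) * p_image
    fused_labels = p_fused.argmax(axis=1)
    # "High-confidence" subset: samples where both modalities agree on the label.
    agree = p_odour.argmax(axis=1) == p_image.argmax(axis=1)
    return fused_labels, agree

# Toy usage: 3 samples, 3 hypothetical ripeness classes.
p_odour = np.array([[0.7, 0.2, 0.1], [0.3, 0.4, 0.3], [0.1, 0.2, 0.7]])
p_image = np.array([[0.6, 0.3, 0.1], [0.1, 0.7, 0.2], [0.2, 0.6, 0.2]])
labels, high_conf = decision_fusion(p_odour, p_image)
print(labels, high_conf)  # fused labels [0 1 2], agreement mask [True True False]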
32 pages, 8881 KB  
Article
WS-R-IR Adapter: A Multimodal RGB–Infrared Remote Sensing Framework for Water Surface Object Detection
by Bin Xue, Qiang Yu, Kun Ding, Mengxin Jiang, Ying Wang, Shiming Xiang and Chunhong Pan
Remote Sens. 2026, 18(8), 1220; https://doi.org/10.3390/rs18081220 - 17 Apr 2026
Abstract
Water surface object detection in shipborne remote sensing is challenged by unstable wave-induced backgrounds, illumination variations, extreme scale changes with tiny objects, and limited annotations. Multimodal RGB–infrared (RGB–IR) sensing leverages complementary visible and infrared cues to enhance robustness. However, most existing RGB–IR methods rely on backbones pretrained on limited-scale data, which constrains their performance in complex water surface scenes. In this work, we propose the WS-R-IR Adapter, a parameter-efficient vision foundation model (VFM)-based framework for shipborne RGB–IR object detection. Instead of full fine-tuning, it adapts frozen VFM representations via lightweight task-specific designs. The WS-R-IR Adapter includes (1) a water scene domain-aware modal adapter that progressively guides frozen backbone features with evolving semantic cues, (2) a parallel multi-scale structural perception module for fine-grained, scale-sensitive modeling, (3) an adaptive RGB–IR feature modulation fusion strategy, and (4) a resolution-aligned context semantic and structural detail fusion module. Moreover, we introduce an object-guided global-to-local registration framework to address dynamic cross-modal misalignment, and construct modality-aligned PoLaRIS-DET and ASV-RI-DET datasets that cover diverse water surface scenes. On the two datasets, the proposed method achieves mAP@0.5:0.95 scores of 74.2% and 50.2%, respectively, significantly outperforming existing methods with only 11.9M additional parameters. These results demonstrate the effectiveness of parameter-efficient VFM adaptation for multimodal water surface remote sensing. Full article
(This article belongs to the Section Remote Sensing Image Processing)
21 pages, 1194 KB  
Article
Environment-Aware Proactive Beam Prediction in mmWave V2I via Multi-Modal Prior Mask Map
by Changpeng Zhou and Youyun Xu
Sensors 2026, 26(8), 2488; https://doi.org/10.3390/s26082488 - 17 Apr 2026
Abstract
In millimeter wave V2I communication systems, accurate beam prediction is crucial for optimizing network performance and improving signal transmission efficiency. Traditional beam prediction methods mainly rely on single-modal data, which often fails to capture the comprehensive environmental information required for high-accuracy prediction. In contrast, multi-modal approaches leverage complementary information from different data sources and offer a more promising solution. However, many existing fusion methods primarily depend on real-time sensory inputs and do not fully exploit stable environmental features in V2I scenarios, limiting the effective use of each modality. To address these limitations, this paper proposes an environment-aware proactive beam prediction method based on a multi-modal prior mask map (MMPMM), which integrates offline mapping with an online beam prediction network. Specifically, the method fuses information from images, point clouds, positions, and the MMPMM to predict the optimal beam index. The MMPMM provides channel-related prior information by extracting static V2I scene features offline without incurring any additional online measurement overhead. Experimental results on real-world datasets demonstrate that the proposed method achieves a Top-3 beam prediction accuracy of up to 71.23% while maintaining stable performance under the evaluated dynamic and degraded conditions, demonstrating its effectiveness in the considered scenarios. Full article
(This article belongs to the Special Issue 6G Communication and Edge Intelligence in Wireless Sensor Networks)
26 pages, 956 KB  
Article
Environment-Guided Multimodal Pest Detection and Risk Assessment in Fruit and Vegetable Production Systems
by Jiapeng Sun, Yucheng Peng, Zhimeng Zhang, Wenrui Xu, Boyuan Xi, Yuanying Zhang and Yihong Song
Horticulturae 2026, 12(4), 486; https://doi.org/10.3390/horticulturae12040486 - 16 Apr 2026
Abstract
To address the practical challenge that pest occurrence in fruit and vegetable horticultural production exhibits strong environmental dependency, pronounced stage characteristics, and high sensitivity to control decision-making, a multimodal pest recognition and occurrence-risk joint modeling method is proposed, overcoming the limitation that conventional intelligent plant protection systems focus primarily on pest identification while lacking risk discrimination capability. Within a unified network framework, pest visual information and environmental temporal data are integrated through the construction of an environment-guided representation learning mechanism, a recognition–risk joint optimization strategy, and a risk-aware decision representation modeling structure. In this manner, pest category recognition and occurrence risk evaluation are conducted simultaneously, thereby providing direct decision support for precision prevention and control in fruit and vegetable production. Systematic experimental evaluation is conducted based on multi-crop and multi-year field data collected from Wuyuan County, Bayannur City, Inner Mongolia. Overall comparative results demonstrate that an identification accuracy of 0.947, a precision of 0.936, and a recall of 0.924 are achieved on the test set, all of which significantly outperform mainstream visual detection models such as YOLOv8, DETR, and Mask R-CNN. In terms of detection performance, mAP@50 and mAP@75 reach 0.962 and 0.821, respectively, indicating stable localization and discrimination capability under complex backgrounds and dense small-target conditions. For the occurrence risk discrimination task, a risk accuracy of 0.887 is obtained, representing an improvement of approximately 4.5 percentage points compared with the simple multimodal feature concatenation method. Cross-crop, cross-site, and cross-year generalization experiments further show that risk accuracy remains above 0.84 with stable recognition performance under significant distribution shifts. Ablation studies verify the synergistic contributions of the proposed core modules to overall performance improvement. The results indicate that the proposed framework enables the transition from recognition alone to risk-driven plant protection decision-making, providing a technically viable pathway for pest diagnosis and control strategy optimization in fruit and vegetable horticulture. Full article
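The recognition–risk joint optimization strategy named in this abstract is not spelled out; in generic form, such joint training minimizes a weighted multi-task objective of the kind sketched below, where the loss terms and weights are illustrative assumptions rather than the authors' exact formulation:

\mathcal{L}_{\text{joint}} \;=\; \mathcal{L}_{\text{det}} \;+\; \lambda_{1}\,\mathcal{L}_{\text{cls}} \;+\; \lambda_{2}\,\mathcal{L}_{\text{risk}}, \qquad \lambda_{1},\lambda_{2} > 0,

so that detection, category recognition, and occurrence-risk discrimination share one backbone and are optimized simultaneously.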
30 pages, 2314 KB  
Article
Confidence-Aware Gated Multimodal Fusion for Robust Temporal Action Localization in Occluded Environments
by Masato Takami and Tomohiro Fukuda
Sensors 2026, 26(8), 2454; https://doi.org/10.3390/s26082454 - 16 Apr 2026
Abstract
In industrial environments, robust Temporal Action Localization (TAL) is essential; however, frequent occlusions often compromise the reliability of skeletal data, leading to negative transfer in multimodal fusion. To address this challenge, we propose a Gated Skeleton Refinement Module (Gated SRM), a universal front-end preprocessing module that explicitly incorporates OpenPose confidence scores into the network architecture. By applying these scores as a logarithmic bias within a self-attention mechanism, our method achieves soft suppression—dynamically attenuating the attention weights assigned to unreliable joints—before adaptively fusing the refined skeletal features with RGB representations through a learnable gating network. Extensive experiments on the heavily occluded IKEA ASM dataset demonstrate that our approach effectively prevents the catastrophic accuracy degradation typical of naive and established multimodal fusion strategies, improving the mean Average Precision (mAP) to 21.77%, maintaining parity with the RGB-only baseline while demonstrating superior robustness. Furthermore, the system maintains a practical end-to-end inference speed of approximately 9.2 frames per second (FPS), which is sufficient for monitoring macro-level industrial workflows. By prioritizing confidence-based data selection over data restoration, this sensor-metadata-driven architecture offers a robust and principled approach acting as a critical fail-safe and safety-net for real-world action recognition under occlusion. Full article
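A minimal numpy sketch of the log-bias mechanism described above, in which per-joint OpenPose confidence scores are added to the self-attention logits so that unreliable joints are softly suppressed. The single-head form, the shapes, and the epsilon floor are assumptions for illustration, not the authors' code:

import numpy as np

def confidence_biased_attention(Q, K, V, conf, eps=1e-6):
    # Q, K, V: [n_joints, d] skeletal features; conf: [n_joints] confidences in [0, 1].
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                   # standard scaled dot-product logits
    scores = scores + np.log(conf + eps)[None, :]   # additive log-confidence bias per key joint
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over joints
    return weights @ V                               # refined joint features

# Toy usage: 4 joints, 8-dim features; the third joint is occluded (near-zero confidence).
rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(4, 8))
conf = np.array([0.9, 0.8, 0.05, 0.95])
refined = confidence_biased_attention(Q, K, V, conf)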
40 pages, 3667 KB  
Review
Deep Learning Methods for SAR and Optical Image Fusion: A Review
by Chengyan Guo, Zhiyuan Zhang, Kexin Huang, Lan Luo, Ziqing Yang, Shuyun Shi and Junpeng Shi
Remote Sens. 2026, 18(8), 1196; https://doi.org/10.3390/rs18081196 - 16 Apr 2026
Abstract
Synthetic Aperture Radar (SAR) and optical image fusion technology plays a crucial role in remote sensing applications. It effectively combines the high spatial resolution and rich spectral information of optical images with the all-weather and penetrating observation advantages of SAR images, thereby significantly enhancing image interpretation accuracy and task execution capabilities. This paper systematically reviews deep learning-based fusion methods for SAR and optical images, with a particular focus on recent advances in deep learning models. Furthermore, it summarizes commonly used evaluation metrics for assessing fusion image quality, providing a basis for comparing and analyzing the performance of different methods. In addition, commonly used SAR-optical fusion datasets are briefly reviewed to highlight their roles in algorithm development and performance evaluation. Unlike conventional review articles, this paper further analyzes the guidance and supporting role of fusion algorithms from the perspective of typical and specific applications. Finally, it identifies key challenges and issues faced by current fusion methods, including data registration, lightweight model design, and multimodal feature alignment, and offers perspectives on future research directions. This review aims to provide a roadmap and references for the development of SAR and optical image fusion technology. Full article
40 pages, 7468 KB  
Review
Traffic Flow Prediction in Intelligent Transportation Systems: A Comprehensive Review of Graph Neural Networks and Hybrid Deep Learning Methods
by Zhenhua Wang, Xinmeng Wang, Lijun Wang, Zheng Wu, Jiangang Hu, Fujiang Yuan and Zhen Tian
Algorithms 2026, 19(4), 310; https://doi.org/10.3390/a19040310 - 16 Apr 2026
Abstract
Traffic flow prediction is a key component of Intelligent Transportation Systems (ITS), crucial for alleviating urban congestion, optimizing traffic management, and improving the overall efficiency of road networks. With the rapid growth in vehicle numbers and the increasing complexity of urban traffic patterns, accurate short-term traffic flow prediction has become increasingly important. This paper comprehensively reviews the latest advancements in traffic flow prediction methods, focusing on graph neural network (GNN)-based approaches and hybrid deep learning frameworks. First, we introduce the fundamental theoretical foundations, including graph neural networks, deep learning algorithms, heuristic optimization methods, and attention mechanisms. Subsequently, we summarize GNN-based prediction methods into four paradigms: (1) federated learning and privacy-preserving methods, enabling cross-regional collaboration while protecting sensitive data; (2) dynamically adaptive graph structure methods, capturing time-varying spatial dependencies; (3) multi-graph fusion and attention mechanism methods, enhancing feature representations from multiple perspectives; and (4) cross-domain technology integration methods, fusing novel architectures and interdisciplinary technologies. Furthermore, we investigate hybrid methods combining signal decomposition, heuristic optimization, and attention mechanisms with LSTM networks to address challenges related to non-stationarity and model optimization. For each category, we analyzed representative works and summarized their core innovations, strengths, and limitations using a systematic comparative table. Finally, we discussed current challenges, including computational complexity, model interpretability, and generalization ability, and outlined future research directions such as lightweight model design, uncertainty quantification, multimodal data fusion, and integration with traffic control systems. This review provides researchers and practitioners with a systematic understanding of the latest advances in traffic flow prediction and offers guidance for methodological selection and future research. Full article
26 pages, 2120 KB  
Article
CARYPAR: A Multimodal Decision-Support Framework Integrating Satellite Bio-Environmental Reanalysis and Proximal Edge-Intelligence for Hylocereus spp. Health Monitoring
by Carlos Diego Rodríguez-Yparraguirre, Abel José Rodríguez-Yparraguirre, Cesar Moreno-Rojo, Wendy Akemmy Castañeda-Rodríguez, Iván Martin Olivares-Espino, Andrés David Epifania-Huerta, María Adriana Vilchez-Reyes, Dany Paul Gonzales-Romero, Enrique Jannier Boy-Vásquez and Wilson Arcenio Maco-Vasquez
Sustainability 2026, 18(8), 3928; https://doi.org/10.3390/su18083928 - 15 Apr 2026
Abstract
Pitahaya (Hylocereus spp.) production is increasingly affected by climatic factors, as well as by phytopathogens and abiotic stress, leading to delays in agronomic interventions and reduced productivity. The objective was to design, implement, and validate a multimodal system (CARYPAR) that enables early disease detection and agile decision-making, characterized by low latency and reduced dependence on cloud connectivity. The methodology integrates climate reanalysis from NASA POWER, biophysical remote sensing variables derived from Sentinel-1/2, and proximal computer vision captured via mobile devices using a late fusion architecture and an optimized convolutional neural network, EfficientNet-V2B0, which discriminates between optimal and pathological conditions in vegetative tissues and fruit. The results of the experimental validation carried out in 160 georeferenced units achieved an overall accuracy of 80.0% and an F1 score of 0.8645 for Bad Fruit. The McNemar test and the operational agreement with agro-industrial experts yielded a Cohen’s Kappa index of κ = 0.6831, with an inference latency reduced to 22.00 ms. It is concluded that the multimodal integration of satellite bio-environmental data with edge computer vision achieves substantial agreement with agronomic expert judgment under heterogeneous field conditions (Cohen’s κ = 0.6831), supporting its role as a decision-support tool rather than a replacement for expert assessment. Therefore, its adoption can enhance real-time irrigation management and crop protection, while contributing to traceability and sustainable resource management in agricultural regions with limited connectivity. Full article
(This article belongs to the Section Sustainable Agriculture)
30 pages, 711 KB  
Article
Artificial Intelligence-Driven Multimodal Sensor Fusion for Complex Market Systems via Federated Transformer-Based Learning
by Lei Shi, Mingran Tian, Yinfei Yi, Xinyi Hu, Xiaoya Wang, Yating Yang and Manzhou Li
Sensors 2026, 26(8), 2418; https://doi.org/10.3390/s26082418 - 15 Apr 2026
Abstract
In highly digitalized and networked modern trading systems, large volumes of heterogeneous data are continuously generated from multiple sources during market operations. However, due to the complexity of data structures, significant differences in temporal scales, and constraints imposed by data privacy protection, traditional single-source modeling approaches are unable to fully exploit multisource information. To address this issue, a federated multimodal prediction framework for complex market systems, termed Federated Market-Sensor Transformer (FMST), is proposed. In this framework, data originating from different information sources are uniformly modeled as multimodal time series. A multimodal market-sensor representation module is constructed to perform unified feature encoding, and a cross-modal Transformer fusion architecture is employed to characterize dynamic interaction relationships among different information sources. Meanwhile, a federated collaborative learning mechanism is introduced during the training phase, enabling multiple data nodes to perform collaborative model optimization without sharing raw data. In this manner, data privacy can be preserved while improving the cross-region generalization capability of the model. Systematic experimental evaluation is conducted on the constructed multimodal market-sensor dataset. The experimental results demonstrate that the proposed method consistently outperforms traditional statistical models and deep learning approaches across multiple evaluation metrics. In the main prediction experiment, FMST achieves a root mean square error (RMSE) of 0.1136, a mean absolute error (MAE) of 0.0832, and a coefficient of determination R2 of 0.8517, while the direction prediction accuracy reaches 74.56%, clearly outperforming baseline models including ARIMA, LSTM, Temporal CNN, Transformer, and FedAvg-LSTM. In the cross-region generalization experiment, FMST maintains strong performance, achieving an RMSE of 0.1242, an MAE of 0.0908, an R2 value of 0.8261, and a direction prediction accuracy of 72.48%. The ablation study further indicates that the three core components—multimodal market-sensor representation, cross-modal Transformer fusion, and federated collaborative learning—each make important contributions to the overall model performance. These experimental findings demonstrate that the proposed method can effectively integrate multisource market information and significantly enhance the prediction capability for complex market dynamics, providing a new technical pathway for the application of artificial intelligence-driven multimodal sensing systems in economic data analysis. Full article
(This article belongs to the Special Issue Artificial Intelligence-Driven Sensing)
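The federated collaborative learning mechanism described above keeps raw data local and shares only model parameters across nodes. Whether FMST uses plain FedAvg or a variant is not stated in the abstract (FedAvg-LSTM appears only as a baseline), so the fragment below is a generic FedAvg-style aggregation step under assumed conventions (parameter dictionaries per node, weighting by local sample counts):

import numpy as np

def fedavg(node_params, node_sizes):
    # node_params: list of dicts {layer_name: np.ndarray}; node_sizes: samples per node.
    total = float(sum(node_sizes))
    global_params = {}
    for name in node_params[0]:
        global_params[name] = sum(
            (n / total) * params[name] for params, n in zip(node_params, node_sizes)
        )
    return global_params

# Toy usage: two nodes, one weight matrix each; the larger node dominates the average.
node_a = {"w": np.ones((2, 2))}
node_b = {"w": 3 * np.ones((2, 2))}
print(fedavg([node_a, node_b], node_sizes=[100, 300])["w"])  # -> 2.5 everywhere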
35 pages, 1113 KB  
Article
Intelligent UAV-UGV-SN Systems for Monitoring and Avoiding Wildfires in Context of Sustainable Development of Smart Regions
by Dmytro Korniienko, Nazar Serhiichuk, Vyacheslav Kharchenko, Herman Fesenko, Jose Borges and Nikolaos Bardis
Sustainability 2026, 18(8), 3908; https://doi.org/10.3390/su18083908 - 15 Apr 2026
Abstract
Advancing environmental monitoring through coordinated autonomous systems is central to sustainable smart region governance and data-driven territorial management. The article presents an engineering-oriented architecture and deployment methodology for an integrated wildfire monitoring and response system that combines unmanned aerial vehicles (UAVs), unmanned ground vehicles (UGVs), and stationary sensor networks (SNs). We formalise hub-and-spoke infrastructure placement as a mixed-integer optimisation problem that accounts for platform types, endurance, travel times and logistical constraints, and propose a practical pre-processing pipeline (confidence scoring, resampling, Kalman/median filtering, strategy fusion) for heterogeneous telemetry and imagery. The system couples multimodal neural network processing (image backbones, clustering and time-series models) with online resource-allocation and mission-planning mechanisms to prioritise UAV/UGV sorties and dynamically select launch sites. The article describes scenario-driven operational modes (early warning, alarm verification, autonomous local extinguishing, post-fire recovery, sensor-gap compensation, and inter-hub reinforcement), defines validation protocols (synthetic experiments, precision/recall/F1, and hardware-in-the-loop testing), and proposes KPIs to assess environmental, social, and economic impacts for smart regions. The contribution is a reproducible, deployment-focused blueprint that bridges conceptual UAV–UGV–SN research and practical implementation, highlighting trade-offs in reliability, communication redundancy, and sustainability, and outlining directions for simulation, field pilots and algorithmic refinement. Full article
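The pre-processing pipeline named above (confidence scoring, resampling, Kalman/median filtering, strategy fusion) is described only at a high level. The fragment below sketches two of those steps, a median filter followed by a scalar Kalman filter, on a single telemetry channel; the window size and noise variances are placeholder assumptions, not values from the article:

import numpy as np

def median_filter(x, window=3):
    # Sliding median to knock out isolated spikes (e.g., a faulty sensor reading).
    pad = window // 2
    xp = np.pad(x, pad, mode="edge")
    return np.array([np.median(xp[i:i + window]) for i in range(len(x))])

def scalar_kalman(z, process_var=1e-3, meas_var=1e-1):
    # One-dimensional constant-state Kalman filter over measurements z.
    x_est, p_est = z[0], 1.0
    out = []
    for zi in z:
        p_pred = p_est + process_var          # predict
        k = p_pred / (p_pred + meas_var)      # Kalman gain
        x_est = x_est + k * (zi - x_est)      # update with the new measurement
        p_est = (1.0 - k) * p_pred
        out.append(x_est)
    return np.array(out)

raw = np.array([20.1, 20.3, 35.0, 20.2, 20.4, 20.5])  # one spike from a faulty reading
smoothed = scalar_kalman(median_filter(raw))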
38 pages, 6558 KB  
Article
Multimodal Sensor Fusion and Temporal Deep Learning for Computer Numerical Control Toolpath and Condition Classification: A Cross-Validated Ablation Study
by Stephen S. Eacuello, Romesh S. Prasad and Manbir S. Sodhi
Sensors 2026, 26(8), 2405; https://doi.org/10.3390/s26082405 - 14 Apr 2026
Viewed by 306
Abstract
Classifying which operation a Computer Numerical Control (CNC) machine is executing, not just detecting whether it is functioning correctly, is a monitoring challenge that existing sensor-based studies rarely address. Unlike tool wear estimation, operation-type classification must resolve toolpath strategies and cutting conditions within heterogeneous, noisy sensor streams in which modalities differ widely in their discriminative value. Which sensors are genuinely necessary, and how many can be removed before performance degrades, directly informs retrofit cost and monitoring system design. We present a systematic cross-validated ablation study for a nine-class CNC toolpath and condition classification task, using 120 operation files collected from a desktop CNC mill instrumented with six distributed sensor units spanning inertial, acoustic, environmental, and electrical modalities. To handle multimodal fusion under sensor noise, we introduce the Multimodal Denoising Temporal Attention Encoder with Long Short-Term Memory (MM-DTAE-LSTM), which combines learned modality weighting, cross-modal attention, and a self-supervised denoising objective, followed by recurrent temporal modeling for classification. We evaluate MM-DTAE-LSTM against five baseline model families across five cumulative sensor-ablation levels and ten temporal resolutions, using file-level cross-validation to prevent data leakage from overlapping windows. MM-DTAE-LSTM maintains 92.5% classification accuracy when nearly half the sensor channels are removed (56 of 110 features), whereas simpler baselines degrade by up to 10.7 percentage points under the same reduction. Analysis of variance reveals that pressure channels encode session-level atmospheric variation rather than machining dynamics, exposing how models that cannot suppress uninformative modalities rely on environmental confounds rather than machining physics. Together, these findings translate into concrete sensor-selection and deployment recommendations for cost-effective CNC process monitoring at under USD 500 in hardware, though generalization to industrial machines, diverse materials, and production environments requires further validation. Full article
(This article belongs to the Special Issue Sensors and IoT Technologies for the Smart Industry)
38 pages, 588 KB  
Review
A Unified Information Bottleneck Framework for Multimodal Biomedical Machine Learning
by Liang Dong
Entropy 2026, 28(4), 445; https://doi.org/10.3390/e28040445 - 14 Apr 2026
Viewed by 111
Abstract
Multimodal biomedical machine learning increasingly integrates heterogeneous data sources (including medical imaging, multi-omics profiles, electronic health records, and wearable sensor signals) to support clinical diagnosis, prognosis, and treatment response prediction. Despite strong empirical performance, most existing multimodal systems lack a principled theoretical foundation for understanding why fusion improves prediction, how information is distributed across modalities, and when models can be trusted under incomplete or shifting data. This paper develops a unified information-theoretic framework that formalizes multimodal biomedical learning as an information optimization problem. We formulate multimodal representation learning through the information bottleneck principle, deriving a variational objective that balances predictive sufficiency against informational compression in an architecture-agnostic manner. Building on this foundation, we introduce information-theoretic tools for decomposing modality contributions via conditional mutual information, quantifying redundancy and synergy, and diagnosing fusion collapse. We further show that robustness to missing modalities can be cast as an information consistency problem and extend the framework to longitudinal disease modeling through transfer entropy and sequential information bottleneck objectives. Applications to multimodal foundation models, uncertainty quantification, calibration, and out-of-distribution detection are developed. Empirical case studies across three biomedical datasets (TCGA breast cancer multi-omics, TCGA glioma clinical-plus-molecular data, and OASIS-2 longitudinal Alzheimer’s data) show that the framework’s key quantities are computable and interpretable on real data: MI decomposition identifies modality dominance and redundancy; the VMIB traces a compression–prediction tradeoff in the information plane; entropy-based selective prediction raises accuracy from 0.787 to 0.939 at 50% coverage; transfer entropy reveals stage-dependent modality influence in disease progression; and pretraining/adaptation diagnostics distinguish efficient from wasteful fine-tuning strategies. Together, these results develop entropy and mutual information as organizing principles for the design, analysis, and evaluation of multimodal biomedical AI systems. Full article
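The information bottleneck formulation referenced above balances predictive sufficiency against compression. For orientation, the single-modality IB Lagrangian and its standard variational upper bound (tight up to the constant H(Y)) are shown below in generic notation that may differ from the paper's; the multimodal case is obtained by letting X collect the available modalities (X_1, ..., X_M):

\mathcal{L}_{\mathrm{IB}} \;=\; \beta\, I(X;Z) \;-\; I(Z;Y),

\mathcal{L}_{\mathrm{VIB}} \;=\; \mathbb{E}_{x,y}\,\mathbb{E}_{z \sim p_{\theta}(z\mid x)}\!\left[-\log q_{\phi}(y\mid z)\right] \;+\; \beta\,\mathbb{E}_{x}\,\mathrm{KL}\!\left(p_{\theta}(z\mid x)\,\|\,r(z)\right),

where p_{\theta}(z\mid x) is the stochastic encoder, q_{\phi}(y\mid z) the variational predictor, r(z) a prior over the bottleneck representation, and \beta trades prediction against compression.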
31 pages, 6244 KB  
Article
Physics-Driven Multi-Modal Fusion for SAR Ship Detection Under Motion Defocusing
by Xinmei Qiang, Ze Yu, Xianxun Yao, Dongxu Li, Ruijuan Deng, Na Pu and Shengjie Zhong
Remote Sens. 2026, 18(8), 1166; https://doi.org/10.3390/rs18081166 - 14 Apr 2026
Viewed by 210
Abstract
Synthetic aperture radar (SAR) ship detection is severely limited by the artifacts caused by motion. Due to the complex six-degree-of-freedom (6-DOF) motion of ships, ship imagery exhibits aberration phenomena including spatial blurring, discrete ghosting, and Lorentz linear blurring. Traditional detectors rely on the identification of static spatial features and tend to fail when phase coherence is disrupted. To overcome this problem, we propose a multimodal fusion framework based on physical principles. This framework establishes a theoretical connection between the ship hydrodynamic response and imaging degradation through short, long, and ultra-long coherence processing intervals (CPI). The framework adopts a cascaded architecture: first, a lightweight YOLOv8 performs rapid global screening, followed by a signal backtracking mechanism that extracts high-fidelity time-frequency domain (TFD) and range instantaneous Doppler (RID) features from the original range-compressed data. In the second-level detection, these physical features are adaptively fused with spatial intensity through a YOLOv8 network integrated with the convolutional block attention module (CBAM) to reduce the false detection rate. Validation on high-fidelity simulations and real GF-3 datasets shows that this method consistently achieves a mean average precision (mAP) of over 95%, outperforming several widely used detectors, and demonstrates strong generalization ability in extreme imaging conditions, making it suitable for maritime detection scenarios. Full article
(This article belongs to the Special Issue Ship Imaging, Detection and Recognition for High-Resolution SAR)
26 pages, 4138 KB  
Article
Self-Supervised Cascade Denoising Auto-Encoder for Accurate Spatial Positioning of Target by Fusing Uncalibrated Video and Low-Cost GNSS
by Xiaofei Zeng, Ruliang He, Songchen Han, Wei Li, Menglong Yang and Binbin Liang
Remote Sens. 2026, 18(8), 1161; https://doi.org/10.3390/rs18081161 - 13 Apr 2026
Viewed by 278
Abstract
Accurate measurement of the spatial position of targets in a fixed camera is critical in remote sensing applications. Visual spatial positioning methods that rely solely on images are susceptible to adverse factors such as inaccurate camera calibration, imprecise image target detection, and incorrect feature point selection. Complementary to images, the ubiquitous Global Navigation Satellite System (GNSS) data can provide spatial positions of targets, but most of them are low-cost GNSSs with significant positioning noise. In order to fuse these two valuable but flawed positioning measurements to improve the accuracy and stability of spatial positioning, we propose a deep learning multi-modal spatial positioning method by fusing sequential uncalibrated video images and low-cost GNSSs. Firstly, a self-supervised cascade denoising auto-encoder (SCDAE) architecture is built to endow the auto-encoder with robustness to noise in the raw inputs. Then, based on the SCDAE and Bayesian optimal estimation, a Bayesian self-supervised multi-modal fusion positioning method SCDAE-MFP is presented to achieve accurate and stable spatial positioning by self-supervised manifold learning. Specifically, to provide visual self-supervision to the SCDAE-MFP, a visual position denoising auto-encoder module based on dual unsupervised learning is proposed. Extensive experimental results on public datasets showed that SCDAE-MFP outperformed five other classical and state-of-the-art baseline methods by an average of 56.79% in reducing positioning errors. Full article
(This article belongs to the Special Issue GNSS and Multi-Sensor Integrated Precise Positioning and Applications)
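As a worked illustration of the Bayesian optimal estimation underlying the fusion step described above: for two independent Gaussian estimates of the same scalar coordinate, one visual (x_vis) and one GNSS (x_gnss) with variances σ²_vis and σ²_gnss, the posterior mean and variance reduce to inverse-variance weighting (generic notation, not the paper's):

\hat{x} \;=\; \frac{\sigma_{\mathrm{gnss}}^{2}\,x_{\mathrm{vis}} + \sigma_{\mathrm{vis}}^{2}\,x_{\mathrm{gnss}}}{\sigma_{\mathrm{vis}}^{2} + \sigma_{\mathrm{gnss}}^{2}},
\qquad
\hat{\sigma}^{2} \;=\; \left(\sigma_{\mathrm{vis}}^{-2} + \sigma_{\mathrm{gnss}}^{-2}\right)^{-1},

so the noisier low-cost GNSS measurement is automatically down-weighted relative to the visual estimate, and vice versa.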
31 pages, 13700 KB  
Article
A Framework for Winter Wheat Soil Moisture Retrieval Based on UAV Remote Sensing and AutoML
by Daokuan Zhong, Caixia Li, Shenglin Li, James E. Kanneh, Pengyuan Zhu, Hao Liu, Ni Song, Huifeng Ning and Chitao Sun
Remote Sens. 2026, 18(8), 1147; https://doi.org/10.3390/rs18081147 - 12 Apr 2026
Viewed by 309
Abstract
Soil moisture content (SMC) is a critical factor in agricultural management; however, traditional monitoring methods face limitations regarding spatial resolution and the acquisition of regional dynamics. Unmanned Aerial Vehicle (UAV) remote sensing offers new opportunities for precision monitoring. This study proposes a UAV-based multi-modal remote sensing method for soil moisture estimation. Specifically, novel dual-band and three-band hyperspectral (HS) indices were constructed, and visible (RGB) and thermal infrared (TIR) information were integrated to form a multi-modal data system; simultaneously, multi-modal estimation models were developed by combining four AutoML methods: TPOT, AutoGluon, H2O AutoML, and FLAML. The results indicate that the H2O AutoML model, fusing multi-modal data, exhibited the best performance in estimating soil moisture at depths of 0–20 cm and 20–40 cm (R ≥ 0.72, RMSE 1.99–2.17%), demonstrating superior stability and generalization capabilities compared to other models. This study has made progress in hyperspectral index construction, multi-modal fusion, and soil moisture retrieval, providing a new technical approach for the refined management of agricultural water resources. Full article
(This article belongs to the Special Issue Near Real-Time (NRT) Agriculture Monitoring)
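The dual-band hyperspectral indices mentioned above are not specified in the abstract. For orientation only, indices of this kind are commonly built from the normalized difference of two reflectance bands R at wavelengths λ_i and λ_j, with the band pair selected by correlation with measured SMC; the exact bands and functional form used in the paper may differ:

\mathrm{ND}(\lambda_i,\lambda_j) \;=\; \frac{R_{\lambda_i} - R_{\lambda_j}}{R_{\lambda_i} + R_{\lambda_j}}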