Search Results (1,306)

Search Parameters:
Keywords = sensing modalities

17 pages, 2941 KB  
Article
Hybrid Drift-Flux and Deep Learning Framework for Accurate Multiphase Flowrate Prediction via Multi-Modal ERT/ECT Fusion in Horizontal Wells
by Qingsheng Zhang, Fei Xu, Jianxiong Li, Xiaomin Liu, Aihua Liu and Xiuwu Wang
Processes 2026, 14(13), 2054; https://doi.org/10.3390/pr14132054 (registering DOI) - 24 Jun 2026
Abstract
Accurate multiphase flow measurement in horizontal wells is fundamentally challenged by the antagonistic electrical responses of water and gas: Electrical Resistance Tomography (ERT) loses sensitivity to thin liquid films, while Electrical Capacitance Tomography (ECT) suffers signal saturation in conductive water, preventing either modality from covering the full operating envelope alone. This study proposes a physics-guided hybrid modeling framework that integrates multi-modal ERT/ECT sensing to achieve high-precision flowrate inversion. The framework utilizes a corrected multi-modal fusion algorithm, achieving a liquid holdup MAPE of 2.5 ± 0.5%, representing a nearly two-fold improvement over the best single-modality system (Direct ERT, 4.5%). For velocity estimation, an optimized cross-correlation method incorporating multi-sensor and multi-sequence fusion yields results within ±3.0% error. A key finding is that deep neural networks exhibit Architectural Phase Specialization: multi-branch architectures (MB-DNN) perform strongly on localized, heterogeneous liquid structures (2.0% liquid error), whereas fully-connected architectures (FC-DNN) excel at capturing the global patterns of the continuous gas core (1.2% gas error). By hybridizing a calibrated drift-flux physical model with these phase-specialized DNNs, the framework achieves overall averaged errors of 1.8% for gas and 1.5% for liquid across the full experimental envelope. The proposed framework was evaluated on 444,313 experimental samples and subsequently validated in a three-month industrial trial at the Puguang gas field under extreme conditions (26 MPa, 80 °C), where it maintained a prediction error of ±2.3%. This work establishes a scalable, physically consistent paradigm for intelligent hydrocarbon production monitoring.
(This article belongs to the Topic Petroleum and Gas Engineering, 2nd edition)
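
The cross-correlation velocity estimate mentioned in the abstract above can be illustrated with a minimal transit-time sketch. The abstract does not describe the authors' optimized method, so this is only the textbook form: find the lag that maximizes the cross-correlation between two axially spaced sensor signals, then divide the sensor spacing by the resulting delay. The function names, the 0.1 m spacing, the 1 kHz sample rate, and the synthetic pulse are all assumptions made for illustration.

```python
import math


def best_lag(upstream, downstream, max_lag):
    """Return the lag in samples that maximizes the mean-removed cross-correlation."""
    n = len(upstream)
    mean_u = sum(upstream) / n
    mean_d = sum(downstream) / n
    best, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Correlate upstream[i] with the downstream sample taken `lag` steps later.
        score = sum(
            (upstream[i] - mean_u) * (downstream[i + lag] - mean_d)
            for i in range(n - lag)
        )
        if score > best_score:
            best, best_score = lag, score
    return best


def transit_velocity(upstream, downstream, spacing_m, sample_rate_hz, max_lag):
    """Velocity = sensor spacing / transit time, with transit time = lag / sample rate."""
    lag = best_lag(upstream, downstream, max_lag)
    if lag == 0:
        raise ValueError("no measurable delay between the two signals")
    return spacing_m * sample_rate_hz / lag


# Synthetic data (hypothetical): a Gaussian pulse and a copy delayed by 5 samples.
pulse = [math.exp(-((i - 20) / 3.0) ** 2) for i in range(100)]
delayed = [0.0] * 5 + pulse[:-5]
velocity = transit_velocity(pulse, delayed, spacing_m=0.1, sample_rate_hz=1000.0, max_lag=20)
```

With these assumed values the recovered delay is 5 samples (5 ms), giving 0.1 m / 0.005 s = 20 m/s. Real signals would need windowing, normalization, and sub-sample interpolation, none of which is attempted here.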

24 pages, 10002 KB  
Article
A Wireless Analog Interface with Near Frame-Accurate Synchronization for Optical Motion Capture
by Taylor M. Pierce, Emerson Noble, Lucas Davis, Jesus Wilkins and Kenneth J. Loh
Electronics 2026, 15(13), 2787; https://doi.org/10.3390/electronics15132787 (registering DOI) - 24 Jun 2026
Abstract
Human kinematic analysis is an increasingly important tool in biomechanics, human performance, and wearable sensing research. Many emerging sensing modalities utilize custom sensors requiring accurate temporal alignment with ground-truth biomechanical movement data. Optical motion capture systems provide high-fidelity kinematic measurements but operate as closed, self-contained systems, making time synchronization with external sensor data non-trivial, particularly in wireless and mobile contexts. This work presents a wireless analog interface system built using commercially available components that enables alignment between analog sensor data (e.g., from custom wearables and Internet-of-Things devices) and a commercial motion capture system. The proposed architecture consists of a wearable data acquisition node and a receiver node interfaced directly with an optical motion capture system, allowing synchronized recording of analog sensor signals alongside kinematic data. Notably, the system reconstructs signals into the commercial hardware interface rather than relying on triggers or sync outputs, resulting in a single data file containing kinematics and sensor readings. Benchtop testing demonstrated a mean end-to-end frame delay of ~6 ms, with 95% of samples exhibiting a delay within 15 ms. Accounting for the typical offset, this leaves a standard deviation of 4 ms, within one motion capture frame of the true timestamp (at 100 Hz). Voltage reconstruction accuracy was within 30 mV across the tested conditions, with gain compression below 2.7%. Adjacent channel crosstalk remained below −83 dB across all test conditions. The use of commercial off-the-shelf components supports replication and adaptation by other research groups and integration with different optical motion capture systems.

55 pages, 1767 KB  
Review
Three-Dimensional Reconstruction and Real-Time Deformation of Flexible Bodies: A Scoping Review (2009–2025)
by Silvia Zisu and Silviu Butnariu
Sensors 2026, 26(13), 4007; https://doi.org/10.3390/s26134007 (registering DOI) - 24 Jun 2026
Abstract
Following the PRISMA-ScR framework for scoping reviews, we systematically searched five databases (Scopus, IEEE Xplore, ScienceDirect, SpringerLink, Web of Science) using a Boolean query combining real-time processing, 3D reconstruction, and deformation modelling terms. From 86 records identified, 56 peer-reviewed publications (2009–2025) were retained after two-stage screening and organized into a unified taxonomy covering sensing modalities (RGB-D, LiDAR, tactile), reconstruction pipelines (volumetric fusion, NRSfM, neural radiance fields), and deformation models (FEM, PBD, mass-spring, GNN-based surrogates, differentiable simulators). Of the 56 included works, 60% were published between 2022 and 2025, confirming the field’s rapid growth. Neural and implicit representations account for 20% of contributions, FEM-based methods for 16%, and hybrid or application-specific pipelines for 21%. Four systemic gaps emerge: the absence of a unified physics-aware benchmark; unresolved speed–accuracy trade-offs (PBD achieves >30 FPS on desktop GPUs for 10³–10⁴ vertex meshes but lacks mapping to physical material constants (Young’s modulus, Poisson’s ratio), limiting material fidelity; full-order FEM ensures physically consistent stress–strain behavior but runs at only 1–10 FPS without order reduction; reduced-order FEM recovers interactive rates for low-frequency deformation modes); fragile handling of occlusions and multi-object contact; and limited end-to-end integration of sensing and simulation. The findings support the presentation of a research roadmap centered on model order reduction, differentiable physics, multimodal sensing fusion, and standardized evaluation protocols, with implications for robust digital twins of deformable environments.
(This article belongs to the Special Issue Recent Progress in 3D Computer Vision and Robotics)
34 pages, 2325 KB  
Article
Attention-Based Multimodal Framework for Athlete-Performance Analysis and Rehabilitation Monitoring Using Vision and Wearable Sensors
by Mohammed Alonazi, Iqra Aijaz Abro, Maha Abdelhaq, Raed Alsaqour, Ahmad Jalal and Hui Liu
Bioengineering 2026, 13(7), 718; https://doi.org/10.3390/bioengineering13070718 (registering DOI) - 23 Jun 2026
Abstract
Background: Advances in monitoring systems featuring wearable sensors, computer vision, and artificial intelligence (AI) have been increasingly used in sports science and rehabilitation practices as a means of movement pattern analysis, injury prevention, and training optimization. These technologies are becoming essential components of athlete-performance analysis and rehabilitation-monitoring systems designed to support biomechanical assessment, athlete development, and movement-quality evaluation. Athlete-performance analysis and rehabilitation monitoring increasingly rely on intelligent multimodal sensing systems capable of continuously evaluating movement quality, biomechanical patterns, training execution, and recovery progress. Human activity recognition (HAR) serves as a key enabling technology for these applications by providing automated assessment of human movement using wearable and vision-based sensing modalities. Therefore, the purpose of this study was to develop and evaluate an attention-based multimodal framework that integrates wearable inertial sensing and RGB video analysis for robust athlete-performance assessment and rehabilitation monitoring through accurate recognition of human movement patterns. Methods: Athlete-performance analysis and rehabilitation monitoring combining inertial sensor data and RGB-based visual information was introduced. Inertial signals were segmented with adaptive windowing, whereas silhouette refinement was performed to analyze motion structures from visual inputs in support of athlete-performance analysis and rehabilitation monitoring. Temporal, spatial, and motion features such as trajectory, orientation, and skeleton-based space-time representations were calculated from multimodal inputs. The proposed framework was designed to capture complex movement dynamics associated with rehabilitation exercises and sports-related motion patterns across heterogeneous sensing environments. 
Extracted features were then combined and optimized with a multimodal feature fusion approach that used the Ranger optimization algorithm. An attention-based deep learning classifier was implemented to classify movement activities. Results: The results showed that the proposed framework reached accuracy scores of 88.40% and 87.96% on the VIDIMU dataset and the UTD-MHAD dataset, respectively. Combining inertial and vision-based modalities provided greater robustness than single-modality solutions. The integration of wearable sensing and computer vision modalities further improved the ability of the framework to analyze complex movement behaviors under varying execution conditions and environmental variations. Conclusion: The proposed multimodal framework provides a foundation for intelligent athlete-performance and rehabilitation-monitoring systems by integrating wearable sensing, computer vision, and attention-based artificial intelligence for robust movement analysis. The findings highlight its potential to support biomechanical assessment, movement-quality evaluation, training-performance monitoring, rehabilitation tracking, and injury-risk management in modern sports and healthcare environments.
11 pages, 3829 KB  
Article
Predictors of Diagnostic Yield in Shape-Sensing Robotic-Assisted Bronchoscopy (ssRAB): A Retrospective Single-Center Study
by Hruy Menghesha, Jan Arensmeyer, Philipp Feodorovici, Mark Coburn, Dirk Skowasch, Tatjana Dell, Julian Luetkens, Joachim Schmidt and Donatas Zalepugas
Diagnostics 2026, 16(13), 1954; https://doi.org/10.3390/diagnostics16131954 (registering DOI) - 23 Jun 2026
Abstract
Background/Objectives: Robotic-assisted bronchoscopy has emerged as an advanced technique for the evaluation of peripheral pulmonary lesions, offering improved navigation and targeting accuracy. While several studies investigating other diagnostic modalities have identified factors associated with higher diagnostic yield, such determinants remain poorly defined for shape-sensing robotic-assisted bronchoscopy (ssRAB). This study therefore aimed to identify predictors of diagnostic yield in robotic bronchoscopy. Methods: This retrospective single-center study included all consecutive patients who underwent ssRAB (ION™ system, Intuitive Surgical, Sunnyvale, CA, USA) between August 2024 and March 2026. Lung nodules undergoing marker placement only or procedures performed without cone-beam CT (CBCT) guidance were excluded. Collected variables included demographic characteristics, lesion size, lesion density (solid, part-solid, ground-glass), biopsy modality, and number of biopsy samples obtained. Diagnostic yield was defined as a definitive pathological diagnosis of the target lesion. Predictors of diagnostic success were assessed using univariable logistic regression. Results: In total, 111 pulmonary nodules were included in the analysis. The overall diagnostic yield was 88.3% (98/111). The mean patient age was 64.94 ± 7.9 years, with a predominance of female patients (58.4%). No significant associations were observed between diagnostic yield and lesion size (odds ratio [OR] 1.014 per mm; p = 0.764), lesion density (p = 0.892), or biopsy instrument (p = 0.835). However, an increased number of biopsy samples showed a positive association with diagnostic yield that reached only a statistical trend (OR 1.22 per additional sample; p = 0.084). Conclusions: Robotic-assisted bronchoscopy provides a high diagnostic yield for peripheral pulmonary lesions. The number of biopsy samples appears to be the most relevant modifiable factor influencing diagnostic success, underscoring the importance of adequate tissue acquisition. In contrast, lesion characteristics and biopsy modality did not significantly affect outcomes in this cohort.
(This article belongs to the Section Biomedical Optics)
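
The arithmetic behind the figures reported in the abstract above can be sketched briefly. The yield is simply the reported counts (98 of 111), and an odds ratio "per additional sample" acts multiplicatively on the odds, not on the probability. The function names and the 0.80 baseline probability are hypothetical and are not taken from the study; because the reported association was only a trend (p = 0.084), the second calculation shows how an odds ratio is read and should not be taken as a prediction.

```python
def diagnostic_yield(diagnosed: int, total: int) -> float:
    """Fraction of target lesions with a definitive pathological diagnosis."""
    if total <= 0:
        raise ValueError("total must be positive")
    return diagnosed / total


def apply_odds_ratio(probability: float, odds_ratio: float, steps: int) -> float:
    """Scale the odds of `probability` by odds_ratio**steps and convert back."""
    odds = probability / (1.0 - probability)
    scaled = odds * odds_ratio ** steps
    return scaled / (1.0 + scaled)


# Reported counts: 98 diagnostic procedures out of 111 nodules.
yield_percent = round(100 * diagnostic_yield(98, 111), 1)

# Hypothetical baseline probability of 0.80 (an assumption, not a study figure),
# combined with the reported OR of 1.22 applied for two additional samples.
example = apply_odds_ratio(0.80, 1.22, steps=2)
```

Here `yield_percent` reproduces the reported 88.3%. The baseline odds of 4.0 are scaled by 1.22², so `example` comes out a little above 0.85.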

26 pages, 12724 KB  
Article
A Hierarchical Semantic Consistency Constraint Framework for Hyperspectral and LiDAR Data Joint Classification
by Jie Shen, Yimeng Ma and Houqun Yang
Remote Sens. 2026, 18(12), 2058; https://doi.org/10.3390/rs18122058 (registering DOI) - 22 Jun 2026
Abstract
Hyperspectral image (HSI) and LiDAR data fusion is valuable for land-cover classification in complex surface scenes. Existing methods typically extract features from each modality independently and then consider how to fuse them, ignoring the semantic consistency between features of different modalities and across different hierarchical levels. Moreover, fully mining and exploiting the complementary information between multimodal remote sensing data remains a critical issue. To address these challenges, this paper proposes a hierarchical semantic consistency constraint (HSCC) framework for HSI and LiDAR data joint classification. The framework is co-constructed by a progressive interactive fusion network (PIFNet) and a semantic consistency constraint (SCC) strategy. Specifically, PIFNet progressively calibrates the semantic representations of multimodal features at different abstraction levels through Cross-Modal Shared Attention and Symmetric Cross-Attention mechanisms, promoting information parity in deep interactions. The SCC strategy establishes multi-level semantic associations and employs a semantic consistency constraint loss to guide the network to autonomously maintain the consistency of the same land-cover object across heterogeneous feature representations, thereby further enhancing the discriminative power of the fused features. Experiments on three public datasets, MUUFL, Houston2013, and Augsburg, demonstrate that HSCC outperforms current state-of-the-art methods, validating its effectiveness in multi-source remote sensing data fusion classification tasks.

21 pages, 1456 KB  
Article
A Camera-Based Multimodal Defect Sensing Framework for Substation Equipment Monitoring via Cross-Modal Feature Mapping
by Ziquan Liu, Hai Xue, Chengbo Hu, Chao Wei and Can Zhang
Sensors 2026, 26(12), 3935; https://doi.org/10.3390/s26123935 (registering DOI) - 21 Jun 2026
Viewed by 126
Abstract
To address the limitations of vision-only defect detection, image–semantic misalignment, and spatial-logic conflicts in complex substation inspection scenarios, this paper proposes a camera-sensor-based multimodal defect sensing framework with cross-modal feature mapping for substation equipment monitoring. The proposed framework integrates field inspection images acquired by camera sensors, defect textual descriptions, and equipment topology knowledge and establishes a unified domain-adaptive pre-training–bidirectional cross-modal mapping–hierarchical reasoning workflow. First, a Contrastive Language–Image Pre-training (CLIP)-based domain-adaptive pre-training strategy is developed to enhance the representation of equipment categories, defect attributes, and inspection-scene semantics. Second, a bidirectional cross-modal feature mapping network is constructed to model fine-grained interactions between candidate visual regions and textual semantics, where uncertainty-aware fusion and prototype constraints are introduced to improve semantic alignment and defect discrimination. Third, a hierarchical neuro-symbolic reasoning module incorporates equipment topology and spatial rules for posterior verification, logical consistency checking, and false-positive suppression. Experiments on a substation inspection image dataset demonstrate that the proposed method achieves 90.8% mAP@0.5, 68.7% mAP@0.5:0.95, and 89.4% F1-score, outperforming mainstream and recent detection models.
28 pages, 7428 KB  
Article
A New Multi-Modal Data Fusion Framework for Delamination Detection in Concrete Bridge Decks
by Maria Rashidi, Shayan Ghazimoghadam, Vahid Mousavi, Sattar Dorafshan and Behruz Bozorg
Sensors 2026, 26(12), 3926; https://doi.org/10.3390/s26123926 (registering DOI) - 20 Jun 2026
Viewed by 258
Abstract
Bridge decks are continuously subjected to high environmental exposure, traffic loading, and material aging, leading to progressive delamination which can negatively affect structural integrity and public safety. More specifically, subsurface delamination of concrete and corroded steel reinforcement must be repaired to keep the decks operational. Among non-destructive evaluation techniques, Ground-Penetrating Radar (GPR) and Infrared Thermography (IRT) offer complementary capabilities for detecting subsurface and near-surface defects; however, effective GPR-IRT data fusion remains challenging due to fundamental differences in sensing principles, spatial resolution and sensitivity. This study introduces a Physics-Enhanced Multi-Modal Fusion (PE-MMF) framework that integrates GPR and IRT data to improve delamination detection in reinforced concrete bridge decks. The proposed approach leverages transfer learning, cross-modal attention mechanisms, and gated fusion to enable robust learning from heterogeneous sensor inputs. Furthermore, a systematic feature selection protocol is integrated to identify physically meaningful indicators that remain consistent across different bridges, enhancing generalization capability. The framework is trained and validated using the publicly available SDNET2021 dataset, comprising co-registered GPR and IRT measurements from five in-service bridge decks with verified delamination ground truth. Results demonstrate substantial performance improvements, with average F1-score gains of up to 55% over IRT-based methods and 25% over GPR-based methods across all tested bridges. Comparative analysis against state-of-the-art methods confirmed the superior generalization capability of the proposed multi-modal approach over single-modality approaches. 
The findings highlight the potential of deep learning-based sensor fusion as a scalable and data-efficient decision-support tool to prioritize regions for detailed physical investigation during long-term infrastructure monitoring.
(This article belongs to the Special Issue Intelligent Remote Sensing for Urban Building Health Assessment)
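
The gated fusion step named in the abstract above can be illustrated with a minimal sketch. The abstract does not specify the PE-MMF gate, so this shows only a common generic form: a sigmoid gate per feature decides how much of the GPR feature versus the IRT feature passes into the fused vector. The function names, the per-feature weight layout, and the toy values are assumptions. In a trained model the weights would be learned, not set by hand.

```python
import math


def sigmoid(value: float) -> float:
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-value))


def gated_fusion(gpr, irt, w_gpr, w_irt, bias):
    """Blend two equal-length feature vectors with a per-feature gate.

    gate_i  = sigmoid(w_gpr[i] * gpr[i] + w_irt[i] * irt[i] + bias[i])
    fused_i = gate_i * gpr[i] + (1 - gate_i) * irt[i]
    """
    if not (len(gpr) == len(irt) == len(w_gpr) == len(w_irt) == len(bias)):
        raise ValueError("all inputs must have the same length")
    fused = []
    for g, t, wg, wt, b in zip(gpr, irt, w_gpr, w_irt, bias):
        gate = sigmoid(wg * g + wt * t + b)
        fused.append(gate * g + (1.0 - gate) * t)
    return fused


# Hypothetical 3-dimensional features; real features would come from the encoders.
gpr_features = [1.0, 0.0, 2.0]
irt_features = [3.0, 4.0, 2.0]
zeros = [0.0, 0.0, 0.0]

# Zero weights and bias give a gate of 0.5, i.e. a plain average of both modalities.
balanced = gated_fusion(gpr_features, irt_features, zeros, zeros, zeros)

# A large positive bias drives the gate towards 1, so the GPR features dominate.
gpr_dominant = gated_fusion(gpr_features, irt_features, zeros, zeros, [50.0, 50.0, 50.0])
```

The two calls show the extremes of the gate: an even blend when it is uninformative, and near pass-through of one modality when it saturates.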

32 pages, 2698 KB  
Review
Integrating Artificial Intelligence with Wearable Sensors for Advanced Health Monitoring and Diagnosis
by Dongyoun Kim, Syed Saad Ahmed, Amirhossein Amjad, Kwanghee Won and Xiaojun Xian
Biosensors 2026, 16(6), 344; https://doi.org/10.3390/bios16060344 (registering DOI) - 18 Jun 2026
Viewed by 343
Abstract
Wearable healthcare technologies are transforming the healthcare landscape by enabling remote, real-time health data collection, supporting early diagnosis, personalizing treatment plans, and reducing healthcare costs and medical burdens. Central to these advancements are wearable sensors, which continuously capture physiological data such as heart rate, temperature, activity levels, and biomarker concentrations. However, the large volume and complexity of this data demand effective processing to extract meaningful medical insights. Artificial intelligence (AI) and machine learning (ML) have significantly enhanced the capabilities of wearable sensors by enabling advanced data analysis, pattern recognition, and predictive modeling. AI-enhanced wearable sensors can detect early signs of health issues, such as heart attacks, chronic diseases, and mental health conditions like stress, often before clinical symptoms become apparent. This review examines the integration of AI/ML models with wearable sensors across physical activity recognition, stress assessment, cardiovascular monitoring, personal exposure monitoring, and sweat biomarker detection. Unlike prior application-centered reviews, we emphasize methodological and translational evaluation by comparing task formulations, sensing modalities, dataset scale, validation protocols, performance metrics, and deployment constraints across domains. We further discuss advanced architectures, multimodal fusion, explainable AI, edge deployment, privacy and regulatory considerations, and the translational gap between research prototypes and clinically deployable wearable AI systems.
(This article belongs to the Special Issue Artificial Intelligence (AI)-Driven Biosensing)

21 pages, 6896 KB  
Article
MFD-DF: A PM2.5 Concentration Prediction Method Based on Multimodal Feature Decomposition and Dynamic Fusion
by Chen Song, Quanbo Long, Zhaobo Su, Yanchao Jiang, Li Wan, Xiankun Zhang, Tiantian Lv, Wenhu Hao and Zuxuan Shi
Atmosphere 2026, 17(6), 616; https://doi.org/10.3390/atmos17060616 (registering DOI) - 18 Jun 2026
Viewed by 111
Abstract
Accurate air pollutant concentration prediction is crucial for public health and sustainable urban development. Existing methods predominantly rely on single-modal data, resulting in inadequate representation of pollutant spatiotemporal evolution, poor prediction accuracy, and limited generalization capabilities. To address these challenges, this research proposes a novel PM2.5 prediction framework termed MFD-DF that integrates ground-station time series and satellite remote sensing images. In feature extraction, learnable decomposition and deformable convolution are introduced, and a Cross-Modal Slot Attention module explicitly decomposes features to resolve information blurring. Subsequently, a dynamic cross-modal alignment mechanism is designed alongside a learnable Time-Expansion Network (TEN) to ensure fine-grained interaction. Furthermore, a local-global attention feature fusion mechanism is proposed to optimize data integration efficacy. Experimental results demonstrate that in single-step PM2.5 prediction tasks, the proposed MFD-DF achieves significant improvements of approximately 10–20% in MAE, RMSE, and MAPE compared to state-of-the-art baselines. In multi-step PM2.5 prediction, it effectively alleviates the error accumulation problem in long-sequence forecasting, demonstrating superior robustness and accuracy.
(This article belongs to the Section Air Quality)

34 pages, 2338 KB  
Review
A Taxonomy of Machine Learning for UAV-Enabled Precision Agriculture: A Structured Survey
by Wan D. Bae, Shayma Alkobaisi, Muhammad Farhan Safdar and Prachitee Chouhan
AgriEngineering 2026, 8(6), 249; https://doi.org/10.3390/agriengineering8060249 - 18 Jun 2026
Viewed by 232
Abstract
Precision agriculture increasingly relies on machine learning applied to high-resolution data acquired by unmanned aerial vehicles (UAVs) to support crop monitoring, stress detection, and yield forecasting. This survey presents a structured review of machine learning methods for UAV-enabled precision agriculture and organizes over 100 peer-reviewed studies within a unified four-dimensional taxonomy defined by sensing modality, data type, model family, and analytical task. The taxonomy enables systematic comparison across RGB, multispectral, hyperspectral, LiDAR, and IoT data sources and across classical machine learning, deep learning, hybrid sequential models, and emerging transformer-based architectures. We analyze how modeling choices interact with data characteristics to influence robustness, cross-environment generalization, computational efficiency, and deployment feasibility on UAV and edge platforms. Recurring challenges include limited labeled data, domain shift across seasons and fields, multimodal heterogeneity, occlusion, and real-time processing constraints. We identify emerging research directions, including data-efficient learning, representation-level multimodal fusion, domain adaptation, lightweight architectures for embedded deployment, and uncertainty-aware decision support. By formalizing the landscape through a unified taxonomy, this survey provides a foundation for designing scalable, robust, and deployable machine learning systems for next-generation precision agriculture.

25 pages, 8924 KB  
Article
3D Localization of Heat Sources Using LiDAR–Thermal Data Fusion and Multisensor Calibration
by Rafał Gasz, Mateusz Pluskota and Krzysztof Schwierz
Sensors 2026, 26(12), 3876; https://doi.org/10.3390/s26123876 - 18 Jun 2026
Viewed by 227
Abstract
Integration of LiDAR and thermal sensing has become increasingly important in robotics, infrastructure diagnostics, environmental monitoring, and autonomous perception systems. LiDAR sensors provide accurate three-dimensional geometric information but do not directly capture thermal properties of observed objects, whereas thermal cameras provide temperature distributions without explicit spatial structure. Fusion of both sensing modalities enables thermally augmented 3D scene reconstruction and spatial localization of temperature anomalies. This paper presents a practical LiDAR–thermal fusion framework for three-dimensional localization of heat sources using an Ouster OS1 LiDAR sensor and a FLIR A70 thermal camera. The proposed framework includes intrinsic thermal-camera calibration, extrinsic LiDAR–thermal calibration, multimodal data synchronization, projection of LiDAR points onto the thermal image plane, and assignment of temperature values to spatial points. Additionally, a dedicated thermally distinguishable calibration target is proposed to enable reliable multimodal feature extraction under low-contrast LWIR imaging conditions. The developed framework was experimentally validated using real radiometric thermal data and LiDAR point clouds acquired under laboratory conditions. Quantitative evaluation demonstrated reprojection errors below 1 pixel and a mean hottest-point localisation error of approximately 4.1 cm at a distance of 12.3 m. The results confirm that accurate spatial localisation of thermal anomalies can be achieved using a geometry-based multimodal fusion approach without relying on computationally expensive learning-based methods. The proposed framework emphasises practical deployment, deterministic calibration, and applicability in scenarios where limited training data or constrained computational resources make learning-based approaches difficult to apply. 
The proposed system may be applied to building energy diagnostics, industrial inspection, technical infrastructure monitoring, and robotic perception systems that require reliable spatial localisation of heat sources under real measurement conditions.
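The projection and temperature-assignment steps named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes an ideal pinhole thermal camera with no lens distortion, an already-calibrated extrinsic rotation `R` and translation `t` from the LiDAR frame to the camera frame, and a radiometric image whose pixel values are already temperatures. The function names `project_points`, `assign_temperatures`, and `hottest_point` are hypothetical.

```python
import numpy as np


def project_points(points_lidar, R, t, K):
    """Project Nx3 LiDAR points into the thermal image plane.

    Assumes an ideal pinhole model (no distortion). Returns Nx2 pixel
    coordinates (u, v) and a mask of points in front of the camera.
    """
    pts_cam = points_lidar @ R.T + t              # LiDAR frame -> camera frame
    in_front = pts_cam[:, 2] > 1e-6               # positive depth only
    z = np.where(in_front, pts_cam[:, 2], 1.0)    # placeholder depth avoids division by zero
    u = K[0, 0] * pts_cam[:, 0] / z + K[0, 2]
    v = K[1, 1] * pts_cam[:, 1] / z + K[1, 2]
    return np.stack([u, v], axis=1), in_front


def assign_temperatures(points_lidar, thermal_image, R, t, K):
    """Give each LiDAR point the temperature of the nearest thermal pixel.

    Points behind the camera or outside the image receive NaN.
    """
    h, w = thermal_image.shape
    uv, in_front = project_points(points_lidar, R, t, K)
    cols = np.rint(uv[:, 0]).astype(int)
    rows = np.rint(uv[:, 1]).astype(int)
    valid = in_front & (cols >= 0) & (cols < w) & (rows >= 0) & (rows < h)
    temps = np.full(len(points_lidar), np.nan)
    temps[valid] = thermal_image[rows[valid], cols[valid]]
    return temps


def hottest_point(points_lidar, temps):
    """Return the 3D point with the highest assigned temperature.

    Raises ValueError if no point received a temperature.
    """
    idx = np.nanargmax(temps)
    return points_lidar[idx], temps[idx]
```

Nearest-pixel lookup is the simplest choice here; a real pipeline would also need to handle occlusion and the time offset between the two sensors, which this sketch ignores.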
(This article belongs to the Collection 3D Imaging and Sensing System)

26 pages, 3157 KB  
Article
Geometric Scene Formalization in Vision-Based Educational Sensing via Multimodal Large Models
by Yanjing Cao and Lian Chen
Appl. Sci. 2026, 16(12), 6172; https://doi.org/10.3390/app16126172 - 18 Jun 2026
Viewed by 135
Abstract
This paper studies geometric scene formalization in vision-based educational sensing environments, where textual conditions and geometric diagram images jointly constitute heterogeneous perceptual inputs. The goal is to convert multimodal sensed information into standardized formal representations for machine-understandable educational analysis. Existing methods remain limited by unstable cross-modal alignment, inadequate expression of geometric relational constraints, and insufficient verifiability of generated outputs. To overcome these challenges, a unified modeling framework is proposed based on multimodal large models with structure-aware prompting and verification feedback. A geometry-oriented structure prompt injection mechanism is first introduced to encode prior cues of geometric entities, relational patterns, and constraint dependencies, which enhances the intrinsic alignment among textual descriptions, visually sensed diagram regions, and formal symbolic representations. In addition, an external verification feedback strategy is employed to constrain and iteratively refine the initial outputs, thereby improving structural consistency, syntactic correctness, and target proposition accuracy. To support this task, a new vision-based multimodal geometry formalization dataset is further constructed for model training and evaluation. Extensive experiments show that the proposed method can more effectively accomplish the transformation from multimodal sensed educational inputs to executable formal expressions, while also demonstrating stronger robustness and reliability in complex visual conditions. These results indicate that the proposed framework offers a feasible solution for structured scene interpretation, automatic problem analysis, error diagnosis, and intelligent feedback in vision-based educational systems.

0 pages, 4337 KB  
Proceeding Paper
Next-Day Forest Fire Risk Prediction Using Machine Learning and Multimodal Satellite Data
by Prajwal Mohapatra, Swayam Subhankar Sahoo, Adyasha Das and Rururaj Pradhan
Eng. Proc. 2026, 124(1), 120; https://doi.org/10.3390/engproc2026124120 (registering DOI) - 17 Jun 2026
Abstract
Predicting forest fire occurrence is essential for proactive disaster preparedness and environmental protection. We introduce a machine learning-based system that forecasts next-day fire probability at high spatial resolution using satellite-derived, multi-modal geospatial data. In contrast to existing reactive systems that rely on thermal anomaly detection (e.g., MODIS or VIIRS-SNPP), our approach is fully predictive, generating pixel-wise fire risk maps a day in advance. Our study focuses on Uttarakhand, India, an ecologically sensitive region that experiences frequent and severe forest fires. We curated a domain-specific geospatial dataset spanning 1 April to 29 May 2016. It includes daily 30 m GeoTIFF images with 10 bands comprising weather (e.g., temperature, wind, precipitation), topography (slope, aspect), fuel map, and fire mask. We constructed this dataset from diverse sources and aligned all bands spatially and temporally. To demonstrate the usefulness of this dataset, we implement a deep convolutional neural network (CNN) using the ResUNet-A architecture, chosen for its robust performance in the semantic segmentation of high-resolution remote sensing data. Our model is trained from scratch to produce high-resolution fire probability maps and classify fire/no-fire pixels. Our solution helps with planning and decision-making for early intervention, especially in high-risk areas. It supports the UN’s SDG 13 (Climate Action) and SDG 15 (Life on Land) by enhancing resilience and conserving ecosystems. The presented dataset and methodology can serve as a benchmark for future research on wildfire risk prediction using Earth observation data.
(This article belongs to the Proceedings of The 6th International Electronic Conference on Applied Sciences)

30 pages, 21671 KB  
Article
Semantic Translation and LLM-RAG Fusion of Multi-Source Heterogeneous Data for Production Cognition in Discrete Manufacturing
by Pingwen Zheng, Liping Wang, Changchun Liu and Dunbing Tang
Electronics 2026, 15(12), 2692; https://doi.org/10.3390/electronics15122692 - 17 Jun 2026
Viewed by 110
Abstract
Multi-source heterogeneous data in discrete manufacturing shop floors, including vibration signals, equipment logs, visual monitoring data, and handwritten production reports, exhibit significant differences in modality and semantic representation. Traditional fusion methods often fail to bridge the semantic gap between low-level sensing signals and high-level manufacturing cognition, limiting intelligent anomaly analysis and decision-making capability. To address this issue, this paper proposes a semantic translation and fusion framework for industrial heterogeneous data based on Knowledge Graph (KG), Retrieval-Augmented Generation (RAG), and Large Language Models (LLMs). First, a unified semantic translation mechanism is developed to convert multimodal industrial data into structured semantic representations for cross-modal alignment. Second, an industrial knowledge graph and RAG mechanism are introduced to integrate process knowledge, maintenance manuals, and historical fault records into the reasoning process. Third, an LLM-driven reasoning framework is designed for multimodal semantic fusion, anomaly identification, causal analysis, and optimization recommendation generation. In addition, a digital twin-based visualization interface is constructed to realize real-time interaction between production lines, industrial data, and intelligent cognitive reports. Experimental results demonstrate that the proposed framework significantly improves industrial reasoning accuracy, anomaly analysis correctness, and response efficiency compared with general-purpose LLMs, providing an effective solution for intelligent cognition and decision-making in discrete manufacturing systems.
(This article belongs to the Section Computer Science & Engineering)