Search Results (455)

Search Parameters:
Keywords = multi-modal sensor fusion

27 pages, 49730 KB  
Article
AMSRDet: An Adaptive Multi-Scale UAV Infrared-Visible Remote Sensing Vehicle Detection Network
by Zekai Yan and Yuheng Li
Sensors 2026, 26(3), 817; https://doi.org/10.3390/s26030817 - 26 Jan 2026
Abstract
Unmanned Aerial Vehicle (UAV) platforms enable flexible and cost-effective vehicle detection for intelligent transportation systems, yet small-scale vehicles in complex aerial scenes pose substantial challenges from extreme scale variations, environmental interference, and single-sensor limitations. We present AMSRDet (Adaptive Multi-Scale Remote Sensing Detector), an adaptive multi-scale detection network fusing infrared (IR) and visible (RGB) modalities for robust UAV-based vehicle detection. Our framework comprises four novel components: (1) a MobileMamba-based dual-stream encoder extracting complementary features via Selective State-Space 2D (SS2D) blocks with linear complexity O(HWC), achieving 2.1× efficiency improvement over standard Transformers; (2) a Cross-Modal Global Fusion (CMGF) module capturing global dependencies through spatial-channel attention while suppressing modality-specific noise via adaptive gating; (3) a Scale-Coordinate Attention Fusion (SCAF) module integrating multi-scale features via coordinate attention and learned scale-aware weighting, improving small object detection by 2.5 percentage points; and (4) a Separable Dynamic Decoder generating scale-adaptive predictions through content-aware dynamic convolution, reducing computational cost by 48.9% compared to standard DETR decoders. On the DroneVehicle dataset, AMSRDet achieves 45.8% mAP@0.5:0.95 (81.2% mAP@0.5) at 68.3 Frames Per Second (FPS) with 28.6 million (M) parameters and 47.2 Giga Floating Point Operations (GFLOPs), outperforming twenty state-of-the-art detectors including YOLOv12 (+0.7% mAP), DEIM (+0.8% mAP), and Mamba-YOLO (+1.5% mAP). Cross-dataset evaluation on Camera-vehicle yields 52.3% mAP without fine-tuning, demonstrating strong generalization across viewpoints and scenarios. Full article
(This article belongs to the Special Issue AI and Smart Sensors for Intelligent Transportation Systems)
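The exact CMGF design is only summarized in the abstract above; as a rough illustration of its adaptive-gating idea, the sketch below blends RGB and infrared feature maps with a per-pixel gate. All module names, layer sizes, and shapes are assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' CMGF module): adaptive gated fusion of RGB
# and infrared feature maps. Shapes, layer sizes, and names are assumptions.
import torch
import torch.nn as nn

class GatedCrossModalFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Per-pixel gate predicted from the concatenated modalities.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        self.refine = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, rgb: torch.Tensor, ir: torch.Tensor) -> torch.Tensor:
        g = self.gate(torch.cat([rgb, ir], dim=1))   # (B, C, H, W) gate in [0, 1]
        fused = g * rgb + (1.0 - g) * ir             # modality-adaptive blend
        return self.refine(fused)

# Usage: fuse = GatedCrossModalFusion(256); y = fuse(torch.rand(1, 256, 64, 64), torch.rand(1, 256, 64, 64))
```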

26 pages, 4329 KB  
Review
Advanced Sensor Technologies in Cutting Applications: A Review
by Motaz Hassan, Roan Kirwin, Chandra Sekhar Rakurty and Ajay Mahajan
Sensors 2026, 26(3), 762; https://doi.org/10.3390/s26030762 - 23 Jan 2026
Viewed by 194
Abstract
Advances in sensing technologies are increasingly transforming cutting operations by enabling data-driven condition monitoring, predictive maintenance, and process optimization. This review surveys recent developments in sensing modalities for cutting systems, including vibration sensors, acoustic emission sensors, optical and vision-based systems, eddy-current sensors, force sensors, and emerging hybrid/multi-modal sensing frameworks. Each sensing approach offers unique advantages in capturing mechanical, acoustic, geometric, or electromagnetic signatures related to tool wear, process instability, and fault development, while also showing modality-specific limitations such as noise sensitivity, environmental robustness, and integration complexity. Recent trends show a growing shift toward hybrid and multi-modal sensor fusion, where data from multiple sensors are combined using advanced data analytics and machine learning to improve diagnostic accuracy and reliability under changing cutting conditions. The review also discusses how artificial intelligence, Internet of Things connectivity, and edge computing enable scalable, real-time monitoring solutions, along with the challenges related to data needs, computational costs, and system integration. Future directions highlight the importance of robust fusion architectures, physics-informed and explainable models, digital twin integration, and cost-effective sensor deployment to accelerate adoption across various manufacturing environments. Overall, these advancements position advanced sensing and hybrid monitoring strategies as key drivers of intelligent, Industry 4.0-oriented cutting processes. Full article

20 pages, 17064 KB  
Article
PriorSAM-DBNet: A SAM-Prior-Enhanced Dual-Branch Network for Efficient Semantic Segmentation of High-Resolution Remote Sensing Images
by Qiwei Zhang, Yisong Wang, Ning Li, Quanwen Jiang and Yong He
Sensors 2026, 26(2), 749; https://doi.org/10.3390/s26020749 - 22 Jan 2026
Viewed by 68
Abstract
Semantic segmentation of high-resolution remote sensing imagery is a critical technology for the intelligent interpretation of sensor data, supporting automated environmental monitoring and urban sensing systems. However, processing data from dense urban scenarios remains challenging due to sensor signal occlusions (e.g., shadows) and the complexity of parsing multi-scale targets from optical sensors. Existing approaches often exhibit a trade-off between the accuracy of global semantic modeling and the precision of complex boundary recognition. While the Segment Anything Model (SAM) offers powerful zero-shot structural priors, its direct application to remote sensing is hindered by domain gaps and the lack of inherent semantic categorization. To address these limitations, we propose a dual-branch cooperative network, PriorSAM-DBNet. The main branch employs a Densely Connected Swin (DC-Swin) Transformer to capture cross-scale global features via a hierarchical shifted window attention mechanism. The auxiliary branch leverages SAM’s zero-shot capability to exploit structural universality, generating object-boundary masks as robust signal priors while bypassing semantic domain shifts. Crucially, we introduce a parameter-efficient Scaled Subsampling Projection (SSP) module that employs a weight-sharing mechanism to align cross-modal features, freezing the massive SAM backbone to ensure computational viability for practical sensor applications. Furthermore, a novel Attentive Cross-Modal Fusion (ACMF) module is designed to dynamically resolve semantic ambiguities by calibrating the global context with local structural priors. Extensive experiments on the ISPRS Vaihingen, Potsdam, and LoveDA-Urban datasets demonstrate that PriorSAM-DBNet outperforms state-of-the-art approaches. By fine-tuning only 0.91 million parameters in the auxiliary branch, our method achieves mIoU scores of 82.50%, 85.59%, and 53.36%, respectively. The proposed framework offers a scalable, high-precision solution for remote sensing semantic segmentation, particularly effective for disaster emergency response where rapid feature recognition from sensor streams is paramount. Full article
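To illustrate the parameter-efficiency point above (only the auxiliary branch is fine-tuned while the SAM backbone is frozen), here is a minimal sketch of freezing a large prior encoder and counting the trainable parameters of a small projection head. The stand-in backbone, layer sizes, and names are assumptions, not the paper's SSP module.

```python
# Minimal sketch of the freeze-the-backbone idea (not the paper's SSP module).
import torch
import torch.nn as nn

prior_backbone = nn.Sequential(           # stand-in for a frozen SAM-like encoder
    nn.Conv2d(3, 256, kernel_size=16, stride=16),
    nn.GELU(),
)
for p in prior_backbone.parameters():
    p.requires_grad = False               # frozen: contributes no trainable parameters

projection = nn.Conv2d(256, 128, kernel_size=1)   # small trainable alignment head

trainable = sum(p.numel() for p in projection.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable / 1e6:.2f} M")   # only the projection is tuned
```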

35 pages, 5497 KB  
Article
Robust Localization of Flange Interface for LNG Tanker Loading and Unloading Under Variable Illumination: A Fusion Approach of Monocular Vision and LiDAR
by Mingqin Liu, Han Zhang, Jingquan Zhu, Yuming Zhang and Kun Zhu
Appl. Sci. 2026, 16(2), 1128; https://doi.org/10.3390/app16021128 - 22 Jan 2026
Viewed by 22
Abstract
The automated localization of the flange interface in LNG tanker loading and unloading imposes stringent requirements for accuracy and illumination robustness. Traditional monocular vision methods are prone to localization failure under extreme illumination conditions, such as intense glare or low light, while LiDAR, despite being unaffected by illumination, suffers from limitations like a lack of texture information. This paper proposes an illumination-robust localization method for LNG tanker flange interfaces by fusing monocular vision and LiDAR, with three scenario-specific innovations beyond generic multi-sensor fusion frameworks. First, an illumination-adaptive fusion framework is designed to dynamically adjust detection parameters via grayscale mean evaluation, addressing extreme illumination (e.g., glare, low light with water film). Second, a multi-constraint flange detection strategy is developed by integrating physical dimension constraints, K-means clustering, and weighted fitting to eliminate background interference and distinguish dual flanges. Third, a customized fusion pipeline (ROI extraction-plane fitting-3D circle center solving) is established to compensate for monocular depth errors and sparse LiDAR point cloud limitations using flange radius prior. High-precision localization is achieved via four key steps: multi-modal data preprocessing, LiDAR-camera spatial projection, fusion-based flange circle detection, and 3D circle center fitting. While basic techniques such as LiDAR-camera spatiotemporal synchronization and K-means clustering are adapted from prior works, their integration with flange-specific constraints and illumination-adaptive design forms the core novelty of this study. Comparative experiments between the proposed fusion method and the monocular vision-only localization method are conducted under four typical illumination scenarios: uniform illumination, local strong illumination, uniform low illumination, and low illumination with water film. The experimental results based on 20 samples per illumination scenario (80 valid data sets in total) show that, compared with the monocular vision method, the proposed fusion method reduces the Mean Absolute Error (MAE) of localization accuracy by 33.08%, 30.57%, and 75.91% in the X, Y, and Z dimensions, respectively, with the overall 3D MAE reduced by 61.69%. Meanwhile, the Root Mean Square Error (RMSE) in the X, Y, and Z dimensions is decreased by 33.65%, 32.71%, and 79.88%, respectively, and the overall 3D RMSE is reduced by 64.79%. The expanded sample size verifies the statistical reliability of the proposed method, which exhibits significantly superior robustness to extreme illumination conditions. Full article
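As a rough sketch of the grayscale-mean-based illumination adaptation mentioned above (the paper's actual thresholds and parameter set are not stated in the abstract), one might switch detection parameters by illumination regime as below; all thresholds and parameter values are illustrative assumptions.

```python
# Minimal sketch of illumination-adaptive parameter selection via the grayscale
# mean. Thresholds and parameter values are illustrative assumptions.
import numpy as np

def select_detection_params(gray_image: np.ndarray) -> dict:
    mean_gray = float(gray_image.mean())          # global illumination estimate
    if mean_gray > 180.0:                         # strong illumination / glare (assumed threshold)
        return {"canny_low": 80, "canny_high": 200, "clahe_clip": 1.0}
    if mean_gray < 60.0:                          # low illumination (assumed threshold)
        return {"canny_low": 20, "canny_high": 60, "clahe_clip": 4.0}
    return {"canny_low": 50, "canny_high": 150, "clahe_clip": 2.0}   # uniform illumination

# Usage: params = select_detection_params(np.random.randint(0, 256, (480, 640), dtype=np.uint8))
```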

30 pages, 1726 KB  
Article
A Sensor-Oriented Multimodal Medical Data Acquisition and Modeling Framework for Tumor Grading and Treatment Response Analysis
by Linfeng Xie, Shanhe Xiao, Bihong Ming, Zhe Xiang, Zibo Rui, Xinyi Liu and Yan Zhan
Sensors 2026, 26(2), 737; https://doi.org/10.3390/s26020737 - 22 Jan 2026
Viewed by 27
Abstract
In precision oncology research, achieving joint modeling of tumor grading and treatment response, together with interpretable mechanism analysis, based on multimodal medical imaging and clinical data remains a challenging and critical problem. From a sensing perspective, these imaging and clinical data can be regarded as heterogeneous sensor-derived signals acquired by medical imaging sensors and clinical monitoring systems, providing continuous and structured observations of tumor characteristics and patient states. Existing approaches typically rely on invasive pathological grading, while grading prediction and treatment response modeling are often conducted independently. Moreover, multimodal fusion procedures generally lack explicit structural constraints, which limits their practical utility in clinical decision-making. To address these issues, a grade-guided multimodal collaborative modeling framework was proposed. Built upon mature deep learning models, including 3D ResNet-18, MLP, and CNN–Transformer, tumor grading was incorporated as a weakly supervised prior into the processes of multimodal feature fusion and treatment response modeling, thereby enabling an integrated solution for non-invasive grading prediction, treatment response subtype discovery, and intrinsic mechanism interpretation. Through a grade-guided feature fusion mechanism, discriminative information that is highly correlated with tumor malignancy and treatment sensitivity is emphasized in the multimodal joint representation, while irrelevant features are suppressed to prevent interference with model learning. Within a unified framework, grading prediction and grade-conditioned treatment response modeling are jointly realized. Experimental results on real-world clinical datasets demonstrate that the proposed method achieved an accuracy of 84.6% and a kappa coefficient of 0.81 in the tumor-grading prediction task, indicating a high level of consistency with pathological grading. In the treatment response prediction task, the proposed model attained an AUC of 0.85, a precision of 0.81, and a recall of 0.79, significantly outperforming single-modality models, conventional early-fusion models, and multimodal CNN–Transformer models without grading constraints. In addition, treatment-sensitive and treatment-resistant subtypes identified under grading conditions exhibited stable and significant stratification differences in clustering consistency and survival analysis, validating the potential value of the proposed approach for clinical risk assessment and individualized treatment decision-making. Full article
(This article belongs to the Special Issue Application of Optical Imaging in Medical and Biomedical Research)
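The grade-guided fusion mechanism is only summarized above; the sketch below shows one plausible way a predicted grade distribution could gate a fused multimodal feature vector. Dimensions, module names, and the gating form are assumptions, not the authors' implementation.

```python
# Minimal sketch of grade-guided feature gating (not the paper's exact mechanism):
# a weakly supervised grade prediction modulates the fused multimodal features.
import torch
import torch.nn as nn

class GradeGuidedFusion(nn.Module):
    def __init__(self, feat_dim: int = 256, n_grades: int = 3):
        super().__init__()
        self.grade_head = nn.Linear(feat_dim, n_grades)          # grading prediction branch
        self.gate = nn.Sequential(nn.Linear(n_grades, feat_dim), nn.Sigmoid())

    def forward(self, fused: torch.Tensor):
        grade_logits = self.grade_head(fused)                    # (B, n_grades)
        g = self.gate(grade_logits.softmax(dim=-1))              # grade-conditioned gate in [0, 1]
        return g * fused, grade_logits                           # gated features + grading output
```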

20 pages, 908 KB  
Article
Wearable ECG-PPG Deep Learning Model for Cardiac Index-Based Noninvasive Cardiac Output Estimation in Cardiac Surgery Patients
by Minwoo Kim, Min Dong Sung, Jimyeoung Jung, Sung Pil Cho, Junghwan Park, Sarah Soh, Hyun Chel Joo and Kyung Soo Chung
Sensors 2026, 26(2), 735; https://doi.org/10.3390/s26020735 - 22 Jan 2026
Viewed by 28
Abstract
Accurate cardiac output (CO) measurement is vital for hemodynamic management; however, it usually requires invasive monitoring, which limits its continuous and out-of-hospital use. Wearable sensors integrated with deep learning offer a noninvasive alternative. This study developed and validated a lightweight deep learning model using wearable electrocardiography (ECG) and photoplethysmography (PPG) signals to predict CO and examined whether cardiac index-based normalization (Cardiac Index (CI) = CO/body surface area) improves performance. Twenty-seven patients who underwent cardiac surgery and had pulmonary artery catheters were prospectively enrolled. Single-lead ECG (HiCardi+ chest patch) and finger PPG (WristOx2 3150) were recorded simultaneously and processed through an ECG–PPG fusion network with cross-modal interaction. Three models were trained: (1) CI prediction, (2) direct CO prediction, and (3) indirect CO prediction, where indirect CO = predicted CI × body surface area. Reference values were derived from thermodilution. The CI model achieved the best performance, and the indirect CO model showed significant reductions in error/agreement metrics (MAE/RMSE/bias; p < 0.0001), while correlation-based metrics are reported descriptively without implying statistical significance. The Pearson correlation coefficient (PCC) and percentage error (PE) for the indirect CO estimates were 0.904 and 23.75%, respectively. The indirect CO estimates met the predefined PE < 30% agreement benchmark for method comparison, although this is not a universal clinical standard. These results demonstrate that wearable ECG–PPG fusion deep learning can achieve accurate, noninvasive CO estimation and that CI-based normalization enhances model agreement with pulmonary artery catheter measurements, supporting continuous catheter-free hemodynamic monitoring. Full article
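The indirect path above reduces to CO = predicted CI × body surface area. A minimal sketch follows; the Du Bois BSA formula is an assumption here, since the abstract does not state which BSA formula was used.

```python
# Minimal sketch of the indirect cardiac-output computation: CO = CI x BSA.
# The Du Bois BSA formula is an assumption, not stated in the abstract.
import numpy as np

def bsa_du_bois(height_cm: float, weight_kg: float) -> float:
    """Body surface area in m^2 (Du Bois & Du Bois formula, assumed here)."""
    return 0.007184 * (height_cm ** 0.725) * (weight_kg ** 0.425)

def indirect_cardiac_output(pred_ci: np.ndarray, height_cm: float, weight_kg: float) -> np.ndarray:
    """Predicted CI in L/min/m^2 -> indirect CO in L/min."""
    return pred_ci * bsa_du_bois(height_cm, weight_kg)

# Example: a predicted CI of 2.8 L/min/m^2 for a 170 cm, 70 kg patient.
co = indirect_cardiac_output(np.array([2.8]), 170.0, 70.0)
print(f"indirect CO ~ {co[0]:.2f} L/min")
```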

21 pages, 15860 KB  
Article
Robot Object Detection and Tracking Based on Image–Point Cloud Instance Matching
by Hongxing Wang, Rui Zhu, Zelin Ye and Yaxin Li
Sensors 2026, 26(2), 718; https://doi.org/10.3390/s26020718 - 21 Jan 2026
Viewed by 126
Abstract
Effectively fusing the rich semantic information from camera images with the high-precision geometric measurements provided by LiDAR point clouds is a key challenge in mobile robot environmental perception. To address this problem, this paper proposes a highly extensible instance-aware fusion framework designed to achieve efficient alignment and unified modeling of heterogeneous sensory data. The proposed approach adopts a modular processing pipeline. First, semantic instance masks are extracted from RGB images using an instance segmentation network, and a projection mechanism is employed to establish spatial correspondences between image pixels and LiDAR point cloud measurements. Subsequently, three-dimensional bounding boxes are reconstructed through point cloud clustering and geometric fitting, and a reprojection-based validation mechanism is introduced to ensure consistency across modalities. Building upon this representation, the system integrates a data association module with a Kalman filter-based state estimator to form a closed-loop multi-object tracking framework. Experimental results on the KITTI dataset demonstrate that the proposed system achieves strong 2D and 3D detection performance across different difficulty levels. In multi-object tracking evaluation, the method attains a MOTA score of 47.8 and an IDF1 score of 71.93, validating the stability of the association strategy and the continuity of object trajectories in complex scenes. Furthermore, real-world experiments on a mobile computing platform show an average end-to-end latency of only 173.9 ms, while ablation studies further confirm the effectiveness of individual system components. Overall, the proposed framework exhibits strong performance in terms of geometric reconstruction accuracy and tracking robustness, and its lightweight design and low latency satisfy the stringent requirements of practical robotic deployment. Full article
(This article belongs to the Section Sensors and Robotics)
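The image-point cloud matching step rests on projecting LiDAR points through the camera model and keeping those that land inside an instance mask. A minimal numpy sketch of that association follows; function and variable names are assumptions, not the authors' code.

```python
# Minimal sketch of image-point cloud instance association: project LiDAR points
# with intrinsics K and extrinsics [R | t], keep points inside one instance mask.
import numpy as np

def points_in_mask(points_lidar: np.ndarray, K: np.ndarray, R: np.ndarray,
                   t: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """points_lidar: (N, 3); K: (3, 3); R: (3, 3); t: (3,); mask: (H, W) boolean."""
    cam = points_lidar @ R.T + t                  # LiDAR frame -> camera frame
    front = cam[:, 2] > 0.1                       # keep points in front of the camera
    cam = cam[front]
    uv = cam @ K.T                                # pinhole projection
    uv = uv[:, :2] / uv[:, 2:3]
    u, v = uv[:, 0].astype(int), uv[:, 1].astype(int)
    h, w = mask.shape
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    keep = np.zeros(len(cam), dtype=bool)
    keep[inside] = mask[v[inside], u[inside]]     # pixels covered by the instance mask
    return points_lidar[front][keep]              # 3D points belonging to this instance
```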

23 pages, 54360 KB  
Article
ATM-Net: A Lightweight Multimodal Fusion Network for Real-Time UAV-Based Object Detection
by Jiawei Chen, Junyu Huang, Zuye Zhang, Jinxin Yang, Zhifeng Wu and Renbo Luo
Drones 2026, 10(1), 67; https://doi.org/10.3390/drones10010067 - 20 Jan 2026
Viewed by 125
Abstract
UAV-based object detection faces critical challenges including extreme scale variations (targets occupy 0.1–2% image area), bird’s-eye view complexities, and all-weather operational demands. Single RGB sensors degrade under poor illumination while infrared sensors lack spatial details. We propose ATM-Net, a lightweight multimodal RGB–infrared fusion network for robust UAV vehicle detection. ATM-Net integrates three innovations: (1) Asymmetric Recurrent Fusion Module (ARFM) performs “extraction→fusion→separation” cycles across pyramid levels, balancing cross-modal collaboration and modality independence. (2) Tri-Dimensional Attention (TDA) recalibrates features through orthogonal Channel-Width, Height-Channel, and Height-Width branches, enabling comprehensive multi-dimensional feature enhancement. (3) Multi-scale Adaptive Feature Pyramid Network (MAFPN) constructs enhanced representations via bidirectional flow and multi-path aggregation. Experiments on VEDAI and DroneVehicle datasets demonstrate superior performance—92.4% mAP50 and 64.7% mAP50-95 on VEDAI, 83.7% mAP on DroneVehicle—with only 4.83M parameters. ATM-Net achieves optimal accuracy–efficiency balance for resource-constrained UAV edge platforms. Full article

27 pages, 1619 KB  
Article
Uncertainty-Aware Multimodal Fusion and Bayesian Decision-Making for DSS
by Vesna Antoska Knights, Marija Prchkovska, Luka Krašnjak and Jasenka Gajdoš Kljusurić
AppliedMath 2026, 6(1), 16; https://doi.org/10.3390/appliedmath6010016 - 20 Jan 2026
Viewed by 69
Abstract
Uncertainty-aware decision-making increasingly relies on multimodal sensing pipelines that must fuse correlated measurements, propagate uncertainty, and trigger reliable control actions. This study develops a unified mathematical framework for multimodal data fusion and Bayesian decision-making under uncertainty. The approach integrates adaptive Covariance Intersection (aCI) for correlation-robust sensor fusion, a Gaussian state–space backbone with Kalman filtering, heteroskedastic Bayesian regression with full posterior sampling via an affine-invariant MCMC sampler, and a Bayesian likelihood-ratio test (LRT) coupled to a risk-sensitive proportional–derivative (PD) control law. Theoretical guarantees are provided by bounding the state covariance under stability conditions, establishing convexity of the aCI weight optimization on the simplex, and deriving a Bayes-risk-optimal decision threshold for the LRT under symmetric Gaussian likelihoods. A proof-of-concept agro-environmental decision-support application is considered, where heterogeneous data streams (IoT soil sensors, meteorological stations, and drone-derived vegetation indices) are fused to generate early-warning alarms for crop stress and to adapt irrigation and fertilization inputs. The proposed pipeline reduces predictive variance and sharpens posterior credible intervals (up to 34% narrower 95% intervals and 44% lower NLL/Brier score under heteroskedastic modeling), while a Bayesian uncertainty-aware controller achieves 14.2% lower water usage and 35.5% fewer false stress alarms compared to a rule-based strategy. The framework is mathematically grounded yet domain-independent, providing a probabilistic pipeline that propagates uncertainty from raw multimodal data to operational control actions, and can be transferred beyond agriculture to robotics, signal processing, and environmental monitoring applications. Full article
(This article belongs to the Section Probabilistic & Statistical Mathematics)
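The covariance intersection rule that aCI builds on has a standard closed form: P_f^{-1} = w P_a^{-1} + (1 - w) P_b^{-1}, with w here chosen to minimize trace(P_f). The sketch below implements that generic rule, not the paper's adaptive weighting scheme.

```python
# Minimal sketch of (non-adaptive) covariance intersection fusion of two
# possibly correlated estimates; the weight minimizes the fused trace.
import numpy as np
from scipy.optimize import minimize_scalar

def covariance_intersection(xa, Pa, xb, Pb):
    def fused_cov(w):
        return np.linalg.inv(w * np.linalg.inv(Pa) + (1.0 - w) * np.linalg.inv(Pb))
    res = minimize_scalar(lambda w: np.trace(fused_cov(w)),
                          bounds=(1e-6, 1.0 - 1e-6), method="bounded")
    w = res.x
    Pf = fused_cov(w)
    xf = Pf @ (w * np.linalg.inv(Pa) @ xa + (1.0 - w) * np.linalg.inv(Pb) @ xb)
    return xf, Pf, w

# Usage: two correlated 2-D estimates of the same state.
xa, Pa = np.array([1.0, 2.0]), np.diag([0.5, 1.0])
xb, Pb = np.array([1.2, 1.8]), np.diag([1.0, 0.4])
xf, Pf, w = covariance_intersection(xa, Pa, xb, Pb)
```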

23 pages, 2725 KB  
Article
Text- and Face-Conditioned Multi-Anchor Conditional Embedding for Robust Periocular Recognition
by Po-Ling Fong, Tiong-Sik Ng and Andrew Beng Jin Teoh
Appl. Sci. 2026, 16(2), 942; https://doi.org/10.3390/app16020942 - 16 Jan 2026
Viewed by 121
Abstract
Periocular recognition is essential when full-face images cannot be used because of occlusion, privacy constraints, or sensor limitations, yet in many deployments, only periocular images are available at run time, while richer evidence, such as archival face photos and textual metadata, exists offline. This mismatch makes it hard to deploy conventional multimodal fusion. This motivates the notion of conditional biometrics, where auxiliary modalities are used only during training to learn stronger periocular representations while keeping deployment strictly periocular-only. In this paper, we propose Multi-Anchor Conditional Periocular Embedding (MACPE), which maps periocular, facial, and textual features into a shared anchor-conditioned space via a learnable anchor bank that preserves periocular micro-textures while aligning higher-level semantics. Training combines identity classification losses on periocular and face branches with a symmetric InfoNCE loss over anchors and a pulling regularizer that jointly aligns periocular, facial, and textual embeddings without collapsing into face-dominated solutions; captions generated by a vision language model provide complementary semantic supervision. At deployment, only the periocular encoder is used. Experiments across five periocular datasets show that MACPE consistently improves Rank-1 identification and reduces EER at a fixed FAR compared with periocular-only baselines and alternative conditioning methods. Ablation studies verify the contributions of anchor-conditioned embeddings, textual supervision, and the proposed loss design. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
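For reference, a generic symmetric InfoNCE between two embedding views is sketched below; the paper's anchor-conditioned variant with a learnable anchor bank is not reproduced, and the function name and temperature value are assumptions.

```python
# Minimal sketch of a generic symmetric InfoNCE objective between two
# modality-specific embeddings of the same identities.
import torch
import torch.nn.functional as F

def symmetric_info_nce(z_a: torch.Tensor, z_b: torch.Tensor, temperature: float = 0.07):
    """z_a, z_b: (B, D) embeddings of matching identities from two modalities."""
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature            # (B, B) cosine-similarity logits
    targets = torch.arange(z_a.size(0), device=z_a.device)
    loss_ab = F.cross_entropy(logits, targets)      # a -> b direction
    loss_ba = F.cross_entropy(logits.t(), targets)  # b -> a direction
    return 0.5 * (loss_ab + loss_ba)
```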

17 pages, 3529 KB  
Article
Study on Multimodal Sensor Fusion for Heart Rate Estimation Using BCG and PPG Signals
by Jisheng Xing, Xin Fang, Jing Bai, Luyao Cui, Feng Zhang and Yu Xu
Sensors 2026, 26(2), 548; https://doi.org/10.3390/s26020548 - 14 Jan 2026
Viewed by 223
Abstract
Continuous heart rate monitoring is crucial for early cardiovascular disease detection. To overcome the discomfort and limitations of ECG in home settings, we propose a multimodal temporal fusion network (MM-TFNet) that integrates ballistocardiography (BCG) and photoplethysmography (PPG) signals. The network extracts temporal features from BCG and PPG signals through temporal convolutional networks (TCNs) and bidirectional long short-term memory networks (BiLSTMs), respectively, achieving cross-modal dynamic fusion at the feature level. First, bimodal features are projected into a unified dimensional space through fully connected layers. Subsequently, a cross-modal attention weight matrix is constructed for adaptive learning of the complementary correlation between BCG mechanical vibration and PPG volumetric flow features. Combined with dynamic focusing on key heartbeat waveforms through multi-head self-attention (MHSA), the model’s robustness under dynamic activity states is significantly enhanced. Experimental validation using a publicly available BCG-PPG-ECG simultaneous acquisition dataset comprising 40 subjects demonstrates that the model achieves excellent performance with a mean absolute error (MAE) of 0.88 BPM in heart rate prediction tasks, outperforming current mainstream deep learning methods. This study provides theoretical foundations and engineering guidance for developing contactless, low-power, edge-deployable home health monitoring systems, demonstrating the broad application potential of multimodal fusion methods in complex physiological signal analysis. Full article
(This article belongs to the Section Biomedical Sensors)
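As a rough illustration of the cross-modal attention fusion described above (not the MM-TFNet code), the sketch below lets PPG features attend to BCG features after both have been projected to a shared dimension; dimensions and names are assumptions.

```python
# Minimal sketch of cross-modal attention between PPG and BCG feature sequences.
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    def __init__(self, dim: int = 128, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, ppg_feats: torch.Tensor, bcg_feats: torch.Tensor) -> torch.Tensor:
        # Query: PPG sequence (B, T, D); key/value: BCG sequence (B, T', D).
        fused, _ = self.attn(ppg_feats, bcg_feats, bcg_feats)
        return fused + ppg_feats                  # residual keeps the PPG stream intact

# Usage: CrossModalAttention()(torch.rand(2, 100, 128), torch.rand(2, 100, 128))
```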

26 pages, 4529 KB  
Review
Key Technologies for Intelligent Operation of Plant Protection UAVs in Hilly and Mountainous Areas: Progress, Challenges, and Prospects
by Yali Zhang, Zhilei Sun, Wanhang Peng, Yeqing Lin, Xinting Li, Kangting Yan and Pengchao Chen
Agronomy 2026, 16(2), 193; https://doi.org/10.3390/agronomy16020193 - 13 Jan 2026
Viewed by 207
Abstract
Hilly and mountainous areas are important agricultural production regions globally. Their dramatic topography, dense fruit tree planting, and steep slopes severely restrict the application of traditional plant protection machinery. Pest and disease control has long relied on manual spraying, resulting in high labor intensity, low efficiency, and pesticide utilization rates of less than 30%. Plant protection UAVs, with their advantages of flexibility, high efficiency, and precise application, provide a feasible technical approach for plant protection operations in hilly and mountainous areas. However, steep slopes and dense orchard environments place higher demands on key technologies such as drone positioning and navigation, attitude control, trajectory planning, and terrain following. Achieving accurate identification and adaptive following of the undulating fruit tree canopy while maintaining a constant spraying distance to ensure uniform pesticide coverage has become a core technological bottleneck. This paper systematically reviews the key technologies and research progress of plant protection UAVs in hilly and mountainous operations, focusing on the principles, advantages, and limitations of core methods such as multi-sensor fusion positioning, intelligent SLAM navigation, nonlinear attitude control and intelligent control, three-dimensional trajectory planning, and multimodal terrain following. It also discusses the challenges currently faced by these technologies in practical applications. Finally, this paper discusses and envisions the future of plant protection UAVs in achieving intelligent, collaborative, and precise operations on steep slopes and in dense orchards, providing theoretical reference and technical support for promoting the mechanization and intelligentization of mountain agriculture. Full article
(This article belongs to the Section Precision and Digital Agriculture)

16 pages, 8228 KB  
Article
A Detection Method for Seeding Temperature in Czochralski Silicon Crystal Growth Based on Multi-Sensor Data Fusion
by Lei Jiang, Tongda Chang and Ding Liu
Sensors 2026, 26(2), 516; https://doi.org/10.3390/s26020516 - 13 Jan 2026
Viewed by 153
Abstract
The Czochralski method is the dominant technique for producing power-electronics-grade silicon crystals. At the beginning of the seeding stage, an excessively high (or low) temperature at the solid–liquid interface can cause the time required for the seed to reach the specified length to be too long (or too short). In semiconductor crystal growth, this time is strictly controlled to ensure that the initial temperature is appropriate, because an inappropriate initial temperature can adversely affect crystal quality and production yield. Accurately evaluating whether the current temperature is appropriate for seeding is therefore essential. However, the temperature at the solid–liquid interface cannot be directly measured, and the current manual evaluation method relies mainly on a visual inspection of the meniscus. Previous methods for detecting this temperature only classified image features and lacked a quantitative assessment of the temperature. To address this challenge, this study proposed using the duration of the seeding stage as the target variable for evaluating the temperature and developed an improved multimodal fusion regression network. Temperature signals collected from a central pyrometer and an auxiliary pyrometer were transformed into time–frequency representations via wavelet transform. Features extracted from the time–frequency diagrams, together with meniscus features, were fused through a two-level mechanism with multimodal feature fusion (MFF) and channel attention (CA), followed by masking using spatial attention (SA). The fused features were then input into a random vector functional link network (RVFLN) to predict the seeding duration, thereby establishing an indirect relationship between multi-sensor data and the seeding temperature and achieving a quantification of a temperature that cannot be directly measured. Transfer comparison experiments conducted on our dataset verified the effectiveness of the feature extraction strategy and demonstrated the superior detection performance of the proposed model. Full article
(This article belongs to the Section Physical Sensors)
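The RVFLN used as the final predictor is a standard construction: a random, fixed nonlinear expansion plus direct input-output links, with only the output weights solved in closed form. A minimal regression sketch of that standard form follows; hyperparameters are illustrative, not the paper's.

```python
# Minimal sketch of a random vector functional link network (RVFLN) regressor:
# random hidden weights, direct links, ridge-regularized closed-form output weights.
import numpy as np

class RVFLN:
    def __init__(self, n_hidden: int = 100, ridge: float = 1e-3, seed: int = 0):
        self.n_hidden, self.ridge = n_hidden, ridge
        self.rng = np.random.default_rng(seed)
        self.W = None      # random hidden weights (fixed after fit)
        self.beta = None   # learned output weights

    def _features(self, X: np.ndarray) -> np.ndarray:
        H = np.tanh(X @ self.W[:-1] + self.W[-1])        # random nonlinear expansion
        return np.hstack([X, H, np.ones((len(X), 1))])   # direct links + hidden + bias

    def fit(self, X: np.ndarray, y: np.ndarray):
        self.W = self.rng.standard_normal((X.shape[1] + 1, self.n_hidden))
        D = self._features(X)
        # Ridge-regularized least squares for the output weights.
        self.beta = np.linalg.solve(D.T @ D + self.ridge * np.eye(D.shape[1]), D.T @ y)
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        return self._features(X) @ self.beta
```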

25 pages, 2897 KB  
Review
Integrating UAVs and Deep Learning for Plant Disease Detection: A Review of Techniques, Datasets, and Field Challenges with Examples from Cassava
by Wasiu Akande Ahmed, Olayinka Ademola Abiola, Dongkai Yang, Seyi Festus Olatoyinbo and Guifei Jing
Horticulturae 2026, 12(1), 87; https://doi.org/10.3390/horticulturae12010087 - 12 Jan 2026
Viewed by 203
Abstract
Cassava remains a critical food-security crop across Africa and Southeast Asia but is highly vulnerable to diseases such as cassava mosaic disease (CMD) and cassava brown streak disease (CBSD). Traditional diagnostic approaches are slow, labor-intensive, and inconsistent under field conditions. This review synthesizes current advances in combining unmanned aerial vehicles (UAVs) with deep learning (DL) to enable scalable, data-driven cassava disease detection. It examines UAV platforms, sensor technologies, flight protocols, image preprocessing pipelines, DL architectures, and existing datasets, and it evaluates how these components interact within UAV–DL disease-monitoring frameworks. The review also compares model performance across convolutional neural network-based and Transformer-based architectures, highlighting metrics such as accuracy, recall, F1-score, inference speed, and deployment feasibility. Persistent challenges—such as limited UAV-acquired datasets, annotation inconsistencies, geographic model bias, and inadequate real-time deployment—are identified and discussed. Finally, the paper proposes a structured research agenda including lightweight edge-deployable models, UAV-ready benchmarking protocols, and multimodal data fusion. This review provides a consolidated reference for researchers and practitioners seeking to develop practical and scalable cassava-disease detection systems. Full article

39 pages, 2940 KB  
Article
Trustworthy AI-IoT for Citizen-Centric Smart Cities: The IMTPS Framework for Intelligent Multimodal Crowd Sensing
by Wei Li, Ke Li, Zixuan Xu, Mengjie Wu, Yang Wu, Yang Xiong, Shijie Huang, Yijie Yin, Yiping Ma and Haitao Zhang
Sensors 2026, 26(2), 500; https://doi.org/10.3390/s26020500 - 12 Jan 2026
Viewed by 258
Abstract
The fusion of Artificial Intelligence and the Internet of Things (AI-IoT, also widely referred to as AIoT) offers transformative potential for smart cities, yet presents a critical challenge: how to process heterogeneous data streams from intelligent sensing—particularly crowd sensing data derived from citizen interactions like text, voice, and system logs—into reliable intelligence for sustainable urban governance. To address this challenge, we introduce the Intelligent Multimodal Ticket Processing System (IMTPS), a novel AI-IoT smart system. Unlike ad hoc solutions, the novelty of IMTPS resides in its theoretically grounded architecture, which orchestrates Information Theory and Game Theory for efficient, verifiable extraction, and employs Causal Inference and Meta-Learning for robust reasoning, thereby synergistically converting noisy, heterogeneous data streams into reliable governance intelligence. This principled design endows IMTPS with four foundational capabilities essential for modern smart city applications: (1) Sustainable and Efficient AI-IoT Operations: guided by Information Theory, the IMTPS compression module achieves provably efficient semantic-preserving compression, drastically reducing data storage and energy costs. (2) Trustworthy Data Extraction: a Game Theory-based adversarial verification network ensures high reliability in extracting critical information, mitigating the risk of model hallucination in high-stakes citizen services. (3) Robust Multimodal Fusion: the fusion engine leverages Causal Inference to distinguish true causality from spurious correlations, enabling trustworthy integration of complex, multi-source urban data. (4) Adaptive Intelligent System: a Meta-Learning-based retrieval mechanism allows the system to rapidly adapt to new and evolving query patterns, ensuring long-term effectiveness in dynamic urban environments. We validate IMTPS on a large-scale, publicly released benchmark dataset of 14,230 multimodal records. IMTPS demonstrates state-of-the-art performance, achieving a 96.9% reduction in storage footprint and a 47% decrease in critical data extraction errors. By open-sourcing our implementation, we aim to provide a replicable blueprint for building the next generation of trustworthy and sustainable AI-IoT systems for citizen-centric smart cities. Full article
(This article belongs to the Special Issue AI-IoT for New Challenges in Smart Cities)
