Search Results (389)

Search Parameters:
Keywords = singlemode-multimode-singlemode

17 pages, 6831 KB  
Technical Note
Transformer-Based Multi-Modal Fusion for Martian Impact Crater Classification
by Chen Yang, Yinghong Wu, Haishi Zhao and Minghao Zhao
Remote Sens. 2026, 18(4), 599; https://doi.org/10.3390/rs18040599 - 14 Feb 2026
Abstract
Impact craters, as key geomorphic features on Mars, provide important insights into surface processes and geological evolution. However, automatic classification of crater morphologies remains challenging due to substantial variations in size, degradation degree, and data quality across different types of Martian craters. This study proposes a multi-modal framework for Martian crater classification by integrating infrared imagery, an optical map, and digital elevation model (DEM) data. Specifically, daytime infrared imagery from THEMIS, a color map from the Tianwen-1 MoRIC instrument, and topographic data derived from combined MOLA–HRSC observations are used to capture complementary thermal, morphological, and elevation-related characteristics. A transformer-based feature extraction and cross-modal fusion strategy is adopted, where infrared imagery guides the interaction among multi-source features. Experiments on a carefully constructed dataset covering four crater categories, i.e., standard craters, layered ejecta craters, degraded craters, and secondary craters, demonstrate that the proposed approach achieves an overall precision of 0.848 and a recall of 0.851, outperforming single-modality baselines. Layered ejecta craters exhibit the highest classification performance, benefiting from their distinctive ejecta morphologies, whereas secondary craters remain more difficult to classify due to their small spatial scales. The results highlight the value of multi-modal data for Martian crater morphology classification.
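As a rough illustration of the infrared-guided cross-modal fusion the abstract describes, here is a minimal PyTorch sketch (not the authors' code): infrared tokens act as attention queries over concatenated optical and DEM tokens. All dimensions, module names, and the mean-pool classifier are assumptions.

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    def __init__(self, dim=256, heads=8, num_classes=4):
        super().__init__()
        # Infrared tokens are the queries guiding interaction with the
        # optical and DEM tokens (keys/values), per the abstract's design.
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.head = nn.Linear(dim, num_classes)  # four crater categories

    def forward(self, ir_tokens, opt_tokens, dem_tokens):
        # ir_tokens:  (B, N, dim) patch embeddings from THEMIS imagery
        # opt_tokens: (B, N, dim) patch embeddings from the MoRIC color map
        # dem_tokens: (B, N, dim) patch embeddings from MOLA-HRSC topography
        kv = torch.cat([opt_tokens, dem_tokens], dim=1)
        fused, _ = self.attn(query=ir_tokens, key=kv, value=kv)
        fused = self.norm(fused + ir_tokens)     # residual connection
        return self.head(fused.mean(dim=1))      # pool tokens, classify

logits = CrossModalFusion()(torch.randn(2, 49, 256),
                            torch.randn(2, 49, 256),
                            torch.randn(2, 49, 256))
print(logits.shape)  # torch.Size([2, 4])
```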

28 pages, 5322 KB  
Article
Facial Expression Annotation and Analytics for Dysarthria Severity Classification
by Shufei Duan, Yuxin Guo, Longhao Fu, Fujiang Li, Xinran Dong, Huizhi Liang and Wei Zhang
Sensors 2026, 26(4), 1239; https://doi.org/10.3390/s26041239 - 13 Feb 2026
Abstract
Dysarthria in patients post-stroke is often accompanied by central facial paralysis, which impairs facial motor control and emotional expression. Current assessments rely on acoustic modalities, overlooking facial pathological cues and their correlation with emotional expression, which hinders comprehensive disease assessment. To address this issue, we propose a multimodal severity classification framework that integrates facial and acoustic features. Firstly, a multi-level annotation algorithm based on a pre-trained model and motion amplitude was designed to overcome the problem of data scarcity. Secondly, facial topology was modeled using Delaunay triangulation, with spatial relationships captured via graph convolutional networks (GCNs), while abnormal muscle coordination was quantified using facial action units (AUs). Finally, we proposed a multimodal feature-fusion framework in which facial visual features compensate for the acoustic modality in disease classification. Our experimental results using the THE-POSSD dataset demonstrate an accuracy of 92.0% and an F1 score of 91.6%, significantly outperforming single-modality baselines. This study reveals the changes in facial movements and sensitive areas of patients under different emotional states, verifies the compensatory ability of visual patterns for auditory patterns, and demonstrates the potential of this multimodal framework for objective assessment and future clinical applications in speech disorders.
(This article belongs to the Section Sensing and Imaging)
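To make the Delaunay-based facial topology step concrete, a small sketch using SciPy follows. It assumes a standard 68-landmark layout (the abstract does not specify a count) and builds the adjacency matrix a GCN would consume; landmark coordinates here are random placeholders.

```python
import numpy as np
from scipy.spatial import Delaunay

landmarks = np.random.rand(68, 2)          # 68 facial landmarks (x, y), assumed
tri = Delaunay(landmarks)

# Turn each triangle into undirected edges over the landmark graph.
adj = np.zeros((68, 68), dtype=np.float32)
for a, b, c in tri.simplices:
    for i, j in ((a, b), (b, c), (a, c)):
        adj[i, j] = adj[j, i] = 1.0

print(adj.sum() / 2)  # number of unique Delaunay edges
```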
23 pages, 1833 KB  
Review
From Fingerprint Spectra to Intelligent Perception: Research Advances in Spectral Techniques for Ginseng Species Identification
by Yuying Jiang, Xi Jin, Guangming Li, Hongyi Ge, Yida Yin, Huifang Zheng, Xing Li and Peng Li
Foods 2026, 15(4), 684; https://doi.org/10.3390/foods15040684 - 13 Feb 2026
Abstract
Owing to the high pharmacological relevance and multidimensional quality attributes of Panax spp., accurate authentication and quality evaluation of Panax-derived herbal materials remain challenging within traditional Chinese medicine (TCM) quality control systems. Conventional approaches often face trade-offs among analysis speed and throughput, non-destructive [...] Read more.
Owing to the high pharmacological relevance and multidimensional quality attributes of Panax spp., accurate authentication and quality evaluation of Panax-derived herbal materials remain challenging within traditional Chinese medicine (TCM) quality control systems. Conventional approaches often face trade-offs among analysis speed and throughput, non-destructive measurement, and analytical accuracy, which can limit their suitability for modern, large-scale quality control. This review summarizes recent advances in vibrational and related analytical techniques—infrared (IR) and near-infrared (NIR) spectroscopy, Raman spectroscopy, terahertz (THz) spectroscopy, hyperspectral imaging (HSI), and nuclear magnetic resonance (NMR)—for authentication and quality evaluation of Panax materials. We compare the capabilities of each modality in supporting key tasks, including species authentication, geographical origin tracing, age/cultivation-stage discrimination, and quantitative assessment of major chemical markers, with emphasis on the underlying measurement principles. In general, NIR and HSI are well suited to rapid, high-throughput screening of bulk samples, whereas Raman and NMR provide higher chemical specificity for molecular and structural characterization. To mitigate limitations of single-modality analysis, this review discusses a methodological shift from conventional spectral fingerprinting and chemometric approaches toward model-driven, data-enabled sensing strategies for robust quality evaluation. Specifically, we highlight multimodal data fusion frameworks combined with interpretable machine-learning/deep-learning methods to build robust classification and regression models for quality assessment. This perspective aims to support standardized and scalable authentication and quality evaluation of Panax herbal materials and to facilitate the digitization of quality control workflows for Chinese herbal medicines. Full article
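A minimal sketch of the feature-level (early) fusion strategy such reviews discuss: spectra from two modalities are concatenated and fed to an interpretable classifier. The synthetic arrays below stand in for real NIR and Raman spectra; nothing here reproduces any specific fusion framework from the review.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
nir = rng.normal(size=(120, 200))    # 120 samples x 200 NIR wavelengths
raman = rng.normal(size=(120, 300))  # 120 samples x 300 Raman shifts
y = rng.integers(0, 3, size=120)     # e.g., three Panax species (placeholder)

X = np.hstack([nir, raman])          # early fusion by concatenation
clf = RandomForestClassifier(n_estimators=200, random_state=0)
print(cross_val_score(clf, X, y, cv=5).mean())
# clf.fit(X, y).feature_importances_ can then be mapped back to specific
# wavelengths/shifts, giving the interpretability the review emphasizes.
```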

23 pages, 2557 KB  
Article
MECFN: A Multi-Modal Temporal Fusion Network for Valve Opening Prediction in Fluororubber Material Level Control
by Weicheng Yan, Kaiping Yuan, Han Hu, Minghui Liu, Haigang Gong, Xiaomin Wang and Guantao Zhang
Electronics 2026, 15(4), 783; https://doi.org/10.3390/electronics15040783 - 12 Feb 2026
Abstract
During fluororubber production, strong material agitation and agglomeration induce severe dynamic fluctuations, irregular surface morphology, and pronounced variations in apparent material level. Under such operating conditions, conventional single-modality monitoring approaches—such as point-based height sensors or manual visual inspection—often fail to reliably capture the true process state. This information deficiency leads to inaccurate valve opening adjustment and degrades material level control performance. To address this issue, valve opening prediction is formulated as a data-driven, control-oriented regression task for material level regulation, and an end-to-end multimodal temporal regression framework, termed MECFN (Multi-Modal Enhanced Cross-Fusion Network), is proposed. The model performs deep fusion of visual image sequences and height sensor signals. A customized Multi-Feature Extraction (MFE) module is designed to enhance visual feature representation under complex surface conditions, while two independent Transformer encoders are employed to capture long-range temporal dependencies within each modality. Furthermore, a context-aware cross-attention mechanism is introduced to enable effective interaction and adaptive fusion between heterogeneous modalities. Experimental validation on a real-world industrial fluororubber production dataset demonstrates that MECFN consistently outperforms traditional machine learning approaches and single-modality deep learning models in valve opening prediction. Quantitative results show that MECFN achieves a mean absolute error of 2.36, a root mean squared error of 3.73, and an R² of 0.92. These results indicate that the proposed framework provides a robust and practical data-driven solution for supporting valve control and achieving stable material level regulation in industrial production environments.
(This article belongs to the Special Issue AI for Industry)
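A compact PyTorch sketch of the two-branch temporal design the abstract outlines: independent Transformer encoders per modality, cross-attention fusion, and a regression head for valve opening. All sizes and names are illustrative assumptions, not the authors' MECFN implementation.

```python
import torch
import torch.nn as nn

d = 128
enc = lambda: nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True), 2)

class ValveRegressor(nn.Module):
    def __init__(self):
        super().__init__()
        self.vis_enc, self.sen_enc = enc(), enc()   # one encoder per modality
        self.cross = nn.MultiheadAttention(d, 4, batch_first=True)
        self.head = nn.Linear(d, 1)                 # scalar valve opening

    def forward(self, vis_seq, sen_seq):
        v = self.vis_enc(vis_seq)      # (B, T, d) visual sequence features
        s = self.sen_enc(sen_seq)      # (B, T, d) height-sensor features
        f, _ = self.cross(v, s, s)     # visual queries attend to sensor keys
        return self.head(f.mean(dim=1)).squeeze(-1)

out = ValveRegressor()(torch.randn(4, 32, d), torch.randn(4, 32, d))
print(out.shape)  # torch.Size([4])
```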

37 pages, 952 KB  
Review
Detection for New Biomarkers of Tuberculosis Infection Activity Using Machine Learning Methods
by Anna An. Starshinova, Adilya Sabirova, Olesya Koroteeva, Igor Kudryavtsev, Artem Rubinstein, Arthur Aquino, Andrey S. Trulioff, Ekaterina Belyaeva, Anastasia Kulpina, Raul A. Sharipov, Ravil K. Tukfatullin, Nikolay Y. Nikolenko, Anton Mikhalev, Andrey A. Savchenko, Alexandr Borisov and Dmitry Kudlay
Diseases 2026, 14(2), 66; https://doi.org/10.3390/diseases14020066 - 11 Feb 2026
Abstract
Background/Objectives: Latent tuberculosis infection (LTBI) represents a critical reservoir for subsequent development of active tuberculosis (ATB) and poses significant challenges for early diagnosis and disease prevention. Traditional immunological assays, such as interferon-gamma release assays (IGRAs), are limited in their ability to reliably distinguish LTBI from ATB. Recent advances in high-throughput omics technologies and machine learning (ML) approaches offer new opportunities for precise, biomarker-based differential diagnostics. Methods: Transcriptomic and proteomic profiling of host immune responses has revealed reproducible gene and protein signatures associated with LTBI and ATB. The integration of ML techniques—including feature selection, dimensionality reduction, multimodal learning, and explainable AI—facilitates the construction of robust diagnostic models. Single-modality signatures, derived from RNA-seq, microarrays, or proteomic assays, are complemented by multimodal approaches that incorporate soluble mediators, immunological readouts, and imaging-derived features. Deep learning frameworks, such as convolutional neural networks and transformer-based architectures, enhance the extraction of complex molecular and structural patterns from high-dimensional datasets. Results: ML-driven analyses of transcriptomic and proteomic data consistently outperform conventional immunological tests in terms of sensitivity, specificity, and clinical applicability. Multimodal integration further improves diagnostic accuracy and robustness. These advances support the translational development of concise, quantitative reverse transcription PCR (qRT-PCR)-based biomarker panels suitable for routine clinical application, enabling early and reliable differentiation between LTBI and ATB. Overall, the combination of high-throughput omics and AI-based analytical frameworks provides a promising pathway for enhancing global tuberculosis diagnostics. Conclusions: This review provides a structured and critical synthesis of transcriptomic and proteomic biomarker research for LTBI and ATB discrimination, with a particular emphasis on machine learning–based analytical frameworks. Unlike previous narrative reviews, we systematically compare data-generating platforms, modelling strategies, validation approaches, and sources of heterogeneity across studies. We further identify key translational barriers, including cohort homogeneity, platform dependency, and limited external validation, and propose directions for future research aimed at improving clinical applicability.
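As a minimal sklearn sketch of the biomarker pipeline pattern the review surveys, the snippet below applies univariate feature selection over a gene-expression matrix and feeds a classifier separating LTBI from ATB. All data are synthetic and the pipeline choices are assumptions, not any specific study's model.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5000))     # 200 patients x 5000 transcripts
y = rng.integers(0, 2, size=200)     # 0 = LTBI, 1 = ATB (placeholder labels)

pipe = make_pipeline(
    SelectKBest(f_classif, k=20),    # reduce to a concise gene signature
    LogisticRegression(max_iter=1000),
)
print(cross_val_score(pipe, X, y, cv=5, scoring="roc_auc").mean())
```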

26 pages, 15341 KB  
Article
A Multimodal Three-Channel Bearing Fault Diagnosis Method Based on CNN Fusion Attention Mechanism Under Strong Noise Conditions
by Yingyong Zou, Chunfang Li, Yu Zhang, Zhiqiang Si and Long Li
Algorithms 2026, 19(2), 144; https://doi.org/10.3390/a19020144 - 10 Feb 2026
Abstract
Bearings, as core components of mechanical equipment, play a critical role in ensuring equipment safety and reliability. Early fault detection holds significant importance. Addressing the challenges of insufficient robustness in bearing fault diagnosis under industrial high-noise conditions and the difficulty of extracting fault features from a single modality, this study proposes a three-channel multimodal fault diagnosis method that integrates a Convolutional Auto-Encoder (CAE) with a dual attention mechanism (M-CNNBiAM). This approach provides an effective technical solution for the precise diagnosis of bearing faults in high-noise environments. To suppress substantial noise interference, a CAE denoising module was designed to filter out intense noise, providing high-quality input for subsequent diagnostic networks. To address the limitations of single-modal feature extraction and restricted generalization capabilities, a three-channel time–frequency signal joint diagnosis model combining the Continuous Wavelet Transform (CWT) with an attention mechanism was proposed. This approach enables deep mining and efficient fusion of multi-domain features, thereby enhancing fault diagnosis accuracy and generalization capabilities. Experimental results demonstrate that the designed CAE module maintains excellent noise reduction performance even under −10 dB strong noise conditions. When combined with the proposed diagnostic model, it achieves an average diagnostic accuracy of 98% across both the CWRU and self-test datasets, demonstrating outstanding diagnostic precision. Furthermore, under −4 dB noise conditions, it achieves a 94% diagnostic accuracy even without relying on the CAE denoising module. With a single training cycle taking only 6.8 s, it balances training efficiency and diagnostic performance, making it well-suited for real-time, reliable bearing fault diagnosis in industrial environments with high noise levels.
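A short sketch of the CWT step that turns a vibration signal into a time-frequency image for a CNN channel. PyWavelets with a Morlet wavelet is one common choice; the sampling rate, scales, and test signal below are illustrative, not the paper's settings.

```python
import numpy as np
import pywt

fs = 12_000                                   # sampling rate in Hz (assumed)
t = np.arange(0, 0.1, 1 / fs)
signal = np.sin(2 * np.pi * 1_000 * t) + 0.5 * np.random.randn(t.size)

scales = np.arange(1, 65)                     # 64 scales -> 64-row scalogram
coef, freqs = pywt.cwt(signal, scales, "morl", sampling_period=1 / fs)
scalogram = np.abs(coef)                      # (64, len(signal)) image input
print(scalogram.shape, freqs[0], freqs[-1])
```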

41 pages, 1285 KB  
Review
Multimodal Classification Algorithms for Emotional Stress Analysis with an ECG-Centered Framework: A Comprehensive Review
by Xinyang Zhang, Haimin Zhang and Min Xu
AI 2026, 7(2), 63; https://doi.org/10.3390/ai7020063 - 9 Feb 2026
Abstract
Emotional stress plays a critical role in mental health conditions such as anxiety, depression, and cognitive decline, yet its assessment remains challenging due to the subjective and episodic nature of conventional self-report methods. Multimodal physiological approaches, integrating signals such as electrocardiogram (ECG), electrodermal activity (EDA), and electromyography (EMG), offer a promising alternative by enabling objective, continuous, and complementary characterization of autonomic stress responses. Recent advances in machine learning and artificial intelligence (ML/AI) have become central to this paradigm, as they provide the capacity to model nonlinear dynamics, inter-modality dependencies, and individual variability that cannot be effectively captured by rule-based or single-modality methods. This paper reviews multimodal physiological stress recognition with an emphasis on ECG-centered systems and their integration with EDA and EMG. We summarize stress-related physiological mechanisms, catalog public and self-collected databases, and analyze their ecological validity, synchronization, and annotation practices. We then examine preprocessing pipelines, feature extraction methods, and multimodal fusion strategies across different stages of model design, highlighting how ML/AI techniques address modality heterogeneity and temporal misalignment. Comparative analysis shows that while deep learning models often improve within-dataset performance, their generalization across subjects and datasets remains limited. Finally, we discuss open challenges and future directions, including self-supervised learning, domain adaptation, and standardized evaluation protocols. This review provides practical insights for developing robust, generalizable, and scalable multimodal stress recognition systems for mental health monitoring.
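For a concrete taste of the ECG-derived features such reviews catalog, the snippet below computes two classic heart-rate-variability stress markers, SDNN and RMSSD, from an RR-interval series. The RR series is synthetic; the formulas themselves are standard.

```python
import numpy as np

rr = 0.8 + 0.05 * np.random.randn(300)             # RR intervals in seconds
sdnn = np.std(rr, ddof=1) * 1000                   # overall HRV, in ms
rmssd = np.sqrt(np.mean(np.diff(rr) ** 2)) * 1000  # beat-to-beat HRV, in ms
print(f"SDNN = {sdnn:.1f} ms, RMSSD = {rmssd:.1f} ms")
```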

25 pages, 7216 KB  
Article
A CNN-LSTM-XGBoost Hybrid Framework for Interpretable Nitrogen Stress Classification Using Multimodal UAV Imagery
by Xiaohui Kuang, Dawei Wang, Bohan Mao, Yafeng Li, Deshan Chen, Wanna Fu, Qian Cheng, Fuyi Duan, Hao Li, Xinyue Hou and Zhen Chen
Remote Sens. 2026, 18(4), 538; https://doi.org/10.3390/rs18040538 - 7 Feb 2026
Abstract
Accurate diagnosis of nitrogen status is essential for precision fertilization in winter wheat. Single-modal or single-temporal remote sensing often fails to capture the multidimensional crop responses to nitrogen stress. In this study, we propose a hybrid framework based on CNN-LSTM-XGBoost for interpretable classification of wheat nitrogen stress gradients using multimodal unmanned aerial vehicle (UAV) multispectral and thermal infrared (TIR) imagery. Field experiments were conducted at the Xinxiang base in Henan Province during the 2023–2024 growing season, following a randomized block design involving 10 cultivars, four nitrogen levels, and four water treatments. Multisource UAV images acquired at jointing, heading, and filling stages were used to construct a multimodal feature set consisting of manual features (spectral bands, vegetation indices (VIs), TIR, and their interaction terms) and seven temporal statistical features. A deep learning model (CNN-LSTM) was utilized to further extract deep spatiotemporal features, and its performance was systematically compared with traditional machine learning models. The results show that multimodal feature fusion significantly enhanced classification performance. The CNN-LSTM model achieved an accuracy of 89.38% with fused multimodal features, outperforming all traditional machine learning models. Incorporating multi-temporal features improved the macro-F1 of the XGBoost model to 0.9131, a 9.42 percentage-point increase over using the single heading stage alone. The hybrid model (CNN-LSTM-XGBoost) achieved the highest overall performance (accuracy = 0.9208; macro-F1 = 0.9212; macro-AUC = 0.9879; Kappa = 0.8944). SHAP analysis identified TIR × NDRE as the most influential indicator, reflecting the coupled physiological response of reduced chlorophyll content and increased canopy temperature under nitrogen deficiency. The proposed multimodal, multi-temporal, and interpretable framework provides a robust technical foundation for UAV-assisted precision nitrogen management.
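A condensed sketch of the hybrid idea: a CNN-LSTM extracts deep spatiotemporal features that an XGBoost classifier then consumes. Shapes, layer sizes, and hyperparameters are placeholders, not the paper's configuration.

```python
import torch
import torch.nn as nn
import numpy as np
from xgboost import XGBClassifier

class CNNLSTM(nn.Module):
    def __init__(self, bands=6, hidden=64):
        super().__init__()
        self.cnn = nn.Sequential(nn.Conv1d(bands, 32, 3, padding=1), nn.ReLU())
        self.lstm = nn.LSTM(32, hidden, batch_first=True)

    def forward(self, x):                 # x: (B, stages, bands) per plot
        h = self.cnn(x.transpose(1, 2)).transpose(1, 2)
        _, (hn, _) = self.lstm(h)
        return hn[-1]                     # (B, hidden) deep feature vector

net = CNNLSTM().eval()
X_seq = torch.randn(100, 3, 6)            # 3 stages: jointing/heading/filling
with torch.no_grad():
    feats = net(X_seq).numpy()            # deep features handed to XGBoost
y = np.random.randint(0, 4, size=100)     # four nitrogen stress levels
print(XGBClassifier(n_estimators=50).fit(feats, y).score(feats, y))
```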

13 pages, 3053 KB  
Article
Composite Multi-Parameter Sensor Based on Misaligned Peanut-Shaped Structure for Measuring Strain and Temperature
by Cheng Li, Bing Wu, Yu Zhang, Hang Zhu, Zhigang Gao, Jie Zhang, Linghao Kong, Xiaojun Cui, Guoyu Zhang and Feng Peng
Optics 2026, 7(1), 12; https://doi.org/10.3390/opt7010012 - 4 Feb 2026
Abstract
A composite fiber optic sensor based on a misaligned peanut-shaped structure and the single-mode fiber–multimode fiber–single-mode fiber (SMS) structure is proposed for simultaneous strain and temperature measurements. The misaligned peanut-shaped structure is formed by introducing a certain core-offset during fusion splicing. Through a simulation analysis of the sensor, the optical field distribution of the sensor structure under different offset amounts is obtained. The experimental results demonstrate that the sensor achieves a maximum strain sensitivity of −48.21 pm/µε with an offset of 35.61 µm under a strain range of 0–600 µε and a maximum temperature sensitivity of 124.29 pm/°C at a 24.35 µm offset with a temperature range of 35–95 °C. Meanwhile, the sensor with a 35.61 µm offset has two resonance peaks that are selected for simultaneous measurements, with strain sensitivities of −48.21 pm/µε and −47.04 pm/µε and temperature sensitivities of 75.71 pm/°C and 84.29 pm/°C, respectively. Therefore, the simultaneous measurement of strain and temperature can be achieved through a matrix method, demonstrating that the sensor possesses a dual-parameter sensing capability for strain and temperature.
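A worked numeric sketch of the matrix method the abstract mentions: the two resonance-peak wavelength shifts are inverted through the 2×2 sensitivity matrix to recover strain and temperature simultaneously. The sensitivities are the values reported in the abstract; the wavelength shifts are hypothetical inputs.

```python
import numpy as np

K = np.array([[-48.21, 75.71],    # peak 1 sensitivities: [pm/µε, pm/°C]
              [-47.04, 84.29]])   # peak 2 sensitivities: [pm/µε, pm/°C]
dlam = np.array([-9000.0, -8000.0])   # measured peak shifts in pm (made up)

# Solve K @ [strain, dT] = dlam for the two unknowns.
strain, temp = np.linalg.solve(K, dlam)
print(f"strain = {strain:.1f} µε, temperature change = {temp:.1f} °C")
```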

17 pages, 784 KB  
Article
A Wideband Oscillation Classification Method Based on Multimodal Feature Fusion
by Yingmin Zhang, Yixiong Liu, Zongsheng Zheng and Shilin Gao
Electronics 2026, 15(3), 682; https://doi.org/10.3390/electronics15030682 - 4 Feb 2026
Abstract
With the increasing penetration of renewable energy sources and power-electronic devices, modern power systems exhibit pronounced wideband oscillation characteristics with large frequency spans, strong modal coupling, and significant time-varying behaviors. Accurate identification and classification of wideband oscillation patterns have therefore become critical challenges for ensuring the secure and stable operation of “dual-high” power systems. Existing methods based on signal processing or single-modality deep-learning models often fail to fully exploit the complementary information embedded in heterogeneous data representations, resulting in limited performance when dealing with complex oscillation patterns. To address these challenges, this paper proposes a multimodal attention-based fusion network for wideband oscillation classification. A dual-branch deep-learning architecture is developed to process Gramian Angular Difference Field images and raw time-series signals in parallel, enabling collaborative extraction of global structural features and local temporal dynamics. An improved Inception module is employed in the image branch to enhance multi-scale spatial feature representation, while a gated recurrent unit network is utilized in the time-series branch to model dynamic evolution characteristics. Furthermore, an attention-based fusion mechanism is introduced to adaptively learn the relative importance of different modalities and perform dynamic feature aggregation. Extensive experiments are conducted using a dataset constructed from mathematical models and engineering-oriented simulations. Comparative studies and ablation studies demonstrate that the proposed method significantly outperforms conventional signal-processing-based approaches and single-modality deep-learning models in terms of classification accuracy, robustness, and generalization capability. The results confirm the effectiveness of multimodal feature fusion and attention mechanisms for accurate wideband oscillation classification, providing a promising solution for advanced power system monitoring and analysis.
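A brief sketch of the image-branch input: a Gramian Angular Difference Field computed from a raw oscillation record. The `pyts` library is assumed here as one convenient implementation; the paper does not specify one, and the signal and image size are placeholders.

```python
import numpy as np
from pyts.image import GramianAngularField

signal = np.sin(np.linspace(0, 20 * np.pi, 512))   # placeholder oscillation
gadf = GramianAngularField(image_size=64, method="difference")
image = gadf.fit_transform(signal.reshape(1, -1))[0]
print(image.shape)  # (64, 64) image fed to the Inception-style CNN branch
```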

18 pages, 3652 KB  
Article
Optimizing Foundation Model to Enhance Surface Water Segmentation with Multi-Modal Remote Sensing Data
by Guochao Hu, Mengmeng Shao, Kaiyuan Li, Xiran Zhou and Xiao Xie
Water 2026, 18(3), 382; https://doi.org/10.3390/w18030382 - 2 Feb 2026
Abstract
Water resources are of critical importance across all ecological, social, and economic realms. Accurate extraction of water bodies is of significance to estimate the spatial coverage of water resources and to mitigate water-related disasters. Single-modal remote sensing images are often insufficient for accurate water body extraction due to limitations in spectral information, weather conditions, and speckle noise. Furthermore, state-of-the-art deep learning models may be constrained by data extensibility, feature transferability, model scalability, and task producibility. This manuscript presents an integrated GeoAI framework that enhances foundation models for efficient water body extraction with multi-modal remote sensing images. The proposed framework consists of a data augmentation module tailored for optical and synthetic aperture radar (SAR) remote sensing images, as well as extraction modules augmented by three popular foundation models, namely SAM, SAMRS, and CROMA. Specifically, optical and SAR images are preprocessed and augmented independently, encoded through foundation model backbones, and subsequently decoded to generate water body segmentation masks under single-modal and multi-modal settings.
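A schematic PyTorch sketch (not the authors' framework) of the single- versus multi-modal setting described: optical and SAR images are encoded separately and their features fused before a decoder emits a water mask. Foundation-model backbones such as SAM, SAMRS, or CROMA would replace the stand-in convolutions below.

```python
import torch
import torch.nn as nn

enc_opt = nn.Conv2d(3, 16, 3, padding=1)   # stand-in for an optical encoder
enc_sar = nn.Conv2d(1, 16, 3, padding=1)   # stand-in for a SAR encoder
decoder = nn.Conv2d(32, 1, 1)              # stand-in for a mask decoder

optical = torch.randn(1, 3, 256, 256)
sar = torch.randn(1, 1, 256, 256)
fused = torch.cat([enc_opt(optical), enc_sar(sar)], dim=1)  # feature fusion
mask = torch.sigmoid(decoder(fused))       # per-pixel water probability
print(mask.shape)  # torch.Size([1, 1, 256, 256])
```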

27 pages, 20812 KB  
Article
A Lightweight Radar–Camera Fusion Deep Learning Model for Human Activity Recognition
by Minkyung Jeon and Sungmin Woo
Sensors 2026, 26(3), 894; https://doi.org/10.3390/s26030894 - 29 Jan 2026
Abstract
Human activity recognition in privacy-sensitive indoor environments requires sensing modalities that remain robust under illumination variation and background clutter while preserving user anonymity. To this end, this study proposes a lightweight radar–camera fusion deep learning model that integrates motion signatures from FMCW radar with coarse spatial cues from ultra-low-resolution camera frames. The radar stream is processed as a Range–Doppler–Time cube, where each frame is flattened and sequentially encoded using a Transformer-based temporal model to capture fine-grained micro-Doppler patterns. The visual stream employs a privacy-preserving 4×5-pixel camera input, from which a temporal sequence of difference frames is extracted and modeled with a dedicated camera Transformer encoder. The two modality-specific feature vectors—each representing the temporal dynamics of motion—are concatenated and passed through a lightweight fully connected classifier to predict human activity categories. A multimodal dataset of synchronized radar cubes and ultra-low-resolution camera sequences across 15 activity classes was constructed for evaluation. Experimental results show that the proposed fusion model achieves 98.74% classification accuracy, significantly outperforming single-modality baselines (single-radar and single-camera). Despite its performance, the entire model requires only 11 million floating-point operations (11 MFLOPs), making it highly efficient for deployment on embedded or edge devices.
(This article belongs to the Special Issue AI-Based Computer Vision Sensors & Systems—2nd Edition)
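A reduced PyTorch sketch of the fusion head the abstract describes: each modality's frame sequence is flattened, temporally encoded with a Transformer, and the two pooled feature vectors are concatenated into a small classifier over 15 activities. Dimensions are illustrative, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

d = 64
layer = lambda: nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True), 1)

radar_proj = nn.Linear(32 * 32, d)    # flatten each Range-Doppler frame
cam_proj = nn.Linear(4 * 5, d)        # flatten each 4x5 difference frame
radar_enc, cam_enc = layer(), layer()
classifier = nn.Linear(2 * d, 15)     # 15 activity classes

radar = torch.randn(8, 20, 32 * 32)   # (B, T, flattened RD frame), assumed size
camera = torch.randn(8, 20, 4 * 5)    # (B, T, flattened pixel frame)
r = radar_enc(radar_proj(radar)).mean(dim=1)   # temporal pooling per modality
c = cam_enc(cam_proj(camera)).mean(dim=1)
logits = classifier(torch.cat([r, c], dim=1))  # late concatenation fusion
print(logits.shape)  # torch.Size([8, 15])
```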

27 pages, 4885 KB  
Article
AI-Driven Multimodal Sensing for Early Detection of Health Disorders in Dairy Cows
by Agne Paulauskaite-Taraseviciene, Arnas Nakrosis, Judita Zymantiene, Vytautas Jurenas, Joris Vezys, Antanas Sederevicius, Romas Gruzauskas, Vaidas Oberauskas, Renata Japertiene, Algimantas Bubulis, Laura Kizauskiene, Ignas Silinskas, Juozas Zemaitis and Vytautas Ostasevicius
Animals 2026, 16(3), 411; https://doi.org/10.3390/ani16030411 - 28 Jan 2026
Abstract
Digital technologies that continuously quantify animal behavior, physiology, and production offer significant potential for the early identification of health and welfare disorders of dairy cows. In this study, a multimodal artificial intelligence (AI) framework is proposed for real-time health monitoring of dairy cows through the integration of physiological, behavioral, production, and thermal imaging data, targeting veterinarian-confirmed udder, leg, and hoof infections. Predictions are generated at the cow-day level by aggregating multimodal measurements collected during daily milking events. The dataset comprised 88 lactating cows, with the confirmed infections grouped under a single ‘sick’ label. To prevent information leakage, model evaluation was performed using a cow-level data split, ensuring that data from the same animal did not appear in both training and testing sets. The system is designed to detect early deviations from normal health trajectories prior to the appearance of overt clinical symptoms. All measurements, with the exception of the intra-ruminal bolus sensor, were obtained non-invasively within a commercial dairy farm equipped with automated milking and monitoring infrastructure. A key novelty of this work is the simultaneous integration of data from three independent sources: an automated milking system, a thermal imaging camera, and an intra-ruminal bolus sensor. A hybrid deep learning architecture is introduced that combines the core components of established models, including U-Net, O-Net, and ResNet, to exploit their complementary strengths for the analysis of dairy cow health states. The proposed multimodal approach achieved an overall accuracy of 91.62% and an AUC of 0.94 and improved classification performance by up to 3% compared with single-modality models, demonstrating enhanced robustness and sensitivity to early-stage disease.
(This article belongs to the Section Animal Welfare)
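A short sketch of the leakage-preventing, cow-level split the abstract highlights: scikit-learn's GroupShuffleSplit keeps all of a cow's days on one side of the split. Data shapes and feature counts here are placeholders.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

X = np.random.rand(500, 12)             # 500 cow-day feature vectors (assumed)
y = np.random.randint(0, 2, 500)        # 0 = healthy, 1 = 'sick'
cow_id = np.random.randint(0, 88, 500)  # 88 cows, as in the study

gss = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(gss.split(X, y, groups=cow_id))
# No cow contributes data to both sides, so no information leakage.
assert not set(cow_id[train_idx]) & set(cow_id[test_idx])
```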

19 pages, 1364 KB  
Article
Sleep Staging Method Based on Multimodal Physiological Signals Using Snake–ACO
by Wenjing Chu, Chen Wang, Liuwang Yang, Lin Guo, Chuquan Wu, Binhui Wang and Xiangkui Wan
Appl. Sci. 2026, 16(3), 1316; https://doi.org/10.3390/app16031316 - 28 Jan 2026
Abstract
Non-invasive electrocardiogram (ECG) and respiratory signals are easy to acquire via low-cost sensors, making them promising alternatives for sleep staging. However, existing methods using these signals often yield insufficient accuracy. To address this challenge, we incrementally optimized the sleep staging model by designing a structured experimental workflow: we first preprocessed respiratory and ECG signals, then extracted fused features using an enhanced feature selection technique, which not only reduces redundant features but also significantly improves the class discriminability of features. The resulting fused features serve as a reliable feature subset for the classifier. In parallel, we proposed a hybrid optimization algorithm that integrates the snake optimization algorithm (SO) and ant colony optimization algorithm (ACO) for automated hyperparameter optimization of support vector machines (SVMs). Experiments were conducted using two PSG-derived public datasets, the Sleep Heart Health Study (SHHS) and MIT-BIH Polysomnography Database (MIT-BPD), to evaluate the classification performance of multimodal features compared with single-modal features. Results demonstrate that the bimodal staging using SHHS multimodal signals significantly outperformed single-modal ECG-based methods, and the overall accuracy of the SHHS dataset was improved by 12%. The SVM model optimized using the hybrid Snake–ACO algorithm achieved an average accuracy of 89.6% for wake versus sleep classification on the SHHS dataset, representing a 5.1% improvement over traditional grid search methods. Under the subject-independent partitioning experiment, the wake versus sleep classification task maintained good stability with only a 1.8% reduction in accuracy. This study provides novel insights for non-invasive sleep monitoring and clinical decision support.
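A toy stand-in for the hybrid Snake–ACO search: a simple population-style random search over the SVM hyperparameters (C, gamma) using cross-validated accuracy as fitness. The actual SO and ACO update rules are not reproduced here; this only illustrates the metaheuristic-tunes-SVM pattern on synthetic data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=20, random_state=0)
rng = np.random.default_rng(0)

best = (-np.inf, None)
for _ in range(20):                          # 20 candidate "agents"
    C, gamma = 10 ** rng.uniform(-2, 3), 10 ** rng.uniform(-4, 1)
    fitness = cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=3).mean()
    if fitness > best[0]:
        best = (fitness, (C, gamma))
print(best)   # best CV accuracy and the (C, gamma) that achieved it
```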

26 pages, 9070 KB  
Article
Research on a General-Type Hydraulic Valve Leakage Diagnosis Method Based on CLAF-MTL Feature Deep Integration
by Chengbiao Tong, Yu Xiong, Xinming Xu and Yihua Wu
Sensors 2026, 26(3), 821; https://doi.org/10.3390/s26030821 - 26 Jan 2026
Abstract
As control and execution components within hydraulic systems, hydraulic valves are critical to system efficiency and operational safety. However, existing research primarily focuses on specific valve designs, exhibiting limitations in versatility and task coordination that constrain their comprehensive diagnostic capabilities. To address these issues, this paper innovatively proposes a multi-modal feature deep fusion multi-task prediction (CLAF-MTL) model. This model employs a core architecture based on the CNN-LSTM-Additive Attention module and a fully connected network (FCN) for multi-domain features, while simultaneously embedding a multi-task learning mechanism. It resolves the multi-task prediction challenge of leakage rate regression and fault type classification, significantly enhancing diagnostic efficiency and practicality. This model innovatively designs a complementary fusion mechanism of “deep auto-features + multi-domain features”, overcoming the limitations of single-modality representation. It integrates leakage rate regression and fault type classification into a unified modeling framework, dynamically optimizing dual-task weights via the MGDA-UB algorithm to achieve bidirectional complementarity between tasks. Experimental results demonstrate that the proposed method achieves an R² of 0.9784 for leakage rate prediction and a fault type identification accuracy of 92.23% on the test set. Compared to traditional approaches, this method is the first to simultaneously address the challenge of accurately predicting both leakage rate and fault type. It exhibits superior robustness and applicability across generic valve scenarios, providing an effective solution for intelligent monitoring of valve leakage faults in hydraulic systems.
(This article belongs to the Section Fault Diagnosis & Sensors)
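A skeletal PyTorch version of the dual-task idea: one shared encoder feeding a regression head (leakage rate) and a classification head (fault type). The MGDA-UB weight balancing is simplified to equal fixed weights here, and all dimensions are illustrative.

```python
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    def __init__(self, in_dim=40, hidden=64, n_faults=5):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.reg_head = nn.Linear(hidden, 1)         # leakage rate
        self.cls_head = nn.Linear(hidden, n_faults)  # fault type

    def forward(self, x):
        h = self.shared(x)                           # shared representation
        return self.reg_head(h).squeeze(-1), self.cls_head(h)

net = MultiTaskNet()
x = torch.randn(16, 40)                    # fused multi-domain feature vectors
rate_true = torch.rand(16)
fault_true = torch.randint(0, 5, (16,))
rate_pred, fault_logits = net(x)
# Joint loss; a method like MGDA-UB would adapt these task weights instead.
loss = (nn.MSELoss()(rate_pred, rate_true)
        + nn.CrossEntropyLoss()(fault_logits, fault_true))
loss.backward()
print(float(loss))
```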
