Search Results (1,078)

Search Parameters:
Keywords = multi-modal neural network

19 pages, 1973 KB  
Article
Continuous Smartphone Authentication via Multimodal Biometrics and Optimized Ensemble Learning
by Chia-Sheng Cheng, Ko-Chien Chang, Hsing-Chung Chen and Chao-Lung Chou
Mathematics 2026, 14(2), 311; https://doi.org/10.3390/math14020311 - 15 Jan 2026
Abstract
The ubiquity of smartphones has transformed them into primary repositories of sensitive data; however, traditional one-time authentication mechanisms create a critical trust gap by failing to verify identity post-unlock. Our aim is to mitigate these vulnerabilities and align with the Zero Trust Architecture (ZTA) framework and philosophy of “never trust, always verify,” as formally defined by the National Institute of Standards and Technology (NIST) in Special Publication 800-207. This study introduces a robust continuous authentication (CA) framework leveraging multimodal behavioral biometrics. A dedicated application was developed to synchronously capture touch, sliding, and inertial sensor telemetry. For feature modeling, a heterogeneous deep learning pipeline was employed to capture modality-specific characteristics, utilizing Convolutional Neural Networks (CNNs) for sensor data, Long Short-Term Memory (LSTM) networks for curvilinear sliding, and Gated Recurrent Units (GRUs) for discrete touch. To resolve performance degradation caused by class imbalance in Zero Trust environments, a Grid Search Optimization (GSO) strategy was applied to optimize a weighted voting ensemble, identifying the global optimum for decision thresholds and modality weights. Empirical validation on a dataset of 35,519 samples from 15 subjects demonstrates that the optimized ensemble achieves a peak accuracy of 99.23%. Sensor kinematics emerged as the primary biometric signature, followed by touch and sliding features. This framework enables high-precision, non-intrusive continuous verification, bridging the critical security gap in contemporary mobile architectures.
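A minimal Python sketch of the grid-search step described above, jointly tuning modality weights and the decision threshold of a weighted-voting ensemble; the function name, score arrays, grid step, and accuracy objective are illustrative assumptions, not the authors' code.

```python
import itertools
import numpy as np

def grid_search_ensemble(scores_touch, scores_slide, scores_sensor, labels, step=0.05):
    """Exhaustively search modality weights (summing to 1) and a decision
    threshold that maximize accuracy of a weighted-voting ensemble.
    Each scores_* array holds per-sample genuine-user probabilities."""
    grid = np.arange(0.0, 1.0 + step / 2, step)
    best_acc, best_cfg = 0.0, None
    for w1, w2 in itertools.product(grid, grid):
        w3 = 1.0 - w1 - w2
        if w3 < -1e-9:              # weights must sum to one
            continue
        w3 = max(w3, 0.0)
        fused = w1 * scores_touch + w2 * scores_slide + w3 * scores_sensor
        for thr in grid:
            acc = np.mean((fused >= thr) == labels)
            if acc > best_acc:
                best_acc, best_cfg = acc, (w1, w2, w3, thr)
    return best_acc, best_cfg
```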

45 pages, 9328 KB  
Review
Advancements in Machine Learning-Assisted Flexible Electronics: Technologies, Applications, and Future Prospects
by Hao Su, Hongcun Wang, Dandan Sang, Santosh Kumar, Dao Xiao, Jing Sun and Qinglin Wang
Biosensors 2026, 16(1), 58; https://doi.org/10.3390/bios16010058 - 13 Jan 2026
Abstract
The integration of flexible electronics and machine learning (ML) algorithms has become a revolutionary force driving the field of intelligent sensing, giving rise to a new generation of intelligent devices and systems. This article provides a systematic review of core technologies and practical applications of ML in flexible electronics. It focuses on analyzing the theoretical frameworks of algorithms such as the Long Short-Term Memory Network (LSTM), Convolutional Neural Network (CNN), and Reinforcement Learning (RL) in the intelligent processing of sensor signals (IPSS), multimodal feature extraction (MFE), process defect and anomaly detection (PDAD), and data compression and edge computing (DCEC). This study explores the performance advantages of these technologies in optimizing signal analysis accuracy, compensating for interference in high-noise environments, and optimizing manufacturing process parameters, among others, and empirically analyzes their potential applications in wearable health monitoring systems, intelligent control of soft robots, performance optimization of self-powered devices, and intelligent perception of epidermal electronic systems.

22 pages, 4957 KB  
Article
Machine Learning-Based Algorithm for the Design of Multimode Interference Nanodevices
by Roney das Mercês Cerqueira, Vitaly Félix Rodriguez-Esquerre and Anderson Dourado Sisnando
Nanomanufacturing 2026, 6(1), 3; https://doi.org/10.3390/nanomanufacturing6010003 - 13 Jan 2026
Abstract
Multimode interference photonic nanodevices have been increasingly used due to their broad functionality. In this study, we present a methodology based on machine learning algorithms for inverse design capable of providing the output port position (x-axis coordinate) and MMI region length (y-axis coordinate) for achieving higher optical signal transfer power. This is sufficient to design Multimode Interference 1 × 2, 1 × 3, and 1 × 4 nanodevices as power splitters in the wavelength range between 1350 and 1600 nm, which corresponds to the E, S, C, and L bands of the optical communications window. Using Multilayer Perceptron artificial neural networks, trained with k-fold cross-validation, we successfully modeled the complex relationship between geometric parameters and optical responses with high precision and low computational cost. The resulting designs meet the requirements of photonic device projects of this nature, demonstrating excellent performance and manufacturing tolerance, with insertion losses ranging from 0.34 dB to 0.58 dB.
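The MLP-plus-k-fold setup can be sketched with scikit-learn as follows; the synthetic data, feature choices, and network size are placeholders, not the paper's configuration.

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# X: desired optical response (e.g., wavelength in nm, target transmitted power);
# y: geometry to recover (output-port x coordinate, MMI region length), i.e. the
# inverse-design direction the abstract describes. Data here are placeholders.
rng = np.random.default_rng(0)
X = rng.uniform([1350, 0.5], [1600, 1.0], size=(500, 2))
y = rng.uniform([0.0, 5.0], [3.0, 50.0], size=(500, 2))

model = make_pipeline(StandardScaler(),
                      MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000,
                                   random_state=0))
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="r2")
print(f"5-fold R^2: {scores.mean():.3f} +/- {scores.std():.3f}")
```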

19 pages, 1048 KB  
Article
Differentiated Information Mining: Semi-Supervised Graph Learning with Independent Patterns
by Kai Liu and Long Wang
Mathematics 2026, 14(2), 279; https://doi.org/10.3390/math14020279 - 12 Jan 2026
Abstract
Graph pseudo-labeling is an effective semi-supervised learning (SSL) approach to improve graph neural networks (GNNs) by leveraging unlabeled data. However, its success heavily depends on the reliability of pseudo-labels, which can often result in confirmation bias and training instability. To address these challenges, we propose a dual-layer consistency semi-supervised framework (DiPat), which integrates an internal differentiating pattern consistency mechanism and an external multimodal knowledge verification mechanism. In the internal layer, DiPat extracts multiple differentiating patterns from a single information source and enforces their consistency to improve the reliability of intrinsic decisions. During the supervised training phase, the model learns to extract and separate these patterns. In the semi-supervised learning phase, the model progressively selects highly consistent samples and ranks pseudo-labels based on the minimum margin principle, mitigating the overconfidence problem common in confidence-based or ensemble-based methods. In the external layer, DiPat also integrates large multimodal language models (MLLMs) as auxiliary information sources. These models provide latent textual knowledge to cross-validate internal decisions and introduce a responsibility scoring mechanism to filter out inconsistent or unreliable external judgments. Extensive experiments on multiple benchmark datasets show that DiPat demonstrates superior robustness and generalization in low-label settings, consistently outperforming strong baseline methods.
(This article belongs to the Section E1: Mathematics and Computer Science)
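One plausible reading of the minimum-margin ranking step is sketched below; the probability matrix, consistency mask, and smallest-margin-first ordering are assumptions, and DiPat's actual selection rule may differ.

```python
import numpy as np

def rank_pseudo_labels(probs, consistency_mask):
    """probs: (n_unlabeled, n_classes) softmax outputs; consistency_mask:
    boolean array marking samples whose differentiating patterns agree.
    Returns candidate indices ordered smallest top-1/top-2 margin first
    (avoiding overconfident picks), with their argmax pseudo-labels."""
    part = np.partition(probs, -2, axis=1)
    margin = part[:, -1] - part[:, -2]          # top-1 minus top-2 probability
    candidates = np.where(consistency_mask)[0]
    order = candidates[np.argsort(margin[candidates])]
    return order, probs.argmax(axis=1)[order]
```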

25 pages, 2897 KB  
Review
Integrating UAVs and Deep Learning for Plant Disease Detection: A Review of Techniques, Datasets, and Field Challenges with Examples from Cassava
by Wasiu Akande Ahmed, Olayinka Ademola Abiola, Dongkai Yang, Seyi Festus Olatoyinbo and Guifei Jing
Horticulturae 2026, 12(1), 87; https://doi.org/10.3390/horticulturae12010087 - 12 Jan 2026
Abstract
Cassava remains a critical food-security crop across Africa and Southeast Asia but is highly vulnerable to diseases such as cassava mosaic disease (CMD) and cassava brown streak disease (CBSD). Traditional diagnostic approaches are slow, labor-intensive, and inconsistent under field conditions. This review synthesizes current advances in combining unmanned aerial vehicles (UAVs) with deep learning (DL) to enable scalable, data-driven cassava disease detection. It examines UAV platforms, sensor technologies, flight protocols, image preprocessing pipelines, DL architectures, and existing datasets, and it evaluates how these components interact within UAV–DL disease-monitoring frameworks. The review also compares model performance across convolutional neural network-based and Transformer-based architectures, highlighting metrics such as accuracy, recall, F1-score, inference speed, and deployment feasibility. Persistent challenges—such as limited UAV-acquired datasets, annotation inconsistencies, geographic model bias, and inadequate real-time deployment—are identified and discussed. Finally, the paper proposes a structured research agenda including lightweight edge-deployable models, UAV-ready benchmarking protocols, and multimodal data fusion. This review provides a consolidated reference for researchers and practitioners seeking to develop practical and scalable cassava-disease detection systems.

20 pages, 7206 KB  
Article
Effect Investigation of Process Parameters on 3D Printed Composites Tensile Performance Boosted by Attention Mechanism-Enhanced Multi-Modal Convolutional Neural Networks
by Zeyuan Gao, Zhibin Han, Yaoming Fu, Huiyang Lv, Meng Li, Xin Zhao and Jianjian Zhu
Polymers 2026, 18(2), 203; https://doi.org/10.3390/polym18020203 - 12 Jan 2026
Abstract
Fused Deposition Modeling (FDM) is a widely used additive manufacturing technique that enables the fabrication of components using polymeric and composite materials; however, the mechanical performance of printed parts is jointly influenced by multiple printing parameters, which complicates the control and prediction of their mechanical properties. In this study, an attention-enhanced multi-modal convolutional neural network (ATT-MM-CNN) is developed to predict the tensile performance of carbon fiber reinforced polylactic acid (PLA-CF) composites manufactured by FDM. Four key printing parameters (layer thickness, nozzle temperature, material flow rate, and printing speed) are systematically investigated, resulting in 256 parameter combinations and corresponding tensile test data for constructing a multi-modal dataset. By integrating multi-modal feature representations and incorporating an attention mechanism, the proposed model effectively learns the nonlinear relationships between printing parameters and mechanical performance under multi-parameter conditions. The results show that all evaluation metrics, including accuracy, precision, recall, and F1-score, exceed 0.95, and the prediction accuracy is improved by at least 17.3% compared with baseline models. These findings demonstrate that the proposed ATT-MM-CNN provides an effective and reliable framework for tensile property prediction and process-parameter optimization of FDM-printed composite structures.
(This article belongs to the Section Artificial Intelligence in Polymer Science)
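A minimal PyTorch sketch of attention-weighted fusion over per-modality feature vectors, the kind of mechanism the ATT-MM-CNN abstract describes; the dimensions and the single-linear scoring head are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ModalityAttentionFusion(nn.Module):
    """Score each modality's feature vector, softmax the scores into
    attention weights, and return the attention-weighted sum."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, feats):                    # feats: (batch, n_modalities, dim)
        attn = torch.softmax(self.score(feats).squeeze(-1), dim=1)  # (B, M)
        return (attn.unsqueeze(-1) * feats).sum(dim=1)              # (B, dim)

# e.g., fuse features from two modality branches (process parameters, test signals)
fusion = ModalityAttentionFusion(dim=128)
fused = fusion(torch.randn(8, 2, 128))           # -> torch.Size([8, 128])
```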

31 pages, 10745 KB  
Article
CNN-GCN Coordinated Multimodal Frequency Network for Hyperspectral Image and LiDAR Classification
by Haibin Wu, Haoran Lv, Aili Wang, Siqi Yan, Gabor Molnar, Liang Yu and Minhui Wang
Remote Sens. 2026, 18(2), 216; https://doi.org/10.3390/rs18020216 - 9 Jan 2026
Abstract
The existing multimodal image classification methods often suffer from several key limitations: difficulty in effectively balancing local detail and global topological relationships in hyperspectral image (HSI) feature extraction; insufficient multi-scale characterization of terrain features from light detection and ranging (LiDAR) elevation data; and neglect of deep inter-modal interactions in traditional fusion methods, often accompanied by high computational complexity. To address these issues, this paper proposes a comprehensive deep learning framework combining a convolutional neural network (CNN), a graph convolutional network (GCN), and wavelet transform for the joint classification of HSI and LiDAR data, including several novel components: a Spectral Graph Mixer Block (SGMB), where a CNN branch captures fine-grained spectral–spatial features by multi-scale convolutions, while a parallel GCN branch models long-range contextual features through an enhanced gated graph network. This dual-path design enables simultaneous extraction of local detail and global topological features from HSI data; a Spatial Coordinate Block (SCB) to enhance spatial awareness and improve the perception of object contours and distribution patterns; a Multi-Scale Elevation Feature Extraction Block (MSFE) for capturing terrain representations across varying scales; and a Bidirectional Frequency Attention Encoder (BiFAE) to enable efficient and deep interaction between multimodal features. These modules are intricately designed to work in concert, forming a cohesive end-to-end framework, which not only achieves a more effective balance between local details and global contexts but also enables deep yet computationally efficient interaction across features, significantly strengthening the discriminability and robustness of the learned representation. To evaluate the proposed method, we conducted experiments on three multimodal remote sensing datasets: Houston2013, Augsburg, and Trento. Quantitative results demonstrate that our framework outperforms state-of-the-art methods, achieving OA values of 98.93%, 88.05%, and 99.59% on the respective datasets.
(This article belongs to the Section AI Remote Sensing)
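A loose sketch of the dual-path idea behind the SGMB, assuming a plain 2D convolution for the local branch and a one-layer graph convolution (normalized adjacency times features times weights) for the contextual branch; the paper's actual blocks are considerably richer.

```python
import torch
import torch.nn as nn

class DualPathBlock(nn.Module):
    """CNN branch for fine-grained local detail plus a one-layer graph
    convolution for long-range context; the two feature streams would be
    fused by downstream modules. All shapes here are illustrative."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.cnn = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.gcn_w = nn.Linear(in_ch, out_ch)

    def forward(self, x_img, x_nodes, a_hat):
        # x_img: (B, C, H, W) HSI patch; x_nodes: (N, C) node features;
        # a_hat: (N, N) normalized adjacency of the spatial graph
        local = torch.relu(self.cnn(x_img))
        context = torch.relu(a_hat @ self.gcn_w(x_nodes))
        return local, context
```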

28 pages, 12746 KB  
Article
Spatiotemporal Dynamics of Forest Biomass in the Hainan Tropical Rainforest Based on Multimodal Remote Sensing and Machine Learning
by Zhikuan Liu, Qingping Ling, Wenlu Zhao, Zhongke Feng, Huiqing Pei, Pietro Grimaldi and Zixuan Qiu
Forests 2026, 17(1), 85; https://doi.org/10.3390/f17010085 - 8 Jan 2026
Abstract
Tropical rainforests play a vital role in maintaining global ecological balance, carbon cycling, and biodiversity conservation, making research on their biomass dynamics scientifically significant. This study integrates multi-source remote sensing data, including canopy height derived from GEDI and ICESat-2 satellite-borne lidar, Landsat imagery, and environmental variables, to estimate forest biomass dynamics in Hainan’s tropical rainforests at a 30 m spatial resolution, together with a correlation analysis of factors influencing spatiotemporal changes in Hainan Tropical Rainforest biomass. The research aims to investigate the spatiotemporal variations in forest biomass and identify key environmental drivers influencing biomass accumulation. Four machine learning algorithms—Backpropagation Neural Network (BP), Convolutional Neural Network (CNN), Random Forest (RF), and Gradient Boosting Decision Tree (GBDT)—were applied to estimate biomass across five forest types from 2003 to 2023. Results indicate the Random Forest model achieved the highest accuracy (R² = 0.82). Forest biomass and carbon stocks in Hainan Tropical Rainforest National Park increased significantly, with total carbon stocks rising from 29.03 million tons of carbon to 42.47 million tons of carbon—a 46.36% increase over 20 years. These findings demonstrate that integrating multimodal remote sensing data with advanced machine learning provides an effective approach for accurately assessing biomass dynamics, supporting forest management and carbon sink evaluations in tropical rainforest ecosystems.
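The Random Forest regression at the core of the biomass estimate is standard; a brief scikit-learn sketch with placeholder data follows (the predictor columns stand in for canopy height, Landsat bands, and environmental variables).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Placeholder predictors and biomass targets (t/ha); real inputs would be
# per-pixel canopy height, spectral bands/indices, and environmental layers.
rng = np.random.default_rng(42)
X = rng.normal(size=(2000, 10))
y = 50 + 30 * X[:, 0] + rng.normal(scale=5, size=2000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
rf = RandomForestRegressor(n_estimators=500, random_state=42).fit(X_tr, y_tr)
print(f"R^2 = {r2_score(y_te, rf.predict(X_te)):.2f}")
```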

15 pages, 1386 KB  
Article
Symmetry and Asymmetry Principles in Deep Speaker Verification Systems: Balancing Robustness and Discrimination Through Hybrid Neural Architectures
by Sundareswari Thiyagarajan and Deok-Hwan Kim
Symmetry 2026, 18(1), 121; https://doi.org/10.3390/sym18010121 - 8 Jan 2026
Abstract
Symmetry and asymmetry are foundational design principles in artificial intelligence, defining the balance between invariance and adaptability in multimodal learning systems. In audio-visual speaker verification, where speech and lip-motion features are jointly modeled to determine whether two utterances belong to the same individual, these principles govern both fairness and discriminative power. In this work, we analyze how symmetry and asymmetry emerge within a gated-fusion architecture that integrates Time-Delay Neural Networks and Bidirectional Long Short-Term Memory encoders for speech, ResNet-based visual lip encoders, and a shared Conformer-based temporal backbone. Structural symmetry is preserved through weight-sharing across paired utterances and symmetric cosine-based scoring, ensuring verification consistency regardless of input order. In contrast, asymmetry is intentionally introduced through modality-dependent temporal encoding, multi-head attention pooling, and a learnable gating mechanism that dynamically re-weights the contribution of audio and visual streams at each timestep. This controlled asymmetry allows the model to rely on visual cues when speech is noisy, and conversely on speech when lip visibility is degraded, yielding adaptive robustness under cross-modal degradation. Experimental results demonstrate that combining symmetric embedding space design with adaptive asymmetric fusion significantly improves generalization, reducing the Equal Error Rate (EER) to 3.419% on the VoxCeleb-2 test dataset without sacrificing interpretability. The findings show that symmetry ensures stable and fair decision-making, while learnable asymmetry enables modality awareness, together forming a principled foundation for next-generation audio-visual speaker verification systems.
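A compact sketch of the two ideas named in the abstract: a learnable per-timestep gate over audio and visual streams, and order-invariant cosine scoring. Encoder details, pooling, and dimensions are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedAVFusion(nn.Module):
    """Per-timestep gate that re-weights audio vs. visual features; when one
    modality is degraded, the gate can shift weight toward the other."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, audio, visual):           # each: (B, T, dim)
        g = torch.sigmoid(self.gate(torch.cat([audio, visual], dim=-1)))
        fused = g * audio + (1 - g) * visual    # adaptive modality weighting
        return fused.mean(dim=1)                # simple temporal pooling

def verify(emb_a, emb_b, threshold=0.5):
    # cosine similarity is symmetric in its arguments, so the decision
    # does not depend on which utterance comes first
    return F.cosine_similarity(emb_a, emb_b, dim=-1) > threshold
```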

27 pages, 13798 KB  
Article
A Hierarchical Deep Learning Architecture for Diagnosing Retinal Diseases Using Cross-Modal OCT to Fundus Translation in the Lack of Paired Data
by Ekaterina A. Lopukhova, Gulnaz M. Idrisova, Timur R. Mukhamadeev, Grigory S. Voronkov, Ruslan V. Kutluyarov and Elizaveta P. Topolskaya
J. Imaging 2026, 12(1), 36; https://doi.org/10.3390/jimaging12010036 - 8 Jan 2026
Abstract
The paper focuses on automated diagnosis of retinal diseases, particularly Age-related Macular Degeneration (AMD) and diabetic retinopathy (DR), using optical coherence tomography (OCT), while addressing three key challenges: disease comorbidity, severe class imbalance, and the lack of strictly paired OCT and fundus data. We propose a hierarchical modular deep learning system designed for multi-label OCT screening with conditional routing to specialized staging modules. To enable DR staging when fundus images are unavailable, we use cross-modal alignment between OCT and fundus representations. This approach involves training a latent bridge that projects OCT embeddings into the fundus feature space. We enhance clinical reliability through per-class threshold calibration and implement quality control checks for OCT-only DR staging. Experiments demonstrate robust multi-label performance (macro-F1 = 0.989 ± 0.006 after per-class threshold calibration) and reliable calibration (ECE = 2.1 ± 0.4%), and OCT-only DR staging is feasible in 96.1% of cases that meet the quality control criterion.
(This article belongs to the Section Medical Imaging)
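The latent bridge can be illustrated as a small projection network with an alignment loss; the MLP shape and the cosine objective are assumptions, since the paper works without strictly paired data and its exact alignment procedure may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentBridge(nn.Module):
    """Projects OCT embeddings into the fundus feature space so a
    fundus-trained staging head can run when no fundus image exists."""
    def __init__(self, oct_dim, fundus_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(oct_dim, 512), nn.ReLU(),
                                 nn.Linear(512, fundus_dim))

    def forward(self, oct_emb):
        return self.net(oct_emb)

def alignment_loss(bridged, fundus_emb):
    # pull projected OCT embeddings toward corresponding fundus embeddings
    return 1 - F.cosine_similarity(bridged, fundus_emb, dim=-1).mean()
```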

34 pages, 6460 KB  
Article
Explainable Gait Multi-Anchor Space-Aware Temporal Convolutional Networks for Gait Recognition in Neurological, Orthopedic, and Healthy Cohorts
by Abdullah Alharthi
Mathematics 2026, 14(2), 230; https://doi.org/10.3390/math14020230 - 8 Jan 2026
Abstract
Gait recognition using wearable sensor data is crucial for healthcare, rehabilitation, and monitoring neurological and musculoskeletal disorders. This study proposes a deep learning framework for gait classification using inertial measurements from four body-mounted IMU sensors (head, lower back, and both feet). The data were collected from a publicly available, clinically annotated dataset comprising 1356 gait trials from 260 individuals with diverse pathologies. The framework, G-MASA-TCN (Gait Multi-Anchor, Space-Aware Temporal Convolutional Network), integrates multi-scale temporal fusion, graph-informed spatial modeling, and residual dilated convolutions to extract discriminative gait signatures. To ensure both high performance and interpretability, Integrated Gradients is incorporated as an explainable AI (XAI) method, providing sensor-level and temporal attributes that reveal the features driving model decisions. The framework is evaluated via repeated cross-validation experiments, reporting detailed metrics with cross-run statistical analysis (mean ± standard deviation) to assess robustness. Results show that G-MASA-TCN achieves 98% classification accuracy for neurological, orthopedic, and healthy cohorts, demonstrating superior stability and resilience compared to baseline architectures, including Gated Recurrent Unit (GRU), Transformer neural networks, and standard TCNs, and 98.4% accuracy in identifying individual subjects based on gait. Furthermore, the model offers clinically meaningful insights into which sensors and gait phases contribute most to its predictions. This work presents an accurate, interpretable, and reliable tool for gait pathology recognition, with potential for translation to real-world clinical settings.
(This article belongs to the Special Issue Deep Neural Network: Theory, Algorithms and Applications)
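A minimal residual dilated-convolution block of the kind TCNs stack with geometrically growing dilations; channel counts and the four-block stack below are illustrative.

```python
import torch
import torch.nn as nn

class ResidualDilatedBlock(nn.Module):
    """Causal dilated 1D convolution with a residual connection; the
    trailing padding is trimmed so output length matches input length."""
    def __init__(self, channels, kernel_size=3, dilation=1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(channels, channels, kernel_size,
                              padding=self.pad, dilation=dilation)

    def forward(self, x):                        # x: (B, C, T) IMU time series
        out = self.conv(x)[..., :x.size(-1)]     # keep causal length
        return torch.relu(out) + x               # residual connection

# doubling the dilation each block grows the receptive field exponentially
tcn = nn.Sequential(*[ResidualDilatedBlock(64, dilation=2**i) for i in range(4)])
```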

27 pages, 712 KB  
Review
Segmentation and Classification of Lung Cancer Images Using Deep Learning
by Xiaoli Yang, Angchao Duan, Ziyan Jiang, Xiao Li, Chenchen Wang, Jiawen Wang and Jiayi Zhou
Appl. Sci. 2026, 16(2), 628; https://doi.org/10.3390/app16020628 - 7 Jan 2026
Abstract
Lung cancer ranks among the world’s most prevalent and deadly diseases. Early detection is crucial for improving patient survival rates. Computed tomography (CT) is a common method for lung cancer screening and diagnosis. With the advancement of computer-aided diagnosis (CAD) systems, deep learning (DL) technologies have been extensively explored to aid in interpreting CT images for lung cancer identification. Therefore, this review aims to comprehensively examine DL techniques developed for lung cancer screening and diagnosis. It explores various datasets that play a crucial role in lung cancer CT image segmentation and classification tasks, analyzing their differences in aspects such as scale. Next, various evaluation metrics for measuring model performance are discussed. The segmentation section details convolutional neural network-based (CNN-based) segmentation methods, segmentation approaches using U-shaped network (U-Net) architectures, and the application and improvements of Transformer models in this domain. The classification section covers CNN-based classification methods, classification methods incorporating attention mechanisms, Transformer-based classification methods, and ensemble learning approaches. Finally, the paper summarizes the development of segmentation and classification techniques for lung cancer CT images, identifies current challenges, and outlines future research directions in areas such as dataset annotation, multimodal dataset construction, multi-model fusion, and model interpretability.

28 pages, 3824 KB  
Article
Comparison Between Early and Intermediate Fusion of Multimodal Techniques: Lung Disease Diagnosis
by Ahad Alloqmani and Yoosef B. Abushark
AI 2026, 7(1), 16; https://doi.org/10.3390/ai7010016 - 7 Jan 2026
Abstract
Early and accurate diagnosis of lung diseases is essential for effective treatment and patient management. Conventional diagnostic models trained on a single data type often miss important clinical information. This study explored a multimodal deep learning framework that integrates cough sounds, chest radiographs (X-rays), and computed tomography (CT) scans to enhance disease classification performance. Two fusion strategies, early and intermediate fusion, were implemented and evaluated against three single-modality baselines. The dataset was collected from different sources. Each dataset underwent preprocessing steps, including noise removal, grayscale conversion, image cropping, and class balancing, to ensure data quality. Convolutional neural network (CNN) and Extreme Inception (Xception) architectures were used for feature extraction and classification. The results show that multimodal learning achieves superior performance compared with single models. The intermediate fusion model achieved 98% accuracy, while the early fusion model reached 97%. In contrast, the single chest X-ray (CXR) and CT models each achieved 94%, and the cough sound model achieved 79%. These results confirm that multimodal integration, particularly intermediate fusion, offers a more reliable framework for automated lung disease diagnosis.
(This article belongs to the Section Medical & Healthcare AI)
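The two fusion strategies compared here differ mainly in where concatenation happens; below is a short PyTorch sketch of intermediate fusion (with early fusion noted in a comment), where the encoder modules and dimensions are assumptions.

```python
import torch
import torch.nn as nn

class IntermediateFusion(nn.Module):
    """Encode each modality separately, then concatenate the feature
    vectors before a shared classifier head (intermediate fusion).
    Early fusion would instead concatenate the raw, size-aligned inputs
    and feed a single network."""
    def __init__(self, enc_cough, enc_cxr, enc_ct, feat_dim, n_classes):
        super().__init__()
        self.encoders = nn.ModuleList([enc_cough, enc_cxr, enc_ct])
        self.head = nn.Linear(3 * feat_dim, n_classes)

    def forward(self, cough, cxr, ct):
        feats = [enc(x) for enc, x in zip(self.encoders, (cough, cxr, ct))]
        return self.head(torch.cat(feats, dim=-1))  # fuse at feature level
```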

29 pages, 5843 KB  
Article
A Multi-Level Hybrid Architecture for Structured Sentiment Analysis
by Altanbek Zulkhazhav, Gulmira Bekmanova, Banu Yergesh, Aizhan Nazyrova, Zhanar Lamasheva and Gaukhar Aimicheva
Electronics 2026, 15(2), 249; https://doi.org/10.3390/electronics15020249 - 6 Jan 2026
Abstract
This paper presents a hybrid architecture for automatic sentiment analysis of Kazakh-language political discourse. The Kazakh language is characterized by an agglutinative structure, a complex word-formation system, and the limited availability of digital resources, which significantly complicates the application of standard neural network approaches. To account for these characteristics, a multi-level system was developed that combines morphological and syntactic analysis rules, ontological relationships between political concepts, and multilingual representations of the XLM-R model, used in zero-shot mode. A corpus of 12,000 sentences was annotated for sentiment polarity and used for training and evaluation, while Universal Dependencies annotation was applied for morpho-syntactic analysis. Rule-based components compensate for errors related to affixation variability, modality, and directive constructions. An ontology comprising over 300 domain concepts ensures the correct interpretation of set expressions, terms, and political actors. Experimental results show that the proposed hybrid architecture outperforms both neural network baseline models and purely rule-based solutions, achieving Macro-F1 = 0.81. Ablation revealed that the contribution of modules is unevenly distributed: the ontology provides +0.04 to Macro-F1, the UD syntax +0.08, and the rule-based module +0.11. The developed system forms an interpretable and robust assessment of sentiment, emotions, and discursive strategies in political discourse, and also creates a basis for further expansion of the corpus, additional training of models, and the application of hybrid methods to other tasks of analyzing low-resource languages.
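Zero-shot use of multilingual XLM-R representations for sentiment can be illustrated with the Hugging Face zero-shot pipeline; the specific NLI checkpoint and the example sentence below are assumptions, not the paper's setup.

```python
from transformers import pipeline

# Zero-shot sentiment over Kazakh text via a multilingual NLI checkpoint;
# the model name here is an assumption, not the checkpoint used in the paper.
classifier = pipeline("zero-shot-classification",
                      model="joeddav/xlm-roberta-large-xnli")
result = classifier(
    "Бұл шешім халық үшін өте пайдалы болды.",  # "This decision was very useful for the people."
    candidate_labels=["positive", "negative", "neutral"])
print(result["labels"][0], round(result["scores"][0], 3))
```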

23 pages, 998 KB  
Article
A SIEM-Integrated Cybersecurity Prototype for Insider Threat Anomaly Detection Using Enterprise Logs and Behavioural Biometrics
by Mohamed Salah Mohamed and Abdullahi Arabo
Electronics 2026, 15(1), 248; https://doi.org/10.3390/electronics15010248 - 5 Jan 2026
Abstract
Insider threats remain a serious concern for organisations in both public and private sectors. Detecting anomalous behaviour in enterprise environments is critical for preventing insider incidents. While many prior studies demonstrate promising results using deep learning on offline datasets, few address real-time operationalisation or calibrated alert control within a Security Information and Event Management (SIEM) workflow. This paper presents a SIEM-integrated prototype that fuses the Computer Emergency Response Team Insider Threat Test Dataset (CERT) enterprise logs (Logon, Device, HTTP, and Email) with behavioural biometrics from the Balabit mouse dynamics dataset. Per-modality one-dimensional convolutional neural network (1D CNN) branches are trained independently using imbalance-aware strategies, including downsampling, class weighting, and focal loss. A unified 20 × N feature schema ensures train–serve parity and consistent feature validation during live inference. Post-training calibration using Platt and isotonic regression enables analyst-controlled threshold tuning and stable alert budgeting inside the SIEM. The models are deployed in Splunk’s Machine Learning Toolkit (MLTK), where dashboards visualise anomaly timelines, risky users or hosts, and cross-stream overlaps. Evaluation emphasises operational performance, precision–recall balance, calibration stability, and throughput rather than headline accuracy. Results show calibrated, controllable alert volumes: for Device, precision ≈ 0.70 at recall ≈ 0.30 (PR-AUC = 0.468, ROC-AUC = 0.949); for Logon, ROC-AUC = 0.936 with an ultra-low false-positive rate at a conservative threshold. Batch CPU inference sustains ≈ 70.5k windows/s, confirming real-time feasibility. This study’s main contribution is to demonstrate a calibrated, multi-modal CNN framework that integrates directly within a live SIEM pipeline. It provides a reproducible path from offline anomaly detection research to Security Operations Centre (SOC)-ready deployment, bridging the gap between academic models and operational cybersecurity practice.
(This article belongs to the Special Issue AI in Cybersecurity, 2nd Edition)
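Platt and isotonic calibration of raw anomaly scores are standard scikit-learn operations; the sketch below uses synthetic held-out scores and an illustrative alert threshold.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.linear_model import LogisticRegression

# Held-out raw anomaly scores from one modality branch and their labels;
# both arrays here are synthetic placeholders.
rng = np.random.default_rng(7)
raw_scores = rng.random(1000)
y = (raw_scores + 0.3 * rng.random(1000) > 0.9).astype(int)

platt = LogisticRegression().fit(raw_scores.reshape(-1, 1), y)     # Platt scaling
p_platt = platt.predict_proba(raw_scores.reshape(-1, 1))[:, 1]

iso = IsotonicRegression(out_of_bounds="clip").fit(raw_scores, y)  # isotonic
p_iso = iso.predict(raw_scores)

alerts = p_iso >= 0.70  # analyst-tuned threshold sets the SIEM alert budget
```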
