Search Results (1,176)

Search Parameters:
Keywords = multi-mode recognition

11 pages, 1707 KB  
Article
A Retrospective Study of the Ultrasound Imaging Characteristics of Juvenile Xanthogranuloma
by Hong Wang, Xiaoyan Peng and Yujia Yang
J. Clin. Med. 2026, 15(6), 2134; https://doi.org/10.3390/jcm15062134 - 11 Mar 2026
Abstract
Objectives: To strengthen the recognition of juvenile xanthogranuloma (JXG) by analyzing ultrasound findings. Methods: This study retrospectively enrolled patients with pathologically confirmed JXG from January 2011 to March 2025. The clinical, imaging, and pathological features and the prognosis of all included patients were analyzed. All imaging features were evaluated in consensus by two radiologists. Results: Fourteen patients were included in the study. A total of 78.6% presented with solitary masses. The age of the patients ranged from 2 months to 48 years; those aged ≤1 year accounted for 64.3% of the sample. The lesions were predominantly located on the head and face, and the skin of most patients was yellowish-orange. The ultrasound manifestations were mostly hypoechoic masses with clear boundaries and regular shapes. Contrast-enhanced ultrasound showed slight homogeneous enhancement, and on shear wave elastography the lesions appeared relatively hard. Conclusions: JXGs are more common in infants and young children and present as yellowish-orange cutaneous lesions. Ultrasound revealed homogeneous, well-circumscribed, regular hypoechoic nodules. Multimodal imaging may be helpful for preoperative diagnosis.
(This article belongs to the Special Issue Advances in the Diagnosis and Treatment of Skin Cancer)

15 pages, 3340 KB  
Review
Less-Invasive Hemodynamic and Tissue Perfusion Monitoring in Sepsis and Septic Shock: A Narrative Review
by Marialaura Scarcella, Paolo Formenti, Gian Marco Petroni, Riccardo Monti and Edoardo De Robertis
J. Clin. Med. 2026, 15(5), 2061; https://doi.org/10.3390/jcm15052061 - 8 Mar 2026
Viewed by 126
Abstract
Sepsis and septic shock remain major causes of morbidity and mortality in critically ill patients. Hemodynamic management is a cornerstone of treatment, yet the optimal monitoring strategy to guide resuscitation is still debated. The progressive decline in the use of invasive techniques, such as pulmonary artery catheterization, has favored the development of less-invasive and non-invasive monitoring approaches. Recent technologies allow continuous assessment of cardiovascular function through arterial waveform analysis, non-invasive blood pressure monitoring, and predictive algorithms, while increasing attention has been directed toward the evaluation of tissue perfusion and oxygenation. This reflects the recognition that normalization of macrocirculatory variables does not necessarily ensure adequate microcirculatory perfusion in sepsis. This narrative review summarizes current evidence on less-invasive hemodynamic and tissue perfusion monitoring in sepsis and septic shock, discussing their physiological rationale and potential role within contemporary, multimodal resuscitation strategies.
(This article belongs to the Special Issue Sepsis: Clinical Advances and Practical Updates)

14 pages, 494 KB  
Review
Acquired Epidermodysplasia Verruciformis in Patients with Iatrogenic Immunosuppression
by Neha S. Momin, Peter L. Rady and Stephen K. Tyring
J. Clin. Med. 2026, 15(5), 2049; https://doi.org/10.3390/jcm15052049 - 7 Mar 2026
Viewed by 178
Abstract
Background: Acquired epidermodysplasia verruciformis (AEV) is a rare cutaneous disorder arising in immunocompromised individuals. AEV is characterized by flat-topped, wart-like, or hypopigmented lesions predominantly on sun-exposed areas. Unlike classic genetic EV, AEV develops in the absence of germline mutations or family history. AEV most commonly arises in patients receiving iatrogenic immunosuppressive therapy for organ transplantation, autoimmune disease, or hematologic disorders. Methods: A comprehensive literature review was conducted via the PubMed database. Case reports and case series describing AEV in transplant and non-transplant iatrogenic immunosuppression were identified, with no restrictions on language or publication year. The last search was conducted in July 2025. Reports were analyzed for patient demographics, immunosuppressive agents, HPV subtypes, clinical and histopathologic features, and treatment outcomes. Results: AEV occurs across a broad spectrum of immunosuppressive therapies, including calcineurin inhibitors, antimetabolites, biologics, tyrosine kinase inhibitors, and cytotoxic chemotherapy. β-HPV subtypes, most commonly HPV 5 and 8, drive lesion formation in the context of impaired cell-mediated immunity. Histopathology demonstrates keratinocyte vacuolization, acanthosis, and perinuclear halos. Lesions may persist despite immunosuppressive adjustment due to viral latency and incomplete immune reconstitution. Treatment strategies include topical retinoids, immune response modifiers, systemic retinoids, and HPV vaccination, with variable efficacy. AEV carries an elevated risk of cutaneous squamous cell carcinoma, particularly in transplant recipients, highlighting the need for proactive dermatologic management. Conclusions: AEV represents a clinically significant consequence of immunosuppression mediated by β-HPV. Early recognition, monitoring for malignant transformation, and individualized multimodal therapy are critical. Future studies should evaluate targeted interventions to enhance antiviral immunity and establish standardized treatment guidelines.
(This article belongs to the Section Dermatology)

17 pages, 1701 KB  
Article
CLIP-ArASL: A Lightweight Multimodal Model for Arabic Sign Language Recognition
by Naif Alasmari
Appl. Sci. 2026, 16(5), 2573; https://doi.org/10.3390/app16052573 - 7 Mar 2026
Viewed by 126
Abstract
Arabic sign language (ArASL) is the primary communication medium for Deaf and hard-of-hearing people across Arabic-speaking communities. Most current ArASL recognition systems are based solely on visual features and do not incorporate linguistic or semantic information that could improve generalization and semantic grounding. This paper introduces CLIP-ArASL, a lightweight CLIP-style multimodal approach for static ArASL letter recognition that aligns visual hand gestures with bilingual textual descriptions. The approach integrates an EfficientNet-B0 image encoder with a MiniLM text encoder to learn a shared embedding space using a hybrid objective that combines contrastive and cross-entropy losses. This design supports supervised classification on seen classes and zero-shot prediction on unseen classes using textual class representations. The proposed approach is evaluated on two public datasets, ArASL2018 and ArASL21L. Under supervised evaluation, recognition accuracies of 99.25±0.14% and 91.51±1.29% are achieved, respectively. Zero-shot performance is assessed by withholding 20% of gesture classes during training and predicting them using only their textual descriptions. In this setting, accuracies of 55.2±12.15% on ArASL2018 and 37.6±9.07% on ArASL21L are obtained. These results show that multimodal vision–language alignment supports semantic transfer and enables recognition of unseen classes.
(This article belongs to the Special Issue Machine Learning in Computer Vision and Image Processing)
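
The hybrid objective this abstract describes (a CLIP-style contrastive alignment term plus a supervised cross-entropy term, with zero-shot prediction against textual class embeddings) maps naturally onto a few lines of PyTorch. The following is a minimal illustrative sketch, not the authors' code: the function names, the temperature, and the weighting factor alpha are assumptions, and the embeddings are presumed to come from the paper's EfficientNet-B0 and MiniLM encoders.

```python
# Minimal sketch (not the authors' code) of a CLIP-style hybrid objective:
# a symmetric contrastive loss over paired image/text embeddings plus a
# supervised cross-entropy term. `temperature` and `alpha` are assumed values.
import torch
import torch.nn.functional as F

def hybrid_loss(img_emb, txt_emb, class_logits, labels, temperature=0.07, alpha=0.5):
    # Normalize so that dot products are cosine similarities
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    # (B, B) similarity matrix; the matching pair sits on the diagonal
    logits = img_emb @ txt_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    # Symmetric contrastive term: image-to-text and text-to-image
    contrastive = 0.5 * (F.cross_entropy(logits, targets)
                         + F.cross_entropy(logits.t(), targets))
    # Supervised term on the classification head for seen classes
    supervised = F.cross_entropy(class_logits, labels)
    return alpha * contrastive + (1 - alpha) * supervised

def zero_shot_predict(img_emb, class_text_emb):
    # Unseen classes: pick the nearest textual class representation
    sims = F.normalize(img_emb, dim=-1) @ F.normalize(class_text_emb, dim=-1).t()
    return sims.argmax(dim=-1)
```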

34 pages, 4142 KB  
Article
Subject-Independent Multimodal Interaction Modeling for Joint Emotion and Immersion Estimation in Virtual Reality
by Haibing Wang and Mujiangshan Wang
Symmetry 2026, 18(3), 451; https://doi.org/10.3390/sym18030451 - 6 Mar 2026
Viewed by 115
Abstract
Virtual Reality (VR) has emerged as a powerful medium for immersive human–computer interaction, where users’ emotional and experiential states play a pivotal role in shaping engagement and perception. However, existing affective computing approaches often model emotion recognition and immersion estimation as independent problems, overlooking their intrinsic coupling and the structured relationships underlying multimodal physiological signals. In this work, we propose a modality-aware multi-task learning framework that jointly models emotion recognition and immersion estimation from a graph-structured and symmetry-aware interaction perspective. Specifically, heterogeneous physiological and behavioral modalities—including eye-tracking, electrocardiogram (ECG), and galvanic skin response (GSR)—are treated as relational components with structurally symmetric encoding and fusion mechanisms. Their cross-modality dependencies are adaptively aggregated to preserve interaction symmetry at the representation level and to introduce controlled asymmetry at the task-optimization level through weighted multi-task learning, without explicit graph neural network architectures. To support reproducible evaluation, the VREED dataset is further extended with quantitative immersion annotations derived from presence-related self-reports via weighted aggregation and factor analysis. Extensive experiments demonstrate that the proposed framework consistently outperforms recurrent, convolutional, and Transformer-based baselines. Compared with the strongest Transformer baseline, the proposed framework yields consistent relative performance gains of approximately 3–7% on emotion recognition metrics and reduces immersion estimation errors by nearly 9%. Beyond these empirical improvements, the study provides a structured interpretation of multimodal affective modeling that highlights symmetry, coupling, and controlled symmetry breaking in multi-task learning, offering a principled foundation for adaptive VR systems, emotion-driven personalization, and dynamic user experience optimization.
(This article belongs to the Section Computer)
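
The "controlled asymmetry at the task-optimization level through weighted multi-task learning" amounts, in the common reading, to a shared fused representation feeding two heads whose losses are combined with unequal weights. A minimal sketch under that reading follows; the dimensions, task weights, and names are illustrative assumptions, not values from the paper.

```python
# Minimal sketch, under the reading described above: a shared fused feature
# vector feeds a classification head (emotion) and a regression head
# (immersion), whose losses are combined with unequal task weights. All
# dimensions and weights here are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointHead(nn.Module):
    def __init__(self, fused_dim=256, n_emotions=4):
        super().__init__()
        self.emotion = nn.Linear(fused_dim, n_emotions)  # classification task
        self.immersion = nn.Linear(fused_dim, 1)         # regression task

    def forward(self, fused):
        return self.emotion(fused), self.immersion(fused).squeeze(-1)

def multitask_loss(emotion_logits, emotion_y, immersion_pred, immersion_y,
                   w_emotion=1.0, w_immersion=0.5):
    # Unequal weights realize the controlled asymmetry at the optimization level
    return (w_emotion * F.cross_entropy(emotion_logits, emotion_y)
            + w_immersion * F.mse_loss(immersion_pred, immersion_y))
```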

18 pages, 395 KB  
Review
Low-Dose Naltrexone in Chronic Pain Management: Mechanisms, Evidence, and Clinical Implications
by Alyssa McKenzie, Tiffany Bittar, Rachel Dombrower, Dupinder Raman, Hatim Hussain, Nitchanan Theeraphapphong, Sophia M. McKenzie and Alaa Abd-Elsayed
J. Pers. Med. 2026, 16(3), 151; https://doi.org/10.3390/jpm16030151 - 6 Mar 2026
Viewed by 221
Abstract
Chronic pain imposes a substantial burden on global health and remains challenging to manage, despite ongoing advances in pharmacologic and interventional therapies. Recognition of chronic pain as a condition driven by central sensitization and neuroimmune dysregulation has prompted interest in therapies that target these mechanisms rather than peripheral nociception alone. Low-dose naltrexone (LDN), administered at doses substantially lower than those used for opioid or alcohol use disorders, has emerged as a repurposed treatment with potential analgesic and anti-inflammatory properties. This review summarizes the pharmacologic characteristics of LDN, with emphasis on its proposed mechanisms involving transient opioid receptor blockade, modulation of microglial activation, Toll-like receptor signaling, and central neuroimmune pathways. Available clinical evidence evaluating LDN across a range of chronic pain conditions, such as fibromyalgia, neuropathic pain syndromes, inflammatory and autoimmune disorders, headache disorders, and other centralized pain states, is critically reviewed. Although early trials, observational studies, and case series suggest potential benefit in selected populations, the overall evidence base remains limited, heterogeneous, and characterized by variability in dosing strategies and outcome measures. Safety, tolerability, and practical considerations relevant to contemporary pain practice are discussed, including interactions with opioid therapy and challenges related to off-label use. Finally, key gaps in the current evidence and priorities for future research are highlighted, underscoring the need for larger, well-designed randomized trials and mechanism-informed studies to better define LDN’s role in multimodal chronic pain management.
(This article belongs to the Section Mechanisms of Diseases)

24 pages, 6373 KB  
Article
Augmented Reality-Based Training System Using Multimodal Language Model for Context-Aware Guidance and Activity Recognition in Complex Machine Operations
by Waseem Ahmed and Qingjin Peng
Designs 2026, 10(2), 30; https://doi.org/10.3390/designs10020030 - 5 Mar 2026
Viewed by 169
Abstract
Augmented Reality (AR) and Large Language Models (LLMs) have made significant advances across many fields, opening new possibilities, particularly in complex machine operations. In complex operations, non-expert users often struggle to perform high-precision tasks and require constant supervision to execute tasks correctly. This paper proposes a novel AR-MLLM-based training system that integrates AR, multimodal large language models (MLLMs), and prompt engineering to interpret real-time machine feedback and user activity. It converts extensive technical text into structured, step-by-step commands. The system uses a prompt structure developed through an iterative design method and refined across multiple machine operation scenarios, enabling ChatGPT to generate task-specific contextual digital overlays directly on the physical machines. A case study with participants was conducted to assess the effectiveness and usability of the AR-MLLM system in Coordinate Measuring Machine (CMM) operation training. The experimental results demonstrate high accuracy in task recognition and feature measurement activity. The data further show reduced time and user workload during task execution with the proposed AR-MLLM system. The proposed system not only provides real-time guidance and enhances efficiency in CMM operation training but also demonstrates the potential of the AR-MLLM design framework for broader industrial applications.

20 pages, 4120 KB  
Article
An Efficient Finger Vein Recognition Method Based on Improved Lightweight MobileNet
by Xuhui Zhang, Yuxi Liu, Yixin Yan, Jiabin Li and Lei Xu
Sensors 2026, 26(5), 1634; https://doi.org/10.3390/s26051634 - 5 Mar 2026
Viewed by 128
Abstract
Finger vein recognition has emerged as a highly robust and intrinsically stable biometric technology, demonstrating great potential in identity authentication and intelligent security applications. However, conventional methods still suffer from constraints in feature representation and computational efficiency, particularly under challenging conditions such as illumination variation, pose deviation, and noise interference. To address these challenges, this study presents an efficient finger vein recognition approach based on a lightweight convolutional neural network (LCNN) architecture. The proposed framework integrates a multi-stage image preprocessing pipeline for automatic vein region detection, advanced denoising, and refined texture enhancement, which is subsequently followed by compact feature modeling within a lightweight deep network. Extensive experiments on the public Shandong University Machine Learning and Applications-Homologous Multi-Modal Traits (SDUMLA-HMT) dataset and a self-acquired Laboratory Finger-Vein (Lab-Vein) dataset validate the superiority of the proposed method, achieving recognition accuracies of 97.1% and 98.3%, respectively, surpassing existing benchmark models. Moreover, the model demonstrates notable reductions in parameter complexity and computational cost, achieving an average inference time of only 12.6 ms, which confirms its strong real-time capability and suitability for embedded deployment. Overall, the proposed approach attains a desirable trade-off between accuracy and efficiency, offering meaningful implications for the advancement of lightweight biometric recognition systems.

24 pages, 1883 KB  
Article
A Multi-Scale Vision–Sensor Collaborative Framework for Small-Target Insect Pest Management
by Chongyu Wang, Yicheng Chen, Shangshan Chen, Ranran Chen, Ziqi Xia, Ruoyu Hu and Yihong Song
Insects 2026, 17(3), 281; https://doi.org/10.3390/insects17030281 - 4 Mar 2026
Viewed by 227
Abstract
In complex agricultural production environments, small-target pests—characterized by tiny scales, strong background confusion, and close dependence on environmental conditions—pose major challenges to precise monitoring and green pest control. To facilitate the transition from experience-driven to data-driven pest management, a multi-scale vision–sensor collaborative recognition method is proposed for field and protected agriculture scenarios to improve the accuracy and stability of small-target pest recognition under complex conditions. The method jointly models multi-scale visual representations and pest ecological mechanisms: a multi-scale visual feature module enhances fine-grained texture and morphological cues of small targets in deep networks, alleviating feature sparsity and scale mismatch, while environmental sensor data, including temperature, humidity, and illumination, are introduced as priors to modulate visual features and explicitly incorporate ecological constraints into the discrimination process. Stable multimodal fusion and pest category prediction are then achieved through a vision–sensor collaborative discrimination module. Experiments on a multimodal dataset collected from real farmland and greenhouse environments in Linhe District, Bayannur City, Inner Mongolia, demonstrate that the proposed method achieves approximately 93.1% accuracy, 92.0% precision, 91.2% recall, and a 91.6% F1-score on the test set, significantly outperforming traditional machine learning approaches, single-scale deep learning models, and multi-scale vision baselines without environmental priors. Category-level evaluations show balanced performance across multiple small-target pests, including aphids, thrips, whiteflies, leafhoppers, spider mites, and leaf beetles, while ablation studies confirm the critical contributions of multi-scale visual modeling, environmental prior modulation, and vision–sensor collaborative discrimination.
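
The abstract states that temperature, humidity, and illumination readings are introduced as priors that "modulate visual features." One common way to realize such modulation is FiLM-style conditioning, where a small network maps sensor readings to per-channel scale and shift parameters; the sketch below assumes that design, and all dimensions and names are hypothetical rather than taken from the paper.

```python
# Hypothetical FiLM-style realization of "environmental priors modulating
# visual features": a small MLP maps normalized sensor readings (temperature,
# humidity, illumination) to per-channel scale and shift parameters applied
# to convolutional feature maps. Not the paper's code; all names are assumed.
import torch
import torch.nn as nn

class SensorPriorModulation(nn.Module):
    def __init__(self, n_sensors=3, channels=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(n_sensors, 64), nn.ReLU(),
            nn.Linear(64, 2 * channels),
        )

    def forward(self, visual_feats, sensors):
        # visual_feats: (B, C, H, W); sensors: (B, n_sensors), pre-normalized
        gamma, beta = self.mlp(sensors).chunk(2, dim=-1)
        gamma = gamma.unsqueeze(-1).unsqueeze(-1)  # broadcast to (B, C, 1, 1)
        beta = beta.unsqueeze(-1).unsqueeze(-1)
        return (1 + gamma) * visual_feats + beta   # identity when gamma=beta=0
```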

17 pages, 7794 KB  
Review
Artificial Intelligence and Digital Technology in Cardiovascular Imaging: A Narrative Review
by Constantinos H. Papadopoulos, Dimitris Karelas, Christina Floropoulou, Konstantina Tzavida, Dimitrios Oikonomidis, Athanasios Tasoulis, Evangelos Tatsis, Ioannis Kouloulias and Nikolaos P. E. Kadoglou
BioTech 2026, 15(1), 22; https://doi.org/10.3390/biotech15010022 - 3 Mar 2026
Viewed by 212
Abstract
The rapid expansion of digital technologies and artificial intelligence (AI) has profoundly transformed cardiovascular imaging, enabling more precise, efficient, and reproducible assessment of cardiac structure and function. This narrative review summarizes recent advances in AI-driven methods across echocardiography, cardiac computed tomography, cardiac magnetic resonance, and nuclear imaging, with emphasis on image acquisition, automated quantification, and diagnostic and prognostic interpretation. We reviewed contemporary literature describing machine-learning and deep-learning applications for image reconstruction, segmentation, radiomics, and multimodal data integration. Current evidence demonstrates that AI improves image quality, reduces acquisition and analysis time, and enables automated, highly reproducible measurements of chamber volumes, function, tissue characterization, coronary anatomy, and myocardial perfusion, while facilitating advanced pattern recognition for differential diagnosis and risk stratification. Furthermore, digital platforms support remote acquisition, tele-echocardiography, and AI-assisted training of non-expert operators. Despite these advances, challenges remain regarding external validation, generalizability across vendors and populations, explainability, data governance, and regulatory compliance. In conclusion, AI and digital technologies are reshaping cardiovascular imaging by enhancing accuracy, efficiency, and accessibility, but their safe and effective clinical integration requires robust multicenter validation, transparent reporting, and ethical-legal frameworks that ensure trust, equity, and accountability.
(This article belongs to the Special Issue Advances in Bioimaging Technology)

28 pages, 1396 KB  
Article
Environmental–Visual Fusion for Proactive Tomato Late Blight Management in Protected Horticulture
by Puxing Gao, Peigen Yang, Tangji Ke, Saiwei Wang, Yulong Wang, Fengman Xu and Yihong Song
Horticulturae 2026, 12(3), 299; https://doi.org/10.3390/horticulturae12030299 - 3 Mar 2026
Viewed by 155
Abstract
In protected horticultural production, tomato late blight shows strong environmental inducibility, with a short latent period, rapid risk accumulation, and a limited control window, which challenges conventional post-event disease monitoring. To address this, a tomato late blight risk perception and predictive control approach for protected production is proposed, integrating deep temporal modeling of environmental factors, visual symptom perception, and risk-driven greenhouse control to enable prospective assessment and proactive intervention. Based on disease mechanisms and real greenhouse conditions, an artificial intelligence (AI) framework covering perception, prediction, and regulation is constructed, moving beyond reliance on visible symptoms alone. Long-term evolution of key variables, including temperature, air humidity, leaf wetness, and light intensity, is modeled using deep temporal networks, while early weak lesions and subtle texture changes are captured by visual models. Cross-modal fusion in a unified risk space generates continuous risk scores to drive greenhouse regulation. Experiments on a multimodal dataset from a real greenhouse in Bayannur, Inner Mongolia, show that the proposed method outperforms vision-based and environment-based baselines in recognition and risk prediction. It achieves about 0.95 accuracy, 0.94 F1-score, and over 0.97 area under the receiver operating characteristic curve (AUC), while providing more than 20 h of early warning before disease onset. In environmental modeling, the deep temporal model consistently surpasses threshold-based methods, logistic regression, and long short-term memory/gated recurrent unit (LSTM/GRU) baselines in risk lead time, false alert rate, and prediction stability.
(This article belongs to the Special Issue Artificial Intelligence in Horticulture Production)

22 pages, 340 KB  
Article
From Patient Emotion Recognition to Provider Understanding: A Multimodal Data Mining Framework for Emotion-Aware Clinical Counseling Systems
by Saahithi Mallarapu, Xinyan Liu, Pegah Zargarian, Seyyedeh Fatemeh Mottaghian, Ramyashree Suresha, Vasudha Jain and Akram Bayat
Computers 2026, 15(3), 161; https://doi.org/10.3390/computers15030161 - 3 Mar 2026
Viewed by 185
Abstract
Computational analysis of therapeutic communication presents challenges in multi-label classification, severe class imbalance, and heterogeneous multimodal data integration. We introduce a bidirectional analytical framework addressing patient emotion recognition and provider behavior analysis. For patient-side analysis, we employ ClinicalBERT on human-annotated CounselChat (1482 interactions, 25 categories, imbalance 60:1), achieving a macro-F1 of 0.74 through class weighting and threshold optimization, representing a six-fold improvement over naive baselines and a 6–13 point improvement over modern imbalance methods. For provider-side analysis, we process 330 YouTube therapy sessions through automated pipelines (speaker diarization, automatic speech recognition, temporal segmentation), yielding 14,086 annotated segments. Our architecture combines DeBERTa-v3-base with WavLM-base-plus through cross-modal attention mechanisms adapted from multimodal Transformer frameworks. On controlled human-annotated HOPE data (178 sessions, 12,500 utterances), the model achieves a macro-F1 of 0.91 with Cohen’s kappa of 0.87, comparable to inter-rater reliability reported in psychotherapy process research. On YouTube data, a macro-F1 of 0.71 demonstrates feasibility while highlighting annotation quality impacts. Cross-dataset transfer and systematic attention analyses validate domain-specific effectiveness and interpretability.
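
The "threshold optimization" credited (with class weighting) for the macro-F1 gains is typically a simple post hoc step: on a validation set, each of the 25 labels gets its own decision threshold chosen to maximize that label's F1. A minimal sketch of such a per-label search follows; the grid, array shapes, and function name are assumptions, not details from the paper.

```python
# Minimal sketch of per-label threshold optimization for multi-label
# classification: each label's decision threshold is tuned on validation
# data to maximize that label's F1, which optimizes macro-F1 label by label.
import numpy as np
from sklearn.metrics import f1_score

def tune_thresholds(probs, y_true, grid=np.linspace(0.05, 0.95, 19)):
    # probs, y_true: (n_samples, n_labels) arrays of scores and 0/1 labels
    n_labels = y_true.shape[1]
    thresholds = np.full(n_labels, 0.5)
    for j in range(n_labels):
        best = 0.0
        for t in grid:
            f1 = f1_score(y_true[:, j], (probs[:, j] >= t).astype(int),
                          zero_division=0)
            if f1 > best:
                best, thresholds[j] = f1, t
    return thresholds
```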

19 pages, 2509 KB  
Article
Emotion Recognition Using Multi-View EEG-fNIRS and Cross-Attention Feature Fusion
by Ni Yan, Guijun Chen and Xueying Zhang
Biosensors 2026, 16(3), 145; https://doi.org/10.3390/bios16030145 - 2 Mar 2026
Viewed by 261
Abstract
To improve the accuracy of emotion recognition, this paper proposes a multi-view EEG-fNIRS and cross-attention fusion model named FGCN-TCNN-CAF, which employs a differentiated modeling strategy for the frequency, spatial, and temporal features of EEG-fNIRS signals. First, frequency-domain and time-domain features are extracted from EEG, and time-domain features are obtained from fNIRS signals. Then, a frequency-domain graph convolutional network (FGCN) and a time-domain convolutional network (TCNN) are deployed in parallel. The EEG feature views from different frequency bands are modeled using an FGCN module to capture graph-structured relationships, while the time-domain views of EEG and fNIRS are processed by a TCNN module to extract spatial and temporal features. Finally, a cross-attention fusion network (CAF) is applied to achieve interactive fusion of multimodal features. Experiments demonstrate that the proposed multi-view EEG approach achieves higher recognition accuracy than using only the EEG view. Additionally, the multimodal recognition results outperform single-modal EEG and single-modal fNIRS by 1.73% and 6.65%, respectively. When compared with other emotion recognition models, the proposed method achieves the highest accuracy of 96.09%, demonstrating its superior performance.
(This article belongs to the Special Issue Applications of AI in Non-Invasive Biosensing Technologies)
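
The cross-attention fusion (CAF) stage can be pictured as standard multi-head attention in which one modality supplies the queries and the other the keys and values. The sketch below shows one such direction (EEG attending to fNIRS) with a residual connection; the dimensions, head count, and the bidirectional wiring of the full CAF module are assumptions for illustration.

```python
# One direction of a cross-attention fusion block: EEG features supply the
# queries, fNIRS features the keys and values, with a residual connection.
# A sketch only, not the paper's implementation.
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, eeg, fnirs):
        # eeg: (B, T_e, dim); fnirs: (B, T_f, dim)
        fused, _ = self.attn(query=eeg, key=fnirs, value=fnirs)
        return self.norm(eeg + fused)  # residual keeps the EEG view intact
```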

24 pages, 8953 KB  
Article
Face Recognition System Using CLIP and FAISS for Scalable and Real-Time Identification
by Antonio Labinjan, Sandi Baressi Šegota, Ivan Lorencin and Nikola Tanković
Math. Comput. Appl. 2026, 31(2), 36; https://doi.org/10.3390/mca31020036 - 1 Mar 2026
Viewed by 200
Abstract
Face recognition is increasingly being adopted in industries such as education, security, and personalized services. This research introduces a face recognition system that leverages the embedding capabilities of the CLIP model. The model is trained on multimodal data, such as images and text, and it generates high-dimensional features, which are then stored in a vector index for further queries. The system is designed to facilitate accurate real-time identification, with potential applications in areas such as attendance tracking and security screening. Specific use cases include event check-ins, implementation of advanced security systems, and more. The process involves encoding known faces into high-dimensional vectors, indexing them with the FAISS vector index, and comparing them to unknown images based on L2 (Euclidean) distance. Experimental results demonstrate accuracy exceeding 90% and show efficient scalability and good performance even on datasets with a high volume of entries. Notably, the system exhibits superior computational efficiency compared to traditional deep convolutional neural networks (CNNs), significantly reducing CPU load and memory consumption while maintaining competitive inference speeds. In the first iteration of experiments, the system achieved over 90% accuracy on live video feeds where each identity had a single reference video for both training and validation; however, when tested on a more challenging dataset with many low-quality classes, accuracy dropped to approximately 73%, highlighting the impact of dataset quality and variability on performance.
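
The pipeline this abstract describes (encode known faces into vectors, index them with FAISS, match unknown images by L2 distance) is only a few lines with the faiss library. The sketch below uses an exact IndexFlatL2; the 512-dimensional embeddings match CLIP's ViT-B/32 image encoder but are an assumption here, and the random arrays stand in for real CLIP embeddings.

```python
# Minimal sketch of the enrollment/identification loop described above,
# using an exact L2 FAISS index. The 512-dim size and random arrays are
# stand-ins for real CLIP image embeddings.
import faiss
import numpy as np

dim = 512
index = faiss.IndexFlatL2(dim)            # exact L2 (Euclidean) search

# Enroll known faces: one row per stored embedding
known = np.random.rand(1000, dim).astype("float32")
index.add(known)

# Identify an unknown face by its nearest enrolled neighbor
query = np.random.rand(1, dim).astype("float32")
distances, ids = index.search(query, 1)
print(ids[0, 0], distances[0, 0])         # nearest row id and its L2 distance
```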

20 pages, 1419 KB  
Article
Building Prototype Evolution Pathway for Emotion Recognition in User-Generated Videos
by Yujie Liu, Zhenyang Dong, Yante Li and Guoying Zhao
Big Data Cogn. Comput. 2026, 10(3), 73; https://doi.org/10.3390/bdcc10030073 - 28 Feb 2026
Viewed by 228
Abstract
Large-scale pretrained foundation models are increasingly essential for affective analysis in user-generated videos. However, current approaches typically reuse generic multi-modal representations directly with task-specific adapters learned from scratch, and their performance is limited by the large affective domain gap and scarce emotion annotations. To address these issues, we introduce a novel paradigm that leverages auxiliary cross-modal priors to enhance unimodal emotion modeling, effectively exploiting modality-shared semantics and modality-specific inductive biases. Specifically, we propose a progressive prototype evolution framework that gradually transforms a neutral prototype into discriminative emotional representations through fine-grained cross-modal interactions with visual cues. The auxiliary prior serves as a structural constraint, reframing the adaptation challenge from a difficult domain shift problem into a more tractable prototype shift within the affective space. To ensure robust prototype construction and guided evolution, we further design category-aggregated prompting and bidirectional supervision mechanisms. Extensive experiments on VideoEmotion-8, Ekman-6, and MusicVideo-6 validate the superiority of our approach, achieving state-of-the-art results and demonstrating the effectiveness of leveraging auxiliary modality priors for foundation-model-based emotion recognition.
(This article belongs to the Special Issue Sentiment Analysis in the Context of Big Data)
