Search Results (467)

Search Parameters:
Keywords = person fusion

24 pages, 1696 KiB  
Review
Integration of Multi-Modal Biosensing Approaches for Depression: Current Status, Challenges, and Future Perspectives
by Xuanzhu Zhao, Zhangrong Lou, Pir Tariq Shah, Chengjun Wu, Rong Liu, Wen Xie and Sheng Zhang
Sensors 2025, 25(15), 4858; https://doi.org/10.3390/s25154858 - 7 Aug 2025
Abstract
Depression represents one of the most prevalent mental health disorders globally, significantly impacting quality of life and posing substantial healthcare challenges. Traditional diagnostic methods rely on subjective assessments and clinical interviews, often leading to misdiagnosis, delayed treatment, and suboptimal outcomes. Recent advances in biosensing technologies offer promising avenues for objective depression assessment through detection of relevant biomarkers and physiological parameters. This review examines multi-modal biosensing approaches for depression by analyzing electrochemical biosensors for neurotransmitter monitoring alongside wearable sensors tracking autonomic, neural, and behavioral parameters. We explore sensor fusion methodologies, temporal dynamics analysis, and context-aware frameworks that enhance monitoring accuracy through complementary data streams. The review discusses clinical validation across diagnostic, screening, and treatment applications, identifying performance metrics, implementation challenges, and ethical considerations. We outline technical barriers, user acceptance factors, and data privacy concerns while presenting a development roadmap for personalized, continuous monitoring solutions. This integrative approach holds significant potential to revolutionize depression care by enabling earlier detection, precise diagnosis, tailored treatment, and sensitive monitoring guided by objective biosignatures. Successful implementation requires interdisciplinary collaboration among engineers, clinicians, data scientists, and end-users to balance technical sophistication with practical usability across diverse healthcare contexts.
(This article belongs to the Special Issue Integrated Sensor Systems for Medical Applications)

15 pages, 2070 KiB  
Article
Machine Learning for Personalized Prediction of Electrocardiogram (EKG) Use in Emergency Care
by Hairong Wang and Xingyu Zhang
J. Pers. Med. 2025, 15(8), 358; https://doi.org/10.3390/jpm15080358 - 6 Aug 2025
Abstract
Background: Electrocardiograms (EKGs) are essential tools in emergency medicine, often used to evaluate chest pain, dyspnea, and other symptoms suggestive of cardiac dysfunction. Yet, EKGs are not universally administered to all emergency department (ED) patients. Understanding and predicting which patients receive an EKG may offer insights into clinical decision making, resource allocation, and potential disparities in care. This study examines whether integrating structured clinical data with free-text patient narratives can improve prediction of EKG utilization in the ED. Methods: We conducted a retrospective observational study to predict EKG utilization using data from 13,115 adult ED visits in the nationally representative 2021 National Hospital Ambulatory Medical Care Survey–Emergency Department (NHAMCS-ED), leveraging both structured features—demographics, vital signs, comorbidities, arrival mode, and triage acuity, with the most influential selected via Lasso regression—and unstructured patient narratives transformed into numerical embeddings using Clinical-BERT. Four supervised learning models—Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), and Extreme Gradient Boosting (XGB)—were trained on three inputs (structured data only, text embeddings only, and a late-fusion combined model). Hyperparameters were optimized by grid search with 5-fold cross-validation; performance was evaluated via AUROC, accuracy, sensitivity, specificity, and precision; and interpretability was assessed using SHAP values and Permutation Feature Importance. Results: EKGs were administered in 30.6% of adult ED visits. Patients who received EKGs were more likely to be older, White, Medicare-insured, and to present with abnormal vital signs or higher triage severity. Across all models, the combined data approach yielded superior predictive performance. The SVM and LR achieved the highest area under the ROC curve (AUC = 0.860 and 0.861) when using both structured and unstructured data, compared to 0.772 with structured data alone and 0.823 and 0.822 with unstructured data alone. Similar improvements were observed in accuracy, sensitivity, and specificity. Conclusions: Integrating structured clinical data with patient narratives significantly enhances the ability to predict EKG utilization in the emergency department. These findings support a personalized medicine framework by demonstrating how multimodal data integration can enable individualized, real-time decision support in the ED.
(This article belongs to the Special Issue Machine Learning in Epidemiology)
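As a rough illustration of the abstract's fusion setup, the sketch below concatenates structured ED features with text embeddings and grid-searches a logistic regression with 5-fold cross-validation scored by AUROC. All data and feature dimensions are synthetic placeholders, and plain concatenation stands in for the paper's fusion scheme; this is not the authors' code.

```python
# Hedged sketch of structured + text-embedding fusion for EKG-order prediction.
# All arrays are synthetic placeholders; feature dimensions are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
n = 500
X_structured = rng.normal(size=(n, 12))    # vitals, demographics, triage acuity (stand-ins)
X_text = rng.normal(size=(n, 768))         # stand-in for Clinical-BERT narrative embeddings
y = rng.integers(0, 2, size=n)             # 1 = EKG administered

X_fused = np.hstack([X_structured, X_text])  # concatenation as a simple fusion stand-in

grid = GridSearchCV(LogisticRegression(max_iter=1000),
                    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
                    cv=5, scoring="roc_auc")   # 5-fold CV and AUROC, as in the abstract
grid.fit(X_fused, y)
print("best C:", grid.best_params_["C"], "CV AUROC:", round(grid.best_score_, 3))
```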

29 pages, 959 KiB  
Review
Machine Learning-Driven Insights in Cancer Metabolomics: From Subtyping to Biomarker Discovery and Prognostic Modeling
by Amr Elguoshy, Hend Zedan and Suguru Saito
Metabolites 2025, 15(8), 514; https://doi.org/10.3390/metabo15080514 - 1 Aug 2025
Abstract
Cancer metabolic reprogramming plays a critical role in tumor progression and therapeutic resistance, underscoring the need for advanced analytical strategies. Metabolomics, leveraging mass spectrometry and nuclear magnetic resonance (NMR) spectroscopy, offers a comprehensive and functional readout of tumor biochemistry. By enabling both targeted metabolite quantification and untargeted profiling, metabolomics captures the dynamic metabolic alterations associated with cancer. The integration of metabolomics with machine learning (ML) approaches further enhances the interpretation of these complex, high-dimensional datasets, providing powerful insights into cancer biology from biomarker discovery to therapeutic targeting. This review systematically examines the transformative role of ML in cancer metabolomics. We discuss how various ML methodologies—including supervised algorithms (e.g., Support Vector Machine, Random Forest), unsupervised techniques (e.g., Principal Component Analysis, t-SNE), and deep learning frameworks—are advancing cancer research. Specifically, we highlight three major applications of ML–metabolomics integration: (1) cancer subtyping, exemplified by the use of Similarity Network Fusion (SNF) and LASSO regression to classify triple-negative breast cancer into subtypes with distinct survival outcomes; (2) biomarker discovery, where Random Forest and Partial Least Squares Discriminant Analysis (PLS-DA) models have achieved >90% accuracy in detecting breast and colorectal cancers through biofluid metabolomics; and (3) prognostic modeling, demonstrated by the identification of race-specific metabolic signatures in breast cancer and the prediction of clinical outcomes in lung and ovarian cancers. Beyond these areas, we explore applications across prostate, thyroid, and pancreatic cancers, where ML-driven metabolomics is contributing to earlier detection, improved risk stratification, and personalized treatment planning. We also address critical challenges, including issues of data quality (e.g., batch effects, missing values), model interpretability, and barriers to clinical translation. Emerging solutions, such as explainable artificial intelligence (XAI) approaches and standardized multi-omics integration pipelines, are discussed as pathways to overcome these hurdles. By synthesizing recent advances, this review illustrates how ML-enhanced metabolomics bridges the gap between fundamental cancer metabolism research and clinical application, offering new avenues for precision oncology through improved diagnosis, prognosis, and tailored therapeutic strategies.
(This article belongs to the Special Issue Nutritional Metabolomics in Cancer)
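One of the review's recurring methods, PLS-DA, is commonly implemented as PLS regression against one-hot class labels with an argmax readout. A minimal scikit-learn sketch on synthetic "metabolite" data (no real measurements are implied):

```python
# PLS-DA sketch: PLS regression on one-hot labels, class = argmax of prediction.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(120, 300))       # 120 samples x 300 metabolite features (synthetic)
y = rng.integers(0, 2, size=120)      # case vs. control
Y = np.eye(2)[y]                      # one-hot encoding of the class labels

pls = PLSRegression(n_components=5).fit(X, Y)
pred = pls.predict(X).argmax(axis=1)  # discriminant readout
print("training accuracy:", (pred == y).mean())
```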

21 pages, 651 KiB  
Article
PAD-MPFN: Dynamic Fusion with Popularity Decay for News Recommendation
by Biyang Ma, Yiwei Deng and Huifan Gao
Electronics 2025, 14(15), 3057; https://doi.org/10.3390/electronics14153057 - 30 Jul 2025
Abstract
News recommendation systems must simultaneously address multiple challenges, including dynamic user interest modeling, nonlinear popularity patterns, and diverse recommendation in cold-start scenarios. We present a Popularity-Aware Dynamic Multi-Perspective Fusion Network (PAD-MPFN) that innovatively integrates three key components: adaptive subspace projection for multi-source interest fusion, logarithmic time-decay factors for popularity bias mitigation, and dynamic gating mechanisms for personalized recommendation weighting. The framework uniquely combines sequential behavior analysis, social graph propagation, and temporal popularity modeling through a unified architecture. Experimental results on the MIND dataset, an open-source version of MSN News, demonstrate that PAD-MPFN outperforms existing methods in both overall recommendation performance and cold-start scenarios while effectively alleviating information overload. This study offers a new solution for dynamic interest modeling and diverse recommendation.
(This article belongs to the Special Issue Data-Driven Intelligence in Autonomous Systems)
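The abstract names logarithmic time-decay factors for popularity bias mitigation but does not give the formula; the sketch below is one plausible, purely hypothetical variant that down-weights raw click counts by article age.

```python
# Hypothetical logarithmic time-decay for news popularity (the paper's exact
# formula is not stated in the abstract; this only illustrates the idea).
import numpy as np

def decayed_popularity(clicks: np.ndarray, age_hours: np.ndarray) -> np.ndarray:
    """Down-weight raw click counts by a logarithmic factor of article age."""
    return clicks / np.log2(2.0 + age_hours)

clicks = np.array([1000.0, 1000.0, 50.0])
age = np.array([1.0, 48.0, 1.0])          # hours since publication
print(decayed_popularity(clicks, age))    # an old viral item decays toward fresh niche items
```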

22 pages, 1588 KiB  
Article
Scaffold-Free Functional Deconvolution Identifies Clinically Relevant Metastatic Melanoma EV Biomarkers
by Shin-La Shu, Shawna Benjamin-Davalos, Xue Wang, Eriko Katsuta, Megan Fitzgerald, Marina Koroleva, Cheryl L. Allen, Flora Qu, Gyorgy Paragh, Hans Minderman, Pawel Kalinski, Kazuaki Takabe and Marc S. Ernstoff
Cancers 2025, 17(15), 2509; https://doi.org/10.3390/cancers17152509 - 30 Jul 2025
Abstract
Background: Melanoma metastasis, driven by tumor microenvironment (TME)-mediated crosstalk facilitated by extracellular vesicles (EVs), remains a major therapeutic challenge. A critical barrier to clinical translation is the overlap in protein cargo between tumor-derived and healthy cell EVs. Objective: To address this, we developed Scaffold-free Functional Deconvolution (SFD), a novel computational approach that leverages a comprehensive healthy cell EV protein database to deconvolute non-oncogenic background signals. Methods: Beginning with 1915 proteins (identified by MS/MS analysis on an Orbitrap Fusion Lumos Mass Spectrometer using the IonStar workflow) from melanoma EVs isolated using REIUS, SFD applies four sequential filters: exclusion of normal melanocyte EV proteins, prioritization of metastasis-linked entries (HCMDB), refinement via melanocyte-specific databases, and validation against TCGA survival data. Results: This workflow identified 21 high-confidence targets implicated in metabolism-associated acidification, immune modulation, and oncogenesis, which were analyzed for association with reduced disease-free and overall survival. SFD's versatility was further demonstrated by surfaceome profiling, confirming enrichment of B7-H3 (CD276), ICAM1, and MIC-1 (GDF-15) in metastatic melanoma EVs via Western blot and flow cytometry. Meta-analysis using Vesiclepedia and STRING categorized these targets into metabolic, immune, and oncogenic drivers, revealing a dense interaction network. Conclusions: Our results highlight SFD as a powerful tool for identifying clinically relevant biomarkers and therapeutic targets within melanoma EVs, with potential applications in drug development and personalized medicine.
(This article belongs to the Section Methods and Technologies Development)
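SFD's four sequential filters amount to set operations over protein identifiers. A schematic sketch with placeholder sets (the real pipeline queries melanocyte EV databases, HCMDB, and TCGA survival statistics, none of which are reproduced here):

```python
# Schematic of SFD's sequential filtering as set operations (placeholder data).
melanoma_ev = {"GDF15", "ICAM1", "CD276", "ACTB", "TUBB"}   # MS/MS hits (illustrative)
normal_melanocyte_ev = {"ACTB", "TUBB"}                     # filter 1: healthy EV background
metastasis_linked = {"GDF15", "ICAM1", "CD276"}             # filter 2: HCMDB entries
melanocyte_specific = {"GDF15", "ICAM1", "CD276"}           # filter 3: lineage refinement
survival_associated = {"GDF15", "ICAM1", "CD276"}           # filter 4: TCGA validation

candidates = melanoma_ev - normal_melanocyte_ev             # subtract non-oncogenic signal
candidates &= metastasis_linked
candidates &= melanocyte_specific
candidates &= survival_associated
print(sorted(candidates))                                   # high-confidence targets
```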

19 pages, 2698 KiB  
Article
Orga-Dete: An Improved Lightweight Deep Learning Model for Lung Organoid Detection and Classification
by Xuan Huang, Qin Gao, Hanwen Zhang, Fuhong Min, Dong Li and Gangyin Luo
Appl. Sci. 2025, 15(15), 8377; https://doi.org/10.3390/app15158377 - 28 Jul 2025
Abstract
Lung organoids play a crucial role in modeling drug responses in pulmonary diseases. However, their morphological analysis remains hindered by inefficient manual detection and the high computational cost of existing algorithms. To overcome these challenges, this study proposes Orga-Dete—a lightweight, high-precision detection model based on YOLOv11n—which first employs data augmentation to mitigate the limited dataset size and class imbalance, then optimizes via a triple co-optimization strategy: a bi-directional feature pyramid network for enhanced multi-scale feature fusion, MPCA for stronger micro-organoid feature response, and EMASlideLoss to address class imbalance. Validated on a lung organoid microscopy dataset, Orga-Dete achieves 81.4% mAP@0.5 with only 2.25 M parameters and 6.3 GFLOPs, surpassing the baseline model YOLOv11n by 3.5%. Ablation experiments confirm the synergistic effects of these modules in enhancing morphological feature extraction. With its balance of precision and efficiency, Orga-Dete offers a scalable solution for high-throughput organoid analysis, underscoring its potential for personalized medicine and drug screening.
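The bi-directional feature pyramid component builds on the fast normalized weighted fusion popularized by BiFPN; below is a minimal PyTorch sketch of one such fusion node, illustrating the general technique rather than Orga-Dete's actual module.

```python
# Fast normalized weighted fusion of pyramid features, BiFPN-style (sketch only).
import torch
import torch.nn as nn

class WeightedFusion(nn.Module):
    def __init__(self, n_inputs: int = 2, eps: float = 1e-4):
        super().__init__()
        self.w = nn.Parameter(torch.ones(n_inputs))  # one learnable weight per input
        self.eps = eps

    def forward(self, feats):
        w = torch.relu(self.w)                       # keep weights non-negative
        w = w / (w.sum() + self.eps)                 # normalize without softmax
        return sum(wi * f for wi, f in zip(w, feats))

node = WeightedFusion()
p4 = torch.randn(1, 64, 40, 40)                      # same-shape pyramid levels (illustrative)
p4_td = torch.randn(1, 64, 40, 40)
print(node([p4, p4_td]).shape)                       # torch.Size([1, 64, 40, 40])
```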

27 pages, 1587 KiB  
Article
Incorporating Uncertainty Estimation and Interpretability in Personalized Glucose Prediction Using the Temporal Fusion Transformer
by Antonio J. Rodriguez-Almeida, Carmelo Betancort, Ana M. Wägner, Gustavo M. Callico, Himar Fabelo and on behalf of the WARIFA Consortium
Sensors 2025, 25(15), 4647; https://doi.org/10.3390/s25154647 - 26 Jul 2025
Abstract
More than 14% of the world’s population suffered from diabetes mellitus in 2022. This metabolic condition is defined by increased blood glucose concentrations. Among the different types of diabetes, type 1 diabetes, caused by a lack of insulin secretion, is particularly challenging to treat. In this regard, automatic glucose level estimation builds on Continuous Glucose Monitoring (CGM) devices and has shown positive therapeutic outcomes. AI-based glucose prediction has commonly followed a deterministic approach, usually lacking interpretability. Therefore, these AI-based methods do not provide enough information in critical decision-making scenarios, such as those in the medical field. This work intends to provide accurate, interpretable, and personalized glucose prediction using the Temporal Fusion Transformer (TFT), and also includes an uncertainty estimation. The TFT was trained using two databases, an in-house-collected dataset and the OhioT1DM dataset, commonly used for glucose forecasting benchmarking. For both datasets, the set of input features used to train the model was varied to assess their impact on model interpretability and prediction performance. Models were evaluated using common prediction metrics, diabetes-specific metrics, uncertainty estimation, and interpretability of the model, including feature importance and attention. The obtained results showed that the TFT outperforms existing methods in terms of RMSE by at least 13% for both datasets.
(This article belongs to the Collection Deep Learning in Biomedical Informatics and Healthcare)
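TFT obtains its uncertainty estimates by forecasting several quantiles under the pinball (quantile) loss. A minimal sketch of that loss follows, with illustrative glucose values rather than anything from the paper's datasets:

```python
# Pinball (quantile) loss, the mechanism behind TFT's prediction intervals.
import torch

def quantile_loss(pred: torch.Tensor, target: torch.Tensor, q: float) -> torch.Tensor:
    """Penalize under-prediction by q and over-prediction by (1 - q)."""
    err = target - pred
    return torch.mean(torch.maximum(q * err, (q - 1) * err))

target = torch.tensor([120.0, 140.0, 160.0])      # glucose in mg/dL (illustrative)
pred_p10 = torch.tensor([110.0, 130.0, 150.0])    # a low-quantile forecast
for q, pred in [(0.1, pred_p10), (0.5, target), (0.9, target + 10)]:
    print(q, quantile_loss(pred, target, q).item())
```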

24 pages, 12286 KiB  
Article
A UAV-Based Multi-Scenario RGB-Thermal Dataset and Fusion Model for Enhanced Forest Fire Detection
by Yalin Zhang, Xue Rui and Weiguo Song
Remote Sens. 2025, 17(15), 2593; https://doi.org/10.3390/rs17152593 - 25 Jul 2025
Abstract
UAVs are essential for forest fire detection due to vast forest areas and the inaccessibility of high-risk zones, enabling rapid long-range inspection and detailed close-range surveillance. However, aerial photography faces challenges like multi-scale target recognition and complex scenario adaptation (e.g., deformation, occlusion, lighting variations). RGB-Thermal fusion methods effectively integrate visible-light texture and thermal infrared temperature features, but current approaches are constrained by limited datasets and insufficient exploitation of cross-modal complementary information, ignoring cross-level feature interaction. To address data scarcity in wildfire scenarios, we constructed a time-synchronized multi-scene, multi-angle aerial RGB-Thermal dataset (RGBT-3M) with “Smoke–Fire–Person” annotations and modal alignment via the M-RIFT method. Finally, we propose a CP-YOLOv11-MF fusion detection model based on the YOLOv11 framework, which progressively learns the complementary heterogeneous features of each modality. Experimental validation proves the superiority of our method, with a precision of 92.5%, a recall of 93.5%, a mAP50 of 96.3%, and a mAP50-95 of 62.9%. The model’s RGB-Thermal fusion capability enhances early fire detection, offering a benchmark dataset and methodological advancement for intelligent forest conservation, with implications for AI-driven ecological protection.
(This article belongs to the Special Issue Advances in Spectral Imagery and Methods for Fire and Smoke Detection)
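A toy version of RGB-Thermal feature fusion: separate convolutional stems per modality whose feature maps are concatenated and mixed. CP-YOLOv11-MF's progressive cross-level interaction is considerably more elaborate; this only illustrates the basic two-stream idea.

```python
# Toy two-stream RGB-Thermal fusion: per-modality stems, channel concatenation.
import torch
import torch.nn as nn

class TwoStreamFusion(nn.Module):
    def __init__(self):
        super().__init__()
        self.rgb_stem = nn.Conv2d(3, 16, 3, stride=2, padding=1)      # visible-light texture
        self.thermal_stem = nn.Conv2d(1, 16, 3, stride=2, padding=1)  # temperature map
        self.mix = nn.Conv2d(32, 32, 1)                               # fuse concatenated channels

    def forward(self, rgb, thermal):
        f = torch.cat([self.rgb_stem(rgb), self.thermal_stem(thermal)], dim=1)
        return torch.relu(self.mix(f))

model = TwoStreamFusion()
out = model(torch.randn(1, 3, 256, 256), torch.randn(1, 1, 256, 256))
print(out.shape)  # torch.Size([1, 32, 128, 128])
```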

35 pages, 5195 KiB  
Article
A Multimodal AI Framework for Automated Multiclass Lung Disease Diagnosis from Respiratory Sounds with Simulated Biomarker Fusion and Personalized Medication Recommendation
by Abdullah, Zulaikha Fatima, Jawad Abdullah, José Luis Oropeza Rodríguez and Grigori Sidorov
Int. J. Mol. Sci. 2025, 26(15), 7135; https://doi.org/10.3390/ijms26157135 - 24 Jul 2025
Abstract
Respiratory diseases represent a persistent global health challenge, underscoring the need for intelligent, accurate, and personalized diagnostic and therapeutic systems. Existing methods frequently suffer from limitations in diagnostic precision, lack of individualized treatment, and constrained adaptability to complex clinical scenarios. To address these challenges, our study introduces a modular AI-powered framework that integrates an audio-based disease classification model with simulated molecular biomarker profiles to evaluate the feasibility of future multimodal diagnostic extensions, alongside a synthetic-data-driven prescription recommendation engine. The disease classification model analyzes respiratory sound recordings and accurately distinguishes among eight clinical classes: bronchiectasis, pneumonia, upper respiratory tract infection (URTI), lower respiratory tract infection (LRTI), asthma, chronic obstructive pulmonary disease (COPD), bronchiolitis, and healthy respiratory state. The proposed model achieved a classification accuracy of 99.99% on a holdout test set, including 94.2% accuracy on pediatric samples. In parallel, the prescription module provides individualized treatment recommendations comprising drug, dosage, and frequency, trained on a carefully constructed synthetic dataset designed to emulate real-world prescribing logic. The model achieved over 99% accuracy in medication prediction tasks, outperforming previously reported baseline models. Minimal misclassification in the confusion matrix and strong clinician agreement on 200 prescriptions (Cohen’s κ = 0.91 [0.87–0.94] for drug selection, 0.78 [0.74–0.81] for dosage, 0.96 [0.93–0.98] for frequency) further affirm the system’s reliability. Adjusted clinician disagreement rates were 2.7% (drug), 6.4% (dosage), and 1.5% (frequency). SHAP analysis identified age and smoking as key predictors, enhancing model explainability. Dosage accuracy was 91.3%, and most disagreements occurred in renal-impaired and pediatric cases. However, our study is presented strictly as a proof of concept. The use of synthetic data and the absence of access to real patient records constitute key limitations. A trial clinical deployment was conducted in a controlled environment, with positive satisfaction ratings from experts and users, but the proposed system must undergo extensive validation with de-identified electronic medical records (EMRs) and regulatory scrutiny before it can be considered for practical application. Nonetheless, the findings offer a promising foundation for the future development of clinically viable AI-assisted respiratory care tools.
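The clinician-agreement figures are Cohen's κ values; computing κ is a one-liner in scikit-learn. The ratings below are placeholders, not the study's 200 prescriptions:

```python
# Cohen's kappa for model-vs-clinician agreement on drug selection
# (placeholder labels; the study's real prescriptions are not reproduced here).
from sklearn.metrics import cohen_kappa_score

model_choice = ["amoxicillin", "azithromycin", "salbutamol", "amoxicillin", "salbutamol"]
clinician_choice = ["amoxicillin", "azithromycin", "salbutamol", "azithromycin", "salbutamol"]
print(cohen_kappa_score(model_choice, clinician_choice))  # chance-corrected agreement
```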

38 pages, 6851 KiB  
Article
FGFNet: Fourier Gated Feature-Fusion Network with Fractal Dimension Estimation for Robust Palm-Vein Spoof Detection
by Seung Gu Kim, Jung Soo Kim and Kang Ryoung Park
Fractal Fract. 2025, 9(8), 478; https://doi.org/10.3390/fractalfract9080478 - 22 Jul 2025
Abstract
The palm-vein recognition system has garnered attention as a biometric technology due to its resilience to external environmental factors, protection of personal privacy, and low risk of external exposure. However, with recent advancements in deep learning-based generative models for image synthesis, the quality and sophistication of fake images have improved, leading to an increased security threat from counterfeit images. In particular, palm-vein images acquired through near-infrared illumination exhibit low resolution and blurred characteristics, making it even more challenging to detect fake images. Furthermore, spoof detection specifically targeting palm-vein images has not been studied in detail. To address these challenges, this study proposes the Fourier-gated feature-fusion network (FGFNet) as a novel spoof detector for palm-vein recognition systems. The proposed network integrates masked fast Fourier transform, a map-based gated feature fusion block, and a fast Fourier convolution (FFC) attention block with global contrastive loss to effectively detect distortion patterns caused by generative models. These components enable the efficient extraction of critical information required to determine the authenticity of palm-vein images. In addition, fractal dimension estimation (FDE) was employed for two purposes in this study. In the spoof attack procedure, FDE was used to evaluate how closely the generated fake images approximate the structural complexity of real palm-vein images, confirming that the generative model produced highly realistic spoof samples. In the spoof detection procedure, the FDE results further demonstrated that the proposed FGFNet effectively distinguishes between real and fake images, validating its capability to capture subtle structural differences induced by generative manipulation. To evaluate the spoof detection performance of FGFNet, experiments were conducted using real palm-vein images from two publicly available palm-vein datasets—VERA Spoofing PalmVein (VERA dataset) and PLUSVein-contactless (PLUS dataset)—as well as fake palm-vein images generated based on these datasets using a cycle-consistent generative adversarial network. The results showed that FGFNet achieved an average classification error rate of 0.3% on both the VERA and PLUS datasets, demonstrating superior performance compared to existing state-of-the-art spoof detection methods.
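Fractal dimension estimation is commonly done by box counting: count the boxes touching the pattern at shrinking scales and take the slope of the log-log fit. A generic numpy sketch on a stand-in binary mask follows; the paper's exact FDE settings are not stated in the abstract.

```python
# Box-counting fractal dimension of a binary image (generic sketch).
import numpy as np

def box_counting_dimension(img: np.ndarray, sizes=(2, 4, 8, 16, 32)) -> float:
    counts = []
    for s in sizes:
        h, w = (img.shape[0] // s) * s, (img.shape[1] // s) * s
        blocks = img[:h, :w].reshape(h // s, s, w // s, s)
        counts.append(blocks.any(axis=(1, 3)).sum())   # boxes touching the pattern
    # Fractal dimension = slope of log N(s) against log(1/s).
    slope, _ = np.polyfit(np.log(1.0 / np.array(sizes)), np.log(counts), 1)
    return slope

rng = np.random.default_rng(2)
img = rng.random((128, 128)) > 0.5                     # stand-in for a vein mask
print(box_counting_dimension(img))
```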

22 pages, 6496 KiB  
Article
Real-Time Search and Rescue with Drones: A Deep Learning Approach for Small-Object Detection Based on YOLO
by Francesco Ciccone and Alessandro Ceruti
Drones 2025, 9(8), 514; https://doi.org/10.3390/drones9080514 - 22 Jul 2025
Abstract
Unmanned aerial vehicles are increasingly used in civil Search and Rescue operations due to their rapid deployment and wide-area coverage capabilities. However, detecting missing persons from aerial imagery remains challenging due to small object sizes, cluttered backgrounds, and limited onboard computational resources, especially when managed by civil agencies. In this work, we present a comprehensive methodology for optimizing YOLO-based object detection models for real-time Search and Rescue scenarios. A two-stage transfer learning strategy was employed using VisDrone for general aerial object detection and Heridal for Search and Rescue-specific fine-tuning. We explored various architectural modifications, including enhanced feature fusion (FPN, BiFPN, PB-FPN), additional detection heads (P2), and modules such as CBAM, Transformers, and deconvolution, analyzing their impact on performance and computational efficiency. The best-performing configuration (YOLOv5s-PBfpn-Deconv) achieved a mAP@50 of 0.802 on the Heridal dataset while maintaining real-time inference on embedded hardware (Jetson Nano). Further tests at different flight altitudes and explainability analyses using EigenCAM confirmed the robustness and interpretability of the model in real-world conditions. The proposed solution offers a viable framework for deploying lightweight, interpretable AI systems for UAV-based Search and Rescue operations managed by civil protection authorities. Limitations and future directions include the integration of multimodal sensors and adaptation to broader environmental conditions.
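The two-stage transfer learning strategy could be sketched with the ultralytics training API, assuming the package is installed and local dataset YAMLs exist. The checkpoint name, "heridal.yaml", the run directory, and all hyperparameters below are placeholders, and the paper's modified YOLOv5s architectures are not captured here.

```python
# Two-stage transfer learning sketch with the ultralytics API (assumed installed).
# Stage 1: general aerial objects (VisDrone); stage 2: SAR fine-tuning (Heridal).
# "heridal.yaml" is a hypothetical local dataset config; paths may differ.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # stand-in backbone; the paper modifies YOLOv5s variants
model.train(data="VisDrone.yaml", epochs=50, imgsz=640)    # stage 1: aerial pretraining

model = YOLO("runs/detect/train/weights/best.pt")          # reload stage-1 weights
model.train(data="heridal.yaml", epochs=50, imgsz=640)     # stage 2: SAR fine-tuning
print(model.val().box.map50)                               # mAP@50 on the SAR data
```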

24 pages, 824 KiB  
Article
MMF-Gait: A Multi-Model Fusion-Enhanced Gait Recognition Framework Integrating Convolutional and Attention Networks
by Kamrul Hasan, Khandokar Alisha Tuhin, Md Rasul Islam Bapary, Md Shafi Ud Doula, Md Ashraful Alam, Md Atiqur Rahman Ahad and Md. Zasim Uddin
Symmetry 2025, 17(7), 1155; https://doi.org/10.3390/sym17071155 - 19 Jul 2025
Abstract
Gait recognition is a reliable biometric approach that uniquely identifies individuals based on their natural walking patterns. It is widely used because walking patterns are difficult to camouflage and recognition does not require the person’s cooperation. Face-based person recognition systems often fail to determine an offender’s identity when the face is concealed with a helmet or mask to evade identification. In such cases, gait-based recognition is ideal for identifying offenders, and most existing work leverages a deep learning (DL) model. However, a single model often fails to capture a comprehensive selection of refined patterns in input data when external factors are present, such as variation in viewing angle, clothing, and carrying conditions. In response, this paper introduces a fusion-based multi-model gait recognition framework that leverages the potential of convolutional neural networks (CNNs) and a vision transformer (ViT) in an ensemble manner to enhance gait recognition performance. Here, CNNs capture spatiotemporal features, and the ViT features multiple attention layers that focus on particular regions of the gait image. The first step in this framework is to obtain the Gait Energy Image (GEI) by averaging a height-normalized gait silhouette sequence over a gait cycle, which preserves the left–right symmetry of the gait. After that, the GEI is fed through multiple pre-trained models, fine-tuned precisely to extract deep spatiotemporal features. Later, three separate fusion strategies are conducted: the first is decision-level fusion (DLF), which takes each model’s decision and employs majority voting for the final decision; the second is feature-level fusion (FLF), which combines the features from individual models through pointwise addition before performing gait recognition; finally, a hybrid fusion combines DLF and FLF for gait recognition. The performance of the multi-model fusion-based framework was evaluated on three publicly available gait databases: CASIA-B, OU-ISIR D, and the OU-ISIR Large Population dataset. The experimental results demonstrate that the fusion-enhanced framework achieves superior performance.
(This article belongs to the Special Issue Symmetry and Its Applications in Image Processing)
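The GEI is the pixel-wise mean of height-normalized silhouettes over one gait cycle, and decision-level fusion reduces to majority voting; a minimal numpy sketch of both (shapes and labels are illustrative):

```python
# Gait Energy Image (GEI) and decision-level fusion by majority voting (sketch).
import numpy as np

def gait_energy_image(silhouettes: np.ndarray) -> np.ndarray:
    """silhouettes: (T, H, W) binary frames, already height-normalized and centered."""
    return silhouettes.mean(axis=0)              # pixel-wise average over the gait cycle

def decision_level_fusion(predictions: list) -> int:
    """Majority vote over the per-model subject IDs."""
    return max(set(predictions), key=predictions.count)

cycle = (np.random.default_rng(3).random((30, 64, 44)) > 0.5).astype(float)
gei = gait_energy_image(cycle)
print(gei.shape, decision_level_fusion([17, 17, 4]))   # (64, 44) 17
```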

33 pages, 15612 KiB  
Article
A Personalized Multimodal Federated Learning Framework for Skin Cancer Diagnosis
by Shuhuan Fan, Awais Ahmed, Xiaoyang Zeng, Rui Xi and Mengshu Hou
Electronics 2025, 14(14), 2880; https://doi.org/10.3390/electronics14142880 - 18 Jul 2025
Abstract
Skin cancer is one of the most prevalent forms of cancer worldwide, and early and accurate diagnosis critically impacts patient outcomes. Given the sensitive nature of medical data and its fragmented distribution across institutions (data silos), privacy-preserving collaborative learning is essential to enable knowledge-sharing without compromising patient confidentiality. While federated learning (FL) offers a promising solution, existing methods struggle with heterogeneous and missing modalities across institutions, which reduce diagnostic accuracy. To address these challenges, we propose an effective and flexible Personalized Multimodal Federated Learning framework (PMM-FL), which enables efficient cross-client knowledge transfer while maintaining personalized performance under heterogeneous and incomplete modality conditions. Our study contains three key contributions: (1) A hierarchical aggregation strategy that decouples multi-module aggregation from local deployment via global modular-separated aggregation and local client fine-tuning. Unlike conventional FL (which synchronizes all parameters in each round), our method adopts a frequency-adaptive synchronization mechanism, updating parameters based on their stability and functional roles. (2) A multimodal fusion approach based on multitask learning, integrating learnable modality imputation and attention-based feature fusion to handle missing modalities. (3) A custom dataset combining multi-year International Skin Imaging Collaboration (ISIC) challenge data (2018–2024) to ensure comprehensive coverage of diverse skin cancer types. We evaluate PMM-FL through diverse experimental settings, demonstrating its effectiveness in heterogeneous and incomplete-modality federated learning: it achieves 92.32% diagnostic accuracy with only a 2% drop under 30% modality missingness, and a 32.9% reduction in communication overhead compared with baseline FL methods.
(This article belongs to the Special Issue Multimodal Learning and Transfer Learning)
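The frequency-adaptive synchronization idea is that stable or client-specific modules sync with the server less often than fast-moving shared ones. A schematic sketch follows in which the module names and intervals are hypothetical, not PMM-FL's actual schedule:

```python
# Schematic frequency-adaptive synchronization: each module group syncs with the
# server at its own interval (names and intervals are hypothetical).
SYNC_INTERVAL = {"shared_encoder": 1,   # aggregate every round
                 "fusion_head": 2,      # aggregate every 2nd round
                 "personal_head": 0}    # never aggregated: stays client-local

def modules_to_aggregate(round_idx: int) -> list:
    return [name for name, k in SYNC_INTERVAL.items() if k and round_idx % k == 0]

for r in range(1, 5):
    print(r, modules_to_aggregate(r))
# 1 ['shared_encoder']
# 2 ['shared_encoder', 'fusion_head']  ... and so on
```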

14 pages, 1509 KiB  
Article
A Multi-Modal Deep Learning Approach for Predicting Eligibility for Adaptive Radiation Therapy in Nasopharyngeal Carcinoma Patients
by Zhichun Li, Zihan Li, Sai Kit Lam, Xiang Wang, Peilin Wang, Liming Song, Francis Kar-Ho Lee, Celia Wai-Yi Yip, Jing Cai and Tian Li
Cancers 2025, 17(14), 2350; https://doi.org/10.3390/cancers17142350 - 15 Jul 2025
Abstract
Background: Adaptive radiation therapy (ART) can improve prognosis for nasopharyngeal carcinoma (NPC) patients. However, the inter-individual variability in anatomical changes, along with the resulting extension of treatment duration and increased workload for the radiologists, makes the selection of eligible patients a persistent challenge in clinical practice. The purpose of this study was to predict eligible ART candidates prior to radiation therapy (RT) for NPC patients using a classification neural network. By leveraging the fusion of medical imaging and clinical data, this method aimed to save time and resources in clinical workflows and improve treatment efficiency. Methods: We collected retrospective data from 305 NPC patients who received RT at Hong Kong Queen Elizabeth Hospital. Each patient sample included pre-treatment computed tomographic (CT) images, T1-weighted magnetic resonance imaging (MRI) data, and T2-weighted MRI images, along with clinical data. We developed and trained a novel multi-modal classification neural network that combines ResNet-50, cross-attention, multi-scale features, and clinical data for multi-modal fusion. The patients were categorized into two labels based on their re-plan status: patients who received ART during RT treatment, as determined by the radiation oncologist, and those who did not. Results: The experimental results demonstrated that the proposed multi-modal deep prediction model outperformed other commonly used deep learning networks, achieving an area under the curve (AUC) of 0.9070. These results indicated the ability of the model to accurately classify and predict ART eligibility for NPC patients. Conclusions: The proposed method showed good performance in predicting ART eligibility among NPC patients, highlighting its potential to enhance clinical decision-making, optimize treatment efficiency, and support more personalized cancer care.
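Cross-attention fusion of imaging features with clinical data can be sketched with torch.nn.MultiheadAttention, letting a projected clinical token query imaging tokens. Every dimension below is illustrative rather than the paper's configuration.

```python
# Cross-attention sketch: a clinical embedding queries imaging feature tokens.
# Dimensions are illustrative; the paper's ResNet-50-based model is more involved.
import torch
import torch.nn as nn

d = 256
attn = nn.MultiheadAttention(embed_dim=d, num_heads=4, batch_first=True)
imaging_tokens = torch.randn(1, 49, d)     # e.g., pooled CT/MRI feature-map positions
clinical_token = torch.randn(1, 1, d)      # projected clinical variables

fused, _ = attn(query=clinical_token, key=imaging_tokens, value=imaging_tokens)
logit = nn.Linear(d, 1)(fused.squeeze(1))  # ART-eligibility score
print(torch.sigmoid(logit))
```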

20 pages, 5700 KiB  
Article
Multimodal Personality Recognition Using Self-Attention-Based Fusion of Audio, Visual, and Text Features
by Hyeonuk Bhin and Jongsuk Choi
Electronics 2025, 14(14), 2837; https://doi.org/10.3390/electronics14142837 - 15 Jul 2025
Abstract
Personality is a fundamental psychological trait that exerts a long-term influence on human behavior patterns and social interactions. Automatic personality recognition (APR) has become increasingly important across various domains, including Human–Robot Interaction (HRI), personalized services, and psychological assessments. In this study, we propose a multimodal personality recognition model that classifies the Big Five personality traits by extracting features from three heterogeneous sources: audio processed using Wav2Vec2, video represented as Skeleton Landmark time series, and text encoded through Bidirectional Encoder Representations from Transformers (BERT) and Doc2Vec embeddings. Each modality is handled through an independent Self-Attention block that highlights salient temporal information, and these representations are then summarized and integrated using a late fusion approach to effectively reflect both the inter-modal complementarity and cross-modal interactions. Compared to traditional recurrent neural network (RNN)-based multimodal models and unimodal classifiers, the proposed model achieves an improvement of up to 12 percent in the F1-score. It also maintains high prediction accuracy and robustness under limited input conditions. Furthermore, a visualization based on t-distributed Stochastic Neighbor Embedding (t-SNE) demonstrates clear distributional separation across the personality classes, enhancing the interpretability of the model and providing insights into the structural characteristics of its latent representations. To support real-time deployment, a lightweight thread-based processing architecture is implemented, ensuring computational efficiency. By leveraging deep learning-based feature extraction and the Self-Attention mechanism, we present a novel personality recognition framework that balances performance with interpretability. The proposed approach establishes a strong foundation for practical applications in HRI, counseling, education, and other interactive systems that require personalized adaptation.
(This article belongs to the Special Issue Explainable Machine Learning and Data Mining)
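Per-modality Self-Attention followed by late fusion can be sketched as independent attention blocks whose temporally pooled outputs are concatenated before a shared classifier. The modality front ends the paper uses (Wav2Vec2, skeleton landmarks, BERT/Doc2Vec) are abstracted into random tensors here.

```python
# Per-modality self-attention with late fusion (sketch; front-end encoders omitted).
import torch
import torch.nn as nn

class ModalityBlock(nn.Module):
    def __init__(self, d: int = 128):
        super().__init__()
        self.attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)

    def forward(self, x):                    # x: (batch, time, d)
        out, _ = self.attn(x, x, x)          # self-attention over the time axis
        return out.mean(dim=1)               # temporal summary vector

audio, video, text = (torch.randn(2, 50, 128) for _ in range(3))
blocks = [ModalityBlock() for _ in range(3)]
fused = torch.cat([b(x) for b, x in zip(blocks, (audio, video, text))], dim=-1)
head = nn.Linear(3 * 128, 5)                 # Big Five trait logits
print(head(fused).shape)                     # torch.Size([2, 5])
```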
