
Search Results (78)

Search Parameters:
Keywords = out-of-distribution data

22 pages, 2618 KB  
Article
Improving Coronary Artery Disease Diagnosis in Cardiac MRI with Self-Supervised Learning
by Usman Khalid, Mehmet Kaya and Reda Alhajj
Diagnostics 2025, 15(20), 2618; https://doi.org/10.3390/diagnostics15202618 - 17 Oct 2025
Abstract
Background/Objectives: The heavy dependence on data annotation, the scarcity of labeled data, and the substantial expense of annotation, especially in healthcare, have constrained the efficacy of conventional supervised learning methodologies. Self-supervised learning (SSL) has emerged as a viable alternative by utilizing unlabeled data via pretext tasks. This paper examines the efficacy of supervised (pseudo-label) and unsupervised (no pseudo-label) pretext models in self-supervised learning for the classification of coronary artery disease (CAD) from cardiac MRI data, highlighting performance under data scarcity, out-of-distribution (OOD) conditions, and adversarial attack. Methods: Two datasets, CAD Cardiac MRI and the Ohio State Cardiac MRI Raw Data (OCMR), were used to construct three pretext tasks: (i) supervised Gaussian noise addition, (ii) supervised image rotation, and (iii) unsupervised generative reconstruction. These models were evaluated against the Simple Framework for Contrastive Learning (SimCLR), a widely used unsupervised contrastive learning framework. Performance was assessed under three data reduction scenarios (20%, 50%, 70%), out-of-distribution conditions, and adversarial attacks using FGSM and PGD, alongside other key evaluation criteria. Results: The Gaussian noise-based model attained the highest validation accuracy (up to 99.9%) across all data reduction scenarios and showed the strongest robustness to adversarial perturbations and on all other employed measures. The rotation-based model was considerably susceptible to attacks and lost accuracy as training data was reduced. The generative reconstruction model demonstrated moderate efficacy with minimal performance decline. SimCLR performed strongly under standard conditions but showed inferior robustness relative to the Gaussian noise model.
Conclusions: Carefully crafted self-supervised pretext tasks show promise for cardiac MRI classification, delivering dependable performance and generalizability even with limited data. These initial findings underscore SSL's capacity to produce reliable models for safety-critical healthcare applications and motivate further validation across varied datasets and clinical environments.
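The supervised Gaussian-noise pretext task amounts to manufacturing pseudo-labels from the corruption itself: each image is corrupted at one of several noise levels, and the model learns to predict which level was applied. A minimal sketch; the sigma grid, image shapes, and labeling scheme are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def make_noise_pretext_dataset(images, sigmas=(0.0, 0.05, 0.1, 0.2), seed=0):
    """Corrupt each image with Gaussian noise at a randomly chosen level and
    use the index of that level as the pretext pseudo-label (the sigma grid
    here is a hypothetical choice)."""
    rng = np.random.default_rng(seed)
    xs, ys = [], []
    for img in images:
        label = int(rng.integers(len(sigmas)))
        noisy = img + rng.normal(0.0, sigmas[label], size=img.shape)
        xs.append(np.clip(noisy, 0.0, 1.0))   # keep intensities in [0, 1]
        ys.append(label)
    return np.stack(xs), np.array(ys)

slices = np.random.rand(8, 32, 32)            # stand-in for cardiac MRI slices
x_pretext, y_pretext = make_noise_pretext_dataset(slices)
print(x_pretext.shape, y_pretext.shape)
```

A classifier pretrained on `(x_pretext, y_pretext)` would then be fine-tuned on the (scarce) labeled CAD data.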

36 pages, 7238 KB  
Article
Physics-Aware Reinforcement Learning for Flexibility Management in PV-Based Multi-Energy Microgrids Under Integrated Operational Constraints
by Shimeng Dong, Weifeng Yao, Zenghui Li, Haiji Zhao, Yan Zhang and Zhongfu Tan
Energies 2025, 18(20), 5465; https://doi.org/10.3390/en18205465 - 16 Oct 2025
Abstract
The growing penetration of photovoltaic (PV) generation in multi-energy microgrids has amplified the challenges of maintaining real-time operational efficiency, reliability, and safety under conditions of renewable variability and forecast uncertainty. Conventional rule-based or optimization-based strategies often suffer from limited adaptability, while purely data-driven reinforcement learning approaches risk violating physical feasibility constraints, leading to unsafe or economically inefficient operation. To address this challenge, this paper develops a Physics-Informed Reinforcement Learning (PIRL) framework that embeds first-order physical models and a structured feasibility projection mechanism directly into the training process of a Soft Actor–Critic (SAC) algorithm. Unlike traditional deep reinforcement learning, which explores the state–action space without physical safeguards, PIRL restricts learning trajectories to a physically admissible manifold, thereby preventing battery over-discharge, thermal discomfort, and infeasible hydrogen operation. Furthermore, differentiable penalty functions are employed to capture equipment degradation, user comfort, and cross-domain coupling, ensuring that the learned policy remains interpretable, safe, and aligned with engineering practice. The proposed approach is validated on a modified IEEE 33-bus distribution system coupled with 14 thermal zones and hydrogen facilities, representing a realistic and complex multi-energy microgrid environment. Simulation results demonstrate that PIRL reduces constraint violations by 75–90% and lowers operating costs by 25–30% compared with rule-based and DRL baselines while also achieving faster convergence and higher sample efficiency. Importantly, the trained policy generalizes effectively to out-of-distribution weather conditions without requiring retraining, highlighting the value of incorporating physical inductive biases for resilient control. 
Overall, this work establishes a transparent and reproducible reinforcement learning paradigm that bridges the gap between physical feasibility and data-driven adaptability, providing a scalable solution for safe, efficient, and cost-effective operation of renewable-rich multi-energy microgrids.
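The feasibility-projection mechanism can be illustrated for a single constraint: clip the policy's raw battery action so the next state of charge stays admissible. All parameter values below are hypothetical, and the paper's projection additionally covers thermal-comfort and hydrogen constraints:

```python
import numpy as np

def project_battery_action(p_raw, soc, dt=1.0, cap_kwh=10.0, p_max_kw=3.0,
                           soc_min=0.1, soc_max=0.9):
    """Project a raw battery power action (kW, +charge / -discharge) onto the
    feasible set so the next state of charge stays within [soc_min, soc_max]."""
    p = np.clip(p_raw, -p_max_kw, p_max_kw)      # power rating limit
    p_hi = (soc_max - soc) * cap_kwh / dt        # max charge power
    p_lo = (soc_min - soc) * cap_kwh / dt        # max discharge power (negative)
    return float(np.clip(p, p_lo, p_hi))

# a near-empty battery cannot discharge at the requested rate:
# the action is capped so SoC never drops below soc_min
print(project_battery_action(-3.0, soc=0.12))
```

In a SAC loop, this projection would wrap every sampled action before it is applied to the environment, keeping exploration on the physically admissible manifold.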

20 pages, 2131 KB  
Article
Test-Time Augmentation for Cross-Domain Leukocyte Classification via OOD Filtering and Self-Ensembling
by Lorenzo Putzu, Andrea Loddo and Cecilia Di Ruberto
J. Imaging 2025, 11(9), 295; https://doi.org/10.3390/jimaging11090295 - 28 Aug 2025
Viewed by 676
Abstract
Domain shift poses a major challenge in many machine learning applications due to variations in data acquisition protocols, particularly in the medical field. Test-time augmentation (TTA) can mitigate domain shift and improve robustness by aggregating predictions from multiple augmented versions of the same input. However, TTA may inadvertently generate unrealistic or out-of-distribution (OOD) samples that degrade prediction quality. In this work, we introduce a filtering procedure that removes from the TTA set all OOD samples whose representations lie far from the training data distribution. All retained TTA images are then weighted inversely to their distance from the training data. The final prediction is provided by a Self-Ensemble with Confidence, a lightweight ensemble strategy that fuses predictions from the original and retained TTA samples using a weighted soft voting scheme, without requiring multiple models or retraining. The method is model-agnostic and can be integrated with any deep learning architecture, making it broadly applicable across domains. Experiments on cross-domain leukocyte classification benchmarks demonstrate that our method consistently improves over standard TTA and baseline inference, particularly when strong domain shifts are present. Ablation studies and statistical tests confirm the effectiveness and significance of each component.
(This article belongs to the Section AI in Imaging)
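The filter-then-weight fusion described above can be sketched in a few lines. Assumptions not stated in the abstract: Euclidean distance to the training feature mean and a fixed cutoff `tau` stand in for whatever distance measure and threshold the paper actually uses:

```python
import numpy as np

def fuse_tta(probs, feats, train_mean, tau=3.0):
    """Fuse TTA predictions: drop augmented views whose feature vectors lie
    farther than tau from the training-set mean, then soft-vote the survivors
    with weights inverse to that distance."""
    d = np.linalg.norm(feats - train_mean, axis=1)
    keep = d <= tau
    if not keep.any():                       # fall back to the original view
        keep = np.zeros_like(d, dtype=bool)
        keep[0] = True
    w = 1.0 / (1.0 + d[keep])                # closer views count more
    fused = (w[:, None] * probs[keep]).sum(axis=0) / w.sum()
    return int(fused.argmax()), fused

# three views: original, mild augmentation, and one that drifted OOD
probs = np.array([[0.7, 0.3], [0.6, 0.4], [0.1, 0.9]])
feats = np.array([[0.0, 0.0], [0.5, 0.0], [10.0, 10.0]])
cls, fused = fuse_tta(probs, feats, train_mean=np.zeros(2))
print(cls, fused)
```

The OOD third view is excluded entirely, so the fused prediction follows the two in-distribution views rather than being dragged toward class 1.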

24 pages, 1651 KB  
Article
Attentive Neural Processes for Few-Shot Learning Anomaly-Based Vessel Localization Using Magnetic Sensor Data
by Luis Fernando Fernández-Salvador, Borja Vilallonga Tejela, Alejandro Almodóvar, Juan Parras and Santiago Zazo
J. Mar. Sci. Eng. 2025, 13(9), 1627; https://doi.org/10.3390/jmse13091627 - 26 Aug 2025
Viewed by 737
Abstract
Underwater vessel localization using passive magnetic anomaly sensing is a challenging problem due to the variability in vessel magnetic signatures and operational conditions. Data-based approaches may fail to generalize even to slightly different conditions. We therefore propose an Attentive Neural Process (ANP) approach, exploiting its few-shot generalization capabilities for robust localization of underwater vessels from magnetic anomaly measurements. Our ANP models the mapping from multi-sensor magnetic readings to position as a stochastic function: it cross-attends to a variable-size set of context points and fuses these with a global latent code that captures trajectory-level factors. The decoder outputs a Gaussian over coordinates, providing both point estimates and well-calibrated predictive variance. We validate our approach on a comprehensive dataset of magnetic disturbance fields covering 64 distinct vessel configurations (combinations of hull size, submersion depth (water-column height over a seabed array), and number of available sensors). Six magnetometers in a fixed circular arrangement record the magnetic field perturbations as a vessel traverses sinusoidal trajectories. We compare the ANP against baseline multilayer perceptron (MLP) models: (1) base MLPs trained separately on each vessel configuration, and (2) a domain-randomized search (DRS) MLP trained on the aggregate of all configurations to evaluate generalization across domains. The results demonstrate that the ANP achieves superior generalization to new vessel conditions, matching the accuracy of configuration-specific MLPs while providing well-calibrated uncertainty quantification. This uncertainty-aware prediction capability is crucial for real-world deployments, as it can inform adaptive sensing and decision-making. Across various in-distribution scenarios, the ANP halves the mean absolute error of the domain-randomized MLP (0.43 m vs. 0.84 m). The model also generalizes to out-of-distribution data, suggesting that our approach can facilitate transfer from offline training to real-world conditions.
(This article belongs to the Section Ocean Engineering)

20 pages, 5323 KB  
Article
An Object-Based Deep Learning Approach for Building Height Estimation from Single SAR Images
by Babak Memar, Luigi Russo, Silvia Liberata Ullo and Paolo Gamba
Remote Sens. 2025, 17(17), 2922; https://doi.org/10.3390/rs17172922 - 22 Aug 2025
Viewed by 1078
Abstract
The accurate estimation of building heights using very-high-resolution (VHR) synthetic aperture radar (SAR) imagery is crucial for various urban applications. This paper introduces a deep learning (DL)-based methodology for automated building height estimation from single VHR COSMO-SkyMed images: an object-based regression approach based on bounding box detection followed by height estimation. This model was trained and evaluated on a unique multi-continental dataset comprising eight geographically diverse cities across Europe, North and South America, and Asia, employing a cross-validation strategy to explicitly assess out-of-distribution (OOD) generalization. The results demonstrate highly promising performance, particularly on European cities where the model achieves a Mean Absolute Error (MAE) of approximately one building story (2.20 m in Munich), significantly outperforming recent state-of-the-art methods in similar OOD scenarios. Despite the increased variability observed when generalizing to cities in other continents, particularly in Asia with its distinct urban typologies and the prevalence of high-rise structures, this study underscores the significant potential of DL for robust cross-city and cross-continental transfer learning in building height estimation from single VHR SAR data.

19 pages, 3365 KB  
Article
Robust Federated Learning Against Data Poisoning Attacks: Prevention and Detection of Attacked Nodes
by Pretom Roy Ovi and Aryya Gangopadhyay
Electronics 2025, 14(15), 2970; https://doi.org/10.3390/electronics14152970 - 25 Jul 2025
Cited by 1 | Viewed by 1193
Abstract
Federated learning (FL) enables collaborative model building among a large number of participants without sharing sensitive data with the central server. Because of its distributed nature, FL has limited control over local data and the corresponding training process, so it is susceptible to data poisoning attacks in which malicious workers train the model on malicious data. Attackers on the worker side can easily initiate such attacks by swapping the labels of training instances, adding noise to training instances, or injecting out-of-distribution instances into the local data. Local workers under such attacks then carry incorrect information to the server, poison the global model, and cause misclassifications. Preventing and detecting such data poisoning attacks is therefore crucial to building a robust federated training framework. To address this, we propose a prevention strategy in federated learning, namely confident federated learning, to protect workers from such attacks. The strategy first validates the label quality of local training samples by characterizing and identifying label errors in the local training data, then excludes the detected mislabeled samples from local training. We evaluate our approach on both image and audio domains, and the experimental results validate the robustness of confident federated learning in preventing data poisoning attacks: it detects mislabeled training samples with over 85% accuracy and excludes them from the training set. However, the prevention strategy succeeds locally only up to a certain percentage of poisonous samples; beyond that percentage, it may not be effective.
In such cases, detection of the attacked workers is needed. In addition to the prevention strategy, we therefore propose a novel detection strategy in the federated learning framework to detect malicious workers under attack. We create a class-wise cluster representation for every participating worker from the neuron activation maps of its local model and analyze the resulting clusters to filter out attacked workers before model aggregation. We experimentally demonstrate the efficacy of this detection strategy in identifying workers affected by data poisoning attacks, along with the attack type, e.g., label flipping or dirty labeling. Our results also show that the global model fails to converge even after a large number of training rounds in the presence of malicious workers, whereas detecting the malicious workers with our proposed method and discarding them from model aggregation lets the global model converge within very few rounds. Furthermore, our approach stays robust under different data distributions and model sizes and does not require prior knowledge of the number of attackers in the system.
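The label-quality check at the heart of the prevention strategy can be sketched with a simplified confident-learning rule: each class gets a confidence threshold equal to the model's mean confidence on samples carrying that label, and a sample is flagged when another class clears its own threshold while the given label misses its own. This is a common simplification, not necessarily the paper's exact criterion:

```python
import numpy as np

def flag_label_errors(pred_probs, labels):
    """Return indices of likely mislabeled samples given out-of-sample
    predicted probabilities (n_samples, n_classes) and integer labels."""
    n_classes = pred_probs.shape[1]
    # per-class threshold: mean self-confidence of samples labeled c
    thr = np.array([pred_probs[labels == c, c].mean() for c in range(n_classes)])
    suspects = []
    for i, (p, y) in enumerate(zip(pred_probs, labels)):
        others = [c for c in range(n_classes) if c != y]
        if p[y] < thr[y] and any(p[c] >= thr[c] for c in others):
            suspects.append(i)
    return suspects

# last sample is labeled 1 but the model is confident it is class 0
probs = np.array([[0.9, 0.1], [0.85, 0.15], [0.2, 0.8], [0.15, 0.85], [0.9, 0.1]])
labels = np.array([0, 0, 1, 1, 1])
print(flag_label_errors(probs, labels))
```

In the federated setting, each worker would run this filter locally and train only on the samples that pass it.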

22 pages, 12753 KB  
Article
Detecting Out-of-Distribution Samples in Complex IoT Traffic Based on Distance Loss
by Chengye Zhao, Jinxin Zuo, Mingrui Fan, Yun Cai, Yueming Lu and Chonghua Wang
Appl. Sci. 2025, 15(13), 7522; https://doi.org/10.3390/app15137522 - 4 Jul 2025
Viewed by 456
Abstract
Out-of-distribution (OOD) detection is critical for securing Internet of Things (IoT) systems, particularly in applications such as intrusion detection and device identification. However, conventional classification-based approaches struggle in IoT environments due to challenges such as large class counts and data imbalance. To address these limitations, we propose a novel framework that combines class-mean clustering with a group-level feature distance loss to optimize both intra-group compactness and inter-group separability. The framework uses the Mahalanobis distance for robust OOD scoring and kernel density estimation (KDE) for adaptive threshold selection, enabling precise boundary estimation under varying data distributions. Experimental results on real-world IoT datasets show that our framework outperforms baseline techniques, achieving at least a 10% improvement in AUROC and a 33% reduction in FPR95, demonstrating its scalability and effectiveness in complex, imbalanced IoT scenarios.
(This article belongs to the Special Issue IoT Technology and Information Security)
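Mahalanobis scoring with a KDE-derived threshold can be sketched end to end. The Silverman bandwidth, the Gaussian kernel, and the 95% CDF quantile below are illustrative choices; the paper's adaptive selection may differ:

```python
import numpy as np

def fit_id_stats(feats):
    """Mean and regularized inverse covariance of in-distribution features."""
    mu = feats.mean(axis=0)
    cov = np.cov(feats, rowvar=False) + 1e-6 * np.eye(feats.shape[1])
    return mu, np.linalg.inv(cov)

def mahalanobis(x, mu, prec):
    d = x - mu
    return np.sqrt(np.einsum('...i,ij,...j->...', d, prec, d))

def kde_threshold(id_scores, quantile=0.95, grid=512):
    """Score threshold where a 1-D Gaussian-kernel density estimate of the
    ID scores reaches the given CDF quantile."""
    s = np.asarray(id_scores, float)
    h = 1.06 * s.std() * len(s) ** -0.2          # Silverman's rule of thumb
    xs = np.linspace(s.min() - 3 * h, s.max() + 3 * h, grid)
    dens = np.exp(-0.5 * ((xs[:, None] - s) / h) ** 2).sum(axis=1)
    cdf = np.cumsum(dens)
    cdf /= cdf[-1]
    return xs[np.searchsorted(cdf, quantile)]

rng = np.random.default_rng(0)
id_feats = rng.normal(0.0, 1.0, (500, 4))        # stand-in for ID traffic features
mu, prec = fit_id_stats(id_feats)
thr = kde_threshold(mahalanobis(id_feats, mu, prec))
ood_scores = mahalanobis(rng.normal(5.0, 1.0, (100, 4)), mu, prec)
print(thr, (ood_scores > thr).mean())            # most OOD samples exceed thr
```

Anything scoring above `thr` would be rejected as out-of-distribution traffic.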

28 pages, 11832 KB  
Article
On the Minimum Dataset Requirements for Fine-Tuning an Object Detector for Arable Crop Plant Counting: A Case Study on Maize Seedlings
by Samuele Bumbaca and Enrico Borgogno-Mondino
Remote Sens. 2025, 17(13), 2190; https://doi.org/10.3390/rs17132190 - 25 Jun 2025
Viewed by 1106
Abstract
Object detection is essential for precision agriculture applications like automated plant counting, but the minimum dataset requirements for effective model deployment remain poorly understood for arable crop seedling detection on orthomosaics. This study investigated how much annotated data is required to achieve standard counting accuracy (R2 = 0.85) for maize seedlings across different object detection approaches. We systematically evaluated traditional deep learning models requiring many training examples (YOLOv5, YOLOv8, YOLO11, RT-DETR), newer approaches requiring few examples (CD-ViTO), and methods requiring zero labeled examples (OWLv2) using drone-captured orthomosaic RGB imagery. We also implemented a handcrafted computer graphics algorithm as a baseline. Models were tested with varying training sources (in-domain vs. out-of-distribution data), training dataset sizes (10–150 images), and annotation quality levels (10–100%). Our results demonstrate that no model trained on out-of-distribution data achieved acceptable performance, regardless of dataset size. In contrast, models trained on in-domain data reached the benchmark with as few as 60–130 annotated images, depending on architecture. Transformer-based models (RT-DETR) required significantly fewer samples (60) than CNN-based models (110–130), though they showed different tolerances to annotation quality reduction. Models maintained acceptable performance with only 65–90% of the original annotation quality. Despite recent advances, neither few-shot nor zero-shot approaches met minimum performance requirements for precision agriculture deployment. These findings provide practical guidance for developing maize seedling detection systems, demonstrating that successful deployment requires in-domain training data, with minimum dataset requirements varying by model architecture.

15 pages, 4572 KB  
Article
A Focus on Important Samples for Out-of-Distribution Detection
by Jiaqi Wan, Guoliang Wen, Guangming Sun, Yuntian Zhu and Zhaohui Hu
Mathematics 2025, 13(12), 1998; https://doi.org/10.3390/math13121998 - 17 Jun 2025
Viewed by 639
Abstract
To ensure the reliability and security of machine learning classification models deployed in the open world, it is crucial that these models can detect out-of-distribution (OOD) data exhibiting semantic shifts from the in-distribution (ID) data used during training. This necessity has spurred extensive research on OOD detection. Previous methods either required large amounts of finely labeled OOD data for model training, which is costly, or performed poorly in open-world scenarios. To address these limitations, we propose a novel method named focus on important samples (FIS). FIS leverages model-predicted OOD scores to identify and focus on the samples that are most beneficial for model training. By learning from these important samples, our method achieves reliable OOD detection while reducing training costs and the risk of overfitting the training data, thereby enabling the model to better distinguish between ID and OOD data. Extensive experiments across diverse OOD detection scenarios demonstrate that FIS achieves superior performance compared to existing approaches, highlighting its robust and efficient OOD detection in practical applications.
(This article belongs to the Special Issue Machine Learning and Mathematical Methods in Computer Vision)

21 pages, 6048 KB  
Article
GenConViT: Deepfake Video Detection Using Generative Convolutional Vision Transformer
by Deressa Wodajo Deressa, Hannes Mareen, Peter Lambert, Solomon Atnafu, Zahid Akhtar and Glenn Van Wallendael
Appl. Sci. 2025, 15(12), 6622; https://doi.org/10.3390/app15126622 - 12 Jun 2025
Cited by 2 | Viewed by 3474
Abstract
Deepfakes have raised significant concerns due to their potential to spread false information and compromise the integrity of digital media. Current deepfake detection models often struggle to generalize across a diverse range of deepfake generation techniques and video content. In this work, we propose a Generative Convolutional Vision Transformer (GenConViT) for deepfake video detection. Our model combines ConvNeXt and Swin Transformer models for feature extraction, and it utilizes an Autoencoder and a Variational Autoencoder to learn from latent data distributions. By learning from both visual artifacts and the latent data distribution, GenConViT achieves improved performance in detecting a wide range of deepfake videos. The model is trained and evaluated on the DFDC, FF++, TM, DeepfakeTIMIT, and Celeb-DF (v2) datasets. GenConViT demonstrates strong performance in deepfake video detection, achieving high accuracy across the tested datasets. While the model shows promising results by leveraging visual and latent features, we demonstrate that further work is needed to improve its generalizability when encountering out-of-distribution data. Our model provides an effective solution for identifying a wide range of fake videos while preserving the integrity of media.

18 pages, 1276 KB  
Article
GazeMap: Dual-Pathway CNN Approach for Diagnosing Alzheimer’s Disease from Gaze and Head Movements
by Hyuntaek Jung, Shinwoo Ham, Hyunyoung Kil, Jung Eun Shin and Eun Yi Kim
Mathematics 2025, 13(11), 1867; https://doi.org/10.3390/math13111867 - 3 Jun 2025
Cited by 1 | Viewed by 902 | Correction
Abstract
Alzheimer’s disease (AD) is a progressive neurodegenerative disorder that impairs cognitive function, making early detection crucial for timely intervention. This study proposes a novel AD detection framework integrating gaze and head movement analysis via a dual-pathway convolutional neural network (CNN). Unlike conventional methods relying on linguistic, speech, or neuroimaging data, our approach leverages non-invasive video-based tracking, offering a more accessible and cost-effective solution for early AD detection. To enhance feature representation, we introduce GazeMap, a novel transformation that converts 1D gaze and head-pose time-series data into 2D spatial representations, effectively capturing both short- and long-term temporal interactions while mitigating missing or noisy data. The dual-pathway CNN processes gaze and head movement features separately before fusing them to improve diagnostic accuracy. We validated our framework using a clinical dataset (112 participants) from Konkuk University Hospital and an out-of-distribution dataset from senior centers and nursing homes. Our method achieved 91.09% accuracy on in-distribution data collected under controlled clinical settings and 83.33% on out-of-distribution data from real-world scenarios, outperforming several time-series baseline models. Model performance was validated through cross-validation on in-distribution data and tested on an independent out-of-distribution dataset. Additionally, our gaze-saliency maps provide interpretable visualizations, revealing distinct AD-related gaze patterns.
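The core GazeMap idea, lifting a 1-D gaze or head-pose series into a 2-D input a CNN can consume, is not specified in code here; a recurrence-style pairwise-difference matrix is one plausible stand-in for such a transform, shown purely for illustration:

```python
import numpy as np

def series_to_map(series):
    """Turn a 1-D time series into a 2-D map via its normalized pairwise
    absolute-difference matrix, so a CNN sees short- and long-range temporal
    relations at once (a stand-in, not the paper's exact GazeMap transform)."""
    s = np.asarray(series, float)
    s = (s - s.min()) / (s.max() - s.min() + 1e-8)   # normalize to [0, 1]
    return np.abs(s[:, None] - s[None, :])            # (T, T) symmetric map

gaze_x = np.sin(np.linspace(0.0, 2.0 * np.pi, 64))   # stand-in gaze signal
m = series_to_map(gaze_x)
print(m.shape)
```

Each of the two CNN pathways would receive such a map (one for gaze, one for head pose) before fusion.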

19 pages, 4751 KB  
Article
Numerical Simulation Data-Aided Domain-Adaptive Generalization Method for Fault Diagnosis
by Tao Yan, Jianchun Guo, Yuan Zhou, Lixia Zhu, Bo Fang and Jiawei Xiang
Sensors 2025, 25(11), 3482; https://doi.org/10.3390/s25113482 - 31 May 2025
Viewed by 801
Abstract
To deal with the cross-domain distribution offset problem in mechanical fault diagnosis under different operating conditions, domain-adaptive (DA) methods such as domain adversarial neural networks (DANNs), maximum mean discrepancy (MMD), and correlation alignment (CORAL) have been advanced in recent years, producing notable outcomes. However, these techniques rely on the accessibility of target data, restricting their use in real-time fault diagnosis applications. Effectively extracting fault features in the source domain and generalizing them to unseen target tasks therefore becomes a viable strategy for machinery fault detection. We propose a fault diagnosis domain generalization method using numerical simulation data. First, a finite element model (FEM) is used to generate simulation data under certain working conditions as an auxiliary domain. Second, this auxiliary domain is integrated with measurement data obtained under different operating conditions to form a multi-source domain. Finally, adversarial training is conducted on the multi-source domain to learn domain-invariant features, enhancing the model’s generalization to out-of-distribution data. Experimental results on bearings and gears show that the proposed method generalizes better than existing baseline methods, improving average accuracy by 2.83% and 8.9%, respectively.

21 pages, 6196 KB  
Article
Building a Gender-Bias-Resistant Super Corpus as a Deep Learning Baseline for Speech Emotion Recognition
by Babak Abbaschian and Adel Elmaghraby
Sensors 2025, 25(7), 1991; https://doi.org/10.3390/s25071991 - 22 Mar 2025
Viewed by 858
Abstract
The focus on Speech Emotion Recognition (SER) has dramatically increased in recent years, driven by the need for automatic speech-recognition-based systems and intelligent assistants to enhance user experience by incorporating emotional content. While deep learning techniques have significantly advanced SER systems, their robustness with respect to speaker gender and out-of-distribution data has not been thoroughly examined. Furthermore, standards for SER remain rooted in landmark papers from the 2000s, even though modern deep learning architectures can match or exceed the state of the art of that era. In this research, we address these challenges by creating a new super corpus from existing databases, providing a larger pool of samples. We benchmark this dataset using various deep learning architectures, setting a new baseline for the task. Our experiments also reveal that models trained on this super corpus generalize better, are more accurate, and exhibit lower gender bias than models trained on individual databases. We further show that traditional preprocessing techniques, such as denoising and normalization, are insufficient to address inherent biases in the data. Our data augmentation approach, however, effectively shifts these biases, improving model fairness across gender groups and emotions and, in some cases, fully debiasing the models.
(This article belongs to the Special Issue Emotion Recognition and Cognitive Behavior Analysis Based on Sensors)

27 pages, 1340 KB  
Article
Asymmetric Training and Symmetric Fusion for Image Denoising in Edge Computing
by Yupeng Zhang and Xiaofeng Liao
Symmetry 2025, 17(3), 424; https://doi.org/10.3390/sym17030424 - 12 Mar 2025
Abstract
Effectively handling mixed noise types and varying intensities is crucial for accurate information extraction and analysis, particularly in resource-limited edge computing scenarios. Conventional image denoising approaches struggle with unseen noise distributions, limiting their effectiveness in real-world applications such as object detection, classification, and change detection. To address these challenges, we introduce a novel image denoising framework that integrates asymmetric learning with symmetric fusion. The asymmetry stems from its dual training objectives: a pretrained encoder, trained only on clean images, supplies semantic priors, while a supervised module learns direct noise-to-clean mappings from paired noisy–clean data. The symmetry is achieved through a structured fusion of pretrained priors and supervised features, enhancing generalization across diverse noise distributions, including those in edge computing environments. Extensive evaluations across multiple noise types and intensities, including real-world remote sensing data, demonstrate the superior robustness of our approach. Our method achieves state-of-the-art performance in both in-distribution and out-of-distribution noise scenarios, significantly enhancing image quality for downstream tasks such as environmental monitoring and disaster response. Future work may explore extending this framework to specialized applications like hyperspectral imaging and nighttime analysis while further refining the interplay between symmetry and asymmetry in deep-learning-based image restoration.
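The asymmetric-training / symmetric-fusion structure can be sketched in a few lines. This is a minimal stand-in, not the authors' architecture: a frozen random projection plays the role of the pretrained clean-image encoder, an identity map plays the supervised branch, and the two feature streams are combined with equal weights; `FusionDenoiser` and both projections are hypothetical names.

```python
import numpy as np

class FusionDenoiser:
    """Toy sketch: a frozen projection supplies 'semantic prior' features,
    a second linear map stands in for the supervised noise-to-clean branch,
    and the two streams are fused with equal (symmetric) weights."""

    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        # Frozen: trained only on clean images in the real framework.
        self.prior_proj = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        # Trainable: learned from paired noisy-clean data in the real framework.
        self.sup_proj = np.eye(dim)

    def forward(self, noisy):
        prior = noisy @ self.prior_proj       # semantic prior features
        supervised = noisy @ self.sup_proj    # direct noise-to-clean features
        return 0.5 * prior + 0.5 * supervised  # symmetric fusion
```

The key design point the abstract emphasizes survives even in this sketch: the two branches are trained under different objectives (asymmetry) but contribute on equal footing at fusion time (symmetry).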
(This article belongs to the Special Issue Symmetry and Asymmetry in Embedded Systems)

23 pages, 2010 KB  
Article
ConceptVAE: Self-Supervised Fine-Grained Concept Disentanglement from 2D Echocardiographies
by Costin F. Ciușdel, Alex Serban and Tiziano Passerini
Appl. Sci. 2025, 15(3), 1415; https://doi.org/10.3390/app15031415 - 30 Jan 2025
Abstract
While traditional self-supervised learning methods improve performance and robustness across various medical tasks, they rely on single-vector embeddings that may not capture fine-grained concepts such as anatomical structures or organs. The ability to identify such concepts and their characteristics without supervision has the potential to improve pre-training methods, and enable novel applications such as fine-grained image retrieval and concept-based outlier detection. In this paper, we introduce ConceptVAE, a novel pre-training framework that detects and disentangles fine-grained concepts from their style characteristics in a self-supervised manner. We present a suite of loss terms and model architecture primitives designed to discretise input data into a preset number of concepts along with their local style. We validate ConceptVAE both qualitatively and quantitatively, demonstrating its ability to detect fine-grained anatomical structures such as blood pools and septum walls from 2D cardiac echocardiographies. Quantitatively, ConceptVAE outperforms traditional self-supervised methods in tasks such as region-based instance retrieval, semantic segmentation, out-of-distribution detection, and object detection. Additionally, we explore the generation of in-distribution synthetic data that maintains the same concepts as the training data but with distinct styles, highlighting its potential for more calibrated data generation. Overall, our study introduces and validates a promising new pre-training technique based on concept-style disentanglement, opening multiple avenues for developing models for medical image analysis that are more interpretable and explainable than black-box approaches.
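The core operation of discretising local features into a preset number of concepts can be sketched as a nearest-prototype assignment. This is a generic VQ-style illustration under assumed shapes, not ConceptVAE's actual loss terms or architecture, and it omits the per-location style vector the paper disentangles.

```python
import numpy as np

def assign_concepts(features, codebook):
    """Assign each local feature vector to its nearest concept prototype
    by squared Euclidean distance (a VQ-style discretisation sketch).

    features: (N, D) local embeddings from an encoder
    codebook: (K, D) learnable concept prototypes, K = preset concept count
    returns:  (N,) integer concept index per location
    """
    diffs = features[:, None, :] - codebook[None, :, :]   # (N, K, D)
    d2 = (diffs ** 2).sum(axis=-1)                        # (N, K) distances
    return d2.argmin(axis=1)
```

Given such assignments, concept-based applications like region retrieval or outlier detection reduce to comparing concept indices (or distances to prototypes) rather than single whole-image embeddings.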
(This article belongs to the Special Issue Artificial Intelligence for Healthcare)
