Journal Description
Machine Learning and Knowledge Extraction is an international, peer-reviewed, open access, monthly journal on machine learning and applications. See our video on YouTube explaining the MAKE journal concept.
- Open Access: free for readers, with article processing charges (APC) paid by authors or their institutions.
- High Visibility: indexed within Scopus, ESCI (Web of Science), dblp, and other databases.
- Rapid Publication: manuscripts are peer-reviewed and a first decision is provided to authors approximately 27 days after submission; acceptance to publication takes 4.4 days (median values for papers published in this journal in the second half of 2025).
- Journal Rank: JCR - Q1 (Engineering, Electrical and Electronic) / CiteScore - Q1 (Engineering (miscellaneous))
- Recognition of Reviewers: reviewers who provide timely, thorough peer-review reports receive vouchers entitling them to a discount on the APC of their next publication in any MDPI journal, in appreciation of the work done.
- Journal Cluster of Artificial Intelligence: AI, AI in Medicine, Algorithms, BDCC, MAKE, MTI, Stats, Virtual Worlds, and Computers.
Impact Factor: 6.0 (2024); 5-Year Impact Factor: 5.7 (2024)
Latest Articles
Comparative Diagnostic Performance of Artificial Intelligence Versus Conventional Approaches for Early Detection of Mosquito-Borne Viral Infections: A Systematic Review and Meta-Analysis, with Evidence Predominantly from Dengue Studies
Mach. Learn. Knowl. Extr. 2026, 8(4), 93; https://doi.org/10.3390/make8040093 - 7 Apr 2026
Abstract
Background: Early differentiation of mosquito-borne viral infections from other causes of acute febrile illness remains challenging, particularly in endemic and resource-limited settings. Artificial intelligence (AI) models have been proposed to improve early diagnosis, but their incremental value over conventional approaches is unclear. Methods: We conducted a systematic review and meta-analysis of comparative studies evaluating AI/machine learning models versus conventional approaches (clinical assessment, laboratory-based pathways, or traditional statistical models) for early detection of mosquito-borne viral infections. PubMed, Embase, and Scopus were searched through August 2025. Paired performance metrics were synthesized using fixed- and random-effects models. Outcomes included AUC, sensitivity, specificity, accuracy, positive predictive value (PPV), and negative predictive value (NPV). Risk of bias was assessed using PROBAST. Results: Thirteen studies met inclusion criteria. Under random-effects models, AI improved sensitivity (ES = 2.64, p = 0.028), specificity (ES = 5.55, p < 0.001), accuracy (ES = 3.19, p < 0.001), and NPV (ES = 13.84, p < 0.001). No consistent advantage was observed for AUC, and PPV findings were inconsistent. Substantial heterogeneity was present across outcomes (I2 = 100%). Most studies relied on internal validation, and PROBAST identified high risk of bias in the analysis domain in over half. Conclusions: AI-based models may enhance threshold-dependent performance metrics, supporting their use as adjunctive decision-support tools for early triage and case exclusion, while external validation and implementation-focused research remain essential.
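The pooling described in the abstract (fixed- and random-effects synthesis of paired performance metrics) can be sketched in a few lines. Below is a minimal, illustrative implementation of inverse-variance fixed-effect pooling and DerSimonian-Laird random-effects pooling; the effect sizes and variances are invented toy values, not data from the review.

```python
import math

def fixed_effect(es, var):
    """Inverse-variance (fixed-effect) pooled estimate."""
    w = [1.0 / v for v in var]
    return sum(wi * e for wi, e in zip(w, es)) / sum(w)

def dersimonian_laird(es, var):
    """Random-effects pooling with the DerSimonian-Laird tau^2 estimator."""
    w = [1.0 / v for v in var]
    sw = sum(w)
    mu_fe = sum(wi * e for wi, e in zip(w, es)) / sw
    q = sum(wi * (e - mu_fe) ** 2 for wi, e in zip(w, es))   # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(es) - 1)) / c)                 # between-study variance
    w_re = [1.0 / (v + tau2) for v in var]
    mu_re = sum(wi * e for wi, e in zip(w_re, es)) / sum(w_re)
    return mu_re, tau2, math.sqrt(1.0 / sum(w_re))

# Invented toy effect sizes and within-study variances (not data from the review)
es = [1.2, 2.0, 0.8, 1.6]
var = [0.10, 0.20, 0.15, 0.25]
```

When tau^2 is zero the random-effects estimate collapses to the fixed-effect one; heterogeneity between studies widens the pooled standard error.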
(This article belongs to the Section Thematic Reviews)
Open Access Article
Equivariant Transition Matrices for Explainable Deep Learning: A Lie Group Linearization Approach
by
Pavlo Radiuk, Oleksander Barmak, Leonid Bedratyuk and Iurii Krak
Mach. Learn. Knowl. Extr. 2026, 8(4), 92; https://doi.org/10.3390/make8040092 - 6 Apr 2026
Abstract
Deep learning systems deployed in regulated settings require explanations that are accurate and stable under nuisance transformations, yet classical post hoc transition matrices rely on fidelity-only fitting that fails to guarantee consistent explanations under spatial rotations or other group actions. In this work, we propose Equivariant Transition Matrices, a post hoc approach that augments transition matrices with Lie-group-aware structural constraints to bridge this research gap. Our method estimates infinitesimal generators in the formal and mental feature spaces, enforces an approximate intertwining relation at the Lie algebra level, and solves the resulting convex Least-Squares problem via singular value decomposition for small networks or implicit operators for large systems. We introduce diagnostics for symmetry validation and an unsupervised strategy for regularization weight selection. On a controlled synthetic benchmark, our approach reduces the symmetry defect from 13,100 to 0.0425 while increasing the mean squared error marginally from 0.00367 to 0.00524. On the MNIST dataset, the symmetry defect decreases by 72.6 percent (141.19 to 38.65) with changes in structural similarity and peak signal-to-noise ratio below 0.03 percent and 0.06 percent, respectively. These results demonstrate that explanation-level equivariance can be reliably imposed post-training, providing geometrically consistent interpretations for fixed deep models.
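As a toy illustration of the structural constraint described above (not the authors' implementation): a 2x2 transition matrix is fit by penalized least squares, where the penalty is the symmetry defect ||J W - W J||_F^2 for the rotation generator J. The data, the choice of generator, and the finite-difference gradient-descent solver are all simplifying assumptions standing in for the paper's SVD-based convex solve.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(2)) for i in range(2)]

def sub(A, B):
    return [[A[i][j] - B[i][j] for j in range(2)] for i in range(2)]

def frob2(M):
    return sum(M[i][j] ** 2 for i in range(2) for j in range(2))

J = [[0.0, -1.0], [1.0, 0.0]]  # infinitesimal generator of 2-D rotations

def sym_defect(W):
    """Squared Frobenius norm of the intertwining residual J W - W J."""
    return frob2(sub(matmul(J, W), matmul(W, J)))

def loss(w, X, Y, lam):
    W = [w[:2], w[2:]]
    fid = sum(sum((matvec(W, x)[i] - y[i]) ** 2 for i in range(2)) for x, y in zip(X, Y))
    return fid + lam * sym_defect(W)

def fit(X, Y, lam, steps=2000, lr=0.01, eps=1e-6):
    """Gradient descent (central finite differences) on fidelity + lam * defect."""
    w = [0.0] * 4
    for _ in range(steps):
        g = []
        for i in range(4):
            wp, wm = list(w), list(w)
            wp[i] += eps
            wm[i] -= eps
            g.append((loss(wp, X, Y, lam) - loss(wm, X, Y, lam)) / (2 * eps))
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return [w[:2], w[2:]]

# Ground-truth map is slightly non-equivariant (the shear term 0.3 breaks rotation symmetry)
A = [[1.0, 0.3], [0.0, 1.0]]
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 1.0]]
Y = [matvec(A, x) for x in X]
W_plain = fit(X, Y, lam=0.0)   # fidelity-only fit: recovers A, large symmetry defect
W_eq = fit(X, Y, lam=10.0)     # constrained fit: small defect, slightly worse fidelity
```

The same trade-off appears in the paper's numbers: the defect collapses by orders of magnitude while the fidelity error grows only marginally.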
(This article belongs to the Special Issue Trustworthy AI: Integrating Knowledge, Retrieval, and Reasoning)
Open Access Article
Query-Adaptive Hybrid Search
by
Pavel Posokhov, Stepan Skrylnikov, Sergei Masliukhin, Alina Zavgorodniaia, Olesia Koroteeva and Yuri Matveev
Mach. Learn. Knowl. Extr. 2026, 8(4), 91; https://doi.org/10.3390/make8040091 - 5 Apr 2026
Abstract
The modern information retrieval field increasingly relies on hybrid search systems combining sparse retrieval with dense neural models. However, most existing hybrid frameworks employ static mixing coefficients and independent component training, failing to account for the specific needs of individual queries and corpus heterogeneity. In this paper, we introduce an adaptive hybrid retrieval framework featuring query-driven alpha prediction that dynamically calibrates the mixing weights based on query latent representations. The predictor is instantiated in two variants, a lightweight low-latency configuration and a full-capacity encoder-scale configuration, enabling flexible trade-offs between computational efficiency and retrieval accuracy without relying on resource-inefficient LLM-based online evaluation. Furthermore, we propose antagonist negative sampling, a novel training paradigm that optimizes the dense encoder to resolve the systematic failures of the lexical retriever, prioritizing hard negatives where BM25 exhibits high uncertainty. Empirical evaluations on large-scale multilingual benchmarks (MLDR and MIRACL) indicate that our approach demonstrates superior average performance compared to state-of-the-art models such as BGE-M3 and mGTE, achieving an nDCG@10 of 74.3 on long-document retrieval. Notably, our framework recovers up to 92.5% of the theoretical oracle performance and yields significant improvements in nDCG@10 across 16 languages, particularly in challenging long-context scenarios.
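The core fusion step of such a hybrid retriever is a convex mix of per-query normalized sparse and dense scores. The sketch below is a minimal illustration: the alpha-prediction heuristic (rarer query terms shift weight toward the lexical retriever) is an invented stand-in for the paper's learned predictor over query latent representations.

```python
import math

def minmax(scores):
    """Per-query min-max normalization of a {doc: score} mapping."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 0.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}

def hybrid_rank(sparse, dense, alpha):
    """Fuse normalized sparse (e.g. BM25) and dense scores with mixing weight alpha."""
    s, d = minmax(sparse), minmax(dense)
    fused = {doc: alpha * d.get(doc, 0.0) + (1 - alpha) * s.get(doc, 0.0)
             for doc in set(s) | set(d)}
    return sorted(fused, key=fused.get, reverse=True)

def predict_alpha(query_tokens, doc_freq, n_docs):
    """Hypothetical per-query heuristic: rarer query terms push alpha toward 0,
    i.e. toward trusting the lexical retriever more."""
    idf = [math.log(n_docs / (1 + doc_freq.get(t, 0))) for t in query_tokens]
    rarity = sum(idf) / (len(idf) * math.log(n_docs))  # ~1 for very rare terms
    return max(0.0, min(1.0, 1.0 - rarity))
```

With alpha = 0 the ranking is purely lexical, with alpha = 1 purely dense; a query-adaptive predictor picks a point in between per query.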
(This article belongs to the Special Issue Trustworthy AI: Integrating Knowledge, Retrieval, and Reasoning)
Open Access Article
Fine-Tuned Nonlinear Autoregressive Recurrent Neural Network Model for Dam Displacement Time Series Prediction
by
Vukašin Ćirović, Vesna Ranković, Nikola Milivojević, Vladimir Milivojević and Brankica Majkić-Dursun
Mach. Learn. Knowl. Extr. 2026, 8(4), 90; https://doi.org/10.3390/make8040090 - 5 Apr 2026
Abstract
Dam monitoring data are nonlinear and nonstationary time series. Most existing data-driven dam displacement models are developed independently for each measuring point, disregarding the fact that a dam is a complex structure composed of various interconnected elements that form a unified whole. Regardless of the dam type, all points on the dam are exposed to the same external environmental influences. To account for the correlation between displacement time series at different points, this paper proposes a novel fine-tuned deep-learning nonlinear autoregressive (NAR) model based on a Long Short-Term Memory (LSTM) network for predicting dam tangential displacement, and a new method for generating source data to train the base model. The models for three measuring points were developed and tested on experimental data collected over a period of slightly more than twelve years. Compared with the model without fine-tuning, the proposed approach achieves an average mean square error (MSE) reduction of 80.68% on the training set and 65.79% on the test set, as well as an average mean absolute error (MAE) reduction of 51.05% and 52.62%, respectively. Furthermore, the proposed model outperforms Random Forest (RF), Support Vector Regression (SVR), and Multi-Layer Perceptron (MLP) models for dam displacement prediction.
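The sliding-window construction behind a nonlinear autoregressive (NAR) model, and the error-reduction metric quoted above, can be sketched as follows. The toy series stands in for the dam monitoring data, which is not public.

```python
def make_nar_dataset(series, n_lags):
    """Sliding-window (lag vector, next value) pairs for NAR training."""
    return [(series[t - n_lags:t], series[t]) for t in range(n_lags, len(series))]

def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

def pct_reduction(base, tuned):
    """Relative error reduction, e.g. MSE of the base model vs the fine-tuned one."""
    return 100.0 * (base - tuned) / base
```

The pairs feed an LSTM (or any regressor) as input/target; the reported 80.68% training-set MSE reduction corresponds to `pct_reduction(mse_base, mse_finetuned)`.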
Open Access Article
Adapting EHR Foundational Models to Predict Diabetes Complications with Precision Explainability
by
Timothy Joseph, Ahmed Dhaouadi, Jayroop Ramesh, Assim Sagahyroon and Fadi Aloul
Mach. Learn. Knowl. Extr. 2026, 8(4), 89; https://doi.org/10.3390/make8040089 - 4 Apr 2026
Abstract
Diabetes mellitus is a chronic condition that frequently leads to severe complications that are difficult to detect in their early stages using conventional clinical monitoring. This paper presents a data-driven framework for predicting multiple diabetes-related complications using structured electronic health record data while ensuring clinically meaningful explainability. The proposed approach adapts a pretrained electronic health record foundation model to operate on static patient data and integrates it with classical machine learning baselines to address class imbalance, feature sparsity, and interpretability challenges. A multi-label prediction setting covering eight common diabetes complications is evaluated using a real-world dataset from a regional diabetes center in the United Arab Emirates. Synthetic data generation and clinical constraint enforcement are applied to improve robustness for underrepresented outcomes, while feature selection is guided by model importance and attribution-based explanations. The best-performing configuration, a weighted ensemble combining a low-rank adapted Hyena-based foundation model with a tree-based predictor, achieved an average F1-score of 0.77, an average recall of 0.85, and an example-based F1-score of 0.71, outperforming all individual models. In addition, this ensemble produced the most stable explanations under input perturbations, indicating improved consistency of dominant clinical risk drivers. These results demonstrate that explainable foundation model-based ensembles can deliver accurate, robust, and clinically transparent risk prediction for diabetes complications.
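The weighted-ensemble and example-based F1 ideas above can be sketched compactly. The weight, probabilities, and labels below are illustrative assumptions, not values from the study.

```python
def ensemble_probs(p_a, p_b, w_a=0.6):
    """Weighted average of two models' per-label probabilities."""
    return [w_a * a + (1 - w_a) * b for a, b in zip(p_a, p_b)]

def predict_labels(probs, threshold=0.5):
    """Multi-label decision: keep every label whose probability clears the threshold."""
    return {i for i, p in enumerate(probs) if p >= threshold}

def example_f1(true_set, pred_set):
    """Example-based F1 for one multi-label instance (sets of label indices)."""
    if not true_set and not pred_set:
        return 1.0
    tp = len(true_set & pred_set)
    if tp == 0:
        return 0.0
    prec, rec = tp / len(pred_set), tp / len(true_set)
    return 2 * prec * rec / (prec + rec)
```

Example-based F1 is computed per patient and averaged, which is why it can differ from the label-averaged F1 also reported in the abstract.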
(This article belongs to the Topic Deep Learning for Healthcare and Biomedical Applications)
Open Access Article
Named Entity Recognition with Feature-Enhanced BiLSTM and CRF for Fine-Grained Aspect Identification in Large-Scale Textual Reviews
by
Shaheen Khatoon, Jibran Mir and Azhar Mahmood
Mach. Learn. Knowl. Extr. 2026, 8(4), 88; https://doi.org/10.3390/make8040088 - 2 Apr 2026
Abstract
Named Entity Recognition (NER) plays a crucial role in Aspect-Based Sentiment Identification (ABSI), enabling the extraction of domain-specific aspects and their associated sentiment expressions from unstructured textual reviews. In complex domains such as movie reviews, sentiment is frequently conveyed through references to named entities (e.g., actors, directors, or movie titles) and other contextual cues. However, many existing ABSI approaches treat NER as a separate preprocessing step, limiting the effective modeling of entity–aspect–opinion relationships. Integrating NER directly into the ABSI framework allows entity-specific opinions to be more accurately identified, overlapping aspects to be disambiguated, and contextual sentiment expressions to be captured more effectively. To address these challenges, this study proposes an integrated NER-based aspect identification model built on feature-enhanced LSTM and BiLSTM architectures. Linguistic features, including Part-of-Speech (POS) tags and chunking information, are incorporated to enrich contextual representations, while a Conditional Random Field (CRF) decoding layer models inter-label dependencies for coherent sequence-level predictions of named entities, aspects, and associated opinion expressions. Compared with large transformer-based models, the proposed BiLSTM-CRF architecture offers lower computational complexity, fewer parameters, and allows explicit integration and analysis of linguistic features that are often implicitly encoded in transformer attention mechanisms. The model is evaluated through multiple experimental variants across three domains. Four configurations are applied to movie-review data to jointly extract person names, movie titles, and aspect-opinion pairs, while six configurations assess cross-domain robustness on restaurant and laptop review datasets. Results show that the BiLSTM-CRF model augmented with POS features consistently outperforms baseline configurations in the movie domain and remains competitive across domains, achieving an F1-score of 0.89. These findings demonstrate that explicit linguistic feature integration within CRF-based sequence modeling can provide an effective and computationally efficient alternative to large-scale transformer fine-tuning for structured, entity-linked ABSI tasks.
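A CRF decoding layer's sequence-level prediction is typically computed with the Viterbi algorithm. Below is a minimal sketch over hand-set emission and transition scores; the BIO-style aspect tags and all scores are illustrative assumptions, not the trained model's parameters.

```python
def viterbi(emissions, transitions, tags):
    """CRF-style decoding: highest-scoring tag sequence under per-token
    emission scores plus pairwise transition scores (missing scores default to 0)."""
    delta = [{tag: emissions[0].get(tag, 0.0) for tag in tags}]
    back = []
    for t in range(1, len(emissions)):
        row, ptr = {}, {}
        for tag in tags:
            best = max(tags, key=lambda p: delta[-1][p] + transitions.get((p, tag), 0.0))
            row[tag] = (delta[-1][best] + transitions.get((best, tag), 0.0)
                        + emissions[t].get(tag, 0.0))
            ptr[tag] = best
        delta.append(row)
        back.append(ptr)
    last = max(tags, key=lambda tag: delta[-1][tag])
    path = [last]
    for ptr in reversed(back):          # follow back-pointers to recover the path
        path.append(ptr[path[-1]])
    return list(reversed(path))

# Toy BIO aspect tags for a 4-token review ("battery life is great"); hand-set scores
TAGS = ['O', 'B-ASP', 'I-ASP']
EMIS = [{'B-ASP': 2.0}, {'I-ASP': 1.0, 'O': 0.5}, {'O': 2.0}, {'O': 2.0}]
TRANS = {('O', 'I-ASP'): -5.0, ('B-ASP', 'I-ASP'): 1.0}
```

The transition score ('O', 'I-ASP') = -5.0 is what enforces label coherence: an inside tag cannot plausibly follow an outside tag even if its emission score is high.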
(This article belongs to the Section Learning)
Open Access Article
Optimizing Carbon Capture Efficiency: Knowledge Extraction from Process Simulations of Post-Combustion Amine Scrubbing
by
Mohammad Fazle Rabbi
Mach. Learn. Knowl. Extr. 2026, 8(4), 87; https://doi.org/10.3390/make8040087 - 2 Apr 2026
Abstract
Post-combustion amine scrubbing using monoethanolamine (MEA) remains a leading carbon capture technology, yet its deployment is constrained by high regeneration energy requirements and the computational expense of rigorous process simulation. This study presents an integrated framework coupling high-fidelity rate-based process simulation with explainable machine learning to systematically characterize a ten-dimensional operating space for MEA-based CO2 absorption. Latin hypercube sampling generated 10,000 steady-state cases, and five regression architectures were benchmarked under identical protocols. A neural network achieved the highest accuracy (R2 = 0.9729, RMSE = 1.43%), while XGBoost was selected as the operational surrogate due to its robust computational efficiency (1.5 ms inference latency) and native compatibility with exact Shapley value decomposition. SHAP analysis identified liquid-to-gas ratio as the dominant efficiency determinant, contributing 46.6% of total predictive importance, followed by inlet temperature and MEA concentration, with these three parameters collectively explaining 85% of efficiency variation and establishing a compact control hierarchy suitable for reduced-order control architectures. Bivariate interaction analysis located a high-efficiency operating region, while sensitivity analysis confirmed the strong influence of inlet temperature across the operating envelope. Pareto optimization via NSGA-II generated tiered operational guidelines spanning the 85% to 98% capture efficiency range, quantifying a 39% specific regeneration duty penalty (3.1 to 4.3 MJ/kg CO2) for pursuing maximum versus baseline capture targets. The framework demonstrates how explainable machine learning converts opaque process simulations into actionable engineering knowledge, providing a transparent and computationally efficient basis for design optimization and digital twin deployment in post-combustion carbon capture systems.
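The exact Shapley decomposition the study pairs with its XGBoost surrogate can be illustrated by brute-force enumeration over feature orderings, which is exact for a handful of features. The toy "surrogate" function and baseline below are assumptions, not the paper's trained model.

```python
from itertools import permutations

def shapley_values(predict, baseline, instance):
    """Exact Shapley attribution for one instance; features outside the
    current coalition are held at their baseline value."""
    n = len(instance)
    phi = [0.0] * n
    perms = list(permutations(range(n)))
    for order in perms:
        x = list(baseline)
        prev = predict(x)
        for f in order:            # add features one by one in this ordering
            x[f] = instance[f]
            cur = predict(x)
            phi[f] += cur - prev   # marginal contribution of feature f
            prev = cur
    return [p / len(perms) for p in phi]

# Toy efficiency surrogate: two additive terms plus one interaction
# (an assumption, not the trained XGBoost model from the paper)
def surrogate(x):
    return 3.0 * x[0] + 2.0 * x[1] + x[0] * x[2]

phi = shapley_values(surrogate, baseline=[0.0, 0.0, 0.0], instance=[1.0, 1.0, 1.0])
```

The values satisfy the efficiency property: they sum exactly to the prediction gap between the instance and the baseline, with the interaction term split evenly between the two features involved.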
(This article belongs to the Section Learning)
Open Access Systematic Review
Quantum Machine Learning for Phishing Detection: A Systematic Review of Current Techniques, Challenges, and Future Directions
by
Yanche Ari Kustiawan and Khairil Imran Ghauth
Mach. Learn. Knowl. Extr. 2026, 8(4), 86; https://doi.org/10.3390/make8040086 - 27 Mar 2026
Abstract
Phishing remains a major cybersecurity threat, yet the application of quantum machine learning (QML) to phishing detection is still at an early stage. This study presents a systematic literature review aimed at providing a concise overview of existing QML-based approaches for phishing detection, identifying methodological trends, limitations, and future research directions. A PRISMA-guided review protocol was applied to peer-reviewed journal and conference articles published between 2021 and 2025, retrieved from major scientific databases. Eligible studies were analyzed in terms of QML models, feature encoding strategies, experimental settings, evaluation metrics, and study quality using an adapted Newcastle–Ottawa Scale. The results indicate that current research is limited in volume and largely focuses on hybrid quantum–classical models, particularly quantum support vector machines and variational quantum classifiers. Reported performance is highly dependent on encoding methods, circuit depth, and simulator-based experimentation, with few studies evaluating real quantum hardware. Common challenges include small datasets, lack of external validation, hardware noise, scalability constraints, and the absence of standardized benchmarks. Overall, the review suggests that QML for phishing detection remains exploratory and is not yet competitive with mature classical approaches, but it holds potential as an experimental research direction, provided that future studies address robustness, reproducibility, and practical deployment constraints.
(This article belongs to the Section Thematic Reviews)
Open Access Article
Detection and Comparative Evaluation of Noise Perturbations in Simulated Dynamical Systems and ECG Signals Using Complexity-Based Features
by
Kevin Mallinger, Sebastian Raubitzek, Sebastian Schrittwieser and Edgar Weippl
Mach. Learn. Knowl. Extr. 2026, 8(4), 85; https://doi.org/10.3390/make8040085 - 25 Mar 2026
Abstract
Noise contamination is a common challenge in the analysis of time series data, where stochastic perturbations can obscure deterministic dynamics and complicate the interpretation of signals from chaotic and physiological systems. Reliable identification of noise regimes and their intensity is therefore essential for robust analysis of dynamical and biomedical signals, where incorrect attribution of stochastic perturbations can lead to misleading interpretations of system behavior. For this reason, the present study examines the role of complexity-based descriptors for identifying stochastic perturbations in time series and analyzes how these metrics respond to different noise regimes across heterogeneous dynamical systems. A supervised learning approach based on complexity descriptors was developed to analyze controlled perturbations in multiple signal types. Gaussian, pink, and low-frequency noise disturbances were injected at predefined intensity levels into the Rössler and Lorenz chaotic systems, the Hénon map, and synthetic electrocardiogram signals, while AR(1) processes were used for validation on inherently stochastic signals. From these systems, eighteen entropy-based, fractal, statistical, and singular value decomposition-based complexity metrics were extracted from either raw signals or reconstructed phase spaces. These features were used to perform three classification tasks that capture different aspects of noise characterization, including detecting the presence of noise, identifying the perturbation type, and discriminating between different noise intensities. In addition to predictive modeling, the study evaluates the complexity profiles and feature relevance of the metrics under varying perturbation regimes. The results show that no single complexity metric consistently discriminates noise regimes across all systems. Instead, system-specific relevance patterns emerge. 
Under given experimental constraints (data partitioning, machine learning algorithm, etc.), Approximate Entropy provides the strongest discrimination for the Lorenz system and the Hénon map; the Coefficient of Variation and Sample and Permutation Entropy dominate classification for ECG signals; and the Condition Number and Variance of first derivative together with Fisher Information are most informative for the Rössler system. Across all datasets, the proposed framework achieves an average accuracy of 99% for noise presence detection, 98.4% for noise type classification, and 98.5% for noise intensity classification. These findings demonstrate that complexity metrics capture structural and statistical signatures of stochastic perturbations across a diverse set of dynamical systems.
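One of the entropy-based descriptors named above, permutation entropy, can be sketched compactly: it is the Shannon entropy of ordinal patterns in sliding windows. The synthetic trend and white-noise series below are illustrative, not the study's signals.

```python
import math
import random

def permutation_entropy(series, order=3, normalize=True):
    """Shannon entropy of the ordinal patterns of length `order` in the series."""
    counts = {}
    for i in range(len(series) - order + 1):
        window = series[i:i + order]
        # rank pattern of the window, e.g. (0, 2, 1) for a rise then a dip
        pattern = tuple(sorted(range(order), key=window.__getitem__))
        counts[pattern] = counts.get(pattern, 0) + 1
    total = sum(counts.values())
    h = -sum((c / total) * math.log(c / total) for c in counts.values())
    if normalize:
        h /= math.log(math.factorial(order))  # scale to [0, 1]
    return h

random.seed(0)
pe_trend = permutation_entropy(list(range(500)))                        # fully ordered
pe_noise = permutation_entropy([random.random() for _ in range(500)])   # white noise
```

Deterministic structure pushes the normalized value toward 0 and unstructured noise toward 1, which is why such descriptors can separate noise regimes.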
Open Access Review
Tabular Data Distillation: An Extensive Comparison
by
Corneliu Florea and Eduard Barnoviciu
Mach. Learn. Knowl. Extr. 2026, 8(4), 84; https://doi.org/10.3390/make8040084 - 24 Mar 2026
Abstract
In this paper, we present an extensive evaluation of tabular data distillation methods for downstream classification and regression tasks. Our analysis considers multiple distillation approaches that are problem-type independent (i.e., unsupervised). For downstream learners, we focus on non-neural models such as Random Forest, XGBoost, and Support Vector Machines, as our goal is to evaluate the quality of the distilled data independently of the learner. The evaluation is conducted on 17 classification and 9 regression problems. Our findings can be summarized as follows: (1) in all cases, applying a distillation method leads to a decrease in performance compared to the baseline; (2) overall, coreset-based methods are the most effective, with performance losses that are minimal, ranging from around 3% in classification accuracy or regression correlation to, in some cases, being negligible; (3) performance loss is moderately correlated with dataset tailness, measured as the proportion of outliers; (4) all distillation methods alter dataset consistency, narrowing the range of hyperparameter values that yield good performance; and (5) the Coreset Leverage Score remains fast, regardless of the size of the original set and of the distilled set.
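A minimal sketch of leverage-score-based coreset selection, restricted to a two-feature design matrix so the Gram matrix can be inverted in closed form. The toy data are an assumption, and the paper's Coreset Leverage Score method may differ in detail (e.g. sampling rather than top-k selection).

```python
def leverage_scores(X):
    """Statistical leverage h_i = x_i^T (X^T X)^{-1} x_i for 2-feature rows."""
    g00 = sum(x[0] * x[0] for x in X)
    g01 = sum(x[0] * x[1] for x in X)
    g11 = sum(x[1] * x[1] for x in X)
    det = g00 * g11 - g01 * g01      # 2x2 Gram matrix inverted in closed form
    inv = [[g11 / det, -g01 / det], [-g01 / det, g00 / det]]
    return [x[0] * (inv[0][0] * x[0] + inv[0][1] * x[1])
            + x[1] * (inv[1][0] * x[0] + inv[1][1] * x[1]) for x in X]

def coreset(X, k):
    """Distill X down to the k rows with the highest leverage."""
    scores = leverage_scores(X)
    order = sorted(range(len(X)), key=scores.__getitem__, reverse=True)
    return sorted(order[:k])
```

High-leverage rows are the ones the least-squares fit depends on most, so keeping them preserves the fit far better than uniform subsampling of the same size.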
Open Access Article
Debiased Multiplex Tokenization Using Mamba-Based Pointers for Efficient and Versatile Map-Free Visual Relocalization
by
Wenshuai Wang, Hong Liu, Shengquan Li, Peifeng Jiang, Dandan Che and Runwei Ding
Mach. Learn. Knowl. Extr. 2026, 8(3), 83; https://doi.org/10.3390/make8030083 - 23 Mar 2026
Abstract
Visual localization plays a critical role for mobile robots to estimate their position and orientation in GPS-denied environments. However, its efficiency, robustness, and generalization are fundamentally undermined by severe viewpoint changes and dramatic appearance variations, which present persistent challenges for image-based feature representation and pose estimation under real-world conditions. Recently, map-free visual relocalization (MFVR) has emerged as a promising paradigm for lightweight deployment and privacy isolation on edge devices, yet learning compact and invariant image tokens without relying on structural 3D maps remains a core problem, particularly in highly dynamic or long-term scenarios. In this paper, we propose the Debiased Multiplex Tokenizer (DMT-Loc), a novel method for efficient and versatile MFVR, to address these issues. Specifically, DMT-Loc is built upon a pretrained vision Mamba encoder and integrates three key modules for relative pose regression: First, Multiplex Interactive Tokenization yields robust image tokens with non-local affinities and cross-domain descriptions. Second, Debiased Anchor Registration facilitates anchor token matching through proximity graph retrieval and autoregressive pointer attribution. Third, Geometry-Informed Pose Regression empowers multi-layer perceptrons with a symmetric swap gating mechanism operating inside each decoupled regression head to support accurate and flexible pose prediction in both pair-wise and multi-view modes. Extensive evaluations across seven public datasets demonstrate that DMT-Loc substantially outperforms existing baselines and ablation variants in diverse indoor and outdoor environments.
Open Access Article
Advancing Breast Cancer Lesion Analysis in Real-Time Sonography Through Multi-Layer Transfer Learning and Adaptive Tracking
by
Suliman Thwib, Radwan Qasrawi, Ghada Issa, Razan AbuGhoush, Hussein AlMasri and Marah Qawasmi
Mach. Learn. Knowl. Extr. 2026, 8(3), 82; https://doi.org/10.3390/make8030082 - 21 Mar 2026
Abstract
Background: Real-time and accurate analysis of breast ultrasounds is crucial for diagnosis but remains challenging due to issues like low image contrast and operator dependency. This study aims to address these challenges by developing an integrated framework for real-time lesion detection and tracking. Methods: The proposed system combines Contrast-Limited Adaptive Histogram Equalization (CLAHE) for image preprocessing, a transfer learning-enhanced YOLOv11 model following a continual learning paradigm for cross-center generalization in lesion detection, and a novel Detection-Based Tracking (DBT) approach that integrates Kernelized Correlation Filters (KCF) with periodic detection verification. The framework was evaluated on a dataset comprising 11,383 static images and 40 ultrasound video sequences, with a subset verified through biopsy and the remainder annotated by two radiologists based on radiological reports. Results: The proposed framework demonstrated high performance across all components. The transfer learning strategy (TL12) significantly improved detection outcomes, achieving a mean Average Precision (mAP) of 0.955, a sensitivity of 0.938, and an F1 score of 0.956. The DBT method (KCF + YOLO) achieved high tracking accuracy, with a success rate of 0.984, an Intersection over Union (IoU) of 0.85, and real-time operation at 54 frames per second (FPS) with a latency of 7.74 ms. The use of CLAHE preprocessing was shown to be a critical factor in improving both detection and tracking stability across diverse imaging conditions. Conclusions: This research presents a robust, fully integrated framework that bridges the gap between speed and accuracy in breast ultrasound analysis. The system’s high performance and real-time efficiency underscore its strong potential for clinical adoption to enhance diagnostic workflows, reduce operator variability, and improve breast cancer assessment.
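Plain global histogram equalization, the simpler ancestor of the CLAHE preprocessing used here, can be sketched as follows; CLAHE additionally clips the histogram (to limit noise amplification) and applies the mapping per tile with bilinear blending. The 2x2 toy image is an assumption for illustration.

```python
def equalize(img, levels=256):
    """Global histogram equalization of an integer grayscale image.

    CLAHE differs by clipping the histogram and equalizing tile-by-tile
    with bilinear interpolation between tile mappings."""
    flat = [p for row in img for p in row]
    hist = [0] * levels
    for p in flat:
        hist[p] += 1
    cdf, run = [], 0
    for h in hist:                       # cumulative distribution of intensities
        run += h
        cdf.append(run)
    n = len(flat)
    cdf_min = next(c for c in cdf if c > 0)
    lut = [round((c - cdf_min) / max(1, n - cdf_min) * (levels - 1)) for c in cdf]
    return [[lut[p] for p in row] for row in img]

# Low-contrast toy image: intensities 100..103 stretch to the full 0..255 range
out = equalize([[100, 101], [102, 103]])
```

The intensity range expands from 3 gray levels to the full display range, which is exactly the contrast boost that helps lesion boundaries stand out in low-contrast ultrasound frames.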
Open Access Article
Deep Learning-Based Synthesis, Classification and Analysis of Sedimentation Boundaries in Analytical Centrifugation Experiments
by
Moritz Moß, Sebastian Boldt, Gurbandurdy Dovletov, Adjie Salman, Josef Pauli, Dietmar Lerche, Marco Gleiß, Hermann Nirschl, Johannes Walter and Wolfgang Peukert
Mach. Learn. Knowl. Extr. 2026, 8(3), 81; https://doi.org/10.3390/make8030081 - 20 Mar 2026
Abstract
Applications for machine learning (ML) and deep learning (DL) are constantly growing and have already been adopted in the field of particle measurement technology. Even though analytical (ultra-)centrifugation (AC/AUC) is a widely used technique for characterizing dispersed particle systems, ML and DL have not yet been applied in this area. Data evaluation and interpretation in AC/AUC can be challenging and often requires expert knowledge. DL models can help, but their development is limited by a lack of annotated training data. One solution is to generate and use synthetic data instead. In the first part of this study, a model was trained to synthesize data from experiments using a combination of Variational Autoencoder (VAE) and Generative Adversarial Networks (GANs). The results appear highly realistic. Novice users could distinguish real from synthetic samples with only 63% accuracy. Then, a classifier was trained on experimental AC data to categorize real-world examples based on their underlying separation kinetics, testing different DL architectures. After initial training, the models were further fine-tuned with synthetic AC data. ResNet34 models achieved the best performance with 94% accuracy, comparable to an AC expert (91%), while inexperienced users reached only 53%. In the second part of our study, a regression model was trained for the analysis of sedimentation coefficients. Therefore, various generative models were developed and evaluated for synthesizing AUC data based on numerically simulated sedimentation boundaries. The best results were achieved by combining VAE and GAN architectures with embedded physical constraints. However, the generative networks have so far led to additional smearing of the profiles, resulting in a broadening of the sedimentation coefficient distribution and indicating that further refinement is necessary.
Full article

Open AccessArticle
Audio-Based Screening of Respiratory Diseases Using Machine Learning: A Methodological Framework Evaluated on a Clinically Validated COVID-19 Cough Dataset
by
Arley Magnolia Aquino-García, Humberto Pérez-Espinosa, Javier Andreu-Perez and Ansel Y. Rodríguez González
Mach. Learn. Knowl. Extr. 2026, 8(3), 80; https://doi.org/10.3390/make8030080 - 20 Mar 2026
Abstract
The development of AI-driven computational methods has enabled rapid and non-invasive analysis of respiratory sounds using acoustic data, particularly cough recordings. Although the COVID-19 pandemic accelerated research on cough-based acoustic analysis, many early studies were limited by insufficient data quality, lack of standardized protocols, and limited reproducibility due to data scarcity. In this study, we propose an audio analysis framework for cough-based respiratory disease screening research using COVID-19 as a clinically validated case dataset. All analyses were conducted on a single clinically acquired multicentric dataset collected under standardized conditions in certified laboratories in Mexico and Spain, comprising cough recordings from 1105 individuals. Model training and testing were performed exclusively within this dataset. The framework incorporates signal preprocessing and a comparative evaluation of segmentation strategies, showing that segmented cough analysis significantly outperforms full-signal analysis. Class imbalance was addressed using the Synthetic Minority Over-sampling Technique (SMOTE) for CNN2D models and the supervised Resample filter implemented in WEKA for classical machine learning models, both applied exclusively to the training subset to generate balanced training sets and prevent data leakage. Feature extraction and classification were carried out using Random Forest, Support Vector Machine (SVM), XGBoost, and a 2D Convolutional Neural Network (CNN2D), with hyperparameter optimization via AutoML. The proposed framework achieved a best balanced screening performance of 85.58% sensitivity and 86.65% specificity (Random Forest with GeMAPSvB01), while the highest-specificity configuration reached 93.90% specificity with 18.14% sensitivity (CNN2D with SMOTE and AutoML). These results demonstrate the methodological feasibility of the proposed framework under the evaluated conditions.
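The SMOTE-style balancing mentioned in the abstract can be illustrated with a minimal interpolation sketch; this is a toy pure-Python version with hypothetical names, whereas the study itself used standard SMOTE and WEKA's supervised Resample filter, applied only to the training subset:

```python
import random

def smote_like(minority, n_new, k=2, seed=0):
    """SMOTE-style oversampling (a sketch, not the reference implementation):
    each synthetic point lies on the segment between a minority sample and
    one of its k nearest minority neighbours."""
    rng = random.Random(seed)

    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        neighbours = sorted((m for m in minority if m is not x),
                            key=lambda m: dist2(x, m))[:k]
        nn = rng.choice(neighbours)
        u = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(xi + u * (ni - xi) for xi, ni in zip(x, nn)))
    return synthetic
```

Because every synthetic sample is an interpolation between two real minority samples, the generated points stay inside the convex hull of the minority class, which is why SMOTE must be fitted on the training split only to avoid leakage.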
Full article
(This article belongs to the Topic Deep Supplement Learning for Healthcare and Biomedical Applications)

Open AccessArticle
RolEmo: A Role-Aware Commonsense-Augmented Contrastive Learning Framework for Emotion Classification
by
Muhammad Abulaish and Anjali Bhardwaj
Mach. Learn. Knowl. Extr. 2026, 8(3), 79; https://doi.org/10.3390/make8030079 - 19 Mar 2026
Abstract
Emotion classification is a fundamental task in affective computing, with applications in human–computer interaction, mental health monitoring, and social media analysis. Although most existing methods formulate it as a flat classification problem, emotional expressions are inherently structured and grounded in semantic roles such as the emotion cue, stimulus, experiencer, and target. However, the relative contribution of these roles to emotion inference has not been systematically examined. Unlike prior models, we propose RolEmo, a role-aware framework for emotion classification that explicitly incorporates semantic role information. The framework employs a controlled role-masking strategy to analyze the contribution of individual roles, augments textual representations with external commonsense knowledge to capture implicit affective context, and applies supervised contrastive learning to structure the embedding space by bringing emotionally similar instances closer while separating opposing ones. We evaluate RolEmo on three benchmark datasets annotated with semantic roles. Experimental results demonstrate that RolEmo outperforms the strongest baseline across three datasets by up to , , and in the Full Text, Only Role, and Without Role settings, respectively. The analysis further indicates that the cue and stimulus roles provide the most reliable signals for emotion classification, with their removal causing performance drops of up to in macro f1-score, while experiencer and target roles exhibit more variable effects. These findings highlight the importance of structured semantic modeling and commonsense reasoning for robust and interpretable emotion understanding.
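A minimal sketch of the supervised contrastive objective the abstract describes (in the style of Khosla et al.'s SupCon loss); the embeddings, temperature, and function names are illustrative assumptions, not RolEmo's implementation:

```python
import math

def supcon_loss(embeddings, labels, tau=0.1):
    """Supervised contrastive loss on L2-normalised embeddings:
    same-label pairs are pulled together, different-label pairs pushed apart."""
    def norm(v):
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v]

    z = [norm(v) for v in embeddings]
    # temperature-scaled cosine similarities
    sim = [[sum(a * b for a, b in zip(zi, zj)) / tau for zj in z] for zi in z]
    total, count = 0.0, 0
    for i, yi in enumerate(labels):
        pos = [p for p, yp in enumerate(labels) if yp == yi and p != i]
        if not pos:
            continue
        denom = sum(math.exp(sim[i][a]) for a in range(len(z)) if a != i)
        total += -sum(math.log(math.exp(sim[i][p]) / denom) for p in pos) / len(pos)
        count += 1
    return total / count
```

The loss is small when same-class embeddings cluster tightly and large when they are scattered, which is the structuring effect on the embedding space the abstract describes.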
Full article
(This article belongs to the Section Learning)

Open AccessArticle
Polynomial Chaos Expanded Gaussian Process
by
Dominik Polke, Tim Kösters, Elmar Ahle and Dirk Söffker
Mach. Learn. Knowl. Extr. 2026, 8(3), 78; https://doi.org/10.3390/make8030078 - 19 Mar 2026
Abstract
In complex and unknown processes, global models are fitted over the entire input domain but often tend to perform poorly whenever the response surface exhibits non-stationary behavior and varying smoothness. A common approach is to use local models, which requires partitioning the input domain into subdomains and training multiple models, thereby adding significant complexity. Recognizing this limitation, this study addresses the need for models that represent the input–output relationship consistently over the full domain while still adapting to local variations in the response. It introduces a novel machine learning approach: the Polynomial Chaos Expanded Gaussian Process (PCEGP), leveraging polynomial chaos expansion to calculate input-dependent hyperparameters of the Gaussian process (GP). This provides a mathematically interpretable approach that incorporates non-stationary covariance functions and heteroscedastic noise estimation to generate locally adapted models. The model performance is compared to different algorithms in benchmark tests for regression tasks. The results demonstrate low prediction errors of the PCEGP, highlighting model performance that is often competitive with or better than previous methods. A key advantage of the presented model is its interpretable hyperparameters along with training and prediction runtimes comparable to those of a standard GP.
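One common way to realize input-dependent GP hyperparameters is a Gibbs-type non-stationary kernel with a smoothly varying lengthscale, sketched below with a plain polynomial log-lengthscale; the actual PCEGP uses a polynomial chaos expansion, so the basis, coefficients, and names here are purely illustrative:

```python
import math

def lengthscale(x, coeffs=(0.5, 0.0, 0.3)):
    """Input-dependent lengthscale from a (hypothetical) polynomial expansion;
    l(x) = exp(c0 + c1*x + c2*x^2) keeps the lengthscale positive."""
    return math.exp(sum(c * x ** k for k, c in enumerate(coeffs)))

def gibbs_kernel(x1, x2):
    """Gibbs non-stationary RBF kernel: reduces to a standard RBF when l(x)
    is constant, and adapts its smoothness locally where l(x) varies."""
    l1, l2 = lengthscale(x1), lengthscale(x2)
    pre = math.sqrt(2.0 * l1 * l2 / (l1 ** 2 + l2 ** 2))
    return pre * math.exp(-((x1 - x2) ** 2) / (l1 ** 2 + l2 ** 2))
```

The kernel stays symmetric with unit variance on the diagonal, while its effective smoothness follows the polynomial lengthscale, which is what lets a single global model adapt to local variation without partitioning the input domain.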
Full article
(This article belongs to the Section Learning)

Open AccessArticle
Automated Single-Slice Lumbar QCT HU Value Measurement with Clinical Workflow
by
Zhe-Yu Ye, Jun-Mu Peng, Bing-Qian Lu and Tamotsu Kamishima
Mach. Learn. Knowl. Extr. 2026, 8(3), 77; https://doi.org/10.3390/make8030077 - 19 Mar 2026
Abstract
Manual single-slice lumbar quantitative computed tomography (QCT) depends on operator-driven slice selection and trabecular region-of-interest (ROI) placement. We developed a fully automated single-slice workflow for vertebral trabecular Hounsfield unit (HU) measurement that combines unsuitable-slice prescreening, dual-purpose segmentation, intra-patient slice-quality ranking, and a deterministic inner ROI rule. The pipeline includes an Eligibility Gate, QC-Envelope segmentation for broad, vertebral- and usability-preserving delineation, PairRank-Swin for best-slice selection, and dedicated trabecular segmentation for final quantitative analysis. In the independent external cohort, 4 cases were considered non-evaluable by both manual review and the pipeline, and 2 additional borderline-quality cases were manually measured but rejected by the pipeline; therefore, paired HU agreement analysis included 44 evaluable cases. Agreement remained high, with Pearson’s r = 0.987, Lin’s CCC = 0.985, mean bias −0.44 HU, and limits of agreement from −14.88 to +13.99 HU. Coverage was 84.1% within ±10 HU and 97.7% within ±15 HU. Ablation analysis showed that slice ranking and ROI erosion were the most critical components. In an open module-level baseline comparison, QC-Envelope segmentation substantially outperformed TotalSegmentator. This workflow provides high agreement with expert HU measurement while preserving reviewable intermediate outputs.
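The agreement statistics quoted above (Pearson's r, Lin's CCC, Bland-Altman bias and limits of agreement) can be reproduced from paired measurements with a short helper; this is a generic sketch, not the authors' pipeline:

```python
import math

def agreement_stats(a, b):
    """Paired-agreement metrics for method comparison: Pearson's r,
    Lin's concordance correlation coefficient (CCC), and the Bland-Altman
    mean bias with 95% limits of agreement."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    va = sum((x - ma) ** 2 for x in a) / n
    vb = sum((x - mb) ** 2 for x in b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n
    r = cov / math.sqrt(va * vb)
    ccc = 2 * cov / (va + vb + (ma - mb) ** 2)  # penalizes systematic bias
    diffs = [x - y for x, y in zip(a, b)]
    bias = sum(diffs) / n
    sd = math.sqrt(sum((d - bias) ** 2 for d in diffs) / n)
    return r, ccc, (bias - 1.96 * sd, bias + 1.96 * sd)
```

Unlike Pearson's r, the CCC drops when one method is systematically offset from the other, which is why both are reported alongside the Bland-Altman limits.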
Full article

Open AccessReview
A Review of Deep Learning Model Approach for Pain Assessment in Infant Cry Sounds
by
Anthony McCofie, Dmitry Goldgof, Jacqueline Hausmann, Peter R. Mouton, Yu Sun and Md Imran Hossain
Mach. Learn. Knowl. Extr. 2026, 8(3), 76; https://doi.org/10.3390/make8030076 - 19 Mar 2026
Abstract
Infant cries serve as a primary indicator of distress and pain; however, distinguishing pain-related cries from those triggered by other needs remains a challenging task, even for trained professionals. Timely and accurate pain assessment is essential for appropriate medical intervention, particularly in preverbal infants who cannot express their needs verbally. Recently, Deep Learning (DL) models have demonstrated significant potential in addressing this challenge by enabling automated and efficient pain assessment through audio signal processing. In this survey, we review methods for pain assessment from infant cry sounds, covering deep learning architectures, modern Transformer-based models, and emerging Vision-Language Model (VLM) pipelines. The review includes approaches that integrate Mel-spectrogram representations of cry audio with multimodal model frameworks to improve robustness, interpretability, and cross-modal reasoning in pain detection. By summarizing recent advancements and identifying limitations and open challenges in current methodologies, this review aims to provide insights into future research directions that may enhance the robustness, generalizability, and clinical applicability of automated infant pain assessment tools.
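The Mel-spectrogram representations discussed in this review rest on the mel frequency scale; below is a minimal sketch of the HTK-style mapping and uniformly mel-spaced filter edges (a stdlib illustration, not any particular toolkit's implementation):

```python
import math

def hz_to_mel(f):
    """HTK-style mel scale: roughly linear below 1 kHz, logarithmic above."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_band_edges(f_low, f_high, n_bands):
    """Edge frequencies (Hz) for n_bands triangular mel filters, spaced
    uniformly on the mel axis between f_low and f_high."""
    m_low, m_high = hz_to_mel(f_low), hz_to_mel(f_high)
    mels = [m_low + i * (m_high - m_low) / (n_bands + 1)
            for i in range(n_bands + 2)]
    return [700.0 * (10.0 ** (m / 2595.0) - 1.0) for m in mels]
```

Spacing filters uniformly in mel rather than in hertz concentrates frequency resolution where hearing is most sensitive, which is what makes Mel-spectrograms a natural input for cry-sound models.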
Full article
(This article belongs to the Section Thematic Reviews)

Open AccessArticle
ActivityRDI: A Centralized Solution Framework for Activity Retrieval and Detection Intelligence Based on Knowledge Graphs, Large Language Models, and Imbalanced Learning
by
Lili Zhang and Quanyan Zhu
Mach. Learn. Knowl. Extr. 2026, 8(3), 75; https://doi.org/10.3390/make8030075 - 18 Mar 2026
Abstract
We propose a centralized Activity Retrieval and Detection Intelligence (ActivityRDI) solution framework, demonstrate its application performance in network threat detection in detail, and show its generalization to other domains. Network threat detection is challenging owing to the complex nature of attack activities and the scarcity of historically revealed threat data from which to learn. To complement existing detection methods (e.g., analytics, machine learning, and artificial intelligence), we propose a multi-agent AI solution for agile threat detection. In this solution, a knowledge graph is used to analyze changes in user activity patterns and calculate the risk of unknown threats. Then, an imbalanced learning model is used to prune and weight the knowledge graph and to calculate the risk of known threats. Finally, a large language model (LLM) is used to retrieve and interpret the risk associated with user activities from the knowledge graph and the imbalanced learning model. The preliminary results show that the solution improves the threat capture rate by 3–4% and adds natural language interpretations of the risk predictions based on user activities with 95% accuracy. Furthermore, a demonstration application has been built to show how the proposed solution framework can be deployed and used. The generalizability of the proposed solution in other domains is also shown through an application to customer engagement, with 97% accuracy.
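The knowledge-graph risk idea, scoring how surprising an activity is relative to a user's historical pattern, can be caricatured in a few lines; this is a toy sketch with hypothetical names, not the paper's system:

```python
import math
from collections import defaultdict

def build_graph(events):
    """Weighted activity graph: counts of observed (user, action) edges."""
    counts = defaultdict(lambda: defaultdict(int))
    for user, action in events:
        counts[user][action] += 1
    return counts

def surprise(graph, user, action):
    """Risk proxy for an activity: -log probability of the action given the
    user's history; unseen actions get a small smoothed probability."""
    hist = graph[user]
    total = sum(hist.values())
    # add-one smoothing over the user's known actions plus one 'unseen' slot
    p = (hist.get(action, 0) + 1) / (total + len(hist) + 1)
    return -math.log(p)
```

An action a user has never performed scores far higher than their routine activities, giving a simple unknown-threat signal that a fuller system would refine with edge weighting and learned models.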
Full article

Open AccessArticle
Towards Reliable LLM Grading Through Self-Consistency and Selective Human Review: Higher Accuracy, Less Work
by
Luke Korthals, Emma Akrong, Gali Geller, Hannes Rosenbusch, Raoul Grasman and Ingmar Visser
Mach. Learn. Knowl. Extr. 2026, 8(3), 74; https://doi.org/10.3390/make8030074 - 16 Mar 2026
Abstract
Large language models (LLMs) show promise for grading open-ended assessments but still exhibit inconsistent accuracy, systematic biases, and limited reliability across assignments. To address these concerns, we introduce SURE (Selective Uncertainty-based Re-Evaluation), a human-in-the-loop pipeline that combines repeated LLM prompting, uncertainty-based flagging, and selective human regrading. Three LLMs—gpt-4.1-nano, gpt-5-nano, and the open-source gpt-oss-20b—graded answers of 46 students to 130 open questions and coding exercises across five assignments. Each student answer was scored 20 times to derive majority-voted predictions and self-consistency-based certainty estimates. We simulated human regrading by flagging low-certainty cases and replacing them with scores from four human graders. We used the first assignment as a training set for tuning certainty thresholds and to explore LLM output diversification via sampling parameters, rubric shuffling, varied personas, multilingual prompts, and post hoc ensembles. We then evaluated the effectiveness and efficiency of SURE on the other four assignments using a fixed certainty threshold. Across assignments, fully automated grading with a single prompt resulted in substantial underscoring, and majority-voting based on 20 prompts improved but did not eliminate this bias. Low certainty (i.e., high output diversity) was diagnostic of incorrect LLM scores, enabling targeted human regrading that improved grading accuracy while reducing manual grading time by 40–90%. Aggregating responses from all three LLMs in an ensemble improved certainty-based flagging and most consistently approached human-level accuracy, with 70–90% of the grades students would receive falling inside human-grader ranges. A reanalysis based on outputs from a more diversified LLM ensemble comprised of gpt-5, codestral-25.01, and llama-3.3-70b-instruct replicated these findings but also suggested that large reasoning models such as gpt-5 might eliminate the need for human oversight of LLM grading entirely. These findings demonstrate that self-consistency-based uncertainty estimation and selective human oversight can substantially improve the reliability and efficiency of AI-assisted grading.
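The self-consistency mechanism at the heart of SURE, majority-voting repeated LLM scores and flagging low-certainty answers for human regrading, can be sketched as follows; the threshold value and function names are illustrative, not the paper's code:

```python
from collections import Counter

def grade_with_certainty(repeated_scores, threshold=0.8):
    """Self-consistency grading: majority-vote over repeated LLM scores,
    with certainty = share of runs agreeing with the mode. Low-certainty
    answers are flagged for human regrading."""
    counts = Counter(repeated_scores)
    score, freq = counts.most_common(1)[0]
    certainty = freq / len(repeated_scores)
    return score, certainty, certainty < threshold  # (grade, certainty, flag)
```

High output diversity across the 20 runs yields low certainty, and it is exactly those answers that the pipeline routes to human graders, which is what trades a small amount of manual work for the reported accuracy gains.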
Full article
(This article belongs to the Section Learning)

Topics
Topic in
Applied Sciences, Electronics, J. Imaging, MAKE, Information, BDCC, Signals
Applications of Image and Video Processing in Medical Imaging
Topic Editors: Jyh-Cheng Chen, Kuangyu Shi
Deadline: 30 April 2026
Topic in
Applied Sciences, ASI, Blockchains, Computers, MAKE, Software
Recent Advances in AI-Enhanced Software Engineering and Web Services
Topic Editors: Hai Wang, Zhe Hou
Deadline: 31 May 2026
Topic in
AI, Applied Sciences, Electronics, MAKE
Deep Supplement Learning for Healthcare and Biomedical Applications
Topic Editors: Tahir Cetin Akinci, Ömer Faruk Ertuǧrul
Deadline: 30 June 2026
Topic in
Atmosphere, Earth, Encyclopedia, Entropy, Fractal Fract, MAKE, Meteorology
Revisiting Butterfly Effect, Multiscale Dynamics, and Predictability Using AI-Enhanced Modeling Framework (AEMF) and Chaos Theory
Topic Editors: Bo-Wen Shen, Roger A. Pielke Sr., Xubin Zeng
Deadline: 31 July 2026
Special Issues
Special Issue in
MAKE
Using Large Language Models for Scientific Problem Solving and Engineering Design
Guest Editors: Alex Doboli, K. Wendy Tang, Simona Doboli
Deadline: 30 April 2026
Special Issue in
MAKE
Language Acquisition and Understanding
Guest Editors: Michal Ptaszynski, Rafal Rzepka, Masaharu Yoshioka
Deadline: 15 July 2026
Special Issue in
MAKE
Advancing Natural Language Processing for Low-Resource Languages and Dialects
Guest Editors: Tanjim Mahmud, Michal Ptaszynski, Karl Andersson
Deadline: 31 July 2026
Special Issue in
MAKE
Trustworthy AI: Integrating Knowledge, Retrieval, and Reasoning
Guest Editor: Konstantinos Diamantaras
Deadline: 31 August 2026
Topical Collections
Topical Collection in
MAKE
Extravaganza Feature Papers on Hot Topics in Machine Learning and Knowledge Extraction
Collection Editor: Andreas Holzinger
Topical Collection in
MAKE
Feature Papers in Safety, Security, Privacy, and Cyber Resilience
Collection Editor: Simon Tjoa
Topical Collection in
MAKE
Robust and Uncertainty-Aware Learning from Real-World Data
Collection Editor: Federico Cabitza