Journal Description
Machine Learning and Knowledge Extraction
Machine Learning and Knowledge Extraction is an international, peer-reviewed, open access, monthly journal on machine learning and applications. See our video on YouTube explaining the MAKE journal concept.
- Open Access: free for readers, with article processing charges (APC) paid by authors or their institutions.
- High Visibility: indexed within Scopus, ESCI (Web of Science), dblp, and other databases.
- Rapid Publication: manuscripts are peer-reviewed and a first decision is provided to authors approximately 27 days after submission; the time from acceptance to publication is 4.4 days (median values for papers published in this journal in the second half of 2025).
- Journal Rank: JCR - Q1 (Engineering, Electrical and Electronic) / CiteScore - Q1 (Engineering (miscellaneous))
- Recognition of Reviewers: reviewers who provide timely, thorough peer-review reports receive vouchers entitling them to a discount on the APC of their next publication in any MDPI journal, in appreciation of the work done.
- Journal Cluster of Artificial Intelligence: AI, AI in Medicine, Algorithms, BDCC, MAKE, MTI, Stats, Virtual Worlds and Computers.
Impact Factor: 6.0 (2024); 5-Year Impact Factor: 5.7 (2024)
Latest Articles
Edge-Optimized Deep and Transfer Learning for Efficient DDoS Detection in IIoT Networks
Mach. Learn. Knowl. Extr. 2026, 8(6), 166; https://doi.org/10.3390/make8060166 (registering DOI) - 16 Jun 2026
Abstract
The increasing convergence of Operational Technology (OT) and Information Technology (IT) within the Industrial Internet of Things (IIoT) brings about remarkable improvements in monitoring and automation. However, it also exposes industrial systems to large-scale Distributed Denial of Service (DDoS) attacks. Edge-based defences are essential in satisfying low-latency demands and data sovereignty rules, yet they must function under severe resource limitations and adapt to shifting traffic characteristics without cloud assistance. In this work, we introduce a lightweight hybrid deep learning architecture that fuses a Convolutional Neural Network (CNN) with a Convolutional Block Attention Module (CBAM) and a Multi-Layer Perceptron (MLP) in a single detector. A sequential transfer learning scheme is adopted, including a feature projection layer that handles differences in input dimensionality. The model is pre-trained on the CIC-DDoS2019 dataset, then adapted to the more recent CICIoT23 dataset. Evaluations are performed on both datasets while preserving their natural class imbalance. We provide extensive ablation and variance analysis under identical experimental conditions. The proposed method achieves 99.52% accuracy on CICIoT23 while maintaining 99.65% recall, which is a crucial property for critical systems. Real-time measurements on a CPU-only testbed show an average inference latency of 0.013 ms, inference-only throughput exceeding 93,000 packets/s, and end-to-end batch throughput of approximately 38,000 packets/s. The solution demonstrates effective domain adaptation, sub-millisecond latency, and suitability for resource-constrained IIoT edge gateways.
(This article belongs to the Section Safety, Security, Privacy, and Cyber Resilience)
Open Access Article
What PISA Measures and What It Misses: A Two-Stage LLM-Based Alignment of IT Workforce Skills with Educational Proficiency
by
Andreea-Maria Tanasă, Oprea Simona-Vasilica and Adela Bâra
Mach. Learn. Knowl. Extr. 2026, 8(6), 165; https://doi.org/10.3390/make8060165 (registering DOI) - 15 Jun 2026
Abstract
Aligning information technology (IT) workforce demands with educational assessments is essential for bridging skills gaps; yet, no prior corpus maps IT task reasoning to Programme for International Student Assessment (PISA) proficiency levels. This paper introduces a large language model (LLM)-powered framework aligning IT competencies with PISA 2022 and the OECD (Organisation for Economic Co-operation and Development) Learning Compass 2030, drawing on O*NET v30.2 (Occupational Information Network), ESCO (European Skills, Competences, Qualifications, and Occupations) v1.2.1, PISA descriptors and OECD definitions. The framework operates in two stages: Stage 1 aligns 562 IT task statements with minimum PISA 2022 proficiency levels via LLM annotation and cross-model validation; and Stage 2 extends this mapping to the OECD Learning Compass 2030 through the semantic clustering of task embeddings and a bidirectional gap analysis of 95 ESCO transversal skills. Using Gemini 2.5 Flash, 562 tasks are annotated with minimum PISA levels across Mathematical, Reading, and Science literacy (first stage). Annotation reliability is assessed through a five-model cross-validation against a blind human domain expert (treated as a reference benchmark, not a gold standard) on a stratified 100-task sample (17.8% of the corpus), with agreement ranging from fair (Gemini 2.5 Flash, κ = 0.29) to moderate (Claude Haiku 4.5, κ = 0.50; LLaMA 3.3 70B, κ = 0.44). A bias-correction sensitivity analysis confirms that distributional findings remain stable after accounting for the primary annotator’s systematic overestimation, and OLS-calibrated alignment against O*NET ability ratings provides directional plausibility support. Validated tasks are embedded and clustered into 25 technical profiles via K-Means, each classified against OECD dimensions. The framework is extended to 95 ESCO transversal skills in 24 clusters. 
Bidirectional analysis reveals that, while every PISA proficiency level is engaged by at least one transversal cluster, 33% of these clusters, covering creative, ethical, social–emotional, and dispositional competencies, fall entirely outside PISA’s cognitive scope. This boundary mapping identifies where the PISA-based alignment is valid and where complementary tools are required for a full readiness assessment.
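The agreement figures quoted in this abstract are Cohen's κ values. As a minimal sketch of how such a value is computed, assuming the plain unweighted form of κ (the abstract does not say whether a weighted variant was used for the ordinal PISA levels), with hypothetical toy labels rather than the study's data:

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: share of items on which both raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical proficiency levels assigned to ten tasks (not from the paper).
human = [1, 2, 2, 3, 3, 3, 4, 4, 2, 1]
model = [1, 2, 3, 3, 3, 4, 4, 4, 2, 2]
kappa = cohen_kappa(human, model)
print(f"kappa = {kappa:.3f}")
```

The "fair" and "moderate" labels in the abstract match the commonly used Landis and Koch bands (0.21 to 0.40 and 0.41 to 0.60, respectively).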
(This article belongs to the Special Issue LLM-Inspired New Generation Machine Learning: Hyperparameter Optimization and Uncertainty Quantification)
Open Access Article
Do Foundation Models Truly Outperform Domain-Specific Models? Evidence from Digital Pathology
by
Chaima Ben Rabah and Ahmed Serag
Mach. Learn. Knowl. Extr. 2026, 8(6), 164; https://doi.org/10.3390/make8060164 (registering DOI) - 12 Jun 2026
Abstract
Foundation models (FMs) are increasingly proposed as general-purpose solutions for computational pathology, with the potential to simplify clinical artificial intelligence deployment by reducing the need for task-specific architectures. However, their reliability across cancer domains with distinct morphological characteristics remains unclear, limiting confidence in real-world clinical use. We benchmarked seven general-purpose pathology FMs and three domain-specific FMs across eleven patch-level datasets spanning three clinically relevant domains: pediatric hematology, prostate cancer, and breast cancer, using both linear probing and last-layer fine-tuning adaptation strategies. By jointly evaluating pediatric leukemia, male-predominant prostate cancer, and female-predominant breast cancer, this study is, to our knowledge, the first to explicitly examine specialist-versus-generalist FM behavior across age- and sex-stratified cancer populations. Performance differences were strongly domain dependent. In hematology, the specialist FM DINOBloom matched and, in several datasets, marginally exceeded leading generalist models (AUC 0.990–0.999 vs. GigaPath 0.981–1.000), suggesting advantages for highly distinctive cellular morphology. In prostate cancer grading, the generalist FM UNI2-h consistently outperformed the specialist HistoEncoder (AUC 0.956–0.977 vs. 0.908–0.964). In breast cancer, UNI2-h achieved the best overall performance across all tasks. No publicly available breast-cancer-specific FM currently exists for direct comparison; therefore, breast cancer results characterize general FM transferability rather than specialist-versus-generalist differences. Importantly, cross-dataset experiments revealed substantial performance degradation under dataset shift in both prostate and breast cancer, indicating that current FMs are not yet robust enough for heterogeneous multi-site clinical use. 
These findings support the use of generalist FMs as efficient backbones for well-characterized single-site, patch-level tasks, while challenging the assumption that high benchmark performance necessarily reflects true clinical readiness and demonstrating that pathology FMs are not uniformly superior to specialist models.
(This article belongs to the Special Issue Artificial Intelligence for Signal, Image, and Multimodal Data Processing: Algorithms, Models, and Knowledge Extraction)
Open Access Article
From Legal Text to NP-Complete Decision Models: MPNet Retrieval and Policy Information Extraction
by
Aigerim Aitim, Anel Auyezova, Bakhtgerey Sinchev and Aksulu Mukhanova
Mach. Learn. Knowl. Extr. 2026, 8(6), 163; https://doi.org/10.3390/make8060163 (registering DOI) - 12 Jun 2026
Abstract
This study addresses the growing need to convert unstructured legal and policy documents into formal computational models that support transparent decision-making. The purpose of the work is to develop an applied framework that connects Legal NLP and policy information extraction with canonical combinatorial decision models, including set cover, set packing, subset sum, vertex cover, and independent set. The proposed method combines MPNet-based dense semantic retrieval for locating relevant legal passages, a Legal NLP layer for extracting obligations, prohibitions, exceptions, thresholds, and eligibility conditions, and a formal modeling stage that maps the extracted constraints to these NP-complete formulations. The framework is designed to transform regulatory text into machine-interpretable structures suitable for constraint-aware reasoning and policy analysis. The results show that the integration of semantic retrieval and structured legal information extraction improves the consistency, interpretability, and practical usability of formal problem construction from legal and policy documents. The proposed approach provides a reproducible bridge between legal text analytics and combinatorial decision modeling and supports legal decision support, compliance analysis, and policy-oriented intelligent systems.
(This article belongs to the Topic AI and Computational Methods for Modelling, Simulations and Optimizing of Advanced Systems: Innovations in Complexity, 2nd Edition)
Open Access Article
MedToolica: Finetuning-Free Agentic Compositional Tool Learning for 3D CT Reasoning
by
Abdullah Hosseini and Ahmed Serag
Mach. Learn. Knowl. Extr. 2026, 8(6), 162; https://doi.org/10.3390/make8060162 - 11 Jun 2026
Abstract
Clinical reasoning over 3D CT scans is inherently compositional, requiring the integration of anatomical measurement, pathology assessment, spatial comparison, and clinical interpretation. We introduce MedToolica, a finetuning-free, role-based agentic framework for quantitative 3D abdominal CT reasoning that decomposes complex queries into structured sub-tasks coordinated through specialized expert tools. Empirical evaluation across quantitative reasoning benchmarks demonstrates that MedToolica is particularly effective in organ-centric measurement tasks when supported by reliable expert tools, achieving strong quantitative agreement (e.g., for organ HU estimation versus for finetuned baselines) and notable gains on multi-step visual reasoning tasks. In contrast, lesion-oriented tasks remain constrained by upstream tool limitations, indicating that reasoning sophistication alone cannot compensate for unreliable perception. Furthermore, we observe that the capability of the core language model substantially influences orchestration quality: smaller LLM orchestrators exhibit reduced overall accuracy due to higher execution failure rates ( vs. ) and increased susceptibility to hallucination ( vs. ). Collectively, these findings identify expert tool reliability and orchestration capability as critical determinants of performance in compositional medical AI and highlight both the promise and current limitations of finetuning-free agentic reasoning for quantitative 3D CT analysis.
(This article belongs to the Special Issue Artificial Intelligence for Signal, Image, and Multimodal Data Processing: Algorithms, Models, and Knowledge Extraction)
Open Access Article
Scenario-Adaptive Evaluation of Trustworthy Fine-Tuned Text Models Across Knowledge-Grounded Generation and Misinformation Detection
by
Khrystyna Lipianina-Honcharenko, Pavlo Bykovyy, Andriy Krysovatyy, Myroslav Komar and Borys Yazlyuk
Mach. Learn. Knowl. Extr. 2026, 8(6), 161; https://doi.org/10.3390/make8060161 - 11 Jun 2026
Abstract
Large language models (LLMs) increasingly require robust evaluation under realistic instruction-following conditions, particularly for fine-tuned task-specific adapters operating in multilingual environments. This study proposes a scenario-adaptive evaluation framework for assessing the reliability of fine-tuned text models across two application regimes: misinformation detection (disinfo) and knowledge-grounded factual biography generation (heroes). The framework integrates automated generation of balanced risk-oriented scenarios, bilingual evaluation in English and Ukrainian, the LLM-as-a-Judge paradigm, and multidimensional robustness analysis through the Alignment Robustness Index (ARI). Six LoRA-adapted models based on Qwen2.5-3B-Instruct, SmolLM2-1.7B-Instruct, and TinyLlama-1.1B-Chat-v1.0 were evaluated. The implemented pipeline generated 2052 scenarios and 6156 model responses, producing a final bilingual analytical subset of 4104 judged records. Experimental results show that task-specific adaptation produces task-dependent robustness profiles. In the disinfo case, Qwen2.5-3B achieved the strongest overall performance, combining the highest safety and classification accuracy. In contrast, the heroes case revealed a more compressed and multidimensional vulnerability space without a single dominant model. The results further demonstrate the importance of multilingual evaluation, as weaker adapters exhibited more pronounced cross-lingual safety gaps. Overall, the framework provides a reproducible and practically applicable methodology for evaluating fine-tuned language models under imperfect instruction conditions.
(This article belongs to the Special Issue Trustworthy AI: Integrating Knowledge, Retrieval, and Reasoning)
Open Access Article
Interpretable Machine Learning for the Shear Capacity of RC Corbels: A Validated, Application-Driven Model
by
Wael Kassem
Mach. Learn. Knowl. Extr. 2026, 8(6), 160; https://doi.org/10.3390/make8060160 - 10 Jun 2026
Abstract
This paper demonstrates the application of a robust machine learning methodology to develop an accurate and, critically, an interpretable data-driven model for RC corbel shear assessment. A primary focus of this work is the use of advanced explainability techniques to rigorously validate the model’s predictive logic against fundamental principles of structural mechanics, directly confronting the limitations of “black-box” approaches. To implement this framework, an extensive database of 515 experimental tests was assembled. Different machine-learning (ML) techniques, including Random Forest, AdaBoost, Support Vector Machine, and XGBoost, were systematically evaluated to define the optimal predictive model. The most accurate algorithm, XGBoost, was selected and optimized to achieve exceptional performance, with a coefficient of determination (R²) of 0.98 evaluated across the full database and a mean absolute relative deviation (MARD) of only 4%; on the held-out testing subset the model retains an R² of 0.97 and a MARD of 15%, confirming that predictive performance does not degrade appreciably on unseen specimens. The predictive model was shown to be substantially more accurate and generalizable than current design approaches, including both ACI code provisions and other prominent analytical models from the literature. Crucially, the Shapley Additive exPlanations (SHAP) technique was used to rigorously interrogate the model’s predictive logic. The analysis showed that the model’s feature attributions are consistent with established structural mechanics, correctly identifying the governing influence of parameters like the shear span-to-depth ratio and reinforcement indices for distinct failure modes. This explainability analysis establishes that the learned associations agree with structural expectations; it does not by itself demonstrate mechanistic causality. 
The study provides a validated methodology for creating trustworthy ML models and indicates, subject to further validation, uncertainty quantification, and a clearly defined applicability domain, how such interpretable tools might complement existing design provisions.
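The two accuracy metrics named in this abstract can be illustrated with a short sketch. It assumes the standard coefficient of determination and takes MARD to be the mean of |predicted − observed| / |observed|; the abstract does not spell out its exact MARD definition, and the capacities below are hypothetical toy values, not data from the paper.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def mard(observed, predicted):
    """Mean absolute relative deviation (assumed definition, as a fraction)."""
    ratios = [abs(p - o) / abs(o) for o, p in zip(observed, predicted)]
    return sum(ratios) / len(ratios)


# Hypothetical shear capacities in kN (illustration only).
observed = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 380.0]

r2 = r_squared(observed, predicted)
deviation = mard(observed, predicted)
print(f"R^2 = {r2:.3f}, MARD = {deviation:.1%}")
```

Because MARD is relative to each observed value, a few badly predicted low-capacity specimens can raise it noticeably while R² barely moves, which is one reason the two metrics are usually reported together.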
(This article belongs to the Section Learning)
Open Access Perspective
Digital Twins: A Computational Realization of the Scientific Method in Dynamical Systems
by
Frank Emmert-Streib
Mach. Learn. Knowl. Extr. 2026, 8(6), 159; https://doi.org/10.3390/make8060159 - 10 Jun 2026
Abstract
The scientific method is widely acknowledged as an authoritative framework that provides guiding principles for empirical research across disciplines. Despite this central role, it is rarely examined explicitly as a conceptual framework. In this paper, we revive attention to its role by revealing a connection to digital twins, which have received considerable attention in recent years. Specifically, we argue that the digital twins framework can be interpreted as a computational realization of the scientific method in the context of dynamical systems. This connection is rooted in the dynamical nature of models, since dynamical systems arise across many scientific fields, from physics to economics, and also constitute a core component of digital twins. The main benefits of this connection include a common scientific language for knowledge transfer, a systematic approach that emphasizes the mechanisms of continuous learning and model selection, and a practical framework for implementing the scientific method computationally across disciplines.
(This article belongs to the Collection Extravaganza Feature Papers on Hot Topics in Machine Learning and Knowledge Extraction)
Open Access Article
EA-StrongSORT: An Efficient Attention StrongSORT Framework for Detection-Based Tumor Tracking in Cine-MRI TrackRAD2025 Dataset
by
Alyaa Amer, Noha Ghatwary, Salema Fayed, Sahar Magdy, Alla Hussein, Rania Kadry and Amina I. Abdelmaksoud
Mach. Learn. Knowl. Extr. 2026, 8(6), 158; https://doi.org/10.3390/make8060158 - 9 Jun 2026
Abstract
MRI-guided radiotherapy (MRIgRT) enables the real-time visualization of tumor motion, allowing adaptive radiation delivery based on dynamic anatomical changes. However, respiratory-induced tumor motion remains a major challenge, particularly for thoracic and abdominal tumors. Continuous tumor motion may reduce treatment accuracy and increase radiation exposure to surrounding healthy tissues. Therefore, reliable and efficient tumor tracking is essential for real-time motion management in MRI-guided radiotherapy. Recent advances in artificial intelligence have demonstrated significant potential for medical image analysis; however, many existing tumor tracking approaches rely on segmentation-based methods that require detailed annotations and complex processing, which can limit their use in real-time clinical environments. In this work, we propose a detection-based tumor tracking framework that integrates the YOLOv11 object detection model with an enhanced StrongSORT tracking algorithm (EA-StrongSORT). The proposed approach replaces the conventional re-identification backbone with a lightweight EfficientNetV2 architecture augmented with an Efficient Channel Attention (ECA) mechanism. The overall framework follows a tracking-by-detection concept, where tumor regions are first detected and subsequently associated across frames. The proposed framework is evaluated on the TrackRAD2025 dataset using multiple YOLOv11 variants to analyze the balance between performance and model complexity. Experimental results demonstrate that the lightweight YOLOv11n model achieves the best detection performance, with a precision of 0.912, recall of 0.607, mean Average Precision (mAP) of 0.771, and of 0.608. Furthermore, the proposed tracking framework achieves stable temporal association, with Multiple-Object Tracking Accuracy (MOTA) scores exceeding 91% and Higher-Order Tracking Accuracy (HOTA) scores around 90%. 
The proposed framework demonstrates the potential of detection-based tumor localization and tracking for real-time MRI-guided radiotherapy applications.
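The tracking-by-detection idea described in this abstract (detect first, then associate detections across frames) can be sketched with a deliberately simplified association step. This is not StrongSORT or EA-StrongSORT, which also use appearance features and motion modelling; it is a greedy intersection-over-union (IoU) matcher, and the function names, the 0.3 threshold, and the boxes are all hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def associate(tracks, detections, min_iou=0.3):
    """Greedily match existing tracks to new detections by descending IoU.

    tracks: dict mapping track id -> last known box.
    detections: list of boxes from the current frame.
    Returns a dict mapping track id -> index of the matched detection.
    """
    pairs = sorted(
        ((iou(t_box, d_box), t_id, d_idx)
         for t_id, t_box in tracks.items()
         for d_idx, d_box in enumerate(detections)),
        reverse=True,
    )
    matches, used = {}, set()
    for score, t_id, d_idx in pairs:
        if score < min_iou:
            break  # remaining pairs overlap even less
        if t_id in matches or d_idx in used:
            continue
        matches[t_id] = d_idx
        used.add(d_idx)
    return matches


# Hypothetical frame: one tracked tumor box and two candidate detections.
tracks = {1: (10.0, 10.0, 30.0, 30.0)}
detections = [(100.0, 100.0, 120.0, 120.0), (12.0, 11.0, 32.0, 31.0)]
matches = associate(tracks, detections)
print(matches)
```

Unmatched detections would normally start new tracks and unmatched tracks would age out; both steps are omitted here to keep the sketch short.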
(This article belongs to the Special Issue Artificial Intelligence for Signal, Image, and Multimodal Data Processing: Algorithms, Models, and Knowledge Extraction)
Open Access Article
Generative Artificial Intelligence and Probabilistic Trees for the Linguistic Data Summarization in Wave Energy Decision-Making
by
Iliana Pérez Pupo, Luis Segundo Alvarado Acuña, Pedro Y. Piñero Pérez, Raykenler Yzquierdo Herrera and Maikel Yelandi Leyva Vázquez
Mach. Learn. Knowl. Extr. 2026, 8(6), 157; https://doi.org/10.3390/make8060157 - 9 Jun 2026
Abstract
This paper presents a hybrid model that combines linguistic data summarization techniques, algorithms for constructing probabilistic trees, and various generative artificial intelligence models for learning and generating linguistic summaries to aid decision-making. The proposal is validated using methodological triangulation techniques that demonstrate high consistency in the knowledge discovered. The proposal also compares different generative artificial intelligence models; among the evaluated models, Gemini achieved the best performance. However, it is evident that, in certain contexts and tasks, small language models can be effective, yielding results comparable to large language models (LLMs) at a lower computational cost. This study applies the algorithms in a case study analyzing oceanographic data from Northern Chile. In the validation scenario, the combination of linguistic data summarization methods with unsupervised learning techniques effectively models human tolerance for imprecision when processing complex data and generates linguistic summaries that human decision-makers can easily interpret with high levels of confidence. Studies of energy capacities in the studied region and their behavior in both winter and summer are presented.
(This article belongs to the Special Issue Using Large Language Models for Scientific Problem Solving and Engineering Design)
Open Access Article
Decoupling Privacy Noise from Optimization in Transformer Forecasting
by
Bhagiradh Kantheti and Carlos A. Paz De Araujo
Mach. Learn. Knowl. Extr. 2026, 8(6), 156; https://doi.org/10.3390/make8060156 - 4 Jun 2026
Abstract
Strong differential privacy often collapses utility in transformer-based time-series forecasting because noise is injected directly into high-dimensional gradients (e.g., DP-SGD), severely corrupting the optimization process. We introduce Low-Dimensional Feature-Path Privacy for Transformers (LDPT), which enforces privacy by routing calibrated perturbations through a low-dimensional feature bottleneck ( ) that is independent of the model parameter count. LDPT implements noise via classically simulated quantum channels (Lindblad/depolarizing dynamics) and finite-shot POVM measurements, providing an auditable mapping from privacy budget to perturbation magnitude while keeping the transformer gradients clean. Across the ETT datasets and multiple prediction horizons, LDPT substantially preserves forecasting utility under its native local -QDP guarantee. At a nominal per-pass , LDPT limits MSE degradation to under 6%. In contrast, DP-SGD with global -DP applied to the identical transformer architecture suffers over 100% MSE degradation. Because these methods operate under different privacy definitions (local -QDP vs. global -DP), this comparison illustrates the impact of noise placement rather than equivalent privacy protection. To isolate the effect of the calibration mechanism, we further evaluate a classical Gaussian mechanism on the same feature-path bottleneck, which requires orders-of-magnitude larger noise and severely degrades utility. Membership inference attacks confirm that LDPT does not amplify membership leakage beyond the non-private baseline. These results demonstrate that decoupling privacy noise from optimization through low-dimensional feature-path placement and tight channel-based calibration is critical for practical privacy-preserving transformer forecasting.
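To make the noise-placement argument concrete, the following sketch uses the classical Gaussian mechanism, the baseline the abstract compares against, and not LDPT's channel-based calibration. It assumes the textbook calibration σ = Δ·sqrt(2·ln(1.25/δ))/ε, which holds for 0 < ε < 1, and the dimensions 8 and 1,000,000 are hypothetical stand-ins for a feature bottleneck and a gradient vector.

```python
import math


def gaussian_sigma(epsilon, delta, l2_sensitivity=1.0):
    """Noise scale of the classical Gaussian mechanism for (epsilon, delta)-DP."""
    if not (0.0 < epsilon < 1.0) or not (0.0 < delta < 1.0):
        raise ValueError("this calibration assumes 0 < epsilon < 1 and 0 < delta < 1")
    return l2_sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def expected_noise_norm_sq(sigma, dimension):
    """Expected squared L2 norm of i.i.d. N(0, sigma^2) noise in `dimension` coordinates."""
    return dimension * sigma ** 2


# Hypothetical privacy budget and dimensions (illustration only).
sigma = gaussian_sigma(epsilon=0.5, delta=1e-5)
bottleneck_noise = expected_noise_norm_sq(sigma, dimension=8)
gradient_noise = expected_noise_norm_sq(sigma, dimension=1_000_000)
print(f"sigma = {sigma:.2f}")
print(f"noise energy ratio = {gradient_noise / bottleneck_noise:.0f}x")
```

At equal sensitivity and budget, the total noise energy grows linearly with the number of perturbed coordinates, which is the intuition behind perturbing a small bottleneck rather than the full gradient. As the abstract notes, the two settings it compares carry different privacy definitions, so this illustrates noise placement only.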
(This article belongs to the Section Safety, Security, Privacy, and Cyber Resilience)
Open Access Article
A Sovereign Conversational Assistant Powered by ALIA and Mistral for the AI Act Age: Architecture, Governance, and Evaluation
by
Alejandro Carmona-Martínez, Antonio J. Jara and Alicia Asín
Mach. Learn. Knowl. Extr. 2026, 8(6), 155; https://doi.org/10.3390/make8060155 - 4 Jun 2026
Abstract
Digital Twins and Living Labs are increasingly used to support conservation, safety, accessibility, and visitor experience in cultural-heritage sites. Their practical value, however, depends on interfaces that can explain heterogeneous evidence, expose provenance, and operate under public-sector governance constraints. This paper presents a Sovereign Conversational Assistant (SCA) for the Libelium Heritage Living Lab, implemented as a small-language-model (SLM) and retrieval-augmented generation (RAG) stack that combines curated heritage and operational knowledge bases with provenance logging, refusal controls, and language enforcement. We first compare the Spanish public model BSC-LT/ALIA-40b-instruct-2601 with mistralai/Mistral-Small-3.2-24B-Instruct-2506 using 19 canonical test conditions executed over 155 repeated runs across five categories: historical queries, client experience, data analysis, hallucination resistance, and safety/ethics. Mistral passed all repeated runs, whereas ALIA passed 129/155 runs, showing strong factual and visitor-information behaviour but weaker numerical analysis, cross-lingual safety, and Spanish-language enforcement. To address external validity, we add a non-sovereign baseline comparison over the 13 canonical prompts against claude-opus-4-7, gemini-3.5-flash, and gpt-5.5 under the same RAG-conditioned harness. In this prompt-level comparison, mean final scores were ALIA 0.963, Claude Opus 4.7 0.938, Gemini 3.5 Flash 0.892, GPT-5.5 0.877, and Mistral 0.871; no pairwise difference was significant after Holm correction, and ALIA was non-inferior to the best external baseline at margins of 0.05 and 0.10, whereas Mistral was not. 
The contribution is therefore not a new RAG algorithm, but an operational method for deploying and evaluating a governance-aware, sovereign assistant for cultural-heritage Digital Twins, together with evidence that sovereign models can be competitive in controlled heritage RAG tasks while still requiring larger, human-calibrated benchmarks before stronger claims are made.
(This article belongs to the Special Issue Trustworthy AI: Integrating Knowledge, Retrieval, and Reasoning)
Open Access Article
Lack of Evidence for Well-Separated Clinical Phenotypes in Surgically Treated Infective Endocarditis Using Routine Clinical Variables: A Machine Learning Approach
by
Diego Sangiorgi, Elisa Mikus, Mariafrancesca Fiorentino, Antonino Costantino, Simone Calvi, Elena Tenti, Anna Milione and Carlo Savini
Mach. Learn. Knowl. Extr. 2026, 8(6), 154; https://doi.org/10.3390/make8060154 - 4 Jun 2026
Abstract
Background: Infective endocarditis (IE) is characterized by marked heterogeneity in microbiological etiology, clinical presentation, valvular involvement, and patient complexity, which complicates risk stratification. Unsupervised machine learning has been proposed to identify latent clinical phenotypes in complex diseases; however, whether IE exhibits a natural cluster structure remains unclear. Methods: In a cohort of 739 patients undergoing surgery for IE, unsupervised clustering was performed using K-medoids based on Gower distance to account for mixed-type variables, which is a common scenario in clinical settings. The optimal number of clusters was selected by maximizing the average silhouette width and the gap statistic. Density and semi-parametric algorithms (K-prototypes, KAMILA, hierarchical clustering, and HDBSCAN) were applied as a sensitivity analysis. Differences in postoperative outcomes across clusters were explored using logistic regression. Results: K-medoids clustering identified three patient groups; however, the average silhouette width was low (0.129), indicating very weak separation between clusters. Sensitivity analysis confirmed the absence of a natural cluster structure. Despite this, a descriptive comparison of forced clusters revealed a gradient of clinical severity, with one group characterized by older age, higher comorbidity burden, complex infection features, and worse postoperative outcomes. Conclusions: Unsupervised clustering did not identify natural clinical phenotypes in surgically treated IE, likely reflecting the extreme intrinsic heterogeneity of the disease. Although forced clustering highlighted clinically interpretable gradients of risk, these groups should not be considered true latent phenotypes. Alternative approaches, such as continuous risk modeling, may be more appropriate for patient stratification in IE.
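To make the two quantities named in the abstract above concrete, the following Python sketch computes a Gower distance for mixed-type records and the average silhouette width for a given labelling. It is a simplified illustration, not the study's pipeline: the toy records, the feature choice, and the function names are assumptions, and the clustering step itself (K-medoids) is omitted.

```python
def gower_distance(x, y, ranges):
    """Gower distance between two mixed-type records.

    ranges[j] is the observed range of numeric feature j, or None when
    feature j is categorical.
    """
    total = 0.0
    for xj, yj, rj in zip(x, y, ranges):
        if rj is None:
            total += 0.0 if xj == yj else 1.0   # simple matching for categories
        elif rj > 0:
            total += abs(xj - yj) / rj          # range-normalised difference
    return total / len(ranges)


def average_silhouette(dist, labels):
    """Average silhouette width from a precomputed distance matrix.

    Assumes at least two distinct cluster labels.
    """
    n = len(labels)
    scores = []
    for i in range(n):
        sums, counts = {}, {}
        for j in range(n):
            if j != i:
                sums[labels[j]] = sums.get(labels[j], 0.0) + dist[i][j]
                counts[labels[j]] = counts.get(labels[j], 0) + 1
        own = labels[i]
        if own not in counts:                   # singleton cluster scores 0
            scores.append(0.0)
            continue
        a = sums[own] / counts[own]             # mean distance within own cluster
        b = min(sums[c] / counts[c] for c in counts if c != own)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / n


# Hypothetical toy records: (age, sex, affected valve).
records = [(40, "F", "aortic"), (42, "F", "aortic"),
           (80, "M", "mitral"), (78, "M", "mitral")]
ranges = [40.0, None, None]                     # age range is 80 - 40
dist = [[gower_distance(a, b, ranges) for b in records] for a in records]
well_separated = average_silhouette(dist, [0, 0, 1, 1])
forced = average_silhouette(dist, [0, 1, 0, 1])
```

A labelling that matches the structure of the toy data gives a width near 1, while an arbitrary split gives a negative value. Widths close to 0, such as the 0.129 reported above, indicate heavily overlapping groups.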
(This article belongs to the Section Learning)
Open Access Article
UncerKAN-Mamba: A Clinically Robust, Transparent, and Explainable AI Framework for Low-Latency Skin Lesion Segmentation with Deterministic Single-Pass Uncertainty Estimation
by
Hüseyin Kutlu and Cemil Çolak
Mach. Learn. Knowl. Extr. 2026, 8(6), 153; https://doi.org/10.3390/make8060153 - 3 Jun 2026
Abstract
Background: Accurate skin lesion segmentation is central to early melanoma detection, yet existing uncertainty estimation methods such as Monte Carlo (MC) Dropout and Deep Ensembles impose heavy computational overhead, and most segmentation architectures offer no insight into where or why predictions may fail. Methods: We present UncerKAN-Mamba, an explainable segmentation architecture integrating an EfficientNet-B4 encoder with Mamba state space model (SSM) blocks inside a UNet++ decoder, augmented by a Kolmogorov–Arnold Network (KAN) uncertainty head. The KAN head uses spline variance across multiple basis degrees as a deterministic, single-pass uncertainty proxy. Explainability is assessed via Grad-CAM, error–uncertainty overlap, and boundary uncertainty profiling. Results: Trained on the ISIC 2018 training partition (n = 1815 of 2594 total images) and externally validated on PH2 (n = 200) and ISIC 2016 (n = 1279), UncerKAN-Mamba achieved Dice = 0.8958, 0.9214, and 0.9360, respectively, achieving segmentation performance statistically comparable to the strongest contemporary baselines on ISIC 2018 and PH2, and ranking second to Attention U-Net on ISIC 2016 (Bonferroni-corrected Wilcoxon and paired bootstrap tests). KAN uncertainty yielded the strongest Pearson correlation with segmentation error (r = 0.674–0.731, best on ISIC 2016)—1.7–2.0× higher than MC Dropout and 1.3–1.4× higher than Deep Ensembles. Quantitative XAI metrics (AUROC = 0.971, PAvPU = 0.554, ECE = 0.0126–0.0419) confirmed strong interpretability and excellent calibration at 36–47 FPS. Conclusions: UncerKAN-Mamba delivers clinically robust, transparent, low-latency skin lesion segmentation suitable for interactive clinical review with deterministic single-pass uncertainty—the first use of KAN spline variance for uncertainty quantification in medical image segmentation.
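The Dice scores quoted in the abstract above measure overlap between a predicted mask and a reference mask. A minimal Python sketch of that metric is given below; the tiny masks are invented for illustration and the function name is an assumption, so this is not the authors' evaluation code.

```python
def dice_coefficient(pred, truth):
    """Dice = 2 * |A and B| / (|A| + |B|) for two binary masks (lists of 0/1 rows)."""
    inter = sum(p * t for pr, tr in zip(pred, truth) for p, t in zip(pr, tr))
    size = sum(map(sum, pred)) + sum(map(sum, truth))
    # Two empty masks agree perfectly by convention.
    return 1.0 if size == 0 else 2.0 * inter / size


# Hypothetical 2 x 3 masks (1 = lesion pixel, 0 = background).
pred = [[0, 1, 1],
        [0, 1, 0]]
truth = [[0, 1, 0],
         [0, 1, 1]]
score = dice_coefficient(pred, truth)
```

Here two of the three predicted lesion pixels coincide with the reference, so the score is 2 * 2 / (3 + 3), about 0.67.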
(This article belongs to the Special Issue Clinically Robust and Transparent AI-Assisted Medical Diagnostics: From Learning Dynamics to Real-World Deployment)
Open Access Article
Preserving Spatial and Frequency Information in CNNs: Hilbert Curve Flattening and Wavelet Pooling for Explainable Medical Image Analysis
by
Jesús Jaime Moreno Escobar
Mach. Learn. Knowl. Extr. 2026, 8(6), 152; https://doi.org/10.3390/make8060152 - 1 Jun 2026
Abstract
Conventional CNN architectures often struggle with information loss during feature extraction, particularly in pooling and flattening layers, where spatial coherence and high-frequency details critical for tasks such as medical diagnostics are compromised. To address this, we introduce a novel integration of Hilbert curve flattening and multiscale frequency-selective wavelet pooling, which preserves diagnostically relevant features while optimizing computational efficiency. Multifrequency selective wavelet pooling improves the performance and adaptability of convolutional neural networks by preserving spatial adjacency structures and eliminating duplicate information. Here, raster flattening was replaced with a conventional Hilbert curve that organized data more efficiently, and wavelet pooling performed feature selection across frequency bands better than average pooling or max-pooling. On standard architectures (Inception, VGG16, ResNet, EfficientNet), our approach consistently improved precision by 1.42% over earlier methods across all datasets and classes, including diagnosis of autism via structural MRI in a proof-of-concept dataset (38 subjects, 4 in the test set), where precision reached 99%. Because this dataset is small, validation on larger independent cohorts is left to future work. The synergy of Hilbert curve flattening and multiscale frequency-selective wavelet pooling mitigates signal decomposition losses and maintains spatial frequency relationships, advancing CNNs for high-stakes applications like medical imaging and remote sensing. These new strategies enhance spatial coherence and global efficiency, ensuring robustness in applications ranging from medical imaging to time-series forecasting.
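The idea of replacing raster flattening with a Hilbert curve, as described in the abstract above, can be sketched in a few lines of Python. The index-to-coordinate mapping below is the standard iterative construction of the curve; the helper names and the assumption of a square feature map whose side is a power of two are illustrative choices, not details taken from the paper.

```python
def hilbert_d2xy(order, d):
    """Map position d along a Hilbert curve to (x, y) on a 2^order x 2^order grid."""
    n = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                 # rotate or flip the current quadrant
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_flatten(grid):
    """Flatten a square 2D grid (side a power of two) along a Hilbert curve."""
    n = len(grid)
    order = n.bit_length() - 1
    if n != 1 << order or any(len(row) != n for row in grid):
        raise ValueError("grid must be square with a power-of-two side")
    coords = (hilbert_d2xy(order, d) for d in range(n * n))
    return [grid[y][x] for x, y in coords]


# Consecutive positions on the curve are always neighbouring cells.
path = [hilbert_d2xy(2, d) for d in range(16)]
flat = hilbert_flatten([[1, 2],
                        [3, 4]])
```

Unlike row-by-row flattening, where the last cell of one row is followed by the first cell of the next, every step along `path` moves to an adjacent cell, which is the spatial-adjacency property the abstract refers to.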
(This article belongs to the Special Issue Artificial Intelligence for Signal, Image, and Multimodal Data Processing: Algorithms, Models, and Knowledge Extraction)
Open Access Article
BRAG: Bayesian Retrieval-Augmented Generation; A Methodological Framework for Evidence-Governed Decision Support
by
Lebede Ngartera, Saralees Nadarajah, Rodoumta Koina and Youssou Gningue
Mach. Learn. Knowl. Extr. 2026, 8(6), 151; https://doi.org/10.3390/make8060151 - 1 Jun 2026
Abstract
In high-stakes settings, the most consequential failure of a language model is not a wrong answer but an answer it was not entitled to give. Existing retrieval-augmented generation (RAG) pipelines retrieve context, generate text, and perhaps add citations, but they do not decide whether the evidence justifies answering, how uncertain the answer is, or at what level the system should intervene. We argue that LLMs should not only generate answers; they should be embedded inside a selective decision architecture that jointly estimates answerability, quantifies uncertainty, verifies structural validity, and chooses among direct response, escalation, abstention, or failure. We introduce BRAG (Bayesian Retrieval-Augmented Generation), a framework that operationalises this shift from answer generation to evidence-governed decision support. BRAG estimates an answerability posterior, decomposes uncertainty into epistemic and aleatoric components, and applies a structural validity gate prior to answer emission. Evaluation is conducted using controlled Monte Carlo simulation (n = 2400 queries) and a calibrated statistical pilot (N = 500), both parametric models of the pipeline’s output distribution, together with a governed operational validation that executes the full released pipeline end-to-end on independently generated MIMIC-IV-schema records (N = 100; not credentialed patient records), expert adjudication on a stratified subset (N = 200), and secondary transfer experiments on SEC EDGAR and CUAD. In simulation, BRAG reduces hallucination from 0.257 to 0.016 (93.8%) and achieves the highest coverage-adjusted utility (0.632) among five systems. 
In the synthetic MIMIC-IV-schema pilot, hallucination decreases from 0.292 to 0.020 (93.2%), with utility 0.538 at 89.6% coverage and an answerability AUROC of 0.692, which is moderate in absolute terms and is therefore positioned as a routing signal that operates jointly with the deterministic validity gate rather than as a stand-alone clinical classifier. Expert adjudication yields substantial agreement (Cohen’s κ = 0.778) and 93.5% concordance with BRAG decisions. Cross-domain transfer demonstrates 96–97% hallucination reduction without retriever modification, while ablation identifies the structural validity gate as the primary safety mechanism and the answerability posterior as the primary coverage and routing-precision mechanism. These results show that combining answerability estimation with structural validity enforcement can substantially reduce unsupported outputs. All findings are methodological rather than clinical: every evaluation tier uses synthetic or schema-conformant data, and validation on credentialed de-identified patient records remains necessary before any clinical deployment.
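The abstract above summarises expert agreement with Cohen's κ. For readers unfamiliar with the statistic, a short Python sketch of its computation is shown below; the label sequences are invented placeholders and the function name is an assumption, so the result is unrelated to the reported value of 0.778.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from the product of each rater's label frequencies.
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:             # both raters used one identical label throughout
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical decisions from two adjudicators on ten items.
a = ["answer"] * 5 + ["abstain"] * 5
b = ["answer"] * 4 + ["abstain"] * 5 + ["answer"]
kappa = cohens_kappa(a, b)
```

In this toy case the raters agree on 8 of 10 items while chance agreement is 0.5, giving κ = (0.8 - 0.5) / (1 - 0.5) = 0.6.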
(This article belongs to the Section Data)
Open Access Article
Temporal Knowledge Extraction Through BayeStack with Multi-Level Explainability for Optimal Sepsis Classification
by
Anjana Geetha, K. L. Nisha, Arun Sankar Muttathu Sivasankara Pillai and Sreenath Rajeev
Mach. Learn. Knowl. Extr. 2026, 8(6), 150; https://doi.org/10.3390/make8060150 - 1 Jun 2026
Abstract
Sepsis, a life-threatening condition causing significant global mortality, requires rapid diagnosis and intervention. Although recent advances in machine learning have supported clinical decision-making, existing sepsis classification approaches exhibit several limitations, including inadequate temporal modeling of disease progression, lack of systematic hyperparameter optimization, fragmented interpretability approaches that do not fully address multi-stakeholder clinical needs, and challenges in achieving balanced sensitivity–specificity trade-offs. These limitations restrict effective extraction of knowledge from complex temporal clinical data and hinder actionable decision-making. To address these challenges, this work proposes BayeStack, a temporal knowledge-extraction framework that integrates Bayesian optimization-driven ensemble learning with hierarchical interpretability to optimize sepsis classification. This framework captures the progression of sepsis through multi-window temporal aggregation, performs optimal classification by applying AUROC-maximizing hyperparameter space exploration, and enables comprehensive clinical knowledge extraction by applying a three-level interpretability framework that includes global feature importance, population-level partial dependence analysis, and patient-specific contribution-level analysis. Evaluation results indicated that BayeStack achieved an AUROC of 0.99 with balanced sensitivity and specificity of 0.97, substantially outperforming all baseline methods. Ablation studies validated that temporal aggregation and data balancing contributed to performance improvements. A strong Spearman correlation validated the feature ranking convergence and effectiveness of the ensemble strategy. The interpretability framework provides insights into complementary model behavior and extracts evidence-based clinical thresholds for priority-based treatment monitoring, thereby enabling robust clinical decision support. 
This first-phase framework, which systematically integrates traditional machine learning models, establishes baseline performance and explainability standards for subsequent deep learning advancements.
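The abstract above uses a Spearman correlation to check that different models rank features similarly. A minimal Python sketch of that check follows, assuming Spearman's coefficient is computed as the Pearson correlation of average ranks; the importance scores and function names are hypothetical and do not come from the paper.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1    # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical importance scores for five features from three models.
model_a = [0.40, 0.25, 0.20, 0.10, 0.05]
model_b = [0.35, 0.30, 0.15, 0.12, 0.08]    # same ordering as model_a
model_c = [0.30, 0.35, 0.15, 0.12, 0.08]    # top two features swapped
rho_ab = spearman(model_a, model_b)
rho_ac = spearman(model_a, model_c)
```

Identical orderings give a coefficient of 1, and swapping the top two of five features lowers it to 0.9, so values near 1 indicate that the models agree on which features matter most.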
(This article belongs to the Topic Deep Supplement Learning for Healthcare and Biomedical Applications)
Open Access Article
Document Image Binarization Using Various Machine Learning Models and Ensembles Trained on Classic Local and Global Binarization Algorithms and Image Statistics
by
Nicolae Tarbă, Costin-Anton Boiangiu and Mihai-Lucian Voncilă
Mach. Learn. Knowl. Extr. 2026, 8(6), 149; https://doi.org/10.3390/make8060149 - 1 Jun 2026
Abstract
Image binarization is a preprocessing technique that maps an image’s pixel values to either black or white, and it is crucial in many fields of computer vision, such as document digitization and medical imaging. Thresholding is a popular image binarization technique for grayscale images because it splits pixel values into greater than or lower than a specific threshold. Global thresholding is fast because it computes only one threshold for the entire image, but it cannot handle many types of noise specific to document images. Local thresholding has greater computational complexity because it adjusts the thresholds for each pixel based on the surrounding pixels, but it can handle such types of noise, although it risks introducing noise in uniform areas of the image. Mixed global–local approaches can mitigate this risk while still being able to handle most types of noise. This paper proposes a mixed global–local thresholding method that harnesses two popular automatic machine learning frameworks to train machine learning models using the results of several thresholding algorithms and other image statistics. Cross-validation was performed to ensure that the selected models are robust and perform well on new data. We obtained results comparable with other state-of-the-art methods on popular document image binarization datasets.
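As an illustration of the global thresholding described in the abstract above, the Python sketch below implements Otsu's method, one classic way of choosing a single threshold for a whole image. It is offered as a generic example rather than the paper's method; the pixel values are invented and the function names are assumptions.

```python
def otsu_threshold(pixels):
    """Global Otsu threshold for a flat list of 8-bit grayscale values (0..255)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w_bg = 0            # number of pixels at or below the candidate threshold
    sum_bg = 0          # sum of their intensities
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        w_fg = total - w_bg
        if w_bg == 0:
            continue
        if w_fg == 0:
            break
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance, up to a constant factor.
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(pixels, threshold):
    """Map each pixel to 1 (white) if above the threshold, else 0 (black)."""
    return [1 if p > threshold else 0 for p in pixels]


# Hypothetical pixels: four dark ink values and four bright paper values.
pixels = [10, 12, 11, 13, 200, 205, 210, 198]
t = otsu_threshold(pixels)
binary = binarize(pixels, t)
```

A local method would instead recompute a threshold for each pixel from its neighbourhood, which costs more but copes with uneven illumination, as the abstract notes.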
(This article belongs to the Special Issue Artificial Intelligence for Signal, Image, and Multimodal Data Processing: Algorithms, Models, and Knowledge Extraction)
Open Access Article
Stratified Fréchet Distance: A Three-Layer Diagnostic Framework for Conditional Time Series Generation Under Data Scarcity
by
Tsuyoshi Okita
Mach. Learn. Knowl. Extr. 2026, 8(6), 148; https://doi.org/10.3390/make8060148 - 29 May 2026
Abstract
Evaluating conditional time-series generation models remains challenging in battery research, where degradation data are often limited and experiments cover only a small number of operating conditions. The widely used Fréchet Inception Distance (FID) summarizes all conditions into a single score, which can obscure failures under rare but safety-critical conditions. Several condition-aware extensions of FID, including Conditional Fréchet Inception Distance (CFID), partially address this limitation by evaluating each condition separately. However, these approaches do not assess whether physically meaningful relationships between operating conditions are preserved, and their reliability deteriorates when only a few samples are available for each condition. To address these issues, we propose a three-layer diagnostic framework for evaluating conditional generative models under limited-data conditions. The first layer, Stratified Fréchet Distance, identifies the specific operating conditions and degradation phases where generation quality degrades. The second layer, based on Conditional Response Consistency (CRC), Conditional Distance Ratio (CDR), and Mean-Order Preservation (MOP), evaluates whether the model preserves the distance structure and ordering between conditions. MOP detects condition-ordering defects that CRC cannot identify when the real data distance matrix is non-monotone. This layer also enables statistically meaningful comparisons even when only a small number of samples are available. The third layer detects strata where statistical estimates are unreliable and provides a more stable alternative for evaluation. We validate the framework on four battery degradation datasets using two generative model architectures. The proposed approach reveals condition-specific failures that are not captured by conventional FID. It localizes generation errors to the late-stage high-temperature degradation regime that is most relevant to battery safety. 
The framework also detects structural distortions with statistical significance. In addition, it consistently ranks physics-informed model variants across quality differences spanning seven orders of magnitude. These results demonstrate that the proposed framework provides a practical and physically interpretable evaluation methodology for conditional generative modeling in battery degradation analysis.
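To illustrate why a per-condition score can reveal failures that a single pooled score hides, the Python sketch below computes a Fréchet distance separately for each operating condition. It is a heavily simplified stand-in for the framework described above: it uses one scalar feature per sample, for which the squared Fréchet distance between Gaussian fits reduces to (μ1 − μ2)² + (σ1 − σ2)², and it marks strata with too few samples as unreliable using a hypothetical `min_samples` cutoff. The condition names, values, and function names are all assumptions.

```python
from statistics import mean, pstdev


def frechet_1d(real, fake):
    """Squared Frechet distance between univariate Gaussian fits of two samples."""
    return (mean(real) - mean(fake)) ** 2 + (pstdev(real) - pstdev(fake)) ** 2


def stratified_frechet(real_by_cond, fake_by_cond, min_samples=3):
    """Per-condition distance; None marks strata too small to estimate (assumed rule)."""
    report = {}
    for cond, real in real_by_cond.items():
        fake = fake_by_cond.get(cond, [])
        if len(real) < min_samples or len(fake) < min_samples:
            report[cond] = None
        else:
            report[cond] = frechet_1d(real, fake)
    return report


# Hypothetical scalar features (for example, capacity fade) per temperature condition.
real = {"25C": [1.0, 2.0, 3.0], "45C": [2.0, 4.0, 6.0], "60C": [5.0]}
fake = {"25C": [1.0, 2.0, 3.0], "45C": [4.0, 6.0, 8.0], "60C": [5.0, 5.5]}
report = stratified_frechet(real, fake)
```

In this toy case the generator matches the real data at 25C, is shifted at 45C, and cannot be assessed at 60C, a pattern that a single score pooled over all conditions would blur.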
(This article belongs to the Section Learning)
Open Access Article
Generation of Synthetic Dataset for Part Segmentation Problems
by
Lovro Sever, Petar Kosec, Stanko Škec and Tomislav Martinec
Mach. Learn. Knowl. Extr. 2026, 8(6), 147; https://doi.org/10.3390/make8060147 - 29 May 2026
Abstract
Part segmentation of industrial 3D models is often limited by the lack of sufficiently large and consistently labeled training datasets. This study proposes a workflow for generating synthetic segmentation datasets from robust parametric computer-aided design (CAD) models and evaluates its applicability on a dental abutment case. The workflow includes the definition of a modeling strategy, creation of a robust parametric CAD model, automated generation of valid geometry variants, and preparation of labeled training data for point-cloud-based segmentation. In the experimental part of the study, a synthetic dataset of segmented dental abutment geometries was generated from the developed parametric CAD model and used to train a PointNeXt-S part-segmentation model. The segmentation performance of the trained model was evaluated on manually labeled real-world abutments. Results show that the segmentation of industrial 3D models improved with increasing synthetic training-set size and further improved when data augmentation was applied. The best-performing augmented model achieved a mean Intersection over Union (IoU) of 89.2% on the real-world validation set, compared with 82.4% without augmentation. The findings indicate that parametric-CAD-based synthetic dataset generation can provide an effective basis for training segmentation models for complex industrial geometries.
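The mean IoU figures in the abstract above compare predicted and reference part labels point by point. A minimal Python sketch of that metric for a single point cloud is given below; it averages IoU over the part classes that occur, which is one common convention and may differ from the exact protocol used in the study. The labels and the function name are hypothetical.

```python
def mean_iou(pred, truth, num_classes):
    """Mean Intersection over Union across part classes for per-point labels.

    Assumes at least one class occurs in the prediction or the reference.
    """
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, truth) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, truth) if p == c or t == c)
        if union:                   # skip classes absent from both labelings
            ious.append(inter / union)
    return sum(ious) / len(ious)


# Hypothetical per-point part labels for a six-point cloud with three parts.
truth = [0, 0, 0, 1, 1, 2]
pred = [0, 0, 1, 1, 1, 2]
miou = mean_iou(pred, truth, num_classes=3)
```

One mislabelled point lowers the IoU of both the class it was taken from and the class it was assigned to, giving (2/3 + 2/3 + 1) / 3, about 0.78, for this example.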
(This article belongs to the Section Data)
Topics
Topic in
AI, Applied Sciences, Electronics, MAKE
Deep Supplement Learning for Healthcare and Biomedical Applications
Topic Editors: Tahir Cetin Akinci, Ömer Faruk Ertuǧrul
Deadline: 30 June 2026
Topic in
Atmosphere, Earth, Encyclopedia, Entropy, Fractal Fract, MAKE, Meteorology
Revisiting Butterfly Effect, Multiscale Dynamics, and Predictability Using Ai-Enhanced Modeling Framework (AEMF) and Chaos Theory
Topic Editors: Bo-Wen Shen, Roger A. Pielke Sr., Xubin Zeng
Deadline: 31 July 2026
Topic in
Algorithms, Applied Sciences, Electronics, MAKE, AI, Software
Applications of NLP, AI, and ML in Software Engineering
Topic Editors: Affan Yasin, Javed Ali Khan, Lijie Wen
Deadline: 30 August 2026
Topic in
AI, MAKE, Robotics, Sensors, Electronics
Deep Visual Recognition: Methods, and Applications
Topic Editors: Min Young Kim, Francisco Gomez-Donoso
Deadline: 30 October 2026
Special Issues
Special Issue in
MAKE
Language Acquisition and Understanding
Guest Editors: Michal Ptaszynski, Rafal Rzepka, Masaharu Yoshioka
Deadline: 15 July 2026
Special Issue in
MAKE
Advancing Natural Language Processing for Low-Resource Languages and Dialects
Guest Editors: Tanjim Mahmud, Michal Ptaszynski, Karl Andersson
Deadline: 31 July 2026
Special Issue in
MAKE
Trustworthy AI: Integrating Knowledge, Retrieval, and Reasoning
Guest Editor: Konstantinos Diamantaras
Deadline: 31 August 2026
Special Issue in
MAKE
Explainable Artificial Intelligence: Theoretical Foundations and Methodological Advances
Guest Editors: Sheng Du, Javier Del Ser Lorente
Deadline: 31 August 2026
Topical Collections
Topical Collection in
MAKE
Extravaganza Feature Papers on Hot Topics in Machine Learning and Knowledge Extraction
Collection Editor: Andreas Holzinger
Topical Collection in
MAKE
Feature Papers in Safety, Security, Privacy, and Cyber Resilience
Collection Editor: Simon Tjoa
Topical Collection in
MAKE
Robust and Uncertainty-Aware Learning from Real-World Data
Collection Editors: Federico Cabitza, Andrea Campagner



