Journal Description
Machine Learning and Knowledge Extraction
Machine Learning and Knowledge Extraction is an international, peer-reviewed, open access, monthly journal on machine learning and applications. See our video on YouTube explaining the MAKE journal concept.
- Open Access: free for readers, with article processing charges (APC) paid by authors or their institutions.
- High Visibility: indexed within Scopus, ESCI (Web of Science), dblp, and other databases.
- Rapid Publication: manuscripts are peer-reviewed and a first decision is provided to authors approximately 27 days after submission; acceptance to publication is undertaken in 4.4 days (median values for papers published in this journal in the second half of 2025).
- Journal Rank: JCR - Q1 (Engineering, Electrical and Electronic) / CiteScore - Q1 (Engineering (miscellaneous))
- Recognition of Reviewers: reviewers who provide timely, thorough peer-review reports receive vouchers entitling them to a discount on the APC of their next publication in any MDPI journal, in appreciation of the work done.
- Journal Cluster of Artificial Intelligence: AI, AI in Medicine, Algorithms, BDCC, MAKE, MTI, Stats, Virtual Worlds and Computers.
Impact Factor: 6.0 (2024); 5-Year Impact Factor: 5.7 (2024)
Latest Articles
Preserving Spatial and Frequency Information in CNNs: Hilbert Curve Flattening and Wavelet Pooling for Explainable Medical Image Analysis
Mach. Learn. Knowl. Extr. 2026, 8(6), 152; https://doi.org/10.3390/make8060152 - 1 Jun 2026
Abstract
Conventional CNN architectures often struggle with information loss during feature extraction, particularly in pooling and flattening layers, where spatial coherence and high-frequency details critical for tasks such as medical diagnostics are compromised. To address this, we introduce a novel integration of Hilbert curve flattening and multiscale frequency-selective wavelet pooling, which preserves diagnostically relevant features while optimizing computational efficiency. Multifrequency selective wavelet pooling improves the performance and adaptability of convolutional neural networks by preserving spatial adjacency structures and eliminating duplicate information. Here, raster flattening was replaced with a conventional Hilbert curve that organized data more efficiently, and wavelet pooling performed feature selection across frequency bands better than average pooling or max-pooling. On standard architectures (Inception, VGG16, ResNet, EfficientNet), our approach consistently improved precision by 1.42% over earlier methods across all datasets and classes, including diagnosis of autism via structural MRI in a proof-of-concept dataset (38 subjects, 4 in the test set), where precision reached 99%. Because this cohort is small, validation on larger independent cohorts will be part of future work. The synergy of Hilbert curve flattening and multiscale frequency-selective wavelet pooling mitigates signal decomposition losses and maintains spatial frequency relationships, advancing CNNs for high-stakes applications like medical imaging and remote sensing. These new strategies enhance spatial coherence and global efficiency, ensuring robustness in applications ranging from medical imaging to time-series forecasting.
(This article belongs to the Special Issue Artificial Intelligence for Signal, Image, and Multimodal Data Processing: Algorithms, Models, and Knowledge Extraction)
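As a rough illustration of the Hilbert curve flattening named in the abstract above, the sketch below orders the cells of a square grid along a Hilbert curve instead of row by row. It assumes a grid whose side is a power of two; the function names `hilbert_d2xy` and `hilbert_flatten` are illustrative and not taken from the article, and the wavelet pooling stage is not shown.

```python
def hilbert_d2xy(order, d):
    """Map position d along a Hilbert curve to (x, y) on a 2**order grid."""
    x = y = 0
    t = d
    s = 1
    n = 1 << order
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                # Flip the current sub-square before rotating it.
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_flatten(grid):
    """Flatten a square grid (side = power of two) in Hilbert-curve order."""
    n = len(grid)
    order = n.bit_length() - 1
    if n == 0 or (1 << order) != n or any(len(row) != n for row in grid):
        raise ValueError("grid must be square with a power-of-two side")
    flat = []
    for d in range(n * n):
        x, y = hilbert_d2xy(order, d)
        flat.append(grid[y][x])  # rows are indexed by y, columns by x
    return flat


print(hilbert_flatten([[1, 2], [3, 4]]))  # [1, 3, 4, 2]
```

Raster flattening of the same 2 x 2 grid would give 1, 2, 3, 4. In the Hilbert ordering, consecutive entries of the flattened vector always come from neighbouring cells, which is the spatial-adjacency property the abstract refers to.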
Open Access Article
BRAG: Bayesian Retrieval-Augmented Generation; A Methodological Framework for Evidence-Governed Decision Support
by
Lebede Ngartera, Saralees Nadarajah, Rodoumta Koina and Youssou Gningue
Mach. Learn. Knowl. Extr. 2026, 8(6), 151; https://doi.org/10.3390/make8060151 - 1 Jun 2026
Abstract
In high-stakes settings, the most consequential failure of a language model is not a wrong answer but an answer it was not entitled to give. Existing retrieval-augmented generation (RAG) pipelines retrieve context, generate text, and perhaps add citations, but they do not decide whether the evidence justifies answering, how uncertain the answer is, or at what level the system should intervene. We argue that LLMs should not only generate answers; they should be embedded inside a selective decision architecture that jointly estimates answerability, quantifies uncertainty, verifies structural validity, and chooses among direct response, escalation, abstention, or failure. We introduce BRAG (Bayesian Retrieval-Augmented Generation), a framework that operationalises this shift from answer generation to evidence-governed decision support. BRAG estimates an answerability posterior, decomposes uncertainty into epistemic and aleatoric components, and applies a structural validity gate prior to answer emission. Evaluation is conducted using controlled Monte Carlo simulation ( queries) and a calibrated statistical pilot ( ), both parametric models of the pipeline’s output distribution, together with a governed operational validation that executes the full released pipeline end-to-end on independently generated MIMIC-IV-schema records ( ; not credentialed patient records), expert adjudication on a stratified subset ( ), and secondary transfer experiments on SEC EDGAR and CUAD. In simulation, BRAG reduces hallucination from 0.257 to 0.016 (93.8%) and achieves the highest coverage-adjusted utility (0.632) among five systems. In the synthetic MIMIC-IV-schema pilot, hallucination decreases from 0.292 to 0.020 (93.2%), with utility 0.538 at 89.6% coverage and an answerability AUROC of 0.692, which is moderate in absolute terms and is therefore positioned as a routing signal that operates jointly with the deterministic validity gate rather than as a stand-alone clinical classifier. 
Expert adjudication yields substantial agreement (Cohen’s ) and 93.5% concordance with BRAG decisions. Cross-domain transfer demonstrates 96–97% hallucination reduction without retriever modification, while ablation identifies the structural validity gate as the primary safety mechanism and the answerability posterior as the primary coverage and routing-precision mechanism. These results show that combining answerability estimation with structural validity enforcement can substantially reduce unsupported outputs. All findings are methodological rather than clinical: every evaluation tier uses synthetic or schema-conformant data, and validation on credentialed de-identified patient records remains necessary before any clinical deployment.
(This article belongs to the Section Data)
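The four-way decision described in the abstract above (direct response, escalation, abstention, or failure) could be wired together roughly as follows. This is a hypothetical sketch, not the released BRAG pipeline: the `Evidence` fields, the function `route`, the threshold values, the order of the checks, and the choice to sum the two uncertainty components are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    answerability: float      # estimated probability that the evidence supports an answer
    epistemic: float          # model (reducible) uncertainty
    aleatoric: float          # data (irreducible) uncertainty
    structurally_valid: bool  # outcome of a structural validity gate


def route(ev, min_answerability=0.5, max_uncertainty=0.3):
    """Pick one of four actions; thresholds are illustrative placeholders."""
    if not ev.structurally_valid:
        return "fail"        # the validity gate blocks emission outright
    if ev.answerability < min_answerability:
        return "abstain"     # evidence does not justify answering
    if ev.epistemic + ev.aleatoric > max_uncertainty:
        return "escalate"    # answerable, but too uncertain to answer alone
    return "answer"


print(route(Evidence(0.9, 0.05, 0.05, True)))   # answer
print(route(Evidence(0.9, 0.30, 0.10, True)))   # escalate
print(route(Evidence(0.2, 0.05, 0.05, True)))   # abstain
print(route(Evidence(0.9, 0.05, 0.05, False)))  # fail
```

Checking the validity gate first mirrors the abstract's statement that the gate is applied prior to answer emission; how the actual system combines the signals is not specified in the abstract.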
Open Access Article
Temporal Knowledge Extraction Through BayeStack with Multi-Level Explainability for Optimal Sepsis Classification
by
Anjana Geetha, K. L. Nisha, Arun Sankar Muttathu Sivasankara Pillai and Sreenath Rajeev
Mach. Learn. Knowl. Extr. 2026, 8(6), 150; https://doi.org/10.3390/make8060150 - 1 Jun 2026
Abstract
Sepsis, a life-threatening condition causing significant global mortality, requires rapid diagnosis and intervention. Although recent advances in machine learning have supported clinical decision-making, existing sepsis classification approaches exhibit several limitations, including inadequate temporal modeling of disease progression, lack of systematic hyperparameter optimization, fragmented interpretability approaches that do not fully address multi-stakeholder clinical needs, and challenges in achieving balanced sensitivity–specificity trade-offs. These limitations restrict effective extraction of knowledge from complex temporal clinical data and hinder actionable decision-making. To address these challenges, this work proposes BayeStack, a temporal knowledge-extraction framework that integrates Bayesian optimization-driven ensemble learning with hierarchical interpretability to optimize sepsis classification. This framework captures the progression of sepsis through multi-window temporal aggregation, performs optimal classification by applying AUROC-maximizing hyperparameter space exploration, and enables comprehensive clinical knowledge extraction by applying a three-level interpretability framework that includes global feature importance, population-level partial dependence analysis, and patient-specific contribution-level analysis. Evaluation results indicated that BayeStack achieved an AUROC of 0.99 with balanced sensitivity and specificity of 0.97, substantially outperforming all baseline methods ( ). Ablation studies validated that temporal aggregation and data balancing contributed to performance improvements. A strong Spearman correlation ( ) validated the feature ranking convergence and effectiveness of the ensemble strategy. The interpretability framework provides insights into complementary model behavior and extracts evidence-based clinical thresholds for priority-based treatment monitoring, thereby enabling robust clinical decision support. 
This first-phase framework, which systematically integrates traditional machine learning models, establishes baseline performance and explainability standards for subsequent deep learning advancements.
(This article belongs to the Topic Deep Supplement Learning for Healthcare and Biomedical Applications)
Open Access Article
Document Image Binarization Using Various Machine Learning Models and Ensembles Trained on Classic Local and Global Binarization Algorithms and Image Statistics
by
Nicolae Tarbă, Costin-Anton Boiangiu and Mihai-Lucian Voncilă
Mach. Learn. Knowl. Extr. 2026, 8(6), 149; https://doi.org/10.3390/make8060149 - 1 Jun 2026
Abstract
Image binarization is a preprocessing technique that maps an image’s pixel values to either black or white, and it is crucial in many fields of computer vision, such as document digitization and medical imaging. Thresholding is a popular image binarization technique for grayscale images: it splits pixels into those above and those below a specific threshold. Global thresholding is fast because it computes only one threshold for the entire image, but it cannot handle many types of noise specific to document images. Local thresholding has greater computational complexity because it adjusts the threshold for each pixel based on the surrounding pixels; it can handle such types of noise, although it risks introducing noise in uniform areas of the image. Mixed global–local approaches can mitigate this risk while still being able to handle most types of noise. This paper proposes a mixed global–local thresholding method that harnesses two popular automatic machine learning frameworks to train machine learning models using the results of several thresholding algorithms and other image statistics. Cross-validation was performed to ensure that the selected models are robust and perform well on new data. We obtained results comparable with other state-of-the-art methods on popular document image binarization datasets.
(This article belongs to the Special Issue Artificial Intelligence for Signal, Image, and Multimodal Data Processing: Algorithms, Models, and Knowledge Extraction)
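The difference between global and local thresholding described in the abstract above can be seen on a tiny example. The sketch below is not the article's method (which trains machine learning models on the outputs of several thresholding algorithms); it uses the image mean as a stand-in global threshold and a windowed mean minus an offset as a stand-in local threshold, and the names `global_threshold`, `local_threshold`, `radius`, and `offset` are illustrative assumptions.

```python
def global_threshold(img):
    """One threshold (the image mean) for every pixel; 1 = white, 0 = black."""
    pixels = [p for row in img for p in row]
    t = sum(pixels) / len(pixels)
    return [[1 if p > t else 0 for p in row] for row in img]


def local_threshold(img, radius=1, offset=10):
    """Per-pixel threshold: mean of the surrounding window minus an offset."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                img[j][i]
                for j in range(max(0, y - radius), min(h, y + radius + 1))
                for i in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            t = sum(window) / len(window) - offset
            row.append(1 if img[y][x] > t else 0)
        out.append(row)
    return out


# Background fades from 220 to 100 (uneven lighting); two dark "ink" pixels.
img = [
    [220, 200, 180, 160, 140, 120, 100],
    [220, 120, 180, 160, 140,  40, 100],
    [220, 200, 180, 160, 140, 120, 100],
]
print(global_threshold(img)[1])  # [1, 0, 1, 1, 0, 0, 0]
print(local_threshold(img)[1])   # [1, 0, 1, 1, 1, 0, 1]
```

The global threshold turns the dimly lit right-hand background black, while the local threshold keeps only the two ink pixels. The `offset` keeps perfectly uniform regions white; with an offset of zero, a uniform window would sit exactly on its own threshold, which is one way the noise in uniform areas mentioned in the abstract can arise.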
Open Access Article
Stratified Fréchet Distance: A Three-Layer Diagnostic Framework for Conditional Time Series Generation Under Data Scarcity
by
Tsuyoshi Okita
Mach. Learn. Knowl. Extr. 2026, 8(6), 148; https://doi.org/10.3390/make8060148 - 29 May 2026
Abstract
Evaluating conditional time-series generation models remains challenging in battery research, where degradation data are often limited and experiments cover only a small number of operating conditions. The widely used Fréchet Inception Distance (FID) summarizes all conditions into a single score, which can obscure failures under rare but safety-critical conditions. Several condition-aware extensions of FID, including Conditional Fréchet Inception Distance (CFID), partially address this limitation by evaluating each condition separately. However, these approaches do not assess whether physically meaningful relationships between operating conditions are preserved, and their reliability deteriorates when only a few samples are available for each condition. To address these issues, we propose a three-layer diagnostic framework for evaluating conditional generative models under limited-data conditions. The first layer, Stratified Fréchet Distance, identifies the specific operating conditions and degradation phases where generation quality degrades. The second layer, based on Conditional Response Consistency (CRC), Conditional Distance Ratio (CDR), and Mean-Order Preservation (MOP), evaluates whether the model preserves the distance structure and ordering between conditions. MOP detects condition-ordering defects that CRC cannot identify when the real data distance matrix is non-monotone. This layer also enables statistically meaningful comparisons even when only a small number of samples are available. The third layer detects strata where statistical estimates are unreliable and provides a more stable alternative for evaluation. We validate the framework on four battery degradation datasets using two generative model architectures. The proposed approach reveals condition-specific failures that are not captured by conventional FID. It localizes generation errors to the late-stage high-temperature degradation regime that is most relevant to battery safety. 
The framework also detects structural distortions with statistical significance. In addition, it consistently ranks physics-informed model variants across quality differences spanning seven orders of magnitude. These results demonstrate that the proposed framework provides a practical and physically interpretable evaluation methodology for conditional generative modeling in battery degradation analysis.
(This article belongs to the Section Learning)
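To illustrate why a single pooled score can hide a failure under one operating condition, the sketch below computes a Fréchet distance per condition and compares it with the pooled value. It is a deliberate simplification: it fits a univariate Gaussian to one scalar feature, for which the squared Fréchet distance reduces to (mean difference)² + (standard deviation difference)². The article's Stratified Fréchet Distance may be defined differently, and the condition labels and numbers here are made up.

```python
from statistics import fmean, pstdev


def frechet_1d(real, generated):
    """Squared Frechet distance between Gaussians fitted to two scalar samples."""
    mean_gap = fmean(real) - fmean(generated)
    std_gap = pstdev(real) - pstdev(generated)
    return mean_gap ** 2 + std_gap ** 2


def stratified_frechet(real_by_condition, generated_by_condition):
    """One distance per condition instead of a single pooled score."""
    return {
        cond: frechet_1d(real_by_condition[cond], generated_by_condition[cond])
        for cond in real_by_condition
    }


# Hypothetical scalar feature values for two operating conditions.
real = {"25C": [1.0, 2.0, 3.0], "45C": [1.0, 2.0, 3.0]}
generated = {"25C": [1.0, 2.0, 3.0], "45C": [3.0, 4.0, 5.0]}

per_condition = stratified_frechet(real, generated)
pooled = frechet_1d(sum(real.values(), []), sum(generated.values(), []))
print(per_condition)  # 25C matches exactly; 45C is off by a squared distance of 4
print(pooled)         # a single score between the two per-condition values
```

Here the pooled score lands between the two per-condition values, so it neither reveals that one condition is reproduced exactly nor how badly the other one fails.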
Open Access Article
Generation of Synthetic Dataset for Part Segmentation Problems
by
Lovro Sever, Petar Kosec, Stanko Škec and Tomislav Martinec
Mach. Learn. Knowl. Extr. 2026, 8(6), 147; https://doi.org/10.3390/make8060147 - 29 May 2026
Abstract
Part segmentation of industrial 3D models is often limited by the lack of sufficiently large and consistently labeled training datasets. This study proposes a workflow for generating synthetic segmentation datasets from robust parametric computer-aided design (CAD) models and evaluates its applicability on a dental abutment case. The workflow includes the definition of a modeling strategy, creation of a robust parametric CAD model, automated generation of valid geometry variants, and preparation of labeled training data for point-cloud-based segmentation. In the experimental part of the study, a synthetic dataset of segmented dental abutment geometries was generated from the developed parametric CAD model and used to train a PointNeXt-S part-segmentation model. The segmentation performance of the trained model was evaluated on manually labeled real-world abutments. Results show that the segmentation of industrial 3D models improved with increasing synthetic training-set size and further improved when data augmentation was applied. The best-performing augmented model achieved a mean Intersection over Union (IoU) of 89.2% on the real-world validation set, compared with 82.4% without augmentation. The findings indicate that parametric-CAD-based synthetic dataset generation can provide an effective basis for training segmentation models for complex industrial geometries.
(This article belongs to the Section Data)
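For readers unfamiliar with the metric reported in the abstract above, a minimal sketch of mean Intersection over Union for per-point part labels follows. It averages the IoU over the classes that occur in either the prediction or the ground truth; part-segmentation papers differ in how they average (per class, per shape, per instance), so this is one common convention and not necessarily the one used in the article. The function name and the toy labels are illustrative.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes present in the prediction or the ground truth."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both label sets
            ious.append(inter / union)
    if not ious:
        raise ValueError("no class occurs in pred or target")
    return sum(ious) / len(ious)


# Six points, three part labels.
pred   = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
print(mean_iou(pred, target, num_classes=3))  # class IoUs are 1/3, 2/3 and 1/2
```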
Open Access Article
Ensemble Variability as a Signal of Confounding in Medical Imaging Models
by
Uma M. Lal-Trehan Estrada, Sunil A. Sheth, Arnau Oliver, Xavier Lladó and Luca Giancardo
Mach. Learn. Knowl. Extr. 2026, 8(6), 146; https://doi.org/10.3390/make8060146 - 27 May 2026
Abstract
Machine learning models for medical image analysis are vulnerable to hidden confounders, which can compromise generalization and clinical reliability. Existing detection strategies typically require explicit knowledge or labels of the confounder, which are often unavailable. In this work, we propose an ensemble-based framework to detect potential confounder-driven learning without explicitly defining the confounders, requiring only an indication of which samples might be affected. Our approach leverages the variability of model performance across ensembles to identify signatures of shortcut learning. Shortcut learning occurs when a model uses non-robust features or correlations rather than learning the true underlying task, and it is often observed when confounders are present. We generate controlled dataset variants with increasing confounding levels and analyze distributions of AUC (area under the ROC curve) scores across training, validation, and test splits, revealing converging performance and reduced variance as confounding intensifies. We validate our method on two clinically relevant tasks: diabetic retinopathy detection from retinal fundus images and tumor detection from brain MRI slices. We then demonstrate its practical utility on a further dataset and image modality, using a stroke reperfusion prediction task with suspected hidden confounders. This work provides a practical, data-driven diagnostic tool to flag potential confounding and support the reliability assessment of machine learning models in medical imaging.
(This article belongs to the Section Data)
Open Access Article
Looking Ahead When It Is Safe: An Uncertainty-Aware Paradigm for Blood Glucose Prediction with Dynamic Horizon Control
by
Sarala Ghimire, Turgay Celik, Martin Gerdes and Christian W. Omlin
Mach. Learn. Knowl. Extr. 2026, 8(6), 145; https://doi.org/10.3390/make8060145 - 26 May 2026
Abstract
Reliable time-series forecasting under rapidly changing conditions remains a critical challenge across many domains, particularly in healthcare, where physiological signals are inherently dynamic and uncertain. Blood glucose level (BGL) prediction exemplifies this challenge, as accurate and timely forecasts are essential for effective diabetes management, yet traditional approaches rely on fixed prediction horizons and single-point estimates, which may yield unreliable decisions under rapidly changing physiological conditions. In this work, we propose a novel deep learning approach for adaptive horizon selection, applied to BGL prediction. It employs evidential learning-based uncertainty quantification that decomposes uncertainty into epistemic and aleatoric components. Models are trained to predict blood glucose levels at different future time steps, each providing both a point prediction and an associated uncertainty measure. At inference time, the approach dynamically balances predictive accuracy and reliability by selecting the longest horizon whose predicted uncertainty remains below a predefined threshold. This enables confidence-based horizon selection, using longer prediction horizons during stable periods and switching to shorter horizons when uncertainty signals critical glucose events requiring immediate intervention. This uncertainty-aware prediction approach promotes transparency by exposing confidence levels alongside predictions. Applicable to time-series forecasting tasks broadly, the proposed framework demonstrates encouraging potential, and when applied to BGL prediction as a representative clinical case, shows particular promise for supporting glycemic management through calibrated uncertainty estimation, offering a more transparent and interpretable alternative to fixed-horizon models toward trustworthy decision support in diabetes care.
(This article belongs to the Topic Deep Supplement Learning for Healthcare and Biomedical Applications)
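The selection rule stated in the abstract above (pick the longest horizon whose predicted uncertainty stays below a predefined threshold) is simple enough to sketch directly. The data layout, the function name `select_horizon`, the numbers, and the fallback to the shortest horizon when no forecast qualifies are assumptions made for illustration; the abstract does not say what the system does in that last case.

```python
def select_horizon(forecasts, max_uncertainty):
    """Return the longest-horizon forecast whose uncertainty is below the threshold.

    forecasts: list of (horizon_minutes, predicted_value, uncertainty) tuples,
    one per horizon-specific model.
    """
    if not forecasts:
        raise ValueError("at least one forecast is required")
    admissible = [f for f in forecasts if f[2] < max_uncertainty]
    if not admissible:
        # Assumed fallback: trust only the nearest-term forecast.
        return min(forecasts, key=lambda f: f[0])
    return max(admissible, key=lambda f: f[0])


# Hypothetical outputs of three horizon-specific models.
stable   = [(15, 110.0, 0.05), (30, 112.0, 0.08), (60, 118.0, 0.12)]
volatile = [(15, 72.0, 0.10), (30, 65.0, 0.35), (60, 60.0, 0.60)]

print(select_horizon(stable, max_uncertainty=0.2))    # (60, 118.0, 0.12)
print(select_horizon(volatile, max_uncertainty=0.2))  # (15, 72.0, 0.1)
```

During the stable period all three horizons qualify and the 60-minute forecast is used; when uncertainty rises, only the 15-minute forecast stays under the threshold.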
Open Access Review
Explainable Conversational Agents for Mobile Health Coaching Systems: Trust Factors, Progress and Opportunities
by
Luminous Akazua, Jianlong Zhou, Fang Chen, Niusha Shafiabady, George Tian, Andreas Holzinger and Heimo Müller
Mach. Learn. Knowl. Extr. 2026, 8(6), 144; https://doi.org/10.3390/make8060144 - 25 May 2026
Abstract
Background: Artificial Intelligence (AI) and Machine Learning (ML) technologies, such as conversational agents, are becoming increasingly essential tools across multiple industries, particularly in healthcare. This paper presents a scoping review (PRISMA-ScR) of conversational agents (CAs) in mobile health coaching systems (MHCS). It examines existing applications of MHCS, focusing on development strategies, usage contexts, impacts on users, benefits, and research gaps, emphasizing the ability of explainable artificial intelligence (XAI), if properly integrated, to make health guidance and decision-support recommendations transparent, trustworthy, and interpretable. This scoping review identifies opportunities to maximize the use of conversational agents, explainable AI, and mobile technologies to make mobile health coaching systems more accessible and trustworthy, as well as further research gaps worth exploring. Objective: This scoping review maps the evidence on CAs and XAI-enabled technologies in MHCS, identifies trust-related design criteria, categorizes reported outcomes, and highlights opportunities for explainable conversational agents (XCA) in a mobile health context, especially in tackling general medical conditions pertinent to underserved settings. Eligibility criteria: Eligible sources evaluated, designed, or conceptually analyzed existing CAs, XAI techniques, MHCS, AI-supported medical dialogue systems, e-coaching systems, and mobile health applications. We considered only sources relevant to healthcare, health coaching, trust, explainability, or patient engagement that were published between 2006 and 2025. Sources of Evidence: Searches were conducted in IEEE Xplore, Google Scholar, Springer, ScienceDirect/Elsevier, ProQuest, and ACM Digital Library, supplemented by targeted web searches and backward citation checks. 
Charting methods: Data were charted by system type, communication mode, health context, operational mode, technology used, XAI/trust features, degree of automation, study designs, and outcome classification. We applied a revised outcome classification: generated desired outcome (GDO), partially generated desired outcome (P-GDO), and did not generate desired outcome (DN-GDO). Results: A total of 201 resources were collected. Charted studies clustered around CAs in health, MHCS for chronic diseases and stress management, XAI methods such as LIME, SHAP, Prospector, and counterfactual explanations, and trust-related elements such as voice quality, communication style, appearance, social intelligence, privacy, and performance quality. Most health CAs and MHCS addressed chronic diseases, mental health, or behavior change; fewer addressed general medical diagnosis or autonomous mobile-based primary care support. Conclusions: Existing evidence suggests that CAs and MHCSs can support engagement, coaching, education, and selected decision-support tasks, but evidence for safe, autonomous, explainable general practice functionality remains limited. Future work should prioritize clinically supervised XCA designs, core safety assessment, interfaces with transparent explanation, data protection, culturally and linguistically responsive implementation, and future-oriented review in underserved mobile health settings.
(This article belongs to the Section Thematic Reviews)
Open Access Article
Integrating Hybrid Attention Mechanisms into CNN-Based Architectures to Enhance Image Classification and Interpretability
by
Alidor M. Mbayandjambe, Selain K. Kasereka, Darren Kevin T. Nguemdjom, Petro M. Tshakwanda, Milena Savova-Mratsenkova and Tasho Tashev
Mach. Learn. Knowl. Extr. 2026, 8(6), 143; https://doi.org/10.3390/make8060143 - 25 May 2026
Abstract
Integrating complementary attention mechanisms into standard Convolutional Neural Networks (CNNs) is a promising strategy for improving feature discrimination without substantial computational overhead. This paper presents a controlled empirical study of a hybrid attention module that combines Squeeze-and-Excitation Networks (SENet) and the Convolutional Block Attention Module (CBAM) through an adaptive element-wise summation with a learnable weighting parameter and a residual connection. This work contributes a systematic and statistically rigorous evaluation of attention fusion across four CNN backbones (ResNet18, VGG16, AlexNet, and SqueezeNet) on the CIFAR-10 benchmark at resolution. All models were trained from scratch under a deliberately conservative protocol (50 epochs, no pretrained weights, standard augmentation) to isolate the incremental effect of attention mechanisms under controlled conditions. Under this protocol, the hybrid SENet+CBAM configuration achieves statistically significant accuracy improvements over the corresponding baselines ( , 5-fold cross-validation): ResNet18 improves from 77.93% to 90.71% (+12.78%), VGG16 from 55.78% to 70.17% (+14.39%), AlexNet from 62.67% to 71.82% (+9.15%), and SqueezeNet from 71.91% to 78.29% (+6.38%). These gains must be interpreted within the scope of this controlled setting. Absolute accuracy values are below fully optimized literature benchmarks. For VGG16 in particular, part of the improvement likely reflects correction of underfitting under the conservative protocol, not the full potential of the hybrid mechanism. Parameter overhead remains modest at 1.5–5.8%, and training convergence improves by 16.5% on average. The hybrid approach outperforms the best previously reported SENet+CBAM result for each architecture by an average of 2.32%. Grad-CAM visualizations and attention entropy analysis provide qualitative evidence of more concentrated spatial attention patterns under the hybrid configuration. 
These should be understood as proxy indicators rather than rigorous interpretability measures. Validation on higher-resolution benchmarks such as CIFAR-100, STL-10, and ImageNet subsets is a necessary next step before broader applicability can be claimed.
(This article belongs to the Topic Opportunities and Challenges in Explainable Artificial Intelligence (XAI))
Open Access Article
Wavelet-Guided Mamba-Attention Network for Boundary-Aware Colorectal Polyp Segmentation
by
Xin Liu, Nor Ashidi Mat Isa, Chao Chen, Hanxu Liu, Chao Wang and Fajin Lv
Mach. Learn. Knowl. Extr. 2026, 8(6), 142; https://doi.org/10.3390/make8060142 - 23 May 2026
Abstract
Colorectal cancer is the third most commonly diagnosed cancer worldwide, and early detection of polyps via colonoscopy is essential for improving patient survival. However, automatic polyp segmentation faces three key challenges: balancing global context with local detail, delineating ambiguous boundaries under low contrast, and handling large variations in polyp size and morphology. To address these challenges, we propose WMA-Net, a Wavelet-Guided Mamba-Attention Network that uses wavelet-domain semantic–boundary separation as the organizing design principle. Rather than introducing a new individual operator, the contribution lies in how existing components—wavelet decomposition, Mamba state space modeling, multi-directional pixel difference convolution, and uncertainty-aware reverse attention—are combined and coordinated within one boundary-aware framework. The architecture integrates pixel difference convolution for multi-directional edge detection, frequency-selective cross-scale fusion with dual-stream wavelet-domain processing, Mamba-based multi-scale aggregation with linear complexity, and uncertainty-aware progressive boundary refinement. Extensive experiments on five public polyp benchmarks demonstrate state-of-the-art performance on four out of five datasets. On the seen datasets, WMA-Net achieves mean Dice scores of 94.4% on CVC-ClinicDB and 93.6% on Kvasir-SEG. On the unseen datasets, WMA-Net attains 91.7% on CVC-300, 82.3% on CVC-ColonDB, and 83.8% on ETIS-LaribPolypDB, demonstrating robust cross-dataset generalization. Comprehensive ablation studies validate the effectiveness and synergy of each proposed module.
(This article belongs to the Special Issue Artificial Intelligence for Signal, Image, and Multimodal Data Processing: Algorithms, Models, and Knowledge Extraction)
Open Access Review
Large Language Model Benchmarks: A Taxonomy of Capabilities, Scientific Quality Assessment, and Saturation Analysis
by
Rubén Gómez, Carlos E. Miranda, Julio-Alejandro Romero-González, Diana-Margarita Córdova-Esparza, Gendry Alfonso-Francia, Edgar-Arturo Chávez-Urbiola, Alfonso Ramirez-Pedraza and Juan Terven
Mach. Learn. Knowl. Extr. 2026, 8(6), 141; https://doi.org/10.3390/make8060141 - 22 May 2026
Abstract
The rapid evolution of Large Language Models (LLMs) has exposed limitations of static, accuracy-oriented benchmarks and increased the need for evaluation frameworks that distinguish among capabilities and benchmark quality. This survey analyzes 63 LLM benchmarks spanning 2012–2026 and organizes them into a taxonomy of six capability dimensions and 20 operational subcategories. We also propose the Benchmark Quality Assurance Index (BQAI), an AHP-weighted composite framework for assessing the scientific quality of benchmarks across seven dimensions related to annotation, clarity, standardization, reproducibility, robustness, coverage, and fairness. The BQAI is applied to 30 representative benchmarks, corresponding to 48% of the 63-benchmark corpus, with three-evaluator blinded scoring, formal inter-rater reliability validation and quadratic-weighted Cohen’s κ, and Monte Carlo sensitivity analysis. In addition, we synthesize public performance results for 16 models across 10 benchmarks to examine saturation trends and reporting gaps. The analysis indicates that benchmark usefulness varies substantially across evaluation settings, that several established benchmarks are becoming less discriminative for frontier models, and that important gaps remain in safety, agentic, and cross-cultural assessment. Together, the taxonomy, BQAI, and saturation analysis provide a structured perspective on the current LLM benchmark landscape and on priorities for more rigorous evaluation.
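Quadratic-weighted Cohen's kappa, mentioned above, penalizes disagreements between two raters by the squared distance between their ordinal scores. The sketch below shows the standard pairwise formula; it is not the survey's scoring code. The function name, the 0-based integer rating scale, and the handling of a zero denominator are assumptions, and how the authors aggregate across three evaluators is not stated in the abstract.

```python
def quadratic_weighted_kappa(rater_a, rater_b, num_levels):
    """Pairwise Cohen's kappa with quadratic weights for ratings in 0..num_levels-1."""
    n = len(rater_a)
    # Observed joint distribution of the two raters' scores.
    observed = [[0.0] * num_levels for _ in range(num_levels)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1.0 / n
    # Marginal distributions, used for the chance-expected agreement.
    row = [sum(observed[i]) for i in range(num_levels)]
    col = [sum(observed[i][j] for i in range(num_levels)) for j in range(num_levels)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(num_levels):
        for j in range(num_levels):
            weight = (i - j) ** 2 / (num_levels - 1) ** 2
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * row[i] * col[j]
    if weighted_expected == 0.0:
        # Assumed convention: both raters gave the same single score throughout.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical ratings on a 3-level scale.
perfect = quadratic_weighted_kappa([0, 1, 2], [0, 1, 2], 3)   # full agreement
reversed_ = quadratic_weighted_kappa([0, 1, 2], [2, 1, 0], 3)  # systematic disagreement
```

Values near 1 indicate strong agreement, values near 0 indicate agreement no better than chance, and negative values indicate systematic disagreement.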
(This article belongs to the Section Thematic Reviews)
Open Access Article
Unnoticeable Hybrid Watermarking for Deep Neural Network Authentication Using Auxiliary Hidden Layers
by
Rodrigo Eduardo Arevalo-Ancona and Manuel Cedillo-Hernandez
Mach. Learn. Knowl. Extr. 2026, 8(6), 140; https://doi.org/10.3390/make8060140 - 22 May 2026
Abstract
The authentication and protection of deep neural network models have become challenging due to their widespread distribution and reuse, making them vulnerable to unauthorized access. This paper addresses the need for ownership verification by proposing a hybrid neural network watermarking method for secure model authentication. The approach combines a steganographic watermark embedded into stable model weights with a user code for watermark recovery encoded in auxiliary hidden layers. Stable parameters are identified through a reduced training run that estimates gradient variations, so that the watermark can be inserted with minimal impact on model performance. Additionally, two auxiliary layers are introduced: the first stores the metadata indices of the selected weights where the watermark was embedded, and the second stores the user code, supporting secure identification and verification. Experimental evaluations demonstrate that the proposed method remains robust under different model optimization attacks, including pruning, fine-tuning, additive noise injection, and parameter overwriting, while preserving model performance. The proposed framework achieves a bit error rate (BER) of 0 under several moderate attack scenarios across different neural network models, whereas more aggressive optimizations degrade watermark recovery performance. These results indicate that the proposed framework provides an effective solution for neural network ownership protection while maintaining model performance.
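The BER figure reported above is the fraction of watermark bits that differ between the embedded and the recovered watermark. The following minimal sketch shows that generic metric only; the function name and the bit-list representation are assumptions for illustration, and the paper's embedding and extraction procedure is not reproduced here.

```python
def bit_error_rate(embedded_bits, recovered_bits):
    """Fraction of positions where the recovered watermark differs from the embedded one."""
    if len(embedded_bits) != len(recovered_bits):
        raise ValueError("watermarks must have the same length")
    if not embedded_bits:
        raise ValueError("watermark must not be empty")
    errors = sum(e != r for e, r in zip(embedded_bits, recovered_bits))
    return errors / len(embedded_bits)


# Hypothetical 4-bit watermark, recovered intact and after a damaging attack.
original = [1, 0, 1, 1]
ber_intact = bit_error_rate(original, [1, 0, 1, 1])   # no bits flipped
ber_damaged = bit_error_rate(original, [1, 1, 1, 0])  # two of four bits flipped
```

A BER of 0 means the watermark was recovered exactly; a BER around 0.5 is what random guessing would give for a balanced bit string.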
(This article belongs to the Section Safety, Security, Privacy, and Cyber Resilience)
Open Access Article
A Directional Semantic Enhancement Approach with Gated Fusion for Multimodal Arabic Sentiment Analysis
by
Ayoub Ben Cheikhi, El Habib Nfaoui and Oumayma Elbiach
Mach. Learn. Knowl. Extr. 2026, 8(5), 139; https://doi.org/10.3390/make8050139 - 21 May 2026
Abstract
Multimodal Arabic sentiment analysis has gained increasing attention due to the growing volume of user-generated multimedia content. However, effectively integrating textual, acoustic, and visual modalities remains challenging because of modality imbalance and weak cross-modal alignment. This study proposes a Directional Semantic Enhancement approach with Gated Fusion to address these limitations. The objective is to explicitly model similarity-guided semantic transfer between modalities while adaptively regulating information flow during fusion. The proposed architecture consists of four main stages: modality encoding, directional semantic enhancement, gated fusion, and classification. Directional semantic interactions enable structured cross-modal knowledge exchange, while adaptive gating mechanisms balance original and enhanced representations to mitigate modality-specific noise. Extensive experiments are conducted on the Ar-MuSA benchmark dataset, which contains 8700 multimodal samples. The proposed approach achieves 89.89% accuracy and an F1-score of 0.8989 with a latent dimension of 1024, outperforming early fusion, late fusion, and recent state-of-the-art methods. The study highlights the importance of controlled cross-modal alignment and provides a scalable approach for robust multimodal sentiment understanding in Arabic multimedia environments.
(This article belongs to the Section Learning)
Open Access Article
Evaluating the Impact of Adapter-Based Fine-Tuning on Structured Parsing Performance in Large Language Models
by
Ratomir Karlović, Luka Sever, Sandi Baressi Šegota, Vedran Mrzljak and Ivan Lorencin
Mach. Learn. Knowl. Extr. 2026, 8(5), 138; https://doi.org/10.3390/make8050138 - 21 May 2026
Abstract
Recent advances in large language models (LLMs) highlight two dominant strategies for performance improvement: prompt engineering and fine-tuning. While prompt design can significantly influence model output, it remains uncertain whether lightweight fine-tuning methods, such as adapter-based training, offer meaningful advantages for structured, domain-specific tasks. This study builds on prior research comparing three prompting strategies for natural-language command parsing into JSON schemas. Expanding that framework, the current work investigates how adapter-based fine-tuning, where most model parameters are frozen and only small adapter modules are trained, affects model accuracy and consistency. The experiment uses the same controlled shopping-cart parsing task and dataset of 12,000 synthetic commands to ensure direct comparability. Results quantify the trade-off between computational cost and performance gains, offering evidence-based insights into whether fine-tuning is a justified investment compared to advanced prompt engineering. The contribution of this study is a clear, empirical framework for deciding when fine-tuning meaningfully enhances LLM utility in applied natural-language understanding.
(This article belongs to the Section Learning)
Open Access Article
Generating Multiple-Choice Knowledge Questions with Interpretable Difficulty Estimation Using Knowledge Graphs and Large Language Models
by
Mehmet Can Şakiroğlu, Halil Altay Güvenir and Kamer Kaya
Mach. Learn. Knowl. Extr. 2026, 8(5), 137; https://doi.org/10.3390/make8050137 - 20 May 2026
Abstract
Generating multiple-choice questions (MCQs) with difficulty estimation remains challenging in automated MCQ-generation systems used in adaptive, AI-assisted education. This study proposes a novel methodology for generating MCQs with difficulty estimation from input documents by utilizing knowledge graphs (KGs) and large language models (LLMs). Our approach uses an LLM to construct a KG from input documents, from which MCQs are then systematically generated. Each MCQ is generated by selecting a node from the KG as the key, sampling a related triple or quintuple (optionally augmented with an extra triple), and prompting an LLM to generate a corresponding stem from these graph components. Distractors are then selected from the KG. For each MCQ, nine difficulty signals are computed and combined into a unified difficulty score using a data-driven approach. On a 150-MCQ proof-of-concept dataset from Wikipedia, the proposed signals show interpretable associations with empirical incorrect-answer rates, in line with human performance. The results support the feasibility of the proposed pipeline, yet a larger-scale human study may be required to establish deployment-scale validity. Our approach improves automated MCQ generation by integrating structured knowledge representations with LLMs and a data-driven difficulty estimation model.
(This article belongs to the Section Learning)
Open Access Article
Demonstrating Data-to-Knowledge Pipelines for Connecting Production Sites in the World Wide Lab
by
Leon Gorissen, Jan-Niklas Schneider, Mohamed Behery, Philipp Brauner, Moritz Lennartz, David Kötter, Thomas Kaster, Oliver Petrovic, Christian Hinke, Thomas Gries, Gerhard Lakemeyer, Martina Ziefle, Christian Brecher and Constantin Häfner
Mach. Learn. Knowl. Extr. 2026, 8(5), 136; https://doi.org/10.3390/make8050136 - 20 May 2026
Abstract
The digital transformation of production requires methods for integrating, storing, and operationalizing data across organizational boundaries, yet most existing approaches remain siloed and unidirectional, lacking a systematic loop from raw data to actionable knowledge and back. We introduce Data-to-Knowledge (D2K) and Knowledge-to-Data (K2D) pipelines as a universal production concept built on networks of Digital Shadows. The Data-to-Knowledge (D2K) pipeline is realized as a cross-organizational proof of concept that captures and semantically annotates robotic trajectory data from three independent research institutes and uses those data to train an inverse-dynamics foundation model for robot control. Centralized aggregation via an existing FAIR-compliant research data repository was chosen deliberately over federated alternatives to maximize semantic interoperability and reuse of shared infrastructure; federated and privacy-preserving extensions are identified as a promising future direction. Fine-tuning the cross-organizationally trained foundation model reduces training time by approximately 85% relative to end-to-end training from scratch, while achieving comparable accuracy on a standardized inverse-dynamics benchmark. These gains are attributable to the combination of cross-site data aggregation and transfer learning; isolating the contribution of semantic annotation alone remains a topic for future ablation work. The implementation demonstrates that semantically enriched, cross-organizational D2K pipelines can accelerate model development and reduce redundant data collection within a constrained but practically relevant class of robotics tasks. We further discuss limitations, governance challenges, and how these pipelines can contribute to a broader World Wide Lab for collaborative production research.
(This article belongs to the Section Learning)
Open Access Article
HybridHiT-UNet: Multi-Scale Temporal U-Net with Hierarchical Shot-Aware Transformers for Video Summarization
by
Saadman Sakib, Tanjim Mahmud, Karl Andersson and Kaushik Deb
Mach. Learn. Knowl. Extr. 2026, 8(5), 135; https://doi.org/10.3390/make8050135 - 20 May 2026
Abstract
Video summarization aims to produce a short yet informative summary of a long video while reducing redundancy. Most transformer-based methods operate at a single temporal scale or ignore shot-level structure, limiting temporal coherence and cross-dataset generalization. To fill these gaps, we present HybridHiT-UNet, a supervised framework that combines three complementary parts: a pretrained Vision Transformer encoder that provides spatially rich frame representations, a multi-scale 1D Temporal U-Net backbone that models those frame representations hierarchically over time, and a shot-aware hierarchical transformer scoring head that brings inter-shot context into importance prediction. Frame-level scores are summed into shot-level utilities and optimized with a knapsack selection under a fixed-length budget, and a weighted focal loss is used to address extreme class imbalance. Extensive experiments on four benchmarks (SumMe, TVSum, OVP, and YouTube) under canonical, augmented, and transfer protocols demonstrate that HybridHiT-UNet achieves F1-scores of 65.8% on SumMe and 79.92% on TVSum, higher than existing methods, while still achieving diversity scores of 64.98% and 48.68%, respectively. A systematic study further demonstrates that a 20% summary budget yields a consistently better coverage-diversity trade-off than the traditional 15% budget, which provides evidence-based guidance on the selection of summary length.
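The shot selection step described above (frame scores summed per shot, then a knapsack under a length budget) can be illustrated with a small 0/1 knapsack. This is a sketch under stated assumptions, not the paper's implementation: the function name `select_shots`, the integer `budget_percent` parameter, and the example scores are all hypothetical.

```python
def select_shots(shot_frame_scores, budget_percent=15):
    """Pick shots maximizing summed frame scores within a frame budget (0/1 knapsack)."""
    lengths = [len(scores) for scores in shot_frame_scores]
    utilities = [sum(scores) for scores in shot_frame_scores]
    # Budget in frames, as an integer share of the total video length.
    capacity = sum(lengths) * budget_percent // 100
    # best[c] holds (utility, chosen shot indices) using at most c frames.
    best = [(0.0, [])] * (capacity + 1)
    for idx, (length, utility) in enumerate(zip(lengths, utilities)):
        # Iterate capacities downward so each shot is used at most once.
        for c in range(capacity, length - 1, -1):
            candidate = best[c - length][0] + utility
            if candidate > best[c][0]:
                best[c] = (candidate, best[c - length][1] + [idx])
    return best[capacity][1]


# Hypothetical video: four shots with lengths 2, 3, 1, and 4 frames.
shots = [[0.9, 0.8], [0.1, 0.1, 0.1], [0.7], [0.2, 0.3, 0.2, 0.1]]
chosen = select_shots(shots, budget_percent=30)  # budget of 3 frames
```

With a 3-frame budget the sketch keeps shots 0 and 2, which together fill the budget with the highest summed score. Changing `budget_percent` from 15 to 20 is the kind of comparison the abstract's budget study refers to.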
(This article belongs to the Section Learning)
Open Access Systematic Review
Explainable Artificial Intelligence (XAI) for Cancer Classification in Medical Imaging: A Systematic Review
by
Khairil Imran Ghauth and Yanche Ari Kustiawan
Mach. Learn. Knowl. Extr. 2026, 8(5), 134; https://doi.org/10.3390/make8050134 - 20 May 2026
Abstract
Our study examines the growing role of Explainable Artificial Intelligence (XAI) in cancer medical imaging, where transparency and interpretability are essential for trustworthy clinical decision making. Using a PRISMA-guided systematic literature review, 926 records published between 2020 and 2026 were identified from major databases, with 46 studies meeting the inclusion criteria after screening and quality assessment. The review systematically analyzes XAI techniques, model architectures, evaluation approaches, interpretability mechanisms, challenges, and future research directions. The findings show that gradient-based methods, particularly Grad-CAM, dominate the field due to their ease of integration with convolutional neural networks. At the same time, complementary approaches such as LIME, SHAP, and Integrated Gradients provide additional attribution insights. Evaluation practices remain heterogeneous, with a strong reliance on qualitative visual inspection and limited standardized quantitative frameworks. XAI contributes to interpretability primarily through spatial localization, feature attribution, and clinical decision support; however, challenges persist, including instability in explanations, coarse localization, high computational cost, and limited compatibility with transformer-based models. Overall, while XAI enhances transparency in cancer imaging, its clinical reliability remains constrained by methodological and technical limitations. Future work should focus on standardized evaluation, clinician-centered validation, and the development of robust, multimodal, and architecture-aware explainability frameworks.
(This article belongs to the Special Issue Clinically Robust and Transparent AI-Assisted Medical Diagnostics: From Learning Dynamics to Real-World Deployment)
Open Access Systematic Review
Reactive to Predictive Mobility Management: A Systematic Review of ML-Driven Handover Optimization in 5G and Beyond
by
Teresia Ankome and Eisuke Hanada
Mach. Learn. Knowl. Extr. 2026, 8(5), 133; https://doi.org/10.3390/make8050133 - 18 May 2026
Abstract
Handover optimization is essential for seamless connectivity in 5G and beyond networks. Existing approaches face a fundamental trade-off: centralized solutions achieve coordination and accuracy but create privacy risks under the General Data Protection Regulation (GDPR), while distributed privacy-preserving approaches protect user data but lack the network-wide visibility necessary for optimal mobility decisions. This systematic review synthesizes 49 peer-reviewed studies published between 2010 and 2025, identified through a PRISMA-compliant search across IEEE Xplore, ScienceDirect, SpringerLink, MDPI, ACM Digital Library, and Google Scholar. Eligible studies addressed cellular handover or mobility management using traditional signal-based, Machine Learning, Federated Learning, or Software-Defined Networking strategies and reported quantitative performance metrics. A structured quality assessment evaluated methodological rigor, dataset validation, benchmarking practices, handover-specific metrics, and scalability. The synthesized evidence shows that existing approaches do not simultaneously satisfy the critical requirements for next-generation mobility management: accuracy, privacy, scalability, and real-time network-wide coordination. Machine learning achieves high accuracy (up to 97%) but depends on centralized data; reinforcement learning supports real-time adaptation but incurs high computational costs; federated learning preserves privacy but suffers from limited global coordination; and software-defined networking enables centralized control but requires continuous transmission of raw data. Evidence quality is further limited by reliance on simulation-based assessments and scarce real-world datasets. Overall, the review identifies a clear evolution from reactive threshold-based methods towards proactive prediction and highlights the need for unified, privacy-preserving, and globally coordinated handover frameworks.
The findings point toward integrating federated learning with Software-Defined Mobile Networking as a promising architectural direction for 6G mobility management.
(This article belongs to the Topic AI and Computational Methods for Modelling, Simulations and Optimizing of Advanced Systems: Innovations in Complexity, 2nd Edition)
Topics
Topic in
AI, Applied Sciences, Electronics, MAKE
Deep Supplement Learning for Healthcare and Biomedical Applications
Topic Editors: Tahir Cetin Akinci, Ömer Faruk Ertuǧrul
Deadline: 30 June 2026
Topic in
Atmosphere, Earth, Encyclopedia, Entropy, Fractal Fract, MAKE, Meteorology
Revisiting Butterfly Effect, Multiscale Dynamics, and Predictability Using AI-Enhanced Modeling Framework (AEMF) and Chaos Theory
Topic Editors: Bo-Wen Shen, Roger A. Pielke Sr., Xubin Zeng
Deadline: 31 July 2026
Topic in
Algorithms, Applied Sciences, Electronics, MAKE, AI, Software
Applications of NLP, AI, and ML in Software Engineering
Topic Editors: Affan Yasin, Javed Ali Khan, Lijie Wen
Deadline: 30 August 2026
Topic in
AI, MAKE, Robotics, Sensors, Electronics
Deep Visual Recognition: Methods, and Applications
Topic Editors: Min Young Kim, Francisco Gomez-Donoso
Deadline: 30 October 2026
Special Issues
Special Issue in
MAKE
Language Acquisition and Understanding
Guest Editors: Michal Ptaszynski, Rafal Rzepka, Masaharu Yoshioka
Deadline: 15 July 2026
Special Issue in
MAKE
Advancing Natural Language Processing for Low-Resource Languages and Dialects
Guest Editors: Tanjim Mahmud, Michal Ptaszynski, Karl Andersson
Deadline: 31 July 2026
Special Issue in
MAKE
Trustworthy AI: Integrating Knowledge, Retrieval, and Reasoning
Guest Editor: Konstantinos Diamantaras
Deadline: 31 August 2026
Special Issue in
MAKE
Explainable Artificial Intelligence: Theoretical Foundations and Methodological Advances
Guest Editors: Sheng Du, Javier Del Ser Lorente
Deadline: 31 August 2026
Topical Collections
Topical Collection in
MAKE
Extravaganza Feature Papers on Hot Topics in Machine Learning and Knowledge Extraction
Collection Editor: Andreas Holzinger
Topical Collection in
MAKE
Feature Papers in Safety, Security, Privacy, and Cyber Resilience
Collection Editor: Simon Tjoa
Topical Collection in
MAKE
Robust and Uncertainty-Aware Learning from Real-World Data
Collection Editors: Federico Cabitza, Andrea Campagner

