Journal Description
Machine Learning and Knowledge Extraction
Machine Learning and Knowledge Extraction is an international, peer-reviewed, open access, monthly journal on machine learning and applications. See our video on YouTube explaining the MAKE journal concept.
- Open Access: free for readers, with article processing charges (APC) paid by authors or their institutions.
- High Visibility: indexed within Scopus, ESCI (Web of Science), dblp, and other databases.
- Rapid Publication: manuscripts are peer-reviewed and a first decision is provided to authors approximately 27 days after submission; acceptance to publication takes 4.4 days (median values for papers published in this journal in the second half of 2025).
- Journal Rank: JCR - Q1 (Engineering, Electrical and Electronic) / CiteScore - Q1 (Engineering (miscellaneous))
- Recognition of Reviewers: reviewers who provide timely, thorough peer-review reports receive vouchers entitling them to a discount on the APC of their next publication in any MDPI journal, in appreciation of the work done.
- Journal Cluster of Artificial Intelligence: AI, AI in Medicine, Algorithms, BDCC, MAKE, MTI, Stats, Virtual Worlds and Computers.
Impact Factor: 6.0 (2024); 5-Year Impact Factor: 5.7 (2024)
Latest Articles
CTCF: A Three-Level Coarse-to-Fine Cascade for Unsupervised Deformable Medical Image Registration
Mach. Learn. Knowl. Extr. 2026, 8(5), 122; https://doi.org/10.3390/make8050122 (registering DOI) - 2 May 2026
Abstract
Deformable medical image registration aims to spatially align anatomical structures across volumetric scans. Recent transformer-based methods achieve high overlap accuracy but often produce deformation fields with topological violations. We propose CTCF, a Cascade Transformer for Coarse-to-Fine registration that wraps a lightweight coarse-and-refined envelope around a core registration module. Level 1 provides a coarse displacement estimate at quarter resolution, Level 2 performs the main registration via a Swin Transformer encoder with deformable cross-attention and a learned super-resolution decoder, and Level 3 applies error-driven flow refinement at half resolution. The two outer levels add only 3.0% parameter overhead yet improve registration accuracy while maintaining competitive deformation regularity relative to external baselines. The model is trained end-to-end with a composite unsupervised loss combining local normalized cross-correlation, diffusion regularization, inverse-consistency, and Jacobian-based topology preservation. On the OASIS brain MRI benchmark, CTCF achieves the highest Dice score of 0.8208 among the compared unsupervised methods while maintaining competitive SDlogJ, with all Dice improvements statistically significant by the Wilcoxon signed-rank test. On IXI, CTCF also achieves the best Dice, HD95, SDlogJ, and fold percentage among the compared methods. A five-round ablation study validates each component: cascade decomposition isolates each level’s contribution, and resolution scaling experiments confirm the framework’s scalability, yielding further accuracy gains with zero parameter overhead.
Full article
(This article belongs to the Special Issue Artificial Intelligence for Signal, Image, and Multimodal Data Processing: Algorithms, Models, and Knowledge Extraction)
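The composite loss named in the abstract includes a diffusion regularization term, which penalizes non-smooth displacement fields. As an illustrative sketch only (not the authors' implementation), the term can be written as the mean squared finite difference of a 2-D flow field:

```python
import numpy as np

def diffusion_regularizer(flow):
    """Mean squared spatial gradient of a 2-D displacement field.

    flow: array of shape (H, W, 2) holding per-pixel (dy, dx) displacements.
    Smooth (e.g. rigid) deformations incur little or no penalty.
    """
    dy = np.diff(flow, axis=0)  # finite differences along rows
    dx = np.diff(flow, axis=1)  # finite differences along columns
    return (dy ** 2).mean() + (dx ** 2).mean()

# A constant field (pure translation) is perfectly smooth: zero penalty.
rigid = np.ones((8, 8, 2))
assert diffusion_regularizer(rigid) == 0.0
```

In practice this term is weighted against the similarity loss (e.g. local normalized cross-correlation) so that alignment accuracy is traded off against deformation regularity.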
Open Access Review
A Survey on Self-Supervised Learning in Cybersecurity: Network Intrusion and Malware Detection
by
Josue Genaro Almaraz-Rivera, Jose Antonio Cantoral-Ceballos and Juan Felipe Botero
Mach. Learn. Knowl. Extr. 2026, 8(5), 121; https://doi.org/10.3390/make8050121 - 1 May 2026
Abstract
Self-Supervised Learning (S-SL) is a recent line of research that could represent the next step to understanding human intuition. By blending the strengths of unsupervised and supervised learning paradigms, S-SL endows Deep Learning models with stronger generalization capabilities. Although better known for its applications in Computer Vision and Natural Language Processing, S-SL has also proved its value in other fields, such as cybersecurity. In this work, we review the current progress and future trends in S-SL for the two most relevant problems discussed in the cybersecurity literature: network intrusion and malware detection. The scope of this survey spans from 2019 to 2025. From an initial analysis of over 200 documents, we distill the 50 most relevant papers. We also highlight opportunity areas, such as attack detection over encrypted network traffic, RAM-based analysis of obfuscated malware, creating S-SL models for tabular data and resource-constrained devices, as well as the research on backdooring, encoder extraction, the transferability of vulnerabilities, and data memorization in S-SL. To the best of our knowledge, this is the first comprehensive survey regarding the application of Self-Supervised Learning in cybersecurity, benchmarking contrastive learning vs auxiliary pretext tasks and presenting the data requirements for implementing S-SL solutions in this field. We hope this paper provides a firm ground for further exploration.
Full article
(This article belongs to the Section Thematic Reviews)
Open Access Article
Individual Indicators of the Learning Process for Identifying Critical Thinking in Students in Adaptive Learning
by
Vassiliy Serbin, Mateus Mendes, Aray Kassenkhan, Akbayan Bekarystankyzy, Gulnur Ibragim and Azamat Tolegenov
Mach. Learn. Knowl. Extr. 2026, 8(5), 120; https://doi.org/10.3390/make8050120 - 1 May 2026
Abstract
The rapid digitalization of higher education has intensified the need for reliable methods to assess higher-order cognitive skills, particularly critical thinking, in adaptive learning environments. However, most existing assessment approaches rely primarily on test outcomes and academic performance indicators, which do not adequately capture the multidimensional and process-based nature of critical thinking. This study proposes a multi-criteria hierarchical model for identifying and quantitatively assessing students’ critical thinking based on individual process indicators of learning activity in an intelligent educational environment. The model integrates cognitive, metacognitive, and behavioral indicators, including knowledge dynamics, task complexity, time characteristics, learning activity intensity, error rate, level of doubt, user interaction patterns, and system operating modes. These indicators are aggregated into a three-component structure representing metacognitive awareness, analytical depth, and strategic learning activity. The proposed model was empirically validated through a quasi-experimental longitudinal study involving 500 university students divided into control and experimental groups. The results demonstrate a statistically significant increase in all latent components of critical thinking and in the integral indicator within the experimental group. The model shows satisfactory internal consistency (Cronbach’s α) and acceptable construct validity confirmed by confirmatory factor analysis. The findings indicate that the proposed model can serve as a practical analytical tool for monitoring critical thinking development and supporting personalized learning trajectories in adaptive digital educational systems.
Full article
(This article belongs to the Section Learning)
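The abstract reports internal consistency via Cronbach's alpha. For readers unfamiliar with the statistic, a minimal NumPy sketch (toy data, not the study's) computes it from a respondents-by-items score matrix:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, k_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # per-item variances
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the summed scale
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Perfectly parallel items yield an alpha of exactly 1.0.
scores = np.array([[1, 1], [2, 2], [3, 3], [4, 4]])
assert abs(cronbach_alpha(scores) - 1.0) < 1e-12
```

Values around 0.7 or higher are conventionally read as satisfactory internal consistency.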
Open Access Article
Morphology-Aware Multi-Scale Deep Representation Learning for Interpretable Knowledge Extraction in Brain Tumor MRI
by
Helala AlShehri and Mariam Busaleh
Mach. Learn. Knowl. Extr. 2026, 8(5), 119; https://doi.org/10.3390/make8050119 - 1 May 2026
Abstract
Robust brain tumor classification from magnetic resonance imaging (MRI) remains challenging due to complex structural heterogeneity and subtle inter-class variability. Beyond predictive accuracy, conventional convolutional neural networks predominantly rely on texture-dominant features and fixed receptive fields, which may limit the extraction of clinically meaningful structural information. This study proposes a morphology-aware multi-scale deep representation learning framework that embeds morphological inductive bias directly within hierarchical feature extraction. The proposed architecture synergistically integrates trainable morphological operations with multi-scale convolutional feature learning inside a unified residual framework, supported by an in-block morphological refinement mechanism and a morphology-aware downsampling module. Unlike prior approaches that treat morphological operators as preprocessing or auxiliary branches, the proposed design incorporates differentiable dilation and erosion into the core feature hierarchy to guide structure-aware representation formation. The model was evaluated using five-fold cross-validation and an independent test set, achieving an overall test accuracy of 99.31% with consistently high macro-averaged precision, recall, F1-score, and AUC values. Grad-CAM analysis further demonstrates that the learned representations emphasize clinically relevant tumor regions, supporting interpretable structural knowledge extraction. Ablation studies confirm that performance improvements arise from the synergistic integration of multi-scale learning and morphology-aware refinement. Overall, embedding structural inductive bias within multi-scale deep representation learning enhances robustness, stability, and interpretable knowledge extraction for brain tumor MRI analysis.
Full article
(This article belongs to the Section Learning)
Open Access Article
Supervised Machine Learning for Technical Debt in Python: Analysis and Prediction
by
Elif Fırıncı and Mohanad Alayedi
Mach. Learn. Knowl. Extr. 2026, 8(5), 118; https://doi.org/10.3390/make8050118 - 1 May 2026
Abstract
Technical debt (TD) has become a critical problem, posing challenges to software maintainability and quality in the context of fast-paced modern software development. This research examines TD in Python software development through a hybrid approach combining practitioner perception with predictive modeling. Within the scope of the research, the perceptions of 86 IT practitioners and developers were studied with regard to how they react to, adapt to, and prioritize different types of TD. According to the qualitative results, cyclic architectural debt stems from low test coverage, documentation deficiency, and complicated code structure. Based on this information, the research team developed a dataset of 130 real-world Python code samples characterized by code complexity, comments-to-code ratio, code smells, and software maintainability indexes. Decision tree (DT), logistic regression (LR), naive Bayes (NB), support vector machine (SVM), k-nearest neighbors (KNN), and random forest (RF) predictive models were then applied to detect TD. The results show that TD can be predicted using machine learning methods, with the best performance provided by the random forest and optimized logistic regression models.
Full article
(This article belongs to the Section Data)
Open Access Article
COMPAS: Compose Actions and Slots in Object-Centric World Models
by
Vitaliy Vorobyov, Leonid Ugadiarov, Vladimir Frolov, Alexey Kovalev and Aleksandr Panov
Mach. Learn. Knowl. Extr. 2026, 8(5), 117; https://doi.org/10.3390/make8050117 - 29 Apr 2026
Abstract
In this paper, we propose a novel approach, COMPAS (COMPose Actions and Slots), which leverages the strengths of state-of-the-art object-centric approaches for modeling the dynamics of an environment. Our method encodes the environment’s state into symbol-like, object-centric representations, known as slots, where each slot corresponds to an individual object. This approach offers a structured and interpretable way to model complex environments by combining slots with action representations for accurate next-state prediction. The primary contribution of our work is an efficient world model with a dynamics predictor capable of predicting accurate trajectories in action-dependent environments. Additionally, our slot extractor module enhances the predictive capabilities by extracting deterministic slots that remain consistent both within a single trajectory and across episodes. Unlike slots sampled from a trainable distribution, deterministic slots are generated from a single trainable parameter together with slot positional embeddings. This design improves the consistency across episodes, which in turn leads to more accurate dynamics prediction. We present a comprehensive evaluation of our approach in various environments, demonstrating that our proposed method outperforms competing models in environments with discrete and continuous action spaces.
Full article
(This article belongs to the Section Learning)
Open Access Article
A Tiny Vision-Based Model for Real-Time Student Attention Detection in Online Classes
by
Chaymae Yahyati, Ismail Lamaakal, Yassine Maleh, Khalid El Makkaoui and Ibrahim Ouahbi
Mach. Learn. Knowl. Extr. 2026, 8(5), 116; https://doi.org/10.3390/make8050116 - 28 Apr 2026
Abstract
Online and blended classrooms widen access but remove the in-person cues instructors use to gauge attention. Prior work typically relies on heavy, cloud-bound or multimodal models that are hard to deploy on commodity laptops, treats attention as an unordered label without calibrated probabilities, and evaluates on subject-overlapping splits with limited robustness analysis. This creates a gap in tiny, deployable, calibration-aware methods validated under realistic protocols. We address this gap with a TinyML, vision-only pipeline that estimates four attention levels (Very Low, Low, High, Very High) from short webcam clips under strict on-device budgets. Each clip is processed by a compact hybrid encoder: a CNN extracts per-frame spatial features, a BiLSTM models temporal context, and a lightweight GRU refines dynamics; three parallel branches with staggered widths encourage feature diversity before fusion. We apply structured pruning of convolutional channels and recurrent units, post-training INT8 quantization, and temperature scaling for calibrated probabilities; models are exported as ONNX. On DAiSEE with subject-independent splits, the baseline attains high accuracy and macro-F1, with strong ordinal agreement (QWK = 0.998, ordinal MAE = 0.03). The compressed model preserves reliability (macro-F1 = 0.995, QWK = 0.995), remains robust to low light, partial occlusion, and head yaw, and yields ∼4× smaller size and ∼2.3× CPU speedups. These results indicate a deployable, privacy-preserving approach to fine-grained, on-device attention analytics.
Full article
(This article belongs to the Special Issue Next-Generation TinyML: Innovations in Models, Security, and Applications for Constrained Intelligent Systems)
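Temperature scaling, used in the abstract above for calibrated probabilities, divides the logits by a single scalar T fitted on held-out data. A minimal sketch with toy logits, using a plain grid search in place of the usual gradient-based fit (illustrative only, not the authors' code):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(logits, labels, T):
    """Negative log-likelihood of labels under temperature-scaled logits."""
    p = softmax(logits / T)
    return -np.log(p[np.arange(len(labels)), labels]).mean()

def fit_temperature(logits, labels, grid=np.linspace(0.5, 5.0, 91)):
    """Grid-search the single temperature that minimizes validation NLL."""
    return min(grid, key=lambda T: nll(logits, labels, T))

# Overconfident logits with one mislabeled sample: the fitted T exceeds 1,
# softening the predicted probabilities without changing the argmax.
logits = np.array([[4.0, 0.0], [4.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
labels = np.array([0, 0, 1, 1])
assert fit_temperature(logits, labels) > 1.0
```

Because T rescales all logits uniformly, accuracy is unchanged; only the confidence of each prediction is adjusted.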
Open Access Article
Mapping Dialectal Landscape: A Sequence-to-Sequence Approach to Japanese Dialect-to-Standard Normalization
by
Kinga Lasek, Michal Ptaszynski, Fumito Masui, Mujahid Khalifah and Juuso Eronen
Mach. Learn. Knowl. Extr. 2026, 8(5), 115; https://doi.org/10.3390/make8050115 - 26 Apr 2026
Abstract
Despite the ongoing standardization of the Japanese language, regional dialects persist, particularly among older generations, causing communication gaps that are especially problematic in healthcare and emergency contexts. This study proposes a text-to-text normalization method to convert eight Japanese dialects into standard Japanese using a fine-tuned mT5-small architecture. We evaluate the impact of learning rate schedulers, training duration, and data preprocessing on model performance. Our results demonstrate that the CharacTER (Character Translation Edit Rate) metric provides a more accurate evaluation than BLEU, which is ill-suited to the unsegmented nature of Japanese text. The optimal configuration minimizes character error rates by aligning input data with natural, unspaced Japanese orthography. Furthermore, we observe a statistically significant correlation between the model’s conversion error rate and the physical distance of the source dialect from Tokyo. This finding suggests that the model’s performance effectively serves as a proxy for measuring linguistic distance between dialectal variations and the standard language.
Full article
(This article belongs to the Section Learning)
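CharacTER extends character-level edit rate with shift operations; as a simplified stand-in (not the CharacTER metric itself), a plain character error rate can be computed as Levenshtein distance normalized by reference length, which already sidesteps BLEU's dependence on word segmentation:

```python
def char_error_rate(hyp, ref):
    """Levenshtein edit distance between hypothesis and reference strings,
    normalized by reference length: a simple character-level error rate."""
    m, n = len(hyp), len(ref)
    prev = list(range(n + 1))  # DP row for the empty-hypothesis prefix
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # substitution / match
        prev = cur
    return prev[n] / max(n, 1)

# One substitution against a five-character reference -> rate 0.2.
assert char_error_rate("kitte", "kitty") == 0.2
```

Operating on raw characters means the metric works directly on unspaced Japanese text, which is the property the abstract highlights.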
Open Access Article
Explainable Combined Spatial Representations for ECG Arrhythmia Classification
by
Iulia Onică and Iulian B. Ciocoiu
Mach. Learn. Knowl. Extr. 2026, 8(5), 114; https://doi.org/10.3390/make8050114 - 25 Apr 2026
Abstract
The paper addresses ECG arrhythmia classification using a novel input fusion strategy that combines spatial representations of ECG time series recordings. Four distinct time series-to-image transformations are considered, namely classical spectrograms, Gramian Angular Field (GAF), Recurrence Plot (RP), and the S-Transform (ST). Classification of combined 2 × 2 images generated from single-lead ECG recordings is performed using both custom and ResNet-50 deep learning architectures. Finally, several distinct explainability algorithms are used to identify the relevant regions in the input images that mainly influence the classification decisions. Experiments performed on the MIT-BIH and Chapman–Shaoxing arrhythmia datasets revealed performance comparable to more sophisticated approaches in terms of accuracy (99%), F1-score (98.6%), and AUC (0.999) values.
Full article
(This article belongs to the Section Data)
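The Gramian Angular Field mentioned above encodes a rescaled series in polar coordinates and takes pairwise cosines of the angles. A minimal sketch of the summation variant (GASF), illustrative only and not the paper's implementation:

```python
import numpy as np

def gramian_angular_field(x):
    """Gramian Angular Summation Field of a 1-D series."""
    x = np.asarray(x, dtype=float)
    # min-max rescale into [-1, 1] so arccos is defined
    x = 2 * (x - x.min()) / (x.max() - x.min()) - 1
    phi = np.arccos(np.clip(x, -1, 1))        # polar (angular) encoding
    return np.cos(phi[:, None] + phi[None, :])  # pairwise angle sums

g = gramian_angular_field([0.0, 0.5, 1.0])
assert g.shape == (3, 3)
assert np.allclose(g, g.T)  # GASF matrices are symmetric
```

The resulting square image preserves temporal ordering along its diagonal, which is what lets a 2-D CNN read temporal structure from a 1-D signal.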
Open Access Article
DPA-HiVQA: Enhancing Structured Radiology Reporting with Dual-Path Cross-Attention
by
Ngoc Tuyen Do, Minh Nguyen Quang and Hai Van Pham
Mach. Learn. Knowl. Extr. 2026, 8(5), 113; https://doi.org/10.3390/make8050113 - 24 Apr 2026
Abstract
Structured radiology reporting can improve clinical decision support by standardizing clinical findings into hierarchical formats. However, answering the thousands of questions about clinical findings in structured report templates is prohibitively time-consuming, which can limit clinical adoption. Furthermore, early medical VQA datasets primarily focused on free-text, independent question–answer pairs. A recent dataset, Rad-ReStruct, introduced hierarchical VQA, but the accompanying model still relies heavily on flattened embedding representations and single-path text–image fusion mechanisms that inadequately handle complex hierarchical dependencies in responses. In this paper, we propose DPA-HiVQA (Dual-Path Cross-Attention for Hierarchical VQA), addressing these limitations through two key contributions: (1) a multi-scale image embedding combining global semantic embeddings with patch-level spatial features from the domain-specific BioViL encoder; (2) a dual-path cross-attention mechanism enabling simultaneous holistic semantic understanding and fine-grained spatial reasoning. Evaluated on the Rad-ReStruct benchmark, the model substantially outperforms the established benchmark baseline, improving overall F1-score and Level 3 F1-score by 21.2% and 31.9%, respectively. The proposed model demonstrates that dual-path cross-attention architectures can effectively connect holistic semantic understanding and fine-grained spatial detail, paving the way for practical AI-assisted structured reporting systems that reduce radiologist burden while maintaining diagnostic accuracy.
Full article
Open Access Article
Scaling Laws in the Tiny Regime: How Small Models Change Their Mistakes
by
Mohammed Alnemari, Rizwan Qureshi and Nader Bagherzadeh
Mach. Learn. Knowl. Extr. 2026, 8(5), 112; https://doi.org/10.3390/make8050112 - 24 Apr 2026
Abstract
Neural scaling laws describe how model performance improves as a power law with size, but existing work has focused almost entirely on models above 100 M parameters. The regime below 20 million parameters, where TinyML and edge AI systems operate, remains largely unexamined. We train 90 models spanning 22 K to 19.8 M parameters across two architecture families (a plain ConvNet and MobileNetV2) on CIFAR-100, varying width while holding depth and training protocol fixed. Both architectures follow approximate power laws, with architecture-specific exponents for ScaleCNN and MobileNetV2. However, the power law does not hold uniformly: local exponents decay with scale, and MobileNetV2 saturates at 19.8 M parameters, hitting a data wall. The structure of errors also changes with scale. The Jaccard overlap between the error sets of the smallest and largest ScaleCNN models is only 0.35; compression changes which inputs are misclassified, not merely how many. Small models develop a triage strategy, concentrating capacity on easy classes (Gini of per-class accuracy: 0.26 at 22 K params vs. 0.09 at 4.7 M) while effectively abandoning the hardest ones (bottom-5 class accuracy: 10% vs. 53%). The smallest models achieve the lowest ECE values (0.013 vs. a peak of 0.110 at mid-size), reversing the typical overconfidence–capacity relationship, though this partly reflects a global-mean matching artifact rather than well-calibrated per-bin confidence. On CIFAR-100, aggregate accuracy alone is therefore a misleading basis for edge deployment decisions; validation must happen at the target model size. All findings in this study are based on CIFAR-100 (32 × 32, 100 classes); their generalizability to other datasets, resolutions, and architectures remains to be verified.
Full article
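Scaling-law exponents like those discussed above are conventionally estimated by a least-squares fit in log-log space, since a power law err ≈ c · N^(−α) becomes a straight line after taking logarithms. A minimal sketch on synthetic data (not the paper's measurements):

```python
import numpy as np

def fit_power_law(params, errors):
    """Fit errors ~ c * params**(-alpha) by least squares in log-log space;
    returns the estimated exponent alpha."""
    slope, _intercept = np.polyfit(np.log(params), np.log(errors), 1)
    return -slope  # the power-law exponent is the negated log-log slope

# Synthetic model sizes spanning the paper's 22 K - 19.8 M range,
# obeying an exact power law with exponent 0.2.
n = np.array([2.2e4, 1e5, 1e6, 1.98e7])
err = 3.0 * n ** -0.2
assert abs(fit_power_law(n, err) - 0.2) < 1e-8
```

The paper's observation that local exponents decay with scale corresponds to this fit being a good approximation only piecewise, not globally.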
Open Access Article
A Hybrid Multi-Domain Feature Fusion Model Integrating MEEMD and Dual CNN for Iris Recognition
by
Zine Eddine Louriga, Ismail Jabri, Aziza El Ouaazizi and Anass El Affar
Mach. Learn. Knowl. Extr. 2026, 8(4), 111; https://doi.org/10.3390/make8040111 - 21 Apr 2026
Abstract
Iris biometric systems are recognized as secure alternatives to conventional authentication methods, yet challenges such as variable illumination, noise, and intricate iris textures persist. To address these issues, our study presents a novel hybrid iris recognition framework that integrates advanced deep learning with a pioneering application of Multivariate Ensemble Empirical Mode Decomposition (MEEMD) for feature extraction—a method not previously applied in this context. Our framework first employs MEEMD to extract statistical features that capture the iris’s nonlinear and nonstationary variations. We then combine global semantic information from two pretrained convolutional neural networks—VGG16 and ResNet-152—with local micro-texture details encoded by Local Binary Patterns (LBP) to form a comprehensive feature representation. An efficient pre-processing and segmentation stage precisely isolates the iris region, and the resulting features are refined through dimensionality reduction techniques to yield a robust, compact representation. These features are subsequently classified using multiple models, each rigorously tuned via hyperparameter optimization. Experimental validation on benchmark datasets—including IITD, CASIA, and UBIRIS.v2—shows that our model achieves recognition rates of up to 98% on IITD, 97% on CASIA, and 97.30% on UBIRIS.v2, surpassing existing approaches. This work not only enhances iris recognition performance but also establishes a novel method that bridges advanced deep learning with innovative feature extraction for high-security applications.
Full article
(This article belongs to the Section Learning)
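The Local Binary Patterns used above for micro-texture details threshold each pixel's 8-neighbourhood against the centre pixel and pack the comparisons into a byte. A minimal sketch (illustrative only, not the authors' pipeline; production code would typically use scikit-image's `local_binary_pattern`):

```python
import numpy as np

def lbp_image(img):
    """Basic 8-neighbour Local Binary Pattern codes for the interior
    pixels of a 2-D grayscale image."""
    img = np.asarray(img, dtype=float)
    center = img[1:-1, 1:-1]
    # neighbour offsets, clockwise from the top-left corner
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(center, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[1 + dy:img.shape[0] - 1 + dy,
                        1 + dx:img.shape[1] - 1 + dx]
        # set this bit wherever the neighbour is >= the centre pixel
        code |= (neighbour >= center).astype(np.uint8) << bit
    return code

# On a constant image every comparison succeeds: all 8 bits set (255).
flat = np.full((4, 4), 7.0)
assert (lbp_image(flat) == 255).all()
```

Histograms of these codes over image regions give the compact micro-texture descriptor that is fused with the CNN features in the framework described above.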
Open Access Article
MENARA: Medical Natural Arabic Response Assistant
by
Ahmed Ibrahim, Abdullah Hosseini, Hoda Helmy, Maryam Arabi, Aya AlShareef, Wafa Lakhdhar and Ahmed Serag
Mach. Learn. Knowl. Extr. 2026, 8(4), 110; https://doi.org/10.3390/make8040110 - 21 Apr 2026
Abstract
Dialectal variation presents a major challenge for deploying medical language models in real-world healthcare settings, where patient–clinician communication often occurs in regional vernaculars rather than standardized language forms. This challenge is particularly pronounced in the Arabic-speaking world, where clinical interactions frequently take place in diverse dialects that differ substantially from Modern Standard Arabic. Fine-tuning and maintaining separate models for each dialect is computationally inefficient and difficult to scale, motivating more integrated approaches. In this work, we present MENARA, an Arabic medical language model constructed by merging Egyptian Arabic, Moroccan Darija, and medical-domain specialists through model merging. We extend prior feasibility findings through comprehensive evaluation of cross-dialect performance, medical safety, and cross-lingual knowledge retention. Specifically, we introduce a fine-grained dialect composition analysis to quantify lexical purity and structured code-switching behavior, benchmark against state-of-the-art Arabic LLMs, and conduct subject-matter-expert assessment of both dialectal fidelity and medical appropriateness. The results show that model merging preserves core medical competence while enabling robust dialectal adaptation, achieving strong cross-dialect fidelity while substantially reducing storage and deployment overhead compared to maintaining separate models. These findings establish model merging as a potentially practical and resource-efficient paradigm for dialect-aware medical NLP in linguistically fragmented healthcare environments.
Full article
(This article belongs to the Special Issue Advancing Natural Language Processing for Low-Resource Languages and Dialects)
Open Access Article
Desirability Rating-Based Counterfactual (DeRaC) Framework for Complex Multi-Output Classification Problems
by
Neelabh Kshetry and Mehmed Kantardzic
Mach. Learn. Knowl. Extr. 2026, 8(4), 109; https://doi.org/10.3390/make8040109 - 19 Apr 2026
Abstract
Counterfactual explanations are increasingly vital for understanding and trusting machine learning models. This paper presents Desirability Rating-based Counterfactual (DeRaC), which is a generalized framework for generating valid counterfactual explanations applicable to classification problems with complex output spaces, including single and multi-output classification with binary and multi-class outputs. By expanding the definition of counterfactual validity through a novel “desirability rating,” the approach addresses limitations in existing methods regarding complex output spaces. The framework introduces concepts such as partially valid counterfactuals and a quantitative measure of output desirability. It can be integrated with various objective functions to identify counterfactuals that satisfy properties such as similarity, proximity, and validity. Experiments demonstrate the feasibility of systematically generating counterfactuals using existing optimization techniques, achieving varying degrees of validity and similarity; specifically, the Genetic Algorithm produces consistently higher counterfactual desirability, albeit at the expense of longer computation times. We observed an average counterfactual desirability rating of 0.871 across all tested optimization methods, with Powell’s method combined with DeRaC achieving the lowest average distance of 0.897 when using a mixed-objective function. The research emphasizes the context-dependent nature of counterfactuals and lays the foundation for more transparent and trustworthy machine learning systems.
Full article
(This article belongs to the Section Learning)
Open Access Article
MIDA—Method for Industrial Data Analysis Based on CRISP-DM
by
Mateus Mendes and Torres Farinha
Mach. Learn. Knowl. Extr. 2026, 8(4), 108; https://doi.org/10.3390/make8040108 - 18 Apr 2026
Abstract
As modern computers became increasingly popular and larger amounts of digital data became available, different methodologies were proposed to extract information from data. The CRISP-DM methodology quickly spread and is currently one of the most popular approaches to data analysis. However, it has some shortcomings, such as being too general or too business-centered. Different authors have proposed variations better suited to specific fields in order to overcome those limitations. The present paper reviews CRISP-DM, some of its variations, and similar methodologies, and proposes the Method for Industrial Data Analysis (MIDA), a methodology conceived and improved over time based on previous experience in industrial engineering processes. MIDA consists of eight steps and partially overlaps with CRISP-DM. It has been successfully applied in several previous projects.
Full article
(This article belongs to the Section Data)
Open Access Article
Enhancing Wearable-Based Elderly Activity Recognition Through a Hybrid Deep Residual Network
by
Sakorn Mekruksavanich and Anuchit Jitpattanakul
Mach. Learn. Knowl. Extr. 2026, 8(4), 107; https://doi.org/10.3390/make8040107 - 18 Apr 2026
Abstract
The rapid growth of the elderly population worldwide demands reliable activity recognition technologies to support independent living and continuous health supervision. However, conventional wearable sensor-based human activity recognition (HAR) techniques often fail to capture the complex temporal behaviour and subtle motion patterns characteristic of the elderly. To address these limitations, this study introduces a hybrid deep residual architecture—CNN-CBAM-BiGRU—that integrates convolutional neural networks (CNNs), the convolutional block attention module (CBAM), and bidirectional gated recurrent units (BiGRUs) to improve activity recognition using inertial measurement unit (IMU) data. In the proposed CNN-CBAM-BiGRU framework, CNN layers automatically derive representative features from raw sensor signals, CBAM applies adaptive channel and spatial attention to highlight informative patterns, and BiGRU captures long-range temporal relationships within activity sequences. The approach was evaluated on three benchmark datasets designed for elderly populations—HAR70+, HARTH, and SisFall—covering daily activities and fall events. The proposed model consistently outperforms existing methods across all datasets, achieving accuracies exceeding 96%, F1-scores above 93%, and a fall detection recall of 93.74%, confirming its robustness and suitability for safety-critical monitoring applications. Class-level evaluation indicates excellent recognition of static postures and consistent performance for dynamic actions. Convergence analysis further confirms efficient learning with limited overfitting across datasets. The proposed framework thus provides a robust and accurate solution for wearable-based elderly activity recognition, with strong potential for deployment in fall detection, health monitoring, and ambient assisted living systems.
Full article
(This article belongs to the Special Issue Sustainable Applications for Machine Learning—2nd Edition)
Open Access Article
vinum-Analytics
by
Nuno Ferreira, Filipe Pinto, António Valente, Diana Augusto, Manuela Reis and Salviano Soares
Mach. Learn. Knowl. Extr. 2026, 8(4), 106; https://doi.org/10.3390/make8040106 - 18 Apr 2026
Abstract
Old-vine vineyards often contain dozens of grapevine varieties intermingled and irregularly distributed, making plant-level varietal identification slow and expensive when based on ampelography or molecular approaches. This paper proposes a field-oriented computer-vision pipeline for Vitis vinifera variety identification using images with natural backgrounds from the historic “Vinha Maria Teresa” parcel (Quinta do Crasto, Portugal). A single-class YOLO11 detector is trained to localize the vine leaf and generate standardized crops, and a YOLO11 classifier is then fine-tuned on leaf regions of interest (ROIs) for eight selected varieties in the Douro UNESCO region. We annotated 2015 vineyard images for classification and supplemented detection training with 2648 additional leaf images; detectors (YOLO11n/s/m) were benchmarked under four augmentation regimes and evaluated on a fixed 48-image subset, including runtime on CPU and GPU. The best detector reached an mAP@50–95 of 0.918 on the benchmark, while YOLO11n achieved ∼27 FPS on CPU for fast cropping. On a 303-image test set, the best classifier (YOLO11s with mixed augmentations) achieved 94.06% Top-1 accuracy, 93.92% macro-F1, and 100% Top-5 accuracy, with the remaining errors concentrated among morphologically similar varieties. To assess deployment-oriented performance, classifiers trained under three input settings (manual crops, detector-generated crops, and full images) were evaluated on a held-out 48-image benchmark subset; removing the detection step reduced Top-1 accuracy from 75.00% to 68.75%, while the gap between manual and automatic crops was only 2.44 pp on successfully detected images, with detection failures (14.6%) representing the primary operational bottleneck. Repeated retraining of the best manual-crop YOLO11s configuration across multiple random seeds showed stable performance with low variability in Top-1 accuracy and macro-F1. Under identical training conditions, ResNet50 and EfficientNet-B0 provided competitive baselines, but YOLO11s remained the strongest overall model on the held-out field benchmark. These results indicate that lightweight leaf detection plus crop-based classification can support scalable varietal identification in old vineyards under realistic acquisition conditions.
Full article
(This article belongs to the Section Learning)
Open Access Article
Diffusion-Based Feature Denoising and Using NNMF for Robust Brain Tumor Classification
by
Hiba Adil Al-kharsan and Róbert Rajkó
Mach. Learn. Knowl. Extr. 2026, 8(4), 105; https://doi.org/10.3390/make8040105 - 18 Apr 2026
Abstract
Brain tumor classification from magnetic resonance imaging (MRI) plays a critical role in computer-assisted diagnosis systems. In recent years, deep learning models have achieved high classification accuracy. However, their sensitivity to adversarial perturbations has become an important reliability concern in medical applications. This study proposes a robust brain tumor classification framework that combines non-negative matrix factorization (NNMF or NMF), lightweight convolutional neural networks (CNNs), and diffusion-based feature purification. Initially, MRI images are preprocessed and converted into a non-negative data matrix, from which compact and interpretable NNMF feature representations are extracted. Statistical metrics, including AUC, Cohen’s d, and p-values, are used to rank and select the most discriminative components. A lightweight CNN classifier is then trained directly on the selected feature groups. To improve adversarial robustness, a diffusion-based feature-space purification module is introduced: a forward noising step followed by a learned denoiser network is applied before classification. System performance is evaluated using both clean accuracy and robust accuracy under strong adversarial attacks generated by AutoAttack. The experimental results show that the proposed framework achieves competitive classification performance while significantly enhancing robustness against adversarial perturbations. The findings suggest that combining interpretable NNMF-based representations with a lightweight deep model and a diffusion-based defense provides an effective and reliable solution for medical image classification under adversarial conditions.
Full article
(This article belongs to the Section Learning)
Open Access Article
Evaluating the Efficacy of Large Language Models in Stock Market Decision-Making: A Decision-Focused, Price-Only, Multi-Country Analysis Using Historical Price Data
by
Maria C. Mariani, Sourav Malakar, Amrita Bagchi, Subhrajyoti Basu, Saptarsi Goswami, Osei Kofi Tweneboah, Sarbadeep Biswas, Ankit Dey and Ankit Sinha
Mach. Learn. Knowl. Extr. 2026, 8(4), 104; https://doi.org/10.3390/make8040104 - 17 Apr 2026
Abstract
This study provides a comparative evaluation of three state-of-the-art large language models (LLMs), namely OpenAI’s (San Francisco, CA, USA) GPT-4.0, Google’s (Google LLC, Mountain View, CA, USA) Gemini 2.0 Flash, and Meta’s (Meta Platforms, Menlo Park, CA, USA) LLaMA-4-Scout-17B-16E, in a decision-oriented framework in which the models generate structured outputs based only on historical closing-price data. The evaluation covers 150 stocks sampled from three countries (India, the United States, and South Africa) across ten economic sectors, including Information Technology, Banking, and Pharmaceuticals. Unlike many prior studies that combine numerical and textual inputs, this study relies solely on three years of numerical time series data and examines model responses in terms of decision labels such as buy, sell, or hold. The LLMs were provided with historical closing-price sequences and prompted with three types of finance-related questions: (a) whether to buy a stock, (b) whether to sell or hold a stock, and (c) in a pairwise comparison, which stock to buy or hold. These prompts were evaluated across two investment horizons: 1 month and 3 months. Model outputs were compared against realized market outcomes during the corresponding test periods. Performance was assessed across four key dimensions: country, sector, annualized volatility, and question type. The models were not given any supplementary financial information or instructions on specific analytical methods. The results indicate that GPT-4.0 achieves the highest average accuracy (56%), followed by LLaMA-4-Scout-17B-16E (48%) and Gemini 2.0 Flash (39%). Overall performance remains moderate and varies across market conditions, with relatively higher accuracy observed in high-volatility regimes (51%). 
This work evaluates how LLMs behave when presented with structured numerical price sequences in a controlled decision-labeling setting and contributes to the broader discussion on the potential and limitations of LLMs for numerical decision tasks in finance.
Full article
Open Access Article
Fake News Detection Through LLM-Driven Text Augmentation Across Media and Languages
by
Abdul Sittar, Mateja Smiljanic, Alenka Guček and Marko Grobelnik
Mach. Learn. Knowl. Extr. 2026, 8(4), 103; https://doi.org/10.3390/make8040103 - 15 Apr 2026
Abstract
The proliferation of fake news across social media, headlines, and news articles poses major challenges for automated detection, particularly in multilingual and cross-media settings affected by data imbalance. We propose a fake news detection framework based on LLM-driven, feature-guided text augmentation. The method generates realistic synthetic samples across languages, media types, and text granularities while preserving meaning and stylistic coherence. Experiments with classical and transformer-based models (Random Forest, Logistic Regression, BERT, XLM-R) across social media, headlines, and multilingual news datasets show consistent improvements in performance. For inherently balanced datasets (e.g., social media), synthetic augmentation yields negligible but stable performance changes. Across imbalanced scenarios, synthetic augmentation substantially improves minority-class recall and F1-score (e.g., fake news recall from 0.57 to 0.86), while preserving majority-class performance, leading to more balanced and reliable classifiers, whereas oversampling significantly degrades results due to overfitting on duplicated language patterns. Overall, a hybrid semantic- and style-based model proves to be the most robust strategy, outperforming oversampling and matching or exceeding baseline performance across datasets.
Full article
Topics
Topic in
Applied Sciences, ASI, Blockchains, Computers, MAKE, Software
Recent Advances in AI-Enhanced Software Engineering and Web Services
Topic Editors: Hai Wang, Zhe Hou
Deadline: 31 May 2026
Topic in
AI, Applied Sciences, Electronics, MAKE
Deep Supplement Learning for Healthcare and Biomedical Applications
Topic Editors: Tahir Cetin Akinci, Ömer Faruk Ertuğrul
Deadline: 30 June 2026
Topic in
Atmosphere, Earth, Encyclopedia, Entropy, Fractal Fract, MAKE, Meteorology
Revisiting Butterfly Effect, Multiscale Dynamics, and Predictability Using AI-Enhanced Modeling Framework (AEMF) and Chaos Theory
Topic Editors: Bo-Wen Shen, Roger A. Pielke Sr., Xubin Zeng
Deadline: 31 July 2026
Topic in
Algorithms, Applied Sciences, Electronics, MAKE, AI, Software
Applications of NLP, AI, and ML in Software Engineering
Topic Editors: Affan Yasin, Javed Ali Khan, Lijie Wen
Deadline: 30 August 2026
Conferences
Special Issues
Special Issue in
MAKE
Language Acquisition and Understanding
Guest Editors: Michal Ptaszynski, Rafal Rzepka, Masaharu Yoshioka
Deadline: 15 July 2026
Special Issue in
MAKE
Advancing Natural Language Processing for Low-Resource Languages and Dialects
Guest Editors: Tanjim Mahmud, Michal Ptaszynski, Karl Andersson
Deadline: 31 July 2026
Special Issue in
MAKE
Trustworthy AI: Integrating Knowledge, Retrieval, and Reasoning
Guest Editor: Konstantinos Diamantaras
Deadline: 31 August 2026
Special Issue in
MAKE
Explainable Artificial Intelligence: Theoretical Foundations and Methodological Advances
Guest Editors: Sheng Du, Javier Del Ser Lorente
Deadline: 31 August 2026
Topical Collections
Topical Collection in
MAKE
Extravaganza Feature Papers on Hot Topics in Machine Learning and Knowledge Extraction
Collection Editor: Andreas Holzinger
Topical Collection in
MAKE
Feature Papers in Safety, Security, Privacy, and Cyber Resilience
Collection Editor: Simon Tjoa
Topical Collection in
MAKE
Robust and Uncertainty-Aware Learning from Real-World Data
Collection Editor: Federico Cabitza