Machine Learning and Knowledge Extraction

Machine Learning and Knowledge Extraction is an international, peer-reviewed, open access journal on machine learning and its applications.
It publishes original research articles, reviews, tutorials, research ideas, short notes, and Special Issues. Please see our video on YouTube explaining the MAKE journal concept. The journal is published quarterly online by MDPI.
Quartile Ranking: JCR Q1 (Engineering, Electrical and Electronic | Computer Science, Artificial Intelligence | Computer Science, Interdisciplinary Applications)

All Articles (584)

We propose a hybrid framework that integrates instance clustering with Automatic Generation of Algorithms (AGA) to produce specialized algorithms for classes of Multidimensional Knapsack Problem (MKP) instances. This approach is highly relevant given the latest trends in AI, where Large Language Models (LLMs) are actively being used to automate and refine algorithm design through evolutionary frameworks. Our method uses a feature-based representation of 328 MKP instances and evaluates K-means, HDBSCAN, and random clustering, producing 11 clusters per method. For each cluster, a master optimization problem was solved using Genetic Programming, evolving algorithms encoded as syntax trees. Fitness was measured as the relative error against known optima, an objective similar to those tackled in LLM-driven optimization. Experimental and statistical analyses demonstrate that clustering-guided AGA significantly reduces the average relative error and accelerates convergence compared with AGA trained on randomly grouped instances. K-means produced the most consistent cluster specialization. Cross-cluster evaluation reveals a trade-off between specialization and generalization. The results demonstrate that clustering prior to AGA is a practical preprocessing step for designing automated algorithms for NP-hard combinatorial problems, paving the way for advanced methodologies that incorporate AI techniques.
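To make the clustering step concrete, here is a minimal Python sketch under stated assumptions: the 328 instances are summarized by hypothetical fixed-length feature vectors (the paper's actual feature set is not reproduced here), and the per-cluster Genetic Programming run is only indicated, not implemented.

```python
# Minimal sketch of clustering-before-AGA, with assumed feature vectors.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
features = rng.random((328, 6))  # 328 instances, 6 hypothetical features

X = StandardScaler().fit_transform(features)
clusters = KMeans(n_clusters=11, n_init=10, random_state=0).fit_predict(X)

def relative_error(obtained: float, optimum: float) -> float:
    """Fitness of an evolved algorithm on one instance:
    normalized distance to the known optimum."""
    return abs(optimum - obtained) / abs(optimum)

# One Genetic Programming run per cluster would then minimize the mean
# relative error over the instances assigned to that cluster (not shown).
for k in range(11):
    members = np.flatnonzero(clusters == k)
    print(f"cluster {k}: {members.size} instances")
```

The design point the abstract makes is that each GP run only has to fit one cluster of structurally similar instances, which is what drives the reported specialization gains.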

12 November 2025

Algorithm MKPA15. Average fitness: 0.0168; size: 18; depth: 5. In this figure, the terminal names are abbreviated as follows: 1. LS: Local Search; 2. Greedy; 3. DMN: Del_Min_Normalized; 4. AMFP: Add_Max_Freville_Plateau; 5. AMP: Add_Max_Profit.

Unsupervised Hebbian learning is a biologically inspired algorithm designed to extract representations from input images, which can subsequently support supervised learning. It presents a promising alternative to traditional artificial neural networks (ANNs). Many attempts have focused on enhancing Hebbian learning by incorporating more biologically plausible components. In contrast, we draw inspiration from recent advances in ANNs to rethink and further improve Hebbian learning in three interconnected aspects. First, we investigate the issue of overfitting in Hebbian learning and emphasize the importance of selecting an optimal number of training epochs, even in unsupervised settings. In addition, we discuss the risks and benefits of anti-Hebbian learning for model performance, and our visualizations reveal that synapses resembling the input images do not necessarily reflect effective learning. Then, we explore the impact of different activation functions on Hebbian representations, highlighting the benefits of properly utilizing negative values. Furthermore, motivated by the success of large pre-trained language models, we propose a novel approach for leveraging unlabeled data from other datasets. Unlike conventional pre-training in ANNs, experimental results demonstrate that merging trained synapses from different datasets leads to improved performance. Overall, our findings offer fresh perspectives on the future design of Hebbian learning algorithms.
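For readers unfamiliar with the update rule, the following is a minimal, self-contained sketch of one common Hebbian variant (an Oja-style update); the dimensions, learning rate, and random data are illustrative assumptions, not the paper's exact algorithm.

```python
# Oja-style Hebbian update: strengthen synapses whose post-synaptic
# activity correlates with the input, with a decay term for stability.
import numpy as np

rng = np.random.default_rng(1)
n_inputs, n_hidden = 784, 100        # e.g., flattened 28x28 images (assumed)
W = rng.normal(scale=0.01, size=(n_hidden, n_inputs))
lr = 1e-3

def hebbian_step(W: np.ndarray, x: np.ndarray) -> np.ndarray:
    y = W @ x                                        # post-synaptic activity
    return W + lr * (np.outer(y, x) - (y**2)[:, None] * W)

for _ in range(1000):                                # toy training loop
    x = rng.random(n_inputs)                         # stand-in for an image
    W = hebbian_step(W, x)
```

The synapse-merging idea from the abstract could, under this framing, be as simple as combining the rows of `W` matrices trained on different datasets before the supervised stage, though the paper's actual merging procedure is not reproduced here.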

11 November 2025

An illustration depicting the workflow of unsupervised Hebbian learning followed by supervised learning, accompanied by a question: Does the overfitting problem commonly found in supervised learning also occur in unsupervised Hebbian learning?

Amid ongoing efforts to develop extremely large, multimodal models, there is increasing interest in efficient Small Language Models (SLMs) that can operate without reliance on large data-centre infrastructure. However, recent SLMs (e.g., LLaMA or Phi) with up to three billion parameters are predominantly trained in high-resource languages, such as English, which limits their applicability to industries that require robust NLP solutions for less-represented languages and low-resource settings, particularly those demanding low latency and adaptability to evolving label spaces. This paper examines a retrieval-based approach to multi-label text classification (MLC) for a media monitoring dataset, with a particular focus on less-represented languages, such as Slovene. This dataset presents an extreme MLC challenge, with instances labelled using up to twelve thousand categories. The proposed method, which combines retrieval with computationally efficient prediction, effectively addresses challenges related to multilinguality, resource constraints, and frequent label changes. We adopt a model-agnostic approach that does not rely on a specific model architecture or language selection. Our results demonstrate that techniques from the extreme multi-label text classification (XMC) domain outperform traditional Transformer-based encoder models, particularly in handling dynamic label spaces without requiring continuous fine-tuning. Additionally, we highlight the effectiveness of this approach in scenarios involving rare labels, where baseline models struggle with generalisation.
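A hedged sketch of the general retrieval-based MLC idea (not the paper's actual pipeline): a test document's labels are predicted by a similarity-weighted vote over the labels of its nearest training neighbours, so label-space changes require only re-indexing rather than fine-tuning. The embeddings, label sets, and threshold below are placeholders.

```python
# Retrieval-based multi-label classification via nearest neighbours.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(2)
train_emb = rng.random((5000, 384))                  # pre-computed embeddings
train_labels = [set(rng.choice(12000, size=3)) for _ in range(5000)]

index = NearestNeighbors(n_neighbors=10, metric="cosine").fit(train_emb)

def predict(query_emb: np.ndarray, threshold: float = 0.3) -> set:
    """Score each candidate label by the similarity-weighted vote of the
    retrieved neighbours; rebuilding the index is all that is needed
    when the label space changes."""
    dist, idx = index.kneighbors(query_emb[None, :])
    scores: dict[int, float] = {}
    for d, i in zip(dist[0], idx[0]):
        for label in train_labels[i]:
            scores[label] = scores.get(label, 0.0) + (1.0 - d)
    total = sum(scores.values()) or 1.0
    return {lab for lab, s in scores.items() if s / total >= threshold}

print(sorted(predict(rng.random(384))))              # predicted label ids
```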

11 November 2025

Subword token distribution across all samples in the NewsMon dataset.

The growing use of machine learning (ML) and artificial intelligence across sectors has shown strong potential to improve decision-making processes. However, the adoption of ML by non-technical professionals remains limited due to the complexity of traditional development workflows, which often require software engineering and data science expertise. In recent years, low-code/no-code (LC/NC) platforms have emerged as promising solutions to democratize ML by abstracting many of the technical tasks typically involved in software engineering pipelines. This paper investigates whether these platforms offer a viable alternative for making ML accessible to non-expert users. Beyond predictive performance, this study also evaluates usability, setup complexity, the transparency of automated workflows, and cost management under realistic “out-of-the-box” conditions. This multidimensional perspective provides insights into the practical viability of LC/NC tools in real-world contexts. The comparative evaluation was conducted using three leading cloud-based tools: Amazon SageMaker Canvas, Google Cloud Vertex AI, and Azure Machine Learning Studio. These tools employ ensemble-based learning algorithms such as Gradient Boosted Trees, XGBoost, and Random Forests. Unlike traditional ML workflows that require extensive software engineering knowledge and manual optimization, these platforms enable domain experts to build predictive models through visual interfaces. The findings show that all platforms achieved high accuracy, with consistent identification of key features. Google Cloud Vertex AI was the most user-friendly, SageMaker Canvas offered a highly visual interface with some setup complexity, and Azure Machine Learning delivered the best model performance with a steeper learning curve. Cost transparency also varied considerably, with Google Cloud and Azure providing clearer safeguards against unexpected charges than SageMaker Canvas.
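For context, the kind of ensemble baseline these platforms train automatically can be reproduced in a few lines of scikit-learn; this is an illustrative stand-in on synthetic data, not the paper's evaluation setup.

```python
# Gradient-boosted trees: the type of ensemble model LC/NC platforms automate.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

model = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
print(f"accuracy: {accuracy_score(y_te, model.predict(X_te)):.3f}")

# Feature importances correspond to the "key features" such platforms surface.
print(model.feature_importances_.round(3))
```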

7 November 2025

End-to-end pipeline of the studied use case.

News & Conferences

Issues

Open for Submission

Editor's Choice

Get Alerted

Add your email address to receive forthcoming issues of this journal.

XFacebookLinkedIn
Mach. Learn. Knowl. Extr. - ISSN 2504-4990