
Machine Learning and Knowledge Extraction

Machine Learning and Knowledge Extraction is an international, peer-reviewed, open access, monthly journal on machine learning and applications. See our video on YouTube explaining the MAKE journal concept.

Quartile Ranking JCR - Q1 (Engineering, Electrical and Electronic | Computer Science, Artificial Intelligence | Computer Science, Interdisciplinary Applications)

All Articles (624)

  • Systematic Review
  • Open Access

Background: Pituitary neuroendocrine tumours (PitNETs) are clinically and biologically heterogeneous neoplasms that remain challenging to diagnose, prognosticate, and treat. Although recent WHO classifications using transcription-factor-based markers have refined pathological categorisation, histopathology alone still fails to predict tumour behaviour or support individualised therapy. Objective: This systematic review aimed to evaluate how machine learning (ML) and knowledge extraction approaches can complement pathology by integrating multi-dimensional omics datasets to generate predictive and clinically meaningful insights in PitNETs. Methods: The review followed the PRISMA 2020 statement for systematic reviews. Searches were conducted in PubMed, Google Scholar, arXiv, and SciSpace up to June 2025 to identify omics studies applying ML or computational data integration in PitNETs. Eligible studies included original research using genomic, transcriptomic, epigenomic, proteomic, or liquid biopsy data. Data extraction covered study design, ML methodology, data accessibility, and clinical annotation. Study quality and validation strategies were also assessed. Results: A total of 726 records were identified, of which 98 studies met the inclusion criteria after screening. Most PitNET studies employed unsupervised clustering or regularised regression methods, reflecting their suitability for high-dimensional omics datasets with limited sample sizes. In contrast, deep learning approaches were rarely implemented, primarily due to the scarcity of the large, clinically annotated cohorts required to train such models effectively. To support future research and model development, we compiled a comprehensive catalogue of all publicly available PitNET omics resources, facilitating reuse, methodological benchmarking, and integrative analyses.
Conclusions: Although omics research in PitNETs is increasing, the lack of standardised, clinically annotated datasets remains a major obstacle to the development and deployment of robust predictive models. Coordinated efforts in data sharing and clinical harmonisation are required to unlock the field's full potential.
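The unsupervised clustering the review identifies as the dominant approach can be sketched in a few lines. Below is a minimal, hypothetical k-means on a synthetic "omics" matrix (all data, dimensions, and the farthest-point initialisation are illustrative assumptions, not any reviewed study's pipeline):

```python
import numpy as np

def kmeans(X, k, n_iter=50):
    # Farthest-point initialisation: deterministic and well spread.
    centers = [X[0]]
    for _ in range(k - 1):
        dists = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[dists.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        # Assign each sample to its nearest centre, then recompute centres.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Synthetic "expression matrix": 20 samples x 50 features, two shifted groups
# (purely illustrative; real PitNET cohorts are larger and far noisier).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (10, 50)),
               rng.normal(3.0, 1.0, (10, 50))])
labels = kmeans(X, k=2)
```

With clearly separated groups the two clusters recover the sample partition; on real omics data, feature scaling and a choice of k would need justification.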

8 January 2026

PRISMA 2020 flow diagram for systematic reviews. Modified from: Page MJ et al. [18]. This flow diagram illustrates how papers were selected through the review process.
  • Systematic Review
  • Open Access

Background: Mosquito-borne viral diseases are a growing global health threat, and artificial intelligence (AI) and machine learning (ML) are increasingly proposed as forecasting tools to support early-warning and response. However, the available evidence is fragmented across pathogens, settings and modelling approaches. This review provides, to the best of our knowledge, the first comprehensive comparative assessment of AI/ML models forecasting mosquito-borne viral diseases in human populations, jointly synthesising predictive performance across model families and appraising both methodological quality and operational readiness. Methods: Following PRISMA 2020, we searched PubMed, Embase and Scopus up to August 2025. We included studies applying AI/ML or statistical models to predict arboviral incidence, outbreaks or temporal trends and reporting at least one quantitative performance metric. Given the substantial heterogeneity in outcomes, predictors and time–space scales, we conducted a descriptive synthesis. Risk of bias and applicability were evaluated using PROBAST. Results: Ninety-eight studies met the inclusion criteria, of which 91 focused on dengue. The forecasts spanned national to city-level settings and annual-to-weekly resolutions. Across classification tasks, tree-ensemble models showed the most consistent performance, with accuracies typically above 0.85, while classical ML and deep-learning models showed wider variability. For regression tasks, errors increased with temporal horizon and spatial aggregation: short-term, fine-scale forecasts (e.g., weekly city level) often achieved low absolute errors, whereas long-horizon national models frequently exhibited very large errors and unstable performance. PROBAST assessment indicated that most studies (63/98) were at high risk of bias, with only 24 judged at low risk and limited external validation. 
Conclusions: AI/ML models, especially tree-ensemble approaches, show strong potential for short-term, fine-scale forecasting, but their reliability drops substantially at broader spatial and temporal scales. Most remain research-stage, with limited external validation and minimal operational deployment. This review clarifies current capabilities and highlights three priorities for real-world use: standardised reporting, rigorous external validation, and context-specific calibration.
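The tree-ensemble approaches this abstract singles out can be illustrated with a minimal bagged decision-stump ensemble, a simplified stand-in for a random forest; the "rainfall"/"outbreak" data below are invented for illustration and the labels are deliberately easy:

```python
import numpy as np

def fit_stump(X, y):
    # Exhaustively search (feature, threshold, orientation) minimising error.
    best, best_err = None, np.inf
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for left, right in ((0, 1), (1, 0)):
                pred = np.where(X[:, j] <= t, left, right)
                err = np.mean(pred != y)
                if err < best_err:
                    best_err, best = err, (j, t, left, right)
    return best

def predict_stump(stump, X):
    j, t, left, right = stump
    return np.where(X[:, j] <= t, left, right)

def bagged_stumps(X, y, n_trees=25, seed=0):
    # Bootstrap aggregation: each stump is fit on a resampled training set.
    rng = np.random.default_rng(seed)
    stumps = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(X), len(X))
        stumps.append(fit_stump(X[idx], y[idx]))
    return stumps

def predict(stumps, X):
    # Majority vote across the ensemble.
    votes = np.mean([predict_stump(s, X) for s in stumps], axis=0)
    return (votes >= 0.5).astype(int)

# Synthetic weekly features (standardised rainfall, temperature) and an
# illustrative "outbreak" label driven by rainfall alone.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = (X[:, 0] > 0.2).astype(int)
model = bagged_stumps(X[:150], y[:150])
acc = float(np.mean(predict(model, X[150:]) == y[150:]))
```

Real forecasting systems would of course use deeper trees, lagged climate covariates, and proper temporal cross-validation rather than a random split.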

7 January 2026

Flow diagram depicting the selection process.

The increasing availability of satellite data at different spatial resolutions offers new opportunities for environmental monitoring, but it also highlights the limitations of medium-resolution products for fine-scale territorial analysis and raises the need to enhance the resolution of lower-quality imagery to enable more detailed spatial assessments. This study investigates the effectiveness of different super-resolution techniques applied to low-resolution (LR) multispectral Sentinel-2 satellite imagery to generate high-resolution (HR) data capable of supporting advanced knowledge extraction. Three main methodologies are compared: traditional bicubic interpolation, a generic Artificial Neural Network (ANN) approach, and a Convolutional Neural Network (CNN) model specifically designed for super-resolution tasks. Model performances are evaluated in terms of their ability to reconstruct fine spatial details, while the implications of these methods for subsequent visualization and environmental analysis are critically discussed. The evaluation protocol relies on RMSE, PSNR, SSIM, and spectral-faithfulness metrics (SAM, ERGAS), showing that the CNN consistently outperforms ANN and bicubic interpolation in reconstructing geometrically coherent structures. The results confirm that super-resolution improves the apparent spatial detail of existing spectral information, thus clarifying both the practical advantages and inherent limitations of learning-based super-resolution in Earth observation workflows.
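Several of the metrics in this evaluation protocol (RMSE, PSNR, SAM) have standard textbook definitions and can be sketched directly in NumPy; the versions below are generic implementations, not the study's code, and SSIM/ERGAS are omitted for brevity:

```python
import numpy as np

def rmse(ref, est):
    # Root-mean-square error over all pixels and bands.
    return float(np.sqrt(np.mean((ref - est) ** 2)))

def psnr(ref, est, peak=1.0):
    # Peak signal-to-noise ratio in dB, for reflectance-like data in [0, peak].
    mse = np.mean((ref - est) ** 2)
    return float(20 * np.log10(peak) - 10 * np.log10(mse))

def sam(ref, est, eps=1e-12):
    # Mean Spectral Angle Mapper (radians); band axis assumed last.
    r = ref.reshape(-1, ref.shape[-1])
    e = est.reshape(-1, est.shape[-1])
    cos = np.sum(r * e, axis=1) / (
        np.linalg.norm(r, axis=1) * np.linalg.norm(e, axis=1) + eps)
    return float(np.mean(np.arccos(np.clip(cos, -1.0, 1.0))))

# Toy 4x4 "multispectral" patch; the estimate is a uniform 0.1 offset.
ref = np.full((4, 4, 3), 0.5)
est = ref + 0.1
```

Note that SAM is invariant to per-pixel brightness scaling, which is why it complements intensity-based metrics such as RMSE and PSNR.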

7 January 2026

Study area (red circle) located in Southern Italy and a representative image of the town and structures.

In machine learning, the Bayes classifier represents the theoretical optimum for minimizing classification errors. Since estimating high-dimensional probability densities is impractical, simplified approximations such as naïve Bayes and k-nearest neighbor are widely used as baseline classifiers. Despite their simplicity, these methods require design choices, such as the distance measure in kNN or the feature-independence assumption in naïve Bayes. In particular, naïve Bayes relies on implicit distributional assumptions when using Gaussian mixtures or univariate kernel density estimators. Such design choices, however, often fail to capture heterogeneous distributional structures across features. We propose a flexible naïve Bayes classifier that leverages Pareto Density Estimation (PDE), a parameter-free, non-parametric approach shown to outperform standard kernel methods in exploratory statistics. PDE avoids prior distributional assumptions and supports interpretability through visualization of class-conditional likelihoods. In addition, we address a recently described pitfall of Bayes’ theorem: the misclassification of observations with low evidence. Building on the concept of plausible Bayes, we introduce a safeguard to handle uncertain cases more reliably. While not aiming to surpass state-of-the-art classifiers, our results show that PDE-flexible naïve Bayes with uncertainty handling provides a robust, scalable, and interpretable baseline that can be applied across diverse data scenarios.
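The general idea, density-based naïve Bayes plus a low-evidence safeguard, can be sketched as follows. This uses a plain univariate Gaussian KDE as a stand-in for Pareto Density Estimation (which is not reimplemented here), and the evidence threshold, bandwidth, and data are illustrative assumptions, not the authors' method:

```python
import numpy as np

def kde_logpdf(train, x, bw=0.5):
    # Univariate Gaussian KDE; a stand-in for Pareto Density Estimation.
    d = (x[:, None] - train[None, :]) / bw
    dens = np.mean(np.exp(-0.5 * d ** 2), axis=1) / (bw * np.sqrt(2 * np.pi))
    return np.log(dens + 1e-300)  # floor avoids log(0) far from the data

def nb_fit(X, y):
    classes = np.unique(y)
    priors = {c: float(np.mean(y == c)) for c in classes}
    data = {c: X[y == c] for c in classes}
    return classes, priors, data

def nb_predict(model, X, min_evidence=1e-4):
    classes, priors, data = model
    # Naive-Bayes joint log-likelihood: log prior + per-feature log-densities.
    logp = np.array([np.log(priors[c])
                     + sum(kde_logpdf(data[c][:, j], X[:, j])
                           for j in range(X.shape[1]))
                     for c in classes])   # shape: (n_classes, n_samples)
    evidence = np.exp(logp).sum(axis=0)   # unnormalised p(x)
    pred = classes[logp.argmax(axis=0)]
    # Plausibility safeguard: refuse a label when the evidence is too small.
    return np.where(evidence < min_evidence, -1, pred)

# Two well-separated 1-D classes; 50.0 is an implausible outlier that a
# plain MAP rule would still be forced to label.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (30, 1)), rng.normal(5.0, 0.5, (30, 1))])
y = np.array([0] * 30 + [1] * 30)
out = nb_predict(nb_fit(X, y), np.array([[0.1], [5.2], [50.0]]))
```

The safeguard returns a sentinel (-1) for the outlier instead of an arbitrary class, mirroring the caption's point that MAP assigns extreme observations to the wider class.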

5 January 2026

Left (A–D) and right (E–H) panels show results on artificial data for the classical and plausible PDE-based naïve Bayes classifier, respectively. Each panel contains four rows: N = 500 sampled points with predicted labels (A,E), class-conditional densities estimated from the training data (B,F), the posterior probability P(C1∣x) computed from the fitted model (C,G), and the test set of N = 5000 points with its predictions (D,H). Because class 1 (dark green) has a smaller variance, its posterior decays in both tails (C), and the MAP rule assigns extreme observations to class 2 in (D); we argue in favor of using the smoothed PDE to estimate the class likelihoods and the concept by [14] to correct assignments in regions of very low likelihood (F) that are not plausible in (G). In addition, the right panel shows that the fine structure of distributions should be accounted for in the class likelihoods (F). Without prior knowledge, applying the left model (C) to the test data produces misclassifications relative to the true boundary (magenta predictions to the left of the green predictions in (D)) and is less interpretable in comparison to (H).

News & Conferences

Issues

Open for Submission

Editor's Choice

Get Alerted

Add your email address to receive forthcoming issues of this journal.

Mach. Learn. Knowl. Extr. - ISSN 2504-4990