
Machine Learning and Knowledge Extraction

Machine Learning and Knowledge Extraction is an international, peer-reviewed, open access, monthly journal on machine learning and applications. See our video on YouTube explaining the MAKE journal concept.

Quartile Ranking JCR - Q1 (Engineering, Electrical and Electronic | Computer Science, Artificial Intelligence | Computer Science, Interdisciplinary Applications)

All Articles (652)

Semantic segmentation and deep learning methods have rarely been applied to fractional vegetation cover (FVC) segmentation tasks due to the lack of publicly available datasets for training deep learning models. FVC is a key indicator for assessing vegetation distribution, crop density, and crop responses to water availability and fertilizer application, yet conventional field-based measurement methods are time-consuming, costly, and labor-intensive, and may lack the accuracy required for critical applications such as drought stress evaluation and water productivity assessment. In this paper, we introduce causality-based deep learning techniques for FVC segmentation on a publicly available RGB dataset that consists of four ground cover crops: Phyla nodiflora L., Cynodon dactylon, Frankenia thymifolia Desf., and Oxalis stricta L. By separating causal from spurious correlations in pretrained features, the stepwise intervention and reweighting (SIR) method, applied at different encoder stages, reduced confounding bias and enabled the models to learn more generalizable and task-relevant features. Extensive experiments on the FVC dataset, conducted with and without causality learning, showed that the proposed FCN + ResNet-50 model with causality learning and data augmentation achieved an accuracy of 94.80%, a precision of 94.97%, a recall of 94.35%, and an F1-score of 94.62%, outperforming non-causal baselines and state-of-the-art transformer-based models including SegFormer and Mask2Former.
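The quantities reported in the abstract are standard pixel-wise scores. As a rough illustration only (not the authors' implementation), FVC and the reported segmentation metrics can be computed from binary vegetation masks as below; the function names and toy masks are assumptions for the sketch:

```python
import numpy as np

def fractional_vegetation_cover(mask: np.ndarray) -> float:
    """FVC as the fraction of pixels labeled vegetation (1) in a binary mask."""
    return float(mask.sum()) / mask.size

def segmentation_scores(pred: np.ndarray, truth: np.ndarray):
    """Pixel-wise accuracy, precision, recall, and F1 for binary masks."""
    tp = np.sum((pred == 1) & (truth == 1))
    tn = np.sum((pred == 0) & (truth == 0))
    fp = np.sum((pred == 1) & (truth == 0))
    fn = np.sum((pred == 0) & (truth == 1))
    accuracy = (tp + tn) / pred.size
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1
```

In practice these scores would be averaged over all test images rather than computed on a single mask.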

11 February 2026

Example images from dataset. (P1) Phyla nodiflora L.; (P2) Cynodon dactylon; (P3) Frankenia thymifolia Desf.; and (P4) Oxalis stricta L.

Towards LLM-Driven Cybersecurity in Autonomous Vehicles: A Big Data-Empowered Framework with Emerging Technologies

  • Aristeidis Karras,
  • Leonidas Theodorakopoulos and
  • Alexandra Theodoropoulou
  • + 1 author

Modern Autonomous Vehicles generate large volumes of heterogeneous in-vehicle data, making cybersecurity a critical challenge as adversarial attacks become increasingly adaptive, stealthy, and multi-protocol. Traditional intrusion detection systems often fail under these conditions because of their limited contextual understanding, poor robustness to distribution shifts, and insufficient regulatory transparency. This study introduces LLM-Guardian, a hierarchical intrusion detection framework with decision-making mechanisms that integrates Large Language Models (LLMs) with classical statistical detection theory, optimal transport drift analysis, graph neural networks, and formal uncertainty quantification. LLM-Guardian uses semantic anomaly scoring, conformal prediction for distribution-free confidence calibration, adaptive cumulative sum (CUSUM) sequential testing for low-latency detection, and topology-aware GNN reasoning designed to identify coordinated attacks across CAN, Ethernet, and V2X interfaces. In this work, the framework is empirically evaluated on four heterogeneous CAN-bus datasets, while the Ethernet and V2X components are instantiated at the architectural level and left as directions for future multi-protocol experimentation.
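Of the ingredients named above, CUSUM sequential testing is the simplest to sketch. The following is a generic, textbook one-sided CUSUM over a stream of anomaly scores, not LLM-Guardian's adaptive variant; the parameter names (`target_mean`, `drift`, `threshold`) are illustrative assumptions:

```python
def cusum_detect(scores, target_mean=0.0, drift=0.5, threshold=5.0):
    """One-sided CUSUM over a stream of anomaly scores.

    The statistic accumulates deviations above target_mean + drift and
    resets at zero. Returns the index of the first alarm, or None.
    """
    s = 0.0
    for i, x in enumerate(scores):
        s = max(0.0, s + (x - target_mean - drift))
        if s > threshold:
            return i
    return None
```

The appeal for low-latency in-vehicle detection is that each new score costs one addition and one comparison, so the detector runs in constant time per CAN frame.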

11 February 2026

Architecture of LLM-driven cybersecurity for AVs.

This paper presents a scalable machine learning pipeline for extracting actionable, product-related insights from user-generated social media comments. Leveraging sentence embeddings from SBERT and unsupervised clustering (k-Means and agglomerative), the approach structures informal and noisy comments from Instagram and YouTube into topic groups intended to support thematic analysis. A case study on feedback regarding BMW vehicles, comprising more than 26,000 comments, illustrates how the pipeline can reveal recurring user concerns, such as design critiques, usability issues, and technology-related expectations, even in short and unstructured social media comments. The proposed pipeline operates without labeled data or manual annotation, enabling scalable application and transferability across product categories and industries. By transforming large-scale, unstructured consumer feedback into interpretable themes, the pipeline provides product teams with an efficient and structured basis for data-driven product development and improvement.
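The embedding-plus-clustering step can be sketched with a minimal k-Means on toy 2-D vectors. The actual pipeline uses SBERT sentence embeddings (typically several hundred dimensions) and library clustering implementations, so this self-contained NumPy version is an illustration only:

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Minimal k-Means: returns a cluster label for each row of X."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize centers from k distinct data points.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its assigned points.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels
```

With real comments, each row of `X` would be the SBERT embedding of one comment, and the resulting clusters would be inspected for recurring themes.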

11 February 2026

Flowchart of data collection and analysis process.

The use of large language models (LLMs) to automate the generation of medical case-based multiple-choice questions (MCQs) is increasing, but their accuracy, reliability, and educational validity are still not well understood. This study examined nine LLMs with four different prompting methods in a comparative framework to evaluate LLM-produced MCQs for clinical coherence and readiness for assessment. A uniform evaluation pipeline was constructed to examine text similarity using automated metrics (BLEU, ROUGE, and METEOR), structural and parsability measures, and operational effectiveness (latency, cost, and quality-efficiency ratios). Human validation was performed on the best-performing model and prompt combination (OpenBioLLM-70B with Chain-of-Thought), which demonstrated the best linguistic fidelity and clinically aligned reasoning. Two clinical experts independently reviewed 88 items using a five-domain rubric covering appropriateness, clarity, relevance, distractor quality, and cognitive level. Results indicated significant variation across models and prompting strategies, with Chain-of-Thought yielding the best overall performance. The OpenBioLLM-70B model demonstrated the best overall balance of quality, parsability, and efficiency, achieving a prompt template quality score of 90.4, a consistency score of 88.8, and a response time of 3.28 s, with a quality-per-dollar value of 134.11. The expert ratings confirmed clinical alignment, but there was consensus that distractor quality needed further improvement. These results provide evidence that, under optimal prompting conditions, LLMs can reliably support MCQ generation and provide large-scale, cost-effective support for medical assessment production.
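Among the automated metrics mentioned, ROUGE-1 is the simplest to illustrate. The study presumably relies on standard metric implementations; the following minimal unigram-overlap F1 is a sketch only:

```python
from collections import Counter

def rouge1_f(candidate: str, reference: str) -> float:
    """ROUGE-1 F1: unigram overlap between a candidate and a reference text."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    p = overlap / sum(cand.values())  # precision over candidate unigrams
    r = overlap / sum(ref.values())   # recall over reference unigrams
    return 2 * p * r / (p + r)
```

BLEU adds higher-order n-gram precision with a brevity penalty, and METEOR adds stemming and synonym matching, but all three reduce to scoring surface overlap between a generated question and a reference item.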

10 February 2026

Study objectives pipeline.


Mach. Learn. Knowl. Extr. - ISSN 2504-4990