Machine Learning and Knowledge Extraction


Mach. Learn. Knowl. Extr., Volume 8, Issue 6 (June 2026) – 2 articles

Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
You may sign up for e-mail alerts to receive the table of contents of newly released issues.
PDF is the official format for papers published in both HTML and PDF forms. To view the papers in PDF format, click on the "PDF Full-text" link and use the free Adobe Reader to open them.

63 pages, 885 KB

Open Access Review

Large Language Model Benchmarks: A Taxonomy of Capabilities, Scientific Quality Assessment, and Saturation Analysis

by Rubén Gómez, Carlos E. Miranda, Julio-Alejandro Romero-González, Diana-Margarita Córdova-Esparza, Gendry Alfonso-Francia, Edgar-Arturo Chávez-Urbiola, Alfonso Ramirez-Pedraza and Juan Terven

Mach. Learn. Knowl. Extr. 2026, 8(6), 141; https://doi.org/10.3390/make8060141 (registering DOI) - 22 May 2026

Abstract

The rapid evolution of Large Language Models (LLMs) has exposed limitations of static, accuracy-oriented benchmarks and increased the need for evaluation frameworks that distinguish among capabilities and benchmark quality. This survey analyzes 63 LLM benchmarks spanning 2012–2026 and organizes them into a taxonomy of six capability dimensions and 20 operational subcategories. We also propose the Benchmark Quality Assurance Index (BQAI), an AHP-weighted composite framework for assessing the scientific quality of benchmarks across seven dimensions related to annotation, clarity, standardization, reproducibility, robustness, coverage, and fairness. The BQAI is applied to 30 representative benchmarks, corresponding to 48% of the 63-benchmark corpus, with three-evaluator blinded scoring, formal inter-rater reliability validation (ICC(2,k) and quadratic-weighted Cohen’s κ), and Monte Carlo sensitivity analysis (n = 1000 trials, ±10% to ±50% weight perturbation). In addition, we synthesize public performance results for 16 models across 10 benchmarks to examine saturation trends and reporting gaps. The analysis indicates that benchmark usefulness varies substantially across evaluation settings, that several established benchmarks are becoming less discriminative for frontier models, and that important gaps remain in safety, agentic, and cross-cultural assessment. Together, the taxonomy, BQAI, and saturation analysis provide a structured perspective on the current LLM benchmark landscape and on priorities for more rigorous evaluation.

(This article belongs to the Section Thematic Reviews)
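The abstract reports a Monte Carlo sensitivity analysis of the AHP-derived BQAI weights (n = 1000 trials, ±10% to ±50% weight perturbation). The Python sketch below illustrates the general idea under stated assumptions, not the authors' procedure: the seven weights and the 1 to 5 dimension scores are hypothetical placeholders rather than the paper's values, each weight is scaled by a factor drawn uniformly from [1 − p, 1 + p] and the vector is then renormalized, and stability is measured as the share of trials that reproduce the unperturbed ranking. The paper's actual perturbation scheme and stability metric may differ.

```python
import random

# The seven BQAI dimensions named in the abstract (order used for weights/scores).
DIMENSIONS = ["annotation", "clarity", "standardization", "reproducibility",
              "robustness", "coverage", "fairness"]


def composite(weights, scores):
    """Weighted sum of per-dimension scores (weights are assumed to sum to 1)."""
    return sum(w * s for w, s in zip(weights, scores))


def perturb(weights, p, rng):
    """Assumed scheme: scale each weight by U(1 - p, 1 + p), then renormalize to sum 1."""
    raw = [w * rng.uniform(1 - p, 1 + p) for w in weights]
    total = sum(raw)
    return [r / total for r in raw]


def ranking(weights, benchmarks):
    """Benchmark names ordered from highest to lowest composite score."""
    return sorted(benchmarks,
                  key=lambda name: composite(weights, benchmarks[name]),
                  reverse=True)


def rank_stability(weights, benchmarks, p, n_trials=1000, seed=0):
    """Fraction of Monte Carlo trials whose ranking equals the unperturbed ranking."""
    rng = random.Random(seed)
    baseline = ranking(weights, benchmarks)
    same = sum(ranking(perturb(weights, p, rng), benchmarks) == baseline
               for _ in range(n_trials))
    return same / n_trials


# Hypothetical weights (sum to 1.0) and hypothetical 1-5 scores per dimension.
weights = [0.20, 0.10, 0.15, 0.20, 0.15, 0.10, 0.10]
benchmarks = {
    "bench_A": [5, 4, 4, 5, 3, 4, 3],
    "bench_B": [3, 3, 4, 2, 4, 3, 4],
    "bench_C": [4, 5, 3, 4, 4, 2, 3],
}

# Sweep perturbation magnitudes across the range quoted in the abstract.
for p in (0.10, 0.30, 0.50):
    print(f"±{p:.0%}: rank stability = {rank_stability(weights, benchmarks, p):.3f}")
```

Renormalizing after each perturbation keeps the composite on the same scale as the unperturbed index, so changes in ranking reflect only the relative shift in weights.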

17 pages, 5508 KB

Open Access Article

Unnoticeable Hybrid Watermarking for Deep Neural Network Authentication Using Auxiliary Hidden Layers

by Rodrigo Eduardo Arevalo-Ancona and Manuel Cedillo-Hernandez

Mach. Learn. Knowl. Extr. 2026, 8(6), 140; https://doi.org/10.3390/make8060140 - 22 May 2026

Abstract

The authentication and protection of deep neural network models have become challenging due to their widespread distribution and reuse, which makes them vulnerable to unauthorized access. This paper addresses the need for ownership verification by proposing a hybrid neural network watermarking method for secure model authentication. The approach combines a steganographic watermark embedded into stable model weights with a user code for watermark recovery encoded in auxiliary hidden layers. Stable parameters are identified through a reduced training run that estimates gradient variations, so that the watermark can be inserted with minimal impact on model performance. Additionally, two auxiliary layers are introduced: the first stores the metadata indices of the selected weights where the watermark is embedded, and the second stores the user code, supporting secure identification and verification. Experimental evaluations demonstrate that the proposed method remains robust under different model optimization attacks, including pruning, fine-tuning, additive noise injection, and parameter overwriting, while preserving model performance. The proposed framework achieves a bit error rate (BER) of 0 under several moderate attack scenarios across different neural network models, whereas more aggressive optimizations degrade watermark recovery performance. These results indicate that the proposed framework provides an effective solution for neural network ownership protection while maintaining model performance.

(This article belongs to the Section Safety, Security, Privacy, and Cyber Resilience)
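A BER of 0 means that every recovered watermark bit matches the embedded one. As an illustration only, the Python sketch below computes the standard bit error rate between an embedded and an extracted bit sequence. The user code "OWNER-42", its UTF-8 bit encoding, and the three flipped bits standing in for an attack are hypothetical and do not reproduce the paper's embedding or extraction procedure.

```python
def to_bits(data: bytes) -> list[int]:
    """Expand bytes into a flat list of bits, most significant bit first."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


def bit_error_rate(embedded: list[int], extracted: list[int]) -> float:
    """Fraction of positions where the extracted bit differs from the embedded bit."""
    if not embedded or len(embedded) != len(extracted):
        raise ValueError("watermarks must be non-empty and of equal length")
    errors = sum(a != b for a, b in zip(embedded, extracted))
    return errors / len(embedded)


# Hypothetical user code turned into a 64-bit watermark.
watermark = to_bits("OWNER-42".encode("utf-8"))

# Clean recovery: identical bits.
recovered_clean = list(watermark)

# Simulated degradation: flip three bits (a stand-in for an aggressive attack).
recovered_attacked = list(watermark)
for i in (0, 10, 20):
    recovered_attacked[i] ^= 1

print(bit_error_rate(watermark, recovered_clean))     # 0.0
print(bit_error_rate(watermark, recovered_attacked))  # 0.046875 (3 of 64 bits)
```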
