Mach. Learn. Knowl. Extr., Volume 8, Issue 6 (June 2026) – 2 articles

  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for e-mail alerts to receive table of contents of newly released issues.
  • PDF is the official format for papers published in both HTML and PDF forms. To view the papers in PDF format, click on the "PDF Full-text" link and use the free Adobe Reader to open them.
63 pages, 885 KB  
Review
Large Language Model Benchmarks: A Taxonomy of Capabilities, Scientific Quality Assessment, and Saturation Analysis
by Rubén Gómez, Carlos E. Miranda, Julio-Alejandro Romero-González, Diana-Margarita Córdova-Esparza, Gendry Alfonso-Francia, Edgar-Arturo Chávez-Urbiola, Alfonso Ramirez-Pedraza and Juan Terven
Mach. Learn. Knowl. Extr. 2026, 8(6), 141; https://doi.org/10.3390/make8060141 (registering DOI) - 22 May 2026
Abstract
The rapid evolution of Large Language Models (LLMs) has exposed limitations of static, accuracy-oriented benchmarks and increased the need for evaluation frameworks that distinguish among capabilities and benchmark quality. This survey analyzes 63 LLM benchmarks spanning 2012–2026 and organizes them into a taxonomy of six capability dimensions and 20 operational subcategories. We also propose the Benchmark Quality Assurance Index (BQAI), an AHP-weighted composite framework for assessing the scientific quality of benchmarks across seven dimensions related to annotation, clarity, standardization, reproducibility, robustness, coverage, and fairness. The BQAI is applied to 30 representative benchmarks, corresponding to 48% of the 63-benchmark corpus, with three-evaluator blinded scoring, formal inter-rater reliability validation (ICC(2,k) and quadratic-weighted Cohen’s κ), and Monte Carlo sensitivity analysis (n = 1000 trials, ±10% to ±50% weight perturbation). In addition, we synthesize public performance results for 16 models across 10 benchmarks to examine saturation trends and reporting gaps. The analysis indicates that benchmark usefulness varies substantially across evaluation settings, that several established benchmarks are becoming less discriminative for frontier models, and that important gaps remain in safety, agentic, and cross-cultural assessment. Together, the taxonomy, BQAI, and saturation analysis provide a structured perspective on the current LLM benchmark landscape and on priorities for more rigorous evaluation.
(This article belongs to the Section Thematic Reviews)
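As a rough illustration of the kind of Monte Carlo weight-perturbation check the abstract mentions, the sketch below perturbs the weights of a weighted composite score and reports how far the score moves. It is only a sketch under stated assumptions: the weights, the per-dimension scores, the uniform multiplicative perturbation, and the function names `composite_score` and `sensitivity_range` are all hypothetical and are not taken from the paper, which may define the BQAI and its sensitivity analysis differently.

```python
import random

# Hypothetical weights for seven quality dimensions (NOT the paper's AHP weights).
WEIGHTS = [0.20, 0.15, 0.15, 0.20, 0.10, 0.10, 0.10]
# Hypothetical per-dimension scores for one benchmark, on a 0-1 scale.
SCORES = [0.9, 0.7, 0.8, 0.6, 0.5, 0.75, 0.4]


def composite_score(scores, weights):
    """Weighted average of the scores; weights are renormalized to sum to 1."""
    total = sum(weights)
    return sum(s * w / total for s, w in zip(scores, weights))


def sensitivity_range(scores, weights, perturbation, trials=1000, seed=0):
    """Return (min, max) composite score over random weight perturbations.

    Each weight is scaled by a factor drawn uniformly from
    [1 - perturbation, 1 + perturbation]; this scheme is an assumption.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(trials):
        perturbed = [w * (1 + rng.uniform(-perturbation, perturbation))
                     for w in weights]
        results.append(composite_score(scores, perturbed))
    return min(results), max(results)


if __name__ == "__main__":
    base = composite_score(SCORES, WEIGHTS)
    for p in (0.10, 0.30, 0.50):
        lo, hi = sensitivity_range(SCORES, WEIGHTS, p)
        print(f"perturbation ±{p:.0%}: base={base:.3f}, range=[{lo:.3f}, {hi:.3f}]")
```

A narrow range at large perturbations would suggest that the composite score, and any ranking built on it, does not hinge on the exact weights chosen.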
17 pages, 5508 KB  
Article
Unnoticeable Hybrid Watermarking for Deep Neural Network Authentication Using Auxiliary Hidden Layers
by Rodrigo Eduardo Arevalo-Ancona and Manuel Cedillo-Hernandez
Mach. Learn. Knowl. Extr. 2026, 8(6), 140; https://doi.org/10.3390/make8060140 - 22 May 2026
Abstract
The authentication and protection of deep neural network models have become challenging due to their widespread distribution and reuse, making them vulnerable to unauthorized access. This paper addresses the need for ownership verification by proposing a hybrid neural network watermarking method for secure model authentication. The approach combines a steganographic watermark embedded into stable model weights with a user code for watermark recovery encoded in auxiliary hidden layers. Stable parameters are identified through a reduced training run that estimates gradient variations, so that the watermark can be inserted with minimal impact on model performance. Additionally, two auxiliary layers are introduced: the first stores the metadata indices of the selected weights where the watermark was embedded, and the second stores the user code, supporting secure identification and verification. Experimental evaluations demonstrate that the proposed method remains robust under different model optimization attacks, including pruning, fine-tuning, additive noise injection, and parameter overwriting, while preserving model performance. The proposed framework achieves BER = 0 under several moderate attack scenarios across different neural network models, whereas more aggressive optimizations degrade the watermark recovery performance. These results indicate that the proposed framework provides an effective solution for neural network ownership protection while maintaining model performance.
(This article belongs to the Section Safety, Security, Privacy, and Cyber Resilience)
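For readers unfamiliar with the BER figure quoted in the abstract, the sketch below shows how a bit error rate between an embedded watermark and a recovered one is commonly computed. It is a minimal illustration only: the function name `bit_error_rate` and the example bit sequences are hypothetical, and the paper's own extraction and evaluation procedure is not reproduced here.

```python
def bit_error_rate(embedded, recovered):
    """Fraction of watermark bits that differ between two equal-length sequences."""
    if len(embedded) != len(recovered):
        raise ValueError("watermarks must have the same length")
    if not embedded:
        raise ValueError("watermarks must not be empty")
    errors = sum(a != b for a, b in zip(embedded, recovered))
    return errors / len(embedded)


if __name__ == "__main__":
    original = [1, 0, 1, 1, 0, 0, 1, 0]   # hypothetical embedded bits
    after_attack = [1, 0, 1, 0, 0, 0, 1, 0]  # hypothetical recovered bits, one flipped
    print(bit_error_rate(original, original))      # identical sequences
    print(bit_error_rate(original, after_attack))  # one of eight bits differs
```

A BER of 0 means every watermark bit was recovered exactly; values approaching 0.5 mean the recovered bits are no better than chance.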