Article

A Multi-Teacher Knowledge Distillation Framework with Aggregation Techniques for Lightweight Deep Models

Université Marie et Louis Pasteur, CNRS, Institut FEMTO-ST (UMR 6174), F-90000 Belfort, France
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Appl. Syst. Innov. 2025, 8(5), 146; https://doi.org/10.3390/asi8050146
Submission received: 2 August 2025 / Revised: 7 September 2025 / Accepted: 24 September 2025 / Published: 30 September 2025

Abstract

Knowledge Distillation (KD) is a machine learning technique in which a compact student model learns to replicate the performance of a larger teacher model by mimicking its output predictions. Multi-Teacher Knowledge Distillation extends this paradigm by aggregating knowledge from multiple teacher models to improve generalization and robustness. However, effectively integrating outputs from diverse teachers, especially in the presence of noise or conflicting predictions, remains a key challenge. In this work, we propose a Multi-Round Parallel Multi-Teacher Distillation (MPMTD) framework that systematically explores and combines multiple aggregation techniques. Specifically, we investigate aggregation at different levels, including loss-based and probability-distribution-based fusion. Our framework applies different strategies across distillation rounds, enabling adaptive and synergistic knowledge transfer. Through extensive experimentation, we analyze the strengths and weaknesses of individual aggregation methods and demonstrate that strategic sequencing across rounds significantly outperforms static approaches. Notably, we introduce a Byzantine-Resilient Probability Distribution aggregation method, applied for the first time in a KD context, which achieves state-of-the-art performance, with an accuracy of 99.29% and an F1-score of 99.27%. We further identify optimal configurations in terms of the number of distillation rounds and the ordering of aggregation strategies, balancing accuracy with computational efficiency. Our contributions include (i) the introduction of advanced aggregation strategies into the KD setting, (ii) a systematic evaluation of their performance, and (iii) practical recommendations for real-world deployment. These findings have significant implications for distributed learning, edge computing, and IoT environments, where efficient and resilient model compression is essential.
Keywords: knowledge distillation; cross-modal; neural network compression; downsampling; multi-teachers; multi-rounds of distillation
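To make the aggregation idea in the abstract concrete, the following is a minimal, hypothetical PyTorch sketch of a single multi-teacher distillation step. It is not the paper's MPMTD implementation: the names (aggregate_mean, aggregate_median, distill_step), the temperature and weighting values, and the use of a coordinate-wise median as a generic stand-in for a Byzantine-resilient probability-distribution aggregator are all assumptions made for illustration.

```python
# Illustrative sketch only (not the paper's MPMTD code): one multi-teacher
# distillation step with two probability-distribution aggregation strategies.
import torch
import torch.nn.functional as F

def aggregate_mean(teacher_probs: torch.Tensor) -> torch.Tensor:
    # teacher_probs: (num_teachers, batch, num_classes) softened teacher outputs
    return teacher_probs.mean(dim=0)

def aggregate_median(teacher_probs: torch.Tensor) -> torch.Tensor:
    # Coordinate-wise median (a simple Byzantine-resilient stand-in),
    # renormalized so each row sums to 1.
    med = teacher_probs.median(dim=0).values
    return med / med.sum(dim=-1, keepdim=True)

def distill_step(student_logits, teacher_logits_list, labels,
                 T: float = 4.0, alpha: float = 0.7, aggregate=aggregate_mean):
    """Combined distillation + supervised loss for one batch.

    student_logits:      (batch, num_classes)
    teacher_logits_list: list of (batch, num_classes) tensors, one per teacher
    """
    # Soften each teacher's distribution with temperature T, then aggregate.
    teacher_probs = torch.stack(
        [F.softmax(t / T, dim=-1) for t in teacher_logits_list], dim=0)
    target_probs = aggregate(teacher_probs)

    # KL divergence between student log-probs and the aggregated teacher target.
    kd_loss = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                       target_probs, reduction="batchmean") * (T * T)

    # Standard cross-entropy on the ground-truth labels.
    ce_loss = F.cross_entropy(student_logits, labels)
    return alpha * kd_loss + (1.0 - alpha) * ce_loss

# Example usage with random tensors (3 teachers, batch of 8, 10 classes).
if __name__ == "__main__":
    student_logits = torch.randn(8, 10)
    teacher_logits = [torch.randn(8, 10) for _ in range(3)]
    labels = torch.randint(0, 10, (8,))
    print(distill_step(student_logits, teacher_logits, labels,
                       aggregate=aggregate_median))
```

Swapping the aggregate argument from round to round is roughly where the round-by-round sequencing of aggregation strategies described in the abstract would plug in.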

Share and Cite

MDPI and ACS Style

Hamdi, A.; Noura, H.N.; Azar, J. A Multi-Teacher Knowledge Distillation Framework with Aggregation Techniques for Lightweight Deep Models. Appl. Syst. Innov. 2025, 8, 146. https://doi.org/10.3390/asi8050146

AMA Style

Hamdi A, Noura HN, Azar J. A Multi-Teacher Knowledge Distillation Framework with Aggregation Techniques for Lightweight Deep Models. Applied System Innovation. 2025; 8(5):146. https://doi.org/10.3390/asi8050146

Chicago/Turabian Style

Hamdi, Ahmed, Hassan N. Noura, and Joseph Azar. 2025. "A Multi-Teacher Knowledge Distillation Framework with Aggregation Techniques for Lightweight Deep Models." Applied System Innovation 8, no. 5: 146. https://doi.org/10.3390/asi8050146

APA Style

Hamdi, A., Noura, H. N., & Azar, J. (2025). A Multi-Teacher Knowledge Distillation Framework with Aggregation Techniques for Lightweight Deep Models. Applied System Innovation, 8(5), 146. https://doi.org/10.3390/asi8050146
